Dynamic arrays, yet another pitfall

classic Classic list List threaded Threaded
41 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Dynamic arrays, yet another pitfall

Jürgen Hestermann
There is another possible pitfall with dynamic arrays I trapped into which I would like to share.
Those who handle dynamic arrays in a virtuoso manner daily may skip this posting of course.
But I think for those who never used them before it may be of use.
Maybe it can be added to the Wiki (although again I had to make assumptions which may be wrong):



With the following declaration and code:

-------------------
var A,B: array of integer;
...
SetLength(A,10);
B := A;
SetLength(B,20);
-------------------

both variables A and B point to the same array with 20 Elements.
Changing A changes B and vice versa.
But a slight modification

-------------------
SetLength(A,0);
B := A;
SetLength(B,20);
-------------------

makes both variables A and B totaly decoupled! Although B is still assigned to be the same as A each variable is now a separate array with individual lengths and elements. Variable A has the length 0 and variable B is of length 20. Changing the length for one of them does no longer change the length of the other. If someone thinks about dynamic arrays as black boxes without the need to know the details because they are handled in the background then he will certainly be baffled by this.

Why is this so? Actually the variables A and B are just (4 byte) pointers to a structure which holds further information of the array (reference counter, length, pointer to the elements...). Setting the length of such a dynamic array to zero means that all data it pointed to will be freed and the pointer is set to NIL. Assigning this NIL-value to another variable of the same type sets this one to NIL too (after removing any potentially existing data of it). Changes for each of these variables no longer affect the other one because they do not point to the same structure anymore. Each has its own structure when adding elements with SetLength. This is especially a pitfall when the dynamic array is part of a complex structure and the reference to it should be given back by function or procedure parameters. It will work as long as the array has elements but fail when all elements have been (temporarily) removed!

Under the hood dynamic arrays are similar to pointers to records:

-------------------
type MyType = record
               ....
               end;
var A,B : ^MyType;
-------------------

After

-------------------
new(A);
B := A;
-------------------

B points to the same structure as A and changing A^ changed B^ too.
But after

-------------------
dispose(A);
A := nil;
B := A;
-------------------

A and B are both nil and

-------------------
new(A);
new(B);
-------------------

would create two different structures with each variable pointing to one of them. The same happens for dynamic arrays. Each length-command changes the structure where the pointer points to but when set to length zero it does not point to anything anymore (is NIL). Afterwards both variables are independend from another.

I think this is an important information about dynamic arrays which I was missing from the documentation.
Without this knowledge it can be hard to find subtle errors.


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

etrusco
On Sun, Feb 9, 2014 at 12:34 PM, Jürgen Hestermann
<[hidden email]> wrote:

> (...)
> With the following declaration and code:
>
> -------------------
> var A,B: array of integer;
> ...
> SetLength(A,10);
> B := A;
> SetLength(B,20);
> -------------------
>
> both variables A and B point to the same array with 20 Elements.
> Changing A changes B and vice versa.
> But a slight modification
>
> -------------------
> SetLength(A,0);
> B := A;
> SetLength(B,20);
> -------------------
>
> makes both variables A and B totaly decoupled! Although B is still assigned
> to be the same as A each variable is now a separate array with individual
> lengths and elements. Variable A has the length 0 and variable B is of
> length 20. Changing the length for one of them does no longer change the
> length of the other. If someone thinks about dynamic arrays as black boxes
> without the need to know the details because they are handled in the
> background then he will certainly be baffled by this.
> (...)

In other words: dynamic arrays are like AnsiStrings without the
copy-on-write semantics. I'd certainly wish Borland copied the COW
semantics :-/

-Flávio
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Sven Barth-2
In reply to this post by Jürgen Hestermann
On 09.02.2014 15:34, Jürgen Hestermann wrote:

> With the following declaration and code:
>
> -------------------
> var A,B: array of integer;
> ...
> SetLength(A,10);
> B := A;
> SetLength(B,20);
> -------------------
>
> both variables A and B point to the same array with 20 Elements.
> Changing A changes B and vice versa.

And again you did not provide a full compilable example. The following code

=== code begin ===

program tdynarrtest;

var
   a, b: array of LongInt;
begin
   SetLength(a, 10);
   b := a;
   SetLength(b, 20);
   Writeln(Length(a));
   Writeln(Length(b));
end.

=== code end ===

results in

=== output begin ===

10
20

=== output end ===

So exactly what I would expect from the COW mechanism.

Regards,
Sven
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Sven Barth-2
In reply to this post by etrusco
On 09.02.2014 16:34, Flávio Etrusco wrote:
>
> In other words: dynamic arrays are like AnsiStrings without the
> copy-on-write semantics. I'd certainly wish Borland copied the COW
> semantics :-/

Dynamic arrays have full COW semantics. If Jürgen would have provided a
full compilable example we could check whether he has a bug in his own
code or there is a bug in the compiler as certainly the result of his
first code snipped must be length 10 for array "A" and length 20 for
array "B". His second example is as expected and as designed.

Regards,
Sven
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Jürgen Hestermann
In reply to this post by etrusco
Am 2014-02-09 16:34, schrieb Flávio Etrusco:
 > In other words: dynamic arrays are like AnsiStrings without the
 > copy-on-write semantics. I'd certainly wish Borland copied the COW
 > semantics :-/

Yes. Therefore the issue that I described for dynamic arrays cannot occur for ansistrings. But COW can also be a drawback. Imagine an ansistring within a record of a heavily pointer-connected deeply nested data structure and I want to write a function that gives me back a reference to this string:

---------------------------------
function FindMyStringToModify : AnsiString;
...
var A : AnsiString;
A := FindMyStringToModify;
A := 'New String';
---------------------------------

then the change to A will not affect the ansistring in the nested data structure (because of COW).
But this is possible for dynamic arrays *if* they are not empty.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Jürgen Hestermann
In reply to this post by Sven Barth-2
Am 2014-02-09 17:47, schrieb Sven Barth:
> On 09.02.2014 16:34, Flávio Etrusco wrote:
>>
>> In other words: dynamic arrays are like AnsiStrings without the
>> copy-on-write semantics. I'd certainly wish Borland copied the COW
>> semantics :-/
>
> Dynamic arrays have full COW semantics.

Now I am fully confused.
It's hard to find out how things work in Free Pascal given the poor documentation.
This sentence from http://www.freepascal.org/docs-html/ref/refsu18.html is *very* misleading:

"Dynamic arrays are reference counted: assignment of one dynamic array-type variable to another will let both variables point to the same array. Contrary to ansistrings, an assignment to an element of one array will be reflected in the other: there is no copy-on-write."

Reference counted? Point to the same array? No copy-on-write?
It seems you cannot believe anything from the documentation.
You must question every statement.
*Everything* can only be determined by writing test programs.
I now found a similar posting 5 years ago regarding the same issue:

http://lists.freepascal.org/lists/fpc-pascal/2009-February/020198.html

Why ist this not part of the documentation?
So it seems there is a copy-on-write *but* only when using SetLength.
What a very consistent design!

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

patspiper
In reply to this post by Sven Barth-2
On 09/02/14 18:47, Sven Barth wrote:
> On 09.02.2014 16:34, Flávio Etrusco wrote:
>>
>> In other words: dynamic arrays are like AnsiStrings without the
>> copy-on-write semantics. I'd certainly wish Borland copied the COW
>> semantics :-/
>
> Dynamic arrays have full COW semantics.

It seems not:
    SetLength(A,10);
    A[0]:=33;
    B:=A;
    A[0]:=31;
    b[0]:=9;
    WriteLn(a[0], b[0]); // prints 9 and 9 (fpc 2.6.3)

> If Jürgen would have provided a full compilable example we could check
> whether he has a bug in his own code or there is a bug in the compiler
> as certainly the result of his first code snipped must be length 10
> for array "A" and length 20 for array "B". His second example is as
> expected and as designed.

True on both counts.

Regards,
Stephano
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Florian Klämpfl
In reply to this post by Jürgen Hestermann
Am 09.02.2014 18:41, schrieb Jürgen Hestermann:
> So it seems there is a copy-on-write *but* only when using SetLength.

No. There is no COW, only ref. counting. SetLength just forces an unique
instance of the array if needed.

> What a very consistent design!

There is a very good reason for this design: speed. COW would mean that
at every write to a dyn. array element, ref. count must be checked. This
would render dyn. arrays useless for a lot of applications due to poor
speed. So dyn. arrays with COW simply make no sense.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Sven Barth-2
In reply to this post by patspiper
On 09.02.2014 19:10, patspiper wrote:

> On 09/02/14 18:47, Sven Barth wrote:
>> On 09.02.2014 16:34, Flávio Etrusco wrote:
>>>
>>> In other words: dynamic arrays are like AnsiStrings without the
>>> copy-on-write semantics. I'd certainly wish Borland copied the COW
>>> semantics :-/
>>
>> Dynamic arrays have full COW semantics.
>
> It seems not:
>     SetLength(A,10);
>     A[0]:=33;
>     B:=A;
>     A[0]:=31;
>     b[0]:=9;
>     WriteLn(a[0], b[0]); // prints 9 and 9 (fpc 2.6.3)

Ehm yes... seems that I was mistaken a bit by the dynamic array
implementation. *blush*

Regards,
Sven
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Michael Van Canneyt
In reply to this post by Florian Klämpfl


On Sun, 9 Feb 2014, Florian Klämpfl wrote:

> Am 09.02.2014 18:41, schrieb Jürgen Hestermann:
>> So it seems there is a copy-on-write *but* only when using SetLength.
>
> No. There is no COW, only ref. counting. SetLength just forces an unique
> instance of the array if needed.

The documentation explicitly mentions that:

'Dynamic arrays are reference counted: assignment of one dynamic array-type
variable to another will let both variables point to the same array.
Contrary to ansistrings, an assignment to an element of one array will
be reflected in the other: there is no copy-on-write.'

But I have added a paragraph about the "reset ref. count" behaviour of setlength.

Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Jürgen Hestermann
In reply to this post by Florian Klämpfl
Am 2014-02-09 19:29, schrieb Florian Klämpfl:
 > Am 09.02.2014 18:41, schrieb Jürgen Hestermann:
 >> So it seems there is a copy-on-write *but* only when using SetLength.
 > No. There is no COW, only ref. counting. SetLength just forces an unique
 > instance of the array if needed.

But this *is* Copy-On-Write! When I set the length it is not different to changing elements.
In both cases I write to the array (modify it).
But suddenly I have a duplicate.
I would not expect such a behaviour as I don't get a dublicate when changing elements either.



 >> What a very consistent design!
 > There is a very good reason for this design: speed. COW would mean that
 > at every write to a dyn. array element, ref. count must be checked. This
 > would render dyn. arrays useless for a lot of applications due to poor
 > speed. So dyn. arrays with COW simply make no sense.

Yes. Therefore I was expecting that *never* a copy is made.
While Ansistrings *do* have COW dynamic arrays do *not* have it.
If that would be true then variables pointing to the same dynamic
array should do this forever (unless the programmer assigns a new array with := ).
Setlength should *not* be considered different to modifying elements.
For what reason? Both are writes to the array.

As this seems not to be true it should be mentinoned in the documentation with big exclamation marks!!
Otherwise *everybody* who is using dynamic arrays for the first time will assume Setlength beeing
treated the same as element changes. Why should someone consider these things as different?
This is a source of errors and a huge confusion as discussed here too:
http://www.delphitools.info/2011/06/15/poll-dynamic-arrays-as-reference-or-value-type
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Jürgen Hestermann
In reply to this post by Michael Van Canneyt
Am 2014-02-10 09:13, schrieb Michael Van Canneyt:
 >> No. There is no COW, only ref. counting. SetLength just forces an unique
 >> instance of the array if needed.
 > The documentation explicitly mentions that:
 > 'Dynamic arrays are reference counted: assignment of one dynamic array-type
 > variable to another will let both variables point to the same array.
 > Contrary to ansistrings, an assignment to an element of one array will
 > be reflected in the other: there is no copy-on-write.'

But what do these sentences say?
They say that there is *no* copy-on-write.
So why does it suddenly do copy-on-write when using SetLength?
SetLength is a write to the array. It adds (or removes) elements.
So why do I get a copy? The documentation says that I will *not* get a copy (no COW).
So the documentation is wrong (or at least totaly misleading/incomplete).


 > But I have added a paragraph about the "reset ref. count" behaviour of setlength.

What has a reference counter to do with the behaviour I see?
If SetLength would just simply change the number of elements it would not change a reference counter.
So nothing else happens, just elements are added or removed.
The same as when I change an element.
At least that's what the documentation says (to me).
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Michael Van Canneyt


On Sat, 15 Feb 2014, Jürgen Hestermann wrote:

> Am 2014-02-10 09:13, schrieb Michael Van Canneyt:
>>> No. There is no COW, only ref. counting. SetLength just forces an unique
>>> instance of the array if needed.
>> The documentation explicitly mentions that:
>> 'Dynamic arrays are reference counted: assignment of one dynamic array-type
>> variable to another will let both variables point to the same array.
>> Contrary to ansistrings, an assignment to an element of one array will
>> be reflected in the other: there is no copy-on-write.'
>
> But what do these sentences say?
> They say that there is *no* copy-on-write.
Correct.

> So why does it suddenly do copy-on-write when using SetLength?

Copy-On-Write means that if you write to *an element* of the array,
the array is duplicated if it has a ref count>0. As with strings.

Since this does not happen for dynamic arrays, there is no copy on write.

That setlength behaves rather freakish for dynamic arrays, does not
mean we have copy-on-write.

Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Jürgen Hestermann

Am 2014-02-15 19:41, schrieb Michael Van Canneyt:
> That setlength behaves rather freakish for dynamic arrays, does not mean we have copy-on-write.

Well, setlength *is* a write.
What else is it?
Doesn't it write to the array?
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Michael Van Canneyt


On Sun, 16 Feb 2014, Jürgen Hestermann wrote:

>
> Am 2014-02-15 19:41, schrieb Michael Van Canneyt:
>> That setlength behaves rather freakish for dynamic arrays, does not mean we
>> have copy-on-write.
>
> Well, setlength *is* a write.
> What else is it?
> Doesn't it write to the array?

Depends on how you define 'write to the array'.

"Writing to the array" = "Changing the value of one of the elements in the array."

In this sense, setlength does not write to the array, which would mean changing the value of
one or more of the elements in the array. It does not do that. It resizes the array and
the result is by definition a different array.

But taking the above definition, with "element" = "character", you see that ansistrings do
have copy-on-write.

It's largely a question of semantics, I suppose.

Michael.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Florian Klämpfl
In reply to this post by Michael Van Canneyt
Am 15.02.2014 19:41, schrieb Michael Van Canneyt:
>
> That setlength behaves rather freakish for dynamic arrays, does not
> mean we have copy-on-write.

setlength does not behave freaky but its behaviour is well designed. The
reason why setlength does a deep copy is simple: multithreading. If
setlength had no deep copy semantics, it would need locking of the whole
array data, not only locked access to the ref. counter.
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Jürgen Hestermann
In reply to this post by Michael Van Canneyt
Am 2014-02-16 13:20, schrieb Michael Van Canneyt:
 > Depends on how you define 'write to the array'.
 > "Writing to the array" = "Changing the value of one of the elements in the array."
 > In this sense, setlength does not write to the array, which would mean changing the value of
 > one or more of the elements in the array. It does not do that. It resizes the array and
 > the result is by definition a different array.

But what else than writing is it to add or remove elements?
It's definitely not reading.
Changing the length of an ansistring is writing too IMO.


 > But taking the above definition, with "element" = "character", you see that ansistrings do have copy-on-write.

Nobody doubt that.
But dynamic arrays claim to *not* have copy-on-write although SetLength *does* a copy-on-write.
When comparing ansistrings to dynamic arrays:
Setlength on ansistrings does copy-on-write but the documentation says that copy-on-write does not exist for dynamic arrays.
Still Setlength on dynamic arrays does a copy-on-write in the same way as it does for ansistrings.


 > It's largely a question of semantics, I suppose.

Well, IMO "writing" is doing changes to the stored data.
In opposite to reading it writes bytes to memory.
Reading the variable before and after a SetLength show different results (length() reports different values).

If Setlength should *not* be considered "writing" then the documenation needs to make this very clear.
It should not leave any doubts nor allow different interpretation of the behaviour.
Especially when such illogical and schizophrenic behaviour exists.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Jürgen Hestermann
In reply to this post by Florian Klämpfl
Am 2014-02-16 13:47, schrieb Florian Klämpfl:
 > setlength does not behave freaky but its behaviour is well designed. The
 > reason why setlength does a deep copy is simple: multithreading. If
 > setlength had no deep copy semantics, it would need locking of the whole
 > array data, not only locked access to the ref. counter.

There may be good reasons for doing it the way it has been done.
But for a programmer (who has not written the compiler) this behaviour is totaly unexpected.
When using unknown features of a programming language for the first time
then the documentaion should tell all aspects in detail and describe the exact behaviour.

_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Michael Van Canneyt


On Sun, 16 Feb 2014, Jürgen Hestermann wrote:

> Am 2014-02-16 13:47, schrieb Florian Klämpfl:
>> setlength does not behave freaky but its behaviour is well designed. The
>> reason why setlength does a deep copy is simple: multithreading. If
>> setlength had no deep copy semantics, it would need locking of the whole
>> array data, not only locked access to the ref. counter.
>
> There may be good reasons for doing it the way it has been done.
> But for a programmer (who has not written the compiler) this behaviour is
> totaly unexpected.
> When using unknown features of a programming language for the first time
> then the documentaion should tell all aspects in detail and describe the
> exact behaviour.
It does exactly that, it says:
1) No COW
and I added:
2) SetLength enforces unique ref. count.

Michael.
>
> _______________________________________________
> fpc-pascal maillist  -  [hidden email]
> http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
>
_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Reply | Threaded
Open this post in threaded view
|

Re: Dynamic arrays, yet another pitfall

Jürgen Hestermann

Am 2014-02-16 17:16, schrieb Michael Van Canneyt:
> It does exactly that, it says:
> 1) No COW

As said already: SetLength *is* a write!


_______________________________________________
fpc-pascal maillist  -  [hidden email]
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
123