
digitalmars.D - 2.011 invariant question

reply Sönke Ludwig writes:
I'm still trying to upgrade my code base from 2.008 and it's getting closer
with 2.011. However, there are some points left which make the porting quite
annoying, unfortunately. One of those points is structs with invariant members.

Maybe I'm missing something - but I don't understand the reasoning behind the
following error:

-----------------
struct FOO {
	invariant int x;
	int y;
}

int main()
{
	FOO bar;
	bar.y = 2;
	return 0;
}
-----------------
gives the error:
test4.d(9): variable test4.main.bar cannot modify struct with immutable members



But on a related note on invariant - I think I have mentioned this already at
some point - but just in case. If 'invariant' is to be theoretically safe in
the future, invariant values must not be allowed on the stack (the stack is never
invariant) - maybe something like "scope invariant" could be permitted - or some
heuristic which puts such values on the heap if a reference is taken out of
scope (similar to closures). The same goes for invariant arrays and opCatAssign:

-----------------
import std.stdio;

int main()
{
	invariant(int)[] arr = [1, 2, 3];
	invariant(int)* someref = &arr[2];
	arr.length = 2;
	arr ~= 4;
	writefln("array: %s - %s", arr, *someref);
	return 0;
}
-----------------

prints "array: [1 2 4] - 4"

So currently any references to invariant data can be changed legally with
this stack principle. (Invariant arrays could be fixed by always reallocating on
opCatAssign)
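[A sketch of a workaround under the semantics described above; it assumes .idup
copies the slice into fresh invariant storage, as in 2.0xx-era D:]

-----------------
import std.stdio;

int main()
{
	invariant(int)[] arr = [1, 2, 3];
	invariant(int)* someref = &arr[2];
	arr = arr[0 .. 2].idup; // fresh allocation: the append below
	arr ~= 4;               // can no longer stomp *someref
	writefln("array: %s - %s", arr, *someref);
	return 0;
}
-----------------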
Feb 23 2008
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
  Maybe I'm missing something - but I don't get what's the reasoning behind the
  following error?

  -----------------
  struct FOO {
         invariant int x;
         int y;
  }

  int main()
  {
         FOO bar;
         bar.y = 2;
         return 0;
  }
  -----------------
  gives the error:
  test4.d(9): variable test4.main.bar cannot modify struct with immutable members

Looks like a bug to me.
  But on a related note on invariant - I think I have mentioned this already at
  some point - but just in case. If 'invariant' should be theoretically safe in
  the future, invariant values may not be allowed on the stack (the stack is never
  invariant)

That's not quite correct. In a declaration like:

    invariant int x = whatever;

x will be invariant, and on the stack, and this is perfectly reasonable and
safe because it's a local variable, and so won't exist when the function goes
out of scope.

If you're going to say "the stack is never invariant" because it gets reused,
then you might as well say "the heap is never invariant", because it, too,
gets reused. It's all about understanding the lifetime of an object.
  int main()
  {
         invariant(int)[] arr = [1, 2, 3];
         invariant(int)* someref = &arr[2];
         arr.length = 2;
         arr ~= 4;
         writefln("array: %s - %s", arr, *someref);
         return 0;
  }
  -----------------

  prints "array: [1 2 4] - 4"

  So currently any references to invariant data can be changed legally with
  this stack principle.

This has nothing to do with the stack. The same bug would manifest if arr were on the heap. It certainly does look like a bug though.
 (Invariant arrays could be fixed by always reallocating on
  opCatAssign)

No, that would be too inefficient. Appending to the end of a buffer is too
common an operation to demand a reallocation every time. It is /much/ more
sensible to allow the compiler to know the "reserved" size of an array, and
only reallocate if the "reserved" area becomes full.

I think the same problem will arise if you say

    arr = arr[0..2];

instead of

    arr.length = 2;

I'm fairly sure this happens because arr.ptr still points to the first byte of
a resizable buffer, and because ~= (opCatAssign) is making the /assumption/
that the array is unique. As has been mentioned before, there is no way to
express uniqueness in D.

The problem also exists for non-invariant arrays. Both mutable and const
arrays suffer from the same effect, whenever the arrays are non-unique.

Perhaps D could partially work around the problem by banning ~= on all but
mutable arrays (although as noted, that still wouldn't solve it completely),
and then code which wants to append to a buffer would have to declare the
buffer mutable, but could still use assumeUnique() to return it. But I'm not
sure I like that idea. I think it just needs to be extremely well documented:
/Do not use ~= on a non-unique array/.
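[The build-mutable-then-freeze pattern described above can be sketched as
follows; the std.contracts location of assumeUnique is an assumption based on
the Phobos of that era:]

-----------------
import std.contracts; // assumed home of assumeUnique in 2.0xx Phobos

invariant(char)[] buildGreeting()
{
	char[] buf;               // mutable while building, so ~= is safe
	buf ~= "hello";
	buf ~= ", world";
	return assumeUnique(buf); // freeze without copying; keep no mutable refs
}
-----------------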
Feb 23 2008
parent reply Sönke Ludwig writes:
Janice Caron wrote:
 Looks like a bug to me.

I'd think so, too.
 That's not quite correct. In a declaration like:
 
     invariant int x = whatever;
 
 x will be invariant, and on the stack, and this is perfectly
 reasonable and safe because it's a local variable, and so won't exist
 when the function goes out of scope. If you're going to say "the stack
 is never invariant" because it gets reused, then you might as well say
 "the heap is never invariant", because it, too, gets reused. It's all
 about understanding the lifetime of an object.
 

That's right, of course - this is a problem of lifetime. The difference is
that objects on the heap are garbage collected, and their lifetime is always
long enough not to disturb invariant references (as long as delete is not used
explicitly).

The reason why I think that maybe there should be a difference between
invariant and non-invariant stack values is that when you take a reference to
a non-invariant value, you are not guaranteed to get the same data when you
dereference it later anyway. Invariant, however, makes that claim. So ideally,
whenever you get hold of a reference/pointer to invariant data, you should be
able to store it and assume that it will never change.

Is a change in semantics (work on the compiler) worth it in this case? I don't
know. I can live with it this way, too - but nevertheless, it would be a
similar generalization to the one made for full closures (and might be nice
even for non-invariant data, or maybe it's overkill).
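[For comparison, the full-closure generalization referred to here hoists a
local to the heap once a reference to it escapes; a minimal sketch:]

-----------------
// n would normally live on the stack, but because the returned delegate
// refers to it, the compiler moves it to the heap (D 2.007+ full closures).
int delegate() counter()
{
	int n = 0;
	return { return ++n; };
}
-----------------

[The proposal above would apply the same heuristic to invariant stack values
whose addresses escape.]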
 
 This has nothing to do with the stack. The same bug would manifest if
 arr were on the heap.
 
 It certainly does look like a bug though.
 

Just wanted to say that the array is used here like a stack.
 
 (Invariant arrays could be fixed by always reallocating on
  opCatAssign)

No, that would be too inefficient. Appending to the end of a buffer is too common an operation to demand a reallocation every time. It is /much/ more sensible to allow the compiler to know the "reserved" size of an array, and only reallocate if the "reserved" area becomes full.

Right, I didn't think of incremental string building... but maybe "always
reallocate on length = X;" would be an OK solution?
 I think the same problem will arise if you say
 
     arr = arr[0..2];
 
 instead of
 
     arr.length = 2;
 
 I'm fairly sure this happens because arr.ptr points to the first byte
 of a resizable buffer. I'm fairly sure this is because ~=
 (opCatAssign) is making the /assumption/ that the array is unique. As
 has been mentioned before, there is no way to express uniqueness in D.
 
 The problem also exists in non-invariant arrays too. Both mutable and
 const arrays suffer from the same effect, whenever the arrays are
 non-unique.

... but it's particularly bad when the invariant promise is broken at the same time.
 
 Perhaps D could partially work around the problem by banning ~= on all
 but mutable arrays (although as noted, that still wouldn't solve it
 completely), and then code which wants to append to a buffer would
 have to declare the buffer mutable, but could still use assumeUnique()
 to return it. But I'm not sure I like that idea.
 
 I think it just needs to be extremely well documented: /Do not use ~=
 on a non-unique array/.
 

I think that if it's fixed, it has to be done transparently, or it may be difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.
Feb 23 2008
next sibling parent Sönke Ludwig writes:
Janice Caron wrote:
 On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
 I think that if it's fixed, it has to be done transparently or it may be
  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion that
you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and
cause a reallocation every time. Incremental string building is the usual
reason given why this isn't the case, but I think there's another way round
that problem.

Think about what happens at runtime when x ~= y; is executed. First, the
compiler has to decide whether or not x is resizable (that is, does it point
to the start of a resizable block?). Well - how does it know that? Answer: it
has to ask the garbage collector. OK, so how does the garbage collector know
that? Well, presumably, the gc has to have some sort of map or table, so it
can look up the address x, and if it's in the table, then the answer is yes.
But that lookup operation has to be slower than O(1). No matter how you look
at it, there has to be at least O(log N) CPU time being consumed there, where
N is the number of gc allocated blocks.

It seems to me that a better approach would be to ditch that strategy
altogether, let ~= have predictable behavior even when the array is
non-unique, and use a /different/ strategy for incremental array building. I
would suggest something as simple as:

    struct Buffer!(T)
    {
        private T[] array;
        private uint reserved;
        void opCatAssign() {...}
        T[] toArray() {...}
        const(T)[] toCArray() {...}
        invariant(T)[] toIArray() {...}
        /*etc*/
    }

would do the job. Then instead of doing

    void buildString()
    {
        string r;
        r ~= something;
        r ~= somethingElse;
        return r;
    }

you would do

    void buildString()
    {
        Buffer!(char) r;
        r ~= something;
        r ~= somethingElse;
        return r.toIArray;
    }

toIArray() would have to call either assumeUnique or idup under the hood. We
could argue about which was the most sensible choice. But either way, you
saved a lot of CPU time because you no longer have to decide at runtime
whether or not this is a resizable buffer - that's now something you know at
compile time. And even when you do know that, finding the reserved size is
O(1), not O(log N). It's gotta be a winner all round.

Sounds like a good approach. The rare cases where the in-place resizing
behavior of the built-in arrays is needed for performance could very well be
handled with a standard Buffer/vector/whatever. To me, it feels a bit strange
to use such an optimization without an explicit reserve()/capacity() interface
anyway... and the lookup performance argument is sound, too.
Feb 23 2008
prev sibling next sibling parent Sönke Ludwig writes:
Sönke Ludwig wrote:
 So ideally, whenever you get hold of a reference/pointer to invariant data, 
 you should be able to store it and assume that it will never change.
 Is a change in semantics (work on the compiler) worth it in this case?
 I don't know. I can live with it this way, too - but nevertheless, it would
 be a similar generalization like with full-closures (and might be nice even
 for non-invariant data, or maybe it's overkill).
 

Incidentally, this idea just came up in a thread on D.learn (although the post
contains some oversights):

http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.learn&artnum=11410

Although this would create some performance pitfalls, it would definitely make
the language much more robust and remove a whole class of bugs. And those
kinds of bugs are pretty frequent, mainly among beginners of C-like languages.
Feb 23 2008
prev sibling next sibling parent Denton Cockburn <diboss hotmail.com> writes:
On Sat, 23 Feb 2008 13:37:10 +0000, Janice Caron wrote:

On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
 I think that if it's fixed, it has to be done transparently or it may be
  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion that
you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and
cause a reallocation every time. [...] And even when you do know that, finding
the reserved size is O(1), not O(log N). It's gotta be a winner all round.

Is it possible for you to actually benchmark the performance of the two approaches? (We don't know what else happens behind the scenes, so analysis alone isn't enough) I don't think Walter would be able to turn down your approach if you have evidence to back it up.
Feb 23 2008
prev sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Janice Caron wrote:
 On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
 I think that if it's fixed, it has to be done transparently or it may be
  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion that
you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and
cause a reallocation every time. Incremental string building is the usual
reason given why this isn't the case, but I think there's another way round
that problem.

Think about what happens at runtime when x ~= y; is executed. First, the
compiler has to decide whether or not x is resizable (that is, does it point
to the start of a resizable block?). Well - how does it know that? Answer: it
has to ask the garbage collector. OK, so how does the garbage collector know
that? Well, presumably, the gc has to have some sort of map or table, so it
can look up the address x, and if it's in the table, then the answer is yes.
But that lookup operation has to be slower than O(1). No matter how you look
at it, there has to be at least O(log N) CPU time being consumed there, where
N is the number of gc allocated blocks.

A hashtable could get you an average O(1) for that lookup (the worst case
could still be O(N) or O(log N) though, depending on the implementation). You
wouldn't need to enter every allocation in there, by the way. If GC pools are
always allocated on some aligned address, the start of one can be found by
bitwise-anding out the lower bits of the pointer, and that address could be
checked in the hashtable. "Is this the start of an allocation" then becomes a
hashtable lookup plus a check "cast(size_t)(p - pool_start) % pool_objsize ==
0"[1]. (The 0 may need to be adjusted to allocation_header.sizeof)

[1]: For pool_objsize == 2^N for some N no actual division needs to be
performed; "& ((1 << N) - 1)" has the same effect.

On another note, how about this for a rule on concatenating to invariant
arrays: for each heap-allocated array, put both the reserved size and *the
maximum used size* in the header of the allocation. Whenever arr.ptr +
arr.length is equal to allocation.start + allocation.max_size_ever, it's safe
to append to it in-place, as long as you increase the maximum size
accordingly. (This'll need some synchronization to be thread-safe, but when
there are multiple threads, allocations are currently already synchronized
anyway.)

Note that the rule above allows appending to trailing slices; this is unsafe
for mutable arrays, since they traditionally reallocated in this instance and
programs may rely on them not sharing data with the original. However, it is
safe for invariant arrays, since the shared prefix can't change -- the only
way to detect sharing would be pointer comparisons, which are not required to
provide meaningful information anyway (IIRC the spec allows "moving" garbage
collectors, so pointer comparisons aren't reliable since you can't count on
your data staying where it is).

This rule would allow incremental string building to be fairly efficient even
for invariant arrays; at worst, the initial append will trigger a reallocation
before an allocated block of memory runs out. The rest will only reallocate
when it's actually needed, just like with mutable-string building. (At least,
as long as you don't slice away elements from the end.)
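[The proposed high-water-mark rule could be sketched as follows; Allocation
and all of its fields are illustrative names, not actual druntime structures:]

-----------------
// Illustrative sketch of the proposed append rule.
struct Allocation
{
	ubyte* start;
	size_t reserved;        // total capacity of the block
	size_t max_size_ever;   // high-water mark of bytes ever used
}

// In-place append to an invariant array is safe only when the slice ends
// exactly at the high-water mark and there is reserve left.
bool canAppendInPlace(Allocation* a, invariant(ubyte)[] arr, size_t extra)
{
	return cast(size_t)(arr.ptr + arr.length)
	           == cast(size_t)(a.start + a.max_size_ever)
	    && a.max_size_ever + extra <= a.reserved;
}
-----------------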
Feb 24 2008
parent Sönke Ludwig writes:
Frits van Bommel wrote:
 
 A hashtable could get you an average O(1) for that lookup (the worst 
 case could still be O(N) or O(log N) though, depending on the 
 implementation). You wouldn't need to enter every allocation in there, 
 by the way. If GC pools are always allocated on some aligned address, 
 the start of one can be found by bitwise-anding out the lower bits of 
 the pointer, and that address could be checked in the hashtable. "is 
 this the start of an allocation" then becomes a hashtable lookup plus a 
 check "cast(size_t)(p - pool_start) % pool_objsize == 0"[1]. (The 0 may 
 need to be adjusted to allocation_header.sizeof)
 
 [1]: For pool_objsize == 2^N for some N no actual division needs to be 
 performed; "& ((1 << N) - 1)" has the same effect.
 
 
 On another note, how about this for a rule on concatenating to invariant 
 arrays:
 For each heap-allocated array, put both the reserved size and *the 
 maximum used size* in the header of the allocation. Whenever arr.ptr + 
 arr.length is equal to allocation.start + allocation.max_size_ever, it's 
 safe to append to it in-place as long as you increase the maximum size 
 accordingly.
 (This'll need some synchronization to be thread-safe, but when there are 
 multiple threads allocations are currently already synchronized anyway)
 
 Note that the rule above allows appending to trailing slices; this is 
 unsafe for mutable arrays since they traditionally reallocated in this 
 instance and programs may rely on them not sharing data with the 
 original. However, this is safe for invariant arrays since the shared 
 prefix can't change -- the only way to detect sharing would be pointer 
 comparisons which are not required to provide meaningful information 
 anyway (IIRC the spec allows "moving" garbage collectors, so pointer 
 comparisons aren't reliable anyway since you can't count on your data 
 staying where it is).
 
 This rule would allow incremental string building to be fairly efficient 
 even for invariant arrays; at worst, the initial append will trigger a 
 reallocation before an allocated block of memory runs out. The rest will 
 only reallocate when it's actually needed just like with mutable-string 
 building. (At least, as long as you don't slice away elements from the end)

This approach would probably be the way to go - I don't know how bad the
implementation overhead would be, but probably not too high. In any case, the
idea sounds very appealing and provides very unintrusive behaviour.

The block look-up currently seems to cache the last accessed block, so for
successively appending to a single array it's always O(1) - only when two or
more arrays are built in parallel does a real look-up have to be made. This is
probably sufficient for most use cases, and I think the maintenance costs for
a hashtable would probably be higher than the benefit.

A dedicated container for array building can of course never hurt, but this
scheme at least does not make it any more necessary than it is now.
Feb 27 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
 I think that if it's fixed, it has to be done transparently or it may be
  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion that
you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and
cause a reallocation every time. [...] But either way, you saved a lot of CPU
time because you no longer have to decide at runtime whether or not this is a
resizable buffer - that's now something you know at compile time. And even
when you do know that, finding the reserved size is O(1), not O(log N). It's
gotta be a winner all round.
Feb 23 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 23/02/2008, Janice Caron <caron800 googlemail.com> wrote:
     void buildString()

Except of course I meant

    string buildString()

Ho hum.
Feb 23 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 23/02/2008, Janice Caron <caron800 googlemail.com> wrote:
     struct Buffer!(T)
     {
         private T[] array;
         private uint reserved;
         void opCatAssign() {...}
         T[] toArray() {...}
         const(T)[] toCArray() {...}
         invariant(T)[] toIArray() {...}
         /*etc*/
     }

Actually, I think we could simplify that interface to:

    struct Buffer!(T)
    {
        private T[] array;
        private uint reserved;
        void opCatAssign() {...}
        T[] toArray() {...}
    }

because if you want to return a const or invariant array, you'd just have to
change the initial declaration of the Buffer. As in Buffer!(invariant(char))
instead of Buffer!(char).
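[A minimal working sketch of such a Buffer, filling in the elided bodies; the
geometric growth policy is an assumption, not something specified in the
thread:]

-----------------
// Illustrative only: the member bodies are guesses at what the
// elided {...} implementations might look like.
struct Buffer(T)
{
	private T[] array;   // backing storage (capacity = array.length)
	private size_t used; // number of elements filled in so far

	void opCatAssign(const(T)[] rhs)
	{
		if (used + rhs.length > array.length)
			array.length = (used + rhs.length) * 2; // grow geometrically
		array[used .. used + rhs.length] = rhs[];
		used += rhs.length;
	}

	T[] toArray() { return array[0 .. used]; }
}
-----------------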
Feb 23 2008
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Sönke Ludwig wrote:
 Maybe I'm missing something - but I don't get what's the reasoning 
 behind the
 following error?
 
 -----------------
 struct FOO {
     invariant int x;
     int y;
 }
 
 int main()
 {
     FOO bar;
     bar.y = 2;
     return 0;
 }
 -----------------
 gives the error:
 test4.d(9): variable test4.main.bar cannot modify struct with immutable 
 members

Compiler bug.
Feb 24 2008
prev sibling parent Sönke Ludwig writes:
Sönke Ludwig wrote:

 
 -----------------
 struct FOO {
     invariant int x;
     int y;
 }
 
 int main()
 {
     FOO bar;
     bar.y = 2;
     return 0;
 }
 -----------------
 gives the error:
 test4.d(9): variable test4.main.bar cannot modify struct with immutable 
 members
 

added bugzilla #1873 for reference.
Feb 27 2008