
digitalmars.D - 2.011 invariant question

reply Sönke Ludwig writes:
I'm still trying to upgrade my code base from 2.008 and it's getting closer
with 2.011. However, there are some points left which make the porting quite
annoying, unfortunately. One of those points is structs with invariant members.

Maybe I'm missing something - but I don't understand the reasoning behind the
following error:

-----------------
struct FOO {
	invariant int x;
	int y;
}

int main()
{
	FOO bar;
	bar.y = 2;
	return 0;
}
-----------------
gives the error:
test4.d(9): variable test4.main.bar cannot modify struct with immutable members



But on a related note on invariant - I think I have mentioned this already at
some point - but just in case. If 'invariant' is to be theoretically safe in
the future, invariant values must not be allowed on the stack (the stack is never
invariant) - maybe something like "scope invariant" could be permitted - or some
heuristic which puts such values on the heap if a reference is taken out of
scope (similar to closures). The same goes for invariant arrays and opCatAssign:

-----------------
import std.stdio;

int main()
{
	invariant(int)[] arr = [1, 2, 3];
	invariant(int)* someref = &arr[2];
	arr.length = 2;
	arr ~= 4;
	writefln("array: %s - %s", arr, *someref);
	return 0;
}
-----------------

prints "array: [1 2 4] - 4"

So currently any references to invariant data can be changed legally with
this stack principle. (Invariant arrays could be fixed by always reallocating on
opCatAssign)
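[A sketch of a workaround under the semantics described above; it assumes .idup
copies the slice into fresh invariant storage, as in 2.0xx-era D:]

-----------------
import std.stdio;

int main()
{
	invariant(int)[] arr = [1, 2, 3];
	invariant(int)* someref = &arr[2];
	arr = arr[0 .. 2].idup; // fresh allocation: the append below
	arr ~= 4;               // can no longer stomp *someref
	writefln("array: %s - %s", arr, *someref);
	return 0;
}
-----------------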
Feb 23 2008
next sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
  Maybe I'm missing something - but I don't get what's the reasoning behind the
  following error?

  -----------------
  struct FOO {
         invariant int x;
         int y;
  }

  int main()
  {
         FOO bar;
         bar.y = 2;
         return 0;
  }
  -----------------
  gives the error:
  test4.d(9): variable test4.main.bar cannot modify struct with immutable members

Looks like a bug to me.
  But on a related note on invariant - I think I have mentioned this already at
  some point - but just in case. If 'invariant' should be theoretically safe in
  the future, invariant values may not be allowed on the stack (the stack is never
  invariant)

That's not quite correct. In a declaration like:

    invariant int x = whatever;

x will be invariant, and on the stack, and this is perfectly reasonable and
safe because it's a local variable, and so won't exist when the function goes
out of scope.

If you're going to say "the stack is never invariant" because it gets reused,
then you might as well say "the heap is never invariant", because it, too,
gets reused. It's all about understanding the lifetime of an object.
  int main()
  {
         invariant(int)[] arr = [1, 2, 3];
         invariant(int)* someref = &arr[2];
         arr.length = 2;
         arr ~= 4;
         writefln("array: %s - %s", arr, *someref);
         return 0;
  }
  -----------------

  prints "array: [1 2 4] - 4"

  So currently any references to invariant data can be changed legally with
  this stack principle.

This has nothing to do with the stack. The same bug would manifest if arr were on the heap. It certainly does look like a bug though.
 (Invariant arrays could be fixed by always reallocating on
  opCatAssign)

No, that would be too inefficient. Appending to the end of a buffer is too
common an operation to demand a reallocation every time. It is /much/ more
sensible to allow the compiler to know the "reserved" size of an array, and
only reallocate if the "reserved" area becomes full.

I think the same problem will arise if you say

    arr = arr[0..2];

instead of

    arr.length = 2;

I'm fairly sure this happens because arr.ptr still points to the first byte of
a resizable buffer, and because ~= (opCatAssign) is making the /assumption/
that the array is unique. As has been mentioned before, there is no way to
express uniqueness in D.

The problem also exists for non-invariant arrays. Both mutable and const
arrays suffer from the same effect, whenever the arrays are non-unique.

Perhaps D could partially work around the problem by banning ~= on all but
mutable arrays (although as noted, that still wouldn't solve it completely),
and then code which wants to append to a buffer would have to declare the
buffer mutable, but could still use assumeUnique() to return it. But I'm not
sure I like that idea. I think it just needs to be extremely well documented:
/Do not use ~= on a non-unique array/.
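[The build-mutable-then-freeze pattern described above can be sketched as
follows; the std.contracts location of assumeUnique is an assumption based on
the Phobos of that era:]

-----------------
import std.contracts; // assumed home of assumeUnique in 2.0xx Phobos

invariant(char)[] buildGreeting()
{
	char[] buf;               // mutable while building, so ~= is safe
	buf ~= "hello";
	buf ~= ", world";
	return assumeUnique(buf); // freeze without copying; keep no mutable refs
}
-----------------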
Feb 23 2008
parent reply Sönke Ludwig writes:
Janice Caron wrote:
 Looks like a bug to me.

I'd think so, too.
 That's not quite correct. In a declaration like:
 
     invariant int x = whatever;
 
 x will be invariant, and on the stack, and this is perfectly
 reasonable and safe because it's a local variable, and so won't exist
 when the function goes out of scope. If you're going to say "the stack
 is never invariant" because it gets reused, then you might as well say
 "the heap is never invariant", because it, too, gets reused. It's all
 about understanding the lifetime of an object.
 

That's right, of course - this is a problem of lifetime. The difference is
that objects on the heap are garbage collected, and their lifetime is always
long enough not to disturb invariant references (as long as delete is not used
explicitly).

The reason why I think that maybe there should be a difference between
invariant and non-invariant stack values is that when you take a reference to
a non-invariant value, you are not guaranteed to get the same data when you
dereference it later anyway. Invariant, however, makes that claim. So ideally,
whenever you get hold of a reference/pointer to invariant data, you should be
able to store it and assume that it will never change.

Is a change in semantics (work on the compiler) worth it in this case? I don't
know. I can live with it this way, too - but nevertheless, it would be a
similar generalization to the one made for full closures (and might be nice
even for non-invariant data, or maybe it's overkill).
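[For comparison, the full-closure generalization referred to here hoists a
local to the heap once a reference to it escapes; a minimal sketch:]

-----------------
// n would normally live on the stack, but because the returned delegate
// refers to it, the compiler moves it to the heap (D 2.007+ full closures).
int delegate() counter()
{
	int n = 0;
	return { return ++n; };
}
-----------------

[The proposal above would apply the same heuristic to invariant stack values
whose addresses escape.]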
 
 This has nothing to do with the stack. The same bug would manifest if
 arr were on the heap.
 
 It certainly does look like a bug though.
 

Just wanted to say that the array is used here like a stack.
 
 (Invariant arrays could be fixed by always reallocating on
  opCatAssign)

No, that would be too inefficient. Appending to the end of a buffer is too common an operation to demand a reallocation every time. It is /much/ more sensible to allow the compiler to know the "reserved" size of an array, and only reallocate if the "reserved" area becomes full.

Right, I didn't think of incremental string building... but maybe "always
reallocate on length = X;" would be an OK solution?
 I think the same problem will arise if you say
 
     arr = arr[0..2];
 
 instead of
 
     arr.length = 2;
 
 I'm fairly sure this happens because arr.ptr points to the first byte
 of a resizable buffer. I'm fairly sure this is because ~=
 (opCatAssign) is making the /assumption/ that the array is unique. As
 has been mentioned before, there is no way to express uniqueness in D.
 
 The problem also exists in non-invariant arrays too. Both mutable and
 const arrays suffer from the same effect, whenever the arrays are
 non-unique.

... but it's particularly bad when the invariant promise is broken at the same time.
 
 Perhaps D could partially work around the problem by banning ~= on all
 but mutable arrays (although as noted, that still wouldn't solve it
 completely), and then code which wants to append to a buffer would
 have to declare the buffer mutable, but could still use assumeUnique()
 to return it. But I'm not sure I like that idea.
 
 I think it just needs to be extremely well documented: /Do not use ~=
 on a non-unique array/.
 

I think that if it's fixed, it has to be done transparently, or it may be difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.
Feb 23 2008
next sibling parent Sönke Ludwig writes:
Janice Caron wrote:
 On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
 I think that if it's fixed, it has to be done transparently or it may be
  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion that
you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and
cause a reallocation every time. Incremental string building is the usual
reason given why this isn't the case, but I think there's another way round
that problem.

Think about what happens at runtime when x ~= y; is executed. First, the
compiler has to decide whether or not x is resizable (that is, does it point
to the start of a resizable block?). Well - how does it know that? Answer: it
has to ask the garbage collector. OK, so how does the garbage collector know
that? Well, presumably, the gc has to have some sort of map or table, so it
can look up the address x, and if it's in the table, then the answer is yes.
But that lookup operation has to be slower than O(1). No matter how you look
at it, there has to be at least O(log N) CPU time being consumed there, where
N is the number of gc allocated blocks.

It seems to me that a better approach would be to ditch that strategy
altogether, let ~= have predictable behavior even when the array is
non-unique, and use a /different/ strategy for incremental array building. I
would suggest something as simple as:

    struct Buffer!(T)
    {
        private T[] array;
        private uint reserved;
        void opCatAssign() {...}
        T[] toArray() {...}
        const(T)[] toCArray() {...}
        invariant(T)[] toIArray() {...}
        /*etc*/
    }

would do the job. Then instead of doing

    void buildString()
    {
        string r;
        r ~= something;
        r ~= somethingElse;
        return r;
    }

you would do

    void buildString()
    {
        Buffer!(char) r;
        r ~= something;
        r ~= somethingElse;
        return r.toIArray;
    }

toIArray() would have to call either assumeUnique or idup under the hood. We
could argue about which was the most sensible choice. But either way, you
saved a lot of CPU time because you no longer have to decide at runtime
whether or not this is a resizable buffer - that's now something you know at
compile time. And even when you do know that, finding the reserved size is
O(1), not O(log N). It's gotta be a winner all round.

Sounds like a good approach. The rare cases where the in-place resizing
behavior of the built-in arrays is needed for performance could very well be
handled with a standard Buffer/vector/whatever. To me, it feels a bit strange
to use such an optimization without an explicit reserve()/capacity() interface
anyway... and the lookup performance argument is sound, too.
Feb 23 2008
prev sibling next sibling parent Sönke Ludwig writes:
Sönke Ludwig wrote:
 So ideally, whenever you get hold of a reference/pointer to invariant data, 
 you should be able to store it and assume that it will never change.
 Is a change in semantics (work on the compiler) worth it in this case?
 I don't know. I can live with it this way, too - but nevertheless, it would
 be a similar generalization like with full-closures (and might be nice even
 for non-invariant data, or maybe it's overkill).
 

Incidentally, this idea just came up in a thread on D.learn (although the post
contains some oversights):

http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.learn&artnum=11410

Although this would create some performance pitfalls, it would definitely make
the language much more robust and remove a whole class of bugs. And those
kinds of bugs are pretty frequent, mainly among beginners of C-like languages.
Feb 23 2008
prev sibling next sibling parent Denton Cockburn <diboss hotmail.com> writes:
On Sat, 23 Feb 2008 13:37:10 +0000, Janice Caron wrote:

On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
 I think that if it's fixed, it has to be done transparently or it may be
  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion that
you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and
cause a reallocation every time. [...] And even when you do know that, finding
the reserved size is O(1), not O(log N). It's gotta be a winner all round.

Is it possible for you to actually benchmark the performance of the two approaches? (We don't know what else happens behind the scenes, so analysis alone isn't enough) I don't think Walter would be able to turn down your approach if you have evidence to back it up.
Feb 23 2008
prev sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Janice Caron wrote:
 On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
 I think that if it's fixed, it has to be done transparently or it may be
  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion that
you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and
cause a reallocation every time. Incremental string building is the usual
reason given why this isn't the case, but I think there's another way round
that problem.

Think about what happens at runtime when x ~= y; is executed. First, the
compiler has to decide whether or not x is resizable (that is, does it point
to the start of a resizable block?). Well - how does it know that? Answer: it
has to ask the garbage collector. OK, so how does the garbage collector know
that? Well, presumably, the gc has to have some sort of map or table, so it
can look up the address x, and if it's in the table, then the answer is yes.
But that lookup operation has to be slower than O(1). No matter how you look
at it, there has to be at least O(log N) CPU time being consumed there, where
N is the number of gc allocated blocks.

A hashtable could get you an average O(1) for that lookup (the worst case
could still be O(N) or O(log N) though, depending on the implementation). You
wouldn't need to enter every allocation in there, by the way. If GC pools are
always allocated on some aligned address, the start of one can be found by
bitwise-anding out the lower bits of the pointer, and that address could be
checked in the hashtable. "Is this the start of an allocation" then becomes a
hashtable lookup plus a check "cast(size_t)(p - pool_start) % pool_objsize ==
0"[1]. (The 0 may need to be adjusted to allocation_header.sizeof)

[1]: For pool_objsize == 2^N for some N no actual division needs to be
performed; "& ((1 << N) - 1)" has the same effect.

On another note, how about this for a rule on concatenating to invariant
arrays: for each heap-allocated array, put both the reserved size and *the
maximum used size* in the header of the allocation. Whenever arr.ptr +
arr.length is equal to allocation.start + allocation.max_size_ever, it's safe
to append to it in-place, as long as you increase the maximum size
accordingly. (This'll need some synchronization to be thread-safe, but when
there are multiple threads, allocations are currently already synchronized
anyway.)

Note that the rule above allows appending to trailing slices; this is unsafe
for mutable arrays, since they traditionally reallocated in this instance and
programs may rely on them not sharing data with the original. However, it is
safe for invariant arrays, since the shared prefix can't change -- the only
way to detect sharing would be pointer comparisons, which are not required to
provide meaningful information anyway (IIRC the spec allows "moving" garbage
collectors, so pointer comparisons aren't reliable since you can't count on
your data staying where it is).

This rule would allow incremental string building to be fairly efficient even
for invariant arrays; at worst, the initial append will trigger a reallocation
before an allocated block of memory runs out. The rest will only reallocate
when it's actually needed, just like with mutable-string building. (At least,
as long as you don't slice away elements from the end.)
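[The proposed high-water-mark rule could be sketched as follows; Allocation
and all of its fields are illustrative names, not actual druntime structures:]

-----------------
// Illustrative sketch of the proposed append rule.
struct Allocation
{
	ubyte* start;
	size_t reserved;        // total capacity of the block
	size_t max_size_ever;   // high-water mark of bytes ever used
}

// In-place append to an invariant array is safe only when the slice ends
// exactly at the high-water mark and there is reserve left.
bool canAppendInPlace(Allocation* a, invariant(ubyte)[] arr, size_t extra)
{
	return cast(size_t)(arr.ptr + arr.length)
	           == cast(size_t)(a.start + a.max_size_ever)
	    && a.max_size_ever + extra <= a.reserved;
}
-----------------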
Feb 24 2008
parent Sönke Ludwig writes:
Frits van Bommel wrote:
 
 A hashtable could get you an average O(1) for that lookup (the worst 
 case could still be O(N) or O(log N) though, depending on the 
 implementation). You wouldn't need to enter every allocation in there, 
 by the way. If GC pools are always allocated on some aligned address, 
 the start of one can be found by bitwise-anding out the lower bits of 
 the pointer, and that address could be checked in the hashtable. "is 
 this the start of an allocation" then becomes a hashtable lookup plus a 
 check "cast(size_t)(p - pool_start) % pool_objsize == 0"[1]. (The 0 may 
 need to be adjusted to allocation_header.sizeof)
 
 [1]: For pool_objsize == 2^N for some N no actual division needs to be 
 performed; "& ((1 << N) - 1)" has the same effect.
 
 
 On another note, how about this for a rule on concatenating to invariant 
 arrays:
 For each heap-allocated array, put both the reserved size and *the 
 maximum used size* in the header of the allocation. Whenever arr.ptr + 
 arr.length is equal to allocation.start + allocation.max_size_ever, it's 
 safe to append to it in-place as long as you increase the maximum size 
 accordingly.
 (This'll need some synchronization to be thread-safe, but when there are 
 multiple threads allocations are currently already synchronized anyway)
 
 Note that the rule above allows appending to trailing slices; this is 
 unsafe for mutable arrays since they traditionally reallocated in this 
 instance and programs may rely on them not sharing data with the 
 original. However, this is safe for invariant arrays since the shared 
 prefix can't change -- the only way to detect sharing would be pointer 
 comparisons which are not required to provide meaningful information 
 anyway (IIRC the spec allows "moving" garbage collectors, so pointer 
 comparisons aren't reliable anyway since you can't count on your data 
 staying where it is).
 
 This rule would allow incremental string building to be fairly efficient 
 even for invariant arrays; at worst, the initial append will trigger a 
 reallocation before an allocated block of memory runs out. The rest will 
 only reallocate when it's actually needed just like with mutable-string 
 building. (At least, as long as you don't slice away elements from the end)

This approach would probably be the way to go - I don't know how bad the
implementation overhead would be, but probably not too high. In any case, the
idea sounds very appealing and provides very unintrusive behaviour.

The block look-up currently seems to cache the last accessed block, so for
successively appending to a single array it's always O(1) - only when two or
more arrays are built in parallel does a real look-up have to be made. This is
probably sufficient for most use cases, and I think the maintenance costs for
a hashtable would probably be higher than the benefit.

A dedicated container for array building can of course never hurt, but this
scheme at least does not make it any more necessary than it is now.
Feb 27 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 23/02/2008, Sönke Ludwig <ludwig informatik_dot_uni-luebeck.de> wrote:
 I think that if it's fixed, it has to be done transparently or it may be
  difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.

You know, the more I think about it, the more I come to the conclusion that
you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and
cause a reallocation every time. [...] But either way, you saved a lot of CPU
time because you no longer have to decide at runtime whether or not this is a
resizable buffer - that's now something you know at compile time. And even
when you do know that, finding the reserved size is O(1), not O(log N). It's
gotta be a winner all round.
Feb 23 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 23/02/2008, Janice Caron <caron800 googlemail.com> wrote:
     void buildString()

Except of course I meant

    string buildString()

Ho hum.
Feb 23 2008
prev sibling next sibling parent "Janice Caron" <caron800 googlemail.com> writes:
On 23/02/2008, Janice Caron <caron800 googlemail.com> wrote:
     struct Buffer!(T)
     {
         private T[] array;
         private uint reserved;
         void opCatAssign() {...}
         T[] toArray() {...}
         const(T)[] toCArray() {...}
         invariant(T)[] toIArray() {...}
         /*etc*/
     }

Actually, I think we could simplify that interface to:

    struct Buffer!(T)
    {
        private T[] array;
        private uint reserved;
        void opCatAssign() {...}
        T[] toArray() {...}
    }

because if you want to return a const or invariant array, you'd just have to
change the initial declaration of the Buffer. As in Buffer!(invariant(char))
instead of Buffer!(char).
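[A minimal working sketch of such a Buffer, filling in the elided bodies; the
geometric growth policy is an assumption, not something specified in the
thread:]

-----------------
// Illustrative only: the member bodies are guesses at what the
// elided {...} implementations might look like.
struct Buffer(T)
{
	private T[] array;   // backing storage (capacity = array.length)
	private size_t used; // number of elements filled in so far

	void opCatAssign(const(T)[] rhs)
	{
		if (used + rhs.length > array.length)
			array.length = (used + rhs.length) * 2; // grow geometrically
		array[used .. used + rhs.length] = rhs[];
		used += rhs.length;
	}

	T[] toArray() { return array[0 .. used]; }
}
-----------------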
Feb 23 2008
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Sönke Ludwig wrote:
 Maybe I'm missing something - but I don't get what's the reasoning 
 behind the
 following error?
 
 -----------------
 struct FOO {
     invariant int x;
     int y;
 }
 
 int main()
 {
     FOO bar;
     bar.y = 2;
     return 0;
 }
 -----------------
 gives the error:
 test4.d(9): variable test4.main.bar cannot modify struct with immutable 
 members

Compiler bug.
Feb 24 2008
prev sibling parent Sönke Ludwig writes:
Sönke Ludwig wrote:

 
 -----------------
 struct FOO {
     invariant int x;
     int y;
 }
 
 int main()
 {
     FOO bar;
     bar.y = 2;
     return 0;
 }
 -----------------
 gives the error:
 test4.d(9): variable test4.main.bar cannot modify struct with immutable 
 members
 

added bugzilla #1873 for reference.
Feb 27 2008