digitalmars.D.bugs - Array bounds check on array slices.

Dave <Dave_member pathlink.com> writes:
Shouldn't:

int[] i = a[length .. length];

produce an ArrayBoundsError?
Nov 27 2005
"Unknown W. Brackets" <unknown simplemachines.org> writes:
Why should it?  I can see plenty of cases where I would want that to
work, being that it's a slice of zero elements at the end of the array.
That's not out of bounds, is it?

However, these:

int[] i = a[length + 1 .. length + 1];
int[] i = a[length .. length + 1];

Definitely should not work.  And they don't.  Sounds perfect to me 8).

-[Unknown]
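
The three cases above can be checked with a short sketch. It assumes a current D compiler with bounds checking enabled (the default for non-release builds), where the implicit `length` inside the brackets is spelled `$` and a failed check surfaces as `core.exception.RangeError` or a subclass of it, rather than the ArrayBoundsError named in this thread. The helper names are hypothetical.

```d
import core.exception : RangeError;
import std.exception : assertThrown;

// Hypothetical helper: the zero-length slice at the end of `a`.
int[] endSlice(int[] a)
{
    return a[$ .. $];           // lower == upper == a.length: in bounds
}

// Hypothetical helpers for the two slices that "definitely should not work".
int[] pastEndEmpty(int[] a)
{
    return a[$ + 1 .. $ + 1];   // lower bound exceeds a.length
}

int[] pastEndOne(int[] a)
{
    return a[$ .. $ + 1];       // upper bound exceeds a.length
}

unittest
{
    int[] a = [1, 2, 3];
    assert(endSlice(a).length == 0);          // legal, and empty
    assertThrown!RangeError(pastEndEmpty(a)); // rejected at run time
    assertThrown!RangeError(pastEndOne(a));   // rejected at run time
}
```

Compiling with `dmd -unittest -main` runs the checks; with bounds checking switched off the two rejected slices are no longer caught.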


Nov 27 2005
"Dave" <Dave_member pathlink.com> writes:
"Unknown W. Brackets" <unknown simplemachines.org> wrote in message
news:dmdtm2$18d3$1 digitaldaemon.com...
> Why should it?  I can see plenty of cases where I would want that to work,
> being that it's a slice of zero elements at the end of the array. That's
> not out of bounds, is it?
>
> However, these:
>
> int[] i = a[length + 1 .. length + 1];
> int[] i = a[length .. length + 1];
>
> Definitely should not work.  And they don't.  Sounds perfect to me 8).
>
> -[Unknown]

Hmm, doesn't seem perfect to me <g> Consider:

Since array[length .. length] actually points *past* the end of the array,
for non-GC allocated memory, it is currently legal through slicing to point
into memory not 'owned' by the array, which to me really should not be legal
(and the rule for slice leading indexes, whatever it turns out to be, really
should be consistent between D array and pointer slicing).

Also, with GC allocated memory, allocations using power-of-2 bytes have to
be doubled because of this so the next larger 'bucket' is assigned and
initialized (see phobos/internal/gc/gc.d and gcx.d). Seems pretty wasteful..

Finally, it just seems semantically inconsistent to me, and will probably
lead to a lot of D newbie problems and hard-to-find bugs.

[Since this seems legal and not a bug, I'm cross-posting to digitalmars.D
for more discussion..]
Nov 28 2005
Stewart Gordon <smjg_1998 yahoo.com> writes:
Dave wrote:
<snip>
> Since array[length .. length] actually points *past* the end of the array,
> for non-GC allocated memory, it is currently legal through slicing to point
> into memory not 'owned' by the array, which to me really should not be legal
> (and the rule for slice leading indexes, whatever it turns out to be, really
> should be consistent between D array and pointer slicing).

The problem is that it is convenient to write something like

    qwert = qwert[1..$];

to cut off the first element of an array, regardless of whether anything
remains.

But I'm guessing the problem to which you refer is that the beginning of
such a slice might be the beginning of another array altogether, and so
later changing its .length can overwrite the contents of the other
array.  Can this actually happen with the current DMD/GDC?  Or does the
structure of the GC heap somehow prevent it?

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++  a->--- UB  P+ L E  W++  N+++ o K-  w++  O? M V? PS-
PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox.  Please keep replies on
the 'group where everyone may benefit.
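
A minimal sketch of the `qwert = qwert[1..$]` idiom mentioned in this post, assuming a current D compiler. The helper name is hypothetical; the point is only that the last iteration takes `qwert[1 .. 1]` on a one-element array, which is exactly the empty end slice under discussion.

```d
// Hypothetical helper: sums an array by repeatedly cutting off its
// first element with the slice idiom from the post.
int sumByPopping(int[] qwert)
{
    int total = 0;
    while (qwert.length > 0)
    {
        total += qwert[0];
        // With one element left this is qwert[1 .. 1]: an empty slice at
        // the end of the array. The loop relies on that being legal.
        qwert = qwert[1 .. $];
    }
    return total;
}

unittest
{
    int[] empty;
    assert(sumByPopping([10, 20, 30]) == 60);
    assert(sumByPopping(empty) == 0);
}
```

The parameter is a copy of the caller's slice, so the caller's array keeps its original length.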
Nov 28 2005
"Walter Bright" <newshound digitalmars.com> writes:
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:dmfc73$2hec$1 digitaldaemon.com...
> But I'm guessing the problem to which you refer is that the beginning of
> such a slice might be the beginning of another array altogether, and so
> later changing its .length can overwrite the contents of the other
> array.  Can this actually happen with the current DMD/GDC?  Or does the
> structure of the GC heap somehow prevent it?

You're right that this can be a problem. The solution currently used is to allocate a bit more than necessary for the array, so an end slice will not be at the beginning of some other array later in memory.
Dec 01 2005
Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Dave wrote:
> Hmm, doesn't seem perfect to me <g> Consider:
>
> Since array[length .. length] actually points *past* the end of the
> array, for non-GC allocated memory, it is currently legal through
> slicing to point into memory not 'owned' by the array, which to me
> really should not be legal

legal to use the array now that it has length 0.
-- Bruno Medeiros - CS/E student "Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
Nov 29 2005
Stewart Gordon <smjg_1998 yahoo.com> writes:
Bruno Medeiros wrote:
> Dave wrote:
>> Hmm, doesn't seem perfect to me <g> Consider:
>>
>> Since array[length .. length] actually points *past* the end of the
>> array, for non-GC allocated memory, it is currently legal through
>> slicing to point into memory not 'owned' by the array, which to me
>> really should not be legal
>
> legal to use the array now that it has length 0.

You miss the point.

Suppose you have

    int[] qwert, yuiop;
    qwert.length = 16;
    yuiop.length = 16;

and the two arrays happen to end up adjacent in memory.  Now you set

    int[] asdfg = qwert[16..16];

then asdfg will point to the beginning of yuiop.  Therefore if you then
try to set asdfg.length, then because asdfg.ptr is at the beginning of a
heap-allocated block, it will try to lengthen asdfg in place, thereby
overwriting the contents of yuiop.

Stewart.
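
The scenario above can be tried out with a short sketch. It assumes a current D compiler and runtime, which postdate this thread and may not behave like the 2005 implementation being discussed; the helper name is hypothetical. The second check forces the "adjacent in memory" case by carving both arrays out of one buffer.

```d
// Hypothetical helper: grows the empty end slice of `qwert`, writes
// through it, and reports whether `yuiop` kept its contents.
bool neighbourSurvives(int[] qwert, int[] yuiop)
{
    const before = yuiop.dup;          // snapshot of yuiop's contents

    int[] asdfg = qwert[$ .. $];       // empty slice at the end of qwert
    assert(asdfg.length == 0);
    assert(asdfg.ptr is qwert.ptr + qwert.length);  // one past qwert's last element

    asdfg.length = 4;                  // the resize the post is worried about
    asdfg[] = -1;                      // write through the resized slice

    return yuiop == before;
}

unittest
{
    // Two separately allocated arrays.
    auto qwert = new int[16];
    auto yuiop = new int[16];
    yuiop[] = 7;
    assert(neighbourSurvives(qwert, yuiop));

    // Two halves of one buffer, so the end slice of the first half really
    // does point at the first element of the second half.
    auto buf = new int[32];
    buf[16 .. $] = 7;
    assert(neighbourSurvives(buf[0 .. 16], buf[16 .. $]));
}
```

On the runtime assumed here, setting `.length` on a slice either extends it within spare room in its own block or moves it to a fresh allocation, so the neighbouring data is left alone in both checks.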
Nov 30 2005
Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Stewart Gordon wrote:
> You miss the point.
>
> Suppose you have
>
>     int[] qwert, yuiop;
>     qwert.length = 16;
>     yuiop.length = 16;
>
> and the two arrays happen to end up adjacent in memory.  Now you set
>
>     int[] asdfg = qwert[16..16];
>
> then asdfg will point to the beginning of yuiop.  Therefore if you then
> try to set asdfg.length, then because asdfg.ptr is at the beginning of a
> heap-allocated block, it will try to lengthen asdfg in place, thereby
> overwriting the contents of yuiop.
>
> Stewart.

Dynamic Array Length" (where a similar example is presented) on
http://www.digitalmars.com/d/arrays.html . In particular: "To guarantee
copying behavior, use the .dup property to ensure a unique array that can
be resized."

I do agree that overall this doesn't seem a very elegant, clean behaviour.
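
A minimal sketch of the `.dup` advice quoted from the documentation, assuming a current D compiler. The helper name is hypothetical; it copies a slice before resizing so that nothing done to the copy can reach the array the slice came from.

```d
// Hypothetical helper: returns a resized, independent copy of `slice`.
int[] resizedCopy(int[] slice, size_t n)
{
    int[] copy = slice.dup;   // unique array, no longer aliasing the source
    copy.length = n;          // safe to resize: the copy owns its memory
    return copy;
}

unittest
{
    int[] a = [1, 2, 3];
    int[] b = resizedCopy(a[0 .. 2], 4);
    assert(b == [1, 2, 0, 0]);   // new elements start out as int.init
    b[] = 9;                     // overwrite every element of the copy
    assert(a == [1, 2, 3]);      // the original is untouched
}
```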
Nov 30 2005
Dave <Dave_member pathlink.com> writes:
In article <dmk21u$dqt$1 digitaldaemon.com>, Stewart Gordon says...
> You miss the point.
>
> Suppose you have
>
>     int[] qwert, yuiop;
>     qwert.length = 16;
>     yuiop.length = 16;
>
> and the two arrays happen to end up adjacent in memory.  Now you set
>
>     int[] asdfg = qwert[16..16];
>
> then asdfg will point to the beginning of yuiop.  Therefore if you then
> try to set asdfg.length, then because asdfg.ptr is at the beginning of a
> heap-allocated block, it will try to lengthen asdfg in place, thereby
> overwriting the contents of yuiop.
>
> Stewart.

Well put, and to cover cases like that the GC ABI becomes more complicated
and less efficient, not to mention the pain this could cause when people
start doing custom memory management in earnest.

Not to mention my biggest nit, which is that it is inconsistent with the
rest of the array bounds rules and doesn't make sense semantically. This is
how most newbies will envision array slicing for [length .. length]:

    for(int i = length; i < length; i++) { ... } // WTF?

- Dave
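
For reference, the loop reading of a slice can be written out as a small sketch, assuming a current D compiler and a hypothetical helper name. It takes no side in the argument; it only shows how many indices the loop form visits for a given pair of bounds.

```d
// Hypothetical helper: counts the indices visited by the loop form of
// a[lo .. hi], i.e. for (i = lo; i < hi; i++).
size_t loopCount(size_t lo, size_t hi)
{
    size_t visited = 0;
    for (size_t i = lo; i < hi; i++)
        visited++;
    return visited;
}

unittest
{
    int[] a = [1, 2, 3];
    assert(loopCount(a.length, a.length) == 0);   // the [length .. length] case
    assert(a[$ .. $].length == 0);                // the slice is empty too
    assert(loopCount(1, 3) == a[1 .. 3].length);  // the two agree in general
}
```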
Nov 30 2005
Georg Wrede <georg.wrede nospam.org> writes:
Dave wrote:
> "Unknown W. Brackets" <unknown simplemachines.org> wrote in message
> news:dmdtm2$18d3$1 digitaldaemon.com...
>> Why should it? I can see plenty of cases where I would want that to
>> work, being that it's a slice of zero elements at the end of the
>> array. That's not out of bounds, is it?
>>
>> However, these:
>>
>> int[] i = a[length + 1 .. length + 1];
>> int[] i = a[length .. length + 1];
>>
>> Definitely should not work.  And they don't.  Sounds perfect to me
>> 8).

Yes, this is counter-intuitive. OTOH, the C specification (since decades
ago) specified that (IIRC) you should always be able to point "one past"
_anything_. The case (specifically) being even one past end-of-memory.

Adding to this (in a massive way, too) is the STL. There the whole idea
of handling _any_ collection is _based_ on the ability to point one-past
the end.

Following from that: it should be okay to point to one-past. At the same
time, obviously, it should be illegal to actually dereference that
address, since "everyone knows" that it either contains garbage,
unrelated stuff, or causes a hardware trap, killing the program.

Now, the example code does pointing, but not dereferencing. Hence, no error.
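
The "point one past, but never dereference" rule described above can be sketched in D with raw pointers, in the style of an STL begin/end pair. This assumes a current D compiler; the helper name is hypothetical.

```d
// Hypothetical helper: sums `a` by walking a pointer up to, but not
// including, the one-past-the-end address.
int sumUpToEnd(int[] a)
{
    int* end = a.ptr + a.length;   // one past the last element: compared, never read
    int total = 0;
    for (int* p = a.ptr; p != end; ++p)
        total += *p;               // p is strictly before `end` here
    return total;
}

unittest
{
    int[] a = [1, 2, 3];
    assert(sumUpToEnd(a) == 6);
    // The empty end slice carries that same one-past address.
    assert(a[$ .. $].ptr is a.ptr + a.length);
}
```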
Nov 29 2005
Dave <Dave_member pathlink.com> writes:
In article <438CDCD4.3020709 nospam.org>, Georg Wrede says...
> Yes, this is counter-intuitive. OTOH, the C specification (since decades
> ago) specified that (IIRC) you should always be able to point "one past"
> _anything_. The case (specifically) being even one past end-of-memory.

Yes, but this is D, not C, and easy array slicing is a D feature <g>.
There's nothing getting in the way of backwards compatibility with C since
you can still point one past with naked pointers using memory allocated by
a C-compliant allocator (i.e. std.c.stdlib.malloc). That doesn't mean D has
to carry this legacy forward with any of the D-specific features.
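
A minimal sketch of the naked-pointer case, assuming a current D toolchain where the C allocator is imported from `core.stdc.stdlib` rather than the `std.c.stdlib` module named in this post. The helper name is hypothetical. Slicing a pointer is not bounds checked, so forming `p[n .. n]` is left entirely to the programmer here.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical helper: allocates `n` ints with the C allocator, fills
// them through a pointer slice, and returns their sum.
int sumMalloced(size_t n, int fill)
{
    int* p = cast(int*) malloc(n * int.sizeof);
    if (p is null)
        return 0;                  // allocation failed (or n == 0 on some platforms)
    scope (exit) free(p);

    int[] all = p[0 .. n];         // pointer slicing: unchecked, the caller vouches for n
    int[] tail = p[n .. n];        // empty slice whose .ptr is one past the block
    assert(tail.length == 0 && tail.ptr is p + n);

    all[] = fill;
    int total = 0;
    foreach (x; all)
        total += x;
    return total;
}

unittest
{
    assert(sumMalloced(4, 5) == 20);
}
```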
Nov 30 2005