digitalmars.D - array slice syntax

reply William Kilian <will tk2.com> writes:
The array slice syntax in D is unintuitive. a[1..3] should refer to 
three elements, not two.

The reasons for the current syntax are clear. It allows the length 
property to be used as the second slice bound, e.g. a[3..a.length]. This 
saves us from typing length - 1 whenever we want a slice to extend to 
the end of an array. However, adding a read-only last property to arrays 
such that last := length - 1 would save us the repeated subtractions yet 
allow array slice syntax to have the intuitive meaning. Perhaps end or 
tail would be a better name than last. Regardless, the meaning of 
a[3..a.last] or a[3..a.end] is transparent. Besides, a[3..a.length] is 
confusing anyway because a[a.length] is always invalid.
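
A minimal sketch of the two readings being compared, written for a current D compiler. `inclusiveSlice` is a hypothetical helper that emulates the proposed closed-range meaning; it is an assumption for illustration and not part of D or Phobos.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of D): emulates the proposed inclusive
// meaning of a[lo..hi] on top of D's actual half-open slice.
int[] inclusiveSlice(int[] a, size_t lo, size_t hi)
{
    return a[lo .. hi + 1];
}

void main()
{
    int[] a = [10, 20, 30, 40, 50];

    writeln(a[1 .. 3]);               // [20, 30]: indices 1 and 2, index 3 excluded
    writeln(a[3 .. a.length]);        // [40, 50]: from index 3 to the end
    writeln(inclusiveSlice(a, 1, 3)); // [20, 30, 40]: indices 1, 2 and 3
}
```

With the half-open rule the upper bound is the first index left out, which is why a.length is a valid upper bound even though a[a.length] is not a valid element.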

I'm honestly surprised by the current design. Everything else in D is 
wonderfully intuitive.

Will
May 01 2005
next sibling parent reply "Hasan Aljudy" <hasan.aljudy gmail.com> writes:
Array indexing is counter-intuitive anyway .. (in the C family of 
languages)
you declare an array int a[5] .. yet a[5] is not an element in the array
a[1] is not the first element, it's the second!
It's always confusing at the beginning ..

I think the current syntax does make sense: consider the situation where you 
want to slice 5 elements starting from element x: the syntax makes sense: 
array[x .. x+5], as opposed to your suggestion: array[x..x+5-1]

That comes up often in my opinion .. you don't really hardcode the slices, 
you want a starting point and a length, so the syntax becomes: 
array[start..start+length]

a[1..3] refers to two elements because it's actually a[1..1+2], or because 3-1 
= 2.
when I see something like a[5..8] I wouldn't count in my head "5,6,7,8" and 
conclude 4 elements .. I would look at it and say "hmm .. 8 - 5 = 3, so that 
must be 3 elements"
This is especially true when the numbers are larger and the space between 
them is big, like a[34..67], there is no way that I'm gonna count all the 
way from 34 to 67! I would just subtract 67-34, and that gives me the number 
of elements, it wouldn't be very intuitive if I had to add 1 to that: 
67-34+1
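
The start-plus-length pattern described above can be sketched as follows. `window` is a hypothetical helper name, and the array contents are chosen to mirror the indices so the result is easy to check.

```d
import std.stdio : writeln;

// Hypothetical helper: `count` elements beginning at index `start`.
int[] window(int[] a, size_t start, size_t count)
{
    return a[start .. start + count];
}

void main()
{
    int[] a = new int[100];
    foreach (i, ref x; a)
        x = cast(int) i;          // a[i] == i, so contents mirror indices

    auto w = window(a, 34, 5);
    writeln(w);                   // [34, 35, 36, 37, 38]
    assert(w.length == 5);

    auto s = a[34 .. 67];
    assert(s.length == 67 - 34);  // length is simply upper minus lower
    assert(s[$ - 1] == 66);       // index 67 itself is not included
}
```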
May 01 2005
parent reply William Kilian <will tk2.com> writes:
 Array indexing is counter-intuitive anyway .. (in the C family of 
 languages)
 you declare an array int a[5] .. yet a[5] is not an element in the array
 a[1] is not the first element, it's the second!
 It's always confusing at the beginning ..

I always thought int a[5] being indexed from 0 to 4 was perfectly intuitive. The declarations are one-based, which is intuitive. The indexes are binary and go from all 0s to all 1s, just like all integers in a computer, which to me was intuitive from the beginning. We could make the indexes start at one, but my caveat with the slice operator would be the same. I'd want to use a[1..5] instead of a[1..6] to mean the whole array. But if we one-based our indexes and made both range endpoints inclusive, then a[1..length] would still be the whole array like it is now.
 I think the current syntax does make sense: consider the situation where you 
 want to slice 5 elements starting from element x: the syntax makes sense: 
 array[x .. x+5], as opposed to your suggestion: array[x..x+5-1]

 That comes up often in my opinion ..

It's easy to come up with valid examples to justify either range syntax. If I want to slice a tag out of an xml string, I would expect to do something like this:

char[] tag = xml[xml.pos('<')..xml.pos('>')];

Right now, I would need this instead:

char[] tag = xml[xml.pos('<') .. xml.pos('>') + 1];

Whether or not the syntax is transparent is not related to one example being more frequent than the other. Transparency is about the meanings the symbols have, not how frequently we use the symbols.
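
A runnable sketch of the tag example for a current D compiler. The `xml.pos` call in the post is left as written; this sketch substitutes `indexOf` from Phobos' `std.string`, uses `string` instead of `char[]`, and wraps the logic in a hypothetical `firstTag` helper. All of these are assumptions made for illustration.

```d
import std.string : indexOf;

// Hypothetical helper: returns the first <...> tag, brackets included,
// or an empty string when no well-formed tag is found.
string firstTag(string xml)
{
    auto open = xml.indexOf('<');
    auto close = xml.indexOf('>');
    if (open < 0 || close < open)
        return null;
    // Half-open slice: the +1 is needed to keep the closing '>'.
    return xml[cast(size_t) open .. cast(size_t) close + 1];
}

void main()
{
    assert(firstTag("a <b> c") == "<b>");
    assert(firstTag("no tag here").length == 0);
}
```

Dropping the `+ 1` would return the tag without its closing bracket, which is the asymmetry the post is pointing at.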
 you don't really hardcode the slices, 

No, but I can still have two absolute positions.
 you want a starting point and a length, so the syntax becomes: 
 array[start..start+length]

If array[b..c] is designed to mean array[start..start+length], then start is used redundantly. It should just be array[start..length]. Except of course, that is bad because ".." implies a range; b..c intuitively means from b to c, including b and c. If it's supposed to be a start and a length, the slice operator should not use ".."; it should be something that does not imply a range. Then your examples would read like array[x#5] or array[x,5] and be more compact than array[x..x+5].
 a[1..3] refers to two elements because it's actually a[1..1+2], or because 3-1 
 = 2.
 when I see something like a[5..8] I wouldn't count in my head "5,6,7,8" and 
 conclude 4 elements .. I would look at it and say "hmm .. 8 - 5 = 3, so that 
 must be 3 elements"
 This is especially true when the numbers are larger and the space between 
 them is big, like a[34..67], there is no way that I'm gonna count all the 
 way from 34 to 67! I would just subtract 67-34, and that gives me the number 
 of elements, it wouldn't be very intuitive if I had to add 1 to that: 
 67-34+1
 

I disagree with that. If I was loading numbered boxes and on day 2 I loaded boxes numbered 34 through 67, how many boxes did I load? Not 67 - 34. Ranges naturally include their endpoints and the count of integers in a range is not high - low. I don't count to know that 10 through 19 is ten integers, nor would I count for 5 through 8. For 5..8 I would say 5..8 has the same number of elements as 1..4: 4 elements. So I do subtract, but not high - low. I subtract high - (low - 1) because that's more intuitive than the equivalent high - low + 1. So I loaded 67 - 33 = 34 boxes on day 2 because 34..67 has the same count as 1..34.

Other languages assign the meaning I expect to range operators. In perl, 1..10 means a list with ten elements. In fact, perl slices work how I suggest they should work in D. Pascal uses 1..10 in array declarations to mean ten elements. PHP doesn't have a range operator, but it has a function range(low, high) that returns an array of numbers from low to high inclusive. Regular expressions use character ranges and a-z includes the z. Ranges intuitively include their endpoints.

My point is that array[b..c] intuitively means array[start..end], not array[start..end+1] and not array[start..start+length]. b..c is a range and ranges intuitively include their endpoints. I agree start+length is a common use for slices, but that does not justify compromising the transparency of the range operator by making it mean something different than what a range intuitively means. Instead, slices should have an alternative syntax for start+length that doesn't use an operator that implies a range.

Anyway, I can remember that array[b..c] means array[start..start+length], so I greatly appreciate your message because you pointed that out. But start..start+length is a mnemonic. If the intuitive meaning was used, I wouldn't need a memory trick like that. Of course, now that I've written all this, I'd remember it without a mnemonic anyway ;)

Will
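
The two counting rules in this exchange, written out as a small sketch. Both function names are hypothetical and exist only to make the arithmetic checkable.

```d
// Hypothetical helpers: element counts under the two conventions.
// Both assume lo <= hi.
size_t countInclusive(size_t lo, size_t hi)
{
    return hi - lo + 1;   // closed range [lo, hi], e.g. boxes 34 through 67
}

size_t countHalfOpen(size_t lo, size_t hi)
{
    return hi - lo;       // half-open range [lo, hi), D's a[lo .. hi]
}

void main()
{
    assert(countInclusive(34, 67) == 34);  // the box example
    assert(countHalfOpen(34, 67) == 33);   // a[34 .. 67].length

    int[] a = new int[100];
    assert(a[34 .. 67].length == countHalfOpen(34, 67));
    assert(a[34 .. 68].length == countInclusive(34, 67));
}
```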
May 01 2005
parent reply "Hasan Aljudy" <hasan.aljudy gmail.com> writes:
 Array indexing is counter-intuitive anyway .. (in the C family of
 languages)
 you declare an array int a[5] .. yet a[5] is not an element in the array
 a[1] is not the first element, it's the second!
 It's always confusing at the beginning ..

I always thought int a[5] being indexed from 0 to 4 was perfectly intuitive. The declarations are one-based, which is intuitive. The indexes are binary and go from all 0s to all 1s, just like all integers in a computer, which to me was intuitive from the beginning.

It depends on how you think about it .. For me, a[5] means a set with 5 elements (not the mathematical meaning of a set, but just a set of objects). Intuitively, the first element is element number one (hence 1st), so a[1] would make sense; however, it turns out a[x] is just *(a+x), so a[1] becomes the second element.
 We could make the indexes start at one, but my caveat with the slice 
 operator would be the same. I'd want to use a[1..5] instead of a[1..6] to 
 mean the whole array. But if we one-based our indexes and made both range 
 endpoints inclusive, then a[1..length] would still be the whole array like 
 it is now.

Pascal lets you explicitly define the range of the array; you can index from -2 to 5, where a[-2] becomes the first element, but I wouldn't really like that to be in D.
 I think the current syntax does make sense: consider the situation where 
 you want to slice 5 elements starting from element x: the syntax makes 
 sense: array[x .. x+5], as opposed to your suggestion: array[x..x+5-1]

 That comes up often in my opinion ..

It's easy to come up with valid examples to justify either range syntax. If I want to slice a tag out of an xml string, I would expect to do something like this: char[] tag = xml[xml.pos('<')..xml.pos('>')]; Right now, I would need this instead: char[] tag = xml[xml.pos('<') .. xml.pos('>') + 1]; Whether or not the syntax is transparent is not related to one example being more frequent than the other. Transparency is about the meanings the symbols have, not how frequent we use the symbols.

totally counter intuitive, and in my example, it would be intuitive. Anyway, dealing with things like string.indexof('e') has always confused me; I'm never sure whether I wanted the position itself or the position after it, so I always have to make some tests and adjust the code by adding - 1 or + 1 to the position. I think off-by-one errors are somewhat common.
 you don't really hardcode the slices,

No, but I can still have two absolute positions.
 you want a starting point and a length, so the syntax becomes: 
 array[start..start+length]

If array[b..c] is designed to mean array[start..start+length], then start is used redundantly. It should just be array[start..length]. Except of course, that is bad because ".." implies a range; b..c intuitively means from b to c, including b and c. If it's supposed to be a start and a length, the slice operator should not use ".."; it should be something that does not imply a range. Then your examples would read like array[x#5] or array[x,5] and be more compact than array[x..x+5].

I guess you're right on this one, array[start->length] may be more appropriate.
 a[1..3] refers to two elements because it's actually a[1..1+2], or because 
 3-1 = 2.
 when I see something like a[5..8] I wouldn't count in my head "5,6,7,8" 
 and conclude 4 elements .. I would look at it and say "hmm .. 8 - 5 = 3, 
 so that must be 3 elements"
 This is especially true when the numbers are larger and the space 
 between them is big, like a[34..67], there is no way that I'm gonna count 
 all the way from 34 to 67! I would just subtract 67-34, and that gives me 
 the number of elements, it wouldn't be very intuitive if I had to add 1 
 to that: 67-34+1

I disagree with that. If I was loading numbered boxes and on day 2 I loaded boxes numbered 34 through 67, how many boxes did I load? Not 67 - 34. Ranges naturally include their endpoints and the count of integers in a range is not high - low. I don't count to know that 10 through 19 is ten integers, nor would I count for 5 through 8. For 5..8 I would say 5..8 has the same number of elements as 1..4: 4 elements. So I do subtract, but not high - low. I subtract high - (low - 1) because that's more intuitive than the equivalent high - low + 1. So I loaded 67 - 33 = 34 boxes on day 2 because 34..67 has the same count as 1..34.

Again it depends on how you think about it.

I personally was never comfortable with a[0] being the first element, especially when dealing with strings: you always think about the nth letter in the word, yet you refer to it with string[n-1]. I can never get that right in my head, so I always have to make some extra effort when thinking about this kind of problem. So that's why I don't see a big deal in a[b..c] returning a range not including c.

As for the a[start..start+length] thing, it just happened to come up yesterday in a small project I was working on.
May 02 2005
parent reply William Kilian <will tk2.com> writes:
 It depends on how you think about it ..

Yes, but everybody should think about it the same way I do! ;)
 I think off-by-one errors are somewhat common.

I agree... which is why I think the language should always be as intuitively obvious as possible without sacrificing other design goals.
 So that's why I don't see a big deal in a[b..c] returning a range not 
 including c.

It's not a big deal. I'm just not one to accept "because that's the way it is" when I don't understand why it's the way it is.
 As for the a[start..start+length] thing, it just happened to come up 
 yesterday in a small project I was working on. 
 
 

Cool. I appreciate the perspective. Will
May 02 2005
parent reply "Bob W" <nospam aol.com> writes:
"William Kilian" <will tk2.com> wrote in message 
news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

Yes, but everybody should think about it the same way I do! ;)

The good thing is - Walter didn't think your way, therefore D is here to stay. ;-)
 So that's why I don't see a big deal in a[b..c] returning a range not 
 including c.

It's not a big deal. I'm just not one to accept "because that's the way it is" when I don't understand why it's the way it is.

Please refer to my post at the end of this thread. You'll understand and maybe even accept that D slicing is not "the way it is" - it is "the way it works".
May 02 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 2 May 2005 13:04:18 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message 
 news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

Yes, but everybody should think about it the same way I do! ;)

The good thing is - Walter didn't think your way, therefore D is here to stay. ;-)

Well I'd reword this as "The thing is - Walter doesn't think this way, therefore D is not going to change." ;-) -- Derek Parnell Melbourne, Australia 2/05/2005 9:35:49 PM
May 02 2005
parent reply "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:1w1eg0etuekis.fldighu3fj4w$.dlg 40tude.net...
 On Mon, 2 May 2005 13:04:18 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message
 news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

Yes, but everybody should think about it the same way I do! ;)

The good thing is - Walter didn't think your way, therefore D is here to stay. ;-)

Well I'd reword this as "The thing is - Walter doesn't think this way, therefore D is not going to change." ;-)

If he starts changing basic things now, like indexing and/or slicing, this forum would be a very lonely place sooner than later. But we could read heaps of D-obituaries in the archives.
May 02 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 2 May 2005 18:12:35 +0200, Bob W wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:1w1eg0etuekis.fldighu3fj4w$.dlg 40tude.net...
 On Mon, 2 May 2005 13:04:18 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message
 news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

Yes, but everybody should think about it the same way I do! ;)

The good thing is - Walter didn't think your way, therefore D is here to stay. ;-)

Well I'd reword this as "The thing is - Walter doesn't think this way, therefore D is not going to change." ;-)

If he starts changing basic things now, like indexing and/or slicing, this forum would be a very lonely place sooner than later. But we could read heaps of D-obituaries in the archives.

Firstly, I did not just say I wanted any changes to D's indexing methodology. And yes, I agree that many things in D are now set in stone. But my point was that even if the vast majority of people wanted this change and were willing to put up with the grief that it would generate, it would not be changing because Walter is in control and he is not going to change his stance. -- Derek Parnell Melbourne, Australia 3/05/2005 6:57:23 AM
May 02 2005
parent "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:7elg863g6qgq.fkue2yc1jk8t.dlg 40tude.net...
 On Mon, 2 May 2005 18:12:35 +0200, Bob W wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:1w1eg0etuekis.fldighu3fj4w$.dlg 40tude.net...
 On Mon, 2 May 2005 13:04:18 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message
 news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

Yes, but everybody should think about it the same way I do! ;)

The good thing is - Walter didn't think your way, therefore D is here to stay. ;-)

Well I'd reword this as "The thing is - Walter doesn't think this way, therefore D is not going to change." ;-)

If he starts changing basic things now, like indexing and/or slicing, this forum would be a very lonely place sooner than later. But we could read heaps of D-obituaries in the archives.

Firstly, I did not just say I wanted any changes to D's indexing methodology. And yes, I agree that many things in D are now set in stone. But my point was that even if the vast majority of people wanted this change and were willing to put up with the grief that it would generate, it would not be changing because Walter is in control and he is not going to change his stance.

Don't forget:
- D is HIS baby.
- He gives it to you and me at a good price.
- He is an experienced guy who wants to prevent D sharing the same fate as several other 'revolutionary' PLs which nobody cares about.
- Walter seems to listen. I remember having seen improvements in new D releases which were apparently the result of a single newsgroup thread.
- I personally doubt that the 'democratic' approach for developing D is the way to go (but, of course, I might be wrong).
- Walter has integrated a zillion improvements over comparable PLs, so even if I don't agree with some features, I'll be prepared to gladly accept their existence.

Furthermore:
- I guess if you really wanted him to change his stance he would finally give in. It just depends on the number of digits on your donations check. (But again, I might be wrong.) ;-)
May 02 2005
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 01 May 2005 16:16:04 -0500, William Kilian wrote:

 The array slice syntax in D is unintuitive. a[1..3] should refer to 
 three elements, not two.

Yes, it is unintuitive. Nearly all newcomers remark on this. I think it's that way because it can allow the compiler to generate more efficient machine code. But you had better get used to it, because it isn't going to change anytime soon. Thus "unintuitive" == "must be learned" ;-) -- Derek Parnell Melbourne, Australia http://www.dsource.org/projects/build/ v2.03 released 20/Apr/2005 http://www.prowiki.org/wiki4d/wiki.cgi?FrontPage 2/05/2005 9:07:50 AM
May 01 2005
next sibling parent William Kilian <will tk2.com> writes:
Derek Parnell wrote:
 On Sun, 01 May 2005 16:16:04 -0500, William Kilian wrote:
 
 
The array slice syntax in D is unintuitive. a[1..3] should refer to 
three elements, not two.

Yes, it is unintuitive. Nearly all newcomers remark on this. I think it's that way because it can allow the compiler to generate more efficient machine code. But you had better get used to it, because it isn't going to change anytime soon. Thus "unintuitive" == "must be learned" ;-)

Okeedokee
May 01 2005
prev sibling parent reply William Kilian <will tk2.com> writes:
Derek Parnell wrote:
 I think it's
 that way because it can allow the compiler to generate more efficient
 machine code.

I have trouble believing that. The compiler should generate exactly the same machine code either way.
May 01 2005
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
D uses half-open ranges.

It is a well-recognized fact in informatics that ranges of the form [a,b) are 
the most convenient in various computations
and algorithms.

Please read this:
http://mathforum.org/library/drmath/view/52929.html

Andrew.
May 01 2005
parent William Kilian <will tk2.com> writes:
Andrew Fedoniouk wrote:
 D uses half-open ranges.

I know -- that's what I'm complaining about! My whole point was that ranges are intuitively closed. I was avoiding using mathematical terminology because it's hard to prove mathematically something that is intuitive.
 It is a well-recognized fact in informatics that ranges of the form [a,b) are 
 the most convenient in various computations
 and algorithms.

Oh. Now that I did not know. My formal education is actually computer engineering. My programming knowledge is mostly from experience. I'm pretty weak on theory and algorithms. And most of my recent experience is web apps that never seem to require algorithms more sophisticated than if/then/else and foreach.
 Please read this:
 http://mathforum.org/library/drmath/view/52929.html

Basic stuff. Ironically, it said "One objection that some people have when they are learning this notation is that open sets are not necessary because every interval must end somewhere. In any finite set this is true ...". That seems to support my point since our arrays are always finite, your informatics point notwithstanding. Will
May 02 2005
prev sibling next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
It's in keeping with the C-style iterations like
for (i = 0; i < 10; i++) {...}
which goes from 0 to 9. Plus it has the nice property that a[i .. j] ~ a[j 
.. k] == a[i .. k].
-Ben
ps - note the use of == and not "is" :-)
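
The property above can be checked directly. This is a sketch for a current D compiler; `joinsCleanly` is a hypothetical helper name, and the indices are arbitrary values with i <= j <= k.

```d
// Hypothetical helper: does splitting a[i .. k] at j and concatenating
// the two halves give back the same contents?
bool joinsCleanly(int[] a, size_t i, size_t j, size_t k)
{
    return (a[i .. j] ~ a[j .. k]) == a[i .. k];
}

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7];

    assert(joinsCleanly(a, 1, 4, 6));
    assert(joinsCleanly(a, 2, 2, 5));   // an empty left half works too

    auto joined = a[1 .. 4] ~ a[4 .. 6];
    assert(joined == a[1 .. 6]);        // equal contents
    assert(joined !is a[1 .. 6]);       // but a fresh copy, not the same memory

    // The matching C-style loop visits exactly the sliced indices.
    size_t visited = 0;
    for (size_t x = 1; x < 6; x++)
        visited++;
    assert(visited == a[1 .. 6].length);
}
```

With inclusive bounds the middle index j would appear in both halves, so the split point would need a +1 or -1 on one side.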
May 01 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 1 May 2005 20:38:25 -0400, Ben Hinkle wrote:

 Plus it has the nice property that a[i .. j] ~ a[j .. k] == a[i .. k].
 -Ben
 ps - note the use of == and not "is" :-)

Huh? Maybe it's because I'm not a C++ person that I don't see the significance of that, Ben. To me one would, of course, use '==' and not 'is' because how can the result of a concatenation ever be the same identity as the original array? Concatenations always produce a copy of the data. Maybe I'm thinking too much in D nowadays ;-) -- Derek Parnell Melbourne, Australia http://www.dsource.org/projects/build/ v2.04 released 28/Apr/2005 http://www.prowiki.org/wiki4d/wiki.cgi?FrontPage 2/May/2005 10:51:00 AM
May 01 2005
parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:fhauui8oid3w.1gb568l9mbbsl$.dlg 40tude.net...
 On Sun, 1 May 2005 20:38:25 -0400, Ben Hinkle wrote:

 Plus it has the nice property that a[i .. j] ~ a[j .. k] == a[i .. k].
 -Ben
 ps - note the use of == and not "is" :-)

Huh? Maybe its because I'm not a C++ person that I don't see the significance of that, Ben. To me one would, of course, use '==' and not 'is' because how can the result of a concatenation ever be the same identity as the original array? Concatenations always produce a copy of the data. Maybe I'm thinking too much in D nowadays ;-)

My point is that == is a natural syntax to test equality, not identity. As you say that's pretty obvious, I agree, but recent posts have debated using == for identity vs equality. I was trying to illustrate how nicely D expresses the relationship between indexing, concatenation and equality.
May 01 2005
prev sibling next sibling parent "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"William Kilian" <will tk2.com> wrote in message 
news:d53gug$7pf$2 digitaldaemon.com...
 The array slice syntax in D is unintuitive.

It's purely subjective. It feels perfectly correct to me, for example.
 a[1..3] should refer to three elements, not two.

 The reasons for the current syntax are clear. It allows the length 
 property to be used as the second slice bound, e.g. 
 a[3..a.length]. This saves us from typing length - 1 whenever we 
 want a slice to extend to the end of an array. However, adding a 
 read-only last property to arrays such that last := length - 1 
 would save us the repeated subtractions yet allow array slice 
 syntax to have the intuitive meaning. Perhaps end or tail would be 
 a better name than last. Regardless, the meaning of a[3..a.last] 
 or a[3..a.end] is transparent. Besides, a[3..a.length] is 
 confusing anyway because a[a.length] is always invalid.

 I'm honestly surprised by the current design. Everything else in D 
 is wonderfully intuitive.

Well, again, these things must be highly dependent on one's particular experiences and fancies. Array slicing seems perfectly fine to me, but other parts of D seem quite strange and badly designed. Go figure!
May 01 2005
prev sibling next sibling parent reply "Bob W" <nospam aol.com> writes:
"William Kilian" <will tk2.com> wrote in message 
news:d53gug$7pf$2 digitaldaemon.com...
 The array slice syntax in D is unintuitive. a[1..3] should refer to three 
 elements, not two.

  Everything else in D is wonderfully intuitive.

You need to be able to express a slice of zero length using two (unsigned) values. This is I think a very simple explanation. Compiler code generation or "syntax taste" have nothing to do with it.

uint a,b;

a=f();  // this may assign 0 to a
b=g();  // g() could return 0 as well

... somearray[a..b] ...

How would you possibly be able to generate valid code if the calculated length of your slice is zero using the more "intuitive" approach?
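
A sketch of the zero-length case for a current D compiler. The bodies of `f` and `g` are hypothetical stand-ins that simply return 0, and `slice` is a hypothetical wrapper added so the behaviour can be checked.

```d
// Hypothetical stand-ins for the f() and g() in the post.
uint f() { return 0; }
uint g() { return 0; }

// Hypothetical wrapper around the half-open slice.
int[] slice(int[] arr, uint a, uint b)
{
    return arr[a .. b];
}

void main()
{
    int[] somearray = [1, 2, 3];
    uint a = f();
    uint b = g();

    // Half-open bounds: equal indices give a valid, empty slice.
    auto none = slice(somearray, a, b);
    assert(none.length == 0);

    // An inclusive upper bound for an empty slice starting at 0 would
    // have to be a - 1, which wraps around for an unsigned type.
    uint lastInclusive = a - 1;
    assert(lastInclusive == uint.max);
}
```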
May 02 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 2 May 2005 10:56:42 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message 
 news:d53gug$7pf$2 digitaldaemon.com...
 The array slice syntax in D is unintuitive. a[1..3] should refer to three 
 elements, not two.

  Everything else in D is wonderfully intuitive.

You need to be able to express a slice of zero length using two (unsigned) values.

Why do we *need* to be constrained by uint?
This is I think a very simple
 explanation. Compiler code generation or "syntax taste"
 have nothing to do with it.
 

Code generation does have something to do with it. The languages that use a 0-based index are really using an offset rather than an index: the 'index' value is really how far from the beginning of the array the element you are referring to lies. a[0] is a distance of zero elements from the start of 'a', thus it references the first element. Those languages that use 1-based indexing are more closely using a true index. Code generation needs to calculate the address of an element, and that address is an offset in bytes from the start of the array. Thus 0-based indexing usually leads to more efficient code, as true indexing often needs to subtract one sizeof(element) from the calculated address.
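
The index-as-offset view can be made concrete with pointer arithmetic. This is a sketch for a current D compiler, and `byteOffset` is a hypothetical helper; it says nothing about which addressing scheme is faster on a given CPU.

```d
// Hypothetical helper: distance in bytes from the start of the array
// to element i.
size_t byteOffset(int[] a, size_t i)
{
    return cast(size_t)(&a[i]) - cast(size_t)(a.ptr);
}

void main()
{
    int[] a = [10, 20, 30];

    assert(&a[0] == a.ptr);                      // index 0 is offset 0
    assert(*(a.ptr + 2) == a[2]);                // a[i] is *(a.ptr + i)
    assert(byteOffset(a, 2) == 2 * int.sizeof);  // address = base + i * element size
}
```

Under 1-based indexing the same address would be base + (i - 1) * element size, which is the extra subtraction mentioned above.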
 uint a,b;
 
 a=f();  // this may assign 0 to a
 b=g();  // g() could return 0 as well
 
 ... somearray[a..b] ...
 
 How would you possibly be able to generate valid code
 if the calculated length of your slice is zero using the
 more "intuitive" approach?

Have array indexing start at 1. Therefore a[1..0] is a zero-length array using only uints. There are languages that implement slices this way. -- Derek Parnell Melbourne, Australia http://www.dsource.org/projects/build 2/05/2005 9:33:56 PM
May 02 2005
next sibling parent "Uwe Salomon" <post uwesalomon.de> writes:
 You need to be able to express a slice of zero length
 using two (unsigned) values.

Why do we *need* to be constrained by uint?

Because there was a huge discussion thread at Trolltech's about "Your containers use 'int' as the size and index type. Wouldn't it be better if it were 'unsigned int' because it would not be possible to write bugs with negative indices???". The community is simply split about this issue, and it is about the slicing syntax as well.

I remember I once wrote a container (pre-Qt :) where I thought "why not use unsigned indices? They can't go below 0 anyway." and later changed it to signed because there were some functions that really profited from negative indices, but both variants at the same time would have been worse. My current containers for D use unsigned again (actually they use size_t. Is it unsigned? Well, they don't care about the possibility.).

Other languages use arrays which start at 1, and I have seen immense threads about this issue, too. What you can learn from all this is that there is no solution, no best way -- just 2 possibilities. Remember the iterator concept of the STL; it uses half-open ranges as well. All I found is that the current implementation is very practical: in almost all algorithms I have written in D so far, I work with lengths or search for the end of a range -- thus the current implementation was easier to use.
 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

You know yourself that this is *not* the way to go!? D stands on the shoulders of C and C++, so what is the use of kicking downwards? Ciao uwe
May 02 2005
prev sibling next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
 How would you possibly be able to generate valid code
 if the calculated length of your slice is zero using the
 more "intuitive" approach?

Have array indexing start at 1. Therefore a[1..0] is a zero-length array using only uints. There are languages that implement slices this way.

In particular Fortran and MATLAB use 1-based indexing and inclusive slices.
May 02 2005
parent Nick <Nick_member pathlink.com> writes:
In article <d556be$1jds$1 digitaldaemon.com>, Ben Hinkle says...
 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

In particular Fortran and MATLAB use 1-based indexing and inclusive slices.

Fortran and MATLAB are made for mathematics and science, where arrays often represent vectors and two-dimensional arrays represent matrices. Therefore it is natural to stick to the mathematical notation and count from one. D is a systems language (like C and C++), where in many cases it makes more sense to think of arrays as memory buffers, and indices as offsets. As for the slicing issue, using [1..0] to represent an empty array doesn't seem very intuitive to me - it looks more like the array of elements 0 and 1 in reverse order, or something.

Nick
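The "memory buffer plus offset" view can be checked directly in D, where a dynamic array exposes .ptr and .length. This is only a sketch; offsetOf is a hypothetical helper written for this example.

```d
// Hypothetical helper: how many elements `part` starts past the beginning of `whole`.
// Assumes `part` is a slice of `whole`.
size_t offsetOf(int[] whole, int[] part)
{
    return cast(size_t)(part.ptr - whole.ptr);
}

void main()
{
    int[] a = [10, 20, 30, 40, 50];

    // a[i] is the element i steps past the start of the buffer.
    assert(&a[0] == a.ptr);
    assert(&a[3] == a.ptr + 3);
    assert(*(a.ptr + 3) == 40);

    // A slice is a pointer into the same buffer plus a length.
    auto s = a[1 .. 4];
    assert(offsetOf(a, s) == 1);
    assert(s.length == 3);

    // Writing through the slice changes the original array.
    s[0] = 99;
    assert(a[1] == 99);
}
```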
May 02 2005
prev sibling parent reply "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:11fe7yohtum20.si9yrukkmrgf.dlg 40tude.net...
 On Mon, 2 May 2005 10:56:42 +0200, Bob W wrote:

 You need to be able to express a slice of zero length
 using two (unsigned) values.

Why do we *need* to be constrained by uint?

It is perfectly natural to use a uint for indexing. The hypothetical "intuitive" slicing just won't work for uints (it would in theory with ints), since an empty slice starting at index 0 would need an end index of -1.
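The unsigned problem can be sketched in a few lines of D. The helper names inclusiveCount and halfOpenCount are assumptions for this example, and the inclusive 0-based convention is the hypothetical one under discussion, not anything D provides.

```d
// Hypothetical: element count of a 0-based *inclusive* slice first..last.
size_t inclusiveCount(size_t first, size_t last)
{
    return last - first + 1;
}

// Element count of D's half-open slice lower..upper.
size_t halfOpenCount(size_t lower, size_t upper)
{
    return upper - lower;
}

void main()
{
    int[] a = [10, 20, 30];

    // Half-open: an empty slice at the start of the array is simply 0..0.
    assert(halfOpenCount(0, 0) == 0);
    assert(a[0 .. 0].length == 0);

    // Inclusive: the empty slice at index 0 would need last == first - 1,
    // but with unsigned arithmetic that subtraction wraps around.
    size_t first = 0;
    size_t last = first - 1;
    assert(last == size_t.max);

    // The wrapped value cannot pass an ordinary bounds check.
    assert(!(last < a.length));

    // Non-empty slices work in both conventions.
    assert(inclusiveCount(5, 8) == 4);
    assert(halfOpenCount(5, 8) == 3);
}
```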
 Code generation does have something to do with it. The languages that use a
 0-based index are really using an offset rather than an index, in that the
 'index' value is really how far from the beginning of the array you are
 referring to. a[0] is a distance of zero elements from the start of 'a',
 thus it references the first element. Those languages that use 1-based
 indexing are more closely using a true index. Code generation needs to
 calculate the address of an element, and that address is an offset in bytes
 from the start of the array. Thus 0-based indexing usually leads to more
 efficient code, as true indexing often needs to subtract one
 sizeof(element) from the calculated address.

That used to be true in the past century. Nowadays there is virtually no penalty for using index plus offset addressing on Pentiums. So you can base your arrays on whatever you like best and most of your programs won't experience any slowdowns at all. I have no idea if you have any practical assembly language experience, but addressing a 1-dimensional array element is usually performed by a single machine code instruction. Index and offset calculation is done by the CPU. Depending on the instruction pipeline you might get away with the minimum latency even for complex addressing methods.
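The arithmetic behind this claim can be sketched as follows. The function names and the base address 0x1000 are made up for illustration; the point is only that the "subtract one element" step of 1-based indexing is a constant, so it can be folded into a displacement or a pre-adjusted base rather than computed per access.

```d
// Byte address of element i in a 0-based array: base + i * size.
size_t addrZeroBased(size_t base, size_t elemSize, size_t i)
{
    return base + i * elemSize;
}

// Byte address of element i in a 1-based array: base + (i - 1) * size,
// rewritten as (base - size) + i * size so the correction is a constant.
size_t addrOneBased(size_t base, size_t elemSize, size_t i)
{
    return (base - elemSize) + i * elemSize;
}

void main()
{
    enum size_t base = 0x1000;   // hypothetical start address of the array
    enum size_t size = 4;        // e.g. a 4-byte int

    // The first element sits at the base address in both schemes.
    assert(addrZeroBased(base, size, 0) == base);
    assert(addrOneBased(base, size, 1) == base);

    // Index i in the 0-based scheme is index i + 1 in the 1-based scheme.
    foreach (i; 0 .. 10)
        assert(addrZeroBased(base, size, i) == addrOneBased(base, size, i + 1));
}
```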
 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

Yeah, and throw away tons of programs, ported and to-be-ported C and Java code just because of a religious belief, which has no merits whatsoever.
May 02 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 2 May 2005 17:56:58 +0200, Bob W wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:11fe7yohtum20.si9yrukkmrgf.dlg 40tude.net...
 On Mon, 2 May 2005 10:56:42 +0200, Bob W wrote:

 You need to be able to express a slice of zero length
 using two (unsigned) values.

Why do we *need* to be constrained by uint?

It is perfectly natural to use a uint for indexing. The hypothetical "intuitive" slicing just won't work for uints (it would in theory with ints).

Well my answer to my own question is that 0-based indexing is a method of referencing elements based on an offset from the start of the array and there are no elements of the array before the first one, so a negative 'index' would never reference any of the array's elements. Thus a uint is a natural choice for 0-based indexing schemes.
 
 Code generation does have something to do with it. The languages that use a
 0-based index are really using an offset rather than an index, in that the
 'index' value is really how far from the beginning of the array you are
 referring to. a[0] is a distance of zero elements from the start of 'a',
 thus it references the first element. Those languages that use 1-based
 indexing are more closely using a true index. Code generation needs to
 calculate the address of an element, and that address is an offset in bytes
 from the start of the array. Thus 0-based indexing usually leads to more
 efficient code, as true indexing often needs to subtract one
 sizeof(element) from the calculated address.

That used to be true in the past century. Nowadays there is virtually no penalty for using index plus offset addressing on Pentiums. So you can base your arrays on whatever you like best and most of your programs won't experience any slowdowns at all.

Thank you. It's been a *long* time since I've done any assembler work and I haven't kept up with Intel advances.
 
 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

Yeah, and throw away tons of programs, ported and to-be-ported C and Java code just because of a religious belief, which has no merits whatsoever.

"has no merits whatsoever" - which is also another 'religious belief'. The original poster was just saying that half-open slicing semantics are not intuitive to the average person, regardless of how practical they are for a programming language. Nearly everyone I know starts counting off items with one; no-one I know starts counting ... zero, one, two, three, ... And because it is unintuitive, newcomers to the concept need to learn it, because it is not going to change.

-- 
Derek Parnell
Melbourne, Australia
3/05/2005 7:01:06 AM
May 02 2005
parent reply "Bob W" <nospam aol.com> writes:
 ......... no-one I know starts counting ... zero, one, two, three,

NASA does - they just don't get the direction right. More seriously: I have used both indexing methods a lot (Pascal and D's predecessors) and even after many years I could not tell which one is better for me. It varies from one application to the other. But I guess it is just mainly a matter of getting used to either one.
May 02 2005
parent Derek Parnell <derek psych.ward> writes:
On Tue, 3 May 2005 00:30:58 +0200, Bob W wrote:

 ......... no-one I know starts counting ... zero, one, two, three,

NASA does - they just don't get the direction right.

Well... actually they are announcing the number of seconds remaining, but I get your point ;-)
 More seriously: I have used both indexing methods a lot
 (Pascal and D's predecessors) and even after many years
 I could not tell which one is better for me. It varies from
 one application to the other. But I guess it is just mainly
 a matter of getting used to either one.

Yes. It is a learned behaviour.

-- 
Derek
Melbourne, Australia
3/05/2005 9:23:08 AM
May 02 2005
prev sibling parent Sean Kelly <sean f4.ca> writes:
In article <d53gug$7pf$2 digitaldaemon.com>, William Kilian says...
The array slice syntax in D is unintuitive. a[1..3] should refer to 
three elements, not two.

The reasons for the current syntax are clear. It allows the length 
property to be used as the second slice bound, e.g. a[3..a.length]. This 
saves us from typing length - 1 whenever we want a slice to extend to 
the end of an array. However, adding a read-only last property to arrays 
such that last := length - 1 would save us the repeated subtractions yet 
allow array slice syntax to have the intuitive meaning. Perhaps end or 
tail would be a better name than last. Regardless, the meaning of 
a[3..a.last] or a[3..a.end] is transparent. Besides, a[3..a.length] is 
confusing anyway because a[a.length] is always invalid.

I like the current syntax, but then I've gotten quite used to this behavior with C++ containers. I suppose one could argue for the classic inclusion/exclusion syntax:

a[0..5); // from 0 to 4
a[0..5]; // from 0 to 5
a(0..5]; // from 1 to 5
a(0..5); // from 1 to 4

But this would add undesirable complexity to the parser :)

Sean
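The four interval forms can be emulated today without touching the parser, as a rough sketch. The function interval and its boolean flags are hypothetical and invented for this example; each form reduces to an ordinary half-open D slice.

```d
// Hypothetical helper: slice `a` between lo and hi, choosing for each end
// whether the bound itself is included (closed) or excluded (open).
int[] interval(int[] a, size_t lo, size_t hi, bool loClosed, bool hiClosed)
{
    size_t lower = loClosed ? lo : lo + 1;   // "(" skips the lower bound
    size_t upper = hiClosed ? hi + 1 : hi;   // "]" takes in the upper bound
    return a[lower .. upper];
}

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6];

    assert(interval(a, 0, 5, true,  false) == [0, 1, 2, 3, 4]);    // a[0..5)
    assert(interval(a, 0, 5, true,  true)  == [0, 1, 2, 3, 4, 5]); // a[0..5]
    assert(interval(a, 0, 5, false, true)  == [1, 2, 3, 4, 5]);    // a(0..5]
    assert(interval(a, 0, 5, false, false) == [1, 2, 3, 4]);       // a(0..5)
}
```

The first form, closed below and open above, is the one D's a[0..5] already uses.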
May 02 2005