digitalmars.D - array slice syntax

William Kilian (14/14) May 01 2005 The array slice syntax in D is unintuitive. a[1..3] should refer to

Hasan Aljudy (23/37) May 01 2005 Array indexing is counter-intuitive anyway .. (in the C family of

William Kilian (57/80) May 01 2005 I always thought int a[5] being indexed from 0 to 4 was perfectly

Hasan Aljudy (28/87) May 02 2005 It depends on how you think about it ..

William Kilian (7/15) May 02 2005 I agree... which is why I think the language should always be as

Bob W (7/13) May 02 2005 The good thing is - Walter didn't think your way,

Derek Parnell (7/16) May 02 2005 Well I'd reword this as "The thing is - Walter doesn't think this way,

Bob W (6/18) May 02 2005 If he starts changing basic things now, like indexing and/or

Derek Parnell (12/33) May 02 2005 Firstly, I did not just say I wanted any changes to D's indexing

Bob W (23/52) May 02 2005 Don't forget:

Derek Parnell (12/14) May 01 2005 Yes, it is unintuitive. Nearly all new comers remark on this. I think it...

William Kilian (2/16) May 01 2005 Okeedokee
William Kilian (3/6) May 01 2005 I have trouble believing that. The compiler should generate exactly the

Andrew Fedoniouk (9/15) May 01 2005 D uses half-open ranges.

William Kilian (16/22) May 02 2005 I know -- that's what I'm complaining about! My whole point was that

Ben Hinkle (8/22) May 01 2005 It's in keeping with the C-style iterations like

Derek Parnell (12/15) May 01 2005 Huh? Maybe its because I'm not a C++ person that I don't see the

Ben Hinkle (6/16) May 01 2005 My point is that == is a natural syntax to test equality, not identity. ...

Matthew (8/22) May 01 2005 It's purely subjective. It 'feel's perfectly correct to me, for
Bob W (14/17) May 02 2005 ...........

Derek Parnell (19/43) May 02 2005 Code generation does have something to do with it. The languages that us...

Uwe Salomon (22/27) May 02 2005 Because there was a huge discussion thread at Trolltech's about "Your
Ben Hinkle (1/6) May 02 2005 In particular Fortran and MATLAB use 1-based indexing and inclusive slic...

Nick (10/14) May 02 2005 Fortran and MATLAB are made for mathematics and science, where arrays of...

Bob W (18/37) May 02 2005 It is perfectly natural using a unint for indexation.

Derek Parnell (19/57) May 02 2005 Well my answer to my own question is that 0-based indexing is a method o...

Bob W (6/7) May 02 2005 NASA does - they just don't get the direction right.

Derek Parnell (8/16) May 02 2005 Well... actually they are announcing the number of seconds remaining, bu...

Sean Kelly (10/21) May 02 2005 I like the current syntax, but then I've gotten quite used to this behav...

William Kilian <will tk2.com> writes:

The array slice syntax in D is unintuitive. a[1..3] should refer to 
three elements, not two.

The reasons for the current syntax are clear. It allows the length 
property to be used as the second slice bound, e.g. a[3..a.length]. This 
saves us from typing length - 1 whenever we want a slice to extend to 
the end of an array. However, adding a read-only last property to arrays 
such that last := length - 1 would save us the repeated subtractions yet 
allow array slice syntax to have the intuitive meaning. Perhaps end or 
tail would be a better name than last. Regardless, the meaning of 
a[3..a.last] or a[3..a.end] is transparent. Besides, a[3..a.length] is 
confusing anyway because a[a.length] is always invalid.
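
For readers comparing the two readings, here is a minimal sketch, assuming a current D2 compiler. The helper inclusiveSlice is hypothetical (it is not part of D or of this proposal's exact wording); it only illustrates what a closed-range slice would select next to the built-in half-open one.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of D): a closed-range slice,
// where both `first` and `last` are included.
T[] inclusiveSlice(T)(T[] arr, size_t first, size_t last)
{
    return arr[first .. last + 1];
}

void main()
{
    int[] a = [10, 11, 12, 13, 14];

    // Built-in slices are half-open: index 1 is included, index 3 is not.
    assert(a[1 .. 3] == [11, 12]);

    // a.length is a valid upper bound even though a[a.length] is out of range.
    assert(a[3 .. a.length] == [13, 14]);

    // The closed-range reading: both endpoints are included.
    assert(inclusiveSlice(a, 1, 3) == [11, 12, 13]);
    assert(inclusiveSlice(a, 3, a.length - 1) == [13, 14]);

    writeln("ok");
}
```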

I'm honestly surprised by the current design. Everything else in D is 
wonderfully intuitive.

Will

May 01 2005

"Hasan Aljudy" <hasan.aljudy gmail.com> writes:

Array indexing is counter-intuitive anyway .. (in the C family of 
languages)
you declare an array int a[5] .. yet a[5] is not an element in the array
a[1] is not the first element, it's the second!
It's always confusing at the beginning ..

I think the current syntax does make sense: consider the situation where you 
want to slice 5 elements starting from element x: the syntax makes sense: 
array[x .. x+5], as opposed to your suggestion: array[x..x+5-1]

That comes up often in my opinion .. you don't really hardcode the slices, 
you want a starting point and a length, so the syntax becomes: 
array[start..start+length]

a[1..3] refers to two elements because it's actually a[1..1+2], or because 3-1 
= 2.
when I see something like a[5..8] I wouldn't count in my head "5,6,7,8" and 
conclude 4 elements .. I would look at it and say "hmm .. 8 - 5 = 3, so that 
must be 3 elements"
This is especially true when the numbers are larger and the space between 
them is big, like a[34..67], there is no way that I'm gonna count all the 
way from 34 to 67! I would just subtract 67-34, and that gives me the number 
of elements, it wouldn't be very intuitive if I had to add 1 to that: 
67-34+1
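
The start-plus-length pattern can be sketched as follows, assuming a current D2 compiler. The helper name sliceFrom is made up for illustration only.

```d
// Hypothetical helper: `count` elements beginning at index `start`.
T[] sliceFrom(T)(T[] arr, size_t start, size_t count)
{
    return arr[start .. start + count];
}

void main()
{
    auto a = new int[](100);

    // Five elements starting at index 34, with no "- 1" adjustment.
    assert(sliceFrom(a, 34, 5).length == 5);

    // With half-open bounds the element count is simply upper minus lower.
    assert(a[5 .. 8].length == 8 - 5);
    assert(a[34 .. 67].length == 67 - 34);
}
```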

"William Kilian" <will tk2.com> wrote in message 
news:d53gug$7pf$2 digitaldaemon.com...
 The array slice syntax in D is unintuitive. a[1..3] should refer to three 
 elements, not two.

 The reasons for the current syntax are clear. It allows the length 
 property to be used as the second slice bound, e.g. a[3..a.length]. This 
 saves us from typing length - 1 whenever we want a slice to extend to the 
 end of an array. However, adding a read-only last property to arrays such 
 that last := length - 1 would save us the repeated subtractions yet allow 
 array slice syntax to have the intuitive meaning. Perhaps end or tail 
 would be a better name than last. Regardless, the meaning of a[3..a.last] 
 or a[3..a.end] is transparent. Besides, a[3..a.length] is confusing anyway 
 because a[a.length] is always invalid.

 I'm honestly surprised by the current design. Everything else in D is 
 wonderfully intuitive.

 Will

May 01 2005

William Kilian <will tk2.com> writes:

 Array indexing is counter-intuitive anyway .. (in the C family of 
 languages)
 you declare an array int a[5] .. yet a[5] is not an element in the array
 a[1] is not the first element, it's the second!
 It's always confusing at the beginning ..

I always thought int a[5] being indexed from 0 to 4 was perfectly 
intuitive. The declarations are one-based, which is intuitive. The 
indexes are binary and go from all 0s to all 1s, just like all integers 
in a computer, which to me was intuitive from the beginning. We could 
make the indexes start at one, but my caveat with the slice operator 
would be the same. I'd want to use a[1..5] instead of a[1..6] to mean 
the whole array. But if we one-based our indexes and made both range 
endpoints inclusive, then a[1..length] would still be the whole array 
like it is now.

 I think the current syntax does make sense: consider the situation where you 
 want to slice 5 elements starting from element x: the syntax makes sense: 
 array[x .. x+5], as opposed to your suggestion: array[x..x+5-1]

 That comes up often in my opinion ..

It's easy to come up with valid examples to justify either range syntax. 
If I want to slice a tag out of an xml string, I would expect to do 
something like this:

char[] tag = xml[xml.pos('<')..xml.pos('>')];

Right now, I would need this instead:

char[] tag = xml[xml.pos('<') .. xml.pos('>') + 1];

Whether or not the syntax is transparent is not related to one example 
being more frequent than the other. Transparency is about the meanings 
the symbols have, not how frequently we use the symbols.
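
A runnable version of the tag example, as a sketch only: xml.pos above is shorthand, so this assumes a current D2 compiler and uses std.string.indexOf in its place. The helper name firstTag is hypothetical.

```d
import std.string : indexOf;

// Hypothetical helper: returns the first "<...>" span, brackets included,
// or null when no such span is found.
string firstTag(string xml)
{
    auto lt = xml.indexOf('<');   // -1 when absent
    auto gt = xml.indexOf('>');   // -1 when absent
    if (lt < 0 || gt < lt)
        return null;
    // Half-open bounds: "+ 1" is needed to keep the closing '>'.
    return xml[lt .. gt + 1];
}

void main()
{
    assert(firstTag("text <tag> more") == "<tag>");
    assert(firstTag("no tag here") is null);
}
```

Slicing the tag name without its brackets goes the other way: xml[lt + 1 .. gt] needs the adjustment on the lower bound and none on the upper one.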

 you don't really hardcode the slices, 

No, but I can still have two absolute positions.

 you want a starting point and a length, so the syntax becomes: 
 array[start..start+length]

If array[b..c] is designed to mean array[start..start+length], then 
start is used redundantly. It should just be array[start..length]. 
Except of course, that is bad because ".." implies a range; b..c 
intuitively means from b to c, including b and c. If it's supposed to be 
a start and a length, the slice operator should not use ".."; it should be 
something that does not imply a range. Then your examples would read 
array[x,5] and be more compact than array[x..x+5].

 a[1..3] refers to two elements because it's actually a[1..1+2], or because 3-1 
 = 2.
 when I see something like a[5..8] I wouldn't count in my head "5,6,7,8" and 
 conclude 4 elements .. I would look at it and say "hmm .. 8 - 5 = 3, so that 
 must be 3 elements"
 This is especially true when the numbers are larger and the space between 
 them is big, like a[34..67], there is no way that I'm gonna count all the 
 way from 34 to 67! I would just subtract 67-34, and that gives me the number 
 of elements, it wouldn't be very intuitive if I had to add 1 to that: 
 67-34+1
 

I disagree with that. If I was loading numbered boxes and on day 2 I 
loaded boxes numbered 34 through 67, how many boxes did I load? Not 67 - 
34. Ranges naturally include their endpoints and the count of integers 
in a range is not high - low. I don't count to know that 10 through 19 
is ten integers, nor would I count for 5 through 8. For 5..8 I would say 
5..8 has the same number of elements as 1..4: 4 elements. So I do 
subtract, but not high - low. I subtract high - (low - 1) because that's 
more intuitive than the equivalent high - low + 1. So I loaded 67 - 33 
= 34 boxes on day 2 because 34..67 has the same count as 1..34.

Other languages assign the meaning I expect to range operators. In perl, 
1..10 means a list with ten elements. In fact, perl slices work how I 
suggest they should work in D. Pascal uses 1..10 in array declarations 
to mean ten elements. PHP doesn't have a range operator, but it has a 
function range(low, high) that returns an array of numbers from low to 
high inclusive. Regular expressions use character ranges and a-z 
includes the z. Ranges intuitively include their endpoints.

My point is that array[b..c] intuitively means array[start..end], not 
array[start..end+1] and not array[start..start+length]. b..c is a range 
and ranges intuitively include their endpoints. I agree start+length is 
a common use for slices, but that does not justify compromising the 
transparency of the range operator by making it mean something different 
than what a range intuitively means. Instead, slices should have an 
alternative syntax for start+length that doesn't use an operator that 
implies a range.

Anyway, I can remember that array[b..c] means 
array[start..start+length], so I greatly appreciate your message because 
you pointed that out. But start..start+length is a mnemonic. If the 
intuitive meaning was used, I wouldn't need a memory trick like that. Of 
course, now that I've written all this, I'd remember it without a 
mnemonic anyway ;)

Will

May 01 2005

"Hasan Aljudy" <hasan.aljudy gmail.com> writes:

 Array indexing is counter-intuitive anyway .. (in the C family of
 languages)
 you declare an array int a[5] .. yet a[5] is not an element in the array
 a[1] is not the first element, it's the second!
 It's always confusing at the beginning ..

 I always thought int a[5] being indexed from 0 to 4 was perfectly 
 intuitive. The declarations are one-based, which is intuitive. The indexes 
 are binary and go from all 0s to all 1s, just like all integers in a 
 computer, which to me was intuitive from the beginning.

It depends on how you think about it ..
For me, a[5] means a set with 5 elements (not the mathematical meaning of a 
set, but just a set of objects)
Intuitively, the first element is the element number one (hence 1st), so a[1] 
would make sense, however it turns out a[x] is just *(a+x) so a[1] becomes 
the second element.

 We could make the indexes start at one, but my caveat with the slice 
 operator would be the same. I'd want to use a[1..5] instead of a[1..6] to 
 mean the whole array. But if we one-based our indexes and made both range 
 endpoints inclusive, then a[1..length] would still be the whole array like 
 it is now.

Pascal lets you explicitly define the range of the array, you can start 
indexing from -2 to 5 where a[-2] becomes the first element, but I wouldn't 
really like that to be in D.

 I think the current syntax does make sense: consider the situation where 
 you want to slice 5 elements starting from element x: the syntax makes 
 sense: array[x .. x+5], as opposed to your suggestion: array[x..x+5-1]

 That comes up often in my opinion ..

 It's easy to come up with valid examples to justify either range syntax. 
 If I want to slice a tag out of an xml string, I would expect to do 
 something like this:

 char[] tag = xml[xml.pos('<')..xml.pos('>')];

 Right now, I would need this instead:

 char[] tag = xml[xml.pos('<') .. xml.pos('>') + 1];

 Whether or not the syntax is transparent is not related to one example 
 being more frequent than the other. Transparency is about the meanings the 
 symbols have, not how frequent we use the symbols.

yeah well, it depends on the context really. My point was that it's not 
totally counter intuitive, and in my example, it would be intuitive.
Anyway, dealing with things like string.indexof('e') has always confused me, 
I'm never sure whether I wanted the position itself or the position after 
it, so I always have to make some tests and adjust the code by adding  - 1 
or + 1 to the position..
I think off-by-one errors are somewhat common.

 you don't really hardcode the slices,

 No, but I can still have two absolute positions.

 you want a starting point and a length, so the syntax becomes: 
 array[start..start+length]

 If array[b..c] is designed to mean array[start..start+length], then start 
 is used redundantly. It should just be array[start..length]. Except of 
 course, that is bad because ".." implies a range; b..c intuitively means 
 from b to c, including b and c. If it's supposed to be a start and a 
 length, the slice operator should not use ".."; it should be something that 
 does not imply a range. Then your examples would read array[x,5] and be 
 more compact than array[x..x+5].

I guess you're right on this one, array[start->length] may be more 
appropriate.

 a[1..3] refers to two elements because it's actually a[1..1+2], or because 
 3-1 = 2.
 when I see something like a[5..8] I wouldn't count in my head "5,6,7,8" 
 and conclude 4 elements .. I would look at it and say "hmm .. 8 - 5 = 3, 
 so that must be 3 elements"
 This is especially true when the numbers are larger and the space 
 between them is big, like a[34..67], there is no way that I'm gonna count 
 all the way from 34 to 67! I would just subtract 67-34, and that gives me 
 the number of elements, it wouldn't be very intuitive if I had to add 1 
 to that: 67-34+1

 I disagree with that. If I was loading numbered boxes and on day 2 I 
 loaded boxes numbered 34 through 67, how many boxes did I load? Not 67 - 
 34. Ranges naturally include their endpoints and the count of integers in 
 a range is not high - low. I don't count to know that 10 through 19 is ten 
 integers, nor would I count for 5 through 8. For 5..8 I would say 5..8 has 
 the same number of elements as 1..4: 4 elements. So I do subtract, but not 
 high - low. I subtract high - (low - 1) because that's more intuitive than 
 the equivalent high - low + 1. So I loaded 67 - 33 =  34 boxes on day 2 
 because 34..67 has the same count as 1..34.

Again it depends on how you think about it.
I personally was never comfortable with a[0] being the first element, 
especially when dealing with strings: you always think about the nth letter 
in the word, yet you refer to it with string[n-1]. I can never get that 
right in my head, so I always have to make some extra effort when 
thinking about this kind of problem.
So that's why I don't see a big deal in a[b..c] returning a range not 
including c.

As for the a[start..start+length] thing, it just happened to come up 
yesterday in a small project I was working on.

May 02 2005

William Kilian <will tk2.com> writes:

 It depends on how you think about it ..

Yes, but everybody should think about it the same way I do! ;)

 I think off-by-one errors are somewhat common.

I agree... which is why I think the language should always be as 
intuitively obvious as possible without sacrificing other design goals.

 So that's why I don't see a big deal in a[b..c] returning a range not 
 including c.

It's not a big deal. I'm just not one to accept "because that's the way 
it is" when I don't understand why it's the way it is.

 As for the a[start..start+length] thing, it just happened to come up 
 yesterday in a small project I was working on. 
 
 

Cool. I appreciate the perspective.

Will

May 02 2005

"Bob W" <nospam aol.com> writes:

"William Kilian" <will tk2.com> wrote in message 
news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

 Yes, but everybody should think about it the same way I do! ;)

The good thing is - Walter didn't think your way,
therefore D is here to stay.   ;-)



 So that's why I don't see a big deal in a[b..c] returning a range not 
 including c.

 It's not a big deal. I'm just not one to accept "because that's the way it 
 is" when I don't understand why it's the way it is.

Please refer to my post at the end of this thread.
You'll understand and maybe even accept that D slicing
is not "the way it is" - it is "the way it works".

May 02 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 2 May 2005 13:04:18 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message 
 news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

 Yes, but everybody should think about it the same way I do! ;)

 
 The good thing is - Walter didn't think your way,
 therefore D is here to stay.   ;-)
 

Well I'd reword this as "The thing is - Walter doesn't think this way,
therefore D is not going to change." ;-)


-- 
Derek Parnell
Melbourne, Australia
2/05/2005 9:35:49 PM

May 02 2005

"Bob W" <nospam aol.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:1w1eg0etuekis.fldighu3fj4w$.dlg 40tude.net...
 On Mon, 2 May 2005 13:04:18 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message
 news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

 Yes, but everybody should think about it the same way I do! ;)

 The good thing is - Walter didn't think your way,
 therefore D is here to stay.   ;-)

 Well I'd reword this as "The thing is - Walter doesn't think this way,
 therefore D is not going to change." ;-)

If he starts changing basic things now, like indexing and/or
slicing, this forum would be a very lonely place sooner
than later. But we could read heaps of D-obituaries in
the archives.

May 02 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 2 May 2005 18:12:35 +0200, Bob W wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:1w1eg0etuekis.fldighu3fj4w$.dlg 40tude.net...
 On Mon, 2 May 2005 13:04:18 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message
 news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

 Yes, but everybody should think about it the same way I do! ;)

 The good thing is - Walter didn't think your way,
 therefore D is here to stay.   ;-)

 Well I'd reword this as "The thing is - Walter doesn't think this way,
 therefore D is not going to change." ;-)

 
 If he starts changing basic things now, like indexing and/or
 slicing, this forum would be a very lonely place sooner
 than later. But we could read heaps of D-obituaries in
 the archives.

Firstly, I did not just say I wanted any changes to D's indexing
methodology.

And yes, I agree that many things in D are now set in stone. But my point
was that even if the vast majority of people wanted this change and were
willing to put up with the grief that it would generate, it would not be
changing because Walter is in control and he is not going to change his
stance.

-- 
Derek Parnell
Melbourne, Australia
3/05/2005 6:57:23 AM

May 02 2005

"Bob W" <nospam aol.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:7elg863g6qgq.fkue2yc1jk8t.dlg 40tude.net...
 On Mon, 2 May 2005 18:12:35 +0200, Bob W wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:1w1eg0etuekis.fldighu3fj4w$.dlg 40tude.net...
 On Mon, 2 May 2005 13:04:18 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message
 news:d54q5r$1900$1 digitaldaemon.com...
 It depends on how you think about it ..

 Yes, but everybody should think about it the same way I do! ;)

 The good thing is - Walter didn't think your way,
 therefore D is here to stay.   ;-)

 Well I'd reword this as "The thing is - Walter doesn't think this way,
 therefore D is not going to change." ;-)

 If he starts changing basic things now, like indexing and/or
 slicing, this forum would be a very lonely place sooner
 than later. But we could read heaps of D-obituaries in
 the archives.

 Firstly, I did not just say I wanted any changes to D's indexing
 methodology.

 And yes, I agree that many things in D are now set in stone. But my point
 was that even if the vast majority of people wanted this change and were
 willing to put up with the grief that it would generate, it would not be
 changing because Walter is in control and he is not going to change his
 stance.


Don't forget:

- D is HIS baby.

- He gives it to you and me at a good price.

- He is an experienced guy who wants to prevent
  D sharing the same fate as several other
  'revolutionary' PLs which nobody cares about.

- Walter seems to listen. I remember seeing
  improvements in new D releases which were
  apparently the result of a single newsgroup thread.

- I personally doubt that the 'democratic' approach
  for developing D is the way to go (but, of course,
  I might be wrong).

- Walter has integrated a zillion improvements
  over comparable PLs, so even if I don't agree
  with some features, I'll be prepared to gladly
  accept their existence.


Furthermore:

- I guess if you really wanted him to change his
  stance he would finally give in. It just depends on
  the number of digits on your donations check.
  (But again, I might be wrong.)       ;-)

May 02 2005

Derek Parnell <derek psych.ward> writes:

On Sun, 01 May 2005 16:16:04 -0500, William Kilian wrote:

 The array slice syntax in D is unintuitive. a[1..3] should refer to 
 three elements, not two.

Yes, it is unintuitive. Nearly all newcomers remark on this. I think it's
that way because it can allow the compiler to generate more efficient
machine code. But you had better get used to it, because it isn't going to
change anytime soon.

Thus "unintuitive" == "must be learned" ;-)

-- 
Derek Parnell
Melbourne, Australia
http://www.dsource.org/projects/build/ v2.03 released 20/Apr/2005
http://www.prowiki.org/wiki4d/wiki.cgi?FrontPage
2/05/2005 9:07:50 AM

May 01 2005

William Kilian <will tk2.com> writes:

Derek Parnell wrote:
 On Sun, 01 May 2005 16:16:04 -0500, William Kilian wrote:
 
 
The array slice syntax in D is unintuitive. a[1..3] should refer to 
three elements, not two.

 
 
 Yes, it is unintuitive. Nearly all newcomers remark on this. I think it's
 that way because it can allow the compiler to generate more efficient
 machine code. But you had better get used to it, because it isn't going to
 change anytime soon.
 
 Thus "unintuitive" == "must be learned" ;-)
 

Okeedokee

May 01 2005

William Kilian <will tk2.com> writes:

Derek Parnell wrote:
 I think it's
 that way because it can allow the compiler to generate more efficient
 machine code.

I have trouble believing that. The compiler should generate exactly the 
same machine code either way.

May 01 2005

"Andrew Fedoniouk" <news terrainformatica.com> writes:

D uses half-open ranges.

It is a well-recognized fact in informatics that ranges of the form [a,b) are 
the most convenient in various computations
and algorithms.

Please read this:
http://mathforum.org/library/drmath/view/52929.html

Andrew.



"William Kilian" <will tk2.com> wrote in message 
news:d53vfe$i2o$1 digitaldaemon.com...
 Derek Parnell wrote:
 I think it's
 that way because it can allow the compiler to generate more efficient
 machine code.

 I have trouble believing that. The compiler should generate exactly the 
 same machine code either way.

May 01 2005

William Kilian <will tk2.com> writes:

Andrew Fedoniouk wrote:
 D uses half-open ranges.

I know -- that's what I'm complaining about! My whole point was that 
ranges are intuitively closed. I was avoiding using mathematical 
terminology because it's hard to prove mathematically something that is 
intuitive.

 It is a well-recognized fact in informatics that ranges of the form [a,b) are 
 the most convenient in various computations
 and algorithms.

Oh. Now that I did not know. My formal education is actually computer 
engineering. My programming knowledge is mostly from experience. I'm 
pretty weak on theory and algorithms. And most of my recent experience 
is web apps that never seem to require algorithms more sophisticated 
than if/then/else and foreach.

 Please read this:
 http://mathforum.org/library/drmath/view/52929.html

Basic stuff. Ironically, it said "One objection that some people have 
when they are learning this notation is that open sets are not necessary 
because every interval must end somewhere.  In any finite set this is 
true ...". That seems to support my point since our arrays are always 
finite, your informatics point notwithstanding.

Will

May 02 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

It's in keeping with the C-style iterations like
for (i = 0; i < 10; i++) {...}
which goes from 0 to 9. Plus it has the nice property that a[i .. j] ~ a[j 
.. k] == a[i .. k].
-Ben
ps - note the use of == and not "is" :-)
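
A small check of that property, as a sketch assuming a current D2 compiler; the function name adjacentSlicesJoin is made up for illustration.

```d
// True when two adjacent half-open slices concatenate to the enclosing slice.
bool adjacentSlicesJoin(T)(T[] a, size_t i, size_t j, size_t k)
{
    return (a[i .. j] ~ a[j .. k]) == a[i .. k];
}

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7];

    assert(adjacentSlicesJoin(a, 1, 4, 7));
    // It also holds when one side is empty (i == j or j == k).
    assert(adjacentSlicesJoin(a, 2, 2, 5));
    assert(adjacentSlicesJoin(a, 2, 5, 5));

    // Equal contents, but not the same array: ~ builds a new one.
    assert((a[1 .. 4] ~ a[4 .. 7]) !is a[1 .. 7]);
}
```

With closed ranges the shared index j would appear in both halves, so the split point would have to be written as j on one side and j+1 on the other.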

"William Kilian" <will tk2.com> wrote in message 
news:d53gug$7pf$2 digitaldaemon.com...
 The array slice syntax in D is unintuitive. a[1..3] should refer to three 
 elements, not two.

 The reasons for the current syntax are clear. It allows the length 
 property to be used as the second slice bound, e.g. a[3..a.length]. This 
 saves us from typing length - 1 whenever we want a slice to extend to the 
 end of an array. However, adding a read-only last property to arrays such 
 that last := length - 1 would save us the repeated subtractions yet allow 
 array slice syntax to have the intuitive meaning. Perhaps end or tail 
 would be a better name than last. Regardless, the meaning of a[3..a.last] 
 or a[3..a.end] is transparent. Besides, a[3..a.length] is confusing anyway 
 because a[a.length] is always invalid.

 I'm honestly surprised by the current design. Everything else in D is 
 wonderfully intuitive.

 Will

May 01 2005

Derek Parnell <derek psych.ward> writes:

On Sun, 1 May 2005 20:38:25 -0400, Ben Hinkle wrote:

 Plus it has the nice property that a[i .. j] ~ a[j .. k] == a[i .. k].
 -Ben
 ps - note the use of == and not "is" :-)

Huh? Maybe it's because I'm not a C++ person that I don't see the
significance of that, Ben. To me one would, of course, use '==' and not
'is' because how can the result of a concatenation ever be the same
identity as the original array? Concatenations always produce a copy of the
data. Maybe I'm thinking too much in D nowadays ;-)

-- 
Derek Parnell
Melbourne, Australia
http://www.dsource.org/projects/build/ v2.04 released 28/Apr/2005
http://www.prowiki.org/wiki4d/wiki.cgi?FrontPage
2/May/2005 10:51:00 AM

May 01 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:fhauui8oid3w.1gb568l9mbbsl$.dlg 40tude.net...
 On Sun, 1 May 2005 20:38:25 -0400, Ben Hinkle wrote:

 Plus it has the nice property that a[i .. j] ~ a[j .. k] == a[i .. k].
 -Ben
 ps - note the use of == and not "is" :-)

 Huh? Maybe it's because I'm not a C++ person that I don't see the
 significance of that, Ben. To me one would, of course, use '==' and not
 'is' because how can the result of a concatenation ever be the same
 identity as the original array? Concatenations always produce a copy of 
 the
 data. Maybe I'm thinking too much in D nowadays ;-)

My point is that == is a natural syntax to test equality, not identity. As 
you say that's pretty obvious, I agree, but recent posts have debated using 
== for identity vs equality. I was trying to illustrate how nicely D 
expresses the relationship between indexing, concatenation and equality.

May 01 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"William Kilian" <will tk2.com> wrote in message 
news:d53gug$7pf$2 digitaldaemon.com...
 The array slice syntax in D is unintuitive

It's purely subjective. It 'feels' perfectly correct to me, for 
example.

. a[1..3] should refer to three elements, not two.

 The reasons for the current syntax are clear. It allows the length 
 property to be used as the second slice bound, e.g. 
 a[3..a.length]. This saves us from typing length - 1 whenever we 
 want a slice to extend to the end of an array. However, adding a 
 read-only last property to arrays such that last := length - 1 
 would save us the repeated subtractions yet allow array slice 
 syntax to have the intuitive meaning. Perhaps end or tail would be 
 a better name than last. Regardless, the meaning of a[3..a.last] 
 or a[3..a.end] is transparent. Besides, a[3..a.length] is 
 confusing anyway because a[a.length] is always invalid.

 I'm honestly surprised by the current design. Everything else in D 
 is wonderfully intuitive.

Well, again, these things must be highly dependent on one's 
particular experiences and fancies. Array slicing seems perfectly 
fine to me, but other parts of D seem quite strange and badly 
designed. Go figure!

May 01 2005

"Bob W" <nospam aol.com> writes:

"William Kilian" <will tk2.com> wrote in message 
news:d53gug$7pf$2 digitaldaemon.com...
 The array slice syntax in D is unintuitive. a[1..3] should refer to three 
 elements, not two.

...........
  Everything else in D is wonderfully intuitive.


You need to be able to express a slice of zero length
using two (unsigned) values. This is I think a very simple
explanation. Compiler code generation or "syntax taste"
have nothing to do with it.


uint a,b;

a=f();  // this may assign 0 to a
b=g();  // g() could return 0 as well

... somearray[a..b] ...

How would you possibly be able to generate valid code
if the calculated length of your slice is zero using the
more "intuitive" approach?
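
The snippet above is only an outline (f() and g() are placeholders). A concrete sketch of the same point, assuming a current D2 compiler and a made-up helper name sliceLength:

```d
// Number of elements selected by the half-open slice arr[lo .. hi].
size_t sliceLength(T)(T[] arr, uint lo, uint hi)
{
    return arr[lo .. hi].length;
}

void main()
{
    int[] arr = [1, 2, 3];
    uint a = 0, b = 0;

    // Equal bounds give an empty slice, even when both are 0.
    assert(sliceLength(arr, a, b) == 0);
    assert(sliceLength(arr, 3, 3) == 0);   // empty slice at the very end
    assert(sliceLength(arr, 0, 3) == 3);

    // With 0-based closed bounds, [0, 0] would select one element, and an
    // empty range starting at 0 would need an upper bound of -1, which an
    // unsigned type cannot represent.
}
```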

May 02 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 2 May 2005 10:56:42 +0200, Bob W wrote:

 "William Kilian" <will tk2.com> wrote in message 
 news:d53gug$7pf$2 digitaldaemon.com...
 The array slice syntax in D is unintuitive. a[1..3] should refer to three 
 elements, not two.

 ...........
  Everything else in D is wonderfully intuitive.

 
 
 You need to be able to express a slice of zero length
 using two (unsigned) values. 

Why do we *need* to be constrained by uint?

 This is I think a very simple
 explanation. Compiler code generation or "syntax taste"
 have nothing to do with it.
 

Code generation does have something to do with it. The languages that use a
0-based index are really using an offset rather than an index, in that the
'index' value is really how far from the beginning of the array the element
you are referring to is. a[0] is a distance of zero elements from the start
of 'a', thus it references the first element. Those languages that use
1-based indexing are more closely using a true index. Code generation needs to
calculate the address of an element, and that address is an offset in bytes
from the start of the array. Thus 0-based indexing usually leads to more
efficient code, as true indexing often needs to subtract one
sizeof(element) from the calculated address.
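
The address arithmetic can be sketched like this, assuming a current D2 compiler. The function byteOffset is hypothetical and only models the calculation; whether the extra subtraction costs anything in generated code depends on the compiler, which may fold it away.

```d
// Byte offset of element `i` in an array whose first index is `base`
// (base 0 for offset-style indexing, base 1 for "true" indexing).
size_t byteOffset(size_t i, size_t base, size_t elemSize)
{
    return (i - base) * elemSize;
}

void main()
{
    // The first element sits at offset 0 under either convention.
    assert(byteOffset(0, 0, int.sizeof) == 0);
    assert(byteOffset(1, 1, int.sizeof) == 0);

    // Same index value, different element: 1-based needs the "- 1" step.
    assert(byteOffset(3, 0, int.sizeof) == 3 * int.sizeof);
    assert(byteOffset(3, 1, int.sizeof) == 2 * int.sizeof);

    // In D itself, a[i] is the element i positions past the start.
    int[] a = [10, 20, 30];
    assert(&a[2] == a.ptr + 2);
}
```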

 uint a,b;
 
 a=f();  // this may assign 0 to a
 b=g();  // g() could return 0 as well
 
 ... somearray[a..b] ...
 
 How would you possibly be able to generate valid code
 if the calculated length of your slice is zero using the
 more "intuitive" approach?

Have array indexing start at 1. Therefore a[1..0] is a zero-length array
using only uints. There are languages that implement slices this way. 

-- 
Derek Parnell
Melbourne, Australia
http://www.dsource.org/projects/build
2/05/2005 9:33:56 PM

May 02 2005

"Uwe Salomon" <post uwesalomon.de> writes:

 You need to be able to express a slice of zero length
 using two (unsigned) values.

 Why do we *need* to be constrained by uint?

Because there was a huge discussion thread at Trolltech's about "Your  
containers use 'int' as the size and index type. Wouldn't it be better if  
it were 'unsigned int' because it would not be possible to write bugs with  
negative indices???". The community is simply split about this issue, and  
it is about the slicing syntax as well. I remember i once wrote a  
container (pre-Qt :) where i thought (why not use unsigned indices? They  
can't go beyond 0 anyways.) and later changed it into signed because there  
were some functions that really profited from negative indices, but both  
variants at the same time would have been worse. My current containers for  
D use unsigned again (actually they use size_t. Is it unsigned? Well, they  
don't care about the possibility.).

Other languages use arrays which start at 1, and i have seen immense  
threads about this issue, too. What you can learn from all this, is that  
there is no solution, no best way -- just 2 possibilities. Remember the  
iterator concept of the STL, it uses half-open ranges as well.

All i found is that the current implementation is very practical: in  
almost all algorithms i have written so far in D, i work with lengths or search for  
the end of a range -- thus the current implementation was easier to use.
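
As a rough illustration of that usage pattern, here is a sketch that searches for a position and then splits the array there. findIndex is a hypothetical helper written for this example, not a library function.

```d
// Hypothetical helper: index of the first element equal to x,
// or a.length if x does not occur.
size_t findIndex(const(int)[] a, int x)
{
    foreach (i, v; a)
        if (v == x)
            return i;
    return a.length;
}

void main()
{
    int[] a = [3, 1, 4, 1, 5];

    auto i = findIndex(a, 4);
    auto head = a[0 .. i];        // everything before the match
    auto tail = a[i .. a.length]; // the match and everything after it

    assert(head == [3, 1]);
    assert(tail == [4, 1, 5]);
    assert(head.length + tail.length == a.length);

    // Not found: the index equals a.length, so the tail is simply empty.
    auto j = findIndex(a, 9);
    assert(a[j .. a.length].length == 0);
}
```

The same index serves as the end of one slice and the start of the next, with no +1 or -1 adjustments, and the "not found" value is a legal slice bound.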

 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

You know yourself that this is *not* the way to go!? D stands on the  
shoulders of C and C++, so what is the use of kicking downwards?

Ciao
uwe

May 02 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

 How would you possibly be able to generate valid code
 if the calculated length of your slice is zero using the
 more "intuitive" approach?

 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

In particular Fortran and MATLAB use 1-based indexing and inclusive slices.

May 02 2005

Nick <Nick_member pathlink.com> writes:

In article <d556be$1jds$1 digitaldaemon.com>, Ben Hinkle says...
 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

In particular Fortran and MATLAB use 1-based indexing and inclusive slices. 

Fortran and MATLAB are made for mathematics and science, where arrays often
represent vectors and double arrays represent matrices. Therefore it is natural
to stick to the mathematical notation and count from one. D is a system language
(like C and C++), where in many cases it makes more sense to think of arrays as
memory buffers, and indices as offsets.

As for the slicing issue, using [1..0] to represent an empty array doesn't seem
very intuitive to me - looks more like the array of elements 0 and 1 in reverse
order, or something.

Nick

May 02 2005

"Bob W" <nospam aol.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:11fe7yohtum20.si9yrukkmrgf.dlg 40tude.net...
 On Mon, 2 May 2005 10:56:42 +0200, Bob W wrote:

 You need to be able to express a slice of zero length
 using two (unsigned) values.

 Why do we *need* to be constrained by uint?

It is perfectly natural using a uint for indexation.
The hypothetical "intuitive" slicing just won't work
for uints (it would in theory with ints).



 Code generation does have something to do with it. The languages that use 
 a
 0-base index are really using an offset rather than an index. In that its
 'index' value is really how far off from the beginning of the array you 
 are
 referring to.  a[0] is a distance of zero elements from the start of 'a',
 thus it references the first element. Those languages that use a 1-based
 indexing are more closely using a true index. Code generation needs to
 calculate the address of an element, and that address is an offset in 
 bytes
 from the start of the array. Thus 0-based indexing usually leads to more
 efficient code, as true indexing often needs to subtract one
 sizeof(element) from the calculated address.

Used to be in the past century. Nowadays there is virtually
no penalty to use index plus offset for Pentiums. So you can
base your arrays on whatever you like best and most of your
programs won't experience any slowdowns at all.

I have no idea if you have any practical assembly language
experience, but addressing a 1-dimensional array element
is usually performed by a single machine code instruction.
Index and offset calculation is done by the CPU. Depending
on the instruction pipeline you might get away with the
minimum latency even for complex addressing methods.


 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

Yeah, and throw away tons of programs, ported and to-be-ported
C and Java code just because of a religious belief, which has no
merits whatsoever.

May 02 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 2 May 2005 17:56:58 +0200, Bob W wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:11fe7yohtum20.si9yrukkmrgf.dlg 40tude.net...
 On Mon, 2 May 2005 10:56:42 +0200, Bob W wrote:

 You need to be able to express a slice of zero length
 using two (unsigned) values.

 Why do we *need* to be constrained by uint?

 
 It is perfectly natural using a uint for indexation.
 The hypothetical "intuitive" slicing just won't work
 for uints (it would in theory with ints).

Well my answer to my own question is that 0-based indexing is a method of
referencing elements based on an offset from the start of the array and
there are no elements of the array before the first one, so a negative
'index' would never reference any of the array's elements. Thus a uint is a
natural choice for 0-based indexing schemes.

 
 Code generation does have something to do with it. The languages that use 
 a
 0-base index are really using an offset rather than an index. In that its
 'index' value is really how far off from the beginning of the array you 
 are
 referring to.  a[0] is a distance of zero elements from the start of 'a',
 thus it references the first element. Those languages that use a 1-based
 indexing are more closely using a true index. Code generation needs to
 calculate the address of an element, and that address is an offset in 
 bytes
 from the start of the array. Thus 0-based indexing usually leads to more
 efficient code, as true indexing often needs to subtract one
 sizeof(element) from the calculated address.

 
 Used to be in the past century. Nowadays there is virtually
 no penalty to use index plus offset for Pentiums. So you can
 base your arrays on whatever you like best and most of your
 programs won't experience any slowdowns at all.

Thank you. It's been a *long* time since I've done any assembler work and I
haven't kept up with Intel advances.


 
 Have array indexing start at 1. Therefore a[1..0] is a zero-length array
 using only uints. There are languages that implement slices this way.

 
 Yeah, and throw away tons of programs, ported and to-be-ported
 C and Java code just because of a religious belief, which has no
 merits whatsoever.

"has no merits whatsoever" - which is also another 'religious belief'.

The original poster was just saying that half-open slicing semantics is not
intuitive to the average person, regardless of how practical it is for a
programming language. Nearly everyone I know starts counting off items
starting with one; no-one I know starts counting ... zero, one, two, three,
... And because it is unintuitive, newcomers to the concept need to learn
it, because it is not going to change.

-- 
Derek Parnell
Melbourne, Australia
3/05/2005 7:01:06 AM

May 02 2005

"Bob W" <nospam aol.com> writes:

 ......... no-one I know starts counting ... zero, one, two, three,


NASA does - they just don't get the direction right.


More seriously: I have used both indexing methods a lot
(Pascal and D's predecessors) and even after many years
I could not tell which one is better for me. It varies from
one application to the other. But I guess it is just mainly
a matter of getting used to either one.

May 02 2005

Derek Parnell <derek psych.ward> writes:

On Tue, 3 May 2005 00:30:58 +0200, Bob W wrote:

 ......... no-one I know starts counting ... zero, one, two, three,

 
 NASA does - they just don't get the direction right.

Well... actually they are announcing the number of seconds remaining, but I
get your point ;-)

 More seriously: I have used both indexing methods a lot
 (Pascal and D's predecessors) and even after many years
 I could not tell which one is better for me. It varies from
 one application to the other. But I guess it is just mainly
 a matter of getting used to either one.

Yes. It is a learned behaviour.

-- 
Derek
Melbourne, Australia
3/05/2005 9:23:08 AM

May 02 2005

Sean Kelly <sean f4.ca> writes:

In article <d53gug$7pf$2 digitaldaemon.com>, William Kilian says...
The array slice syntax in D is unintuitive. a[1..3] should refer to 
three elements, not two.

The reasons for the current syntax are clear. It allows the length 
property to be used as the second slice bound, e.g. a[3..a.length]. This 
saves us from typing length - 1 whenever we want a slice to extend to 
the end of an array. However, adding a read-only last property to arrays 
such that last := length - 1 would save us the repeated subtractions yet 
allow array slice syntax to have the intuitive meaning. Perhaps end or 
tail would be a better name than last. Regardless, the meaning of 
a[3..a.last] or a[3..a.end] is transparent. Besides, a[3..a.length] is 
confusing anyway because a[a.length] is always invalid.

I like the current syntax, but then I've gotten quite used to this behavior with
C++ containers.  I suppose one could argue for the classic inclusion/exclusion
syntax:

a[0..5); // from 0 to 4
a[0..5]; // from 0 to 5
a(0..5]; // from 1 to 5
a(0..5); // from 1 to 4

But this would add undesirable complexity to the parser :)
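
For comparison, the four interval forms above can already be emulated with D's existing half-open slices. The sketch below is only an illustration; interval is a hypothetical helper, not a proposed language feature.

```d
// Hypothetical helper: emulate the four interval forms using half-open slices.
// openLo / openHi say whether the corresponding bound is excluded.
T[] interval(T)(T[] a, size_t lo, size_t hi, bool openLo, bool openHi)
{
    size_t first = openLo ? lo + 1 : lo; // skip lo when the left end is open
    size_t end = openHi ? hi : hi + 1;   // include hi when the right end is closed
    return a[first .. end];
}

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6];

    assert(interval(a, 0, 5, false, true) == [0, 1, 2, 3, 4]);     // a[0..5)
    assert(interval(a, 0, 5, false, false) == [0, 1, 2, 3, 4, 5]); // a[0..5]
    assert(interval(a, 0, 5, true, false) == [1, 2, 3, 4, 5]);     // a(0..5]
    assert(interval(a, 0, 5, true, true) == [1, 2, 3, 4]);         // a(0..5)
}
```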


Sean

May 02 2005
