
D - comments on m..n array index syntax. make it m through n inclusive

reply Chris Friesen <cfriesen nortelnetworks.com> writes:
On the whole, it looks pretty good.  I have already given my thoughts on generic
programming, but I wanted to make a comment on your array index range notation.


Quoted from your document:
   In general, (a[n..m] op e) is defined as: 

        for (i = n; i < m; i++)
            a[i] op e;


        s[] = t[];              the 3 elements of t[3] are copied into s[3]
        s[1..2] = t[0..1];      same as s[1] = t[0]
        s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]
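
For readers who want to try the quoted rule, here is a minimal sketch, assuming a current D compiler. The helper name `assignRange` is made up for illustration and is not part of the spec or the library; it only spells out the quoted loop for the case where `op` is plain assignment.

```d
// Sketch of the quoted definition: (a[n..m] = e) as an explicit loop.
// `assignRange` is a hypothetical name, not a D library function.
void assignRange(int[] a, size_t n, size_t m, int e)
{
    // Visits a[n], a[n+1], ..., a[m-1]; a[m] itself is never touched.
    for (size_t i = n; i < m; i++)
        a[i] = e;
}

unittest
{
    int[3] s = [0, 0, 0];
    int[3] t = [10, 20, 30];

    s[0 .. 2] = t[1 .. 3];      // two elements: s[0] = t[1], s[1] = t[2]
    assert(s == [20, 30, 0]);

    int[] a = [1, 1, 1, 1];
    assignRange(a, 1, 3, 9);    // same effect as a[1 .. 3] = 9
    assert(a == [1, 9, 9, 1]);
}
```

Note that a range `n..m` written this way always covers `m - n` elements.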


While I can see how this came from C/C++, I think it's very confusing.  I think
it would make a whole lot more sense to read the [m..n] notation as being the
range of indices which are covered.  This would then be identical behaviour to
math programs such as maple.  Plus, it has the added advantage of being
syntactically similar to accessing a single array element.

Thus,

a[1] = b[1];       obvious
a[1..3] = b[1..3];      same as a[1]=b[1], a[2]=b[2], a[3]=b[3]
a[1..3] = b[6..8];      same as a[1]=b[6], a[2]=b[7], a[3]=b[8]

Translating to English, the m..n notation converts to "take elements m through
n" which I think makes a lot more sense than "take elements m through n-1".
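
The inclusive reading proposed above can be emulated on top of the end-exclusive slice. This is only a sketch under the assumption of a current D compiler; `inclusive` is a hypothetical helper, not something the language or library provides.

```d
// Hypothetical helper: a slice that includes both endpoints m and n.
T[] inclusive(T)(T[] a, size_t m, size_t n)
{
    // The built-in slice excludes its upper bound, so add one to it.
    return a[m .. n + 1];
}

unittest
{
    int[] a = new int[10];
    int[] b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90];

    // Proposed meaning of a[1..3] = b[6..8]:
    // a[1] = b[6], a[2] = b[7], a[3] = b[8]
    int[] dst = inclusive(a, 1, 3);
    dst[] = inclusive(b, 6, 8);

    assert(a[1] == 60 && a[2] == 70 && a[3] == 80);
    assert(a[4] == 0);          // a[4] is outside the inclusive range
}
```

With this reading, a range `m..n` covers `n - m + 1` elements.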

As a final piece of syntactical sugar, what about something like

a[1,4,7] = b[3,2,8];   same as a[1]=b[3], a[4]=b[2], a[7]=b[8]

where you specify a list of indices to copy?
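
The index-list idea has no direct syntax, but it can be approximated with an ordinary function. A minimal sketch follows; `copyIndexed` and its parameter names are assumptions made for illustration only.

```d
// Hypothetical helper emulating a[1,4,7] = b[3,2,8].
void copyIndexed(int[] dst, const size_t[] dstIdx,
                 const int[] src, const size_t[] srcIdx)
{
    assert(dstIdx.length == srcIdx.length, "index lists must have equal length");

    // Pair the k-th destination index with the k-th source index.
    foreach (k; 0 .. dstIdx.length)
        dst[dstIdx[k]] = src[srcIdx[k]];
}

unittest
{
    int[] a = new int[8];
    int[] b = [0, 1, 2, 3, 4, 5, 6, 7, 8];

    copyIndexed(a, [1, 4, 7], b, [3, 2, 8]);
    assert(a[1] == 3 && a[4] == 2 && a[7] == 8);
}
```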


Chris





-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen nortelnetworks.com
Aug 16 2001
next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Chris Friesen" <cfriesen nortelnetworks.com> wrote in message
news:3B7C20B2.E2B8595F nortelnetworks.com...
 On the whole, it looks pretty good.  I have already given my thoughts on
 generic programming, but I wanted to make a comment on your array index
 range notation.

 Quoted from your document:
    In general, (a[n..m] op e) is defined as:

         for (i = n; i < m; i++)
             a[i] op e;

         s[] = t[];              the 3 elements of t[3] are copied into s[3]
         s[1..2] = t[0..1];      same as s[1] = t[0]
         s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]

 While I can see how this came from C/C++, I think it's very confusing.  I
 think it would make a whole lot more sense to read the [m..n] notation as
 being the range of indices which are covered.  This would then be identical
 behaviour to math programs such as maple.  Plus, it has the added advantage
 of being syntactically similar to accessing a single array element.

 Thus,

 a[1] = b[1];       obvious
 a[1..3] = b[1..3];      same as a[1]=b[1], a[2]=b[2], a[3]=b[3]
 a[1..3] = b[6..8];      same as a[1]=b[6], a[2]=b[7], a[3]=b[8]

 Translating to English, the m..n notation converts to "take elements m
 through n" which I think makes a lot more sense than "take elements m
 through n-1".

That's a good point. But I am so used to writing loops that go from n to m-1
that diverging from that will cause a lot of inadvertent bugs.

 As a final piece of syntactical sugar, what about something like
 a[1,4,7] = b[3,2,8];   same as a[1]=b[3], a[4]=b[2], a[7]=b[8]
 where you specify a list of indices to copy?

That does work pretty neat, but are there enough uses of this to justify the
feature?
Aug 16 2001
next sibling parent Chris Friesen <chris_friesen sympatico.ca> writes:
Walter wrote:
 "Chris Friesen" <cfriesen nortelnetworks.com> wrote in message
 Translating to English, the m..n notation converts to "take elements m
 through n" which I think makes a lot more sense than "take elements m
 through n-1".

 That's a good point. But I am so used to writing loops that go from n to
 m-1 that diverging from that will cause a lot of inadvertent bugs.

Sure, but then you just write

        a[m..n-1] = b[m..n-1]

Doesn't that make more sense than writing

        a[0..n] = b[0..n]

when you only have n elements to begin with?

Since it's a whole new syntax anyway, I would like to make it something
logical and obvious to a new user.  Thinking back to my old programming days
when loops were "for i = 1 to 10 do"... I think that having the range of
values obvious in the statement will end up being clearer in the end.

I think the concept of ranges would be useful in switch statements as well,
but I'll address that in another thread.

 As a final piece of syntactical sugar, what about something like
 a[1,4,7] = b[3,2,8];   same as a[1]=b[3], a[4]=b[2], a[7]=b[8]
 where you specify a list of indices to copy?

 That does work pretty neat, but are there enough uses of this to justify
 the feature?

I kind of doubt it.  Like I said, syntactical sugar.

Chris
Aug 16 2001
prev sibling parent "Sheldon Simms" <sheldon semanticedge.com> writes:
In article <9lhqng$d9n$1 digitaldaemon.com>, "Walter"
<walter digitalmars.com> wrote:

 "Chris Friesen" <cfriesen nortelnetworks.com> wrote in message
 news:3B7C20B2.E2B8595F nortelnetworks.com...
 On the whole, it looks pretty good.  I have already given my thoughts on
 generic programming, but I wanted to make a comment on your array index
 range notation.

 Quoted from your document:
    In general, (a[n..m] op e) is defined as:

         for (i = n; i < m; i++)
             a[i] op e;

         s[] = t[];              the 3 elements of t[3] are copied into s[3]
         s[1..2] = t[0..1];      same as s[1] = t[0]
         s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]

 While I can see how this came from C/C++, I think it's very confusing.  I
 think it would make a whole lot more sense to read the [m..n] notation as
 being the range of indices which are covered.  This would then be identical
 behaviour to math programs such as maple.  Plus, it has the added advantage
 of being syntactically similar to accessing a single array element.

 Thus,

 a[1] = b[1];       obvious
 a[1..3] = b[1..3];      same as a[1]=b[1], a[2]=b[2], a[3]=b[3]
 a[1..3] = b[6..8];      same as a[1]=b[6], a[2]=b[7], a[3]=b[8]

 Translating to English, the m..n notation converts to "take elements m
 through n" which I think makes a lot more sense than "take elements m
 through n-1".

 That's a good point. But I am so used to writing loops that go from n to
 m-1 that diverging from that will cause a lot of inadvertent bugs.

I'm very used to writing loops like that too, but this notation in the D
document really confused me at first. I think it's completely
counterintuitive and agree with Chris 100%.

-- 
Sheldon Simms / sheldon semanticedge.com
Aug 17 2001
prev sibling next sibling parent Christophe de Dinechin <descubes earthlink.net> writes:
Yes, this is counterintuitive. But it should not be in the language in the
first place. The definition of something like indexing should be in the
library.

What if I want range-checked indexes? What if I don't want them? What if I
want a stride (that is, elements are A[0], A[4], A[8], but A[1] and A[0] are
the same)? What if I want a different base (indexes in range 1..100 rather
than 0..99)?
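
One way to picture the "indexing in the library" idea is a small user-defined type that chooses its own base and does its own range check. The sketch below assumes current D with operator overloading via `opIndex`; the type name `Based` and its fields are invented for illustration and do not come from any library. Stride is left out to keep the sketch short.

```d
// Hypothetical library type: an array view with a caller-chosen base index.
struct Based(T)
{
    T[] data;
    long base;   // index of the first element, e.g. 1 for indexes 1..100

    ref T opIndex(long i)
    {
        // The range check lives in the library type, not in the language rule.
        assert(i >= base && i - base < cast(long) data.length,
               "index out of range");
        return data[cast(size_t)(i - base)];
    }
}

unittest
{
    auto v = Based!int(new int[100], 1);   // valid indexes: 1 through 100
    v[1] = 7;
    v[100] = 9;
    assert(v.data[0] == 7 && v.data[99] == 9);
}
```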


Chris Friesen wrote:

 On the whole, it looks pretty good.  I have already given my thoughts on
 generic programming, but I wanted to make a comment on your array index
 range notation.

 Quoted from your document:
    In general, (a[n..m] op e) is defined as:

         for (i = n; i < m; i++)
             a[i] op e;

         s[] = t[];              the 3 elements of t[3] are copied into s[3]
         s[1..2] = t[0..1];      same as s[1] = t[0]
         s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]

 While I can see how this came from C/C++, I think it's very confusing.  I think
 it would make a whole lot more sense to read the [m..n] notation as being the
 range of indices which are covered.  This would then be identical behaviour to
 math programs such as maple.  Plus, it has the added advantage of being
 syntactically similar to accessing a single array element.

I agree there. This notation is confusing in the extreme. If you want to have
this behavior, use the mathematical notation:

        s[1..2[ = t[0..1[

For mathematicians, this means 1 to 2, 2 excluded. I can see that this might
be difficult to parse, though...

 As a final piece of syntactical sugar, what about something like

 a[1,4,7] = b[3,2,8];   same as a[1]=b[3], a[4]=b[2], a[7]=b[8]
This is problematic in the presence of multi-dimensional arrays.
Aug 17 2001
prev sibling parent reply reiter nomadics.com (Mac Reiter) writes:
(First, pardon me if the array slicing syntax debate is over.  I just
found out about D a few days ago, and just started looking at the spec
seriously today.)

On Thu, 16 Aug 2001 15:36:18 -0400, Chris Friesen
<cfriesen nortelnetworks.com> wrote:

Quoted from your document:
   In general, (a[n..m] op e) is defined as: 

        for (i = n; i < m; i++)
            a[i] op e;


        s[] = t[];              the 3 elements of t[3] are copied into s[3]
        s[1..2] = t[0..1];      same as s[1] = t[0]
        s[0..2] = t[1..3];      same as s[0] = t[1], s[1] = t[2]


While I can see how this came from C/C++, I think it's very confusing.  I think

Just to throw another vote in here - when I first read the description of
slicing in the D spec, I assumed it was a typo.  I then read a later line
where a slice was described as:

        int a[10];
        int b[];
        b = a;
        b = a[];
        b = a[0 .. a.length];

This explains WHY the syntax is the way it is, but I must strenuously
agree that it does not justify it.  Since "off by one" errors are
second only to pointer handling errors in programming, any new syntax
should be very clear and intuitive in its use.

I would also agree that some form of exclusive bound would be
acceptable, though hard to parse:

        b = a[0 .. a.length-1];
replaced by:
        b = a[0 .. a.length);

If the currently described exclusive ending bound remains in D, I
would simply have to remove the slicing syntax from my set of tools,
because I would always get it wrong -- I've switched from Basic to
C/C++ enough times to know that much.

Mac Reiter
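
The off-by-one risk described in this post can be made concrete. In the sketch below (current D assumed), `whole` and `allButLast` are hypothetical names; they show what each upper bound selects under the end-exclusive rule.

```d
// Hypothetical helpers contrasting the two upper bounds.

// End-exclusive rule: a.length as the upper bound selects every element.
int[] whole(int[] a)
{
    return a[0 .. a.length];
}

// Writing "-1" out of habit drops the final element.
// Requires a non-empty array, since a.length - 1 would wrap around otherwise.
int[] allButLast(int[] a)
{
    return a[0 .. a.length - 1];
}

unittest
{
    int[10] a;
    assert(whole(a[]).length == 10);
    assert(allButLast(a[]).length == 9);   // a[9] is silently left out
}
```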
Jan 11 2002
next sibling parent reply "Pavel Minayev" <evilone omen.ru> writes:
"Mac Reiter" <reiter nomadics.com> wrote in message
news:3c3f702f.27813613 news.digitalmars.com...

 This explains WHY the syntax is the way it is, but I must strenuously
 agree that it does not justify it.  Since "off by one" errors are
 second only to pointer handling errors in programming, any new syntax
 should be very clear and intuitive in its use.
...
 If the currently described exclusive ending bound remains in D, I
 would simply have to remove the slicing syntax from my set of tools,
 because I would always get it wrong -- I've switched from Basic to
 C/C++ enough times to know that much.

I thought the same when I argued on the topic. Now, after using it for a
while, I have to agree with Walter that the end-exclusive form is what you
need in 90% of cases. It's not as counterintuitive as one might think; in
fact, I haven't made any mistakes with this syntax so far! Just try to write
something using slices heavily and you'll see for yourself....
Jan 11 2002
next sibling parent "Walter" <walter digitalmars.com> writes:
"Pavel Minayev" <evilone omen.ru> wrote in message
news:a1o00d$31ed$1 digitaldaemon.com...
 "Mac Reiter" <reiter nomadics.com> wrote in message
 news:3c3f702f.27813613 news.digitalmars.com...

 This explains WHY the syntax is the way it is, but I must strenuously
 agree that it does not justify it.  Since "off by one" errors are
 second only to pointer handling errors in programming, any new syntax
 should be very clear and intuitive in its use.
...
 If the currently described exclusive ending bound remains in D, I
 would simply have to remove the slicing syntax from my set of tools,
 because I would always get it wrong -- I've switched from Basic to
 C/C++ enough times to know that much.

 I thought the same when I argued on the topic. Now, after using it for a
 while, I have to agree with Walter that the end-exclusive form is what you
 need in 90% of cases. It's not as counterintuitive as one might think; in
 fact, I haven't made any mistakes with this syntax so far! Just try to
 write something using slices heavily and you'll see for yourself....

Look at the string.d code for examples!
Jan 11 2002
prev sibling parent reply reiter nomadics.com (Mac Reiter) writes:
On Sat, 12 Jan 2002 03:30:04 +0300, "Pavel Minayev" <evilone omen.ru>
wrote:

I apologize up front for the length of this posting.  Unfortunately, I
do not have the time necessary to edit it down while maintaining the
points I am trying to make.

"Mac Reiter" <reiter nomadics.com> wrote in message
news:3c3f702f.27813613 news.digitalmars.com...

 This explains WHY the syntax is the way it is, but I must strenuously
 agree that it does not justify it.  Since "off by one" errors are
 second only to pointer handling errors in programming, any new syntax
 should be very clear and intuitive in its use.
...
 If the currently described exclusive ending bound remains in D, I
 would simply have to remove the slicing syntax from my set of tools,
 because I would always get it wrong -- I've switched from Basic to
 C/C++ enough times to know that much.

 I thought the same when I argued on the topic. Now, after using it for a
 while, I have to agree with Walter that the end-exclusive form is what you
 need in 90% of cases. It's not as counterintuitive as one might think; in
 fact, I haven't made any mistakes with this syntax so far! Just try to
 write something using slices heavily and you'll see for yourself....

The mere thought causes me to wake up at night in a cold sweat from
maintenance nightmares.  How many times do C programmers blow up
stacks and heaps because they forget to allocate enough space for the
NULL at the end of a C string?  How many programmers are going to
assume they need the -1 on the final bound and end up one element
short all the time?  How many programmers are going to think they
copied the entire array and wonder why their code explodes or throws
an exception when they try to access that last element?

Any decision will work for people who program exclusively in the given
language.  Java people got used to January being 0 and December being
11, eventually.  But a lot of programs got bad dates and lots of
exceptions thrown regarding December, too.  Experienced C programmers
don't have problems remembering that scanf needs the address for all
variable types EXCEPT strings (char arrays), but *EVERY* new and some
intermediate C programmers have blown up programs because of it.

This form saves typing "-1" 90% of the time.  But it generates a giant
blind spot when you have an off-by-one error the other 10% of the
time, because when you're trying to do the code review you look at it
and it *looks* like it does the right thing, but in reality it is
leaving the last element off.  Code should do what it says.

I don't mind an end-exclusive form.  I just don't think it should use
the end-INCLUSIVE syntax.  Most of us had some math thrown at us along
the way, and we know that [] includes both endpoints and [) does not
include the last endpoint.  If nothing else, seeing the ) at the end
of the range will make you stop and think about what you are looking
at.

        a[5..7] should be a[5], a[6], and a[7]
        a[5..7) should be a[5] and a[6]

If your newsreader font is really small, the second line used a
closing parenthesis instead of a closing square bracket.

This might be difficult to parse, but it really shouldn't be.  I would
expect some kind of "grouping stack" that keeps track of the most
recent outstanding opening symbol.  If that is the case, all you have
to do is accept a closing parenthesis as a valid match to an opening
square bracket.

Even conversion of existing code to the new format *shouldn't* be too
hard (says the non-compiler-writer, possibly with no ground to stand
on).  Make an intermediate version of the compiler that accepts either
form, and treats both of them as end-exclusive.  Have that compiler
dump out a new file with the closing ] converted to a ).  Users can
use this compiler to convert code files and prepare for the new
version of the compiler that supports end-exclusive AND end-inclusive.
A conversion like this needs to be done as early as possible, because
if you think it will be hard now, imagine how hard it will be as more
and more code accumulates.

Alternatively, some kind of #pragma-like device could be used, but
then you have to choose which behavior is default, and code reviewers
have to check for #pragmas to understand the code they are reading,
and it all just gets nasty.

Ultimately, since D is Mr. Bright's language, it will do whatever he
wants it to do.  I do know that I have had to do a LOT of maintenance
programming and code reviews of other people's code, and that I rarely
get the opportunity to work exclusively in one language for extended
periods of time.  My comments about blind spots and the principle of
least astonishment -- a system and its commands should behave the way
most people would predict, that is, the system should operate with
"least astonishment" -- come from experience and practice.  I realize
that Mr. Bright also has tremendous experience, having viewed his
substantial list of commercial programming successes.

But if [] remains end-exclusive, and if our company eventually started
using it for production work, the style policy would have to require
that array slicing not be used unless no alternative was available,
and require specific boilerplate commentary when it was used.  This
would be necessary to avoid astonishing new programmers and/or
reviewers who came across the code.  I already do similar things when
I need to use <= in a for loop instead of <, especially if it is a
nested loop and one loop uses < but the other uses <=.  That looks
like a bug, so I comment it to explain why it is that way.  Every use
of array slicing looks like a bug to me, so every use would require a
comment explaining its purpose.

Again, I apologize for the length of this posting.

Mac
Jan 14 2002
parent "Pavel Minayev" <evilone omen.ru> writes:
"Mac Reiter" <reiter nomadics.com> wrote in message
news:3c42f52d.258466765 news.digitalmars.com...

 How many programmers are going to assume they need the -1 on the final
 bound and end up one element short all the time?
These were my words... earlier.
 This form saves typing "-1" 90% of the time.  But it generates a giant
 blind spot when you have an off-by-one error the other 10% of the
 time, because when you're trying to do the code review you look at it
 and it *looks* like it does the right thing, but in reality it is
 leaving the last element off.  Code should do what it says.

My reply is simple: RTFM first. Always!

On the other hand, Walter should probably have written it in red and bold,
all capitals: "array slices are end-exclusive!", at the beginning of the
reference =)
Jan 14 2002
prev sibling next sibling parent reply Roland <rv ronetech.com> writes:
Mac Reiter wrote:

                 b = a[0 .. a.length);

Sorry, but on a French keyboard ']' and ')' are on the same key; the first
is just AltGr'ed. Dangerous, isn't it?

I still think that even if the inclusive-exclusive form is more usable than
it seems, it is hard to sell.
I like the mathematical [a..b[ form, but I understand the parser doesn't.

For newcomers: this topic was discussed in the "arrays slicing range" thread.

Roland
Jan 16 2002
parent reply "Pavel Minayev" <evilone omen.ru> writes:
"Roland" <rv ronetech.com> wrote in message
news:3C45A4FC.128C2C8F ronetech.com...

 I like the mathematical [a..b[ form, but i understand the parser don't.
I always thought (a..b] and [a..b) are mathematical forms, aren't they?
Jan 16 2002
parent Roland <rv ronetech.com> writes:
Pavel Minayev wrote:

 "Roland" <rv ronetech.com> wrote in message
 news:3C45A4FC.128C2C8F ronetech.com...

 I like the mathematical [a..b[ form, but i understand the parser don't.

 I always thought (a..b] and [a..b) are mathematical forms, aren't they?

Not as I was taught math. In fact, I wouldn't care about the notation style
if ')' and ']' were not so close on my keyboard (same key).

Roland
Jan 17 2002
prev sibling parent reply DrWhat? <DrWhat nospam.madscientist.co.uk> writes:
Mac Reiter wrote:

[snip]

 I would also agree that some form of exclusive bound would be
 acceptable, though hard to parse:
 
 b = a[0 .. a.length-1];
 replaced by:
 b = a[0 .. a.length);

That is rather unpleasant (and likely to produce typos); however, that is
commonly what programmers want. Would my solution be better ...

        b = a[0 .. a.last]

where a.last == a.length-1

 If the currently described exclusive ending bound remains in D, I
 would simply have to remove the slicing syntax from my set of tools,
 because I would always get it wrong -- I've switched from Basic to
 C/C++ enough times to know that much.
I can imagine that could give errors, and D (IMHO) would be better without such gotchas.
 Mac Reiter
 
Feb 14 2002
parent "Pavel Minayev" <evilone omen.ru> writes:
"DrWhat?" <DrWhat nospam.madscientist.co.uk> wrote in message
news:a4hnfh$109f$1 digitaldaemon.com...

 That is rather unpleasant (and likely to produce typos); however, that
 is commonly what programmers want. Would my solution be
 better ...

 b = a[0 .. a.last]

 where a.last == a.length-1

Practice shows that the end-exclusive form [0 .. a.length] is more convenient
than the end-inclusive one. Otherwise, this is a matter of taste.

 I can imagine that could give errors,  and D (IMHO) would be better
 without such gotchas.
Too late too late =)
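
To make the trade-off in this last exchange concrete, here is a small sketch, assuming current D. The `last` function is a hypothetical stand-in for the proposed `a.last` property, not something D provides. The test also shows the usual argument for the end-exclusive form: splitting an array at `k` needs no `+1` or `-1`.

```d
// Hypothetical helper mirroring the proposed a.last property.
size_t last(T)(const T[] a)
{
    assert(a.length > 0, "an empty array has no last index");
    return a.length - 1;
}

unittest
{
    int[] a = [1, 2, 3, 4, 5];
    size_t k = 2;

    int[] head = a[0 .. k];          // elements 0 through k-1
    int[] tail = a[k .. a.length];   // elements k through length-1
    assert(head.length + tail.length == a.length);   // no gap, no overlap

    // With the end-exclusive rule, an inclusive bound needs an explicit +1.
    assert(a[0 .. a.last + 1] == a);
}
```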
Feb 14 2002