digitalmars.D - Yet another include/exclusive slicing thread

Tomer Altman <Tomer_member pathlink.com> writes:
First of all, I know and understand that things are hard to change, that Walter
has absolute authority over the language specification, and that this topic
has been brought up many times already. However, I believe (hope) that this
post will shed more light on the subject and help turn matters for
the better.

WARNING: LONG POST!

The main advantage of an inclusive range for the start and end indices of a
slice is that it is intuitive and makes clear what the programmer meant, since
".." is a universal notation for a symmetric "range":
A[0..2] affects cells 0,1,2.

Now the points against it, for each of which I will give a counter-opinion:
1. To denote that the slice runs to the end of the array, one must use (length - 1).

While this is correct numerically, one logically thinks about a range in terms
of start and end and not in terms of "length". Therefore a new property with
a name such as "last", "end", or "maxIndex" could be introduced, equal
to (length - 1). In this case, A[0..last] can be written instead of
A[0..length-1], keeping the clarity.
Adding a "last" property can also help in case one wants to simply change the
last cell in the array:
A[last] = 5;
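
A minimal sketch of the proposal above, written for a present-day D compiler. The names `last` and `inclusive` are hypothetical helpers made up for illustration; neither is a built-in property, and the helpers simply translate to the exclusive form D already has.

```d
import std.stdio : writeln;

// Hypothetical helper, not part of D: index of the final element.
// Only meaningful for a non-empty array, because length is unsigned
// and length - 1 would wrap around on an empty one.
size_t last(T)(const T[] arr)
{
    assert(arr.length > 0, "last() needs a non-empty array");
    return arr.length - 1;
}

// Hypothetical inclusive slice: cells first..lastIdx, both included.
T[] inclusive(T)(T[] arr, size_t first, size_t lastIdx)
{
    return arr[first .. lastIdx + 1]; // translated to the exclusive form
}

void main()
{
    int[] a = [10, 20, 30, 40];
    a[a.last] = 5;               // the "A[last] = 5" idea
    writeln(a);                  // whole array after the write
    writeln(a.inclusive(0, 2));  // cells 0, 1 and 2
    assert(a.inclusive(0, a.last) == a[0 .. $]);
}
```

The assertion in `last` is there because an empty array has no last index at all, which is the one case an inclusive notation has to treat specially.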


2. Programmers are used to the for(i=0;i<length;++i) idiom.

While it is true that this is the case right now, I believe it became so
for historical reasons, as a shorthand for code such as
"for(i=0;i<=length-1;++i)", which is certainly confusing and long.

From my experience, even though "for(i=0;i<length;++i)" is better for simple loops,
whenever I have a more complex loop that goes from a variable
index "n" to a variable index "m", I almost always prefer the <= notation, as it
makes clear where the loop stops.

I then realized that the reason the <= notation isn't the common one is that
the "length" of the array is not itself an index, so it doesn't fit naturally in a
loop over indices; it is just a number from which the maximum index can be derived.

Since the system is structured, adding a "last" property as in point 1 would
diminish the shorthand motive for using "for(i=0;i<length;++i)". Instead it
could be "for(i=0;i<=last;++i)"
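
The two loop forms can be put side by side as below. This is only a sketch: `sumExclusive` and `sumInclusive` are illustrative names, and `last` is an ordinary local variable standing in for the proposed property.

```d
// Exclusive upper bound: i takes the values 0 .. length - 1.
int sumExclusive(const int[] a)
{
    int sum = 0;
    for (size_t i = 0; i < a.length; ++i)
        sum += a[i];
    return sum;
}

// Inclusive upper bound, using a local "last" in place of the proposed property.
int sumInclusive(const int[] a)
{
    int sum = 0;
    if (a.length > 0) // guard: length - 1 wraps around for an empty array
    {
        const size_t last = a.length - 1;
        for (size_t i = 0; i <= last; ++i)
            sum += a[i];
    }
    return sum;
}

void main()
{
    int[] a = [3, 1, 4, 1, 5];
    int[] empty;
    assert(sumExclusive(a) == sumInclusive(a));
    assert(sumExclusive(empty) == 0 && sumInclusive(empty) == 0);
}
```

Both functions visit the same cells; the only difference is that the inclusive version needs the explicit empty-array check, since array lengths in D are unsigned.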

Moreover, in case you DO refer to length, the usage of ".." is
counter-intuitive. As other people noted, an index-and-length
notation could be useful in addition to a start..end notation.

The reasoning behind D is to make a language that is like C and C++ but is
better designed, reducing the amount of thought needed to read and write
code that does what one intuitively expects. In this case, it means removing
the need for the exclusive < notation and instead using a symmetric <= notation
with a new language feature ("lastIndex").

Note: In both of my points, the code can be compiled by translating the new
version to the old version and compiling it just the same, so efficiency isn't
hindered.


3. Creating a zero-length array should be easy, so A[0..0] is a good way.

If a certain index and length notation would be added (for example with the




4. Thousands of lines of D code have already been written; for legacy reasons
we shouldn't change it now.

While this is the same reason C and C++ stayed backwards compatible, the
proportions are completely different: thousands versus hundreds of millions.
That compatibility caused many of the faults these languages now suffer from
and is the reason new languages (Java, D, etc.) were designed.
The language is still fresh and evolving. Changing it now might prove to be much
more fruitful than changing it once it gets popular and actually reaches
millions of lines of complex code.

On this note, I suggest adding a mandatory "language version" declaration at the
head of source files so that the compiler can act accordingly as the language
and libraries evolve. For example, "version 0.56" has a certain feature, while
"version 0.6" still has that feature but maybe with a different meaning (such as
this post's topic). That way, even if the language changes the meaning of
expressions, old code can be compiled appropriately. Of course, if the version
declaration doesn't appear, the compiler will fall back to the version just
before versioning was added.


I hope very much that this will help make D a better language for all of us.
Oct 22 2004
Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
I used to be a vocal proponent of this very same thing.  Walter's
response to me, basically, was that it was designed that way because it
made the code easier to write, more often.  Basically, he said that it's
more common for you to know the index of the element after the last than
it is for you to know the index of the last one.  For instance, say you
want to split an array into two pieces, and you know the index where the
second piece should start.  With Walter's current design, the code looks like this:
	char[] array = <whatever>;
	int sliceIndx = <whatever>;
	char[] slice1 = array[0..sliceIndx];
	char[] slice2 = array[sliceIndx..length];
With inclusive ranges, you have to add two extra "-1"s to the code:
	char[] slice1 = array[0..sliceIndx-1];
	char[] slice2 = array[sliceIndx..length-1];
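
A runnable version of the exclusive split, as a sketch for a present-day D compiler. The sample string and the split point are made up for illustration, and current compilers spell the implicit length inside the brackets as `$`.

```d
import std.stdio : writeln;

void main()
{
    char[] array = "hello world".dup; // sample data, chosen for illustration
    size_t sliceIndx = 5;             // split point, also illustrative

    // Exclusive end: the same index closes one slice and opens the next.
    char[] slice1 = array[0 .. sliceIndx];
    char[] slice2 = array[sliceIndx .. $]; // $ is the array's length here

    writeln(slice1);
    writeln(slice2);

    // The two pieces cover the array exactly, with no "-1" adjustments.
    assert(slice1.length == sliceIndx);
    assert(slice1.length + slice2.length == array.length);
}
```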

At the time, I didn't believe him.  However, in my D experience since,
I have to say that I think he is right.  It is far more common to use
non-inclusive ranges than inclusive ones.

So, I'm a convert.  Yes, it looks confusing, and takes a little while to
learn.  But it's probably the best way to do things after all.
Oct 22 2004
Sjoerd van Leent <svanleent wanadoo.nl> writes:
Russ Lewis wrote:
> I used to be a vocal proponent of this very same thing.  Walter's
> response to me, basically, was that it was designed that way because it
> made the code easier to write, more often.  Basically, he said that it's
> more common for you to know the index of the element after the last than
> it is for you to know the index of the last one.  For instance, say you
> want to split an array into two pieces, and you know the index where the
> second piece should start.  With Walter's current design, the code looks like this:
>     char[] array = <whatever>;
>     int sliceIndx = <whatever>;
>     char[] slice1 = array[0..sliceIndx];
>     char[] slice2 = array[sliceIndx..length];
> With inclusive ranges, you have to add two extra "-1"s to the code:
>     char[] slice1 = array[0..sliceIndx-1];
>     char[] slice2 = array[sliceIndx..length-1];
>
> At the time, I didn't believe him.  However, in my D experience since,
> I have to say that I think he is right.  It is far more common to use
> non-inclusive ranges than inclusive ones.
>
> So, I'm a convert.  Yes, it looks confusing, and takes a little while to
> learn.  But it's probably the best way to do things after all.

I think that a solution to this should be possible. Why not let people
decide for themselves whether to use inclusive or exclusive notation for
slicing? The following should be possible to implement:

char[] slice1 = array[0 .. length]; // The same thing
char[] slice2 = array[start : end]; // Different operator

Could such a solution be the one you're looking for?

Regards,
Sjoerd
Oct 23 2004
"Walter" <newshound digitalmars.com> writes:
"Sjoerd van Leent" <svanleent wanadoo.nl> wrote in message
news:cldcc1$2hue$1 digitaldaemon.com...
> I think that a solution to this should be possible. Why not let people
> decide themselves to use inclusive or exclusive notation for slicing.
> The following should be possible to implement:
>
> char[] slice1 = array[0 .. length]; // The same thing
> char[] slice2 = array[start : end]; // Different operator
>
> Could such a solution be the one you're looking for?

While that would technically work, I suspect that there would be constant
confusion over which was which.
Oct 31 2004
Anders F Björklund <afb algonet.se> writes:
Tomer Altman wrote:

> While this is the same reason C and C++ stayed backwards compatible, the
> proportions are completely different: thousands versus hundreds of millions.
> That compatibility caused many of the faults these languages now suffer from
> and is the reason new languages (Java, D, etc.) were designed.
> The language is still fresh and evolving. Changing it now might prove to be much
> more fruitful than changing it once it gets popular and actually reaches
> millions of lines of complex code.

Java chose exclusive ranges too, if that helps...
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#substring(int,%20int)

 public String substring(int beginIndex, int endIndex)
   Returns a new string that is a substring of this string.
   The substring begins at the specified beginIndex and extends to the
   character at index endIndex - 1. Thus the length of the substring is
   endIndex-beginIndex.
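
The same contract holds for D's built-in slices, as this small sketch shows. The sample string and the variable names are chosen for illustration only.

```d
void main()
{
    string s = "hamburger";
    size_t begin = 4, end = 8;

    string sub = s[begin .. end]; // characters at indices 4, 5, 6 and 7
    assert(sub == "urge");

    // With an exclusive end, the length is simply end - begin.
    assert(sub.length == end - begin);

    // Equal bounds give an empty slice, with no special case.
    assert(s[begin .. begin].length == 0);
}
```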

Some of us think that it's a *good thing*, just as we like arrays
to start from zero and not from one?

--anders
Oct 23 2004
David Medlock <amedlock nospam.org> writes:
I think the whole issue stems from the ridiculous notion that we start
counting things at zero in programming languages.

It's completely counterintuitive unless you have been writing compilers
and you know that:
char *p;
p[2] == *(p + 2)
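
The identity above can be checked directly in D, which keeps C's pointer arithmetic. This is a small sketch; the array contents are arbitrary sample values.

```d
void main()
{
    int[] a = [10, 20, 30, 40];
    int* p = a.ptr; // pointer to element 0

    // Indexing a pointer is an offset from the start, so index 0 means "no offset".
    assert(p[2] == *(p + 2));
    assert(p[0] == *p);

    // The index of an element equals its distance from the first element.
    assert(&a[3] - &a[0] == 3);
}
```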


With one-based indexes, the inclusive idea has more merit.
No need for a[length-1], just a[length] for the last item.

for( i=1; i<=length; i++ ) ... looks more readable to me than
for( i=0; i<length; i++)

This is definitely *not* a critique of Walter by any means, since he has
made C familiarity a priority.  It's more through legacy C that this has come to pass.


My $0.02, spend it wisely.
Oct 23 2004
Anders F Björklund <afb algonet.se> writes:
David Medlock wrote:

> With one-based indexes, the inclusive idea has more merit.
> No need for a[length-1], just a[length] for the last item.
>
> for( i=1; i<=length; i++ ) ... looks more readable to me than
> for( i=0; i<length; i++)

I think you meant: "for i := 1 to length do" as readable :-)

Since D uses C-style arrays, its exclusive indexing makes sense?
(just as inclusive indexing would make sense with Pascal arrays)

And of course, for array loops, the "foreach" is excellent...

--anders
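
For reference, a short sketch of the "foreach" form mentioned above, which avoids spelling out either bound. The array contents are sample values.

```d
import std.stdio : writeln;

void main()
{
    int[] a = [3, 1, 4];

    // Element-only form: no index and no upper bound to get wrong.
    int total = 0;
    foreach (x; a)
        total += x;
    assert(total == 8);

    // Index-and-element form, for when the position is needed too.
    foreach (i, x; a)
        writeln(i, ": ", x);
}
```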
Oct 24 2004
Derek <derek psyc.ward> writes:
On Sat, 23 Oct 2004 21:38:46 -0400, David Medlock wrote:

> I think the whole issue stems from the ridiculous notion that we start
> counting things at zero in programming languages.
>
> It's completely counterintuitive unless you have been writing compilers
> and you know that:
> char *p;
> p[2] == *(p + 2)
>
>
> With one-based indexes, the inclusive idea has more merit.
> No need for a[length-1], just a[length] for the last item.
>
> for( i=1; i<=length; i++ ) ... looks more readable to me than
> for( i=0; i<length; i++)
>
> This is definitely *not* a critique of Walter by any means, since he has
> made C familiarity a priority.  It's more through legacy C that this has come to pass.
>
>
> My $0.02, spend it wisely.

I'm with you here too. I know that D's heritage does not permit it to use
1-based indexing, so I'm not debating its pros and cons here. I think of
0-based indexes as not really indexes at all but offsets to the beginning of
the element.

I've been programming for more than 25 years, and a large part of that has
been with C, and yet 1-based indexing always seems more natural to me. I now
do a lot of programming with Euphoria and with Progress, both of which use
1-based indexing, and it is just easier to read, comprehend, and explain to
normal people (not programmers!).

--
Derek
Melbourne, Australia
Oct 24 2004
Ben Hinkle <bhinkle4 juno.com> writes:
I see where you are coming from - Fortran and MATLAB both include the
endpoint in slices (and they both use 1-based indexing instead of
0-based). Non-programmers tend to like that more than C-style. One
area where including the endpoint makes sense is with custom
containers like a sorted associative array - why should a slice of
such an array need to know the key for the element after the desired
slice? Similarly in a linked list the slice from one node to another
should probably include the endpoint. The difference becomes important
when items are added to the list - do they go into the slice or after
the slice? In MinTL slicing by integers will exclude the endpoint and
slicing by key or node will include the endpoint.
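
The contrast between slicing by integer and slicing by key can be sketched as follows. `sliceByKey` is a hypothetical helper written for this illustration, and a plain sorted array of keys stands in for the container; none of this is MinTL's actual API.

```d
// Hypothetical helper, not MinTL's API: slice a sorted array of keys
// by key, including the element equal to the upper key.
int[] sliceByKey(int[] sortedKeys, int lo, int hi)
{
    size_t i = 0;
    while (i < sortedKeys.length && sortedKeys[i] < lo)
        ++i; // skip keys below the lower bound
    size_t j = i;
    while (j < sortedKeys.length && sortedKeys[j] <= hi)
        ++j; // keep keys up to and including the upper bound
    return sortedKeys[i .. j]; // an exclusive index slice underneath
}

void main()
{
    int[] keys = [2, 3, 5, 7, 11, 13];

    // By key: the caller names the last wanted key, not the one after it.
    assert(sliceByKey(keys, 3, 7) == [3, 5, 7]);

    // By integer index: the end index is exclusive.
    assert(keys[1 .. 4] == [3, 5, 7]);

    // A key range that contains no keys yields an empty slice.
    assert(sliceByKey(keys, 4, 4).length == 0);
}
```

With keys, the inclusive form spares the caller from knowing which key follows the last wanted one; with integers, the next index is always known, so the exclusive form costs nothing.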

I think people in this newsgroup are a bit worn out right now, though,
so I don't expect this topic to get much debate.

-Ben
Oct 23 2004
David Medlock <amedlock nospam.org> writes:
Ben Hinkle wrote:
> I see where you are coming from - Fortran and MATLAB both include the
> endpoint in slices (and they both use 1-based indexing instead of
> 0-based). Non-programmers tend to like that more than C-style. One
> area where including the endpoint makes sense is with custom
> containers like a sorted associative array - why should a slice of
> such an array need to know the key for the element after the desired
> slice? Similarly in a linked list the slice from one node to another
> should probably include the endpoint. The difference becomes important
> when items are added to the list - do they go into the slice or after
> the slice? In MinTL slicing by integers will exclude the endpoint and
> slicing by key or node will include the endpoint.
>
> I think people in this newsgroup are a bit worn out right now, though,
> so I don't expect this topic to get much debate.
>
> -Ben

I like the (often downtrodden) Pascal language a lot, because it allows you
to set the range of your array. I found some old Delphi code I wrote about
2 years ago and it was perfectly readable.

-dm
Oct 25 2004