digitalmars.D.learn - Array start index

"DLearner" <bmqazwsx123 gmail.com> writes:
Is it set in stone in the D language that the first element of an 
array _has_ to be at index zero?
Wouldn't starting array elements at one avoid the common 
'off-by-one' logic error? It does seem more natural to begin 
a count at 1.

Actually, it might be even better to allow array definitions of the form
int foo[x:y];
(y >= x), creating integer variables foo[x], foo[x+1],...,foo[y].

I think the (very old) IBM PL/I language was like this.
Aug 01 2015
Rikki Cattermole <alphaglosined gmail.com> writes:
On 1/08/2015 9:35 p.m., DLearner wrote:
 Does the D language set in stone that the first element of an array
 _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 'off-by-one'
 logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.
In C-style languages (like D), the index actually defines an offset in memory, not an ordinal position. So while some languages use 1 instead of 0, 0 maps better to the hardware.

Think of this byte array:

ubyte* myptr = [
|---|-------|
| i | value |
|---|-------|
| 0 | 1     |
| 1 | 255   |
];

For 0 start of index:

size_t i = 0;
assert(myptr[i] == 1);

For 1 start of index:

size_t i = 1;
assert(i != 0);
assert(myptr[i-1] == 1);

While this is not the complete reason why 0 is chosen, it is something to think about.
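
The offset view can be checked directly on a D slice. The following is only a minimal sketch: the helper name elementAt and the sample bytes are made up for illustration, and it assumes a default (non-@safe) build where pointer arithmetic is allowed.

```d
/// Hypothetical helper: reads element i by explicit pointer arithmetic.
ubyte elementAt(const(ubyte)[] data, size_t i)
{
    assert(i < data.length);   // stay inside the slice
    return *(data.ptr + i);    // base address plus an offset of i elements
}

unittest
{
    ubyte[] bytes = [1, 255];
    assert(&bytes[0] == bytes.ptr);        // index 0 is an offset of zero
    assert(&bytes[1] == bytes.ptr + 1);    // index 1 is one element further on
    assert(elementAt(bytes, 0) == bytes[0]);
    assert(elementAt(bytes, 1) == 255);
}
```

With a 1-based scheme the same lookup would need a subtraction (data.ptr + i - 1) on every access, which is the extra step the post describes.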
Aug 01 2015
"bachmeier" <no spam.net> writes:
On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.
Seems you could easily wrap an array in a struct, define opIndex and opSlice appropriately, and use alias this to keep the other properties of the array. The problem you'll run into is interacting with other code that assumes zero-based indexing. I thought about setting the first index at 1 for my library that embeds D inside R, since that's how it's done in R, but very quickly realized how confusing it would be.
Aug 01 2015
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
For the builtin slice types? Yes, set in stone.
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.
See https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

As other commenters have pointed out or will point out, you can easily define a custom type in D that behaves as you describe, and - Dijkstra notwithstanding - there are valid uses for such things.
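
The linked note (EWD831) argues for ranges that include the lower bound and exclude the upper one, starting from zero. D's built-in slice syntax a[lo .. hi] follows the same half-open convention. The sketch below is not from the thread and the values are arbitrary; it only shows the properties that make the convention convenient.

```d
unittest
{
    int[] a = [10, 20, 30, 40, 50];
    size_t lo = 1, hi = 4;

    // The number of elements is simply hi - lo, with no +1 correction.
    assert(a[lo .. hi].length == hi - lo);
    assert(a[lo .. hi] == [20, 30, 40]);

    // Adjacent ranges share a boundary value and tile the array exactly.
    assert((a[0 .. lo] ~ a[lo .. $]) == a);

    // An empty range needs no special case: lo == hi.
    assert(a[2 .. 2].length == 0);

    // The whole array is 0 .. length.
    assert(a[0 .. $].length == a.length);
}
```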
Aug 01 2015
"DLearner" <bmqazwsx123 gmail.com> writes:
On Saturday, 1 August 2015 at 17:55:06 UTC, John Colvin wrote:
 On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], 
 foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.
D is a C derivative, so it seems a shame not to identify causes of bugs in C and design them out in D.

For example, in C it is difficult to write non-trivial commercial programs without using pointers, and pointer manipulation has a terrible reputation for bugs. But in D it is easy to write commercial programs without using pointers. The problem has been designed away.

Similarly, off-by-one array bugs are commonplace in C. We should seek to eliminate the source of those bugs, which basically reduces to the issue that programmers find it unnatural to start a count at zero. Whether they _should_ find a zero start unnatural is irrelevant: as an observed fact they just do, so let's change the language so the issue is avoided (designed away).

Suggestion: if the codebase for D is considered so large that zero-basing cannot now be changed, why not extend the language to allow for array definitions like 'int[x:y] foo'? And then have a rule that 'int[:y] bar' defines a 1-based array of y elements?
Aug 01 2015
Andrej Mitrovic via Digitalmars-d-learn writes:
On 8/1/15, DLearner via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:
 D is a C derivative, so it seems a shame not to identify causes
 of bugs in C,
 and design them out in D.
This has already been done! D defines an array to be a struct with a pointer and a length. See this article: http://www.drdobbs.com/architecture-and-design/cs-biggest-mistake/228701625

I would argue that it's not "off-by-one" that causes most issues when dealing with C "arrays", but out-of-bounds accesses in general (whether off by one or off by 50), since you often don't have the length or could easily use the wrong variable as the length.

Think about how much D code would actually have subtle off-by-one errors if D didn't use 0-based indexing like the majority of popular languages do. Any time you interfaced with other languages, you would have to double- or triple-check all your uses of arrays.

FWIW, at the very beginning I also found it odd that languages use 0-based indexing, but that was before I had any significant programming experience under my belt. By now it's second nature to me to use 0-based indexing.
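
To illustrate the "pointer and a length" point, here is a small sketch (the function name total and the data are invented for this example). Because the length travels with the slice, the callee never needs a separate length argument that could be wrong.

```d
/// Hypothetical example: sums a slice without any separate length parameter.
int total(const(int)[] values)
{
    int sum = 0;
    foreach (v; values)   // iteration is bounded by values.length
        sum += v;
    return sum;
}

unittest
{
    int[] data = [1, 2, 3, 4];
    assert(data.length == 4);       // the length is part of the slice
    assert(*data.ptr == 1);         // and so is the pointer to the first element
    assert(total(data) == 10);

    auto tail = data[1 .. $];       // a sub-slice carries its own pointer and length
    assert(tail.ptr == data.ptr + 1);
    assert(tail.length == 3);
    assert(total(tail) == 9);
}
```

In builds where array bounds checking is enabled, indexing past the length of a slice is reported at run time instead of silently reading adjacent memory.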
Aug 01 2015
"bachmeier" <no spam.net> writes:
On Saturday, 1 August 2015 at 19:04:10 UTC, Andrej Mitrovic wrote:
 On 8/1/15, DLearner via Digitalmars-d-learn 
 <digitalmars-d-learn puremagic.com> wrote:
 D is a C derivative, so it seems a shame not to identify causes
 of bugs in C,
 and design them out in D.
But what type of programming are you doing? Even after decades of programming and trying out dozens of languages, zero-based indexing still gets me at times when the arrays I work with represent vectors and matrices. Especially when porting code from other languages that use one-based indexing. One of the nice things about D is that it gives you the tools to easily make the change if you want.
Aug 01 2015
"QAston" <qastonx gmail.com> writes:
On Saturday, 1 August 2015 at 23:02:51 UTC, bachmeier wrote:
 But what type of programming are you doing? Even after decades 
 of programming and trying out dozens of languages, zero-based 
 indexing still gets me at times when the arrays I work with 
 represent vectors and matrices. Especially when porting code 
 from other languages that use one-based indexing. One of the 
 nice things about D is that it gives you the tools to easily 
 make the change if you want.
Adding 1-indexed arrays to the language fixes nothing. Just write your 1-indexed array type and if you enjoy using it, publish it as a library. Who knows, if demand is high it may even end up in Phobos.
Aug 02 2015
"bachmeier" <no spam.com> writes:
On Sunday, 2 August 2015 at 21:58:48 UTC, QAston wrote:

 Adding 1-indexed arrays to the language fixes nothing. Just 
 write your 1-indexed array type and if you enjoy using it, 
 publish it as a library. Who knows, if demand is high it may 
 even end up in phobos.
Oh, I don't think that's a good idea. It's too confusing to have more than one method of indexing within the same language. You just have to do a thorough job of testing, as the possibility of errors is something you'll have to live with, given the different design choices of different languages.
Aug 03 2015
"DLearner" <bmqazwsx123 gmail.com> writes:
On Monday, 3 August 2015 at 13:45:01 UTC, bachmeier wrote:
 On Sunday, 2 August 2015 at 21:58:48 UTC, QAston wrote:

 Adding 1-indexed arrays to the language fixes nothing. Just 
 write your 1-indexed array type and if you enjoy using it, 
 publish it as a library. Who knows, if demand is high it may 
 even end up in phobos.
Looks like the 0 base is fixed, to avoid problems with existing code.

But nothing stops _adding_ to the language by allowing
int[x:y] foo to mean that the valid symbols are foo[x], foo[x+1],..., foo[y].
Plus a rule that int[:y] foo means that the valid symbols are foo[1], foo[2],..., foo[y].

That way, a 1 start is achieved, with no conflict with existing code?
Aug 03 2015
Jonathan M Davis via Digitalmars-d-learn writes:
On Monday, August 03, 2015 21:32:03 DLearner via Digitalmars-d-learn wrote:
 On Monday, 3 August 2015 at 13:45:01 UTC, bachmeier wrote:
 On Sunday, 2 August 2015 at 21:58:48 UTC, QAston wrote:

 Adding 1-indexed arrays to the language fixes nothing. Just
 write your 1-indexed array type and if you enjoy using it,
 publish it as a library. Who knows, if demand is high it may
 even end up in phobos.
Almost all programming languages in heavy use at this point in time start indexing at 0. It would be highly confusing to almost all programmers out there to have 1-based indexing. In addition, having 0-based indexing actually makes checking against the end of arrays and other random-access ranges easier. You can just check against length without having to do any math.

In general, I would expect 1-based indexing to _increase_ the number of off-by-one errors in code, both because 0-based indexing helps avoid such problems when dealing with the end of the array and, more importantly, because almost everyone expects 0-based indexing.

You're really barking up the wrong tree if you're trying to get any support for 1-based indexing in D. I doubt that you will see much of anyone who thinks that it's even vaguely a good idea, and there's no way that Walter or Andrei (or probably anyone in the main dev team) is going to agree that it's even worth considering.

I think that the reality of the matter is that if you're going to do much programming - especially if you're going to be a professional programmer - you just need to get used to the idea that array indices start at 0. There are a few languages out there where they don't, but they are far from the norm.

- Jonathan M Davis
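
The "check against length" remark can be made concrete with a short sketch. Both helper functions below are hypothetical and exist only to contrast the two conventions; D itself provides only the 0-based form.

```d
/// 0-based: valid indices form the half-open range 0 .. length.
bool validZeroBased(size_t i, size_t length)
{
    return i < length;              // a single comparison against length
}

/// Hypothetical 1-based scheme: valid indices are 1 through length inclusive.
bool validOneBased(size_t i, size_t length)
{
    return i >= 1 && i <= length;   // two comparisons, and <= instead of <
}

unittest
{
    int[] a = [10, 20, 30];

    // The usual 0-based loop compares the index directly with length.
    int sum = 0;
    for (size_t i = 0; i < a.length; ++i)
        sum += a[i];
    assert(sum == 60);

    // length itself is one past the end in the 0-based scheme ...
    assert(validZeroBased(0, a.length) && !validZeroBased(a.length, a.length));
    // ... but it is the last valid index in the 1-based scheme.
    assert(!validOneBased(0, a.length) && validOneBased(a.length, a.length));
}
```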
Aug 03 2015
"QAston" <qaston gmail.com> writes:
On Monday, 3 August 2015 at 21:32:05 UTC, DLearner wrote:
 Looks like 0-base is fixed, to avoid problems with existing 
 code.

 But nothing stops _adding_ to the language by allowing
 int[x:y] foo to mean valid symbols are foo[x], foo[x+1],..., 
 foo[y].
 Plus rule that int[:y] means valid symbols are foo[1], 
 foo[2],..., foo[y].

 That way, 1-start achieved, with no conflict with existing code?
There are quite a few things stopping this from being added to the language.

1. People will have to learn this new feature and its interaction with a gazillion other D features.
2. There would be a redundancy - the core language would have 2 array types, while one of them can be easily implemented using the other.
3. Devs will have to maintain it - as if they don't have enough things to fix atm.

Really, this is so simple to do as a library - just use opIndex and opSlice with a template struct. As a general rule, start asking for language features only when things can't be done without them.
Aug 04 2015
"jmh530" <john.michael.hall gmail.com> writes:
On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].

 I think the (very old) IBM PL/I language was like this.
I come from MATLAB and R, which start matrices at 1. I used to think it was more natural. However, after I started using NumPy, I now think a 0 index is better. Also see http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

Give it a try. You might find you like it.
Aug 03 2015
"Observer" <spurious.address yahoo.com> writes:
On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.

 Actually, maybe even better to allow array definitions of form
 int foo[x:y];
 (y >= x) creating integer variables foo[x], foo[x+1],...,foo[y].
This experiment has already been run. Perl used to support a $[ variable to set the array base. After experience with the confusion and problems that causes, it was finally deprecated and effectively removed from the language. See the end paragraphs of http://perldoc.perl.org/perlvar.html and also http://search.cpan.org/~wolfsage/perl/ext/arybase/arybase.pm for more info.
Aug 03 2015
"Marc Schütz" <schuetzm gmx.net> writes:
On Saturday, 1 August 2015 at 09:35:53 UTC, DLearner wrote:
 Does the D language set in stone that the first element of an 
 array _has_ to be index zero?
 Wouldn't starting array elements at one avoid the common 
 'off-by-one' logic error, it does
 seem more natural to begin a count at 1.
I, too, don't think this is a good idea in general, but I can see a few use-cases where 1-based indices may be more natural. It's easy to define a wrapper:

struct OneBasedArray(T) {
    T[] _payload;
    alias _payload this;   // forwards length, ~=, etc. to the wrapped array

    T opIndex(size_t index) {
        assert(index > 0);
        return _payload[index-1];
    }

    // D rewrites arr[i] = v as arr.opIndexAssign(v, i): value first, then index
    void opIndexAssign(U : T)(auto ref U value, size_t index) {
        assert(index > 0);
        _payload[index-1] = value;
    }
}

unittest {
    OneBasedArray!int arr;
    arr = [1,2,3];
    arr ~= 4;
    assert(arr.length == 4);
    assert(arr[1] == 1);
    assert(arr[2] == 2);
    assert(arr[3] == 3);
    assert(arr[4] == 4);
}

Test with:

rdmd -main -unittest xx.d

This can of course be easily extended to support other bases than one.
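
One possible way to support other bases is to make the base a template parameter. This is only a sketch under assumptions: the name BasedArray and its first/last members are invented here, there is no alias this, and the range check relies on assert, which is compiled out in release builds.

```d
/// Hypothetical wrapper: an array whose first valid index is `base`.
struct BasedArray(T, ptrdiff_t base = 1)
{
    T[] payload;

    /// Lowest valid index.
    enum ptrdiff_t first = base;

    /// Highest valid index (first - 1 when the array is empty).
    ptrdiff_t last() const
    {
        return base + cast(ptrdiff_t) payload.length - 1;
    }

    /// Returning by ref lets both `a[i]` and `a[i] = v` use this one overload.
    ref inout(T) opIndex(ptrdiff_t index) inout
    {
        assert(index >= first && index <= last, "index out of range");
        return payload[index - base];   // translate to the 0-based payload
    }
}

unittest
{
    auto a = BasedArray!int([10, 20, 30]);       // valid indices 1 .. 3
    assert(a.first == 1 && a.last == 3);
    assert(a[1] == 10 && a[3] == 30);
    a[2] = 25;
    assert(a.payload == [10, 25, 30]);

    auto b = BasedArray!(int, -2)(new int[5]);   // valid indices -2 .. 2
    b[-2] = 7;
    b[2] = 9;
    assert(b.payload[0] == 7 && b.payload[4] == 9);
    assert(b.last == 2);
}
```

As with the wrapper above, it can be tried with rdmd -main -unittest. Leaving out alias this is a deliberate choice in this sketch, so that the 0-based indexing of the payload cannot leak through by accident.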
Aug 04 2015