digitalmars.D - Array indexing and slicing

Lu�s Marques (22/22) Apr 03 2006 Hello,

Norbert Nemec (43/64) Apr 03 2006 ter found

Lu�s Marques (9/17) Apr 03 2006 Notice that I did say I still prefer $ to the verbose .length, not that ...

Lu�s Marques (5/6) Apr 03 2006 somearray[1..-1,1..-1], of course. You are right about having missed the...
Norbert Nemec (25/29) Apr 03 2006 Obviously we both agree that "$" is nicer than nothing (i.e. better than

Lu�s Marques (14/23) Apr 03 2006 Thanks, I noticed it later.

Derek Parnell (22/23) Apr 03 2006 One problem with 'length' is that is not a keyword in all contexts, and ...
Norbert Nemec (20/51) Apr 03 2006 Sorry, I was not very clear: With "extending" I meant doing something

Lu�s Marques (5/6) Apr 04 2006 Before asking some more questions, how could you please give me a workin...

Norbert Nemec (11/18) Apr 04 2006 No, because multidimensional arrays do not exist yet, but are still in

Don Clugston (2/20) Apr 05 2006

Hong (4/9) Apr 03 2006 You can do a[1..length]. length always refer to the array length when it...

Lu�s Marques (4/14) Apr 03 2006 Oh yes, I was forgetting that variant. That's nice, thanks.
Norbert Nemec (21/37) Apr 03 2006 The "length" variant will become problematic once we have

novice2 (2/2) Apr 04 2006 please, can anybody explain me, wich "performance penalty" mentioned in ...

Lu�s Marques (15/17) Apr 04 2006 When array bounds checking is disabled, all the CPU has to do is somethi...

novice2 (11/15) Apr 04 2006 it is not depends on syntax or index sign. this check must be performed ...

Lu�s Marques (13/15) Apr 04 2006 Actually I don't think that can be true. That would only be the case if ...

novice2 (9/14) Apr 05 2006 but programmer defines algo - sequence of some operations.

xs0 (32/50) Apr 05 2006 "old" syntax is actually

Georg Wrede (5/31) Apr 04 2006 Well, we have two separate cases. First, if the array size is known at

novice2 (3/7) Apr 05 2006 but "indexing from the end". if we express it in current syntax, then wi...

Lu�s Marques (2/5) Apr 05 2006 See xs0's reponse.

"Lu�s Marques" <luismarques+spam gmail.com> writes:

Hello,

In the array section of the D docs (http://www.digitalmars.com/d/arrays.html),
there is no mention of the $ operator. Only the "a = a[2 .. a.length]" idiom is
shown. I suppose that should be fixed.

My experience of array slicing comes from Python, so I found it a bit surprising
that a "array[2..]" notation didn't exist ("array[2:]" in Python). I later found
the $ operator in others' code, which mostly fixes that.

How about supporting, as in Python:

a[-1] == a[$];
a[0..] == a[0..$];

I personally dislike the $ operator, it seems a casual fix for a lack of better
notation and someting you'd expect to find in Perl. I still prefer it to the
more verbose "a[2 .. a.length]" notation, though. The verbose notation has the
disadvantage of taking more whitespace; while most people read "a[2..3]" just
fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a
different array you have two instances to correct, instead of one.

The Python syntax seems to me a better generalization of the concepts. It
doesn't require an extra operator, the indexes and the slice notation have all
the expressiveness required.

With omissive indexes (a[..7]) and wrapping indexes (a[-2]) the $ could go away.
Is it too late for that?

Lu�s Marques

Apr 03 2006

Norbert Nemec <Norbert Nemec-online.de> writes:

Lu=EDs Marques wrote:
 My experience of array slicing comes from Python, so I found it a bit s=

urprising
 that a "array[2..]" notation didn't exist ("array[2:]" in Python). I la=

ter found
 the $ operator in others' code, which mostly fixes that.

The omissive indices are not yet supported, but they could be added as
an abbreviation without much trouble. It is just one character ("0" or
"$" vs. "") difference. I don't have a strong opinion on that point.

 How about supporting, as in Python:
=20
 a[-1] =3D=3D a[$];

That should be:
a[-1] =3D=3D a[$-1]

 a[0..] =3D=3D a[0..$];
=20
 I personally dislike the $ operator, it seems a casual fix for a lack o=

f better
 notation and someting you'd expect to find in Perl.

I actually quite like the notation. At first, I didn't like the idea of
"wasting" the precious ASCII character "$" for this profane purpose, but
compared to all other proposed solutions, this is the cleanest.

Also, to me "$" does not seem too counterintuitive. The character has
the long tradition of signifying the end of a line in regular
expressions, which somehow feels related to the issue at hand, even
though I have never seriously worked with Perl.

 I still prefer it to the more verbose "a[2 .. a.length]" notation, thou=

gh.

That may be fine for "a", but it scales up very badly if you replace a
by a more complex expression.

The advantage of "$" will become even more obvious once there are is
better support for multidimensional arrays (yes, I'm still working on a
refined proposal for those) and the integer property "length" is
replaces by an integer array "shape":

consider adressing the interior (everything but the boundaries) of a 2D
array:
	somearray[1..somearray.shape[0]-1,1..somearray.shape[1]-1]
vs.
	somearray[1..$-1,1..$-1]

Or consider indexing via periodic boundary conditions on a 3D grid:

	mygrid[x%mygrid.shape[0],y%mygrid.shape[1],z%mygrid.shape[2]]
vs.
	mygrid[x%$,y%$,z%$]



 The verbose notation has the
 disadvantage of taking more whitespace; while most people read "a[2..3]=

" just
 fine it's a bit confusing to read "a[1..a.length]". Also, if you rename=

 or use a
 different array you have two instances to correct, instead of one.
=20
 The Python syntax seems to me a better generalization of the concepts. =

It
 doesn't require an extra operator, the indexes and the slice notation h=

ave all
 the expressiveness required.
=20
 With omissive indexes (a[..7]) and wrapping indexes (a[-2]) the $ could=

 go away.

These would be a bad replacement:
* omissive indexes are a nice abbreviation but have far less power than
the "$" symbol
* negative indices would have to be checked for at run time giving a
performace penalty that is unacceptable if D tries to aim for a position
in the league of high-performance computing languages.

Apr 03 2006

"Lu�s Marques" <luismarques+spam gmail.com> writes:

In article <e0rrgk$1gko$1 digitaldaemon.com>, Norbert Nemec says...
That may be fine for "a", but it scales up very badly if you replace a
by a more complex expression.

Notice that I did say I still prefer $ to the verbose .length, not that I prefer
using .length
I just find it ugly.

	somearray[1..$-1,1..$-1]

or somearray[1..-2,1..-2]

* omissive indexes are a nice abbreviation but have far less power than
the "$" symbol

Which cases are you thinking about?

* negative indices would have to be checked for at run time giving a
performace penalty that is unacceptable if D tries to aim for a position
in the league of high-performance computing languages.

Kind of like array out of bounds exceptions, which is already implemented?

Thanks for you feedback,

Lu�s

Apr 03 2006

"Lu�s Marques" <luismarques+spam gmail.com> writes:

In article <e0rupf$1kbg$1 digitaldaemon.com>, Lu�s Marques says...
or somearray[1..-2,1..-2]

somearray[1..-1,1..-1], of course. You are right about having missed the right
indexes, but otherwise my 
opinion remain. I guess it's a matter of style, mostly.

Lu�s

Apr 03 2006

Norbert Nemec <Norbert Nemec-online.de> writes:

Obviously we both agree that "$" is nicer than nothing (i.e. better than
having to use ".length")

I also agree with you that Python syntax looks nicer than D syntax. However

a) Python syntax is not an option for D, because it would require
runtime checks that result in an unnecessary performace penalty. This
cannot be compared to array bounds checks, which can be turned off after
debugging phase. The performance penalty would have to be payed by
everyone using arrays, even if they are not even interested in negative
indexing

b) The $ is even more powerful than Python-syntax in at least one very
interesting way:
	somearray[x%$,y%$,z%$]
is the most beautiful solution I ever saw for the common problem of
periodic boundaries!

B.t.w.: Be careful!

D syntax: somearray[1..$-1,1..$-1]
is equivalent to
Python syntax: somearray[1..-1,1..-1]

you mixed this up in your examples bringing in a -2 somewhere.


Finally you asked:
* omissive indexes are a nice abbreviation but have far less power than
the "$" symbol

 
 Which cases are you thinking about?

I call "omissive indices" only "[.." instead of "[0.." and "..]" instead
of "..$]". Obviously these cover only the trivial corner cases and the
syntax cannot naturally be extended to anything beyond these.

Greetings,
Norbert

Apr 03 2006

"Lu�s Marques" <luismarques+spam gmail.com> writes:

In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...
b) The $ is even more powerful than Python-syntax in at least one very
interesting way:
	somearray[x%$,y%$,z%$]
is the most beautiful solution I ever saw for the common problem of
periodic boundaries!

Yes, that one is very nice.

you mixed this up in your examples bringing in a -2 somewhere.

Thanks, I noticed it later.

I call "omissive indices" only "[.." instead of "[0.." and "..]" instead
of "..$]". Obviously these cover only the trivial corner cases and the
syntax cannot naturally be extended to anything beyond these.

How would you extend it with $?

That's a good review. Things like this should be noted somewhere..

So,
$ is better designed for multidimensional arrays
$ is shorter than .length

Still thinking, then, couldn't this work:

somearray[x%length,y%length,z%length]

length would evaluate differently for each dimension of the array.
The only advantage would be $ being shorter (somearray[0..] would work for the
simpler cases)

Lu�s Marques

Apr 03 2006

Derek Parnell <derek psych.ward> writes:

On Mon, 3 Apr 2006 22:47:37 +0000 (UTC), Lu�s Marques wrote:

 $ is shorter than .length

One problem with 'length' is that is not a keyword in all contexts, and so
an innocent coding mistake such as the one below can occur and is
manifested at run time. And even then its not so easy to see if you're not
focusing on 'length'.

import std.stdio;
void main()
{
    int length;
    char[] foo;

    foo = "derek parnell".dup;
    length = 4;
    while( foo[length] != 'a' )
        foo = foo[1..$];

    writefln("%s", foo);
}

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
4/04/2006 11:40:09 AM

Apr 03 2006

Norbert Nemec <Norbert Nemec-online.de> writes:

Lu=EDs Marques wrote:
 In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...
=20
b) The $ is even more powerful than Python-syntax in at least one very
interesting way:
	somearray[x%$,y%$,z%$]
is the most beautiful solution I ever saw for the common problem of
periodic boundaries!

=20
=20
 Yes, that one is very nice.
=20
=20
you mixed this up in your examples bringing in a -2 somewhere.

=20
=20
 Thanks, I noticed it later.
=20
=20
I call "omissive indices" only "[.." instead of "[0.." and "..]" instea=


d
of "..$]". Obviously these cover only the trivial corner cases and the
syntax cannot naturally be extended to anything beyond these.

=20
=20
 How would you extend it with $?

Sorry, I was not very clear: With "extending" I meant doing something
like "..$-3]". What I wanted to say is: an omissive end-index "..]"
would be a nice shorthand for "..$]", but omissive indices are not a
real alternative to the discussed alternatives that allow arithmetics.

 Still thinking, then, couldn't this work:
=20
 somearray[x%length,y%length,z%length]
=20
 length would evaluate differently for each dimension of the array.
 The only advantage would be $ being shorter (somearray[0..] would work =

for the
 simpler cases)

Putting Dereks answer in different words: The parser is built upon the
assumption that every "word" is either a "keyword" or an "identifier" at
all times during the compilation process

Making "length" a keyword would, in principle, be a solution, because
one could in principle pack in any special meaning that is needed.
However: "length" could then not be a property any more and could not be
used as variable name or anything else.

Leaving "length" an identifier, one would have to find a way to give it
the special meaning it needs. In any case, this would be problematic,
because it might always override a variable "length" that is defined in
the surrounding namespace.

The "$" solution is very much like the "keyword" solution from the
compiler perspective, but it does not collide with any existing code.

Apr 03 2006

"Lu�s Marques" <luismarques+spam gmail.com> writes:

In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...
	somearray[x%$,y%$,z%$]

Before asking some more questions, how could you please give me a working
example for this?

Thanks,
Lu�s Marques

Apr 04 2006

Norbert Nemec <Norbert Nemec-online.de> writes:

Lu=EDs Marques wrote:
 In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...
=20
	somearray[x%$,y%$,z%$]

=20
=20
 Before asking some more questions, how could you please give me a worki=

ng
 example for this?

No, because multidimensional arrays do not exist yet, but are still in
the planning phase.

I've been spending some hours working on reworking the proposal lately,
but I have nothing new finished yet.

The 2-years-old version of the proposal can still be found at:
	http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.h=
tml

(It is quite outdated in the details, but should at least give an idea
of where it may be going.)

Apr 04 2006

Don Clugston <dac nospam.com.au> writes:

Norbert Nemec wrote:
 Lu�s Marques wrote:
 In article <e0s6a7$1sbs$1 digitaldaemon.com>, Norbert Nemec says...

 	somearray[x%$,y%$,z%$]

 Before asking some more questions, how could you please give me a working
 example for this?

 
 No, because multidimensional arrays do not exist yet, but are still in
 the planning phase.
 
 I've been spending some hours working on reworking the proposal lately,
 but I have nothing new finished yet.

I'm looking forward to seeing it. Could really be a killer feature.

 The 2-years-old version of the proposal can still be found at:
 	http://homepages.uni-regensburg.de/~nen10015/documents/D-multidimarray.html
 
 (It is quite outdated in the details, but should at least give an idea
 of where it may be going.)

Apr 05 2006

Hong <Hong_member pathlink.com> writes:

In article <e0rmga$1bi6$1 digitaldaemon.com>, Lu�s Marques says...

notation and someting you'd expect to find in Perl. I still prefer it to the
more verbose "a[2 .. a.length]" notation, though. The verbose notation has the
disadvantage of taking more whitespace; while most people read "a[2..3]" just
fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a
different array you have two instances to correct, instead of one.

You can do a[1..length]. length always refer to the array length when it is used
inside the indexing operator. So there is only 1 place to correct when you
change the array name. length is more verbose but definitely more readable.

Apr 03 2006

"Lu�s Marques" <luismarques+spam gmail.com> writes:

In article <e0s0m0$1mp9$1 digitaldaemon.com>, Hong says...
In article <e0rmga$1bi6$1 digitaldaemon.com>, Lu�s Marques says...

notation and someting you'd expect to find in Perl. I still prefer it to the
more verbose "a[2 .. a.length]" notation, though. The verbose notation has the
disadvantage of taking more whitespace; while most people read "a[2..3]" just
fine it's a bit confusing to read "a[1..a.length]". Also, if you rename or use a
different array you have two instances to correct, instead of one.

You can do a[1..length]. length always refer to the array length when it is used
inside the indexing operator. So there is only 1 place to correct when you
change the array name. length is more verbose but definitely more readable.

Oh yes, I was forgetting that variant. That's nice, thanks.
Perhaps one more reason against having $, though I guess several will disagree.

Lu�s

Apr 03 2006

Norbert Nemec <Norbert Nemec-online.de> writes:

The "length" variant will become problematic once we have
multidimensional arrays, where the integer number "length" is replaced
by an integer array "shape". "$" can be given a very special meaning by
the compiler without problems for the parser, allowing

	somearray[1..$-1,1..$-1,1..$-1]

to mean the same thing as

	somearray[1..shape[0]-1,1..shape[1]-1,1..shape[2]-1]

to get the same effect with using "length" instead of "$", the word
"length" would probably have to become a keyword, which would disallow
its use as property. (Unless you really want to mess up the parser...)

Greetings,
Norbert



Hong wrote:
 In article <e0rmga$1bi6$1 digitaldaemon.com>, Lu=E7=96=84 Marques says.=

=2E.
=20
=20
notation and someting you'd expect to find in Perl. I still prefer it t=


o the
more verbose "a[2 .. a.length]" notation, though. The verbose notation =


has the
disadvantage of taking more whitespace; while most people read "a[2..3]=


" just
fine it's a bit confusing to read "a[1..a.length]". Also, if you rename=


 or use a
different array you have two instances to correct, instead of one.

=20
=20
 You can do a[1..length]. length always refer to the array length when i=

t is used
 inside the indexing operator. So there is only 1 place to correct when =

you
 change the array name. length is more verbose but definitely more reada=

ble.
=20
=20

Apr 03 2006

novice2 <novice2_member pathlink.com> writes:

please, can anybody explain me, wich "performance penalty" mentioned in this
thread? how it come?

Apr 04 2006

"Lu�s Marques" <luismarque+spam gmail.com> writes:

In article <e0tdh2$bsn$1 digitaldaemon.com>, novice2 says...
please, can anybody explain me, wich "performance penalty" mentioned in this
thread? how it come?

When array bounds checking is disabled, all the CPU has to do is something like:

value = *(&array_contents + (index * element size));

The index is just added as an offset (times the element size, which can be
really fast if it's a multiple of 2). When you add bounds checking or negative
indexes, other operations must be performed. For bounds checking a conditional
jump must be issued (conditional jumps can be expensive for performance -- on
inner loops -- plus they add footprint). For negative indexes the corresponding
positive offset must be calculated before accessing the array index. Under some
circunstances the compiler could optimize that, when it would be possible to do
the calculation at compilation or to assume the index was a unsigned integer. I
suppose there's no general arithmetic algorithm for all cases (without
conditional jumps).

I guess that should be it, but others should know more about such matters.

Lu�s Marques

Apr 04 2006

novice2 <novice2_member pathlink.com> writes:

Thanx, Lu�s Marques for reply, but...

When you add bounds checking or negative
indexes, other operations must be performed. For bounds checking a conditional

it is not depends on syntax or index sign. this check must be performed (if
enabled), either positive or negative index.
Even more: we all know, source code and compiler output is not the same. and
source code with negative indexes or other index syntax will be compiled into
the same cpu operations, if it means the same.

inner loops -- plus they add footprint). For negative indexes the corresponding
positive offset must be calculated before accessing the array index. Under some

this is not true. when we use other syntax for some operations the compiler
output will be the same. even if we replace some several operations with one
syntax construction compiler can generate more optimized code.

i just want to say, IMHO, all changes in index/slicing syntax will change
compiler only. not generated binary.

Apr 04 2006

"Lu�s Marques" <luismarques+spam gmail.com> writes:

In article <e0tvba$1aav$1 digitaldaemon.com>, novice2 says...
i just want to say, IMHO, all changes in index/slicing syntax will change
compiler only. not generated binary.

Actually I don't think that can be true. That would only be the case if the
indexes were defined at compilation time. Assuming positive indexes only, the
following code...

int index = get_int_from_user();
value = array[index];

could be translated into:

value = *(&array_contents + (index * element size));

If we allow negative indexes then that doesn't work. "index" is computed at
runtime so the code generated must change to allow that. At least that's the way
I see it.

About bounds checking, I didn't mean to imply it depends on the syntax.

Lu�s Marques

Apr 04 2006

novice2 <novice2_member pathlink.com> writes:

i just want to say, IMHO, all changes in index/slicing syntax will change
compiler only. not generated binary.

Actually I don't think that can be true. That would only be the case if the
indexes were defined at compilation time. Assuming positive indexes only, the
following code...

but programmer defines algo - sequence of some operations.
we should compare equal code for currnet syntax and new syntax.
equal code - with equal results.

just imaginary example:

task: get 3rd item from end in array;
new synatx: b = a[-3];
old syntax: b = a[length-3];

only now we can compare, wich actions will be performed in runtime,
and guess, will be it different or not.

Apr 05 2006

xs0 <xs0 xs0.com> writes:

novice2 wrote:
 i just want to say, IMHO, all changes in index/slicing syntax will change
 compiler only. not generated binary.

 Actually I don't think that can be true. That would only be the case if the
 indexes were defined at compilation time. Assuming positive indexes only, the
 following code...

 
 but programmer defines algo - sequence of some operations.
 we should compare equal code for currnet syntax and new syntax.
 equal code - with equal results.
 
 just imaginary example:
 
 task: get 3rd item from end in array;
 new synatx: b = a[-3];
 old syntax: b = a[length-3];
 
 only now we can compare, wich actions will be performed in runtime,
 and guess, will be it different or not.

"old" syntax is actually

b = a[$-3];

But that is not a good example, because the index is a constant. A 
better example would be:

int idx=someFunc();
b=a[idx];

It currently compiles to something like

int idx=someFunc();
debug {
    if (idx<0 || idx>=a.length)
       throw ...;
}
b=a[idx];

If you use -release, the debug {} section is eliminated and there is no 
cost in speed. On the other hand, with the new version, the result would 
be something like

int idx=someFunc();
if (idx<0) {
     idx+=a.length;
}
debug {
     if (idx<0 || idx>=a.length)
         throw ...;
}
b=a[idx];

The problem is that in the general case, the compiler cannot determine 
whether idx will be negative or not, and cannot optimize away the first 
if() statement, meaning every array access will be notably slower, even 
if you don't use negative indices at all.

Is that inefficiency worth the change? I don't think it is...


xs0

Apr 05 2006

Georg Wrede <georg.wrede nospam.org> writes:

novice2 wrote:
 Thanx, Lu�s Marques for reply, but...
 
 
 When you add bounds checking or negative indexes, other operations
 must be performed. For bounds checking a conditional

 
 
 it is not depends on syntax or index sign. this check must be
 performed (if enabled), either positive or negative index. Even more:
 we all know, source code and compiler output is not the same. and 
 source code with negative indexes or other index syntax will be
 compiled into the same cpu operations, if it means the same.
 
 
 inner loops -- plus they add footprint). For negative indexes the
 corresponding positive offset must be calculated before accessing
 the array index. Under some

 
 
 this is not true. when we use other syntax for some operations the
 compiler output will be the same. even if we replace some several
 operations with one syntax construction compiler can generate more
 optimized code.
 
 i just want to say, IMHO, all changes in index/slicing syntax will
 change compiler only. not generated binary.

Well, we have two separate cases. First, if the array size is known at 
compile time, then all of this can be take care of without runtime 
penalty. Second, if we are talking about dynamic arrays, then it does 
make a difference whether we allow indexing from the end of the array.

Apr 04 2006

novice2 <novice2_member pathlink.com> writes:

 i just want to say, IMHO, all changes in index/slicing syntax will
 change compiler only. not generated binary.

penalty. Second, if we are talking about dynamic arrays, then it does 
make a difference whether we allow indexing from the end of the array.

but "indexing from the end". if we express it in current syntax, then will no
difference. we should compare equivalent code. at least codes with equivalent
results of it execution.

Apr 05 2006

"Lu�s Marques" <luismarques+spam gmail.com> writes:

In article <e106tf$1l5v$1 digitaldaemon.com>, novice2 says...
but "indexing from the end". if we express it in current syntax, then will no
difference. we should compare equivalent code. at least codes with equivalent
results of it execution.

See xs0's reponse.

Apr 05 2006

D Programming

C/C++ Programming

Other

digitalmars.D - Array indexing and slicing