digitalmars.D - Smart slicing

bearophile (72/84) Mar 20 2008 'c'

Trevor Parscal (4/111) Mar 20 2008 Allowing negetive indexes seems like a nice feature - but in fact acts t...

bearophile (4/5) Mar 20 2008 In practice it rarely produces bugs, but note that my whole post was not...

BCS (2/14) Mar 20 2008

renoX (5/24) Mar 20 2008 Sure but it isn't the same semantic for the positive overflow as was

Trevor Parscal (4/11) Mar 20 2008 Allowing out-of-bounds indexes hides bugs. An out of bounds index should...

bearophile (5/7) Mar 20 2008 In my post I am talking about using an alternative (standard) syntax fur...

Neal Becker (6/22) Mar 23 2008 I use my own vector/matix code in python. I made sure that out-of-bound...
Jesse Phillips (9/16) Mar 30 2008 He is not referring to when it is the programmers intent to do it. It is...

bearophile (5/9) Mar 30 2008 I was talking about _slicing_, not about array indexing (that is 1..5 in...

Jesse Phillips (10/27) Mar 30 2008 ah, yes, my mistake. I forgot that I was reading old articles. In any

bearophile (8/15) Mar 31 2008 In C/C++ you often put bugs like that in the code because:

Bill Baxter (13/24) Mar 20 2008 I had no idea you could do that in Python. I'd much rather get an error...

Neal Becker (5/42) Mar 22 2008 Please no!

bearophile (5/8) Mar 22 2008 I see. But I don't see them changing this feature, even if they now can ...

Janice Caron (12/13) Mar 20 2008 or like this:

bearophile <bearophileHUGS lycos.com> writes:

Hello, I have discussed this topic a bit on the IRC #D channel too.

Array slicing is a very useful feature of D. In Python the array (list) slicing
is even more powerful, because array indexes can be negative (they wrap around,
so -1 equals to $-1 in D, but I like the D solution too, it's faster too), and
it has a third optional parameter (stride) that can be a negative number too:

 "abcdefghilmn"[2]



'c'
 "abcdefghilmn"[2:11]



'cdefghilm'
 "abcdefghilmn"[2:]



'cdefghilmn'
 "abcdefghilmn"[:2]



'ab'
 "abcdefghilmn"[-1]



'n'
 "abcdefghilmn"[-2]



'm'
 "abcdefghilmn"[2:-2]



'cdefghil'
 "abcdefghilmn"[11:2]



''
 "abcdefghilmn"[11:2:-1]



'nmlihgfed'
 "abcdefghilmn"[11:2:-2]



'nlhfd'
 "abcdefghilmn"[2:30]



'cdefghilmn'
 "abcdefghilmn"[30:50]



''

(I presume D has used .. instead of : because : is used by AAs, and {} can't be
used to denote AAs because it's used already in the C syntax).
I don't miss all those features in D (D arrays have .reverse that replaces the
common case of stride=-1), but I miss the feature you can see in the last two
examples, that is the overflowing index is put back to the actual max-min index
bound of the array.

A bare-bone semantic like that can be implemented in D like this (no negative
indexes, no optional stride. But it's easy to add a third overloaded function
for the stride too):

T[] slice(T)(T[] array, int start) {
    int end = array.length;
    if (end <= start)
        return new T[0];
    if (start < 0)
        start = 0;
    return array[start .. end];
}

T[] slice(T)(T[] array, int start, int end) {
    if (end <= start)
        return new T[0];
    if (start < 0)
        start = 0;
    if (end > array.length)
        end = array.length;
    return array[start .. end];
}

later you can write:
"abcdefghilmn".slice(2,30)


Some questions:
1) What's the purpose of this smart slicing in the core of the language?
It avoids errors, and speeds up programming a little because you need less care
in slicing bounds.
In Python and Ruby you use such feature often.
You can use it to take "50 characters or less" from a string (less if the
string is shorter than 50), or to safely produce empty slices in some
situations. I know you can do the same with D, for example:

somestring[0 .. $ < 50 ? $ : 50]
Or:
somestring[0 .. min(50, $)]

But it's less immediate, expecially when the 50 is a variable, so you need:

somestring[0 .. min(max(x, 0), $))]

Another small example, replace/create heads of sub-arrays with a given value
(not in-place):


import d.func: list, putr, map;

T[][] replace_heads1(T)(T[][] seq, T x) { // Error: ArrayBoundsError
    T[] sub;
    return list([x]~sub[1..$], sub, seq);
}

T[][] replace_heads2(T)(T[][] seq, T x) {
    T[] sub;
    return list(sub.length ? [x]~sub[1..$] : [x], sub, seq);
}

T[][] replace_heads3(T)(T[][] seq, T x) { // smart slicing
    T[] sub;
    return list([x]~sub.slice(1), sub, seq);
}

void main() {
    int[][] l = [[1, 2, 3], [4, 5], [6], []];
    //putr(replace_heads1(l, 0));
    putr(replace_heads2(l, 0)); // Prints: [[0, 2, 3], [0, 5], [0], [0]]
    putr(replace_heads2(l, 0)); // Prints: [[0, 2, 3], [0, 5], [0], [0]]
}

(list() is like a Python list comphrension)

2) 'Smart slicing' can't be used on poiter-based arrays (C ones), because their
length isn't known at runtime (but it can be done at compile time on static
arrays). This just means you can't rely on smart slicing on such lower-level
arrays.

3) Such smart slicing is slower, and I think at compile-time the compiler can't
optimize it much. D is designed to be quite faster than Python/Ruby, so such
slowdown may be too much.
C++ gives at() and [] to index a a vector, in a safe and unsafe way, so for D
it may be invented a different syntax for safe slicing. I think those slice()
functions may be enough (and you have to import them from the std lib), when
the language will allow to use $ in user functions too. In the meantime the $
hask may be enough.

Bye,
bearophile

Mar 20 2008

Trevor Parscal <trevorparscal hotmail.com> writes:

bearophile Wrote:

 Hello, I have discussed this topic a bit on the IRC #D channel too.
 
 Array slicing is a very useful feature of D. In Python the array (list)
slicing is even more powerful, because array indexes can be negative (they wrap
around, so -1 equals to $-1 in D, but I like the D solution too, it's faster
too), and it has a third optional parameter (stride) that can be a negative
number too:
 
 "abcdefghilmn"[2]



 'c'
 "abcdefghilmn"[2:11]



 'cdefghilm'
 "abcdefghilmn"[2:]



 'cdefghilmn'
 "abcdefghilmn"[:2]



 'ab'
 "abcdefghilmn"[-1]



 'n'
 "abcdefghilmn"[-2]



 'm'
 "abcdefghilmn"[2:-2]



 'cdefghil'
 "abcdefghilmn"[11:2]



 ''
 "abcdefghilmn"[11:2:-1]



 'nmlihgfed'
 "abcdefghilmn"[11:2:-2]



 'nlhfd'
 "abcdefghilmn"[2:30]



 'cdefghilmn'
 "abcdefghilmn"[30:50]



 ''
 
 (I presume D has used .. instead of : because : is used by AAs, and {} can't
be used to denote AAs because it's used already in the C syntax).
 I don't miss all those features in D (D arrays have .reverse that replaces the
common case of stride=-1), but I miss the feature you can see in the last two
examples, that is the overflowing index is put back to the actual max-min index
bound of the array.
 
 A bare-bone semantic like that can be implemented in D like this (no negative
indexes, no optional stride. But it's easy to add a third overloaded function
for the stride too):
 
 T[] slice(T)(T[] array, int start) {
     int end = array.length;
     if (end <= start)
         return new T[0];
     if (start < 0)
         start = 0;
     return array[start .. end];
 }
 
 T[] slice(T)(T[] array, int start, int end) {
     if (end <= start)
         return new T[0];
     if (start < 0)
         start = 0;
     if (end > array.length)
         end = array.length;
     return array[start .. end];
 }
 
 later you can write:
 "abcdefghilmn".slice(2,30)
 
 
 Some questions:
 1) What's the purpose of this smart slicing in the core of the language?
 It avoids errors, and speeds up programming a little because you need less
care in slicing bounds.
 In Python and Ruby you use such feature often.
 You can use it to take "50 characters or less" from a string (less if the
string is shorter than 50), or to safely produce empty slices in some
situations. I know you can do the same with D, for example:
 
 somestring[0 .. $ < 50 ? $ : 50]
 Or:
 somestring[0 .. min(50, $)]
 
 But it's less immediate, expecially when the 50 is a variable, so you need:
 
 somestring[0 .. min(max(x, 0), $))]
 
 Another small example, replace/create heads of sub-arrays with a given value
(not in-place):
 
 
 import d.func: list, putr, map;
 
 T[][] replace_heads1(T)(T[][] seq, T x) { // Error: ArrayBoundsError
     T[] sub;
     return list([x]~sub[1..$], sub, seq);
 }
 
 T[][] replace_heads2(T)(T[][] seq, T x) {
     T[] sub;
     return list(sub.length ? [x]~sub[1..$] : [x], sub, seq);
 }
 
 T[][] replace_heads3(T)(T[][] seq, T x) { // smart slicing
     T[] sub;
     return list([x]~sub.slice(1), sub, seq);
 }
 
 void main() {
     int[][] l = [[1, 2, 3], [4, 5], [6], []];
     //putr(replace_heads1(l, 0));
     putr(replace_heads2(l, 0)); // Prints: [[0, 2, 3], [0, 5], [0], [0]]
     putr(replace_heads2(l, 0)); // Prints: [[0, 2, 3], [0, 5], [0], [0]]
 }
 
 (list() is like a Python list comphrension)
 
 2) 'Smart slicing' can't be used on poiter-based arrays (C ones), because
their length isn't known at runtime (but it can be done at compile time on
static arrays). This just means you can't rely on smart slicing on such
lower-level arrays.
 
 3) Such smart slicing is slower, and I think at compile-time the compiler
can't optimize it much. D is designed to be quite faster than Python/Ruby, so
such slowdown may be too much.
 C++ gives at() and [] to index a a vector, in a safe and unsafe way, so for D
it may be invented a different syntax for safe slicing. I think those slice()
functions may be enough (and you have to import them from the std lib), when
the language will allow to use $ in user functions too. In the meantime the $
hask may be enough.
 
 Bye,
 bearophile

Allowing negetive indexes seems like a nice feature - but in fact acts to hide
bugs in software, which is something that D makes efforts not to do. 

In other words - a negative index should be wrapped by you, not the compiler -
and honestly, it's not hard to say (i % arr.length) in those very seldom cases
in which wrapping the index of an array is needed.

- Trevor

Mar 20 2008

bearophile <bearophileHUGS lycos.com> writes:

Trevor Parscal:
 Allowing negetive indexes seems like a nice feature - but in fact acts to hide
bugs in software, which is something that D makes efforts not to do.

In practice it rarely produces bugs, but note that my whole post was not about
negative indexes, it was about out-of-bound arrays (positive) indexes.

Bye,
bearophile

Mar 20 2008

BCS <ao pathlink.com> writes:

Reply to bearophile,

 Trevor Parscal:
 
 Allowing negetive indexes seems like a nice feature - but in fact
 acts to hide bugs in software, which is something that D makes
 efforts not to do.
 

 In practice it rarely produces bugs, but note that my whole post was
 not about negative indexes, it was about out-of-bound arrays
 (positive) indexes.
 

BTW: this "array[i%$];" does both.

 Bye,
 bearophile

Mar 20 2008

renoX <renosky free.fr> writes:

BCS a �crit :
 Reply to bearophile,
 
 Trevor Parscal:

 Allowing negetive indexes seems like a nice feature - but in fact
 acts to hide bugs in software, which is something that D makes
 efforts not to do.

 In practice it rarely produces bugs, but note that my whole post was
 not about negative indexes, it was about out-of-bound arrays
 (positive) indexes.

 
 BTW: this "array[i%$];" does both.

Sure but it isn't the same semantic for the positive overflow as was 
exposed by bearophile: 'saturation' vs 'modulo' arithmetic.

Bye,
renoX

 
 Bye,
 bearophile

Mar 20 2008

Trevor Parscal <trevorparscal hotmail.com> writes:

bearophile Wrote:

 Trevor Parscal:
 Allowing negetive indexes seems like a nice feature - but in fact acts to hide
bugs in software, which is something that D makes efforts not to do.

 
 In practice it rarely produces bugs, but note that my whole post was not about
negative indexes, it was about out-of-bound arrays (positive) indexes.
 
 Bye,
 bearophile

Allowing out-of-bounds indexes hides bugs. An out of bounds index should always
be an error which is considered exceptional.

If you handle the index wrapping yourself, it's obvious to everyone that in
that particular case that would be an OK thing to do - in all other cases, it
could potentially be an error..

- Trevor

Mar 20 2008

bearophile <bearophileHUGS lycos.com> writes:

Trevor Parscal Wrote:
 Allowing out-of-bounds indexes hides bugs. An out of bounds index should
always be an error which is considered exceptional.

I (like all all Ruby and Python programmers) use such things all the time, and
only once in a while it actually produces a bug in my programs, so I presume
you are wrong.


 If you handle the index wrapping yourself, it's obvious to everyone that in
that particular case that would be an OK thing to do - in all other cases, it
could potentially be an error..

In my post I am talking about using an alternative (standard) syntax fur such
usages (to keep speed up, not to remove bugs).

Bye,
bearophile

Mar 20 2008

Neal Becker <ndbecker2 gmail.com> writes:

bearophile wrote:

 Trevor Parscal Wrote:
 Allowing out-of-bounds indexes hides bugs. An out of bounds index should
 always be an error which is considered exceptional.

 
 I (like all all Ruby and Python programmers) use such things all the time,
 and only once in a while it actually produces a bug in my programs, so I
 presume you are wrong.
 
 
 If you handle the index wrapping yourself, it's obvious to everyone that
 in that particular case that would be an OK thing to do - in all other
 cases, it could potentially be an error..

 
 In my post I am talking about using an alternative (standard) syntax fur
 such usages (to keep speed up, not to remove bugs).
 

I use my own vector/matix code in python.  I made sure that out-of-bounds
indexes are not silently ignored.  This is a very valuable feature.

y[a:b:c] = z[d:e]

I rely on the error checking here to tell me when I mis-calculated.  It has
found many errors for me.

Mar 23 2008

Jesse Phillips <jessekphillips gmail.com> writes:

On Thu, 20 Mar 2008 20:48:34 -0400, bearophile wrote:

 Trevor Parscal Wrote:
 Allowing out-of-bounds indexes hides bugs. An out of bounds index
 should always be an error which is considered exceptional.

 
 I (like all all Ruby and Python programmers) use such things all the
 time, and only once in a while it actually produces a bug in my
 programs, so I presume you are wrong.

He is not referring to when it is the programmers intent to do it. It is 
the times when an index has gone out of bounds by accident.

auto str = "Hello";
auto i = 0;

while(true) {
     writefln(str[i++]);
}

ooops, I forgot an exit condition.

Mar 30 2008

bearophile <bearophileHUGS lycos.com> writes:

Jesse Phillips:
 He is not referring to when it is the programmers intent to do it. It is 
 the times when an index has gone out of bounds by accident.
 while(true) {
      writefln(str[i++]);

I was talking about _slicing_, not about array indexing (that is 1..5 instead
of 5), so you are talking about something different. For a single index I agree
that's an error (and that's a runtime error in Python too).

In the meantime nearly everyone has said this is a bad feature to add to D, so
I have stopped advocating for it :-) There are quite more important things I
like to think/talk about.

Bye,
bearophile

Mar 30 2008

Jesse Phillips <jessekphillips gmail.com> writes:

On Sun, 30 Mar 2008 19:44:57 -0400, bearophile wrote:

 Jesse Phillips:
 He is not referring to when it is the programmers intent to do it. It
 is the times when an index has gone out of bounds by accident.
 while(true) {
      writefln(str[i++]);

 
 I was talking about _slicing_, not about array indexing (that is 1..5
 instead of 5), so you are talking about something different. For a
 single index I agree that's an error (and that's a runtime error in
 Python too).
 
 In the meantime nearly everyone has said this is a bad feature to add to
 D, so I have stopped advocating for it :-) There are quite more
 important things I like to think/talk about.
 
 Bye,
 bearophile

ah, yes, my mistake. I forgot that I was reading old articles. In any 
case I will explain what I was aiming for, and I did mean to use slicing.

while(true) {
    auto subStr = str[0..i++];
    //Do stuff with subStr
}

I just wanted to point out that if a variable is used an the programmer 
wishes to stop the said "loop" when you have reached the end but forgot 
to check, you have a working program that false to work as expected.

Mar 30 2008

bearophile <bearophileHUGS lycos.com> writes:

Jesse Phillips:
 while(true) {
     auto subStr = str[0..i++];
     //Do stuff with subStr
 }
 I just wanted to point out that if a variable is used an the programmer 
 wishes to stop the said "loop" when you have reached the end but forgot 
 to check, you have a working program that false to work as expected.

In C/C++ you often put bugs like that in the code because:
- you are using too much low level abstractions: higher level functions (like
map, filter, etc, and list comp/generators too) allow you to iterate and
process collections avoiding to manage indexes by yourself.
- the syntax is so cluttered that it doesn't let you see the bugs well;
- You don't have an interactive shell to try every little snippet of code;
- (and there are other causes too).

Bye,
bearophile

Mar 31 2008

Bill Baxter <dnewsgroup billbaxter.com> writes:

bearophile wrote:
 Hello, I have discussed this topic a bit on the IRC #D channel too.
 
 Array slicing is a very useful feature of D. In Python the array (list)
slicing is even more powerful, because array indexes can be negative (they wrap
around, so -1 equals to $-1 in D, but I like the D solution too, it's faster
too), and it has a third optional parameter (stride) that can be a negative
number too:
 

 "abcdefghilmn"[2:30]



 'cdefghilmn'
 "abcdefghilmn"[30:50]



 ''
 
 (I presume D has used .. instead of : because : is used by AAs, and {} can't
be used to denote AAs because it's used already in the C syntax).
 I don't miss all those features in D (D arrays have .reverse that replaces the
common case of stride=-1), but I miss the feature you can see in the last two
examples, that is the overflowing index is put back to the actual max-min index
bound of the array.

I had no idea you could do that in Python.  I'd much rather get an error 
there.  My guess is that most cases where you aren't sure how much to 
slice out are covered by end-relative indexes, like stuff[30:] or 
stuff[30:-1].  In D syntax stuff[30..$] and stuff[30..$-1].

For safe-to-go-out-of-bounds slicing like you're talking about, I'd 
rather implement that as a function call, something like your
   stuff.slice(2,30)
or you can probably get this syntax with current D
   stuff.slice[2..30]

It's not something that I'd want to use most of the time, and when I do, 
I'd want to make it clear that I'm relying on automatic bounds trimming.

--bb

Mar 20 2008

Neal Becker <ndbecker2 gmail.com> writes:

Bill Baxter wrote:

 bearophile wrote:
 Hello, I have discussed this topic a bit on the IRC #D channel too.
 
 Array slicing is a very useful feature of D. In Python the array (list)
 slicing is even more powerful, because array indexes can be negative
 (they wrap around, so -1 equals to $-1 in D, but I like the D solution
 too, it's faster too), and it has a third optional parameter (stride)
 that can be a negative number too:
 

 
 "abcdefghilmn"[2:30]



 'cdefghilmn'
 "abcdefghilmn"[30:50]



 ''
 
 (I presume D has used .. instead of : because : is used by AAs, and {}
 can't be used to denote AAs because it's used already in the C syntax). I
 don't miss all those features in D (D arrays have .reverse that replaces
 the common case of stride=-1), but I miss the feature you can see in the
 last two examples, that is the overflowing index is put back to the
 actual max-min index bound of the array.

 
 I had no idea you could do that in Python.  I'd much rather get an error
 there.  My guess is that most cases where you aren't sure how much to
 slice out are covered by end-relative indexes, like stuff[30:] or
 stuff[30:-1].  In D syntax stuff[30..$] and stuff[30..$-1].
 
 For safe-to-go-out-of-bounds slicing like you're talking about, I'd
 rather implement that as a function call, something like your
    stuff.slice(2,30)
 or you can probably get this syntax with current D
    stuff.slice[2..30]
 
 It's not something that I'd want to use most of the time, and when I do,
 I'd want to make it clear that I'm relying on automatic bounds trimming.
 
 --bb

Please no!

Silently ignoring the error is bad and unpythonic.  When I posted this
opinion to python-devel BDFL Guido replied, basically agreeing with me, but
saying it was too late to change.  But for D, Please no!

Mar 22 2008

bearophile <bearophileHUGS lycos.com> writes:

Neal Becker:
 Silently ignoring the error is bad and unpythonic.  When I posted this
 opinion to python-devel BDFL Guido replied, basically agreeing with me, but
 saying it was too late to change.  But for D, Please no!

I see. But I don't see them changing this feature, even if they now can do it
in Python 3.x.
(I think that D disables such error controls when you add the -release compiler
flag. While you can't do that in Python. So the situation is quite different.)

Bye,
bearophile

Mar 22 2008

"Janice Caron" <caron800 googlemail.com> writes:

On 20/03/2008, bearophile <bearophileHUGS lycos.com> wrote:
  A bare-bone semantic like that can be implemented in D like this

or like this:

    struct Slice(T)
    {
        T[] array;
        T opIndex(int i) {...}
        Slice opSlice(int i, int j) {...}
        Slice stride(int i, int j, int k) {...}
        /*etc*/
    }

That way, you can preserve the D syntax and at the same time allow
smart slicing.

Mar 20 2008

D Programming

C/C++ Programming

Other

digitalmars.D - Smart slicing