
digitalmars.D - Go and generic programming on reddit, also touches on D

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Quite interesting.

http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/

Andrei
Sep 17 2011
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
 containers. The huge problem with slices is dogfood-related - they are
 "magic" because the language features proposed to programmers were not
 enough for expressing a simple abstraction. Reserving "special"
 features for the language is a terrible way to go about programming
 language design.
Don't D arrays do a similar thing? They are not templates, yet work with generic element types. Afaics, improving the language to the point where dynamic array-like structures could be implemented in the library without resulting in a bloated executable would be quite involved.
Sep 18 2011
next sibling parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 18/09/11 5:08 PM, Timon Gehr wrote:
 On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote: > The problem is, Vector was just an example of a multitude of containers. The huge problem with slices is dogfood-related - they are > "magic" because the language features proposed to programmers were not enough for expressing a simple abstraction. Reserving "special" features for the language is a terrible way to go about programming language design. Don't D arrays do a similar thing? They are not templates, yet work with generic element types.
Yes. As I understand it, Andrei prefers things in libraries and Walter prefers things built into the compiler (obviously an oversimplification, but I believe that's the general way they 'lean'). There are advantages to both. Being implementable in a library means that they can easily be swapped out or modified to work with other code, but being built-in ("magic", as Andrei puts it) means that the compiler has greater awareness of them and can do better optimizations, give better errors, etc. Of course, there are ways of extending the language to provide better errors and allow better optimizations (e.g. 'pure' in D), but as a purely practical matter, it's easier if they are just built-in.
 Afaics, improving the language to the point were dynamic array-like
 structures could be implemented in the library without resulting in a
 bloated executable would be quite involved.
I don't think you'd get much bloat in D by implementing dynamic arrays with templates. Remember, the built-in arrays *are* mostly implemented in D: https://github.com/D-Programming-Language/druntime/tree/master/src/rt
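For a feel of what a purely library-based version would look like, here is a minimal templated dynamic array (a sketch only, with made-up names; it ignores postblit, shrinking, append-in-place capacity tracking, and everything else the real runtime handles):

import core.memory : GC;

struct Array(T)
{
    private T* ptr;
    private size_t len, cap;

    // append one element, growing the backing store geometrically
    void opOpAssign(string op : "~")(T value)
    {
        if (len == cap)
        {
            cap = cap ? cap * 2 : 4;
            ptr = cast(T*) GC.realloc(ptr, cap * T.sizeof);
        }
        ptr[len++] = value;
    }

    ref T opIndex(size_t i) { assert(i < len); return ptr[i]; }
    @property size_t length() const { return len; }
    T[] opSlice() { return ptr[0 .. len]; }
}

unittest
{
    Array!int a;
    foreach (i; 0 .. 10) a ~= i;
    assert(a.length == 10 && a[9] == 9);
}

Every instantiation Array!int, Array!double, ... gets its own copy of this code, which is exactly the bloat concern raised below.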
Sep 18 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/18/2011 07:16 PM, Peter Alexander wrote:
 On 18/09/11 5:08 PM, Timon Gehr wrote:
 On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are > "magic" because the language features proposed to programmers were not enough for expressing a simple abstraction. Reserving "special" features for the language is a terrible way to go about programming language design. Don't D arrays do a similar thing? They are not templates, yet work with generic element types.
Yes. As I understand, Andrei prefers things in libraries and Walter prefers things built in to the compiler (obviously an oversimplification, but I believe that's the general way they 'lean'). There's advantages to both. Being implementable in a library means that they can easily be swapped out or modified to work with other code, but being built-in ("magic", as Andrei puts it) means that the compiler has greater awareness of them and can do better optimizations, give better errors etc. Of course, there are ways of extending the language to provide better errors and allow better optimizations (e.g. 'pure' in D), but as a purely practical matter, it's easier if they are just built-in.
Well, I think arrays should be built-in. What I was implying was that D, not entirely unlike Go, also has some magic. You can get almost the same thing with templates, so D is way better than Go in that respect, but the magic is still there.
 Afaics, improving the language to the point were dynamic array-like
 structures could be implemented in the library without resulting in a
 bloated executable would be quite involved.
I don't think you'd get much bloat in D by implementing dynamic arrays with templates. Remember, the built-in arrays *are* mostly implemented in D: https://github.com/D-Programming-Language/druntime/tree/master/src/rt
Those work like generics, not like templates. Making them templates would duplicate all the runtime functions that process arrays for every element type they are instantiated with, and I am sure that would add some bloat. D has no generics, just templates. But built-in arrays work like generics.
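To make that distinction concrete, compare a templated routine (one machine-code copy per element type) with a generic-style one in the spirit of the druntime array hooks (one copy total, driven by the element size). A sketch with made-up names:

import core.stdc.string : memcpy;

// Template: stamped out once per element type it is used with.
void reverseTemplate(T)(T[] a)
{
    foreach (i; 0 .. a.length / 2)
    {
        auto tmp = a[i];
        a[i] = a[$ - 1 - i];
        a[$ - 1 - i] = tmp;
    }
}

// Generic style: compiled once, works on raw bytes plus the element size.
void reverseGeneric(void[] a, size_t elemSize)
{
    auto p = cast(ubyte*) a.ptr;
    auto n = a.length / elemSize;   // a void[]'s length is in bytes
    ubyte[64] tmp;                  // sketch assumes elemSize <= 64
    foreach (i; 0 .. n / 2)
    {
        auto x = p + i * elemSize, y = p + (n - 1 - i) * elemSize;
        memcpy(tmp.ptr, x, elemSize);
        memcpy(x, y, elemSize);
        memcpy(y, tmp.ptr, elemSize);
    }
}

reverseTemplate!int and reverseTemplate!double each add their own instantiation to the binary; an int[] or double[] passed to reverseGeneric (e.g. reverseGeneric(arr, int.sizeof)) reuses the single generic version.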
Sep 18 2011
parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 18/09/11 6:55 PM, Timon Gehr wrote:
 On 09/18/2011 07:16 PM, Peter Alexander wrote:
 On 18/09/11 5:08 PM, Timon Gehr wrote:
 On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are > "magic" because the language features proposed to programmers were not enough for expressing a simple abstraction. Reserving "special" features for the language is a terrible way to go about programming language design. Don't D arrays do a similar thing? They are not templates, yet work with generic element types.
Yes. As I understand, Andrei prefers things in libraries and Walter prefers things built in to the compiler (obviously an oversimplification, but I believe that's the general way they 'lean'). There's advantages to both. Being implementable in a library means that they can easily be swapped out or modified to work with other code, but being built-in ("magic", as Andrei puts it) means that the compiler has greater awareness of them and can do better optimizations, give better errors etc. Of course, there are ways of extending the language to provide better errors and allow better optimizations (e.g. 'pure' in D), but as a purely practical matter, it's easier if they are just built-in.
Well, I think arrays should be built-in. What I was implying was that D, not entirely unlike Go, also has some magic. You can get almost the same with templates, so it is way better in that aspect than Go, but it is still there.
Yes, you are right. D does have some "magic". Every language has to draw a line between library code and built-in code. The only way a language can be purely library is if the "language" is just the source code for the compiler :-)
 Afaics, improving the language to the point were dynamic array-like
 structures could be implemented in the library without resulting in a
 bloated executable would be quite involved.
I don't think you'd get much bloat in D by implementing dynamic arrays with templates. Remember, the built-in arrays *are* mostly implemented in D: https://github.com/D-Programming-Language/druntime/tree/master/src/rt
Those work like generics, not like templates. Making them templates would duplicate all the runtime functions that process arrays for every element type it is instantiated with. And I am sure that would add some bloat. D has no generics, just templates. But built-in arrays work like generics.
Yeah, they use runtime polymorphism like generics instead of compile-time polymorphism. I don't believe there's any way around this -- you can't solve it with better language design. If you want to handle different types with a single interface then either you need to generate code for each type (templates) or use runtime polymorphism (generics). Of course, you can improve both to minimize the overhead, but there's a fundamental memory vs. performance trade-off.
Sep 18 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 09/18/2011 08:17 PM, Peter Alexander wrote:
 On 18/09/11 6:55 PM, Timon Gehr wrote:
 On 09/18/2011 07:16 PM, Peter Alexander wrote:
 On 18/09/11 5:08 PM, Timon Gehr wrote:
 On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are > "magic" because the language features proposed to programmers were not enough for expressing a simple abstraction. Reserving "special" features for the language is a terrible way to go about programming language design. Don't D arrays do a similar thing? They are not templates, yet work with generic element types.
Yes. As I understand, Andrei prefers things in libraries and Walter prefers things built in to the compiler (obviously an oversimplification, but I believe that's the general way they 'lean'). There's advantages to both. Being implementable in a library means that they can easily be swapped out or modified to work with other code, but being built-in ("magic", as Andrei puts it) means that the compiler has greater awareness of them and can do better optimizations, give better errors etc. Of course, there are ways of extending the language to provide better errors and allow better optimizations (e.g. 'pure' in D), but as a purely practical matter, it's easier if they are just built-in.
Well, I think arrays should be built-in. What I was implying was that D, not entirely unlike Go, also has some magic. You can get almost the same with templates, so it is way better in that aspect than Go, but it is still there.
Yes, you are right. D does have some "magic". Every language has to draw a line between library code and built-in code. The only way a language can be purely library is if the "language" is just the source code for the compiler :-)
It is not at all about syntax sugar, or whether or not the functionality comes in a library or as a built-in. I am merely feature-oriented here, and D does not have generics, but arrays work like generics. I am not even saying that is a bad thing. :)
 Afaics, improving the language to the point were dynamic array-like
 structures could be implemented in the library without resulting in a
 bloated executable would be quite involved.
I don't think you'd get much bloat in D by implementing dynamic arrays with templates. Remember, the built-in arrays *are* mostly implemented in D: https://github.com/D-Programming-Language/druntime/tree/master/src/rt
Those work like generics, not like templates. Making them templates would duplicate all the runtime functions that process arrays for every element type it is instantiated with. And I am sure that would add some bloat. D has no generics, just templates. But built-in arrays work like generics.
Yeah, they use runtime polymorphism like generics instead of compile time polymorphism. I don't believe there's any way round this -- you can't solve it with a better language design. If you want to handle different types with a single interface then either you need to generate code for each type (templates) or use runtime polymorphism (generics). Of course, you can improve both to minimize the overhead, but there's a fundamental memory v.s. performance compromise.
Consider eg this:

T foo(T: Object)(T x){ return x; }

Generics clearly win on that one, because it requires neither code duplication nor runtime polymorphism (as is often the case in programs with a pure OO design, for instance!). The same is the case for arrays. They don't need runtime polymorphism (except for stray language bugs such as array.sort).
Sep 18 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/18/11 11:08 AM, Timon Gehr wrote:
 On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote: > The problem is, Vector was just an example of a multitude of containers. The huge problem with slices is dogfood-related - they are > "magic" because the language features proposed to programmers were not enough for expressing a simple abstraction. Reserving "special" features for the language is a terrible way to go about programming language design. Don't D arrays do a similar thing? They are not templates, yet work with generic element types. Afaics, improving the language to the point were dynamic array-like structures could be implemented in the library without resulting in a bloated executable would be quite involved.
That's an incorrect view of my statement. Yes, D's slices are built-in but the language offers facilities for defining any other parameterized types that are just as powerful. The only advantages slices have left are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax, i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray language bugs such as '$'.

Walter and I have long discussed that T[] should use an actual type object.Slice!T as back-end. That would allow us to e.g. switch from the pointer+length representation to the arguably better pointer+pointer representation with ease.

The main difficulty with object.Slice is CTFE - the compiler can manipulate a T[] during compilation because it understands its invariant. The same invariant would be difficult to infer from a user defined type.

Andrei
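A rough sketch of that idea (a hypothetical object.Slice, not actual druntime code): T[] would lower to a library type whose representation stays private, so it could be swapped without affecting user code:

struct Slice(T)
{
    private T* ptr;          // representation is an implementation detail;
    private size_t length_;  // could be swapped for a pointer+pointer pair

    @property size_t length() const { return length_; }
    ref T opIndex(size_t i) { assert(i < length_); return ptr[i]; }
    Slice opSlice(size_t lo, size_t hi)
    {
        assert(lo <= hi && hi <= length_);
        return Slice(ptr + lo, hi - lo);
    }
}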
Sep 18 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 On 9/18/11 11:08 AM, Timon Gehr wrote:
 On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are > "magic" because the language features proposed to programmers were not enough for expressing a simple abstraction. Reserving "special" features for the language is a terrible way to go about programming language design. Don't D arrays do a similar thing? They are not templates, yet work with generic element types. Afaics, improving the language to the point were dynamic array-like structures could be implemented in the library without resulting in a bloated executable would be quite involved.
That's an incorrect view of my statement.
I don't think so. The "special" feature we are talking about is generics. I am certainly not saying that your statement implies in any way that arrays should be a library feature.
 Yes, D's slices are built-in
 but the language offers facilities for defining any other parameterized
 types that are just as powerful.
Even way more powerful. But with great power comes great responsibility, eg some binary file bloat, along with the undecidability of the well-formedness of a particular template. I am perfectly fine with arrays being built in. I am also fine with some "magic", because, as you say templates are good enough to replace the magic. But still I _do_ think that there is a tiny bit of magic.
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
 Walter and I have long discussed that T[] should use an actual type
 object.Slice!T as back-end.
That would make the magic go away indeed, but then you'll get the bloat.
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?

Pointer+Length:
b=a[i]     <=> b := *(a.ptr+i); (if lhs typechecks)
b=a[i..j]  <=> b.ptr := a.ptr+i, b.length := j-i;
b=a.length <=> b := a.length
bounds check a[i]: cmp i,$; jae label;

Pointer+Pointer:
b=a[i]     <=> b := *(a.begin+i) (if lhs typechecks)
b=a[i..j]  <=> b.begin := a.begin+i; b.end := a.begin+j
b=a.length <=> b := a.end-a.begin
bounds check a[i]: cmp i,begin; jb label; cmp i,end; jae label;

(Not the only way to do it; you could also increase register pressure by first computing the length and then doing the comparison, but if the array ptr and length are in fact in registers, that loses.)

Therefore, in my eyes Pointer+Pointer loses badly, because getting the length/bounds checking requires additional machine instructions.
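The same comparison written as D code, just to make the per-operation costs visible (hypothetical types, not the actual array ABI):

struct PtrLen(T)
{
    T* ptr; size_t length;
    ref T opIndex(size_t i) { assert(i < length); return ptr[i]; } // one compare against length
    void popFront() { ++ptr; --length; }                           // two word updates
}

struct PtrPtr(T)
{
    T* first, past;
    @property size_t length() const { return cast(size_t)(past - first); } // needs a subtraction
    ref T opIndex(size_t i) { assert(i < length); return first[i]; }       // compare needs that subtraction first
    void popFront() { ++first; }                                            // one word update
}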
 The main difficulty with object.Slice is CTFE
 - the compiler can manipulate a T[] during compilation because it
 understands its invariant.  The same invariant would be difficult to
 infer from a user defined type.
Basically, all that would have to be done would be to make (at least parts) of core.memory.GC CTFE-able. Or to treat the template as special all along, because basic efficiency concerns dictate that some amount of built-in-ness is great for such a basic data structure. Imho they are not built in enough into DMD right now.

As a side note, as soon as you use std.array.appender, the CTFE-ability goes away anyways. And most of the array-manipulating code in Phobos does just that. Are there any plans to make std.array.appender CTFE-able? I guess now that we have if(__ctfe){} that would be trivial, and the benefit would be extremely large, because many Phobos functions would become CTFE-able at once.
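A sketch of the kind of split being suggested, assuming std.array.appender's put/data interface (the function here is made up):

import std.array : appender;

// Plain array append during CTFE, Appender at run time.
T[] collect(T)(T[] input)
{
    if (__ctfe)
    {
        T[] result;
        foreach (x; input)
            result ~= x;        // CTFE understands built-in append
        return result;
    }
    else
    {
        auto app = appender!(T[])();
        foreach (x; input)
            app.put(x);
        return app.data;        // amortized appending at run time
    }
}

enum atCompileTime = collect([1, 2, 3]);   // takes the __ctfe branch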
Sep 18 2011
next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Timon Gehr" <timon.gehr gmx.ch> wrote in message 
news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types. FWIW, I like the rewrite idea far better than opDollar.
Sep 18 2011
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 09/18/2011 09:46 PM, Nick Sabalausky wrote:
 "Timon Gehr"<timon.gehr gmx.ch>  wrote in message
 news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types.
OK, thanks for clarifying. (I would have called that an implementation bug though)
 FWIW, I like the rewrite idea far better than opDollar.
Me too. It fits better in the picture of D operator overloading.
Sep 18 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/18/11 2:46 PM, Nick Sabalausky wrote:
 "Timon Gehr"<timon.gehr gmx.ch>  wrote in message
 news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types. FWIW, I like the rewrite idea far better than opDollar.
opDollar is more powerful because it can be made to work with infinite ranges. Andrei
Sep 18 2011
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:
 On 9/18/11 2:46 PM, Nick Sabalausky wrote:
 "Timon Gehr"<timon.gehr gmx.ch> wrote in message
 news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types. FWIW, I like the rewrite idea far better than opDollar.
opDollar is more powerful because it can be made to work with infinite ranges. Andrei
What would it return?
Sep 18 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 9/18/2011 4:16 PM, Timon Gehr wrote:
 What would it return?
A dummy type, e.g.:

struct Repeat(T)
{
    T val;
    T front() @property { return val; }
    void popFront() {}
    enum empty = false;

    static struct Dollar {}
    Dollar opDollar() { return Dollar.init; }
    auto opSlice(size_t lower, Dollar dollar) { return this; }
}

void main()
{
    auto r = Repeat!int(1);
    auto r2 = r[666..$];
}
Sep 18 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/18/2011 10:21 PM, dsimcha wrote:
 On 9/18/2011 4:16 PM, Timon Gehr wrote:
 What would it return?
A dummy type, e.g.: struct Repeat(T) { T val; T front() property { return val; } void popFront() {} enum empty = false; static struct Dollar {} Dollar opDollar() { return Dollar.init; } auto opSlice(size_t lower, Dollar dollar) { return this; } } void main() { auto r = Repeat!int(1); auto r2 = r[666..$]; }
Ok, but what about

void main(){
    auto r = Repeat!int(1);
    auto r2 = r[666..$-1];                // ok, return entire range
    auto r3 = r[666..$/2];                // ditto
    auto r4 = r[666..$<100?$:100];        // ???
    // those could be made illegal:
    auto r5 = r[666..$*0];                // ???
    auto r6 = r[666..$-$];                // ???
    auto r7 = r[666..2*$-$];              // ???
    auto r8 = r[666..($*$)%4==3?667:668]; // ???
}
Sep 18 2011
parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Sun, 18 Sep 2011 22:37:03 +0200, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/18/2011 10:21 PM, dsimcha wrote:
 On 9/18/2011 4:16 PM, Timon Gehr wrote:
 What would it return?
A dummy type, e.g.: struct Repeat(T) { T val; T front() property { return val; } void popFront() {} enum empty = false; static struct Dollar {} Dollar opDollar() { return Dollar.init; } auto opSlice(size_t lower, Dollar dollar) { return this; } } void main() { auto r = Repeat!int(1); auto r2 = r[666..$]; }
Ok, but what about void main(){ auto r = Repeat!int(1); auto r2 = r[666..$-1]; // ok, return entire range auto r3 = r[666..$/2]; // ditto auto r4 = r[666..$<100?$:100]; // ???
Compile-time error. struct Dollar does not have an overloaded < operator. Or you make one that always returns false, in which case things are a bit more complex (Dollar and 100 do not have a common type).
      // those could be made illegal:
      auto r5 = r[666..$*0]; // ???
      auto r6 = r[666..$-$]; // ???
      auto r7 = r[666..2*$-$]; // ???
      auto r8 = r[666..($*$)%4==3?667:668]; // ???
 }
They already are. The supplied Dollar struct does not have overloaded operators for any of those operations. If you implement them, you can have them do whatever you feel is right. -- Simen
Sep 18 2011
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 18 Sep 2011 16:16:23 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:
 On 9/18/11 2:46 PM, Nick Sabalausky wrote:
 "Timon Gehr"<timon.gehr gmx.ch> wrote in message
 news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types. FWIW, I like the rewrite idea far better than opDollar.
opDollar is more powerful because it can be made to work with infinite ranges. Andrei
What would it return?
Not all types that have an end also support .length, or use sequential integers for indexes. -Steve
Sep 19 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/19/2011 01:15 PM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 16:16:23 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:
 On 9/18/11 2:46 PM, Nick Sabalausky wrote:
 "Timon Gehr"<timon.gehr gmx.ch> wrote in message
 news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types. FWIW, I like the rewrite idea far better than opDollar.
opDollar is more powerful because it can be made to work with infinite ranges. Andrei
What would it return?
Not all types that have an end also support .length, or use sequential integers for indexes. -Steve
Yes, but D has already gone the 'being inflexible' path for the comparison/negation/logical/ternary operators. Isn't this a similar thing? I don't think infinite ranges work well with restrictive operator overloading. eg a[0..100<$?100:$]. They'd need some additional language support or they'll possibly blow up randomly on generic code. Btw, do you know of an example of a data structure that can be indexed continuously and has the notion of an end, but no length?
Sep 19 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Sep 2011 09:49:14 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/19/2011 01:15 PM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 16:16:23 -0400, Timon Gehr <timon.gehr gmx.ch>  
 wrote:

 On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:
 On 9/18/11 2:46 PM, Nick Sabalausky wrote:
 "Timon Gehr"<timon.gehr gmx.ch> wrote in message
 news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal  
 syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of  
 stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types. FWIW, I like the rewrite idea far better than opDollar.
opDollar is more powerful because it can be made to work with infinite ranges. Andrei
What would it return?
Not all types that have an end also support .length, or use sequential integers for indexes. -Steve
Yes, but D has already gone the 'being inflexible' path for the comparison/negation/logical/ternary operators. Isn't this a similar thing? I don't think infinite ranges work well with restrictive operator overloading. eg a[0..100<$?100:$]. They'd need some additional language support or they'll possibly blow up randomly on generic code. Btw, do you know of an example of a data structure that can be indexed continuously and has the notion of an end, but no length?
I was specifically thinking of a red-black tree, which has a distinct end, but its index is not necessarily length (or any value for that matter). If you look at dcollections, you can slice a TreeMap using indexes or cursors. However, to index through the last element in the tree, you use the special cursor .end:

auto range = mytree["hello" .. mytree.end]; // get all the elements in the tree which are >= "hello"

Here, mytree["hello" .. mytree.length] would simply not compile. In addition, a tree with a uint index would silently do the *wrong* thing:

auto set = new TreeSet!uint(1, 3, 5, 7, 9);
assert(set.length == 5);
auto range = set[1..set.length];

assert(walkLength(range) == 2); // probably not what you expected

So I think it's not only limiting to require x.length to be $, it's very wrong in some cases.

Also, think of a string. It has no length (well technically, it does, but it's not the number of elements), but it has a distinct end point. A properly written string type would fail to compile if $ was s.length.

-Steve
Sep 19 2011
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:
 On Mon, 19 Sep 2011 09:49:14 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/19/2011 01:15 PM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 16:16:23 -0400, Timon Gehr <timon.gehr gmx.ch>
 wrote:

 On 09/18/2011 10:09 PM, Andrei Alexandrescu wrote:
 On 9/18/11 2:46 PM, Nick Sabalausky wrote:
 "Timon Gehr"<timon.gehr gmx.ch> wrote in message
 news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal
 syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of
 stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types. FWIW, I like the rewrite idea far better than opDollar.
opDollar is more powerful because it can be made to work with infinite ranges. Andrei
What would it return?
Not all types that have an end also support .length, or use sequential integers for indexes. -Steve
Yes, but D has already gone the 'being inflexible' path for the comparison/negation/logical/ternary operators. Isn't this a similar thing? I don't think infinite ranges work well with restrictive operator overloading. eg a[0..100<$?100:$]. They'd need some additional language support or they'll possibly blow up randomly on generic code. Btw, do you know of an example of a data structure that can be indexed continuously and has the notion of an end, but no length?
I was specifically thinking of red-black tree, which has a distinct end, but it's index is not necessarily length (or any value for that matter). If you look at dcollections, you can slice a TreeMap using indexes or cursors. However, to index through the last element in the tree, you use the special cursor .end: auto range = mytree["hello" .. mytree.end]; // get all the elements in the tree which are >= "hello" Here, mytree["hello" .. mytree.length] would simply not compile. In addition, a tree with a uint index would silently do the *wrong* thing: auto set = new TreeSet!uint(1, 3, 5, 7, 9); assert(set.length == 5); auto range = set[1..set.length]; assert(walkLength(range) == 2); // probably not what you expected
Ok, makes sense. Thanks.
 So I think it's not only limiting to require x.length to be $, it's very
 wrong in some cases.

 Also, think of a string. It has no length (well technically, it does,
 but it's not the number of elements), but it has a distinct end point. A
 properly written string type would fail to compile if $ was s.length.
But you'd have to compute the length anyways in the general case: str[0..$/2]; Or am I misunderstanding something?
Sep 19 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:
 So I think it's not only limiting to require x.length to be $, it's very
 wrong in some cases.

 Also, think of a string. It has no length (well technically, it does,
 but it's not the number of elements), but it has a distinct end point. A
 properly written string type would fail to compile if $ was s.length.
But you'd have to compute the length anyways in the general case: str[0..$/2]; Or am I misunderstanding something?
That's half the string in code units, not code points. If string was properly implemented, this would fail to compile. $ is not the length of the string range (meaning the number of code points). The given slice operation might actually create an invalid string. It's tricky, because you want fast slicing, but only certain slices are valid. I once created a string type that used a char[] as its backing, but actually implemented the limitations that std.range tries to enforce (but cannot). It's somewhat of a compromise. If $ was mapped to s.length, it would fail to compile, but I'm not sure what I *would* use for $. It actually might be the code units, which would not make the above line invalid. -Steve
Sep 19 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
 On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:
 So I think it's not only limiting to require x.length to be $, it's very
 wrong in some cases.

 Also, think of a string. It has no length (well technically, it does,
 but it's not the number of elements), but it has a distinct end point. A
 properly written string type would fail to compile if $ was s.length.
But you'd have to compute the length anyways in the general case: str[0..$/2]; Or am I misunderstanding something?
That's half the string in code units, not code points. If string was properly implemented, this would fail to compile. $ is not the length of the string range (meaning the number of code points). The given slice operation might actually create an invalid string.
Programmers have to be aware of that if they want efficient code that deals with unicode. I think having random access to the code units and being able to iterate per code point is fine, because it gives you the best of both worlds. Manually decoding a string and slicing it at positions that were remembered to be safe has been good enough for me, at least it is efficient.
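For example, one way to compute a slice position that is known to be safe is to walk forward by whole code points, using std.utf.stride (which returns the width, in code units, of the code point starting at a given index). A small sketch:

import std.utf : stride;

// Advance idx by n code points; the result is a code-unit index that is
// safe to slice at.
size_t advance(string s, size_t idx, size_t n)
{
    foreach (_; 0 .. n)
        idx += stride(s, idx);
    return idx;
}

void main()
{
    string s = "päärynä";
    auto mid = advance(s, 0, 3);
    auto head = s[0 .. mid];   // valid UTF-8: cut only at a known boundary
}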
 It's tricky, because you want fast slicing, but only certain slices are
 valid. I once created a string type that used a char[] as its backing,
 but actually implemented the limitations that std.range tries to enforce
 (but cannot). It's somewhat of a compromise. If $ was mapped to
 s.length, it would fail to compile, but I'm not sure what I *would* use
 for $. It actually might be the code units, which would not make the
 above line invalid.

 -Steve
Well, it would have to be consistent for a string type that "does it right". Either the string is indexed with units or it is indexed with code points, and the other option should be provided. Dollar should just be the length of whatever is used for indexing/slicing here, and having that be different from length makes for a somewhat awkward interface imho.

Btw, D double-quoted string literals let you define invalid byte sequences with e.g. octal literals:

string s="\377";

What would be the use cases for that? Shouldn't \377 map to the extended ASCII charset instead and yield the same code point that would be given in C double-quoted strings?
Sep 19 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Sep 2011 11:03:15 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
 On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.gehr gmx.ch>  
 wrote:

 On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:
 So I think it's not only limiting to require x.length to be $, it's  
 very
 wrong in some cases.

 Also, think of a string. It has no length (well technically, it does,
 but it's not the number of elements), but it has a distinct end  
 point. A
 properly written string type would fail to compile if $ was s.length.
But you'd have to compute the length anyways in the general case: str[0..$/2]; Or am I misunderstanding something?
That's half the string in code units, not code points. If string was properly implemented, this would fail to compile. $ is not the length of the string range (meaning the number of code points). The given slice operation might actually create an invalid string.
Programmers have to be aware of that if they want efficient code that deals with unicode. I think having random access to the code units and being able to iterate per code point is fine, because it gives you the best of both worlds. Manually decoding a string and slicing it at positions that were remembered to be safe has been good enough for me, at least it is efficient.
I find the same. I don't think I've ever dealt with arbitrary math operations to do slices of strings like the above. I only slice a string when I know the bounds are sane. Like I said, it's a compromise. The "right" thing to do is probably not even allow code-unit access via index (some have even argued that code-point slicing is too dangerous, because you can split a grapheme, leaving a valid, but incorrect slice of the original).
 It's tricky, because you want fast slicing, but only certain slices are
 valid. I once created a string type that used a char[] as its backing,
 but actually implemented the limitations that std.range tries to enforce
 (but cannot). It's somewhat of a compromise. If $ was mapped to
 s.length, it would fail to compile, but I'm not sure what I *would* use
 for $. It actually might be the code units, which would not make the
 above line invalid.

 -Steve
Well it would have to be consistent for a string type that "does it right" . Either the string is indexed with units or it is indexed with code points, and the other option should be provided. Dollar should just be the length of what is used for indexing/slicing here, and having that be different from length makes for a somewhat awkward interface imho.
Except we are defining a string as a *range*, and a range's length is defined as the number of elements. Note that hasLength!string evaluates to false in std.range.

$ should denote the end point of the aggregate, but it does not have to be equivalent to length, or even an integer/uint. It should just mean "end".

I also proposed a while back to have ^ denote the beginning (similar to regex) for aggregates that don't use 0 as the beginning, but people didn't like it :)

-Steve
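That distinction is easy to check (narrow strings are ranges of dchar without an O(1) element count, so std.range deliberately reports hasLength!string as false):

import std.range : hasLength, walkLength;

static assert(!hasLength!string);   // no range length for narrow strings
static assert(hasLength!(int[]));   // but int[] has one

void main()
{
    string s = "weiß";
    assert(s.length == 5);          // code units (UTF-8 bytes)
    assert(walkLength(s) == 4);     // range elements (code points)
}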
Sep 19 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/19/2011 05:52 PM, Steven Schveighoffer wrote:
 On Mon, 19 Sep 2011 11:03:15 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/19/2011 04:43 PM, Steven Schveighoffer wrote:
 On Mon, 19 Sep 2011 10:24:33 -0400, Timon Gehr <timon.gehr gmx.ch>
 wrote:

 On 09/19/2011 04:02 PM, Steven Schveighoffer wrote:
 So I think it's not only limiting to require x.length to be $, it's
 very
 wrong in some cases.

 Also, think of a string. It has no length (well technically, it does,
 but it's not the number of elements), but it has a distinct end
 point. A
 properly written string type would fail to compile if $ was s.length.
But you'd have to compute the length anyways in the general case: str[0..$/2]; Or am I misunderstanding something?
That's half the string in code units, not code points. If string was properly implemented, this would fail to compile. $ is not the length of the string range (meaning the number of code points). The given slice operation might actually create an invalid string.
Programmers have to be aware of that if they want efficient code that deals with unicode. I think having random access to the code units and being able to iterate per code point is fine, because it gives you the best of both worlds. Manually decoding a string and slicing it at positions that were remembered to be safe has been good enough for me, at least it is efficient.
I find the same. I don't think I've ever dealt with arbitrary math operations to do slices of strings like the above. I only slice a string when I know the bounds are sane. Like I said, it's a compromise. The "right" thing to do is probably not even allow code-unit access via index (some have even argued that code-point slicing is too dangerous, because you can split a grapheme, leaving a valid, but incorrect slice of the original).
 It's tricky, because you want fast slicing, but only certain slices are
 valid. I once created a string type that used a char[] as its backing,
 but actually implemented the limitations that std.range tries to enforce
 (but cannot). It's somewhat of a compromise. If $ was mapped to
 s.length, it would fail to compile, but I'm not sure what I *would* use
 for $. It actually might be the code units, which would not make the
 above line invalid.

 -Steve
Well it would have to be consistent for a string type that "does it right" . Either the string is indexed with units or it is indexed with code points, and the other option should be provided. Dollar should just be the length of what is used for indexing/slicing here, and having that be different from length makes for a somewhat awkward interface imho.
Except we are defining a string as a *range* and a range's length is defined as the number of elements. Note that hasLength!string evaluates to false in std.range.'
Ok. I feel the way narrow strings are handled in Phobos are a reasonable trade-off.
 $ should denote the end point of the aggregate, but it does not have to
 be equivalent to length, or even an integer/uint. It should just mean
 "end".
Point taken. What is the solution for infinite ranges? Should any arithmetics on $ just be disallowed?
 I also proposed a while back to have ^ denote the beginning (similar to
 regex) of an aggregate for aggregates that don't use 0 as the beginning,
 but people didn't like it :)

 -Steve
=D, well, it is grammatically unambiguous!
Sep 19 2011
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Sep 2011 12:00:46 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/19/2011 05:52 PM, Steven Schveighoffer wrote:
 $ should denote the end point of the aggregate, but it does not have to
 be equivalent to length, or even an integer/uint. It should just mean
 "end".
Point taken. What is the solution for infinite ranges? Should any arithmetics on $ just be disallowed?
I think $ should not pose any type restrictions, and whatever type you return can have whatever semantics you want. The only semantic restriction is it should mean "the end of the container". In other words, generic code that expects to be able to do:

x[0..$/2]

should only expect this to work when $ is a numeric value, or defines division by numbers.

Technically, half of infinity is still infinity, so the type of $ could be defined so that any arithmetic operations on it result in itself ;)

-Steve
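A sketch of what such a $ type might look like for an infinite range (made-up names: any arithmetic on it yields itself, and slicing up to it returns the whole remaining range):

struct Infinity
{
    Infinity opBinary(string op)(size_t) { return this; }       // $ / 2, $ - 1, ...
    Infinity opBinaryRight(string op)(size_t) { return this; }  // 2 * $, ...
}

struct Naturals   // 0, 1, 2, ... forever
{
    size_t front;
    enum empty = false;
    void popFront() { ++front; }

    Infinity opDollar() { return Infinity.init; }
    Naturals opSlice(size_t lo, Infinity) { return Naturals(front + lo); }
}

void main()
{
    Naturals n;
    auto tail = n[10 .. $ / 2];   // still the whole (infinite) range from 10 on
    assert(tail.front == 10);
}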
Sep 19 2011
parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 19.09.2011, 19:08 Uhr, schrieb Steven Schveighoffer  
<schveiguy yahoo.com>:

 On Mon, 19 Sep 2011 12:00:46 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/19/2011 05:52 PM, Steven Schveighoffer wrote:
 $ should denote the end point of the aggregate, but it does not have to
 be equivalent to length, or even an integer/uint. It should just mean
 "end".
Point taken. What is the solution for infinite ranges? Should any arithmetics on $ just be disallowed?
I think $ should not pose any type restrictions, and whatever type you return can have whatever semantics you want. The only semantic restriction is it should mean "the end of the container". In other words, generic code that expects to be able to do: x[0..$/2] should only expect this to work when $ is a numeric value, or defines division by numbers. Technically, half of infinity is still infinity, so the type of $ could be defined so that any arithmetic operations on it result in itself ;) -Steve
The documentation of such generic code would make it obvious that it cannot work with infinite ranges. sort() on a list of all primes, for example, obviously does not work. Btw, I don't mind [^..$]. I think it is clear for everyone who ever wrote a regex and easy for anyone else. For naturally 0-based ranges, you'd still use 0; ^ would just be obfuscating there.
Sep 20 2011
prev sibling parent reply Kagamin <spam here.lot> writes:
Steven Schveighoffer Wrote:

 auto set = new TreeSet!uint(1, 3, 5, 7, 9);
 assert(set.length == 5);
 auto range = set[1..set.length];
 
 assert(walkLength(range) == 2); // probably not what you expected
Where did you get that "1"? How do you slice it from the beginning to 7? This looks like operator overload abuse. Slicing is an array feature. If a collection doesn't support the array interface, it's questionable whether it should support slicing as it's defined in D. By stretching slicing to non-array collections you break its semantics. Good for voodoo, bad for engineering.
Sep 19 2011
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Sep 2011 11:17:01 -0400, Kagamin <spam here.lot> wrote:

 Steven Schveighoffer Wrote:

 auto set = new TreeSet!uint(1, 3, 5, 7, 9);
 assert(set.length == 5);
 auto range = set[1..set.length];

 assert(walkLength(range) == 2); // probably not what you expected
 Where did you get that "1"?
1 is the first element in the set.
 How do you slice it from the beginning to 7?
set[set.begin .. 7]

This does not include the element at position 7. To get that element too, you have to do something funky:

auto cur = set.elemAt(7);
cur.popFront();
set[set.begin .. cur];

I haven't thought of an easy way to signify "find me the element after X".
 Looks like an operator overload abuse. Slicing is an array's feature. If  
 a collection doesn't support array interface, it's questionable whether  
 it should support slicing as it's defined in D. By stretching slicing to  
 non-array collections you break its semantics. Good for voodoo, bad for  
 engineering.
I emphatically disagree. Slicing is the method of extracting a range from a collection given two reference points. For arrays, that happens to be indexes. For node-based containers, it would be references to those nodes (in dcollections, those are called cursors). I don't see why slicing should be restricted to ints, otherwise, why have opSlice compile with anything other than ints? -Steve
Sep 19 2011
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 9/18/2011 4:09 PM, Andrei Alexandrescu wrote:
 opDollar is more powerful because it can be made to work with infinite
 ranges.

 Andrei
Yes, this is important. IMHO, though, the default behavior of the $ operator should be to call range.length if it exists and opDollar isn't explicitly overloaded. This would save a lot of boilerplate.
Sep 18 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/18/11 3:19 PM, dsimcha wrote:
 On 9/18/2011 4:09 PM, Andrei Alexandrescu wrote:
 opDollar is more powerful because it can be made to work with infinite
 ranges.

 Andrei
Yes, this is important. IMHO, though, the default behavior of the $ operator should be to call range.length if it exists and opDollar isn't explicitly overloaded. This would save a lot of boilerplate.
struct MyRange
{
    ...
    alias length opDollar;
}

I do agree that most of the time this is what you want anyway, so that line would occur a lot of times...

Andrei
Sep 18 2011
next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Sun, 18 Sep 2011 22:32:44 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 9/18/11 3:19 PM, dsimcha wrote:
 On 9/18/2011 4:09 PM, Andrei Alexandrescu wrote:
 opDollar is more powerful because it can be made to work with infinite
 ranges.

 Andrei
Yes, this is important. IMHO, though, the default behavior of the $ operator should be to call range.length if it exists and opDollar isn't explicitly overloaded. This would save a lot of boilerplate.
struct MyRange { ... alias length opDollar; } I do agree that most of the time this is what you want anyway, so that line would occur a lot of times...
This works if opDollar is expected to be a niladic function. For multi-dimensional structures, it would have to be monadic. -- Simen
Sep 18 2011
prev sibling parent dsimcha <dsimcha yahoo.com> writes:
On 9/18/2011 4:32 PM, Andrei Alexandrescu wrote:
 struct MyRange
 {
 ...
 alias length opDollar;
 }

 I do agree that most of the time this is what you want anyway, so that
 line would occur a lot of times...


 Andrei
The problem with this is that everything in std.algorithm and std.range would have to be manually changed. I don't feel like doing this myself or waiting for someone else to get around to it. It just makes a heck of a lot more sense to specify it in one place, in the compiler.
Sep 18 2011
prev sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Sun, 18 Sep 2011 22:09:19 +0200, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 9/18/11 2:46 PM, Nick Sabalausky wrote:
 "Timon Gehr"<timon.gehr gmx.ch>  wrote in message
 news:j55h4f$1ia5$1 digitalmars.com...
 The only advantages slices have left
 are (a) type syntax, i.e. T[] instead of Slice!T, (b) literal syntax,
 i.e. [ 1, 2, 3 ] instead of slice(1, 2, 3), and (c) a couple of stray
 language bugs such as '$'.
I am thankful for $, as it is a great feature, and it really should be made accessible to user defined types. Either through opDollar or the rewrite a[foo($)] => a[foo(a.length)]. What makes it qualify as a stray language bug to you?
He's saying that one of the few advantages slices have left over user-defined types is that, for slices, $ actually works. The bug is that it doesn't work for user-defined types. FWIW, I like the rewrite idea far better than opDollar.
opDollar is more powerful because it can be made to work with infinite ranges.
Also, multi-dimensional structures:

RectangularArray!int a = [[1,2,3], [4,5,6,7]];
int b = a[$-1, $-2];

Those are obviously different $s.

-- Simen
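With a templated opDollar, each dimension can supply its own $ (a sketch; RectangularArray is made up here, and the rows are kept equal-length so the column count is well defined):

struct RectangularArray(T)
{
    T[][] rows;   // assumed rectangular: every row has the same length

    ref T opIndex(size_t i, size_t j) { return rows[i][j]; }

    // $ in dimension 0 is the row count, $ in dimension 1 the column count
    size_t opDollar(size_t dim : 0)() { return rows.length; }
    size_t opDollar(size_t dim : 1)() { return rows.length ? rows[0].length : 0; }
}

void main()
{
    auto a = RectangularArray!int([[1, 2, 3], [4, 5, 6]]);
    int b = a[$ - 1, $ - 2];   // last row, second-to-last column
    assert(b == 5);
}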
Sep 18 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/18/11 2:34 PM, Timon Gehr wrote:
 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 On 9/18/11 11:08 AM, Timon Gehr wrote:
 On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are > "magic" because the language features proposed to programmers were not enough for expressing a simple abstraction. Reserving "special" features for the language is a terrible way to go about programming language design. Don't D arrays do a similar thing? They are not templates, yet work with generic element types. Afaics, improving the language to the point were dynamic array-like structures could be implemented in the library without resulting in a bloated executable would be quite involved.
That's an incorrect view of my statement.
I don't think so.
You're free to think anything, but I know what I said and your view of it is incorrect. Andrei
Sep 18 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 09/18/2011 10:06 PM, Andrei Alexandrescu wrote:
 On 9/18/11 2:34 PM, Timon Gehr wrote:
 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 On 9/18/11 11:08 AM, Timon Gehr wrote:
 On 09/18/2011 03:48 AM, Andrei Alexandrescu wrote:
 Quite interesting.

 http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/
2 hours ago, Andrei Alexandrescu wrote:
 The problem is, Vector was just an example of a multitude of
containers. The huge problem with slices is dogfood-related - they are > "magic" because the language features proposed to programmers were not enough for expressing a simple abstraction. Reserving "special" features for the language is a terrible way to go about programming language design. Don't D arrays do a similar thing? They are not templates, yet work with generic element types. Afaics, improving the language to the point were dynamic array-like structures could be implemented in the library without resulting in a bloated executable would be quite involved.
That's an incorrect view of my statement.
I don't think so.
You're free to think anything, but I know what I said and your view of it is incorrect. Andrei
Well, I'd say there was a misunderstanding. Understanding someone who is not explaining himself is particularly difficult, so it is quite possible I am the one who misunderstood. But if you lack the time or motivation to carry out this discussion, that is fine of course.
Sep 18 2011
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?
I agree, I don't see why the representation is inherently better. Some operations become higher performance (e.g. popFront), and some become worse (e.g. length). Most of the others are a wash.

FWIW, you can avoid bloat by converting to runtime calls when templating is not necessary. For example, append could just be a template shell:

opBinary(string op : "~=")(T other)
{
    return _d_appendTi(...); // don't remember the parameter types/order
}

In any case, before this could happen, we'd need to implement UFCS for custom types, and we'd need a solution on how to specify const(T)[] using a template (that implicitly casts from T[]).

-Steve
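A sketch of that pattern with made-up names (the real druntime hook and its exact parameters are not shown): the templated shell only supplies the TypeInfo and forwards, so the code stamped out per element type stays tiny.

import core.memory : GC;
import core.stdc.string : memcpy;

// One non-templated worker shared by every element type. Sketch only:
// it reallocates on every append instead of tracking capacity.
void[] appendElement(void[] arr, const(void)* elem, const TypeInfo ti)
{
    immutable sz = ti.tsize;
    auto p = cast(ubyte*) GC.malloc(arr.length + sz);
    if (arr.length) memcpy(p, arr.ptr, arr.length);
    memcpy(p + arr.length, elem, sz);
    return p[0 .. arr.length + sz];
}

// Thin per-type shell: this is all that gets instantiated per T.
T[] append(T)(T[] arr, T value)
{
    return cast(T[]) appendElement(cast(void[]) arr, &value, typeid(T));
}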
Sep 19 2011
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 09/19/2011 01:25 PM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?
I agree, I don't see why the representation is inherently better. Some operations become higher performance (i.e. popFront), and some become worse (i.e. length). Most of the others are a wash. FWIW, you can avoid bloat by converting to runtime calls when templating is not necessary. For example, append could just be a template shell: opBinary(string op : "~=")(T other) { return _d_appendTi(...) // don't remember the parameter types/order }
Ok, but I'd like the opBinary to not even be put into the object file. I believe an inline annotation that guarantees inlining or compilation failure if it is impossible would be of great use for this and other applications.
 In any case, before this could happen, we'd need to implement UFCS for
 custom types,
First of all, the design of UFCS for custom types has to be settled. Should it be implicit, like the current ad-hoc way for arrays, or explicit (e.g. via a 'this' storage class)? I would be in favor of an explicit solution.
 and we'd need a solution on how to specify const(T)[]
 using a template (that implicitly casts from T[]).
Even more than that, templates would need to be able to express things like this:

// The fact that this currently compiles is a quite severe bug that compromises
// type/memory safety; it would have to be disallowed without an explicit cast:
class C{}
class D: C{}
class E: C{}

void main(){
    D[] d = [new D];
    C[] c = d;
    c[0] = new E;
    assert(typeid(d[0]) == typeid(E)); // a stray E in a D[]!
}

// This, on the other hand, is perfectly fine:
void main(){
    D[] d = [new D];
    const(C)[] c = d; // no possibility to screw up d (and no possibility to
                      // change the individual elements through method calls either)
}

As I pointed out in my initial post, I think the language changes needed to make something that works like a dynamic array implementable in a library would be quite involved, because there _is_ some nontrivial magic going on for them. Similar claims hold for pointers. Probably something ad-hoc, like an opImplicitCast, would work out, but it would share some functionality with alias this... Still, imho the best solution is to keep dynamic arrays built in, whether or not their special features are made available to the programmer.
Sep 19 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/19/11 6:25 AM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?
I agree, I don't see why the representation is inherently better. Some operations become higher performance (i.e. popFront), and some become worse (i.e. length). Most of the others are a wash.
That's where frequency of use comes into play. I'm thinking popFront would be used most often, and it touches two words. Andrei
Sep 19 2011
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Sep 2011 10:08:32 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 9/19/11 6:25 AM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr <timon.gehr gmx.ch>  
 wrote:

 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?
I agree, I don't see why the representation is inherently better. Some operations become higher performance (i.e. popFront), and some become worse (i.e. length). Most of the others are a wash.
That's where frequency of use comes into play. I'm thinking popFront would be used most often, and it touches two words.
I'm not so sure. It's very difficult to prove this is the case (either way). Already foreach does not use the range primitives for arrays, and foreach is probably the most frequent user of popFront (for ranges other than arrays, of course). However, length is used frequently in slicing, or when passing to underlying C or OS functions which require ptr + length.

Maybe the compiler can optimize out the calculations of length when they are just getting translated right back to pointers. For example, a = a[1..$] shouldn't have to calculate a.length, it should just be a.ptr += 5. I also think it would be nice to have direct access to the end pointer. Likewise, a.length -= 5 shouldn't have to calculate the original length, then subtract 5, then pass that to the setlength function.

I'm not sure if the runtime code would be better or worse if we used 2 pointers instead of ptr/length. For sure, that is where the bulk of the changes would need to be made. The other thing that I don't like is that it's much easier to construct an invalid slice with 2 pointers than with ptr/length.

I suspect my opinion doesn't matter much for what we all "think" is used more frequently, but there are bigger issues to solve before we could have a library-based array type. It might be best to leave the array as a magic type, but just change the internal representation in the compiler/runtime. I know that printf was abused at some point for writing slices, relying on the fact that the order of length/ptr happened to match how you pass a string's length as a parameter to printf; this change would definitely break any code that relies on that. Not that we need to care, but it's something to think about.

And actually, false pointers would be cut down (both pointers always point to the referenced block), which might be worth all the trouble anyways :)

-Steve
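To make the two-pointer layout concrete, here is a minimal sketch of such a slice type (all names are made up for illustration; this is not how druntime represents arrays):

// A toy two-pointer slice: ptrFront points at the first element, ptrBack one
// past the last element.
struct Slice2(T)
{
    T* ptrFront;
    T* ptrBack;

    @property bool empty() const { return ptrFront >= ptrBack; }

    // Pointer subtraction on T* is already scaled by T.sizeof, but under the
    // hood that scaling is a division, cheap only for power-of-2 sizes.
    @property size_t length() const { return cast(size_t)(ptrBack - ptrFront); }

    // a = a[n .. $] collapses to a single pointer bump; this is the cheap
    // operation described above. It is also easy to see how an invalid slice
    // (front past back) could be constructed by mistake.
    void popFrontN(size_t n)
    {
        assert(n <= length);
        ptrFront += n;
    }
}

unittest
{
    int[] a = [1, 2, 3, 4, 5];
    auto s = Slice2!int(a.ptr, a.ptr + a.length);
    s.popFrontN(2);                            // drop the first two elements
    assert(s.length == 3 && *s.ptrFront == 3);
}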
Sep 19 2011
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Sep 2011 10:38:21 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 For example a = a[1..$] shouldn't have to calculate a.length, it should  
 just be a.ptr += 5.
oops, victim of half-editing there :) Meant to say a = a[5..$] -Steve
Sep 19 2011
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 09/19/2011 04:08 PM, Andrei Alexandrescu wrote:
 On 9/19/11 6:25 AM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?
I agree, I don't see why the representation is inherently better. Some operations become higher performance (i.e. popFront), and some become worse (i.e. length). Most of the others are a wash.
That's where frequency of use comes into play. I'm thinking popFront would be used most often, and it touches two words. Andrei
Normally, each popFront comes with an accompanying empty, and a comparison against 0 is faster after a decrement than a comparison of two arbitrary pointer values. Now a benchmark:

import std.datetime, std.stdio;

int[100000] a;

void f1(){
    for(auto ptr=a.ptr, len=a.length+1; len; ++ptr, --len){}
}
void f2(){
    for(auto b=a.ptr, e=a.ptr+a.length; b!=e; b++){}
}

void main(){
    auto b=benchmark!(f1,f2)(100000);
    writeln(b[0].to!("seconds",double)," ", b[1].to!("seconds",double));
}

On my machine (-O -release -inline):

4.00256 4.00099

The difference is barely perceptible on my oldish Core 2 Duo processor, yet it is reproducible, which indicates that you were right about the pointer-pointer representation being slightly faster for that use case.
Sep 19 2011
prev sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 19 Sep 2011 10:08:32 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 On 9/19/11 6:25 AM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?
I agree, I don't see why the representation is inherently better. Some operations become higher performance (i.e. popFront), and some become worse (i.e. length). Most of the others are a wash.
That's where frequency of use comes into play. I'm thinking popFront would be used most often, and it touches two words. Andrei
Well, looking at the two implementations of popFront():

if(length) {
    ptr += T.sizeof;
    length -= 1;
}

vs

if(ptrBack - ptrFront > 0) {
    ptrFront += T.sizeof;
}

And don't forget that every popFront call will be matched by a call to empty():

return length;

vs

return ptrBack - ptrFront > 0;

I see that the 'new' popFront still needs to read 2 words and results in an extra subtraction. Without the guard statements the instruction counts are identical, but note that the current popFront implementation uses slices, which are 'guarded', and I see every reason not to change this behavior. And given how memory works, I doubt there is any advantage to 1 vs 2 writes one after another to the same cache line. By the way, for those wondering why I didn't use 'ptrBack > ptrFront': comparisons are eventually reduced to a 'Jump if (Not) Zero', and I wanted to make the number and types of hardware instructions clear.

The elephant in the room, of course, is that length now requires a division, and that popFront is actually implemented using slicing:

a = a[i .. $];

which translates into:

auto begin = i;
auto end = length;
if(end - begin >= 0 && length - end >= 0) {
    ptr = ptr + T.sizeof * begin;
    length = end - begin;
}

vs

auto length = (ptrBack - ptrFront) / T.sizeof;
auto begin = ptrFront + T.sizeof * i;
auto end = ptrFront + T.sizeof * length;
if(end - begin >= 0 && ptrBack - end >= 0) {
    ptrFront = begin;
    ptrBack = end;
}

And both integer multiplication and division by non-power-of-2 sizes are really, really, really slow compared to + and -. Now, the division is only needed whenever 'length' is called, but the extra multiplication is required on every slice. So, on balance, I'd say the two-pointer representation is categorically worse than the fat pointer representation.
Sep 19 2011
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Sep 2011 11:46:44 -0400, Robert Jacques <sandford jhu.edu>  
wrote:

 On Mon, 19 Sep 2011 10:08:32 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 On 9/19/11 6:25 AM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr <timon.gehr gmx.ch>  
 wrote:

 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?
I agree, I don't see why the representation is inherently better. Some operations become higher performance (i.e. popFront), and some become worse (i.e. length). Most of the others are a wash.
That's where frequency of use comes into play. I'm thinking popFront would be used most often, and it touches two words. Andrei
The elephant in the room, of course, is that length now requires a division and that popFront is actually implemented using slicing: a = a[i .. $]; which translates into: auto begin = i; auto end = length; if(end - begin >= 0 && length - end >= 0) { ptr = ptr + T.sizeof * begin; length = end - begin; } vs auto length = (ptrBack - ptrFront) / T.sizeof; auto begin = ptrFront + T.sizeof * i; auto end = ptrFront + T.sizeof * length; if(end - begin >= 0 && ptrBack - end >= 0) { ptrFront = begin; ptrBack = end; }
I would hope something like this would be optimized by the compiler:

auto begin = ptrFront + T.sizeof * i;
if(ptrBack - begin >= 0)
    ptrFront = begin;

If not, popFront could optimize it. Certainly, to say popFront is going to perform *worse* using a dual-pointer representation is false. Only one calculation is needed.

-Steve
Sep 19 2011
parent "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 19 Sep 2011 12:00:17 -0400, Steven Schveighoffer <schveiguy yahoo.com>
wrote:

 On Mon, 19 Sep 2011 11:46:44 -0400, Robert Jacques <sandford jhu.edu>
 wrote:

 On Mon, 19 Sep 2011 10:08:32 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 On 9/19/11 6:25 AM, Steven Schveighoffer wrote:
 On Sun, 18 Sep 2011 15:34:16 -0400, Timon Gehr <timon.gehr gmx.ch>
 wrote:

 On 09/18/2011 08:28 PM, Andrei Alexandrescu wrote:
 That would allow us to e.g. switch from the
 pointer+length representation to the arguably better pointer+pointer
 representation with ease.
In what way is that representation better?
I agree, I don't see why the representation is inherently better. Some operations become higher performance (i.e. popFront), and some become worse (i.e. length). Most of the others are a wash.
That's where frequency of use comes into play. I'm thinking popFront would be used most often, and it touches two words. Andrei
The elephant in the room, of course, is that length now requires a division and that popFront is actually implemented using slicing: a = a[i .. $]; which translates into: auto begin = i; auto end = length; if(end - begin >= 0 && length - end >= 0) { ptr = ptr + T.sizeof * begin; length = end - begin; } vs auto length = (ptrBack - ptrFront) / T.sizeof; auto begin = ptrFront + T.sizeof * i; auto end = ptrFront + T.sizeof * length; if(end - begin >= 0 && ptrBack - end >= 0) { ptrFront = begin; ptrBack = end; }
I would hope something like this would be optimized by the compiler: auto begin = ptrFront + T.sizeof * i; if(ptrBack - begin >= 0) ptrFront = begin; If not, popFront could optimize it. Certainly, to say popFront is going to perform *worse* using a dual-pointer representation is false. Only one calculation is needed. -Steve
Unfortunately, the compiler isn't going to auto-magically optimize the length computation away. (Remember that end is not necessarily equal to ptrBack for an arbitrary pair of ptrFront and ptrBack; it is the high-level invariants associated with arrays which make that true.) The compiler could optimize the slice operator, but it has to do so at a very high level, i.e. by recognizing and special-casing statements like a[1..$]. (And such optimizations are notoriously brittle.) And yes, popFront could optimize the slicing operator, but having to do so is a pretty good indication that something's wrong with the array design.

Also, I didn't say that popFront, by itself, is going to perform *worse*. I said that popFront + empty require 1 extra subtraction and 1 less memory write. On today's out-of-order desktop processors that probably amounts to a wash. (And the benchmarks support this; see my other post.) And given that the change makes computations like empty, length and slicing much worse, I stated that dual pointers would, on the whole, be worse than ptr+length.
Sep 19 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/19/11 10:46 AM, Robert Jacques wrote:
 So, on balance, I'd say the two pointers representation is categorically
 worse than the fat pointer representation.
Benchmark. A few of your assumptions don't hold. Andrei
Sep 19 2011
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/19/11 11:12 AM, Andrei Alexandrescu wrote:
 On 9/19/11 10:46 AM, Robert Jacques wrote:
 So, on balance, I'd say the two pointers representation is categorically
 worse than the fat pointer representation.
Benchmark. A few of your assumptions don't hold. Andrei
s/don't/may not/ :o) Andrei
Sep 19 2011
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 19 Sep 2011 12:12:04 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 9/19/11 10:46 AM, Robert Jacques wrote:
 So, on balance, I'd say the two pointers representation is categorically
 worse than the fat pointer representation.
Benchmark. A few of your assumptions don't hold. Andrei
So I ran Timon Gehr's benchmark on my system 10 times, and although the Old method performed the best, the times showed a scheduling jitter of about 0.2 s, which is an order of magnitude larger than the differences between the runs. I've had some experience with removing jitter from benchmarks, so I redid it.

For those who don't want to read through all the results and code snippets below, here's the executive summary: there's no measurable difference between the optimum popFront + empty implementations as laid out by Timon Gehr and myself. However, when you consider the current slice-based implementation of popFront (i.e. a[1..$]), things change drastically. For an int[], ptr+ptr was 3x slower, and for an int[3][] it was 8x slower.

Timon Gehr's benchmark:

Old        New        Diff
4.64242    4.58722    0.0551916
4.60659    4.5482     0.058386
4.53478    4.51753    0.0172519
4.54561    4.51867    0.026935
4.46662    4.4733     -0.00668139
4.47204    4.46893    0.00310762
4.52798    4.54021    -0.0122326
4.5685     4.57667    -0.00816801
4.50138    4.49186    0.00951325
4.49764    4.49422    0.00342912

--------------------------

import std.datetime, std.stdio, std.algorithm;

int[1_000_000] a;

void f1(){
    for(auto ptr=a.ptr, len=a.length+1; len; ++ptr, --len) {}
}
void f2(){
    for(auto b=a.ptr, e=a.ptr+a.length; b<e; b++){}
}

void main(){
    writeln("Always remember to run benchmarks on a single core. Pause");
    readln();
    double min_f1 = int.max;
    double min_f2 = int.max;
    foreach(i;0..10_000) {
        auto b=benchmark!(f1,f2)(3);
        min_f1 = min(min_f1, b[0].to!("seconds",double));
        min_f2 = min(min_f2, b[1].to!("seconds",double));
    }
    writeln("Old\tNew\tDiff");
    writeln(min_f1,"\t",min_f2,"\t",(min_f2-min_f1)/(min_f1)*100.0,"%");
}

which gives, on my system:

Old           New           Diff
0.00127244    0.00127244    0%

Which is pretty much as expected. Now onto the more critical test.

import std.datetime, std.stdio, std.algorithm, std.range, std.array;

alias int[3] T;
T[1_000_000] a;

void f1(){
    auto a2 = a[];
    while(!a2.empty) { a2.popFront(); }
}
void f2(){
    auto ptrFront = a.ptr;
    auto ptrBack = a.ptr + a.length;
    auto i = 1;
    while(!(ptrFront >= ptrBack) ) {
        auto length = (ptrBack - ptrFront) / T.sizeof;
        auto begin = ptrFront + T.sizeof * i;
        auto end = ptrFront + T.sizeof * length;
        if(end - begin >= 0 && ptrBack - end >= 0) {
            ptrFront = begin;
            ptrBack = end;
        }
    }
}

void main(){
    writeln("Always remember to run benchmarks on a single core. Pause");
    readln;
    double min_f1 = int.max;
    double min_f2 = int.max;
    foreach(i;0..10_000) {
        auto b=benchmark!(f1,f2)(3);
        min_f1 = min(min_f1, b[0].to!("seconds",double));
        min_f2 = min(min_f2, b[1].to!("seconds",double));
    }
    writeln("Old\tNew\tDiff");
    writeln(min_f1,"\t",min_f2,"\t",(min_f2-min_f1)/(min_f1)*100.0,"%");
}

Old           New           Diff
0.00869062    0.0118318     36.145%

Which makes sense given the division. Switching T to an int gives:

Old           New           Diff
0.00868968    0.00502586    -42.1629%

Ah.. Perhaps I should also inline f1:

void f1(){
//  auto a2 = a[];
//  while(!a2.empty) { a2.popFront(); }
    auto i = 1;
    auto length = a.length;
    auto ptr = a.ptr;
    while(!(length == 0)) {
        auto end = length;
        auto begin = i;
        if(end - begin >= 0 && length - end >= 0) {
            ptr = ptr + T.sizeof * begin;
            length = end - begin;
        }
    }
}

which results in:

Old           New           Diff
0.00127244    0.00502912    295.233%

And switching T back to int[3] gives:

Old           New           Diff
0.00127244    0.01192       836.78%
Sep 19 2011
next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 20 Sep 2011 02:18:12 -0400, Robert Jacques <sandford jhu.edu> wrote:

 On Mon, 19 Sep 2011 12:12:04 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 9/19/11 10:46 AM, Robert Jacques wrote:
 So, on balance, I'd say the two pointers representation is categorically
 worse than the fat pointer representation.
Benchmark. A few of your assumptions don't hold. Andrei
So I ran Timon Gehr's benchmark on my system 10 times...
P.S. All results are compiled using -O -release -inline and run on a single core of a Core 2 Duo T7500 2.2 GHz
Sep 19 2011
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/20/2011 08:18 AM, Robert Jacques wrote:
 On Mon, 19 Sep 2011 12:12:04 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 9/19/11 10:46 AM, Robert Jacques wrote:
 So, on balance, I'd say the two pointers representation is categorically
 worse than the fat pointer representation.
Benchmark. A few of your assumptions don't hold. Andrei
So I ran Timon Gehr's benchmark on my system 10 times, and although the Old method performed the best, the times showed a scheduling jitter of about 0.2 s which is an order of magnitude larger than the differences between the runs.
[snip]
Thank you for making this more meaningful! I assumed the standard library benchmark function would take care of those things. Should it?
Sep 20 2011
parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 20 Sep 2011 14:01:05 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:
[snip]

 Thank you for making this more meaningful! I assumed the standard
 library benchmark function would take care of those things. Should it?
Yes and no. benchmark() provides a good way to make a single measurement of a function, since for really short functions you do have to loop many times to get a reliable reading. However, actual benchmarking requires a) tuning the benchmark() call time to about 10-20 ms, and b) running benchmark() many times and taking the minimum. The idea is that on any given run you could hit a context switch, etc., so if you make multiple runs, one will get lucky and not be interrupted. Worse, if a core switch happens between StopWatch start and end, the number of ticks returned is random; hence the comment to manually limit execution to a single core.

So, it might be nice if benchmark() also took a second parameter denoting the number of times to benchmark the function, and had some better documentation on how to do good benchmarking.
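A minimal sketch of that run-many-times-and-take-the-minimum pattern, wrapped as a helper (benchmarkMin is a made-up name; it just packages what the benchmark drivers earlier in this thread do by hand, using std.datetime.benchmark the same way they do):

import std.algorithm : min;
import std.datetime  : benchmark;
import std.stdio     : writeln;

// Hypothetical helper: run std.datetime.benchmark() 'runs' times and keep the
// minimum per function, on the assumption that the fastest run is the one
// least disturbed by context switches and other jitter.
double[funs.length] benchmarkMin(funs...)(uint iterations, uint runs)
{
    double[funs.length] best = double.max;
    foreach (r; 0 .. runs)
    {
        auto b = benchmark!funs(iterations);
        foreach (i, ref m; best)
            m = min(m, b[i].to!("seconds", double));
    }
    return best;
}

void g1() { foreach (i; 0 .. 1_000) {} }   // toy workloads, just for the example
void g2() { foreach (i; 0 .. 2_000) {} }

void main()
{
    auto best = benchmarkMin!(g1, g2)(100, 1_000);
    writeln("g1: ", best[0], "s   g2: ", best[1], "s");
}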
Sep 20 2011
prev sibling parent reply Don <nospam nospam.com> writes:
On 19.09.2011 18:12, Andrei Alexandrescu wrote:
 On 9/19/11 10:46 AM, Robert Jacques wrote:
 So, on balance, I'd say the two pointers representation is categorically
 worse than the fat pointer representation.
Benchmark. A few of your assumptions don't hold. Andrei
Note that high-performance libraries that use slices, like GMP and the many BLAS libraries, use the pointer+length representation, not pointer+pointer. They've done a lot of benchmarking on a huge range of architectures, with a large range of compilers.

The underlying reason for this is that almost all CISC instruction sets have built-in support for pointer+length; AFAIK nothing has built-in support for ptr+ptr. On x86, you have this wonderful [EAX+8*EBX] addressing mode, which can be used on almost every instruction, so the calculation [addr + sz*index] takes ZERO clock cycles when sz is a power of 2. Generally, when you supply two pointers, the optimizer will try to convert them into ptr + offset (where offset isn't in bytes; it corresponds to D's length).
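As a small illustration of that point, a plain ptr+length loop over a slice indexes with exactly that scaled addressing mode (the generated-code comment below is an assumption about typical x86 codegen, not a dump from any particular compiler):

// Summing a slice the ptr+length way: a[i] maps straight onto a
// [base + 4*index] address, so no separate address arithmetic is needed per
// element (likely something like mov EDX, [EAX + 4*ECX] in the inner loop).
int sum(const(int)[] a)
{
    int s = 0;
    foreach (i; 0 .. a.length)
        s += a[i];
    return s;
}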
Sep 21 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/21/11 1:49 PM, Don wrote:
 On 19.09.2011 18:12, Andrei Alexandrescu wrote:
 On 9/19/11 10:46 AM, Robert Jacques wrote:
 So, on balance, I'd say the two pointers representation is categorically
 worse than the fat pointer representation.
Benchmark. A few of your assumptions don't hold. Andrei
Note that high-performance libraries that use slices, like GMP and the many BLAS libraries, use the pointer+length representation, not pointer+pointer. They've done a lot of benchmarking on a huge range of architectures, with a large range of compilers. The underlying reason for this, is that almost all CISC instruction sets have built-in support for pointer+length. AFAIK nothing has builtin support for ptr+ptr. On x86, you have this wonderful [EAX+8*EBX] addressing mode, that can be used on almost every instruction, so that the calculation [addr + sz*index] takes ZERO clock cycles when sz is a power of 2. Generally, when you supply two pointers, the optimizer will try to convert it into ptr + offset (where offset isn't bytes, it corresponds to D's length).
To all who replied and tested - color me convinced we should keep the current state of affairs. Thanks! Andrei
Sep 21 2011
parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 21 Sep 2011 16:02:53 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 On 9/21/11 1:49 PM, Don wrote:
 On 19.09.2011 18:12, Andrei Alexandrescu wrote:
 On 9/19/11 10:46 AM, Robert Jacques wrote:
 So, on balance, I'd say the two pointers representation is categorically
 worse than the fat pointer representation.
Benchmark. A few of your assumptions don't hold. Andrei
Note that high-performance libraries that use slices, like GMP and the many BLAS libraries, use the pointer+length representation, not pointer+pointer. They've done a lot of benchmarking on a huge range of architectures, with a large range of compilers. The underlying reason for this, is that almost all CISC instruction sets have built-in support for pointer+length. AFAIK nothing has builtin support for ptr+ptr. On x86, you have this wonderful [EAX+8*EBX] addressing mode, that can be used on almost every instruction, so that the calculation [addr + sz*index] takes ZERO clock cycles when sz is a power of 2. Generally, when you supply two pointers, the optimizer will try to convert it into ptr + offset (where offset isn't bytes, it corresponds to D's length).
To all who replied and tested - color me convinced we should keep the current state of affairs. Thanks! Andrei
No problem. Also, TDPL uses ptr+ptr for its representation. Having gone back and looked at the chapter on arrays, I think it makes for great figures and aids the comprehension of ideas. On the other hand, a lot of programming books, e.g. Numerical Recipes in C, have done a lot of harm over the years through people copying their sub-optimal code samples/snippets. So you may want to add a sentence regarding D's actual implementation to the clause under figure 4.1 on page 98.
Sep 22 2011
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

In that thread you have said:
http://www.reddit.com/r/programming/comments/kikut/think_in_go_gos_alternative_to_the/c2krft0

 I'm interested in furthering D's grok of dependent types.
This blog post shows some nice basic examples of dependent types in the ATS language:
http://www.bluishcoder.co.nz/2010/09/01/dependent-types-in-ats.html

Some other notes:
http://leonardo-m.livejournal.com/98077.html

But those are toys. For practical programming, this alternative version is close to what you can do in D while keeping both efficiency and sanity:
http://rosettacode.org/wiki/Matrix_multiplication#Alternative_version

Bye,
bearophile
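For reference, a minimal sketch of the kind of static-dimension checking in D that the Rosetta Code link relies on (an illustration only, not that page's code): matrix sizes live in the type, so a mismatched multiplication is rejected at compile time.

// Dimensions are template value parameters deduced from the static array
// arguments, so mul() only instantiates when the inner dimensions agree.
double[n][m] mul(size_t m, size_t k, size_t n)(ref const double[k][m] a,
                                               ref const double[n][k] b)
{
    double[n][m] c = 0;
    foreach (i; 0 .. m)
        foreach (j; 0 .. n)
            foreach (x; 0 .. k)
                c[i][j] += a[i][x] * b[x][j];
    return c;
}

void main()
{
    double[2][3] a = 1;       // a 3x2 matrix (static array types read "inside out")
    double[4][2] b = 2;       // a 2x4 matrix
    auto c = mul(a, b);       // fine: result is 3x4
    // auto d = mul(b, a);    // rejected at compile time: inner dimensions 4 and 3 differ
}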
Sep 18 2011