
digitalmars.D - Yet another strike against the current AA implementation

reply dsimcha <dsimcha yahoo.com> writes:
So I tried to create a struct to allow iteration over the keys or values of a
builtin associative array using a lazy forward range.  This would allow one to
pass the keys or values of an AA to any function that expects a regular old
forward range w/o any heap activity, eager copying, etc. and with O(1)
auxiliary memory usage.  Now that ranges are the lingua franca of iteration in
D, this seems like a pretty necessary feature.

The problem is that D's current AAs are based on a hash table with collision
resolution done using binary trees.  I guess this decision was made to allow
for O(log N) worst case performance even when there are ridiculous amounts of
hash collisions.  However, this is a very bad decision because, in exchange
for more bulletproof performance in corner cases, you lose:

1.  Every element is now (void*).sizeof bytes larger because you have to store
a left and right pointer.
2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
in any way I'm aware of.  This means that a lazy range to iterate over the
keys or values is relatively useless because it would generate heap activity
anyhow to store the stack or queue necessary to iterate over the tree, so you
may as well just use the current .keys or .values properties, which just
return an array.  (Because of the way that ranges are designed, you can't just
use the call stack, unless you use fibers or something, which is obviously not
a win from an efficiency point of view.)

Both of these are important in the more common, general case of sane hashing
performance.
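For concreteness, a range like the one described is straightforward over a chained hash table; the sketch below assumes buckets are singly linked lists (the Node/KeyRange names are hypothetical, not druntime's actual internals):

```d
// Hypothetical sketch: a lazy forward range over the keys of a hash
// table that resolves collisions with singly linked lists.  Iteration
// needs only a bucket index and a node pointer, i.e. O(1) auxiliary
// space -- exactly what a tree-based bucket cannot offer.
struct Node(K, V) { K key; V value; Node* next; }

struct KeyRange(K, V)
{
    private Node!(K, V)*[] buckets; // the table's bucket array
    private size_t bucket;          // index of the current bucket
    private Node!(K, V)* node;      // current node within that bucket

    this(Node!(K, V)*[] buckets)
    {
        this.buckets = buckets;
        skipEmptyBuckets();
    }

    bool empty() { return node is null; }
    K front() { return node.key; }

    void popFront()
    {
        node = node.next;
        if (node is null) { bucket++; skipEmptyBuckets(); }
    }

    private void skipEmptyBuckets()
    {
        while (bucket < buckets.length && buckets[bucket] is null)
            bucket++;
        node = bucket < buckets.length ? buckets[bucket] : null;
    }
}
```

Passing such a range to any function expecting a forward range would then involve no heap activity at all.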
Apr 26 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-26 11:21:54 -0400, dsimcha <dsimcha yahoo.com> said:

 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
 in any way I'm aware of.  This means that a lazy range to iterate over the
 keys or values is relatively useless because it would generate heap activity
 anyhow to store the stack or queue necessary to iterate over the tree, so you
 may as well just use the current .keys or .values properties, which just
 return an array.  (Because of the way that ranges are designed, you can't just
 use the call stack, unless you use fibers or something, which is obviously not
 a win from an efficiency point of view.)

Hum, are you saying opApply is superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
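What Michel describes might look like this; TreeNode is a hypothetical stand-in for the AA's bucket trees, not the actual implementation:

```d
// Sketch: in-order traversal via opApply.  The recursion lives on the
// call stack, so no heap allocation is needed -- but the traversal
// cannot be paused and resumed the way a range can.
class TreeNode(T)
{
    T value;
    TreeNode left, right;

    int opApply(int delegate(ref T) dg)
    {
        if (left !is null)
            if (auto res = left.opApply(dg)) return res;
        if (auto res = dg(value)) return res;
        if (right !is null)
            if (auto res = right.opApply(dg)) return res;
        return 0;
    }
}
```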
Apr 26 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Michel Fortin (michel.fortin michelf.com)'s article
 On 2009-04-26 11:21:54 -0400, dsimcha <dsimcha yahoo.com> said:
 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
 in any way I'm aware of.  This means that a lazy range to iterate over the
 keys or values is relatively useless because it would generate heap activity
 anyhow to store the stack or queue necessary to iterate over the tree, so you
 may as well just use the current .keys or .values properties, which just
 return an array.  (Because of the way that ranges are designed, you can't just
 use the call stack, unless you use fibers or something, which is obviously not
 a win from an efficiency point of view.)

 Hum, are you saying opApply is superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.

Exactly.  On the other hand, you lose a lot of flexibility with opApply.  If you tried to port most of std.range to operate on the opApply interface instead of the forward range interface, I doubt you'd get very far.

IMHO, though, opApply should *not* be deprecated.  opApply and ranges attempt to solve similar problems, but not the same problem.  Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object.  opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object.  In some cases, like the one you just described, the latter can be better.  However, it might make sense to document and give examples of this tradeoff somewhere.
Apr 26 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-26 11:46:51 -0400, dsimcha <dsimcha yahoo.com> said:

 == Quote from Michel Fortin (michel.fortin michelf.com)'s article
 Hum, are you saying opApply is superior when it comes to iterating in a
 tree? It seems that with opApply you could use the stack by recursively
 calling opApply with the given delegate on each one of the branches.

 Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead of the forward range interface, I doubt you'd get very far. IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better.

Indeed. I certainly agree that both ranges and opApply have their place. So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach. Which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 26 2009
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Michel Fortin (michel.fortin michelf.com)'s article
 Indeed. I certainly agree that both ranges and opApply have their place.
 So what the implementer can do with opApply is write an optimized
 iteration algorithm for use with foreach. Which may mean that when both
 opApply and ranges are available for generating foreach's code, the
 compiler should favor opApply. Currently, I believe it's the reverse.

You probably have a point.  If the range interface is "good enough" for the implementer of the object to write an optimal implementation without defining opApply, then the implementer will probably not define opApply.  On the other hand, it could make sense to define a range interface for when the user of the object needs that flexibility and an opApply interface that is more efficient internally in the same object.

What I think it really boils down to, and what should be emphasized in the documentation, is control of the stack.  If the implementer of the object needs control of the stack during iteration, use opApply to get this control.  Otherwise, use ranges to allow the user more control over the iteration process.

However, what I care more about, upon thinking more about it, is that the concept of an iterable gets defined "officially" in Phobos and in the documentation.  An iterable is any object of type T such that the following code works:

foreach(elem; T.init) {
    // do stuff.
}

This can be considered a superset of input ranges, since all input ranges are iterables but not all iterables are input ranges.  Of course, stuff that works like:

foreach(key, value; T.init) {
    // do stuff.
}

uses opApply but would not be considered an iterable because it iterates more than one item.  Therefore, strictly speaking, the model is broken, but these kinds of oddball situations are few enough and far enough between that I think they can be ignored, at least until the issue of making ranges do this kind of thing too is addressed.

The idea, then, is that if all you need is an iterable for some generic function in Phobos or in any 3rd party library, make the constraint be that it only requires an iterable.  To encourage this, an isIterable template should be included in Phobos, and std.range.ElementType should be generalized to return the element type for all iterables, not just ranges.
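One possible shape for such an isIterable template, keyed on whether a foreach over T.init compiles (a sketch only, not necessarily the form Phobos later adopted):

```d
// Sketch: T is iterable if `foreach (elem; T.init)` compiles.
template isIterable(T)
{
    enum bool isIterable = is(typeof({
        foreach (elem; T.init) {}
    }));
}

// Arrays are iterable; a bare int is not.
static assert(isIterable!(int[]));
static assert(!isIterable!(int));
```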
Apr 26 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-04-26 11:46:51 -0400, dsimcha <dsimcha yahoo.com> said:
 
 == Quote from Michel Fortin (michel.fortin michelf.com)'s article
 Hum, are you saying opApply is superior when it comes to iterating in a
 tree? It seems that with opApply you could use the stack by recursively
 calling opApply with the given delegate on each one of the branches.

 Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead of the forward range interface, I doubt you'd get very far. IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better.

Indeed. I certainly agree that both ranges and opApply have their place. So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach. Which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse.

I think, all other things being equal, that opApply tends to be slower than iterators. Andrei
Apr 26 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Sun, 26 Apr 2009 21:20:32 -0400, Andrei Alexandrescu 
 I think, all other things being equal, that opApply tends to be slower 
 than iterators.

 It depends.  The range must have either inline-able functions, or must NOT call virtual functions.  Otherwise, it could be worse.  There is also the factor that you are using an inner function with opApply, so if you regularly access variables outside the foreach loop, then those add up.

 Observe that in opApply style, you do one delegate call per loop iteration.  With range style, you do three calls on the range per loop iteration (empty, front, popFront).  If those range calls call virtual calls or ARE virtual calls, then range loses.  Of course, the difference might be washed out if you do lots of accesses to an outer variable in your foreach loop, which require a pointer dereference vs. a stack dereference.  We should quantify this and see what the actual performance is.

This is a very good point. If a range offers virtual empty/popFront/front, then opApply is a faster proposition. This makes for a good use case for making opApply the preferred way of iteration, if present (if you just do foreach and opApply is present, use it). Andrei
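The call-count difference discussed above can be made concrete by hand-lowering both foreach forms over one type that supports both protocols (Counter is a made-up example type):

```d
import std.stdio;

// A type supporting both iteration protocols.
struct Counter
{
    uint n, limit;

    // Range protocol: three member calls per iteration.
    bool empty() { return n >= limit; }
    uint front() { return n; }
    void popFront() { n++; }

    // opApply protocol: one indirect (delegate) call per iteration.
    int opApply(int delegate(ref uint) dg)
    {
        for (uint i = n; i < limit; i++)
            if (auto res = dg(i)) return res;
        return 0;
    }
}

void main()
{
    uint sum1, sum2;

    // What foreach over a range roughly lowers to:
    for (auto r = Counter(0, 5); !r.empty(); r.popFront())
        sum1 += r.front();

    // What foreach via opApply roughly lowers to: the loop body becomes
    // a delegate, so `sum2` is reached through the enclosing frame.
    Counter(0, 5).opApply((ref uint i) { sum2 += i; return 0; });

    writeln(sum1, " ", sum2); // both sums are 10
}
```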
Apr 27 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Michel Fortin (michel.fortin michelf.com)'s article
 On 2009-04-26 11:21:54 -0400, dsimcha <dsimcha yahoo.com> said:
 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
 in any way I'm aware of.  This means that a lazy range to iterate over the
 keys or values is relatively useless because it would generate heap activity
 anyhow to store the stack or queue necessary to iterate over the tree, so you
 may as well just use the current .keys or .values properties, which just
 return an array.  (Because of the way that ranges are designed, you can't just
 use the call stack, unless you use fibers or something, which is obviously not
 a win from an efficiency point of view.)

 Hum, are you saying opApply is superior when it comes to iterating in a tree? It seems that with opApply you could use the stack by recursively calling opApply with the given delegate on each one of the branches.

 Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead of the forward range interface, I doubt you'd get very far. IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better. However, it might make sense to document and give examples of this tradeoff somewhere.

1.  Depth of iteration is low, so a short vector optimization will most of the time solve the problem without allocating extra memory.

2.  I haven't measured, but the cost of the indirect call is large enough to make me suspect that opApply is not as efficient as it's cracked to be, even when compared with an iterator.

Andrei
Apr 26 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 2. I haven't measured, but the cost of the indirect call is large enough 
 to make me suspect that opApply is not as efficient as it's cracked to 
 be, even when compared with an iterator.

When you know the type beforehand or can use templates, that is, rather than wrapping your range struct in a wrapper class. If you can't use a template for whatever reason, ranges are going to suck -- three virtual calls rather than one. I don't usually care sufficiently about performance to worry about whether a call is virtual or not, but you brought that issue up before. And I imagine that, most of the time, you will know the range type in advance.
Apr 26 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Andrei Alexandrescu wrote:
 2. I haven't measured, but the cost of the indirect call is large 
 enough to make me suspect that opApply is not as efficient as it's 
 cracked to be, even when compared with an iterator.

When you know the type beforehand or can use templates, that is, rather than wrapping your range struct in a wrapper class. If you can't use a template for whatever reason, ranges are going to suck -- three virtual calls rather than one. I don't usually care sufficiently about performance to worry about whether a call is virtual or not, but you brought that issue up before. And I imagine that, most of the time, you will know the range type in advance.

Yah, that's a good motivation to change how hashes are currently handled. But above I was referring to the cost of opApply's callback, which adds to the costs you mention. Andrei
Apr 26 2009
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Christopher Wright (dhasenan gmail.com)'s article
 Andrei Alexandrescu wrote:
 2. I haven't measured, but the cost of the indirect call is large enough
 to make me suspect that opApply is not as efficient as it's cracked to
 be, even when compared with an iterator.

 When you know the type beforehand or can use templates, that is, rather than wrapping your range struct in a wrapper class. If you can't use a template for whatever reason, ranges are going to suck -- three virtual calls rather than one. I don't usually care sufficiently about performance to worry about whether a call is virtual or not, but you brought that issue up before. And I imagine that, most of the time, you will know the range type in advance.

Rule number 1 of all performance discussions:  Always measure if possible.

import std.stdio, std.perf;

struct Direct {
    uint n;

    uint front() { return n; }
    void popFront() { n++; }
    bool empty() { return n >= 1_000_000_000; }
}

struct Apply {
    int opApply(int delegate(ref uint) dg) {
        int res = 0;
        foreach(i; 0U..1_000_000_000) {
            res = dg(i);
            if(res) break;
        }
        return res;
    }
}

class Virtual {
    uint n;

    uint front() { return n; }
    void popFront() { n++; }
    bool empty() { return n >= 1_000_000_000; }
}

void main() {
    scope pc = new PerformanceCounter;

    pc.start;
    foreach(elem; Direct.init) {}
    pc.stop;
    writeln("Direct:  ", pc.milliseconds);

    pc.start;
    auto v = new Virtual;
    foreach(elem; v) {}
    pc.stop;
    writeln("Virtual:  ", pc.milliseconds);

    pc.start;
    foreach(elem; Apply.init) {}
    pc.stop;
    writeln("opApply:  ", pc.milliseconds);
}

Output:
Direct:  2343
Virtual:  5695
opApply:  3014

Bottom line is that these pretty much come in the order you would expect them to, but there are no particularly drastic differences between any of them.  To put these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a 2.7 GHz machine) 15 clock cycles per iteration.  How often does anyone really have code that is performance critical *and* where the contents of the loop don't take long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you need the iteration to be polymorphic?
Apr 26 2009
next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
dsimcha wrote:
 == Quote from Christopher Wright (dhasenan gmail.com)'s article
 Andrei Alexandrescu wrote:
 2. I haven't measured, but the cost of the indirect call is large enough
 to make me suspect that opApply is not as efficient as it's cracked to
 be, even when compared with an iterator.

 [snip]

 Rule number 1 of all performance discussions:  Always measure if possible.

 [snip benchmark code]

 Output:
 Direct:  2343
 Virtual:  5695
 opApply:  3014

 Bottom line is that these pretty much come in the order you would expect them to, but there are no particularly drastic differences between any of them. [snip]

Did you compile that with optimization/inlining? If not, it's not really a fair test... and if so, why isn't the compiler getting rid of those pointless loops?
Apr 26 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Robert Fraser (fraserofthenight gmail.com)'s article
 Did you compile that with optimization/inlining? If not, it's not really
 a fair test... and if so why isn't the compiler getting rid of those
 pointless loops?

Yes, -O -inline -release. I didn't bother stating that explicitly b/c it's the standard way of doing benchmarks.
Apr 27 2009
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-27 00:50:23 -0400, dsimcha <dsimcha yahoo.com> said:

 Output:
 Direct:  2343
 Virtual:  5695
 opApply:  3014

Nice. Isn't there room for optimisation on the compiler side though? I mean, the compiler could inline opApply, and while doing so it could notice that the delegate is constant and inline it too. I suspect that adding this optimisation would put opApply at the same performance level as ranges.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 27 2009
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Michel Fortin wrote:
 On 2009-04-27 00:50:23 -0400, dsimcha <dsimcha yahoo.com> said:
 
 Output:
 Direct:  2343
 Virtual:  5695
 opApply:  3014

Nice. Isn't there room for optimisation on the compiler side though? I mean, the compiler could inline opApply, and while doing so it could notice that the delegate is constant and inline it too. I suspect that adding this optimisation would put opApply at the same performance level as ranges.

Inlining does not automatically make things faster. Case in point: downs' raytracer stacy actually slows down when you inline certain parts. The penalty of not fitting in cache was overwhelming the penalty from using virtual methods. -- Daniel
Apr 27 2009
parent bearophile <bearophileHUGS lycos.com> writes:
 Michel Fortin:
 Isn't there room for optimisation on the compiler side though? I mean,
 the compiler could inline opApply, and while doing so it could notice
 that the delegate is constant and inline it too. I suspect that adding
 this optimisation would put opApply at the same performance level as
 ranges.


Daniel Keep:
 Inlining does not automatically make things faster.

That's true in general: if you inline too much code you may end up with too much code to be moved inside and out of the caches, reducing the performance. I agree with Michel, if done wisely by the compiler some inlining may help with such loops. Bye, bearophile
Apr 27 2009
prev sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Michel Fortin wrote:
 On 2009-04-27 00:50:23 -0400, dsimcha <dsimcha yahoo.com> said:
 
 Output:
 Direct:  2343
 Virtual:  5695
 opApply:  3014

Nice. Isn't there room for optimisation on the compiler side though? I mean, the compiler could inline opApply, and while doing so it could notice that the delegate is constant and inline it too.

For non-virtual opApply this is exactly what LDC does[1] at the higher optimization levels :). (Assuming both opApply and the foreach body are deemed inline-worthy by the inliner) [1]: Since trunk r1219, which was about 11 days ago.
 I suspect that adding
 this optimisation would put opApply at the same performance level as
 ranges.

I haven't actually measured to see if this is true, but there should indeed be very little difference since this optimization essentially turns opApply into a regular loop (and LDC does this before any loop optimizations run).
Apr 27 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Frits van Bommel:
 I haven't actually measured to see if this is true, but there should indeed be 
 very little difference since this optimization essentially turns opApply into
a 
 regular loop (and LDC does this before any loop optimizations run).

When you find a bit of free time please time it and show the timings :-) Bye, bearophile
Apr 27 2009
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 Rule number 1 of all performance discussions:  Always measure if possible.

Very good, thank you. I have run your code, with a bit of changes:

import std.stdio: writeln;
import std.perf: PerformanceCounter;

enum uint N = 1_000_000_000;

struct Direct {
    uint n;

    uint front() { return n; }
    void popFront() { n++; }
    bool empty() { return n >= N; }
}

struct Apply {
    int opApply(int delegate(ref uint) dg) {
        int res;
        foreach (i; 0U .. N) {
            res = dg(i);
            if (res) break;
        }
        return res;
    }
}

class Virtual {
    uint n;

    uint front() { return n; }
    void popFront() { n++; }
    bool empty() { return n >= N; }
}

void main() {
    scope pc = new PerformanceCounter;

    pc.start;
    foreach (elem; Direct.init) {}
    pc.stop;
    writeln("Direct: ", pc.milliseconds);

    pc.start;
    foreach (elem; Apply.init) {}
    pc.stop;
    writeln("opApply: ", pc.milliseconds);

    pc.start;
    auto v = new Virtual;
    foreach (elem; v) {}
    pc.stop;
    writeln("Virtual: ", pc.milliseconds);

    pc.start;
    scope v2 = new Virtual;
    foreach (elem; v2) {}
    pc.stop;
    writeln("Scoped virtual: ", pc.milliseconds);
}

As you see I have added a version with a scoped class at the bottom hoping to improve performance a bit, but the result is the opposite; can you explain why?

Direct: 2699
opApply: 3520
Virtual: 7543
Scoped virtual: 8550

Run on a Core2 2 GHz, on WinXP.

Bye,
bearophile
Apr 27 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 Output:
 Direct:  2343
 Virtual:  5695
 opApply:  3014
 
 Bottom line is that these pretty much come in the order you would expect them
to,
 but there are no particularly drastic differences between any of them.  To put
 these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a
 2.7 GHz machine) 15 clock cycles per iteration.  How often does anyone really
have
 code that is performance critical *and* where the contents of the loop don't
take
 long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you
 need the iteration to be polymorphic?

Thanks for the numbers. The way I'd read them is not to look at the milliseconds but at the ratios. Loops with opApply have an extra 28% overhead, and I think that doesn't make them look very good. Bearophile's numbers show a 30% overhead. I have measurements that show more dramatic pessimization, but I never keep those around. I know from my early tests for std.algorithm that had I used opApply in it, I would have essentially never used an algorithm if I could help it. Andrei
Apr 27 2009
prev sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
dsimcha wrote:
[snip]
 
 Output:
 Direct:  2343
 Virtual:  5695
 opApply:  3014
 
 Bottom line is that these pretty much come in the order you would expect them
to,
 but there are no particularly drastic differences between any of them.  To put
 these timings in perspective, 5700 ms for 1 billion iterations is roughly (on a
 2.7 GHz machine) 15 clock cycles per iteration.  How often does anyone really
have
 code that is performance critical *and* where the contents of the loop don't
take
 long enough to dwarf the 15 clock cycles per iteration loop overhead *and* you
 need the iteration to be polymorphic?

I edited this code to work with ldc (D1) + Tango, and saw the Direct and opApply cases generate identical code (inc, cmp, jne, with the loop counter in a register) [1], so they're equally fast (modulo process scheduling randomness). Virtual was roughly 10 times slower on my machine (with ldc).

Unfortunately, I can't directly compare timings between ldc and dmd because dmd is likely at a disadvantage due to being 32-bit in a 64-bit world. Although... the Virtual case takes about equal time with ldc- and dmd-compiled code, so maybe the slowness of Direct/dmd when compared to Direct/ldc (the dmd code is a factor 3 slower) is due to it apparently not register-allocating the loop variable. The opApply case was another factor 2 slower than Direct with dmd on my machine, probably because opApply and the loop body don't get inlined.

It seems gdc is the only compiler to realize the first loop can be completely optimized away. It's again about equally fast for Virtual, but for opApply it's roughly a factor 3 slower than ldc; it seems to inline only opApply itself, not the loop body.

[1]: -O3 -release (with inlining), x86_64
Apr 27 2009
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-27 10:51:22 -0400, Frits van Bommel 
<fvbommel REMwOVExCAPSs.nl> said:

 I edited this code to work with ldc (D1) + Tango, and saw the Direct 
 and opApply cases generate identical code (inc, cmp, jne, with the loop 
 counter in a register) [1], so they're equally fast (modulo process 
 scheduling randomness).

Thank you for your timings. I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer.

I'm thinking... with proper inlining, perhaps we could take the notion of ranges out of the compiler and just define a generic opApply in std.range that uses front, popFront, and empty. :-) Perhaps.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 28 2009
parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Michel Fortin wrote:
 On 2009-04-27 10:51:22 -0400, Frits van Bommel
 <fvbommel REMwOVExCAPSs.nl> said:
 
 I edited this code to work with ldc (D1) + Tango, and saw the Direct
 and opApply cases generate identical code (inc, cmp, jne, with the
 loop counter in a register) [1], so they're equally fast (modulo
 process scheduling randomness).

 Thank you for your timings. I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer.

Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.
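A sketch of Daniel's point: with ranges, the caller holds both sequences and advances them in lockstep, which a callee-driven opApply cannot do without fibers. (This hand-rolled Zip/zip is an illustration, not the Phobos API of the time.)

```d
import std.stdio;
import std.typecons : tuple;

// Lockstep iteration over two ranges: only possible because the
// *caller* controls when each range advances.
struct Zip(R1, R2)
{
    R1 a;
    R2 b;

    bool empty() { return a.empty || b.empty; }
    auto front() { return tuple(a.front, b.front); }
    void popFront() { a.popFront(); b.popFront(); }
}

auto zip(R1, R2)(R1 a, R2 b) { return Zip!(R1, R2)(a, b); }

void main()
{
    import std.range : iota;

    foreach (pair; zip(iota(0, 3), iota(10, 13)))
        writeln(pair[0], " ", pair[1]); // prints "0 10", "1 11", "2 12"
}
```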
 I'm thinking... with proper inlining, perhaps we could take the notion
 of ranges out of the compiler and just define a generic opApply in
 std.range that use front, popFront, and empty. :-) Perhaps.

I suspect supporting ranges is just much easier. -- Daniel
Apr 28 2009
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-28 06:33:13 -0400, Daniel Keep <daniel.keep.lists gmail.com> said:

 Michel Fortin wrote:
 On 2009-04-27 10:51:22 -0400, Frits van Bommel
 <fvbommel REMwOVExCAPSs.nl> said:
 
 I edited this code to work with ldc (D1) + Tango, and saw the Direct
 and opApply cases generate identical code (inc, cmp, jne, with the
 loop counter in a register) [1], so they're equally fast (modulo
 process scheduling randomness).

 Thank you for your timings. I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer.

Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.

I guess I removed too much context from the above posts. We're just timing various foreach implementations. You're right when you say ranges are more versatile than opApply, and I'm all for keeping both ranges and opApply. I just want the compiler to prefer opApply over ranges when both are available when generating code for foreach, since with opApply you can sometimes optimize things in a way that you can't with ranges.
 I'm thinking... with proper inlining, perhaps we could take the notion
 of ranges out of the compiler and just define a generic opApply in
 std.range that use front, popFront, and empty. :-) Perhaps.

I suspect supporting ranges is just much easier.

Sure, especially since they're already implemented in the compiler. Inlining of delegates known at compile-time would have a greater reach though.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 28 2009
prev sibling next sibling parent reply downs <default_357-line yahoo.de> writes:
Daniel Keep wrote:
 
 Michel Fortin wrote:
 On 2009-04-27 10:51:22 -0400, Frits van Bommel
 <fvbommel REMwOVExCAPSs.nl> said:

 I edited this code to work with ldc (D1) + Tango, and saw the Direct
 and opApply cases generate identical code (inc, cmp, jne, with the
 loop counter in a register) [1], so they're equally fast (modulo
 process scheduling randomness).

 Thank you for your timings. I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer.

Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
 I'm thinking... with proper inlining, perhaps we could take the notion
 of ranges out of the compiler and just define a generic opApply in
 std.range that use front, popFront, and empty. :-) Perhaps.

I suspect supporting ranges is just much easier. -- Daniel

Apr 28 2009
parent reply grauzone <none example.net> writes:
downs wrote:
 Daniel Keep wrote:
 Michel Fortin wrote:
 On 2009-04-27 10:51:22 -0400, Frits van Bommel
 <fvbommel REMwOVExCAPSs.nl> said:

 I edited this code to work with ldc (D1) + Tango, and saw the Direct
 and opApply cases generate identical code (inc, cmp, jne, with the
 loop counter in a register) [1], so they're equally fast (modulo
 process scheduling randomness).

I think it shows my point: that by preferring ranges over opApply we're just optimising around a deficiency in DMD's optimizer.

Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.

First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack? Second, isn't the overhead still relatively high compared to simple function calls?
Apr 28 2009
parent reply "Joel C. Salomon" <joelcsalomon gmail.com> writes:
grauzone wrote:
 downs wrote:
 Daniel Keep wrote:
 Here's an excellent reason to use ranges over opApply: you
 can't define zip with opApply.  Because opApply uses inversion of
 control, you can't use more than one without bringing threads into the
 equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.

First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?

A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each. —Joel Salomon
Apr 28 2009
parent reply downs <default_357-line yahoo.de> writes:
Joel C. Salomon wrote:
 grauzone wrote:
 downs wrote:
 Daniel Keep wrote:
 Here's an excellent reason to use ranges over opApply: you
 can't define zip with opApply.  Because opApply uses inversion of
 control, you can't use more than one without bringing threads into the
 equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.

First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?

A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each. —Joel Salomon

Furthermore, with anonymous mmapped memory the pages are only allocated when accessed - the only thing taken up front is address space. Which, granted, can be limiting. Luckily 64-bit will fix that. :)
Apr 28 2009
parent reply Georg Wrede <georg.wrede iki.fi> writes:
downs wrote:
 Joel C. Salomon wrote:
 grauzone wrote:
 downs wrote:
 Daniel Keep wrote:
 Here's an excellent reason to use ranges over opApply: you
 can't define zip with opApply.  Because opApply uses inversion of
 control, you can't use more than one without bringing threads into the
 equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.

First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?

A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each.


The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs.

*If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed. This would be /very/ convenient once we start using thread-specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case". Andrei, Walter???

PS, on the 6502 (the C64 CPU), the stack was hard-wired to 0100-01FF. This made the processor vastly simpler, which was essential for cost savings in an era where an additional transistor in the processor cost an arm and a leg. The area before the stack was used with 8-bit addressing, which was faster, and traditionally it was used more or less as an extension to the processor registers, because the 6502 only had the X, Y, A, PC, and Flags registers. The area after the stack was free for the circuit-board designer to use in any way he liked. On the C64 this was used for ROM, memory-mapped peripherals, IO, the heap, and the program area. Interestingly, self-modifying code was used for keyboard input, to enhance speed and code size. This concept was copied from the VIC-20, which came with 3k of usable RAM.
Apr 29 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 downs wrote:
 Joel C. Salomon wrote:
 grauzone wrote:
 downs wrote:
 Daniel Keep wrote:
 Here's an excellent reason to use ranges over opApply: you
 can't define zip with opApply.  Because opApply uses inversion of
 control, you can't use more than one without bringing threads into 
 the
 equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.

First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?

A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each.


The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs. *If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed. This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case". Andrei, Walter???

Since the C64, programs have gotten a lot bigger, programming style favors deeper call stacks, and (mutually) recursive functions are not a rarity. Computing the stack depth even for non-recursive functions is an interprocedural analysis, so it's difficult to do modularly. The stack is also unchecked, so going with a "good enough" approximation is bound to be risky. Andrei
Apr 29 2009
next sibling parent reply Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 downs wrote:
 Joel C. Salomon wrote:
 grauzone wrote:
 downs wrote:
 Daniel Keep wrote:
 Here's an excellent reason to use ranges over opApply: you
 can't define zip with opApply.  Because opApply uses inversion of
 control, you can't use more than one without bringing threads 
 into the
 equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.

First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?

A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each.


The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs. *If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed. This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case". Andrei, Walter???

Since C64 programs have gotten a lot bigger, programming style favors deeper call stacks, and (mutually) recursive functions are not a rarity. Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.

The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too.

A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area and find the first address that contains a non-zero value. This isn't bulletproof in theory, but hey, the aim is to help in /estimating/ the needed stack size. If this shows the maximum stack usage as 160 bytes, then today's youngsters could allocate, say, 0.25k (or even 1k) instead of the 50k they'd otherwise allocate.

Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too.
Apr 29 2009
next sibling parent reply "Joel C. Salomon" <joelcsalomon gmail.com> writes:
Georg Wrede wrote:
 *If* we had a convenient way to record the high-water mark of stack
 usage for functions (and maybe also threads), then people could have new
 insights on how little stack space is needed.
 
 This would be /very/ convenient once we start using thread specific
 stacks, because the stack space has to be allocated in advance,
 hopefully not wasting huge amounts of space "just in case".

What do you mean by “thread-specific stacks”?
 Incidentally, this raises another thought: what if the stack was
 extendable? Like D arrays? A stack overflow would trigger a relocation
 of the stack, and then continue. Being able to have hundreds of small
 stacks would be useful in other things than threads, too.

How much rewriting of the stack would have to happen? Relocating the stack will invalidate any pointers to automatic (in the C sense) variables. Detecting stack overflow at runtime might also not be trivial. —Joel Salomon
Apr 29 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Joel C. Salomon wrote:
 Detecting stack overflow at runtime might also not be trivial.

In a way it is trivial... if you accept a 7% added overhead per function call :o). (The number is what I remember it cost in Borland Pascal.) Andrei
Apr 29 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 [Detecting stack overflow at runtime] In a way it is trivial... if you accept
a 7% added overhead per function call :o). (The number is what I remember
costed Borland Pascal.)<

Now Delphi and FreePascal allow you to switch that on and off. From the page http://www.freepascal.org/docs-html/prog/progsu100.html :

1.2.24 $S : Stack checking

The {$S+} directive tells the compiler to generate stack checking code. This generates code to check if a stack overflow occurred, i.e. to see whether the stack has grown beyond its maximally allowed size. If the stack grows beyond the maximum size, then a run-time error is generated, and the program will exit with exit code 202. Specifying {$S-} will turn generation of stack-checking code off. The command line compiler switch -Ct has the same effect as the {$S+} directive. By default, no stack checking is performed.

I have never timed programs to test how much you have to pay if you want stack-checking code. The compiler is free, and suitable testing programs can be found in the Shootout site, so if you want you don't need much time to test this.

Bye, bearophile
Apr 29 2009
parent Daniel Keep <daniel.keep.lists gmail.com> writes:
bearophile wrote:
 Andrei Alexandrescu:
 [Detecting stack overflow at runtime] In a way it is trivial... if you accept
a 7% added overhead per function call :o). (The number is what I remember
costed Borland Pascal.)<

Now Delphi and FreePascal allow to switch that on and off. From the page: http://www.freepascal.org/docs-html/prog/progsu100.html 1.2.24 $S : Stack checking The {$S+} directive tells the compiler to generate stack checking code. This generates code to check if a stack overflow occurred, i.e. to see whether the stack has grown beyond its maximally allowed size. If the stack grows beyond the maximum size, then a run-time error is generated, and the program will exit with exit code 202. Specifying {$S-} will turn generation of stack-checking code off. The command line compiler switch -Ct has the same effect as the {$S+} directive. By default, no stack checking is performed. I have never timed programs to test how much you have to pay if you want stack checking code. The compiler is free, and suitable testing programs can be found in the Shootout site, so if you want you don't need much time to test this. Bye, bearophile

Doesn't the OS protect you from stack overflows? -- Daniel
Apr 29 2009
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 29 de abril a las 09:58 me escribiste:
 Joel C. Salomon wrote:
Detecting stack overflow at runtime might also not be trivial.

In a way it is trivial... if you accept a 7% added overhead per function call :o). (The number is what I remember costed Borland Pascal.)

Using page protection has no overhead, unless you actually hit a stack overflow. -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ----------------------------------------------------------------------------
Apr 29 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Computing the stack depth even for non-recursive functions is an 
 interprocedural analysis so it's difficult to do modularly. The stack 
 is also unchecked so going with a "good enough" approximation is bound 
 to be risky.

The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.

You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."
 Incidentally, this raises another thought: what if the stack was 
 extendable? Like D arrays? A stack overflow would trigger a relocation 
 of the stack, and then continue. Being able to have hundreds of small 
 stacks would be useful in other things than threads, too.

That would be great. I don't know whether it can be done.

By the way, I think you're underestimating how much stack the OS gives to the stack. On my system:

$ grep PTHREAD_STACK /usr/include/**/*.h
/usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 16384
/usr/include/pthread.h:   minimal size of the block must be PTHREAD_STACK_MIN.*/
/usr/include/pthread.h:   to be started.  This size must never be less than PTHREAD_STACK_MIN
$ _

You can't have a stack smaller than 16KB as far as I understand. I seem to recall the default stack size is much bigger.

Andrei
Apr 29 2009
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Computing the stack depth even for non-recursive functions is an 
 interprocedural analysis so it's difficult to do modularly. The stack 
 is also unchecked so going with a "good enough" approximation is 
 bound to be risky.

The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.


You could have the GC report stack sizes when it performs a collection. No guarantee it would be a high-water mark, but it's something.
 Incidentally, this raises another thought: what if the stack was 
 extendable? Like D arrays? A stack overflow would trigger a relocation 
 of the stack, and then continue. Being able to have hundreds of small 
 stacks would be useful in other things than threads, too.

That would be great. I don't know whether it can be done.

Windows works this way, and it's possible to do in user code (the Fiber implementation has some of the groundwork for this). Beyond "usable" stack is a guard page that triggers a stack extension when hit. But I think if you grow the stack past the guard page without ever touching it you can still blow the stack.
 By the way, I think you're underestimating how much stack the OS gives 
 to the stack. On my system:
 
 $ grep PTHREAD_STACK /usr/include/**/*.h
 /usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 16384
 /usr/include/pthread.h:   minimal size of the block must be 
 PTHREAD_STACK_MIN.*/
 /usr/include/pthread.h:   to be started.  This size must never be less 
 than PTHREAD_STACK_MIN
 $ _
 
 You can't have a stack smaller than 16KB as far as I understand. I seem 
 to recall the default stack size is much bigger.

On Windows I think the default size is 1MB.
Apr 29 2009
prev sibling parent reply Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Computing the stack depth even for non-recursive functions is an 
 interprocedural analysis so it's difficult to do modularly. The stack 
 is also unchecked so going with a "good enough" approximation is 
 bound to be risky.

The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.

You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."

I'd be amazed if the stack usage of hello.d varied from run to run.
 Incidentally, this raises another thought: what if the stack was 
 extendable? Like D arrays? A stack overflow would trigger a relocation 
 of the stack, and then continue. Being able to have hundreds of small 
 stacks would be useful in other things than threads, too.

That would be great. I don't know whether it can be done.

Two methods come to mind; see (1) below.
 By the way, I think you're underestimating how much stack the OS gives 
 to the stack. On my system:
 
 $ grep PTHREAD_STACK /usr/include/**/*.h
 /usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 16384
 /usr/include/pthread.h:   minimal size of the block must be 
 PTHREAD_STACK_MIN.*/
 /usr/include/pthread.h:   to be started.  This size must never be less 
 than PTHREAD_STACK_MIN

Seems like that minimum means the OS won't start a new thread if there's less [stack] space than that left on the computer. Gotta be /some/ value, big enough to be fine with virtually any app.
 You can't have a stack smaller than 16KB as far as I understand. I seem 
 to recall the default stack size is much bigger.

Totally not having researched this... The page http://www.unix.com/high-level-programming/34632-how-find-out-stack-size-occupied-process.html states that "Most Linux systems have no stack limits set" (didn't have time to find a definitive reference, yet).

Also, PTHREAD_STACK_MIN would be the main process stack size. If we're running co-operative multitasking, or continuations or some such, then the other stacks really can be smaller. The important thing to know is, does "somebody else" than our program use the current stack? If not, then we're free to allocate "just enough" (in practice much more) to the stacks. IIUC, our stack is only used by our app, its library calls, and the OS API calls.

(1) Using several stacks:

What does one need to use several stacks? Let's assume we have the "regular" stack, and then we temporarily want to use another. First we allocate space for the new stack. Then we disable interrupts (we could even save processor state) and change the stack pointer to point to the new stack. Enable interrupts, and go on with the program (i.e. call the routine we want to have using the new stack). Once it finishes, disable interrupts, reset the stack pointer, enable interrupts.

Knowing in advance the needed stack space helps here. Especially if instances of the same code are to run with each stack, then all we need is an array of stack spaces. :-)

The other thing: suppose we really don't know the stack usage. Then each routine can use the same stack while it runs, and at task switch we copy the actually used portion of the stack to the heap. Next time we want to run that task, we copy the data back from the heap. I admit this isn't done in an evening, but it's doable.
Apr 29 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Computing the stack depth even for non-recursive functions is an 
 interprocedural analysis so it's difficult to do modularly. The 
 stack is also unchecked so going with a "good enough" approximation 
 is bound to be risky.

The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.

You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."

I'd be amazed if the stack usage of hello.d varied from run to run.

I'd be amazed if hello.d were your prototype of a useful program :o). Let's face it. Any program that uses mutual recursion of any kind (even when the user did not set out to use recursion) or that uses alloca (fewer do), is bound to have the consumed stack dependent on the input. Even without recursion, various traces of a program depend on the data and have different stack depths. That's why your estimate is utterly useless. I can't believe you brought it up as anything else but jesting (maybe you are and I'm not getting it). I can't even believe I'm replying! :o) Andrei
Apr 29 2009
parent reply Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Computing the stack depth even for non-recursive functions is an 
 interprocedural analysis so it's difficult to do modularly. The 
 stack is also unchecked so going with a "good enough" approximation 
 is bound to be risky.

The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.

You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."

I'd be amazed if the stack usage of hello.d varied from run to run.

I'd be amazed if hello.d were your prototype of a useful program :o).

Somebody more resourceful than I might /prove/ some maximum for any program that doesn't use recursion (be it implicit or explicit recursion). Hello.d attempted to be an example of such a program.
 Let's face it. Any program that uses mutual recursion of any kind (even 
 when the user did not set out to use recursion) or that uses alloca 
 (fewer), is bound to have the consumed stack dependent on the input. 

Of course. And there's a limit to the stack size, ultimately at the hardware level. For any increase in available stack size, the probability of an out-of-stack exception decreases, never reaching zero. I agree.
 Even without recursion, various traces of a program depend on a data and 
 have different stack depths. That's why your estimate is utterly 
 useless.

It'd be useless if we were attempting to give 100.000% guarantees.
 I can't believe you brought it up as anything else but jesting 
 (maybe you are and I'm not getting it).

Jesting on a level where you might not get it, wouldn't be fun. And definitely boring and confusing to the rest of the crowd. :-) Maybe I'll have to dig up prior art, or something. Clearly I'm not qualified to properly defend the validity of this idea.
Apr 29 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 Somebody more resourceful than I might /prove/ some maximum for any 
 program that doesn't use recursion (be it implicit or explicit 
 recursion). Hello.d attempted to be an example of such a program.

There's no need for a lot of expertise, it's a trivial fact for anyone who's written even a toy compiler. The tidbit that's difficult is the interprocedural part: in order to compute the stack consumed by any function, you need to know the stack consumed by all functions it invokes.
 Maybe I'll have to dig up prior art, or something. Clearly I'm not 
 qualified to properly defend the validity of this idea.

I think my failure is seeing what the idea is. My understanding is that you use one trace of a run to estimate the (average? minimum? maximum?) stack needed by future runs of a thread. It's unclear to me how that estimate would be used, what sort of information it provides (if any - I highly doubt it tells much), and generally how you'd integrate that into a compiler. Andrei
Apr 29 2009
parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Somebody more resourceful than I might /prove/ some maximum for any 
 program that doesn't use recursion (be it implicit or explicit 
 recursion). Hello.d attempted to be an example of such a program.

There's no need for a lot of expertise, it's a trivial fact for anyone who's written even a toy compiler. The tidbit that's difficult is the interprocedural part: in order to compute the stack consumed by any function, you need to know the stack consumed by all functions it invokes.
 Maybe I'll have to dig up prior art, or something. Clearly I'm not 
 qualified to properly defend the validity of this idea.

I think my failure is seeing what the idea is. My understanding is that you use one trace of a run to estimate the (average? minimum? maximum?) stack needed by future runs of a thread. It's unclear to me how that estimate would be used, what sort of information it provides (if any - I highly doubt it tells much), and generally how you'd integrate that into a compiler.

I think in the first instance it was at least to get an idea of how much stack space you're typically using. I must confess to having absolutely no idea of how much stack space my D apps are using. It'd be an interesting thing to know.
 
 
 Andrei

Apr 29 2009
prev sibling parent reply Georg Wrede <georg.wrede iki.fi> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Computing the stack depth even for non-recursive functions is an 
 interprocedural analysis so it's difficult to do modularly. The 
 stack is also unchecked so going with a "good enough" 
 approximation is bound to be risky.

The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.

You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it."

I'd be amazed if the stack usage of hello.d varied from run to run.

I'd be amazed if hello.d were your prototype of a useful program :o).

Somebody more resourceful than I might /prove/ some maximum for any program that doesn't use recursion (be it implicit or explicit recursion). Hello.d attempted to be an example of such a program.
 Let's face it. Any program that uses mutual recursion of any kind 
 (even when the user did not set out to use recursion) or that uses 
 alloca (fewer), is bound to have the consumed stack dependent on the 
 input. 

Of course. And there's a limit to the stack size, ultimately at the hardware level. For any increase in available stack size, the probability of an out-of-stack exception decreases, never reaching zero. I agree.
 Even without recursion, various traces of a program depend on a data 
 and have different stack depths. That's why your estimate is utterly 
 useless.

It'd be useless if we were attempting to give 100.000% guarantees.
 I can't believe you brought it up as anything else but jesting (maybe 
 you are and I'm not getting it).


Having looked into things, it turns out I'm the one that now suspects you of jesting.
 Maybe I'll have to dig up prior art, or something. Clearly I'm not 
 qualified to properly defend the validity of this idea.

There is prior art, by the truckload. A /very/ short slide show here: http://zope.stackless.com/StacklessEuroPy.ppt

I'd kill to get that in D.

And about your comments on stack size, seems regular Python has a built-in limit on recursion, at 1000 deep. That should be diametrically opposite your stance on stack size.
Apr 30 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 I can't believe you brought it up as anything else but jesting (maybe 
 you are and I'm not getting it).


Having looked into things, it turns out I'm the one that now suspects you of jesting.

What I referred to was inferring a thread's actual stack requirements from one trace of its run. That's just untenable.
 Maybe I'll have to dig up prior art, or something. Clearly I'm not 
 qualified to properly defend the validity of this idea.

There is prior art, by the truckload. A /very/ short slide show here: http://zope.stackless.com/StacklessEuroPy.ppt I'd kill to get that in D.

Interesting, but you can't claim you were talking about that all along. I mean, come on! You say one thing that was untenable, and now you come up with a different one that is tenable. And you were right all along?
 And about your comments on stack size, seems regular Python has an 
 in-built limit on recursion, at 1000 deep. That should be diametrically 
 opposite your stance on stack size.

Python has apparently set a maximum to have a guarantee it won't crash when given a certain minimum stack size. That's nice, but I'm not quite seeing how that's opposite to what I said in this discussion. Andrei
Apr 30 2009
parent reply Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 I can't believe you brought it up as anything else but jesting 
 (maybe you are and I'm not getting it).


Having looked into things, it turns out I'm the one that now suspects you of jesting.


Could it be that you're looking at this too much through the eyes of a compiler developer? With source code that can include templates, it should be no surprise that a programmer can write a file which results in preposterous stack needs. This is certainly the case with C++, for example.
 What I referred to was inferring a thread's actual stack requirements 
 from one trace of its run. 

Of course you'd give different input data, a little like you give different input data to your unit tests.
 That's just untenable.

Agreed, IF there's a possibility of recursion happening, the variation in input data is not guaranteed, etc. But for practical purposes (read: most of my programs, and presumably those of the other programmers'), the situation is different.

For a particular application, we expect some explicit or implicit constraints on the input data. We also know (er, expect) whether (and to what extent) there will be recursion. Additionally (as with any real-life application), we accept that if the input data goes wild, the application /doesn't/ have to cope with it. (Making nuke-proof apps where reasonably good is sufficient is just gratuitous work.)

For example, a robust application, instead of accepting and coping with *anything*, instead has *documented limitations*. IMHO, that is the right way to "fix infinity problems". Not demanding limitless stack for all kinds of programs.
 Maybe I'll have to dig up prior art, or something. Clearly I'm not 
 qualified to properly defend the validity of this idea.

There is prior art, by the truckload. A /very/ short slide show here: http://zope.stackless.com/StacklessEuroPy.ppt I'd kill to get that in D.

Interesting, just you can't claim you were talking about that all along. I mean, come on! You say one thing that was untenable, and now you come up with a different one that is. And you were right all along?

I was actually talking about *two* things. If you remember, I was talking about measuring actually used stack depth. Then I got the idea of having separate stacks for threads. Two separate things in a post. (Now that I think more about it, I must have heard about stackless Python somewhere. But as I've seen others do, this time I, too, believed I invented something that I've actually seen somewhere.)
 And about your comments on stack size, seems regular Python has an 
 in-built limit on recursion, at 1000 deep. That should be 
 diametrically opposite your stance on stack size.

Python has apparently set a maximum to have a guarantee it won't crash when given a certain minimum stack size. That's nice, but I'm not quite seeing how that's opposite to what I said in this discussion.

Nope. It's because

- They don't want the entire computer thrashing each time some idiot has a typo in his recursive algorithm.
- They genuinely don't think you usually need preposterous levels of recursion. And if you do, then just change the limit.

Python is for Real Computers (as opposed to Windows). There it's polite to make your apps (here, the Python interpreter) have some decency as to how many resources they take. The idea is that there are hundreds of other users, and dozens of daemons running critical apps, round the clock.

If developers didn't know (by measurement or analysis) the reasonable stack size needed, there would be no mobile phones, no set-top boxes with hard disks, no GPS receivers, no digital PBX appliances, no solid-state firewalls, no ADSL boxes, no programmable calculators... Dammit, any pocket calculator that has parenthesis buttons needs a stack. And that stack space is predetermined by the manufacturer.

I'm still maintaining that the stack space needs of regular applications are much less than people tend to think. Last time I checked, you ran out of stack on a name-brand calculator with just ten levels of parentheses. That's not much. On the other hand, in real life, not too many people in the world have ever seen a calculator run out of stack space.
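Georg's pocket-calculator point can be illustrated with a short sketch (Python, with a hypothetical MAX_DEPTH of 10 standing in for the calculator's fixed hardware stack):

```python
# Sketch of the calculator argument: a fixed, small stack limit
# (MAX_DEPTH is hypothetical, standing in for the roughly ten
# parenthesis levels a name-brand calculator allows) handles almost
# all real-world input.
MAX_DEPTH = 10

def max_paren_depth(expr):
    """Track nesting depth of parentheses, failing past MAX_DEPTH."""
    depth = deepest = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            if depth > MAX_DEPTH:
                raise OverflowError("calculator stack overflow")
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest

print(max_paren_depth("(1 + (2 * (3 - 4)))"))  # -> 3
```

The point being made: the limit is tiny and predetermined, yet almost nobody ever hits it.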
Apr 30 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 Georg Wrede wrote:
 I can't believe you brought it up as anything else but jesting 
 (maybe you are and I'm not getting it).


Having looked into things, it turns out I'm the one that now suspects you of jesting.


Could it be that you're looking at this too much through the eyes of a compiler developer? With source code that can include templates, it should be no surprise that a programmer can write a file which results in preposterous stack needs. This is certainly the case with C++, for example.

I am sorry, I need to fix a few mistakes in this post and then I'll have to leave this conversation as it becomes ungainful for us both. The above is wrong, or at least contains a claim that you'd need to support better. Stack size requirements grow with the prevalence of short functions. Those are a style of programming not particularly correlated with templates. I don't have lots of small functions that call one another in templated code, any more than in regular code.
 What I referred to was inferring a thread's actual stack requirements 
 from one trace of its run. 

Of course you'd give different input data, a little like you give different input data to your unit tests.
 That's just untenable.

Agreed, IF there's a possibility of recursion happening, the variation in input data is not guaranteed, etc. But for practical purposes (read: most of my programs, and presumably those of the other programmers'), the situation is different.

This is utter nonsense. Practical vs. theoretical has absolutely nothing to do with this at all. Forget recursion. I'm talking about straight "if"s that create different program traces that in turn require different stack sizes. Collecting one and saying it's fine "for practical purposes" is just nonsense. What's so complicated about this?
 For a particular application, we expect some explicit or implicit 
constraints on the input data. We also know (er, expect) whether (and to 
 what extent) there will be recursion. Additionally (as with any 
 real-life application), we accept that if the input data goes wild, the 
 application /doesn't/ have to cope with it. (Making nuke proof apps 
where reasonably good is sufficient, is just gratuitous work.)

Again, just to drive the point home - recursion has nothing to do with it. Your inferences are starting from a faulty premise.
 For example, a robust application, instead of accepting and coping with 
 *anything*, instead has *documented limitations*. IMHO, that is the 
 right way to "fix infinity problems". Not demanding limitless stack for 
 all kinds of programs.
 
 Maybe I'll have to dig up prior art, or something. Clearly I'm not 
 qualified to properly defend the validity of this idea.

There is prior art, by the truckload. A /very/ short slide show here: http://zope.stackless.com/StacklessEuroPy.ppt I'd kill to get that in D.

Interesting, just you can't claim you were talking about that all along. I mean, come on! You say one thing that was untenable, and now you come up with a different one that is. And you were right all along?

I was actually talking about *two* things. If you remember, I was talking about measuring actually used stack depth. Then I got the idea of having separate stacks for threads. Two separate things in a post.

Threads already have separate stacks. (You might have said more.)
 (Now that I think more about it, I must have heard about stackless 
 Python somewhere. But as I've seen others do, this time I, too, believed 
 I invented something that I've actually seen somewhere.)

Fortran was stackless for... for a long time :o). An activation record in Fortran is in static memory. To call a Fortran function, you deposit arguments in that static storage, issue the CALL, and wait.
 And about your comments on stack size, seems regular Python has an 
 in-built limit on recursion, at 1000 deep. That should be 
 diametrically opposite your stance on stack size.

Python has apparently set a maximum to have a guarantee it won't crash when given a certain minimum stack size. That's nice, but I'm not quite seeing how that's opposite to what I said in this discussion.

Nope. It's because

- They don't want the entire computer thrashing each time some idiot has a typo in his recursive algorithm.
- They genuinely don't think you usually need preposterous levels of recursion. And if you do, then just change the limit.

Python is for Real Computers (as opposed to Windows). There it's polite to make your apps (here, the Python interpreter) have some decency as to how many resources they take. The idea is that there are hundreds of other users, and dozens of daemons running critical apps, round the clock.

If developers didn't know (by measurement or analysis) the reasonable stack size needed, there would be no mobile phones, no set-top boxes with hard disks, no GPS receivers, no digital PBX appliances, no solid-state firewalls, no ADSL boxes, no programmable calculators... Dammit, any pocket calculator that has parenthesis buttons needs a stack. And that stack space is predetermined by the manufacturer.

I'm still maintaining that the stack space needs of regular applications are much less than people tend to think. Last time I checked, you ran out of stack on a name-brand calculator with just ten levels of parentheses. That's not much. On the other hand, in real life, not too many people in the world have ever seen a calculator run out of stack space.

Sure, I'm not contending that. But that's a rather different claim than some others you made. Andrei
Apr 30 2009
parent Georg Wrede <georg.wrede iki.fi> writes:
Andrei Alexandrescu wrote:
 I am sorry, I need to fix a few mistakes in this post and then I'll have 
 to leave this conversation as it becomes ungainful for us both.

Sigh.
Apr 30 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Georg Wrede:
 And about your comments on stack size, seems regular Python has an 
 in-built limit on recursion, at 1000 deep. That should be diametrically 
 opposite your stance on stack size.

See also sys.setrecursionlimit() to lift that limit where useful. You can also define how deep a traceback to show when an error occurs (to avoid flooding the shell, for example).

Python is a high-level language, so naturally people expect it to be fit for recursive algorithms, because they are sometimes shorter and more natural (when they work on nested data structures). But the low stack limit (and the very low performance of function calls) forces people to often manually translate recursive algorithms into iterative ones. I hate this.

Bye,
bearophile
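A minimal sketch of the workaround bearophile describes, assuming CPython's default recursion limit of 1000:

```python
import sys

def depth(node):
    """Depth of a degenerate nested-list chain, e.g. [[[]]] -> 3."""
    if not node:
        return 1
    return 1 + depth(node[0])

# Build a chain nested 2001 deep -- well past the default limit of 1000,
# so calling depth() on it would normally raise RecursionError.
tree = []
for _ in range(2000):
    tree = [tree]

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(5000)  # lift the limit where useful, then restore it
try:
    result = depth(tree)
finally:
    sys.setrecursionlimit(old_limit)

print(result)  # -> 2001
```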
Apr 30 2009
parent Georg Wrede <georg.wrede iki.fi> writes:
bearophile wrote:
 Georg Wrede:
 And about your comments on stack size, seems regular Python has an
  in-built limit on recursion, at 1000 deep. That should be
 diametrically opposite your stance on stack size.

See also sys.setrecursionlimit() to lift that limit where useful. You can also define how deep a traceback to show when an error occurs (to avoid flooding the shell, for example). Python is a high-level language, so naturally people expect it to be fit for recursive algorithms, because they are sometimes shorter and more natural (when they work on nested data structures). But the low stack limit (and the very low performance of function calls) forces people to often manually translate recursive algorithms into iterative ones. I hate this.

Python seems a nice language. It's a shame that it's a bit slow. Heh, that's good for D.
Apr 30 2009
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Georg Wrede wrote:
 Andrei Alexandrescu wrote:
 
 You can't have a stack smaller than 16KB as far as I understand. I 
 seem to recall the default stack size is much bigger.

Totally not having researched this... The page http://www.unix.com/high-level-programming/34632-how-find-out-stack-size-occupied-process.html states that "Most Linux systems have no stack limits set" (didn't have time to find a definitive reference, yet).

One thing I don't understand is how the OS can manage essentially unbounded stack growth in a multithreaded program with a flat address space. Is it simply that the physical memory for each stack is mapped into the same virtual address space when a context switch occurs? And if so, how does this work with multiple cores? The only alternative I can think of would be to dedicate a virtual address range to each stack, which would set a definite upper bound on the stack size for each thread, and wasted/unused memory between stacks.
Apr 29 2009
parent Georg Wrede <georg.wrede iki.fi> writes:
Sean Kelly wrote:
 Georg Wrede wrote:
 Andrei Alexandrescu wrote:

 You can't have a stack smaller than 16KB as far as I understand. I 
 seem to recall the default stack size is much bigger.

Totally not having researched this... The page http://www.unix.com/high-level-programming/34632-how-find-out-stack-size-occupied-process.html states that "Most Linux systems have no stack limits set" (didn't have time to find a definitive reference, yet).

One thing I don't understand is how the OS can manage essentially unbounded stack growth in a multithreaded program with a flat address space. Is it simply that the physical memory for each stack is mapped into the same virtual address space when a context switch occurs? And if so, how does this work with multiple cores? The only alternative I can think of would be to dedicate a virtual address range to each stack, which would set a definite upper bound on the stack size for each thread, and wasted/unused memory between stacks.

Well, "unbounded" and unbounded. It just means there's no set value. The stack can grow as long as there is space. And the virtual address range is larger than the physical memory, so I guess they can say there's no limit.
Apr 29 2009
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Computing the stack depth even for non-recursive functions is an 
 interprocedural analysis so it's difficult to do modularly. The stack is 
 also unchecked so going with a "good enough" approximation is bound to 
 be risky.

No way in hell is there a reliable way for a compiler to estimate stack depth. Heck, just a virtual function call blows it. The stack, however, is checked. The virtual memory system puts a "guard" page following the end of the stack, so running off the end produces an exception. If there is unallocated space beyond that, the operating system will allocate more pages, then resume from the exception.
Apr 29 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Some more notes about Phobos2. Some of the things I say may be wrong because my
experience with Phobos2 is still limited.

Daniel Keep:
 Not true.  Here's an excellent reason to use ranges over opApply: you
 can't define zip with opApply.  Because opApply uses inversion of
 control, you can't use more than one without bringing threads into the
 equation.

I'll try to zip two ranges that return the leaves of two different binary trees. This can be a simple example to show to attract people to D2 language :-)

So I've tried using zip:

import std.range: zip;
import std.stdio: writeln;

void main() {
    auto a = [1, 2, 3];
    string[] b = ["a", "b", "c"];
    foreach (xy; zip(a, b))
        writeln(xy.at!(0), " ", xy.at!(1));
}

That doesn't work:

...\dmd\src\phobos\std\range.d(1847): Error: template instance Zip!(int[3u],immutable(char)[][]) does not match template declaration Zip(R...) if (R.length && allSatisfy!(isInputRange,R))

probably because 'a' is a static array. Is isInputRange false for static arrays?

------------

The most basic and useful range is the one of integer numbers. How can I create with Phobos2 lazy and eager ranges like the following ones?
 range(1, 5)



 range(5, 1, -1)



 list(xrange(5, 10))



 list(xrange(5, 10, 2))



Similar ranges are useful with map, zip, and in many other situations. (I hope the x..y range syntax of D2 foreach will evolve into a syntax that can be used everywhere an integral range can be used.)

------------

The docs say about "map":
Multiple functions can be passed to map. In that case, the element type of map
is a tuple containing one element for each function.<
 [...]
 foreach (e; map!("a + a", "a * a")(arr1))

Having the possibility to map two functions in parallel may be useful in some situations (where for example computing the items is costly), but quite more useful would be to map a function that takes two or more arguments. An example:
 map(lambda x, y: x*y, ["a", "b", "c"], xrange(1, 4))



If you don't have that you are forced to use a zip inside the map, but then you are also forced to change the D2 mapping function to something like:

(p){ return p.at!(0) * p.at!(1); }

----------

all() and any() functions are useful to test whether all or any items of an iterable are true (with an optional mapping function too). They are useful as templates too, so I suggest renaming allSatisfy to "All" and anySatisfy to "Any". A similar template named "Map" can be created if not already present, to map a given template over a typetuple.

In future I'll probably write more notes, suggestions and questions about Phobos2. I hope such notes are very useful.

Bye,
bearophile
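For reference, the Python behaviour bearophile is describing (a multi-iterable map, plus all()/any() with a mapping step) looks like this:

```python
# map with a two-argument function over two iterables (Python 3's map is
# lazy, so wrap it in list to force evaluation)
pairs = list(map(lambda x, y: x * y, ["a", "b", "c"], range(1, 4)))
print(pairs)  # -> ['a', 'bb', 'ccc']

# all()/any() with an "optional mapping function" supplied via a
# generator expression
print(all(len(s) >= 1 for s in pairs))  # -> True
print(any(len(s) >= 3 for s in pairs))  # -> True
```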
Apr 28 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Some more notes about Phobos2. Some of the things I say may be wrong because my

 Daniel Keep:
 Not true.  Here's an excellent reason to use ranges over opApply: you
 can't define zip with opApply.  Because opApply uses inversion of
 control, you can't use more than one without bringing threads into the
 equation.


 So I've tried using zip:
 import std.range: zip;
 import std.stdio: writeln;
 void main() {
     auto a = [1, 2, 3];
     string[] b = ["a", "b", "c"];
     foreach (xy; zip(a, b))
         writeln(xy.at!(0), " ", xy.at!(1));
 }
 That doesn't work:
 ...\dmd\src\phobos\std\range.d(1847): Error: template instance

(R.length && allSatisfy!(isInputRange,R))
 probably because 'a' is a static array. Is isInputRange false for static
arrays?

Yes, because the range interface for arrays relies on std.array being imported and using extension method syntax. This has bitten me a bunch of times, too. The functions in std.array take dynamic arrays, and static arrays are not implicitly convertible to dynamic arrays, except in the specific case of binding an array literal to a variable. Whether they should be, I don't know.
 ------------
 The most basic and useful range is the one of integer numbers. How can I create

 range(1, 5)



 range(5, 1, -1)



 list(xrange(5, 10))



 list(xrange(5, 10, 2))



Similar ranges are useful with map,zip, and in many other situations. (I hope the x..y range syntax of D2 foreach will evolve in a syntax that can be

I think you're looking for std.range.iota, although it doesn't work so well right now because of bug 2871.
Apr 28 2009
parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

Whether they should be, I don't know.<

In my D1 dlibs most things digest static arrays too (but they don't let you iterate on them or return them; that's a limit of the language). I don't know D2 well enough yet to give you a better answer, but I think it's better to remove similar limits from the language. Zip-ing two static arrays has to be a legit operation.
I think you're looking for std.range.iota,<

Yes, sorry, I have found it few seconds after sending the post. I am dumb.
although it doesn't work so well right now because of bug 2871.<

And I have found the bug, of course... :-) I can also see Denis Koroskin has suggested a small fix: http://d.puremagic.com/issues/show_bug.cgi?id=2871

Regarding iota: it's better for the third argument of iota to default to 1, because most of the time you want a step = 1. In Python range() goes one step further: if you give just one argument it's meant to be the second one, and the first defaults to zero. In practice this causes no problems because in Python you use range()/xrange() all the time. In D I may accept iota to have 2-3 arguments because it will probably be used much less often.

Bye and thank you,
bearophile
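For comparison, the Python defaults being described (plain stdlib range; in Python 2 the lazy version was called xrange):

```python
# With one argument, it is the stop value and start defaults to 0;
# the step argument defaults to 1.
print(list(range(5)))         # -> [0, 1, 2, 3, 4]
print(list(range(5, 10)))     # -> [5, 6, 7, 8, 9]
print(list(range(5, 10, 2)))  # -> [5, 7, 9]
print(list(range(5, 1, -1)))  # -> [5, 4, 3, 2]
```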
Apr 28 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 26 Apr 2009 21:20:32 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Michel Fortin wrote:
 On 2009-04-26 11:46:51 -0400, dsimcha <dsimcha yahoo.com> said:

 == Quote from Michel Fortin (michel.fortin michelf.com)'s article
 Hum, are you saying opApply superior when it comes to iterating in a
 tree? It seems that with opApply you could use the stack by  
 recursively
 calling opApply with the given delegate on each one of the branches.

Exactly. On the other hand, you lose a lot of flexibility with opApply. If you tried to port most of std.range to operate on the opApply interface instead of the forward range interface, I doubt you'd get very far.

IMHO, though, opApply should *not* be deprecated. opApply and ranges attempt to solve similar problems, but not the same problem. Ranges attempt to solve the problem of iterating over some object with maximum flexibility from the point of view of the user of the object. opApply attempts to solve the problem of iterating with maximum flexibility from the point of view of the implementer of the object. In some cases, like the one you just described, the latter can be better.

place. So what the implementer can do with opApply is write an optimized iteration algorithm for use with foreach. Which may mean that when both opApply and ranges are available for generating foreach's code, the compiler should favor opApply. Currently, I believe it's the reverse.

I think, all other things being equal, that opApply tends to be slower than iterators.

It depends. The range must have either inline-able functions, or must NOT call virtual functions. Otherwise, it could be worse. There is also the factor that you are using an inner function with opApply, so if you regularly access variables outside the foreach loop, those add up.

Observe that in opApply style, you do one delegate call per loop iteration. With range style, you do three calls on the range per loop iteration (empty, front, popFront). If those range calls make virtual calls or ARE virtual calls, then range loses. Of course, the difference might be washed out if you do lots of accesses to an outer variable in your foreach loop, which require a pointer dereference vs. a stack dereference.

We should quantify this and see what the actual performance is.

-Steve
Apr 27 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 Apr 2009 07:35:22 -0400, downs <default_357-line yahoo.de>  
wrote:

 Daniel Keep wrote:
 Michel Fortin wrote:
 On 2009-04-27 10:51:22 -0400, Frits van Bommel
 <fvbommel REMwOVExCAPSs.nl> said:

 I edited this code to work with ldc (D1) + Tango, and saw the Direct
 and opApply cases generate identical code (inc, cmp, jne, with the
 loop counter in a register) [1], so they're equally fast (modulo
 process scheduling randomness).

prefering ranges over opApply we're just optimising around a deficiency in DMD's optimizer.

Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation.

Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.

read: less overhead than full threads, not less overhead than ranges ;) -Steve
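Daniel Keep's point about inversion of control can be illustrated in Python, where generators play the role of ranges (external iteration) and a callback-taking function plays the role of opApply (internal iteration). This is a sketch, not D code:

```python
# Internal iteration (opApply-style): the container drives the loop and
# calls you back, so two traversals cannot be interleaved from outside
# without threads or fibers.
def each_leaf(tree, callback):
    if isinstance(tree, tuple):
        for child in tree:
            each_leaf(child, callback)
    else:
        callback(tree)

# External iteration (range/generator style): the caller drives the
# iteration, so zip can interleave two traversals step by step.
def leaves(tree):
    if isinstance(tree, tuple):
        for child in tree:
            yield from leaves(child)
    else:
        yield tree

t1 = (1, (2, 3))
t2 = ("a", ("b", "c"))

collected = []
each_leaf(t1, collected.append)
print(collected)                          # -> [1, 2, 3]
print(list(zip(leaves(t1), leaves(t2))))  # -> [(1, 'a'), (2, 'b'), (3, 'c')]
```

Note that the generator version keeps one suspended frame per tree level, which is exactly the auxiliary-space concern raised at the top of this thread.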
Apr 28 2009
prev sibling parent reply Steve Teale <steve.teale britseyeview.com> writes:
dsimcha Wrote:

 So I tried to create a struct to allow iteration over the keys or values of a
 builtin associative array using a lazy forward range.  This would allow one to
 pass the keys or values of an AA to any function that expects a regular old
 forward range w/o any heap activity, eager copying, etc. and with O(1)
 auxiliary memory usage.  Now that ranges are the lingua franca of iteration in
 D, this seems like a pretty necessary feature.
 
 The problem is that D's current AAs are based on a hash table with collision
 resolution done using binary trees.  I guess this decision was made to allow
 for O(log N) worst case performance even when there are ridiculous amounts of
 hash collisions.  However, this is a very bad decision because, in exchange
 for more bulletproof performance in corner cases, you lose:
 
 1.  Every element is now (void*).sizeof bytes larger because you have to store
 a left and right pointer.
 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
 in any way I'm aware of.  This means that a lazy range to iterate over the
 keys or values is relatively useless because it would generate heap activity
 anyhow to store the stack or queue necessary to iterate over the tree, so you
 may as well just use the current .keys or .values properties, which just
 return an array.  (Because of the way that ranges are designed, you can't just
 use the call stack, unless you use fibers or something, which is obviously not
 a win from an efficiency point of view.)
 
 Both of these are important in the more common, general case of sane hashing
 performance.

Ranges, ranges! That's all I hear these days, and it looks to me like the continuing advance of D toward being a complete meta-language.

Where do I see ranges described in terms that an old hand can understand? I'm constantly having to roll my own in many areas when I see how the meta stuff is implemented - like x ~= c to add a character to the end of an array - reallocation every time? I thought D was supposed to be a systems programming language, not something that was guaranteed to win the universe obfuscated code competition.

I've been trying to keep my projects up to date and compilable with D2.0xx, but I think I'm going to give up on that and rewrite them for whatever the current version of D1 is.

I seriously think that the crew who are driving the development should realize that not all computer programmers are going to have an IQ of 140, and that any practical language should be comprehensible to a majority of its users.

Maybe the problem I'm complaining about is just a lack of documentation. Generating said from comments really does not hack it - comments are always skimped, and usually lie.

Before something like Ranges are implemented, there should be some sort of RFC process where they are properly described rather than a reliance on D users to have read every thread of the newsgroup, and remembered it all.
Apr 26 2009
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Steve Teale (steve.teale britseyeview.com)'s article
 dsimcha Wrote:
 So I tried to create a struct to allow iteration over the keys or values of a
 builtin associative array using a lazy forward range.  This would allow one to
 pass the keys or values of an AA to any function that expects a regular old
 forward range w/o any heap activity, eager copying, etc. and with O(1)
 auxiliary memory usage.  Now that ranges are the lingua franca of iteration in
 D, this seems like a pretty necessary feature.

 The problem is that D's current AAs are based on a hash table with collision
 resolution done using binary trees.  I guess this decision was made to allow
 for O(log N) worst case performance even when there are ridiculous amounts of
 hash collisions.  However, this is a very bad decision because, in exchange
 for more bulletproof performance in corner cases, you lose:

 1.  Every element is now (void*).sizeof bytes larger because you have to store
 a left and right pointer.
 2.  You can't iterate over a binary tree in O(1) auxiliary space, at least not
 in any way I'm aware of.  This means that a lazy range to iterate over the
 keys or values is relatively useless because it would generate heap activity
 anyhow to store the stack or queue necessary to iterate over the tree, so you
 may as well just use the current .keys or .values properties, which just
 return an array.  (Because of the way that ranges are designed, you can't just
 use the call stack, unless you use fibers or something, which is obviously not
 a win from an efficiency point of view.)

 Both of these are important in the more common, general case of sane hashing
 performance.


 Where do I see ranges described in terms that an old hand can understand?
 I'm constantly having to roll my own in many areas when I see how the meta
stuff

reallocation every time?
 I thought D was supposed to be a systems programming language, not something

 I've been trying to keep my projects up to date and compilable with D2.0xx, but

version of D1 is.
 I seriously think that the crew who are driving the development should realize

practical language should be comprehensible to a majority of its users.
 Maybe the problem I'm complaining about is just a lack of documentation.

skimped, and usually lie.
 Before something like Ranges are implemented, there should be some sort of RFC

have read every thread of the newsgroup, and remembered it all.

But there was an RFC, and it was even called that. It just happened to take place on this newsgroup. Also, keep in mind that D2 is still in alpha. Worrying about how to explain this stuff to people who aren't heavily involved with D at this point would slow down the pace of evolution and generate more confusion. The time to do this is when the dust has settled a little.
Apr 26 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Steve Teale, el 26 de abril a las 14:39 me escribiste:
 Before something like Ranges are implemented, there should be some sort
 of RFC process where they are properly described rather than a reliance
 on D users to have read every thread of the newsgroup, and remembered it
 all.

A PEP[1]-like process would be great.

[1] http://www.python.org/dev/peps/

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
<o_O> parakenotengobarraespaciadora <o_O> aver <o_O> estoyarreglandolabarraporkeserompiounapatita
Apr 26 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steve Teale wrote:
 Before something like Ranges are implemented, there should be some
 sort of RFC process where they are properly described rather than a
 reliance on D users to have read every thread of the newsgroup, and
 remembered it all.

There was. Incidentally, it was called "RFC on range design for D2".

I understand your frustration, but if you think for a minute, you realize your comments are uncalled for. We have discussed ranges at length and the community has had a long time to establish semantics and even names. I have (early, often, and repeatedly) warned the community that there will be a Phobos update that is bound to break a lot of code. I actively tried to massage several breaking changes into one single release so as to sweeten the pill. That all took a lot of time and thought from several people (Walter, Sean, and myself) who have better things to do.

Now it would be great if you put yourself in our shoes, read your comments again, and tell me how they reflect on the writer. You mention you want to see ranges described in terms that an old hand can understand. That's great. But that one good request was marred by comments that show you are more interested in reaffirming your prejudice than in overcoming it.

Andrei
Apr 26 2009
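For reference, the range interface under discussion is small. A minimal sketch of a forward range over a slice, following the D2 primitive names (empty/front/popFront, plus save for forward ranges) - the struct itself is hypothetical, for illustration only:

```d
import std.stdio;

// A minimal forward range wrapping a slice. The three input-range
// primitives are empty, front, and popFront; save() is what promotes
// an input range to a forward range (it returns an independent copy
// positioned at the current element).
struct SliceRange(T)
{
    private T[] data;

    @property bool empty() const { return data.length == 0; }
    @property T front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property SliceRange save() { return this; } // copies the view, not the data
}

void main()
{
    auto r = SliceRange!int([1, 2, 3]);
    for (; !r.empty; r.popFront())
        writeln(r.front);
}
```

Because save() only copies a pointer/length pair, iterating such a range needs O(1) auxiliary memory, which is exactly the property the original post notes is hard to achieve over the tree-based AA buckets.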
prev sibling next sibling parent reply Georg Wrede <georg.wrede iki.fi> writes:
Steve Teale wrote:
 Ranges, ranges! That's all I hear these days, and it looks to me like
 the continuing advance of D toward being a complete meta-language.
 
 Where do I see ranges described in terms that an old hand can
 understand?
 
 I'm constantly having to roll my own in many areas when I see how the
 meta stuff is implemented - like x ~= c to add a character to the end
 of an array - reallocation every time?
 
 I thought D was supposed to be a systems programming language, not
 something that was guaranteed to win the universe obfuscated code
 competition.
 
 I've been trying to keep my projects up to date and compilable with
 D2.0xx, but I think I'm going to give up on that and rewrite them for
 whatever the current version of D1 is.
 
 I seriously think that the crew who are driving the development
 should realize that not all computer programmers are going to have an
 IQ of 140, and that any practical language should be comprehensible
 to a majority of its users.

It will, be patient. See below.
 Maybe the problem I'm complaining about is just a lack of
 documentation. Generating said from comments really does not hack it
 - comments are always skimped, and usually lie.
 
 Before something like Ranges are implemented, there should be some
 sort of RFC process where they are properly described rather than a
 reliance on D users to have read every thread of the newsgroup, and
 remembered it all.

There's an illusion. And that illusion comes from the D newsgroups having "wrong" names.

The D2 newsgroup should have a name like "D2 discussion -- for D language development folks, enter at your own risk". And a *D1* newsgroup should then be for anybody who actually uses the language for something. Currently, actually, the D.learn newsgroup has sort-of assumed this functionality.

Being as it is (a de facto D2 development newsgroup), d.D will contain lengthy discussions that meander and roam, possibly for months, before something crystallises and the issue is settled (at least for the time). A lack of proper documentation belongs to this setup. (But then, one really has to congratulate Andrei; Phobos docs have *never* been as good as they are today!!!)

About ranges: I suspect that once D2 is solidified (and we start working on D3 :-) ), ranges will be easy to use, easy to understand, and practical. Incidentally, that same complaint could have been made about templates. It took more than a year for the discussion to settle down a bit, and today templates seem not impossible at all to begin using (unless you want to begin right with the hairy stuff). And earlier, it took two months of intense discussion here before Unicode, UTF, and their relation to char, wchar and dchar got generally understood. Today you can program in D without much thinking about UTF, and things "just work". Even abroad.

But to sum it all up, any project whose purpose is anything other than learning D2 should be written in D1.
Apr 27 2009
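On the `x ~= c` reallocation question quoted above: the D runtime over-allocates array capacity, so appending in a loop does not reallocate on every call. A rough way to check this empirically (the exact growth policy is implementation-defined, so the count will vary by runtime, but it stays far below the number of appends):

```d
import std.stdio;

void main()
{
    char[] x;
    size_t moves = 0;
    auto prevPtr = x.ptr;

    foreach (i; 0 .. 1000)
    {
        x ~= 'a';                // append one character
        if (x.ptr !is prevPtr)   // pointer changed => storage was moved
        {
            ++moves;
            prevPtr = x.ptr;
        }
    }

    // With geometric capacity growth, moves is far smaller than 1000.
    writeln("appends: 1000, reallocations: ", moves);
}
```

So `~=` in a loop is amortized O(1) per append, not one heap allocation per character.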
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 27 Apr 2009 11:32:30 +0300, Georg Wrede wrote:

 Phobos docs have *never* been as good as they are today!!!

Really?! I still find them very confusing. Maybe it's just the formatting, but I often find it so hard to understand or find what I'm after that I refer to the source code to work it out.
 About ranges: I suspect that once D2 is solidified (and we start working 
 on D3 :-) ), ranges will be easy to use, easy to understand, and 
 practical. Incidentally, that same complaint could have been said about 
 templates. It took more than a year to get the discussion settled down a 
 bit, and today templates seem not impossible at all to begin using 
 (unless you want to begin right with the hairy stuff).

It must be me, but I still find D2 templates as clear as mud. The syntax is butt-ugly, counter-intuitive, and reader-hostile, IMHO. I'm sure that C++ is worse, but that is not the issue. I'll have to wait until somebody writes the "D Templates for Dummies" book.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Apr 27 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Mon, 27 Apr 2009 11:32:30 +0300, Georg Wrede wrote:
 
 Phobos docs have *never* been as good as they are today!!!

Really?! I still find them very confusing. Maybe it's just the formatting, but I often find it so hard to understand or find what I'm after that I refer to the source code to work it out.

Suggestions are always welcome. Andrei
Apr 27 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Derek Parnell:
 [...] I still find D2 templates as clear as mud. The syntax is
 butt-ugly, counter intuitive, and reader-hostile, IMHO.

I find templates in D1 easy to use in most situations. And template constraints of D2, too, are easy. What kind of syntax/semantics do you suggest to improve the current situation? If you have ideas I am interested.

Bye,
bearophile
Apr 27 2009
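For readers who find the syntax off-putting, a minimal D function template is not far from an ordinary function; a small sketch (the functions here are hypothetical examples, not library APIs):

```d
import std.stdio;

// A function template: T is deduced from the argument at the call site,
// so callers never have to write the !() instantiation syntax.
T twice(T)(T x)
{
    return x + x;
}

// A D2 template constraint: the if clause restricts which types are
// accepted, turning would-be instantiation errors into overload misses.
T half(T)(T x) if (is(T : double))
{
    return x / 2;
}

void main()
{
    writeln(twice(21));   // T deduced as int
    writeln(half(3.0));   // T deduced as double
}
```

The constraint syntax is the D2 addition bearophile refers to: it documents the template's requirements at the signature rather than deep inside the body.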
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Georg Wrede wrote:
 There's an illusion. And that illusion comes from the D newsgroups 
 having "wrong" names.
 
 The D2 newsgroup should have a name like "D2 discussion -- for D 
 language development folks, enter at your own risk". And a *D1* 
 newsgroup should then be for anybody who actually uses the language for 
 something. Currently, actually, the D.learn newsgroup has sort-of 
 assumed this functionality.

Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup. Andrei
Apr 27 2009
parent Christopher Wright <dhasenan gmail.com> writes:
Simen Kjaeraas wrote:
 Andrei Alexandrescu wrote:
 
 Georg Wrede wrote:
 There's an illusion. And that illusion comes from the D newsgroups 
 having "wrong" names.
  The D2 newsgroup should have a name like "D2 discussion -- for D 
 language development folks, enter at your own risk". And a *D1* 
 newsgroup should then be for anybody who actually uses the language 
 for something. Currently, actually, the D.learn newsgroup has sort-of 
 assumed this functionality.

Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.

And what would we do the day D987 came out? Start another group called digitalmars.d987-design? Or just create one called digitalmars.d.design, and use that no matter the version of D being discussed? -- Simen

How about digitalmars.dnext?
Apr 27 2009
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu wrote:

 Georg Wrede wrote:
 There's an illusion. And that illusion comes from the D newsgroups  
 having "wrong" names.
  The D2 newsgroup should have a name like "D2 discussion -- for D  
 language development folks, enter at your own risk". And a *D1*  
 newsgroup should then be for anybody who actually uses the language for  
 something. Currently, actually, the D.learn newsgroup has sort-of  
 assumed this functionality.

Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.

And what would we do the day D987 came out? Start another group called digitalmars.d987-design? Or just create one called digitalmars.d.design, and use that no matter the version of D being discussed? -- Simen
Apr 27 2009
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 28 Apr 2009 01:38:27 +0400, Christopher Wright <dhasenan gmail.com>
wrote:

 Simen Kjaeraas wrote:
 Andrei Alexandrescu wrote:

 Georg Wrede wrote:
 There's an illusion. And that illusion comes from the D newsgroups  
 having "wrong" names.
  The D2 newsgroup should have a name like "D2 discussion -- for D  
 language development folks, enter at your own risk". And a *D1*  
 newsgroup should then be for anybody who actually uses the language  
 for something. Currently, actually, the D.learn newsgroup has sort-of  
 assumed this functionality.

Yes, well put. I think it would be great to define a digitalmars.d2 or digitalmars.d2-design newsgroup.

And what would we do the day D987 came out? Start another group called digitalmars.d987-design? Or just create one called digitalmars.d.design, and use that no matter the version of D being discussed? -- Simen

How about digitalmars.dnext?

Maybe digitalmars.future? :)
Apr 27 2009