
digitalmars.D - RFC, ensureHeaped

"Steven Schveighoffer" <schveiguy yahoo.com> writes:
I just recently helped someone with an issue of keeping a reference to stack
data beyond the lifetime of that stack frame.  However, the error was one
level deep, something like this:

int[] globalargs;

void foo(int[] args...)
{
    globalargs = args;
}

void bar()
{
    foo(1,2,3); // passes stack data to foo.
}

One thing I suggested is that you have to dup args.  But what if you call it
like this?

void bar()
{
    foo([1,2,3]);
}

Then you just wasted time duping that argument.  Instead of a defensive  
dup, what if we had a function ensureHeaped (better name suggestions?)  
that ensured the data was on the heap?  If it wasn't, it dups the original  
onto the heap.  It would be less expensive than a dup when the data is  
already on the heap, but probably only slightly more expensive than a  
straight dup when the data isn't on the heap.
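A minimal sketch of what such a function could look like. It assumes "on the heap" means "inside a block managed by the D garbage collector" and relies on druntime's `core.memory.GC.addrOf`, which returns null for a pointer that is not inside a GC block. That covers stack memory, but also malloc'd or static memory, which this sketch would therefore copy as well. `ensureHeaped` is the hypothetical name from this post, not an existing library function.

```d
import core.memory : GC;

// Hypothetical helper: returns `arr` unchanged if its data lives in a
// GC-managed block, otherwise returns a GC-allocated copy.
T[] ensureHeaped(T)(T[] arr)
{
    // Empty slices are returned as-is; there is no data to protect.
    // GC.addrOf yields null when the pointer is not inside a GC block,
    // e.g. for stack memory holding typesafe variadic arguments.
    if (arr.length == 0 || GC.addrOf(arr.ptr) !is null)
        return arr;
    return arr.dup;
}

int[] globalargs;

void foo(int[] args...)
{
    // Copies only when the caller's data is not GC-owned, e.g. foo(1, 2, 3).
    globalargs = ensureHeaped(args);
}
```

With this, a call like `foo(1, 2, 3)` pays for one copy of the stack-allocated arguments, while passing an existing GC-allocated array stores the caller's slice without copying.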

Would such a function make sense or be useful?

-Steve
Nov 12 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/12/10 5:10 AM, Steven Schveighoffer wrote:
 I just recently helped someone with an issue of saving an array to stack
 data beyond the existence of that stack frame. However, the error was
 one level deep, something like this:

 int[] globalargs;

 void foo(int[] args...)
 {
 globalargs = args;
 }

 void bar()
 {
 foo(1,2,3); // passes stack data to foo.
 }

 One thing I suggested is, you have to dup args. But what if you call it
 like this?

 void bar()
 {
 foo([1,2,3]);
 }

 Then you just wasted time duping that argument. Instead of a defensive
 dup, what if we had a function ensureHeaped (better name suggestions?)
 that ensured the data was on the heap? If it wasn't, it dups the
 original onto the heap. It would be less expensive than a dup when the
 data is already on the heap, but probably only slightly more expensive
 than a straight dup when the data isn't on the heap.

 Would such a function make sense or be useful?

 -Steve

Sounds good, but if we offer it we should also define the primitive isOnStack() or something. Andrei
Nov 12 2010
bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

 Then you just wasted time duping that argument.  Instead of a defensive  
 dup, what if we had a function ensureHeaped (better name suggestions?)  

I have created a bug report to keep this whole pair of threads from being lost in the dust of time: http://d.puremagic.com/issues/show_bug.cgi?id=5212 Feel free to add a note about your ensureHeaped() idea at the end of that enhancement request :-) (To that enhancement request I have not added my idea of the @onheap attribute because I think it's too complex to implement given the design style of the D compiler.) Bye, bearophile
Nov 13 2010
spir <denis.spir gmail.com> writes:
On Sat, 13 Nov 2010 13:19:25 -0500
bearophile <bearophileHUGS lycos.com> wrote:

 Steven Schveighoffer:

 Then you just wasted time duping that argument.  Instead of a defensive
 dup, what if we had a function ensureHeaped (better name suggestions?)

 I have created a bug report to keep this whole pair of threads from being lost in the dust of time:
 http://d.puremagic.com/issues/show_bug.cgi?id=5212

 Feel free to add a note about your ensureHeaped() idea at the end of that enhancement request :-)

 (To that enhancement request I have not added my idea of the @onheap attribute because I think it's too complex to implement given the design style of the D compiler.)

 Bye,
 bearophile

I was the one bitten by the bug. I think it's really a naughty feature; I was about to create a bug entry when I saw Bearophile's post. In my opinion, if void f(int[] ints) {doWhateverWith(ints);} works, then void f(int[] ints...) {doWhateverWith(ints);} must just work as well. I consider variadic args just syntactic honey for clients of a func, type, lib. There should be no visible semantic difference, let alone bugs (and certainly not a segfault when the code does not manually play with memory!). But I may have wrong expectations about this feature in D, due to how variadics work in other languages I have used. It was hard to debug even with the help of 3 experienced D programmers. Denis -- -- -- -- -- -- -- vit esse estrany ☣ spir.wikidot.com
Nov 13 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 13 Nov 2010 16:09:32 -0500, spir <denis.spir gmail.com> wrote:

 On Sat, 13 Nov 2010 13:19:25 -0500
 bearophile <bearophileHUGS lycos.com> wrote:

 Steven Schveighoffer:

 Then you just wasted time duping that argument.  Instead of a defensive
 dup, what if we had a function ensureHeaped (better name suggestions?)

I have created a bug report to keep this whole pair of threads from being lost in the dust of time: http://d.puremagic.com/issues/show_bug.cgi?id=5212 Feel free to add a note about your ensureHeaped() idea at the end of that enhancement request :-) (To that enhancement request I have not added my idea of the @onheap attribute because I think it's too complex to implement given the design style of the D compiler.) Bye, bearophile

I was the one bitten by the bug. I think it's really a naughty feature, was about to create a bug entry when saw Bearophile's post. In my opinion, if void f(int[] ints) {doWhateverWith(ints);} works, then void f(int[] ints...) {doWhateverWith(ints);} must just work as well.

I don't really agree. The ... version is optimized so you can pass typesafe variadic args. If the compiler generated a heap allocation for that, you could be wasting a lot of heap allocations. One thing you may learn in D is that heap allocation == crappy performance. The less you allocate, the faster your code gets. It's one of the main reasons Tango is so damned fast. Having the language continually working against that goal is going to be great for inexperienced programmers but hell for people trying to squeeze performance out of it. I think what we need, however, is a way to specify intentions inside the function. If you intend to escape this data, then the runtime/compiler should make it easy to avoid re-duping something.
 I consider variadic args as just syntactic honey for clients of a func,  
 type, lib. There should be no visible semantic difference, even less  
 bugs (and certainly not segfault when the code does not manually play  
 with memory!). But I may have wrong expectations about this feature in  
 D, due to how variadics work in other languages I have used.

 It was hard to debug even with the help of 3 experienced D programmers.

To be fair, it was easy to spot when you gave us the pertinent code :) -Steve
Nov 15 2010
bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

To have the language continually working against that goal is going to be great
for inexperienced programmers but hell for people trying to squeeze performance
out of it.<

The experienced programmers may write "scope int[] a...", and have no heap allocations. All the other people need, first of all, a program that doesn't contain hard-to-spot bugs, and only then a fast program. Such people don't stick the "scope" there, so in this case the compiler performs the test you were talking about: if the data is on the heap it doesn't copy it and takes a slice of it; otherwise, if the data is on the stack, it dups it.
I think what we need however, is a way to specify intentions inside the
function.  If you intend to escape this data, then the runtime/compiler should
make it easy to avoid re-duping something.<

This is like the "automatic" closures. The right design in a modern language is to use the safer strategy by default, and the less safe one on request. If you want, a new compiler switch may be added that lists all the spots in the code where a closure or hidden heap allocation occurs, useful for performance tuning (an idea by Denis Koroskin): http://d.puremagic.com/issues/show_bug.cgi?id=5070 I have even suggested a transitive @noheap annotation, similar to nothrow, that makes sure a function contains no heap allocations and doesn't call other things that perform heap allocations: http://d.puremagic.com/issues/show_bug.cgi?id=5219 The proliferation of function attributes produces "interesting" results: @noheap @safe nothrow pure real sin(in real x) { ... }
To be fair, it was easy to spot when you gave us the pertinent code :)

I didn't even know/remember that that array data is on the stack. That error will give bad surprises to some D newbies who are not fluent in C. It's a problem of perception: typesafe variadic arguments don't look like normal function arguments, which you know usually live on the stack; they look like dynamic arrays, and in D most dynamic arrays are allocated on the heap (it's easy and useful to take a dynamic-array slice of a stack-allocated array, but in that case the code shows that the slice doesn't contain heap data). If your function has a signature similar to this one: void foo(int[3] arr...) { it's not too hard to guess that 'arr' is on the stack. But dynamic arrays don't give that image: void foo(int[] arr...) { This is why I think it's better for the compiler to test whether the arr data is on the stack, and dup it if it is (unless a 'scope' is present, in which case neither the test nor the allocation happens). Bye, bearophile
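To illustrate the contrast drawn above: with a static-array typesafe variadic parameter, `arr` is a value, so storing it copies the elements and no reference to the caller's stack frame escapes. A minimal sketch (the names `saved` and `keep` are made up for this example):

```d
int[3] saved;

// Static-array typesafe variadic: `arr` is passed by value,
// so assigning it to a global copies all three elements.
void keep(int[3] arr...)
{
    saved = arr;
}
```

Calling `keep(1, 2, 3)` is therefore safe even after `keep` returns, unlike the `int[] arr...` form, which hands the function a slice of the caller's stack.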
Nov 15 2010
bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

 The experienced programmers may write "scope int[] a...", and have no  
 heap allocations.

This is a good idea. This isn't what I thought spir was saying, I thought he wanted the function to always allocate.

I have also suggested that when "scope" is not present, DMD may automatically add a runtime test similar to the one done by ensureHeaped and dup the array data only if it's on the stack. So even when you don't use "scope" it doesn't always copy.
 The only issue I see here is that scope should really be the default,  
 because that is what you want most of the time.

If the variadic parameter is in the signature of a free function then I agree with you. But if it's in the this() of a class, then often you want that data on the heap.
 However, the compiler  
 cannot prove that the data doesn't escape so it can't really enforce that  
 as the default.

If you look at my original answer, I suggested something like @heaped, attached to an array, that makes sure its data is on the heap. This is done with a mix of static analysis (when possible) and runtime tests (in the other cases). But I have not added this idea to the enhancement request of scoped variadics because it looks too hard to implement in D/DMD.
 I believe the compiler cannot really be made to enforce that all passed-in  
 data will be heap-allocated when passed to foo.  A runtime check would be  
 a very good safety net.

Static analysis is able to do this and more, but it requires some logic added to the compiler (and such logic probably doesn't work in all cases).
 I have even suggested a transitive @noheap annotation, similar to
 nothrow, that makes sure a function contains no heap allocations and
 doesn't call other things that perform heap allocations:
 http://d.puremagic.com/issues/show_bug.cgi?id=5219
 The proliferation of function attributes produces "interesting" results:
 @noheap @safe nothrow pure real sin(in real x) { ... }

This is a bit much. Introducing these attributes is viral -- once you go noheap, anything you call must be noheap, and the majority of functions will need to be marked noheap. The gain is marginal at best anyways.

Indeed, it's a lot, and I am not sure it's a good idea. I had this idea while reading one or two articles written by people who write high-performance games in C++. They need to keep the frame rate constantly above a minimum, like 30/s or 60/s. To do this they have to avoid C heap allocations inside certain large loops (D GC heap allocations may be even worse). Using @noheap is a burden, but it may help you write code with more deterministic performance. Maybe someday it will be possible to implement @noheap in D with user-defined attributes plus static reflection. But then the standard library will not use that user-defined @noheap attribute, making it not so useful. So if you want it to be transitive, Phobos needs to be aware of it. Bye, bearophile
Nov 16 2010
Johann MacDonagh <johann.macdonagh..no spam..gmail.com> writes:
On 11/16/2010 12:58 PM, bearophile wrote:
 Steven Schveighoffer:

 The experienced programmers may write "scope int[] a...", and have no
 heap allocations.

This is a good idea. This isn't what I thought spir was saying, I thought he wanted the function to always allocate.

I have also suggested that when "scope" is not present, DMD may automatically add a runtime test similar to the one done by ensureHeaped and dup the array data only if it's on the stack. So even when you don't use "scope" it doesn't always copy.
 The only issue I see here is that scope should really be the default,
 because that is what you want most of the time.

If the variadic parameter is in the signature of a free function then I agree with you. But if it's in the this() of a class, then often you want that data on the heap.
 However, the compiler
 cannot prove that the data doesn't escape so it can't really enforce that
 as the default.

If you look at my original answer, I suggested something like @heaped, attached to an array, that makes sure its data is on the heap. This is done with a mix of static analysis (when possible) and runtime tests (in the other cases). But I have not added this idea to the enhancement request of scoped variadics because it looks too hard to implement in D/DMD.
 I believe the compiler cannot really be made to enforce that all passed-in
 data will be heap-allocated when passed to foo.  A runtime check would be
 a very good safety net.

Static analysis is able to do this and more, but it requires some logic added to the compiler (and such logic probably doesn't work in all cases).
 I have even suggested a transitive @noheap annotation, similar to
 nothrow, that makes sure a function contains no heap allocations and
 doesn't call other things that perform heap allocations:
 http://d.puremagic.com/issues/show_bug.cgi?id=5219
 The proliferation of function attributes produces "interesting" results:
 @noheap @safe nothrow pure real sin(in real x) { ... }

This is a bit much. Introducing these attributes is viral -- once you go noheap, anything you call must be noheap, and the majority of functions will need to be marked noheap. The gain is marginal at best anyways.

Indeed, it's a lot, and I am not sure it's a good idea. I had this idea while reading one or two articles written by people who write high-performance games in C++. They need to keep the frame rate constantly above a minimum, like 30/s or 60/s. To do this they have to avoid C heap allocations inside certain large loops (D GC heap allocations may be even worse). Using @noheap is a burden, but it may help you write code with more deterministic performance. Maybe someday it will be possible to implement @noheap in D with user-defined attributes plus static reflection. But then the standard library will not use that user-defined @noheap attribute, making it not so useful. So if you want it to be transitive, Phobos needs to be aware of it. Bye, bearophile

I'm for "safe by default; you have to work to be unsafe". In this case, the compiler should have noticed the data was being escaped and passed 1,2,3 on the heap (or perhaps auto-duped it at the point it was assigned). It does this kind of thing when you have a closure (nested delegate). The variadic syntax is confusing (although arguably the best way to deal with it). An average developer expects assignments to always be safe. Assigning a function-scope static array to a global static or dynamic array results in either copying or a heap allocation / copying. There's no need to "think about it" as you would in C. You simply assign and it works. Although I do agree that some kind of compiler switch that tells you when there are hidden heap allocations would be nice for performance tuning. - Johann
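For reference, the closure behaviour mentioned above can be seen in a small sketch: when a nested delegate that refers to a local variable escapes its function, the compiler allocates that variable in a GC-managed closure rather than leaving it in the stack frame. The name `makeCounter` is just for illustration.

```d
// Returns a delegate that keeps incrementing a counter after makeCounter returns.
int delegate() makeCounter()
{
    int count = 0;
    // The delegate escapes, so the compiler moves `count` into a
    // GC-allocated closure instead of leaving it in this stack frame.
    return () => ++count;
}
```

Each call to `makeCounter` gets its own closure, so two counters created this way advance independently. This is the "safe by default" treatment being asked for, applied to delegates rather than variadic arrays.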
Nov 21 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/10 4:40 AM, Steven Schveighoffer wrote:
 On Mon, 15 Nov 2010 17:02:27 -0500, bearophile
 <bearophileHUGS lycos.com> wrote:
 I have even suggested a transitive @noheap annotation, similar to
 nothrow, that makes sure a function contains no heap allocations and
 doesn't call other things that perform heap allocations:
 http://d.puremagic.com/issues/show_bug.cgi?id=5219
 The proliferation of function attributes produces "interesting" results:
 @noheap @safe nothrow pure real sin(in real x) { ... }

This is a bit much. Introducing these attributes is viral -- once you go noheap, anything you call must be noheap, and the majority of functions will need to be marked noheap. The gain is marginal at best anyways.

Hm, interestingly a data qualifier @noheap would not need to be transitive, as data on the stack may refer to data on the heap. Andrei
Nov 16 2010
bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 Pure is hard enough to deal with (especially since we probably should have made it
 the default, but it's too late for that now).

Weakly pure by default isn't good for a language that is supposed to be somewhat compatible with C syntax; I think it breaks too many C functions. Bye, bearophile
Nov 16 2010
Rainer Deyke <rainerd eldwood.com> writes:
On 11/16/2010 21:53, Steven Schveighoffer wrote:
 It makes me think that this is going to be extremely confusing for a
 while, because people are so used to pure being equated with a
 functional language, so when they see a function is pure but takes
 mutable data, they will be scratching their heads.  It would be awesome
 to make weakly pure the default, and it would also make it so we have to
 change much less code.

Making functions weakly pure by default means that temporarily adding a tiny debug printf to any function will require a shitload of cascading 'impure' annotations. I would consider that completely unacceptable. (Unless, of course, purity is detected automatically without the use of annotations at all.) -- Rainer Deyke - rainerd eldwood.com
Nov 16 2010
bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

It makes me think that this is going to be extremely confusing for a while,
because people are so used to pure being equated with a functional language, so
when they see a function is pure but takes mutable data, they will be
scratching their heads.<

I agree, it's a (small) problem. Originally 'pure' in D was closer to the correct definition of purity. Then its semantics were changed and it was not replaced by strongpure/weakpure annotations, so there is now a bit of semantic mismatch.
------------------------
Rainer Deyke:
 Making functions weakly pure by default means that temporarily adding a
 tiny debug printf to any function will require a shitload of cascading
 'impure' annotations.  I would consider that completely unacceptable.

To address this problem I have proposed a pureprintf() function (or purewriteln), a kind of alias of printf (or writeln); the only differences between pureprintf() and printf() are the name and D seeing the first one as strongly pure. pureprintf() is meant only for *unreliable* debug prints, not for the normal program console output.
------------------------
spir:
Output in general, programmer feedback in particular, should simply not be
considered effect.

You are very wrong.
 The following is imo purely referentially transparent and effect-free (where
 effect means changing state); it always executes the same way, produces the same
 result, and never influences later processes other than via said result:
 
 uint square(uint n) {
     uint sq = n*n;
     writefln("%s^2 = %s", n, sq);
     return sq;
 }

If we replace that function signature with this (assuming writefln is considered pure):

pure uint square(uint n) { ...

Then the following code will print once or twice, depending on how many optimizations the compiler is performing:

void main() {
    uint x = square(10) + square(10);
}

Generally in DMD if you compile with -O you will see only one print. If you replace the signature with this one:

pure double square(double n) { ...

you will see two prints. In general the compiler is able to replace two calls with the same arguments to a strongly pure function with a single call. DMD doesn't do it on floating point numbers, to respect its FP no-optimization rules, but LDC doesn't respect them if you use the -enable-unsafe-fp-math compiler switch, so if you use -enable-unsafe-fp-math you will probably see only one print. Generally if the compiler sees code like:

uint y = foo(x) + bar(x);

and both foo and bar are strongly pure, the compiler must be free to call them in any order it likes, because they are side-effect-free. So normal printing functions can't be allowed inside pure functions, because printing is a big side effect (even memory allocation is a side effect, because I may cast the dynamic array pointer to size_t and then use this number. Even exceptions are a side effect, but they probably cause fewer troubles than printing). I have suggested pureprintf() so the user remembers that its printing will be unreliable (printing may appear or disappear depending on the compiler used, optimization levels, day of the week). Bye, bearophile
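Later D releases address much of this with `debug` statements: the language spec allows a pure function to perform impure operations inside a statement controlled by a debug condition, and that code is only compiled when building with the `-debug` switch. A sketch using spir's `square` example; the caveat above still applies, since a compiler may merge repeated calls, so such prints remain unreliable.

```d
import std.stdio : writefln;

// Marked pure, yet it may print when built with -debug: impure calls are
// permitted inside `debug` statements, which are dropped in normal builds.
pure uint square(uint n)
{
    uint sq = n * n;
    debug writefln("%s^2 = %s", n, sq);
    return sq;
}
```

The design choice here is the same trade-off as pureprintf(): the purity guarantee is kept for the compiler, and the programmer accepts that debug output from pure code is best-effort.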
Nov 17 2010
Rainer Deyke <rainerd eldwood.com> writes:
On 11/17/2010 05:10, spir wrote:
 Output in general, programmer feedback in particular, should simply
 not be considered effect. It is transitory change to dedicated areas
 of memory -- not state. Isn't this the sense of "output", after all?

My debug output actually goes through my logging library which, among other things, maintains a list of log messages in memory. If this is considered "pure", then we might as well strip "pure" from the language, because it has lost all meaning. -- Rainer Deyke - rainerd eldwood.com
Nov 17 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 15 Nov 2010 17:02:27 -0500, bearophile <bearophileHUGS lycos.com>  
wrote:

 Steven Schveighoffer:

 To have the language continually working against that goal is going to
 be great for inexperienced programmers but hell for people trying to
 squeeze performance out of it.<

The experienced programmers may write "scope int[] a...", and have no heap allocations.

This is a good idea. This isn't what I thought spir was saying, I thought he wanted the function to always allocate. At first glance, I thought your idea might be bad, because duping an array decouples it from the original, but then I realized -- there *is* no original. This is the only reference to that data, so you can't change any expectations. The only issue I see here is that scope should really be the default, because that is what you want most of the time. However, the compiler cannot prove that the data doesn't escape so it can't really enforce that as the default. I have the same issue with closures (the compiler is too eager to allocate closures because it is too conservative). But I don't know how this can be fixed without redesigning the compilation model.
 I think what we need however, is a way to specify intentions inside the  
 function.  If you intend to escape this data, then the runtime/compiler  
 should make it easy to avoid re-duping something.<

This is like for the "automatic" closures. The right design in a modern language is to use the safer strategy on default, and the less safe on request.

This is not always possible; I still see a good need for ensuring heaped data. For example:

int[] y;

void foo(int[] x...)
{
    y = ensureHeaped(x);
}

void bar(int[] x)
{
    foo(x);
}

void baz()
{
    int[3] x;
    bar(x);
}

I believe the compiler cannot really be made to enforce that all passed-in data will be heap-allocated when passed to foo. A runtime check would be a very good safety net.
 If you want, a new compiler switch may be added that lists all the spots  
 in the code where a closure or hidden heap allocation occurs, useful for  
 performance tuning (an idea by Denis Koroskin):
 http://d.puremagic.com/issues/show_bug.cgi?id=5070

Also a good idea.
 I have even suggested a transitive @noheap annotation, similar to
 nothrow, that makes sure a function contains no heap allocations and
 doesn't call other things that perform heap allocations:
 http://d.puremagic.com/issues/show_bug.cgi?id=5219
 The proliferation of function attributes produces "interesting" results:
 @noheap @safe nothrow pure real sin(in real x) { ... }

This is a bit much. Introducing these attributes is viral -- once you go noheap, anything you call must be noheap, and the majority of functions will need to be marked noheap. The gain is marginal at best anyways.
 To be fair, it was easy to spot when you gave us the pertinent code :)

I didn't even know/remember that that array data is on the stack. That error will give bad surprises to some D newbies that are not fluent in C.

I didn't know until about a month and a half ago (in dcollections, this bug was prominent in all the array-based classes). Only after inspecting the disassembly did I realize. I agree we need some sort of protection or alert for this -- it's too simple to make this mistake. -Steve
Nov 16 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 16 Nov 2010 13:04:32 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 11/16/10 4:40 AM, Steven Schveighoffer wrote:
 On Mon, 15 Nov 2010 17:02:27 -0500, bearophile
 <bearophileHUGS lycos.com> wrote:
 I have even suggested a transitive @noheap annotation, similar to
 nothrow, that makes sure a function contains no heap allocations and
 doesn't call other things that perform heap allocations:
 http://d.puremagic.com/issues/show_bug.cgi?id=5219
 The proliferation of function attributes produces "interesting"
 results:
 @noheap @safe nothrow pure real sin(in real x) { ... }

This is a bit much. Introducing these attributes is viral -- once you go noheap, anything you call must be noheap, and the majority of functions will need to be marked noheap. The gain is marginal at best anyways.

Hm, interestingly a data qualifier @noheap would not need to be transitive, as data on the stack may refer to data on the heap.

I think he means transitive the same way pure is transitive. Not sure what the term would be, functionally transitive? In other words, if your function is marked @noheap, it cannot allocate any heap memory, which means it cannot call any *other* functions that allocate heap memory. -Steve
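D later gained an attribute along these lines, `@nogc`, and it is checked in the "functionally transitive" way described here: a `@nogc` function may not allocate from the GC heap and may only call other `@nogc` functions. It does not forbid stack use, and it says nothing about malloc. A small sketch; the function names are illustrative only.

```d
// Both functions are @nogc, so the compiler verifies that neither one
// allocates from the GC heap (no `new`, no array concatenation, etc.).
@nogc nothrow pure int squareOf(int x)
{
    return x * x;
}

@nogc nothrow pure int sumOfSquares(const(int)[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += squareOf(x); // OK: squareOf is @nogc too
    // Calling a function that is not @nogc here would be a compile error.
    return total;
}
```

This also shows the "proliferation of attributes" concern from the thread in practice: each helper in the call chain has to carry the annotation.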
Nov 16 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, November 16, 2010 10:31:43 Steven Schveighoffer wrote:
 On Tue, 16 Nov 2010 13:04:32 -0500, Andrei Alexandrescu
 
 <SeeWebsiteForEmail erdani.org> wrote:
 On 11/16/10 4:40 AM, Steven Schveighoffer wrote:
 On Mon, 15 Nov 2010 17:02:27 -0500, bearophile
 
 <bearophileHUGS lycos.com> wrote:
 I have even suggested a transitive @noheap annotation, similar to
 nothrow, that makes sure a function contains no heap allocations and
 doesn't call other things that perform heap allocations:
 http://d.puremagic.com/issues/show_bug.cgi?id=5219
 The proliferation of function attributes produces "interesting"
 results:
 @noheap @safe nothrow pure real sin(in real x) { ... }

This is a bit much. Introducing these attributes is viral -- once you go noheap, anything you call must be noheap, and the majority of functions will need to be marked noheap. The gain is marginal at best anyways.

Hm, interestingly a data qualifier @noheap would not need to be transitive, as data on the stack may refer to data on the heap.

I think he means transitive the same way pure is transitive. Not sure what the term would be, functionally transitive? In other words, if your function is marked @noheap, it cannot allocate any heap memory, which means it cannot call any *other* functions that allocate heap memory.

Pure is hard enough to deal with (especially since we probably should have made it the default, but it's too late for that now). We shouldn't even consider adding anything more like that without a _really_ good reason. - Jonathan M Davis
Nov 16 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday 16 November 2010 12:37:10 bearophile wrote:
 Jonathan M Davis:
 Pure is hard enough to deal with (especially since we probably should have
 made it the default, but it's too late for that now).

Weakly pure by default isn't good for a language that is supposed to be somewhat compatible with C syntax; I think it breaks too many C functions.

Well, like I said, it's too late at this point, and really, it would be good to have a nice way to deal with C functions and purity (particularly since most of them are pure anyway), but the result at present is that most functions should be marked with pure. And if you're marking more functions with pure than not, that would imply that the default should be (at least ideally) impure. Regardless, however, it's not reasonable for D to go for impure rather than pure at this point. - Jonathan M Davis
Nov 16 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 16 Nov 2010 16:04:18 -0500, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Tuesday 16 November 2010 12:37:10 bearophile wrote:
 Jonathan M Davis:
 Pure is hard enough to deal with (especially since we probably should have
 made it the default, but it's too late for that now).

Weakly pure by default isn't good for a language that is supposed to be somewhat compatible with C syntax; I think it breaks too many C functions.

Well, like I said, it's too late at this point, and really, it would be good to have a nice way to deal with C functions and purity (particularly since most of them are pure anyway), but the result at present is that most functions should be marked with pure. And if you're marking more functions with pure than not, that would imply that the default should be (at least ideally) impure. Regardless, however, it's not reasonable for D to go for impure rather than pure at this point.

Everything you are saying seems to be backwards, stop it! ;)

1. Currently, the default is impure.
2. Most functions will naturally be weakly pure, so making *pure* the default would seem more useful.

It seems backwards to me to think pure functions should be the default; I mean, this isn't a functional language! But you also have to forget everything you know about pure, because a weakly pure function is a very useful idiom, and it is most certainly not compatible with functional languages. It's both imperative and can accept and return mutable data. It makes me think that this is going to be extremely confusing for a while, because people are so used to pure being equated with a functional language, so when they see a function is pure but takes mutable data, they will be scratching their heads. It would be awesome to make weakly pure the default, and it would also make it so we have to change much less code. -Steve
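To make the weak/strong distinction concrete, here is a small sketch (the function names are hypothetical). `fill` is pure but writes through its argument, so it is only weakly pure; `makeFilled` takes only value parameters, so it can be treated as strongly pure, and it is still allowed to call the weakly pure helper on memory it allocated itself.

```d
// Weakly pure: touches no global state, but writes through its `buf` argument.
pure void fill(int[] buf, int value)
{
    foreach (ref e; buf)
        e = value;
}

// Strongly pure: its parameters carry no mutable indirections, so the result
// depends only on the argument values, even though it calls `fill` internally.
pure int[] makeFilled(size_t n, int value)
{
    auto buf = new int[](n);
    fill(buf, value);
    return buf;
}
```

This is the "imperative yet pure" idiom described above: the mutation inside `fill` is invisible to callers of `makeFilled`.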
Nov 16 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday 16 November 2010 20:53:04 Steven Schveighoffer wrote:
 On Tue, 16 Nov 2010 16:04:18 -0500, Jonathan M Davis <jmdavisProg gmx.com>
 
 wrote:
 On Tuesday 16 November 2010 12:37:10 bearophile wrote:
 Jonathan M Davis:
 Pure is hard enough to deal with (especially since we probably should have
 made it the default, but it's too late for that now).

Weakly pure by default isn't good for a language that is supposed to be somewhat compatible with C syntax; I think it breaks too many C functions.

Well, like I said, it's too late at this point, and really, it would be good to have a nice way to deal with C functions and purity (particularly since most of them are pure anyway), but the result at present is that most functions should be marked with pure. And if you're marking more functions with pure than not, that would imply that the default should be (at least ideally) impure. Regardless, however, it's not reasonable for D to go for impure rather than pure at this point.

Everything you are saying seems to be backwards, stop it! ;)

1. Currently, the default is impure.
2. Most functions will naturally be weakly pure, so making *pure* the default would seem more useful.

It seems backwards to me to think pure functions should be the default, I mean, this isn't a functional language! But you also have to forget everything you know about pure, because a weakly pure function is a very useful idiom, and it is most certainly not compatible with functional languages. It's both imperative and can accept and return mutable data.

It makes me think that this is going to be extremely confusing for a while, because people are so used to pure being equated with a functional language, so when they see a function is pure but takes mutable data, they will be scratching their heads. It would be awesome to make weakly pure the default, and it would also make it so we have to change much less code.

I was not trying to separate out weakly pure and strongly pure. pure is pure as far as marking the functions goes. Whether that purity is strong or weak depends on the parameters. And since most functions should at least be weakly pure, you end up marking most functions with pure. Ideally, you'd be marking functions for the uncommon case rather than the common one.

I do think that a serious downside to using pure to mark weak purity is that it's pretty much going to bury the difference. You're not using global variables, so you mark the function as pure. Whether it's actually strongly pure, and thus whether the compiler can optimize it, is then an optimization detail (though you can of course figure it out if you want to). I expect that that's pretty much what the situation is going to end up being. Of course, the fact that C functions aren't marked as pure (even though in most cases they are) tends to put a damper on things. The fact that you have to create multiple versions of the same function in different static if blocks when the purity depends on a templated function or type that the function is using also puts a major damper on things. However, the overall trend will likely be to mark next to everything as pure.

It would certainly be cool to have weakly pure be the default, but that would require adding impure or something similar for cases where a function can't even be weakly pure. I would think that ideally, you'd make the default weakly pure, have impure (or something similar) to mark functions which can't even be weakly pure, and have full-on pure just be detected by the compiler (since it should be able to do that if weakly pure is the default). pure could be dropped entirely, or you could keep it to enforce that a function actually be strongly pure, forcing you to change the function if something changes to make it only weakly pure or outright impure. But I don't see that change having any chance of being made. Even if you kept pure and made it strongly pure only, adding impure (or whatever you wanted to call it) would mean adding a keyword, which always seems to go over badly around here. It would also mean changing a fair bit of code (mostly due to stdin, stdout, and C functions, I expect). I think that it would ultimately be worth it though, as long as we were willing to pay the pain up front.

- Jonathan M Davis
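To make the weak/strong distinction concrete, here is a minimal D sketch. The function names are hypothetical and chosen only for illustration. Both functions carry the same `pure` keyword. What makes one weakly pure and the other strongly pure is the parameter types, as described above.

```d
// Weakly pure: it reads and writes no global or static mutable state,
// but it does mutate the array it is handed.
pure void fillSquares(int[] buf)
{
    foreach (i, ref x; buf)
        x = cast(int)(i * i);
}

// Strongly pure: its only parameter is a value type, so the compiler
// may treat repeated calls with the same argument as interchangeable.
pure int sumOfSquares(size_t n)
{
    auto buf = new int[](n);   // allocating with new is allowed in pure code
    fillSquares(buf);          // a pure function may call a weakly pure one
    int total = 0;
    foreach (x; buf)
        total += x;
    return total;
}
```

The strongly pure function uses mutable data internally. Because that data never leaks out, callers still see a function of its argument alone.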
Nov 16 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday 16 November 2010 23:03:05 Rainer Deyke wrote:
 On 11/16/2010 21:53, Steven Schveighoffer wrote:
 It makes me think that this is going to be extremely confusing for a
 while, because people are so used to pure being equated with a
 functional language, so when they see a function is pure but takes
 mutable data, they will be scratching their heads.  It would be awesome
 to make weakly pure the default, and it would also make it so we have to
 change much less code.

Making functions weakly pure by default means that temporarily adding a tiny debug printf to any function will require a shitload of cascading 'impure' annotations. I would consider that completely unacceptable. (Unless, of course, purity is detected automatically without the use of annotations at all.)

It has already been argued that I/O should be exempt (at least for debugging purposes), and I think that that could be acceptable for weakly pure functions. But it's certainly true that as it stands, dealing with I/O and purity doesn't work very well. You have to try and mark as much as possible pure (to make it at least weakly pure) if you want much hope of being able to have much of anything be strongly pure. Because of that, it doesn't take long before you can't actually have I/O much of anywhere, even for debugging. It's definitely a problem.

- Jonathan M Davis
Nov 16 2010
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On Wed, 17 Nov 2010 00:03:05 -0700
Rainer Deyke <rainerd eldwood.com> wrote:

 Making functions weakly pure by default means that temporarily adding a
 tiny debug printf to any function will require a shitload of cascading
 'impure' annotations.  I would consider that completely unacceptable.

Output in general, and programmer feedback in particular, should simply not be considered an effect. It is a transitory change to dedicated areas of memory, not state. Isn't this the sense of "output", after all? (One cannot read it back, thus it has no consequence on future processing.) The following is, IMO, purely referentially transparent and effect-free (where effect means changing state). It always executes the same way, produces the same result, and never influences later processes except via said result:

uint square(uint n)
{
    uint sq = n*n;
    writefln("%s^2 = %s", n, sq);
    return sq;
}

Sure, the physical machine's state has changed, but it's not the same machine (state) as the one the program runs on (as the one the program can play with). There is some bizarre confusion.

[IMO, FP's notion of purity is at best improper for imperative programming (& at worst requires complicated hacks for using FP itself). We need to find our own way to make programs easier to understand and reason about.]

Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com
Nov 17 2010
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On Tue, 16 Nov 2010 23:28:37 -0800
Jonathan M Davis <jmdavisProg gmx.com> wrote:

 It has already been argued that I/O should be exempt (at least for debugging
 purposes), and I think that that could be acceptable for weakly pure
 functions. But it's certainly true that as it stands, dealing with I/O and
 purity doesn't work very well. And since you have to try and mark as much as
 possible pure (to make it at least weakly pure) if you want much hope of being
 able to have much of anything be strongly pure, it doesn't take long before you
 can't actually have I/O much of anywhere, even for debugging. It's definitely a
 problem.

(See also my previous post on this thread.)

What we are missing is a clear notion of program state, distinct from the physical machine. A non-referentially transparent function is one that reads from this state. Between two runs of the function, this state may have been changed by the program itself, so execution is influenced. Conversely, an effect-ive function is one that changes state. Such a change may influence parts of the program that read it, possibly including the function itself.

This true program state is not the physical machine's. Ideally, the core language's organisation would include a clear definition of what state is; it could be called "state", or "world". An approximation in super simple imperative languages is the set of global variables. (Output does not write onto globals. Considering writing to a video port or to memory as a state change is close to nonsense ;-) In pure OO, this is more or less the set of objects / object fields. (A func that does not affect any object field is effect-free.)

State is something the program can read (back). All the rest cannot have any consequence on future processing (*), such as writing to unreachable parts of memory as for output. I'm still far from clear on this topic. As of now, I think only assignments to state, as so defined, should be considered effects. This would lead to a far more practical notion of "purity", I guess, especially for imperative and/or OO programming.

Denis

(*) Except possibly when using low-level direct access to (pseudo) memory addresses. Even then, one cannot read plain output ports, or write to plain input ports, for instance.
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com
Nov 17 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Nov 2010 02:03:05 -0500, Rainer Deyke <rainerd eldwood.com>  
wrote:

 On 11/16/2010 21:53, Steven Schveighoffer wrote:
 It makes me think that this is going to be extremely confusing for a
 while, because people are so used to pure being equated with a
 functional language, so when they see a function is pure but takes
 mutable data, they will be scratching their heads.  It would be awesome
 to make weakly pure the default, and it would also make it so we have to
 change much less code.

Making functions weakly pure by default means that temporarily adding a tiny debug printf to any function will require a shitload of cascading 'impure' annotations. I would consider that completely unacceptable.

As would I. But I think in the case of debugging, we can have "trusted pure." This can be achieved by using extern(C) pure runtime functions.
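The "trusted pure" idea can be sketched by declaring a C runtime function as `pure` yourself. This is a hedged illustration, not an existing druntime facility. `debugPrintf` is a hypothetical name bound to the C library's `printf` via `pragma(mangle)`. The compiler cannot verify the purity claim; it simply takes it on faith.

```d
// Hypothetical declaration: the C library's printf under a new name,
// declared pure. This is a promise to the compiler, not something it checks.
pragma(mangle, "printf")
extern (C) pure int debugPrintf(const(char)* fmt, ...);

// Because debugPrintf is declared pure, this function can still be pure.
pure uint square(uint n)
{
    uint sq = n * n;
    debugPrintf("%u^2 = %u\n", n, sq);  // debug output only
    return sq;
}
```

Later D versions reduced the need for this trick. Statements inside a `debug` conditional are exempt from purity checks, and they are only compiled in with `-debug`.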
 (Unless, of course, purity is detected automatically without the use of
 annotations at all.)

That would be ideal, but the issue is that the compiler may only have the signature and not the implementation. D would need to change its compilation model for this to work (the same goes for escape analysis, link-time optimizations, etc.).

-Steve
Nov 17 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 21 Nov 2010 17:54:25 -0500, Johann MacDonagh  
<johann.macdonagh..no spam..gmail.com> wrote:

 On 11/16/2010 12:58 PM, bearophile wrote:
 Steven Schveighoffer:

 The experienced programmers may write "scope int[] a...", and have no
 heap allocations.

This is a good idea. This isn't what I thought spir was saying; I thought he wanted the function to always allocate.

I have also suggested that when "scope" is not present, DMD may automatically add a runtime test similar to the one done by ensureHeaped and dup the array data only if it's on the stack. So even when you don't use "scope" it doesn't always copy.
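A runtime test of this kind could be sketched with druntime's `core.memory.GC.addrOf`. The same check underlies the `ensureHeaped` proposed at the start of the thread. `GC.addrOf` returns `null` for memory the garbage collector does not manage. This is a minimal sketch under that assumption, not a Phobos function. It treats any non-GC memory (stack, static data, C `malloc`) as "not heaped" and copies it, which is conservative but safe.

```d
import core.memory : GC;

/// Hypothetical helper in the spirit of the proposed ensureHeaped:
/// return arr unchanged if its data already lives in a GC-managed block,
/// otherwise return a GC-allocated copy.
T[] ensureHeaped(T)(T[] arr)
{
    // GC.addrOf yields null when arr.ptr is not inside a GC block.
    if (arr.length == 0 || GC.addrOf(arr.ptr) !is null)
        return arr;
    return arr.dup;
}

int[] globalargs;

void foo(int[] args...)
{
    // Copies only when the caller's arguments are not GC-allocated,
    // e.g. foo(1, 2, 3), whose arguments live on the caller's stack.
    globalargs = ensureHeaped(args);
}
```

A call like `foo([1, 2, 3])` passes a GC-allocated array and skips the copy. The cost in that case is a single GC lookup.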
 The only issue I see here is that scope should really be the default,
 because that is what you want most of the time.

If the variadic parameter is in the signature of a free function, then I agree with you. But if it's inside the this() of a class, then you often want that data on the heap.
 However, the compiler
 cannot prove that the data doesn't escape so it can't really enforce  
 that
 as the default.

If you look at my original answer, I suggested something like heaped, which is attached to an array and makes sure its data is on the heap. This is done with a mix of static analysis (when possible) and runtime tests (in the other cases). But I have not added this idea to the enhancement request for scoped variadics, because it looks too hard to implement in D/DMD.
 I believe the compiler cannot really be made to enforce that all  
 passed-in
 data will be heap-allocated when passed to foo.  A runtime check would  
 be
 a very good safety net.

Static analysis is able to do this and more, but it requires some logic added to the compiler (and such logic probably doesn't work in all cases).
 I have even suggested a transitive  noheap annotation, similar to
  nothrow, that makes sure a function contains no heap allocations and
 doesn't call other things that perform heap allocations:
 http://d.puremagic.com/issues/show_bug.cgi?id=5219
 The proliferation of function attributes produces "interesting"  
 results:
 @noheap @safe nothrow pure real sin(in real x) { ... }

This is a bit much. Introducing these attributes is viral -- once you go noheap, anything you call must be noheap, and the majority of functions will need to be marked noheap. The gain is marginal at best anyways.

Indeed, it's a lot, and I am not sure it's a good idea. I had this idea while reading one or two articles written by people who write high-performance games in C++. They need to keep the frame rate constantly above a minimum, like 30/s or 60/s. To do this they have to avoid C heap allocations inside certain large loops (D GC heap allocations may be even worse). Using noheap is a burden, but it may help you write code with more deterministic performance. Maybe someday it will be possible to implement noheap in D with user-defined attributes plus static reflection. But then the standard library will not use that user-defined noheap attribute, which makes it not so useful. So if you want it to be transitive, Phobos needs to be aware of it.

Bye,
bearophile
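Later D releases added an attribute much like the proposed noheap: `@nogc`. It is transitive in the sense described here, because a `@nogc` function may only call other `@nogc` functions. Below is a minimal sketch; the function names are hypothetical.

```d
// @nogc: the compiler rejects any GC allocation in this function,
// and any call to a function that is not itself @nogc.
@nogc @safe nothrow pure int sumFixed(const int[] values)
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

@nogc nothrow int demo()
{
    int[4] buf = [1, 2, 3, 4];    // fixed-size array on the stack: no GC allocation
    return sumFixed(buf[]);
    // auto heap = new int[](4);  // would be a compile-time error in a @nogc function
}
```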

I'm for "safe by default; you have to work to be unsafe." In this case, the compiler should have noticed the data was being escaped and passed 1,2,3 on the heap (or perhaps auto-duped it at the point it was assigned). It does this kind of thing when you have a closure (nested delegate). The variadic syntax is confusing (although it is arguably the best way to deal with it).

An average developer expects assignments to always be safe. Assigning a function-scope static array to a global static or dynamic array results in either copying or a heap allocation / copying. There's no need to "think about it" as you would in C. You simply assign and it works. Although I do agree that some kind of compiler switch that tells you when there are hidden heap allocations would be nice for performance tuning.
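The closure comparison can be seen in a few lines of D. A nested delegate may refer to a local variable and escape the function. When that happens, the compiler keeps the variable in a heap-allocated closure instead of on the stack, so the delegate stays valid after the function returns. The names below are illustrative.

```d
// Returns a delegate that keeps incrementing its own counter.
int delegate() makeCounter()
{
    int count = 0;          // captured by the escaping delegate below,
    return () => ++count;   // so the compiler allocates it in a GC closure
}
```

Each call to `makeCounter` gets its own closure, so two counters do not share state.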

Let's say you give the compiler this .di file:

void foo(int[] nums...);

and an object file. How does the compiler know whether nums escapes or not? The answer is, it cannot. That is the problem with D's compilation model, which makes analysis impossible. One must declare the intentions first (via the function signature), and then adhere to the intentions. Bearophile's idea of using scope is good; we can probably make that work.

-Steve
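Declaring the intent in the signature might look like the sketch below, with `scope` on the variadic parameter. This is a hedged illustration with hypothetical names. How strictly `scope` is enforced depends on the compiler version and flags. With modern DMD and `-preview=dip1000`, escaping a `scope` parameter into a global is rejected in `@safe` code. Older compilers largely treated `scope` on arrays as documentation.

```d
int[] globalargs;

// The signature alone (e.g. in a .di file) tells callers that nums will
// not escape, so passing stack-allocated variadic arguments is fine.
@safe int sumArgs(scope int[] nums...)
{
    int total = 0;
    foreach (n; nums)
        total += n;
    // globalargs = nums;  // escaping scope data: rejected under DIP1000 in @safe code
    return total;
}
```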
Nov 22 2010