
digitalmars.D - Array literals REALLY should be immutable

reply Don <nospam nospam.com> writes:
I think this is quite horrible. [1, 2, 3] looks like an array literal, 
but it isn't -- it's an array constructor. It doesn't look like a 
function call. It shouldn't be.

Q1. How do you declare an array literal [1,2,3]?
A. It took me four attempts before I got it.

==========================

int main()
{
    immutable int[] x1 = [1, 2, 3]; // NO - not evaluated at compile time
    static int[] x2 = [1, 2, 3];    // NO - uses thread local storage
    enum int[] x3 = [1, 2, 3];      // NO - not indexable at run time.

    static immutable int[] x4 = [1, 2, 3];  // OK
    static const int[] x5 = [1, 2, 3];      // also OK

    for (int i=0; i< 3; ++i) {
        if (x4[i]==3) return i;
    }
    return 0;
}

(x3 is currently accepted, but that's a bug -- the whole point of 'enum' 
is that you can't take the address of it).

This is really ugly and non-intuitive for something so simple. x1 should 
just work.

Q2: How do you create such an array literal and pass it in a function call?
A. ??? Is this even possible right now?

My code is *full* of these guys. For example, function approximations 
use them (look at any of the special functions code in Tango.math, or 
etc.gamma). Unit tests are full of them. Everyone uses look-up tables.

Bug 2356 is a consequence of this.

By contrast, the stupid array constructors we have now can be 
implemented in a trivial library function:

T[] array(T)(T[] x...) { return x.dup; }

I really don't see how syntax sugar for something so simple can be 
justified, at the expense of basic functionality (lookup tables, 
essentially). Especially when it's creating an inconsistency with string 
literals.
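
A minimal sketch of the two cases contrasted above, assuming a D2 compiler. The names `primes` and `nthPrime` are hypothetical; `array` is the one-line library function from the post. The table is immutable and indexable at run time, while the library call makes the allocation explicit:

```d
// Hypothetical lookup table: immutable data, indexable at run time.
static immutable int[] primes = [2, 3, 5, 7, 11];

// The library replacement for the built-in "array constructor", as given in the post.
T[] array(T)(T[] x...) { return x.dup; }

// Run-time indexing into the table; the function itself allocates nothing.
int nthPrime(size_t n) { return primes[n]; }

unittest
{
    int x = 42;
    int[] fresh = array(1, 2, x); // explicit, visible allocation via .dup
    fresh[0] = 10;                // the copy is mutable, unlike the table
    assert(fresh == [10, 2, 42]);
    assert(nthPrime(2) == 5);
}
```

The `unittest` block can be run with the compiler's `-unittest -main` switches.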
Nov 12 2009
next sibling parent Moritz Warning <moritzwarning web.de> writes:
On Thu, 12 Nov 2009 14:28:05 +0100, Don wrote:

 I think this is quite horrible. [1, 2, 3] looks like an array literal,
 but it isn't -- it's an array constructor. It doesn't look like a
 function call. It shouldn't be.

I've hit this problem around four times in the last week, in combination with C bindings. I agree with the proposal, but I'd like to see some action on it.
Nov 12 2009
prev sibling next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 12 Nov 2009 16:28:05 +0300, Don <nospam nospam.com> wrote:

 I think this is quite horrible. [1, 2, 3] looks like an array literal,  
 but it isn't -- it's an array constructor. It doesn't look like a  
 function call. It shouldn't be.

 Q1. How do you declare an array literal [1,2,3]?
 A. It took me four attempts before I got it.

 ==========================

 int main()
 {
     immutable int[] x1 = [1, 2, 3]; // NO - not evaluated at compile time
     static int[] x2 = [1, 2, 3];    // NO - uses thread local storage
     enum int[] x3 = [1, 2, 3];      // NO - not indexable at run time.

     static immutable int[] x4 = [1, 2, 3];  // OK
     static const int[] x5 = [1, 2, 3];      // also OK

     for (int i=0; i< 3; ++i) {
         if (x4[i]==3) return i;
     }
     return 0;
 }

 (x3 is currently accepted, but that's a bug -- the whole point of 'enum'  
 is that you can't take the address of it).

 This is really ugly and non-intuitive for something so simple. x1 should  
 just work.

 Q2: How do you create such an array literal and pass it in a function  
 call?
 A. ??? Is this even possible right now?

 My code is *full* of these guys. For example, function approximations  
 use them (look at any of the special functions code in Tango.math, or  
 etc.gamma). Unit tests are full of them. Everyone uses look-up tables.

 Bug 2356 is a consequence of this.

 By contrast, the stupid array constructors we have now can be  
 implemented in a trivial library function:

 T[] array(T)(T[] x...) { return x.dup; }

 I really don't see how syntax sugar for something so simple can be  
 justified, at the expense of basic functionality (lookup tables,  
 essentially). Especially when it's creating an inconsistency with string  
 literals.

Can't agree more. I see no problem writing [1, 2, 3].dup; In fact, I can count all the uses of *dynamic* array literals on the fingers of one hand. I mostly need them for either indexing or iteration, so the contents are read-only. I strongly believe that a "No hidden allocation" policy should be adopted by D/Phobos (it is already adopted by Tango with great success).
Nov 12 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Denis Koroskin (2korden gmail.com)'s article
 I strongly believe that "No hidden allocation" policy should be adopted by
 D/Phobos (it is already adopted by Tango with a great success).

I can see the value in this, but two issues:

1. What counts as a "hidden" allocation? How non-obvious does it have to be that something requires an allocation? If something really has to allocate and it's not obvious from the nature of the function, is it enough to just document it?

2. How do you really design high-level library functions if they're not allowed to allocate memory? If you require the user to provide all kinds of details about where the memory they use comes from, then you lose some of the high-level-ness and make it seem more like an ugly C API that doesn't "just work" and requires attention to irrelevant details the 90% of the time that you don't care about an extra allocation.

The solution I personally use in my dstats lib, which works pretty well in the limited case of arrays of primitives, but might not generalize, is:

    a. For stuff that returns an array, the last argument to the function is an optional buffer. If it is provided and is big enough, the results are returned in it. If it is not provided or is too small, a new one is allocated.

    b. For temporary buffers used within a function, I use a thread-local second stack (TempAlloc). While this is not **guaranteed** never to result in an allocation (if we're out of space in our current chunk of memory, a new one will be allocated), it very seldom does and only when the only alternative would be to crash, throw an exception, etc.
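
A minimal sketch of the optional-buffer convention from point (a). The function `squares` is a hypothetical example and is not taken from dstats; it only illustrates the "use the caller's buffer if it is big enough, otherwise allocate" rule:

```d
// Hypothetical example: returns the square of each input element.
// The last parameter is an optional result buffer, as described in (a).
int[] squares(const(int)[] input, int[] buf = null)
{
    // Reuse the caller's buffer when it is large enough; otherwise allocate.
    int[] result = buf.length >= input.length
        ? buf[0 .. input.length]
        : new int[input.length];

    foreach (i, v; input)
        result[i] = v * v;
    return result;
}

unittest
{
    int[8] storage;
    auto r = squares([1, 2, 3], storage[]);
    assert(r == [1, 4, 9]);
    assert(r.ptr is storage.ptr);      // the result lives in the caller's buffer

    int[2] tooSmall;
    auto s = squares([1, 2, 3], tooSmall[]);
    assert(s == [1, 4, 9]);
    assert(s.ptr !is tooSmall.ptr);    // buffer too small, so a new array was allocated
}
```

Callers who don't care about the extra allocation simply omit the last argument, so the high-level call stays as short as before.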
Nov 12 2009
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Bill Baxter (wbaxter gmail.com)'s article
 2009/11/12 Denis Koroskin <2korden gmail.com>:
 // untested
 void mkdirRecurse(string path) {
    char* buffer = alloca(path.length);
    memcpy(buffer, path);

    foreach (i, c; buffer[0..path.length]) {
        if (c == '/') {
            buffer[i] = 0;
            mkdir(buffer);
            buffer[i] = '/';
        }
    }
 }

 There are a lot of functions that allocate without a clear reason.)

I'm pretty sure the reason is that it means library code that's easier to write, understand and maintain. But yeh, if you give me the choice of two different functions, one that allocates and one that doesn't, otherwise identical, I'll pick the non-allocating version. --bb

I don't understand this attitude. There are definitely times when readability and maintainability count more than performance, but library code that will be used in hundreds of different places isn't one of them. Knuth says we should forget about small efficiencies about 97% of the time. He's right. However, when you are writing this kind of generic library code, the odds are pretty good that at least one place where it's used is going to be in the 3%.
Nov 12 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Thu, 12 Nov 2009 19:49:58 +0300, dsimcha <dsimcha yahoo.com> wrote:
 
 == Quote from Denis Koroskin (2korden gmail.com)'s article
 I strongly believe that "No hidden allocation" policy should be 
 adopted by
 D/Phobos (it is already adopted by Tango with a great success).

I can see the value in this, but two issues: 1. What counts as a "hidden" allocation? How non-obvious does it have to be that something requires an allocation? If something really has to allocate and it's not obvious from the nature of the function, is it enough to just document it?

I can't give a formal definition of that, but for me a function is allowed to allocate if it produces something new or unique. For example, void mkdirRecurse(string pathname) shouldn't allocate, but it does, because the author didn't care about allocations when implementing it.

Please bugzilla that, thanks. I'll fix. Andrei
Nov 12 2009
prev sibling next sibling parent reply Max Samukha <spambox d-coding.com> writes:
On Thu, 12 Nov 2009 14:28:05 +0100, Don <nospam nospam.com> wrote:

I think this is quite horrible. [1, 2, 3] looks like an array literal, 
but it isn't -- it's an array constructor. It doesn't look like a 
function call. It shouldn't be.

I absolutely agree. One note: I hope that x3 will remain valid and be indexable with a compile-time value.
Nov 12 2009
parent Don <nospam nospam.com> writes:
Max Samukha wrote:
 On Thu, 12 Nov 2009 14:28:05 +0100, Don <nospam nospam.com> wrote:
 
 I think this is quite horrible. [1, 2, 3] looks like an array literal, 
 but it isn't -- it's an array constructor. It doesn't look like a 
 function call. It shouldn't be.

I absolutely agree. One note: I hope that x3 will remain valid and be indexable with a compile-time value.

Yes, that's the intention. See bug 2559.
Nov 12 2009
prev sibling next sibling parent Eldar Insafutdinov <e.insafutdinov gmail.com> writes:
Don Wrote:

 I think this is quite horrible. [1, 2, 3] looks like an array literal, 
 but it isn't -- it's an array constructor. It doesn't look like a 
 function call. It shouldn't be.
 
 Q1. How do you declare an array literal [1,2,3]?
 A. It took me four attempts before I got it.
 
 ==========================
 
 int main()
 {
     immutable int[] x1 = [1, 2, 3]; // NO - not evaluated at compile time
     static int[] x2 = [1, 2, 3];    // NO - uses thread local storage
     enum int[] x3 = [1, 2, 3];      // NO - not indexable at run time.
 
     static immutable int[] x4 = [1, 2, 3];  // OK
     static const int[] x5 = [1, 2, 3];      // also OK
 
     for (int i=0; i< 3; ++i) {
         if (x4[i]==3) return i;
     }
     return 0;
 }
 
 (x3 is currently accepted, but that's a bug -- the whole point of 'enum' 
 is that you can't take the address of it).
 
 This is really ugly and non-intuitive for something so simple. x1 should 
 just work.
 
 Q2: How do you create such an array literal and pass it in a function call?
 A. ??? Is this even possible right now?
 
 My code is *full* of these guys. For example, function approximations 
 use them (look at any of the special functions code in Tango.math, or 
 etc.gamma). Unit tests are full of them. Everyone uses look-up tables.
 
 Bug 2356 is a consequence of this.
 
 By contrast, the stupid array constructors we have now can be 
 implemented in a trivial library function:
 
 T[] array(T)(T[] x...) { return x.dup; }
 
 I really don't see how syntax sugar for something so simple can be 
 justified, at the expense of basic functionality (lookup tables, 
 essentially). Especially when it's creating an inconsistency with string 
 literals.
 

I agree too. That will be consistent.
Nov 12 2009
prev sibling next sibling parent grauzone <none example.net> writes:
Don wrote:
 I think this is quite horrible. [1, 2, 3] looks like an array literal, 
 but it isn't -- it's an array constructor. It doesn't look like a 
 function call. It shouldn't be.

Can we make int[3] a = [1,2,x]; Just Work (tm)? Because right now (D1), it allocates an array literal, and then copies it into the static array. Incredibly stupid.
Nov 12 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 12 Nov 2009 19:49:58 +0300, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Denis Koroskin (2korden gmail.com)'s article
 I strongly believe that "No hidden allocation" policy should be adopted  
 by
 D/Phobos (it is already adopted by Tango with a great success).

I can see the value in this, but two issues: 1. What counts as a "hidden" allocation? How non-obvious does it have to be that something requires an allocation? If something really has to allocate and it's not obvious from the nature of the function, is it enough to just document it?

I can't give a formal definition of that, but for me a function is allowed to allocate if it produces something new or unique. For example, void mkdirRecurse(string pathname) shouldn't allocate, but it does, because the author didn't care about allocations when implementing it.
 2.  How do you really design high-level library functions if they're not
 allowed to allocate memory?  If you require the user to provide all kinds of
 details about where the memory they use comes from then you lose some of the
 high level-ness and make it seem more like an ugly C API that doesn't "just
 work" and requires attention to the irrelevant the 90% of the time that you
 don't care about an extra allocation.  The solution I personally use in my
 dstats lib, which works pretty well in the limited case of arrays of
 primitives, but might not generalize, is:

     a.  For stuff that returns an array, the last argument to the function
 is an optional buffer.  If it is provided and is big enough, the results are
 returned in it.  If it is not provided or is too small, a new one is
 allocated.

     b.  For temporary buffers used within a function, I use a thread-local
 second stack  (TempAlloc).  While this is not **guaranteed** never to result
 in an allocation (if we're out of space in our current chunk of memory, a
 new one will be allocated), it very seldom does and only when the only
 alternative would be to crash, throw an exception, etc.

Nov 12 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 12 Nov 2009 19:49:58 +0300, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Denis Koroskin (2korden gmail.com)'s article
 I strongly believe that "No hidden allocation" policy should be adopted  
 by
 D/Phobos (it is already adopted by Tango with a great success).

I can see the value in this, but two issues: 1. What counts as a "hidden" allocation? How non-obvious does it have to be that something requires an allocation? If something really has to allocate and it's not obvious from the nature of the function, is it enough to just document it?

I can't give a formal definition of that, but for me a function is allowed to allocate if that allocation is returned back to the user. If a function allocates and the memory becomes unreferenced after the function returns, then the allocation is redundant and should be eliminated.

For example, void mkdirRecurse(string pathname) shouldn't allocate, but it does, because the author didn't care about allocations when implementing it. (It invokes mkdir() for each directory in a path, and mkdir allocates a new string to make sure it ends with \0. Alternatively, a copy of the path could be created only once, in a stack buffer, and reused by putting \0 in place of slashes to terminate it. Something like this:

// untested
void mkdirRecurse(string path) {
    char* buffer = alloca(path.length);
    memcpy(buffer, path);

    foreach (i, c; buffer[0..path.length]) {
        if (c == '/') {
            buffer[i] = 0;
            mkdir(buffer);
            buffer[i] = '/';
        }
    }
}

There are a lot of functions that allocate without a clear reason.)
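
A minimal sketch of the single-buffer idea, under stated assumptions: `eachPrefix` and the 4096-byte bound are hypothetical, and a callback stands in for mkdir() so the walk can be checked without touching the file system. Unlike the untested snippet, it reserves a byte for the trailing \0 and also visits the full path at the end:

```d
import std.string : fromStringz;

// Hypothetical helper: calls `visit` with each NUL-terminated directory prefix
// of `path`, reusing one stack buffer instead of allocating a copy per prefix.
void eachPrefix(const(char)[] path,
                scope void delegate(const(char)* prefixz) visit)
{
    char[4096] buffer = void;   // assumed upper bound on the path length
    assert(path.length < buffer.length, "path too long for the stack buffer");
    buffer[0 .. path.length] = path[];
    buffer[path.length] = 0;

    foreach (i; 1 .. path.length)   // start at 1 to skip a leading '/'
    {
        if (buffer[i] == '/')
        {
            buffer[i] = 0;          // temporarily terminate the prefix
            visit(buffer.ptr);
            buffer[i] = '/';        // restore the separator
        }
    }
    visit(buffer.ptr);              // finally, the full path itself
}

unittest
{
    string[] seen;
    eachPrefix("/usr/local/lib",
               (const(char)* p) { seen ~= fromStringz(p).idup; });
    assert(seen == ["/usr", "/usr/local", "/usr/local/lib"]);
}
```

The only allocations here are in the test's own bookkeeping (`idup` and `~=`); the walk itself touches nothing but the stack buffer.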
 2.  How do you really design high-level library functions if they're not
 allowed to allocate memory?  If you require the user to provide all kinds of
 details about where the memory they use comes from then you lose some of the
 high level-ness and make it seem more like an ugly C API that doesn't "just
 work" and requires attention to the irrelevant the 90% of the time that you
 don't care about an extra allocation.  The solution I personally use in my
 dstats lib, which works pretty well in the limited case of arrays of
 primitives, but might not generalize, is:

     a.  For stuff that returns an array, the last argument to the function
 is an optional buffer.  If it is provided and is big enough, the results are
 returned in it.  If it is not provided or is too small, a new one is
 allocated.

     b.  For temporary buffers used within a function, I use a thread-local
 second stack  (TempAlloc).  While this is not **guaranteed** never to result
 in an allocation (if we're out of space in our current chunk of memory, a
 new one will be allocated), it very seldom does and only when the only
 alternative would be to crash, throw an exception, etc.

Yes, this is a good solution.
Nov 12 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
2009/11/12 Denis Koroskin <2korden gmail.com>:

 // untested
 void mkdirRecurse(string path) {
    char* buffer = alloca(path.length);
    memcpy(buffer, path);

    foreach (i, c; buffer[0..path.length]) {
        if (c == '/') {
            buffer[i] = 0;
            mkdir(buffer);
            buffer[i] = '/';
        }
    }
 }

 There are a lot of functions that allocate without a clear reason.)

I'm pretty sure the reason is that it means library code that's easier to write, understand and maintain. But yeh, if you give me the choice of two different functions, one that allocates and one that doesn't, otherwise identical, I'll pick the non-allocating version. --bb
Nov 12 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 12 Nov 2009 20:26:44 +0300, Bill Baxter <wbaxter gmail.com> wrote:

 2009/11/12 Denis Koroskin <2korden gmail.com>:

 // untested
 void mkdirRecurse(string path) {
    char* buffer = alloca(path.length);
    memcpy(buffer, path);

    foreach (i, c; buffer[0..path.length]) {
        if (c == '/') {
            buffer[i] = 0;
            mkdir(buffer);
            buffer[i] = '/';
        }
    }
 }

 There are a lot of functions that allocate without a clear reason.)

I'm pretty sure the reason is that it means library code that's easier to write, understand and maintain. But yeh, if you give me the choice of two different functions, one that allocates and one that doesn't, otherwise identical, I'll pick the non-allocating version. --bb

It also means that the former function can't be used in programs that disable the GC (kernels, embedded development etc). Quality is in details like that. Java/C# don't follow this rule, but they are not systems programming languages, and their GCs are a lot better than D's.
Nov 12 2009
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Don wrote:
 Especially when it's creating an inconsistency with string 
 literals.

The inconsistency bothers me, too, but then there's the case: int x; ... [1, 2, x] That can't be made immutable. Shouldn't it work? There's no analog for that for string literals, so the inconsistency isn't quite complete.
Nov 12 2009
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 12 Nov 2009 14:46:29 -0500, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Don wrote:
 Especially when it's creating an inconsistency with string literals.

The inconsistency bothers me, too, but then there's the case: int x; ... [1, 2, x] That can't be made immutable. Shouldn't it work? There's no analog for that for string literals, so the inconsistency isn't quite complete.

I thought so too, but I think Don is right. A library function can solve that problem:

auto arr = array(1,2,x);

BTW, there is legitimate inconsistency here:

int[] x = [1,2,3]; // compiles and does what you expect
char[] str = "abc"; // should allocate a mutable string on the heap, should it not?

-Steve
Nov 12 2009
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Walter Bright wrote:
 Don wrote:
 Especially when it's creating an inconsistency with string literals.

The inconsistency bothers me, too, but then there's the case: int x; ... [1, 2, x] That can't be made immutable. Shouldn't it work? There's no analog for that for string literals, so the inconsistency isn't quite complete.

I don't think it should work. [1, 2, x] is totally different from, and a far more complicated beast than [1, 2, 3]. [1, 2, x] either allocates memory and performs some form of memory copy, or else it pokes the 'x' value into a half-initialised static array, exposing the code to a possible race condition. [1, 2, 3], by contrast, is just a standard lookup table that results in no code generation. I don't see why these two very different operations should share the same syntax -- they don't actually have much in common.

It'd be nice to be able to say that "abcd" and ['a', 'b', 'c', 'd'] are completely identical.

C++ got away with using the same syntax for these two totally different things, because (1) it doesn't have any kind of constant folding/CTFE, so it can look at the array entries and determine whether it's immutable or not; and (2) it ignores multi-core issues.

The fact that the language doesn't have any syntax for an immutable array literal is really a problem. Some people are getting around it by using a CTFE function to convert all the values to a string literal, then casting that string literal to (say) an array of ints. That's currently the only way to make the compiler generate decent code, and it's quite dreadful.

BTW, I'm pretty sure that making array literals immutable would simplify the compiler. EG, I've noticed that mutable array literals cause many problems for the interpreter.
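
A minimal sketch of the distinction drawn above, with hypothetical names (`table`, `build`). It only shows the observable difference: the all-constant table is immutable data, while a literal containing a run-time value is constructed afresh on every call:

```d
// All elements are compile-time constants: a plain lookup table.
static immutable int[3] table = [1, 2, 3];

// One element is a run-time value: the array has to be built on each call.
int[] build(int x) { return [1, 2, x]; }

unittest
{
    int[] a = build(7);
    int[] b = build(7);
    assert(a == b);           // same contents
    assert(a.ptr !is b.ptr);  // but separately allocated storage
    assert(table[2] == 3);    // the table is simply read
}
```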
Nov 13 2009
parent Don <nospam nospam.com> writes:
Denis Koroskin wrote:
 On Fri, 13 Nov 2009 12:57:52 +0300, Don <nospam nospam.com> wrote:
 
 Walter Bright wrote:
 Don wrote:
 Especially when it's creating an inconsistency with string literals.

int x; ... [1, 2, x] That can't be made immutable. Shouldn't it work? There's no analog for that for string literals, so the inconsistency isn't quite complete.

I don't think it should work. [1, 2, x] is totally different from, and a far more complicated beast than [1, 2, 3]. [1, 2, x] either allocates memory and performs some form of memory copy, or else it pokes the 'x' value into a half-initialised static array, exposing the code to a possible race condition.

With thread-local-by-default in mind, this is not an issue.

You mean the race condition is not an issue? It is an issue because the compiler needs to deal with it (perhaps by using thread local variables!) But the second case is not implemented in DMD anyway.
 The latter is just a standard lookup table, that results in no code 
 generation.
 I don't see why these two very different operations should share the 
 same syntax -- they don't actually have much in common.

 It'd be nice to be able to say that "abcd" and ['a', 'b', 'c', 'd'] are 
 completely identical.

They aren't: "abcd" has a null-terminator past the string, and ['a', 'b', 'c', 'd'] doesn't.

You're right. Though, it'd be easy to add a null terminator to the end of memory allocated to char-typed array literals, and make them identical in every respect.
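
A small sketch of the difference being discussed, assuming a D2 compiler. It checks only the string-literal side, since that is the case the language guarantees; no terminator is promised for ['a', 'b', 'c', 'd'], so the sketch deliberately does not read past the end of one:

```d
unittest
{
    string s = "abcd";
    assert(s.length == 4);      // the terminator is not part of the slice
    assert(s.ptr[4] == '\0');   // but it sits just past the end of the literal
}
```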
 C++ got away with using the same syntax for these two totally 
 different things, because (1) it doesn't have any kind of constant 
 folding/CTFE, so it can look at the array entries and determine 
 whether it's immutable or not; and (2) it ignores multi-core issues.

 The fact that the language doesn't have any syntax for an immutable 
 array literal is really a problem. Some people are getting around it 
 by using a CTFE function to convert all the values to a string 
 literal, then casting that string literal to (say) an array of ints. 
 That's currently the only way to make the compiler generate decent 
 code, and it's quite dreadful.

 BTW, I'm pretty sure that making array literals immutable would 
 simplify the compiler. EG, I've noticed that mutable array literals 
 cause many problems for the interpreter.


Nov 13 2009
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Fri, 13 Nov 2009 12:57:52 +0300, Don <nospam nospam.com> wrote:

 Walter Bright wrote:
 Don wrote:
 Especially when it's creating an inconsistency with string literals.

int x; ... [1, 2, x] That can't be made immutable. Shouldn't it work? There's no analog for that for string literals, so the inconsistency isn't quite complete.

I don't think it should work. [1, 2, x] is totally different from, and a far more complicated beast than [1, 2, 3]. [1, 2, x] either allocates memory and performs some form of memory copy, or else it pokes the 'x' value into a half-initialised static array, exposing the code to a possible race condition.

With thread-local-by-default in mind, this is not an issue.
 The latter is just a standard lookup table, that results in no code  
 generation.
 I don't see why these two very different operations should share the  
 same syntax -- they don't actually have much in common.

 It'd be nice to be able to say that "abcd" and ['a', 'b', 'c', 'd'] are  
 completely identical.

They aren't: "abcd" has a null-terminator past the string, and ['a', 'b', 'c', 'd'] doesn't.
 C++ got away with using the same syntax for these two totally different  
 things, because (1) it doesn't have any kind of constant folding/CTFE,  
 so it can look at the array entries and determine whether it's immutable  
 or not; and (2) it ignores multi-core issues.

 The fact that the language doesn't have any syntax for an immutable  
 array literal is really a problem. Some people are getting around it by  
 using a CTFE function to convert all the values to a string literal,  
 then casting that string literal to (say) an array of ints. That's  
 currently the only way to make the compiler generate decent code, and  
 it's quite dreadful.

 BTW, I'm pretty sure that making array literals immutable would simplify  
 the compiler. EG, I've noticed that mutable array literals cause many  
 problems for the interpreter.

Nov 13 2009