
digitalmars.D - Changeset 442, implicit Vs explicit

reply bearophile <bearophileHUGS lycos.com> writes:
Do you see a problem here?

import std.stdio: writeln;
int[24] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
void main() {
    writeln(arr);
}

This code compiles and runs fine, I think according to the D specs, but it can
hide a bug.
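
As a minimal sketch of what the example hides, assuming a D2 compiler (the name `items` is hypothetical and only used for illustration), the literal can be bound to a manifest constant so that its length is visible at compile time. The declared dimension is 24, but the literal holds only 23 items:

```d
import std.stdio : writeln;

// The literal from the example, bound to a manifest constant so that its
// length can be inspected at compile time. `items` is a hypothetical name.
enum int[] items = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];

// The array in the example is declared as int[24]; the literal is one item short.
static assert(items.length == 23);

void main() {
    writeln(items.length);
}
```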

There are three main situations where you use an array literal to define a
fixed-size array:
1) You want a static array that contains exactly as many items as the literal.
This is a common case, maybe the most common in my code. For this, I and other
people have suggested a simple syntax that avoids having to count the items:
int[$] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
2) You want an array with N items. In this situation you put N in the type
definition, on the left, as in the 'arr' array above. This can lead to bugs
in D code because the compiler accepts literals whose length differs from N.
Changeset 442 fixes part of this, but I don't know if it fixes the whole
problem (it's a big changeset and I don't understand all the things it
changes).
(Don has split my bug report to avoid fixing the whole problem. I don't think
mine is an enhancement request as Don writes, I think it's a bug in the D
specs.)
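
For case 1, a library-level approximation of `int[$]` can be sketched with `std.array.staticArray`, which ships with later Phobos releases and was not available when this thread was written. This is only an illustration of the idea of inferring the length from the literal, not the proposed language syntax:

```d
import std.array : staticArray;

void main() {
    // The length is inferred from the literal, so nothing has to be counted.
    auto arr = [10, 2, 15, 15, 14].staticArray;

    static assert(is(typeof(arr) == int[5]));
    assert(arr.length == 5);
}
```

Because the dimension never appears in the source, it cannot disagree with the number of items in the literal.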

The changeset 442:
http://dsource.org/projects/dmd/changeset/442

It's relative to this bug reported by Don:
http://d.puremagic.com/issues/show_bug.cgi?id=3974

That comes from this bug report of mine:
http://d.puremagic.com/issues/show_bug.cgi?id=3849

My opinion is that those two cases cover most situations. So I think it's
positive to raise an error if the literal has fewer items than the specified N,
because a literal like the one for 'arr' is likely a bug: it defines trailing
zeros in an implicit way. As the second item in the Zen of Python says:
"Explicit is better than implicit." I want to avoid sources of bugs in D when
possible, because a programmer wants to give all his/her/hir attention to the
more serious sources of bugs, like data corruption, database mistakes, etc,
instead of error-prone minutiae of the language syntax.

A third case is when you really want to define a static array using a literal
that has fewer items than the static array. I think this is an uncommon case,
what are the real use cases of this? But we might want to support it anyway.
Two examples:

int[4] datas = [1, 2];
int[3][3] mat = [[1,2,3], [4,5,6]];

In D there is another way to write that (using the associative array syntax!
this smells bad):
int[4] datas = [0:1, 1:2];

Another possible syntax, not currently supported, that I don't like:
int[4] datas = [1, 2,,];

A better and explicit syntax, which uses the * array operator as in Python
(these mul and concat operations are done at compile time):
int[4] datas = [1, 2] ~ ([0] * 2);

Or even, using Phobos (this concat operation has to be done at compile-time):
int[4] datas = [1, 2] ~ StaticArr!(2);

Another:
 implicit_filling int[4] datas = [1, 2];
Or:
 implicit_array_filling int[4] datas = [1, 2];

If you have other ideas you can show them.
In practice many things are better than nothing, that is, than a silent
implicit initialization.
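
As a rough sketch of how the padding can already be written out explicitly without any new syntax (the names `padded` and `partial` are hypothetical), the zeros can either be concatenated at compile time or left in place by assigning only a slice:

```d
// Explicit padding at compile time: the trailing zeros are spelled out
// through concatenation instead of being implied by a short literal.
enum int[] padded = [1, 2] ~ [0, 0];
static assert(padded.length == 4);

void main() {
    int[4] datas = padded;        // the lengths match exactly
    assert(datas == [1, 2, 0, 0]);

    int[4] partial;               // every element starts as int.init, which is 0
    partial[0 .. 2] = [1, 2];     // only the first two slots are assigned
    assert(partial == [1, 2, 0, 0]);
}
```

Both forms make it visible in the source that only two of the four slots carry meaningful values.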

Bye,
bearophile
Apr 27 2010
parent reply Don <nospam nospam.com> writes:
bearophile wrote:
 Do you see a problem here?
 
 import std.stdio: writeln;
 int[24] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
 void main() {
     writeln(arr);
 }
 
 This code compiles and runs fine, I think according to the D specs, but it can
hide a bug.
 (Don has split my bug report to avoid fixing the whole problem. I don't think
mine is an enhancement request as Don writes, I think it's a bug in the D
specs.)
I share your opinion that it's a poor design, but it's not a bug. It's explicitly supported in the spec. By contrast, a literal with too many items is clearly a bug (it's an array bounds error). Especially when the compiler crashes.
 A third case is when you really want to define a static array using a literal
that has fewer items than the static array. I think this is an uncommon case,
what are the real use cases of this? But we might want to support it anyway.
Two examples:
 
 int[4] datas = [1, 2];
 int[3][3] mat = [[1,2,3], [4,5,6]];
 
 In D there is another way to write that (using the associative array syntax!
this smells bad):
 int[4] datas = [0:1, 1:2];
I do not know why this syntax is supported, but it's in the spec. Note that you can even write:
int [4] datas = [1:2, 3, 0:1];
That syntax explicitly allows members to remain uninitialized.
(BTW, it's much, much older than associative array syntax).
Apr 27 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Don wrote:
 bearophile wrote:
 Do you see a problem here?

 import std.stdio: writeln;
 int[24] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
 void main() {
     writeln(arr);
 }

 This code compiles and runs fine, I think according to the D specs, 
 but it can hide a bug.
 (Don has split my bug report to avoid fixing the whole problem. I 
 don't think mine is an enhancement request as Don writes, I think it's 
 a bug in the D specs.)
I share your opinion that it's a poor design, but it's not a bug. It's explicitly supported in the spec.
The idea is that for long arrays, often only the start of it needs initialization to anything but 0.
 By contrast, a literal with too many items is clearly a bug (it's an 
 array bounds error). Especially when the compiler crashes.
Yes, and thanks for your patch to fix it!
 A third case is when you really want to define a static array using a 
 literal that has fewer items than the static array. I think this is an 
 uncommon case, what are the real use cases of this? But we might want 
 to support it anyway. Two examples:

 int[4] datas = [1, 2];
 int[3][3] mat = [[1,2,3], [4,5,6]];

 In D there is another way to write that (using the associative array 
 syntax! this smells bad):
 int[4] datas = [0:1, 1:2];
I do not know why this syntax is supported,
It's so if you only need a few items in a large array initialized. C99 supports it as well.
 but it's in the spec. Note 
 that you can even write:
 int [4] datas = [1:2, 3, 0:1];
 That syntax explicitly allows members to remain uninitialized.
 (BTW, it's much, much older than associative array syntax).
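
A small sketch of the indexed form, assuming a compiler that follows the static initialization rules in the D spec (the example mirrors the one given there): an explicit index sets the position, the next unindexed value goes into the following slot, and unlisted slots keep the default value.

```d
// Module-scope static initialization with an explicit index.
// a[1] = 2, then the next value lands in a[2]; a[0] keeps int.init (0).
int[3] a = [1:2, 3];

void main() {
    assert(a == [0, 2, 3]);
}
```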
Apr 27 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 The idea is that for long arrays, often only the start of it needs 
 initialization to anything but 0.
I got the idea well, but my idea is that it's bug-prone syntax, because it's partially implicit (I have put a bug in a program of mine because of that syntax). How often do you need that? Is such frequency worth the risk? Can you tell me some cases?

I have not had to use it so far :-) And the syntax like the arr[$] = [...] can avoid some of those bugs because you don't need to count items.

And even if you are right, and it's a common need (other people here can confirm), then I am asking for an explicit syntax to ask for a partial initialization, to avoid bugs.
C99 supports it as well.<
C is known to be the less safe language among the ones used today (today people

Bye,
bearophile
Apr 27 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 The idea is that for long arrays, often only the start of it needs 
 initialization to anything but 0.
I got the idea well, but my idea is that it's bug-prone syntax, because it's partially implicit (I have put a bug in a program of mine because of that syntax). How often do you need that? Is such frequency worth the risk? Can you tell me some cases?
I have used it, though I use statically initialized arrays at all very rarely. An example is the _ctype[] array which leaves the upper 128 entries unspecified, and therefore 0.
 I have not had to use it so far :-) And the syntax
 like the arr[$] = [...] can avoid some of those bugs because you don't need
 to count items.
 
 And even if you are right, and it's a common need (other people here can
 confirm), then I am asking for an explicit syntax to ask for a partial
 initialization, to avoid bugs.
Anytime you statically initialize an array with more than a small number of values, it's bug-prone, even if the number of elements match the dimension. There's no check against transposing entries, or mis-typing the entries themselves. Your options are to either write unit tests for the table, or have the table generated by another program. Dmd uses program-generated tables for this reason.
 C99 supports it as well.<
C is known to be the less safe language among the ones used today (today etc).
I know, I just wished to point out that it was not a unique language element.
Apr 27 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 I have used it, though I use statically initialized arrays at all very rarely.
 An example is the _ctype[] array which leaves the upper 128 entries
unspecified,
 and therefore 0.
Both Phobos and druntime seem to contain a copy of the module ctype (o.O), it contains the array:
immutable ubyte _ctype[128] = [_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,
immutable ubyte _ctype[128] = [...
In both cases the array is defined as 128 items and I have counted it contains 128 items. So I think you are wrong.

But even if you meant something like this:
immutable ubyte _ctype2[256] = [/*128 items here*/];

My opinion is that it's bug-safer something like:
implicit_filling immutable ubyte _ctype2[256] = [/*128 items here*/];

Otherwise this will be another bit of work to do for future D lints :-)
 Anytime you statically initialize an array with more than a small number of
values,
 it's bug-prone, even if the number of elements match the dimension. There's no
check
 against transposing entries, or mis-typing the entries themselves.
I agree, literals can hide many other kinds of bugs. Bugs caused by the length correspondence are just one of them, and probably not even the most common. But:
- keeping this source of possible bugs doesn't help;
- having the compiler test for the equality of the two lengths at compile-time doesn't give the programmer a false sense of security, it's just a natural extra test the programmer (me) expects the compiler to perform.
So I think this test is not going to increase the bug count in any case.
 Your options are to either write unit tests for the table, or have the table
 generated by another program.
One of the main points of unit testing is (just like literate programming, which was present in D1 and is very common in the Haskell world, see for example: http://www.imperialviolet.org/binary/jpeg/ ) to offer a second point of view from which to see a block of code. This allows you to spot bugs much more efficiently (in literate programming the second point of view is the textual description).

Tests are code too, so they too can contain bugs. So when possible you often want your tests to be at a bit higher level of abstraction (to "summarize" the code), so they can be shorter, and hopefully contain fewer bugs than the code they test (in practice often my unit tests are longer than the code they test).

If I have code like this, it's legal D2 code:

// ...
enum int N_DIRECTIONS = 4;
// ...
// ...
enum string[N_DIRECTIONS] cardinal_directions = ["north", "south", "east"];
// ...

Then you suggest to write a unit test for the table to spot its bug, but what can I put inside this unittest? This is not so useful, because it doesn't catch the bug:

unittest { // tests of global data
    static assert(cardinal_directions.length == 4);
}

One thing I can do is to test each item, for example:

unittest { // tests of global data
    static assert(cardinal_directions == ["north", "south", "east", "west"]);
}

Or:

unittest { // tests of global data
    static assert(cardinal_directions[0] == "north");
    static assert(cardinal_directions[1] == "south");
    static assert(cardinal_directions[2] == "east");
    static assert(cardinal_directions[3] == "west");
}

Such unit tests catch the bug, but in both cases they just repeat the contents of the data, so they aren't at a higher level compared to the code they test. So they are not so useful. I can't help but think that the right solution for this situation is at the compiler level, not at the unit test level.
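
One possible middle ground, sketched here under the assumption that an empty string is never a legitimate entry of the table, is a test that checks a property of the data instead of repeating it. The helper `allFilled` and the variable names are hypothetical, and the trailing "" is written out by hand to stand in for the implicit padding:

```d
enum int N_DIRECTIONS = 4;

// Hypothetical helper: true when no entry was left at string.init (empty).
bool allFilled(const string[] table) {
    foreach (s; table)
        if (s.length == 0)
            return false;
    return true;
}

void main() {
    // The trailing "" mimics the slot that a short literal leaves unset.
    string[N_DIRECTIONS] buggy = ["north", "south", "east", ""];
    string[N_DIRECTIONS] complete = ["north", "south", "east", "west"];

    assert(!allFilled(buggy[]));
    assert(allFilled(complete[]));
}
```

Such a check does not repeat the table, but it only helps when the default value can be told apart from real data, which is not the case for a table of integers that may legitimately contain 0.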
You have not commented about the other syntax I (and others) have suggested (that has a different purpose, it's not meant to replace the item count test):
arr[$] = [/*...*/];
So I guess you are not so interested in this too.

Bye,
bearophile
Apr 27 2010
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 
 I have used it, though I use statically initialized arrays at all very
 rarely. An example is the _ctype[] array which leaves the upper 128 entries
 unspecified, and therefore 0.
Both Phobos and druntime seem to contain a copy of the module ctype (o.O), it contains the array: immutable ubyte _ctype[128] = [_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL, immutable ubyte _ctype[128] = [... In both cases the array is defined as 128 items and I have counted it contains 128 items. So I think you are wrong.
No, it also exists in the C runtime.
 You have not commented about the other syntax I (and others) have suggested
 (that has a different purpose, it's not meant to replace the item count
 test): arr[$] = [/*...*/]; So I guess you are not so interested in this too.
D is full of syntax, at some point adding more and more syntax to deal with more and more obscure cases is not a net improvement. There's a point of diminishing returns.
Apr 27 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 No, it also exists in the C runtime.
OK, the same array is specified three times.
 D is full of syntax, at some point adding more and more syntax to deal with
more 
 and more obscure cases is not a net improvement. There's a point of
diminishing 
 returns.
OK. I have updated the bug report:
http://d.puremagic.com/issues/show_bug.cgi?id=3849

Bye and thank you for your answers,
bearophile
Apr 27 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 D is full of syntax, at some point adding more and more syntax to deal with
more
 and more obscure cases is not a net improvement. There's a point of diminishing
 returns.
On this I can add another general note about language design :-)

Lisp/Scheme programmers think that minimizing the syntax allows a usable form of macros, and they seem to even think minimizing the syntax is good in general too. A language like Dylan has Algol-like syntax and Lisp-like syntax and macros, but its macros can be harder to write. So I agree with them that an S-expression-based language allows for simpler to use macros, but I don't agree with them that removing most syntax helps normal programming too. The recent very Lisp-like Clojure language adds a small amount of special syntax for some collections, etc.

From the languages I have learnt I have seen that learning the syntax takes just a certain amount of time in the beginning. For me programming in languages have had to learn. For me it's less easy to keep in memory special cases, special cases of special cases, a long list of language traps, possible run-time bugs that the compiler&runtime are not able to catch for me, etc. Python has ten times the syntax of Scheme, but for me learning Python didn't take much more than learning to write programs in Scheme.

That's why more keywords is not a disaster. What's bad is when the same keyword is used for subtly different purposes in similar contexts :-) This can cause bugs and troubles.

So I think language designers have to minimize not just the total amount of syntax but the amount of traps, special cases, special cases of special cases, implicit behaviours, bug-prone syntaxes, unnatural syntaxes, or names that are subtly different in both their letters and semantics (like for example the chomp() and chop() functions of std.string).

When I have a clean syntax, easy to read and explicit, that has a single purpose and offers no common traps, then I can learn to use it and use it reliably even if it's not so common in programs and even if there's a lot of other syntax in the language.

There is of course a limit to the amount of syntax that we can accept. In this case of static arrays I agree that I can live without the [$] syntax, even if it's not hard to understand what its purpose is, but I can accept this missing feature much better if the language enforces the length of the literal to be the same as the specified length in the type :-)

Bye,
bearophile
Apr 27 2010
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-04-27 11:50:55 -0400, bearophile <bearophileHUGS lycos.com> said:

 My opinion is that it's bug-safer something like:
 
  implicit_filling immutable ubyte _ctype2[256] = [/*128 items here*/];
Ouch! I agree with the idea, but can't you find a better syntax? What about this:

	immutable ubyte a[256] = [1,2,3,4...]; // rest of array is padded with 4s.
	immutable ubyte b[256] = [1,2,3,4]; // error: not enough values

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 27 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:
 Ouch! I agree with the idea, but can't you find a better syntax? What 
 about this:
 
 	immutable ubyte a[256] = [1,2,3,4...]; // rest of array is padded with 4s.
 	immutable ubyte b[256] = [1,2,3,4]; // error: not enough values
I think this is better (note the last comma) to avoid confusion with FP values:
immutable float a[10] = [1., 2., 3., 4., ...];

Nice, and better than mine. Basically here I can accept almost any syntax that is explicit :-)
Thank you :-)

Bye,
bearophile
Apr 27 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-04-27 13:55:04 -0400, bearophile <bearophileHUGS lycos.com> said:

 Michel Fortin:
 Ouch! I agree with the idea, but can't you find a better syntax? What
 about this:
 
 	immutable ubyte a[256] = [1,2,3,4...]; // rest of array is padded with 4s.
 	immutable ubyte b[256] = [1,2,3,4]; // error: not enough values
I think this is better (note the last comma) to avoid confusion with FP values: immutable float a[10] = [1., 2., 3., 4., ...];
I see your point. What I suggested is that "4..." would pad the remaining part with 4s. If you add a comma, it looks like a separate value, and I'd understand the padding to be done with zero for int, nan for floats, in other words the default value for the type.

Another idea would be to use the AA syntax (which you can use in a regular array too) to specify a default value for slots where no value has been defined:

	immutable ubyte a[256] = [1,2,3, default: 4];
	immutable ubyte b[256] = [1,2,3,4]; // error: not enough values and no default value
	immutable ubyte c[256] = [40:1, 41:2, 42:3, default:4];
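
The "default:" idea can be approximated in library code, without new syntax, by a small compile-time helper. The sketch below is only an illustration under that assumption; `padWith` is a hypothetical name, not an existing Phobos function:

```d
// Hypothetical helper: build a T[N] from a shorter list plus an explicit fill value.
T[N] padWith(size_t N, T)(const T[] head, T fill) {
    assert(head.length <= N, "too many values for the array");
    T[N] result = fill;                  // every slot starts as `fill`
    result[0 .. head.length] = head[];   // the listed values overwrite the front
    return result;
}

// Evaluated at compile time because it initializes a module-level immutable.
immutable ubyte[256] a = padWith!(256, ubyte)([1, 2, 3], 4);

void main() {
    assert(a[0] == 1 && a[2] == 3);
    assert(a[3] == 4 && a[255] == 4);
}
```

The fill value has to be named at the call site, so the padding is never silent, and a list that is too long fails the assert during compile-time evaluation.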
 Nice, and better than mine. Basically here I can accept almost any 
 syntax that is explicit :-)
 Thank you :-)
You're welcome. Let's hope it helps your case.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 27 2010