digitalmars.D - Changeset 442, implicit Vs explicit

bearophile <bearophileHUGS lycos.com> writes:

Do you see a problem here?

import std.stdio: writeln;
int[24] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
void main() {
    writeln(arr);
}

This code compiles and runs fine, I think according to the D specs, but it can
hide a bug.

There are three main situations where you use an array literal to define a
fixed-size array:
1) You want a static array that contains as many items as in the literal. This
is a common case, maybe the most common in my code. For this I and other people
have suggested a simple syntax that avoids having to count the items:
int[$] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
2) You want an array with N items. In this situation you put N in the type
definition, on the left as in that 'arr' array. This situation can lead to bugs
in D code because the compiler accepts literals with a length different from N.
Changeset 442 fixes part of this situation but I don't know if it fixes the
whole problem (because it's a big changeset and I don't understand all the
things it changes).
(Don has split my bug report to avoid fixing the whole problem. I don't think
mine is an enhancement request as Don writes, I think it's a bug in the D
specs.)
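
For case 1, a length-inferring declaration can be approximated in library code. This is a minimal sketch, assuming a Phobos version that provides `std.array.staticArray`; it is not the `int[$]` syntax proposed above, only a way to get the same effect of never counting items by hand.

```d
import std.array : staticArray;
import std.stdio : writeln;

void main()
{
    // The length is taken from the literal, so nothing is counted by hand.
    auto arr = [10, 2, 15, 15, 14, 12, 3, 7, 13, 5, 9, 9,
                7, 9, 9, 9, 11, 15, 1, 1, 12, 5, 14].staticArray;

    // The element count becomes part of the type.
    static assert(is(typeof(arr) == int[23]));
    writeln(arr.length);
}
```

Because the count ends up in the type, a later `static assert` against an expected size turns a miscounted literal into a compile-time error.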

The changeset 442:
http://dsource.org/projects/dmd/changeset/442

It's relative to this bug reported by Don:
http://d.puremagic.com/issues/show_bug.cgi?id=3974

That comes from this bug report of mine:
http://d.puremagic.com/issues/show_bug.cgi?id=3849

My opinion is that those two cases cover most situations. So I think it's
positive to raise errors if the literal has fewer items than the specified N,
because that 'arr' literal is likely a bug: it defines trailing zeros in an
implicit way. As the second item in the Python Zen says: "Explicit is better
than implicit." I want to avoid sources of bugs in D when possible, because
a programmer wants to give all his/her/hir attention to the more serious
sources of bugs, like data corruption, database mistakes, etc, instead of
error-prone minutiae of the language syntax.

A third case is when you really want to define a static array using a literal
that has less items than the static array. I think this is an uncommon case,
what are the real use cases of this? But we might want to support it anyway.
Two examples:

int[4] datas = [1, 2];
int[3][3] mat = [[1,2,3], [4,5,6]];

In D there is another way to write that (using the associative array syntax!
this smells bad):
int[4] datas = [0:1, 1:2];

Another possible syntax not currently supported, that I don't like:
int[4] datas = [1, 2,,];

A better and explicit syntax, that uses the * array operator as in Python (this
mul and concat operations are done at compile-time):
int[4] datas = [1, 2] ~ ([0] * 2);

Or even, using Phobos (this concat operation has to be done at compile-time):
int[4] datas = [1, 2] ~ StaticArr!(2);

Another:
 implicit_filling int[4] datas = [1, 2];
Or:
 implicit_array_filling int[4] datas = [1, 2];

If you have other ideas you can show them.
In practice many things are better than nothing, that is than a silent implicit
initialization.
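
The explicit-padding idea can also be sketched without new syntax. The helper below is hypothetical (the name `padded` is an assumption, not part of Phobos): it builds the full array at compile time and makes the fill value visible at the declaration.

```d
// Hypothetical helper, not part of Phobos: returns a T[N] whose first
// elements come from `head` and whose remaining slots are set to `fill`.
T[N] padded(size_t N, T)(const T[] head, T fill = T.init)
{
    assert(head.length <= N, "more initial values than array slots");
    T[N] result = fill;        // every slot starts as the fill value
    foreach (i, value; head)
        result[i] = value;     // explicit values overwrite the front
    return result;
}

// Evaluated at compile time because it initializes a module-level variable.
immutable int[4] datas = padded!4([1, 2]);

void main()
{
    assert(datas == [1, 2, 0, 0]);
}
```

A call such as `padded!4([1, 2, 3, 4, 5])` trips the assertion during compile-time evaluation, so both too-short and too-long literals are handled explicitly.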

Bye,
bearophile

Apr 27 2010

Don <nospam nospam.com> writes:

bearophile wrote:
 Do you see a problem here?
 
 import std.stdio: writeln;
 int[24] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
 void main() {
     writeln(arr);
 }
 
 This code compiles and runs fine, I think according to the D specs, but it can
hide a bug.

 (Don has split my bug report to avoid fixing the whole problem. I don't think
mine is an enhancement request as Don writes, I think it's a bug in the D
specs.)

I share your opinion that it's a poor design, but it's not a bug. It's 
explicitly supported in the spec.
By contrast, a literal with too many items is clearly a bug (it's an 
array bounds error). Especially when the compiler crashes.

 A third case is when you really want to define a static array using a literal
that has less items than the static array. I think this is an uncommon case,
what are the real use cases of this? But we might want to support it anyway.
Two examples:
 
 int[4] datas = [1, 2];
 int[3][3] mat = [[1,2,3], [4,5,6]];
 
 In D there is another way to write that (using the associative array syntax!
this smells bad):
 int[4] datas = [0:1, 1:2];

I do not know why this syntax is supported, but it's in the spec. Note 
that you can even write:
int [4] datas = [1:2, 3, 0:1];
That syntax explicitly allows members to remain uninitialized.
(BTW, it's much, much older than associative array syntax).
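
A small sketch of how the indexed form above fills the array, assuming the static initialization rules behave as described in the spec (an unindexed value takes the previous index plus one, and unmentioned slots keep the element type's default):

```d
// Module-level static array using the indexed initializer form quoted above.
int[4] datas = [1:2, 3, 0:1];

void main()
{
    assert(datas[0] == 1);  // set by 0:1
    assert(datas[1] == 2);  // set by 1:2
    assert(datas[2] == 3);  // unindexed value follows index 1
    assert(datas[3] == 0);  // never mentioned, so it stays int.init
}
```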

Apr 27 2010

Walter Bright <newshound1 digitalmars.com> writes:

Don wrote:
 bearophile wrote:
 Do you see a problem here?

 import std.stdio: writeln;
 int[24] arr = [10,2,15,15,14,12,3,7,13,5,9,9,7,9,9,9,11,15,1,1,12,5,14];
 void main() {
     writeln(arr);
 }

 This code compiles and runs fine, I think according to the D specs, 
 but it can hide a bug.

 
 (Don has split my bug report to avoid fixing the whole problem. I 
 don't think mine is an enhancement request as Don writes, I think it's 
 a bug in the D specs.)

 
 I share your opinion that it's a poor design, but it's not a bug. It's 
 explicitly supported in the spec.

The idea is that for long arrays, often only the start of it needs 
initialization to anything but 0.

 By contrast, a literal with too many items is clearly a bug (it's an 
 array bounds error). Especially when the compiler crashes.

Yes, and thanks for your patch to fix it!

 A third case is when you really want to define a static array using a 
 literal that has less items than the static array. I think this is an 
 uncommon case, what are the real use cases of this? But we might want 
 to support it anyway. Two examples:

 int[4] datas = [1, 2];
 int[3][3] mat = [[1,2,3], [4,5,6]];

 In D there is another way to write that (using the associative array 
 syntax! this smells bad):
 int[4] datas = [0:1, 1:2];

 
 I do not know why this syntax is supported,

It's so if you only need a few items in a large array initialized. C99 supports 
it as well.

 but it's in the spec. Note 
 that you can even write:
 int [4] datas = [1:2, 3, 0:1];
 That syntax explicitly allows members to remain uninitialized.
 (BTW, it's much, much older than associative array syntax).

Apr 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:
 The idea is that for long arrays, often only the start of it needs 
 initialization to anything but 0.

I got the idea well, but my idea is that it's bug-prone syntax, because it's
partially implicit (I have put a bug in a program of mine because of that
syntax).
How often do you need that? Is such frequency worth the risk? Can you tell me
some cases? I have not had to use it so far :-)
And the syntax like the arr[$] = [...] can avoid some of those bugs because you
don't need to count items.

And even if you are right, and it's a common need (other people here can
confirm), then I am asking for an explicit syntax to ask for a partial
initialization, to avoid bugs.


C99 supports it as well.<

C is known to be the least safe language among the ones used today (today people


Bye,
bearophile

Apr 27 2010

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 Walter Bright:
 The idea is that for long arrays, often only the start of it needs 
 initialization to anything but 0.

 
 I got the idea well, but my idea is that it's bug-prone syntax, because it's
 partially implicit (I have put a bug in a program of mine because of that
 syntax). How often do you need that? Is such frequency worth the risk? Can
 you tell me some cases?

I have used it, though I use statically initialized arrays at all very rarely.
An example is the _ctype[] array which leaves the upper 128 entries unspecified,
and therefore 0.

 I have not had to use it so far :-) And the syntax
 like the arr[$] = [...] can avoid some of those bugs because you don't need
 to count items.
 
 And even if you are right, and it's a common need (other people here can
 confirm), then I am asking for an explicit syntax to ask for a partial
 initialization, to avoid bugs.

Anytime you statically initialize an array with more than a small number of
values, it's bug-prone, even if the number of elements match the dimension.
There's no check against transposing entries, or mis-typing the entries
themselves.

Your options are to either write unit tests for the table, or have the table 
generated by another program.

Dmd uses program-generated tables for this reason.
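
The "generated by another program" option can also be approximated inside D itself with compile-time function evaluation. This is only a sketch of that variant, not how dmd's own tables are produced; the flag names and values are hypothetical.

```d
// Hypothetical flag bits, chosen only for this illustration.
enum ubyte DIGIT = 1;
enum ubyte UPPER = 2;
enum ubyte LOWER = 4;

// Builds the table from rules rather than from hand-typed entries.
ubyte[256] makeCharTable()
{
    ubyte[256] table;   // all entries start at 0
    for (int c = '0'; c <= '9'; ++c) table[c] |= DIGIT;
    for (int c = 'A'; c <= 'Z'; ++c) table[c] |= UPPER;
    for (int c = 'a'; c <= 'z'; ++c) table[c] |= LOWER;
    return table;
}

// Run at compile time; entries that match no rule stay 0 on purpose.
immutable ubyte[256] charTable = makeCharTable();

void main()
{
    assert(charTable['7'] == DIGIT);
    assert(charTable['Q'] == UPPER);
    assert(charTable['q'] == LOWER);
    assert(charTable[200] == 0);
}
```

With a generated table there is no literal to miscount or mistype, and the zero entries are the visible result of the rules rather than of a short literal.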

 C99 supports it as well.<

 
 C is known to be the less safe language among the ones used today (today

 etc).

I know, I just wished to point out that it was not a unique language element.

Apr 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:

 I have used it, though I use statically initialized arrays at all very rarely.
 An example is the _ctype[] array which leaves the upper 128 entries
 unspecified, and therefore 0.

Both Phobos and druntime seem to contain a copy of the module ctype (o.O), it
contains the array:
immutable ubyte _ctype[128] = [_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,
immutable ubyte _ctype[128] = [...

In both cases the array is defined as 128 items and I have counted it contains
128 items. So I think you are wrong.

But even if you meant something like this:

immutable ubyte _ctype2[256] = [/*128 items here*/];

My opinion is that it's bug-safer something like:

 implicit_filling immutable ubyte _ctype2[256] = [/*128 items here*/];

Otherwise this will be another bit of work to do for future D lints :-)


 Anytime you statically initialize an array with more than a small number of
 values, it's bug-prone, even if the number of elements match the dimension.
 There's no check against transposing entries, or mis-typing the entries
 themselves.

I agree, literals can hide many other kinds of bugs. Bugs caused by a length
mismatch are just one of them, and probably not even the most common.
But:
- keeping this source of possible bugs doesn't help;
- having the compiler test for the equality of the two lengths at compile-time
doesn't give the programmer a false sense of security, it's just a natural
extra test the programmer (me) expects the compiler to perform. So I think this
test is not going to increase the bug count in any case.


 Your options are to either write unit tests for the table, or have the table
 generated by another program.

One of the main points of unit testing (just as with literate programming, which
was present in D1 and is very common in the Haskell world, see for example:
http://www.imperialviolet.org/binary/jpeg/ ) is to offer a second point of view
on a block of code. This allows you to spot bugs much more efficiently (in
literate programming the second point of view is the textual description).

Tests are code too, so they too can contain bugs. So when possible you often
want your tests to be at a somewhat higher level of abstraction (to "summarize"
the code), so they can be shorter and hopefully contain fewer bugs than the code
they test (in practice my unit tests are often longer than the code they test).

If I have code like this it's legal D2 code:

// ...
enum int N_DIRECTIONS = 4;
// ...
// ...
enum string[N_DIRECTIONS] cardinal_directions = ["north", "south", "east"];
// ...


You suggest writing a unit test for the table to spot its bug, but what
can I put inside this unittest?

This is not so useful, because it doesn't catch the bug:

unittest { // tests of global data
  static assert(cardinal_directions.length == 4);
}


One thing I can do is to test each item, for example:

unittest { // tests of global data
  static assert(cardinal_directions == ["north", "south", "east", "west"]);
}

Or:

unittest { // tests of global data
  static assert(cardinal_directions[0] == "north");
  static assert(cardinal_directions[1] == "south");
  static assert(cardinal_directions[2] == "east");
  static assert(cardinal_directions[3] == "west");
}


Such unit tests catch the bug, but in both cases they just repeat the contents
of the data, so they aren't at a higher level compared to the code they test.
So they are not so useful.

I can't help but think that the right solution for this situation is at the
compiler level, not at unit test level.
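
One test that does not repeat the data is a property check. The sketch below assumes the padding behaviour discussed in this thread, where a missing entry would be left as the empty default string; the helper name `allFilled` is hypothetical.

```d
enum int N_DIRECTIONS = 4;
immutable string[N_DIRECTIONS] cardinal_directions =
    ["north", "south", "east", "west"];

// True when no slot was left as the empty default string.
bool allFilled(const string[] table)
{
    foreach (entry; table)
        if (entry.length == 0)
            return false;
    return true;
}

// Checked at compile time; states a property instead of repeating the data.
static assert(allFilled(cardinal_directions[]));

void main() {}
```

This only works when the default value is never a legitimate entry, so it would not help for a table of ints where 0 is valid data.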

You have not commented about the other syntax I (and others) have suggested
(that has a different purpose, it's not meant to replace the item count test):
arr[$] = [/*...*/];
So I guess you are not so interested in this too.

Bye,
bearophile

Apr 27 2010

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 Walter Bright:
 
 I have used it, though I use statically initialized arrays at all very
 rarely. An example is the _ctype[] array which leaves the upper 128 entries
 unspecified, and therefore 0.

 
 Both Phobos and druntime seem to contain a copy of the module ctype (o.O), it
 contains the array: immutable ubyte _ctype[128] =
 [_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL,_CTL, immutable ubyte _ctype[128] = [...
 
 In both cases the array is defined as 128 items and I have counted it
 contains 128 items. So I think you are wrong.

No, it also exists in the C runtime.


 You have not commented about the other syntax I (and others) have suggested
 (that has a different purpose, it's not meant to replace the item count
 test): arr[$] = [/*...*/]; So I guess you are not so interested in this too.

D is full of syntax, at some point adding more and more syntax to deal with
more and more obscure cases is not a net improvement. There's a point of
diminishing returns.

Apr 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:
 No, it also exists in the C runtime.

OK, the same array is specified three times.


 D is full of syntax, at some point adding more and more syntax to deal with
 more and more obscure cases is not a net improvement. There's a point of
 diminishing returns.

OK. I have updated the bug report:
http://d.puremagic.com/issues/show_bug.cgi?id=3849

Bye and thank you for your answers,
bearophile

Apr 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:
 D is full of syntax, at some point adding more and more syntax to deal with
 more and more obscure cases is not a net improvement. There's a point of
 diminishing returns.

On this I can add another general note about language design :-)

Lisp/Scheme programmers think that minimizing the syntax allows a usable form
of macros, and they seem to even think minimizing the syntax is good in general
too. A language like Dylan has Algol-like syntax and Lisp-like semantics and
macros, but its macros can be harder to write.

So I agree with them that an S-expression-based language allows for macros that
are simpler to use, but I don't agree with them that removing most syntax helps
normal programming too. The recent very Lisp-like Clojure language adds a small
amount of special syntax for some collections, etc.

From the languages I have learnt I have seen that learning the syntax takes
just a certain amount of time in the beginning. For me programming in languages

have had to learn. For me it's less easy to keep in memory special cases,
special cases of special cases, a long number of language traps, possible
run-time bugs that the compiler&runtime are not able to catch for me, etc.
Python has ten times the syntax of Scheme, but for me learning Python didn't
take much more than learning to write programs in Scheme.

That's why more keywords is not a disaster. What's bad is when the same keyword
is used for subtly different purposes in similar contexts :-) This can cause
bugs and troubles.

So I think language designers have to minimize not just the total amount of
syntax but the amount of traps, special cases, special cases of special cases,
implicit behaviours, bug-prone syntaxes, unnatural syntaxes, or names that are
subtly different in both their letters and semantics (like for example the
chomp() and chop() functions of std.string).

When I have a clean syntax, easy to read and explicit, that has a single
purpose and it offers no common traps, then I can learn to use it and use it
reliably even if it's not so common in programs and even if there's lot of
other syntax in the language.

There is of course a limit in the amount of syntaxes that we can accept. In
this case of static arrays I agree that I can live without the [$] syntax, even
if it's not hard to understand what its purpose is, but I can accept this
missing feature much better if the language enforces the length of the literal
to be the same as the specified length in the type :-)

Bye,
bearophile

Apr 27 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-04-27 11:50:55 -0400, bearophile <bearophileHUGS lycos.com> said:

 My opinion is that it's bug-safer something like:
 
  implicit_filling immutable ubyte _ctype2[256] = [/*128 items here*/];

Ouch! I agree with the idea, but can't you find a better syntax? What 
about this:

	immutable ubyte a[256] = [1,2,3,4...]; // rest of array is padded with 4s.
	immutable ubyte b[256] = [1,2,3,4]; // error: not enough values

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Michel Fortin:
 Ouch! I agree with the idea, but can't you find a better syntax? What 
 about this:
 
 	immutable ubyte a[256] = [1,2,3,4...]; // rest of array is padded with 4s.
 	immutable ubyte b[256] = [1,2,3,4]; // error: not enough values

I think this is better (note the last comma) to avoid confusion with FP values:
immutable float a[10] = [1., 2., 3., 4., ...];

Nice, and better than mine. Basically here I can accept almost any syntax that
is explicit :-)
Thank you :-)

Bye,
bearophile

Apr 27 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-04-27 13:55:04 -0400, bearophile <bearophileHUGS lycos.com> said:

 Michel Fortin:
 Ouch! I agree with the idea, but can't you find a better syntax? What
 about this:
 
 	immutable ubyte a[256] = [1,2,3,4...]; // rest of array is padded with 4s.
 	immutable ubyte b[256] = [1,2,3,4]; // error: not enough values

 
 I think this is better (note the last comma) to avoid confusion with FP values:
 immutable float a[10] = [1., 2., 3., 4., ...];

I see your point. What I suggested is that "4..." would pad the 
remaining part with 4s. If you add a comma, it looks like a separate 
value, and I'd understand the padding to be done with zero for int, nan 
for floats, in other words the default value for the type.

Another idea would be to use the AA syntax (which you can use in a 
regular array too) to specify a default value for slots where no value 
has been defined:

	immutable ubyte a[256] = [1,2,3, default: 4];
	immutable ubyte b[256] = [1,2,3,4]; // error: not enough values and no default value
	immutable ubyte c[256] = [40:1, 41:2, 42:3, default:4];
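
The `default:` form above is not valid D, but its effect can be sketched with existing features. This is a minimal illustration, assuming a function literal called in place and evaluated at compile time; it mirrors the third declaration, with 4 as the explicit default.

```d
// Every slot starts at the explicit default 4; three slots are then overridden.
immutable ubyte[256] c = () {
    ubyte[256] table = 4;
    table[40] = 1;
    table[41] = 2;
    table[42] = 3;
    return table;
}();

void main()
{
    assert(c[0] == 4);
    assert(c[41] == 2);
    assert(c[255] == 4);
}
```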


 Nice, and better than mine. Basically here I can accept almost any 
 syntax that is explicit :-)
 Thank you :-)

You're welcome. Let's hope it helps your case.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 27 2010
