
digitalmars.D - D shouldn't be different from Java without good reason

reply James McComb <ned jamesmccomb.id.au> writes:
Hi. I wonder if there are any other developers out there who, like me, 
were initially very excited about D, but are now starting to worry that 
the language is maybe starting to go off the rails a bit...

Since D is advertised as a simpler, garbage-collected C++, it's natural 
to compare it with Java. So when D includes a Java-like 
feature, many developers (such as me) assume that the feature is going 
to be similar to the same feature in Java, unless Walter has some 
compelling reason to do things differently. Seen from this perspective, 
some features in D stick out like a sore thumb:

FEATURES I WOULD HAVE EXPECTED TO BE LIKE JAVA
(Is there really a compelling reason to do it differently?)

1. A type-safe boolean. Java has one, D doesn't. I know this has been 
discussed to death, but people coming from a Java background want to 
know: What is the compelling reason to not have a type-safe boolean?

2. Built-in string types. Java has one, D has THREE. This feels wrong, 
because it feels like a failure of encapsulation (the encoding of the 
string should be hidden from the programmer) and it doesn't scale (there 
is no utf-7 type). To put it another way... What is the compelling 
problem to which having three built-in string types is the solution?

I don't think that these differences from Java are necessary or beneficial.

James McComb
Nov 14 2004
next sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
sigh.

oh well. here we go again...

 1. A type-safe boolean. Java has one, D doesn't. I know this has been
 discussed to death, but people coming from a Java background want to
 know: What is the compelling reason to not have a type-safe boolean?
Is type-safety really that important? eh. I don't notice. I can't recall making a type-safety bug when writing my D code like MinTL or the gmp wrappers or anything else. Maybe I've been lucky. I just type "bool" and get on with my coding.
 2. Built-in string types. Java has one, D has THREE. This feels wrong,
 because it feels like a failure of encapsulation (the encoding of the
 string should be hidden from the programmer) and it doesn't scale (there
 is no utf-7 type). To put it another way... What is the compelling
 problem to which having three built-in string types is the solution?
Java has StringBuffers and char[], too. So it has three types just to do what D does with one - in this case wchar[]. Actually in JDK 5 they introduced a new StringBuilder (a single-threaded version of StringBuffer). So now Java has four types for wchar[]. Given all the platform differences in defining char and wchar I think D's choices make a nice balance between platform dependence and platform independence. They are simple and fast - just right for good string handling. -Ben
Nov 14 2004
parent reply "Glen Perkins" <please.dont email.com> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message 
news:cn98ni$n3l$1 digitaldaemon.com...
 sigh.

 oh well. here we go again...
I haven't seen the reasoning behind either decision presented in the FAQ. If these are asked frequently enough to make you start your answers this way, wouldn't that be the fault of the FAQ, not the questioner? Clearly the rightness of these design decisions is not obvious, unlike so many other aspects of D's design, so most developers investigating D will at least wonder about them even if they don't ask.
 1. A type-safe boolean. Java has one, D doesn't. I know this has 
 been
 discussed to death, but people coming from a Java background want 
 to
 know: What is the compelling reason to not have a type-safe 
 boolean?
Is type-safety really that important? eh. I don't notice.
Well, hold on a minute. I have no position on this issue since I know so little about it, but I keep seeing "type safety" listed as a reason for why some feature in D is superior to its equivalent in C. If type safety isn't really important, then maybe D isn't, either. If it is important enough to cite repeatedly as an advantage, then what's the story with booleans? Again, I'm not arguing for or against this design decision. I haven't even looked at it. I have wondered about it, though, and if the justification is "type safety doesn't matter", then I have to wonder even more.
 2. Built-in string types. Java has one, D has THREE. This feels 
 wrong,
 because it feels like a failure of encapsulation (the encoding of 
 the
 string should be hidden from the programmer) and it doesn't scale 
 (there
 is no utf-7 type). To put it another way... What is the compelling
 problem to which having three built-in string types is the 
 solution?
Java has StringBuffers and char[], too. So it has three types just to do what D does with one - in this case wchar[]. Actually in JDK 5 they introduced a new StringBuilder (a single-threaded version of StringBuffer). So now Java has four types for wchar[].
This isn't quite right. The point is that Java has a single *default* string type, and that's a huge advantage. The number of non-defaults used for special cases is almost irrelevant. The default String in Java is used for almost everything and is almost always fast enough, which I define very practically. If the performance of String could improve dramatically without improving the app itself, then String is "fast enough". Anything more has no benefit, so if it has any cost, it is a net negative.

I said "almost always fast enough" because by my definition there are times when it is not: at many bottlenecks. In those cases, and only in those cases, I wish there were a lot more options. Java's performance problem is caused by severe constraints on optimization options that come from the unfortunate requirement that it be able to run identically and safely--as object code--anywhere. Thank goodness, D hasn't chosen to burden itself with such constraints. That means that D could go ahead and have a good default that was powerful, easy, and consistent, supplemented by a great toolbox of optimizations that would be used only at bottlenecks. After all, performance is only relevant at bottlenecks, while simplicity is relevant everywhere.

Unfortunately, having no default string type means that integrating code from multiple designers (as in using libraries, for example) will almost always result in the needless complexity of multiple text formats in non-bottleneck code. It won't do me much good to declare a personal default, either, if I use libraries written by others. I'm not sure what would make the best default string type for D, but I think that not having one will have costs. Whether those costs will turn out to be significant is hard to say for sure.
I suspect that D people will do what C people have done for years: overweight the things that are easy to measure (time to execute a toy for loop a million times) and underweight more important things that are harder to measure (time wasted debating which string type to use in non-bottleneck code, bugs introduced when trying to change string type used in module B source code to match module A, etc.) If so, the costs of this approach could end up being serious without it being recognized, even after the fact. Well, we'll see. Any way you look at it, it's a huge improvement over both C++ and Java in so many ways....
Nov 15 2004
parent Ben Hinkle <bhinkle4 juno.com> writes:
Glen Perkins wrote:

 
 "Ben Hinkle" <bhinkle4 juno.com> wrote in message
 news:cn98ni$n3l$1 digitaldaemon.com...
 sigh.

 oh well. here we go again...
I haven't seen the reasoning behind either decision presented in the FAQ. If these are asked frequently enough to make you start your answers this way, wouldn't that be the fault of the FAQ, not the questioner? Clearly the rightness of these design decisions is not obvious, unlike so many other aspects of D's design, so most developers investigating D will at least wonder about them even if they don't ask.
I haven't looked at the FAQ in a while. I'll check it out and add some rants there, too ;-)
 1. A type-safe boolean. Java has one, D doesn't. I know this has
 been
 discussed to death, but people coming from a Java background want
 to
 know: What is the compelling reason to not have a type-safe
 boolean?
Is type-safety really that important? eh. I don't notice.
Well, hold on a minute. I have no position on this issue since I know so little about it, but I keep seeing "type safety" listed as a reason for why some feature in D is superior to its equivalent in C. If type safety isn't really important, then maybe D isn't, either. If it is important enough to cite repeatedly as an advantage, then what's the story with booleans? Again, I'm not arguing for or against this design decision. I haven't even looked at it. I have wondered about it, though, and if the justification is "type safety doesn't matter", then I have to wonder even more.
Removing all implicit conversions would be a pain. Imagine one had an int "x" and a short "y" and wanted to add them: without implicit conversions one would have to write x+cast(int)y. So D must have some implicit conversions, and the question is which ones. Java chose not to have implicit conversions between numeric types and bool; C++ and D chose to allow them. For D it makes even more sense, since "bool" is "bit" and naturally one wants to be able to implicitly convert between bits, bytes, ints, etc. To me it's not a big deal.
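For instance, the same conversion rules can be sketched in C++, from which D largely inherits them (a C++ illustration, not D code; D would spell the explicit cast as cast(int)y):

```cpp
#include <cassert>

// Implicit integer promotion: adding an int and a short needs no cast,
// because the short is silently widened to int first.
int addMixed(int x, short y) {
    return x + y;   // without implicit conversions this would be x + (int)y
}

// Implicit numeric <-> bool conversion, as C++ and D allow and Java
// forbids: nonzero converts to true, and true converts back to 1.
int boolRoundTrip(int v) {
    bool b = v;     // int -> bool
    return b + b;   // bool -> int; for any nonzero v this is 2
}
```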
 2. Built-in string types. Java has one, D has THREE. This feels
 wrong,
 because it feels like a failure of encapsulation (the encoding of
 the
 string should be hidden from the programmer) and it doesn't scale
 (there
 is no utf-7 type). To put it another way... What is the compelling
 problem to which having three built-in string types is the
 solution?
Java has StringBuffers and char[], too. So it has three types just to do what D does with one - in this case wchar[]. Actually in JDK 5 they introduced a new StringBuilder (a single-threaded version of StringBuffer). So now Java has four types for wchar[].
This isn't quite right. The point is that Java has a single *default* string type, and that's a huge advantage. The number of non-defaults used for special cases is almost irrelevant. The default String in Java is used for almost everything and is almost always fast enough, which I define very practically. If the performance of String could improve dramatically without improving the app itself, then String is "fast enough". Anything more has no benefit, so if it has any cost, it is a net negative.

I said "almost always fast enough" because by my definition there are times when it is not: at many bottlenecks. In those cases, and only in those cases, I wish there were a lot more options. Java's performance problem is caused by severe constraints on optimization options that come from the unfortunate requirement that it be able to run identically and safely--as object code--anywhere. Thank goodness, D hasn't chosen to burden itself with such constraints. That means that D could go ahead and have a good default that was powerful, easy, and consistent, supplemented by a great toolbox of optimizations that would be used only at bottlenecks. After all, performance is only relevant at bottlenecks, while simplicity is relevant everywhere.

Unfortunately, having no default string type means that integrating code from multiple designers (as in using libraries, for example) will almost always result in the needless complexity of multiple text formats in non-bottleneck code. It won't do me much good to declare a personal default, either, if I use libraries written by others. I'm not sure what would make the best default string type for D, but I think that not having one will have costs. Whether those costs will turn out to be significant is hard to say for sure.
I suspect that D people will do what C people have done for years: overweight the things that are easy to measure (time to execute a toy for loop a million times) and underweight more important things that are harder to measure (time wasted debating which string type to use in non-bottleneck code, bugs introduced when trying to change string type used in module B source code to match module A, etc.) If so, the costs of this approach could end up being serious without it being recognized, even after the fact. Well, we'll see. Any way you look at it, it's a huge improvement over both C++ and Java in so many ways....
It's clear char[] is very common in phobos and in the documentation. It is, in practice, the default string type. I would expect any library that deals with strings to take at least char[] or have some story about how to use it with char[]. If the library has some parts where the conversion to char[] would be a large performance hit, then it should also support wchar[] and maybe dchar[].

The ICU library internally uses wchar[], so if one is writing an application that deals heavily with the ICU library I would use wchar[] in my app instead of char[]. Otherwise I would use char[]. It all depends on the situation. The dchar[] type is really just for Linux C calls and for people who want character indexing to be the same as code-unit indexing, so I wouldn't seriously consider it in the same league as char[] and wchar[].
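For example, the "convert at the library boundary" approach looks roughly like this. Sketched in C++ rather than D, with a hypothetical helper named widenAscii; it handles ASCII-only data for brevity, where code points map 1:1 onto UTF-16 code units, whereas real code would need a full UTF-8 decoder:

```cpp
#include <string>

// Hypothetical boundary shim: the app keeps narrow strings (the char[]
// analogue) everywhere and widens only when calling into a UTF-16
// library (the wchar[] analogue, e.g. ICU). Each ASCII byte becomes
// one UTF-16 code unit; non-ASCII input would need real UTF-8 decoding.
std::u16string widenAscii(const std::string& s) {
    std::u16string out;
    out.reserve(s.size());
    for (unsigned char c : s)
        out.push_back(static_cast<char16_t>(c));
    return out;
}
```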
Nov 15 2004
prev sibling next sibling parent Dave <Dave_member pathlink.com> writes:
In article <cn93su$h4o$1 digitaldaemon.com>, James McComb says...
Hi. I wonder if there are any other developers out there who, like me, 
were initially very excited about D, but are now starting to worry that 
the language is maybe starting to go off the rails a bit...

Since D is advertised as a simpler, garbage-collected C++, it's natural 
to compare it with Java. So when D includes a Java-like 
feature, many developers (such as me) assume that the feature is going 
to be similar to the same feature in Java, unless Walter has some 
compelling reason to do things differently. Seen from this perspective, 
some features in D stick out like a sore thumb:

FEATURES I WOULD HAVE EXPECTED TO BE LIKE JAVA
(Is there really a compelling reason to do it differently?)

1. A type-safe boolean. Java has one, D doesn't. I know this has been 
discussed to death, but people coming from a Java background want to 
know: What is the compelling reason to not have a type-safe boolean?

2. Built-in string types. Java has one, D has THREE. This feels wrong, 
because it feels like a failure of encapsulation (the encoding of the 
string should be hidden from the programmer) and it doesn't scale (there 
is no utf-7 type). To put it another way... What is the compelling 
problem to which having three built-in string types is the solution?
There is a lot more on this topic in the archives, but to sum up: One of the primary complaints about Java is performance, especially for memory management, and one of the reasons for that is the 'one character size fits all' approach to strings. Depending on the encoding, it may be more efficient to use any one of the three string types D offers (after all, the conversions can be handled by library functionality), which is why all three are supported by the core language. Since D is intended as an all-around systems-type language, this (efficiency & flexibility) makes perfect sense, IMHO.

One of the problems with C/C++ is that strings are not built-in. One of the problems with Java is that the majority of programs where UTF-16 is not optimal end up suffering for it. D tries to rectify both of these problems by offering all three built-in types.

- Dave
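To make the efficiency point concrete, the same ASCII text costs one, two, or four bytes per character depending on the code-unit width, and UTF-8/UTF-16/UTF-32 correspond to D's char[]/wchar[]/dchar[]. A small C++ sketch (not D), assuming ASCII content:

```cpp
#include <cstddef>
#include <string>

// Bytes consumed by the same logical text at each code-unit width.
// For pure ASCII, UTF-16 doubles and UTF-32 quadruples the storage;
// that doubling is the cost a one-size-fits-all UTF-16 String pays.
std::size_t utf8Bytes(const std::string& s)     { return s.size() * sizeof(char); }
std::size_t utf16Bytes(const std::u16string& s) { return s.size() * sizeof(char16_t); }
std::size_t utf32Bytes(const std::u32string& s) { return s.size() * sizeof(char32_t); }
```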
I don't think that these differences from Java are necessary or beneficial.

James McComb
Nov 14 2004
prev sibling next sibling parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
James McComb wrote:

 FEATURES I WOULD HAVE EXPECTED TO BE LIKE JAVA
 (Is there really a compelling reason to do it differently?)
 
 1. A type-safe boolean. Java has one, D doesn't. I know this has been 
 discussed to death, but people coming from a Java background want to 
 know: What is the compelling reason to not have a type-safe boolean?
D doesn't have boolean *conditionals*, either. They're arithmetic. So things like "if (2)" and "if (pointer)" are perfectly legal... (So far I haven't heard any better reasons than that it makes D code more similar to C, and thus easier to adopt for old-timers?)

In light of that, it has the same kind of booleans that C/C++ has, i.e. "zero is false, non-zero is true". And "true + true == 2". The default boolean type in D is "bit" (which has an alias of "bool").
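The same behaviour can be demonstrated in C++, shown here because its rules match D's in this respect (this is a C++ sketch, not D source):

```cpp
// "Arithmetic" conditionals: any nonzero value or non-null pointer
// tests true, exactly as in C and in D as described above.
bool truthy(int v) {
    if (v) return true;   // "if (2)" is perfectly legal
    return false;
}

bool truthyPtr(const int* p) {
    if (p) return true;   // non-null pointer converts to true
    return false;
}

int twoTrues() {
    bool t = true;
    return t + t;         // bool promotes to int: true + true == 2
}
```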
 2. Built-in string types. Java has one, D has THREE. This feels wrong, 
 because it feels like a failure of encapsulation (the encoding of the 
 string should be hidden from the programmer) and it doesn't scale (there 
 is no utf-7 type). To put it another way... What is the compelling 
 problem to which having three built-in string types is the solution?
As has been pointed out by others, Java has *more* than just String. Performance-conscious people have been using e.g. byte[] for ASCII? And with the new surrogates, functions now take both "int" and "char": see http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html

So the exact same things occur in Java too, as in all Unicode handling. Another main difference is that String is a class, while D has a type? The default string type in D is "char[]" (I suggested the alias "string").
 I don't think that these differences from Java are necessary or beneficial.
But it's still on a higher plane than old C and C++ :-)

The main difference between D and Java is now that Java uses objects for things like strings and arrays, while D doesn't? D is a lot more concerned about performance and implementation, and therefore leaves a lot of such things to the end programmer. Walter has chosen to position D in between these two "sides". I was puzzled by this too, but it's done that way by choice. And it could be what I *like* about D. It's half-C and half-Java. (and mainly because I think it's more elegant than what C++ is...)

Maybe we just need some new FAQ entries:

Q: What's the default boolean type in D?    A: bit. (bool is an "alias")
Q: Is that really type-safe?                A: No. (just as in C99/C++)
Q: What's the default string type in D?     A: char[]. (since main() uses it)
Q: Is that a single class?                  A: No. (it's a primitive type)
Q: Was this done by accident or by choice?  A: choice. (by Walter Bright)
Q: Will this change before D version 1.0?   A: No. (at least unlikely)

And we could all get along with our lives...

--anders

PS. See also http://www.prowiki.org/wiki4d/wiki.cgi?FeatureRequestList
Nov 15 2004
prev sibling parent reply Bastiaan Veelo <Bastiaan.N.Veelo ntnu.no> writes:
James McComb wrote:
 1. A type-safe boolean. Java has one, D doesn't. I know this has been 
 discussed to death, but people coming from a Java background want to 
 know: What is the compelling reason to not have a type-safe boolean?
Maybe this page carries an answer? http://www.prowiki.org/wiki4d/wiki.cgi?BooleanNotEquBit regards, Bastiaan.
Nov 15 2004
parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Bastiaan Veelo wrote:

 James McComb wrote:
 1. A type-safe boolean. Java has one, D doesn't. I know this has been
 discussed to death, but people coming from a Java background want to
 know: What is the compelling reason to not have a type-safe boolean?
Maybe this page carries an answer? http://www.prowiki.org/wiki4d/wiki.cgi?BooleanNotEquBit regards, Bastiaan.
A few of the posts listed on that page are about bool/bit not being addressable. That got me thinking about adding (either builtin or through aliases or typedefs or something) a separate type for addressable bools/bits called... drum roll please... wbit and wbool (naturally wbool is an alias for wbit). It would have the size of a byte (hence the "w") and otherwise behave like bit (which pretty much makes it behave like C++'s bool). Making bit addressable would mess up the rule that pointers all have the same size and are convertible to void* and back.

-Ben

ps - I can just picture Elmer Fudd singing "kill the wbit, kill the wbit"
Nov 15 2004
next sibling parent "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:cnaqh1$65e$1 digitaldaemon.com...
 ps - I can just picture Elmer Fudd singing "kill the wbit, kill the wbit"
Thanks for the chuckle <g>.
Nov 15 2004
prev sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Ben Hinkle wrote:

 A few of the posts listed on that page are about bool/bit not being
 addressable. That got me thinking about adding (either builtin or through
 aliases or typedefs or something) a separate type for addressable
 bools/bits called ... drum roll please... wbit and wbool (naturally wbool
 is an alias for wbit). It would have the size of a byte (hence the "w") and
 otherwise behave like bit (which pretty much makes it behave like C++'s
 bool). Making bit addressable would mess up the rule that pointers all have
 the same size and are convertible to void* and back.
You can take the address of bit variables now, and they should also work as "out" parameters. You can't take the address of bits within arrays (or actually you can, but the pointer doesn't work).

A single bit field/var occupies a byte in memory, and a bit[] array is padded out to (length+31)/32 32-bit words. So "bit a; bit b;" is 2 bytes, while "bit c[2];" is 4 (the multiple of four is for access-speed reasons).

--anders

PS. In the Mac OS X C++ compiler (g++), a "bool" is 4 bytes. A "_Bool", as used in C99, also occupies a full four (that is, both have the same size as an "int" does...). On Linux, they seem to have the usual sizeof() of 1 byte?
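C++ has a close analogue of a packed bit[]: std::vector<bool>. A standalone bool is a real, addressable object, but a packed element is not, so indexing yields a proxy rather than a bool&. A sketch of the difference:

```cpp
#include <cstddef>
#include <vector>

// A lone bool occupies at least a whole byte and is fully addressable.
bool viaPointer() {
    bool b = true;
    bool* p = &b;        // fine: b is a real object in memory
    return *p;
}

// Packed storage: vector<bool> stores its elements as bits, so v[0]
// yields a proxy object, not a bool&; "&v[0]" would not give a usable
// bool* -- the same pitfall as pointers into a packed bit array.
std::size_t flipFirst(std::size_t n) {
    std::vector<bool> v(n, false);
    v[0] = true;                  // writing through the proxy works
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (v[i]) ++count;
    return count;                 // exactly one bit is set
}
```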
Nov 15 2004
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:cnavkn$e0i$1 digitaldaemon.com...
 Ben Hinkle wrote:

 A few of the posts listed on that page are about bool/bit not begin
 addressable. That got me thinking about adding (either builtin or
through
 aliases or typedefs or something) a separate type for addressable
 bools/bits called ... drum roll please... wbit and wbool (naturally
wbool
 is an alias for wbit). It would have the size of a byte (hence the "w")
and
 otherwise behave like bit (which pretty much makes it behave like C++'s
 bool). Making bit addressable would mess up the rule that pointers all
have
 the same size and are convertable to void* and back.
You can take the address of bit variables now, and they should also work as "out" parameters. You can't take the address of bits within arrays (or actually you can, but the pointer doesn't work). A single bit field/var occupies a byte in memory, and a bit[] array is padded out to (length+31)/32 32-bit words. So "bit a; bit b;" is 2 bytes, "bit c[2];" is 4 (the multiple of four is for access-speed reasons).
I was being too vague. You are right that bits by themselves are addressable and can be used as out parameters. It's only when they get packed that life gets interesting - for better or worse.
 --anders

 PS.
 In the Mac OS X C++ compiler (g++) a "bool" is 4 bytes.
 A "_Bool", as used in C99, also occupies a full four.
 (that is, both have the same size as an "int" does...)
 On Linux, they seems to have a usual sizeof() 1 byte ?
Interesting. I hadn't really thought about bools having different sizes, but I suppose there isn't anything stopping it. Maybe int is better than byte. Dare I suggest "dbit" and "dbool" for int-sized bits and bools? :-)
Nov 15 2004
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Ben Hinkle wrote:

 Interesting. I hadn't really thought that bools can have different sizes but
 I suppose there isn't anything stopping it. Maybe int is better than byte.
 Dare I suggest "dbit" and "dbool" for int sized bits and bools? :-)
Enough! Enough! (my poor stomach) :-D

Hats off for that most excellent suggestion! Henceforth, byte/char shall be known as a "wbit" when used as a bool, and int/long shall similarly be known as a "dbit" when used as a bool. Thus, one can choose between bit, wbit and dbit for storing booleans. This makes it consistent with the other missing type, namely strings.

Oh, the humanity

--anders
Nov 15 2004
parent "Ben Hinkle" <bhinkle mathworks.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:cnb2jm$ic4$1 digitaldaemon.com...
 Ben Hinkle wrote:

 Interesting. I hadn't really thought that bools can have different sizes
but
 I suppose there isn't anything stopping it. Maybe int is better than
byte.
 Dare I suggest "dbit" and "dbool" for int sized bits and bools? :-)
Enough! Enough! (my poor stomach) :-D Hat's off for that most excellent suggestion! Henceforth, byte/char shall be known as a "wbit" when used as a bool and int/long shall similarly be known as a "dbit" when used as a bool. Thus, one can choose between bit, wbit and dbit for storing booleans. This makes it consistent with the other missing type, namely strings. Oh, the humanity --anders
Actually, I was being semi-serious! It is kind of overkill, but I think explicit types with different behaviors are preferable to hacking up pointers.
Nov 15 2004