digitalmars.D - Which D features to emphasize for academic review article
- TJB (11/11) Aug 09 2012 Hello D Users,
- dsimcha (30/30) Aug 09 2012 Ok, so IIUC the audience is academic BUT is people interested in
- Walter Bright (18/19) Aug 09 2012 I'd like to add to that:
- F i L (22/23) Aug 10 2012 This isn't a good feature, IMO. C# handles this much more
- Walter Bright (33/53) Aug 10 2012 It catches only a subset of these at compile time. I can craft any numbe...
- F i L (30/68) Aug 10 2012 Yes, but that's not really an issue since the compiler informs
- Walter Bright (12/30) Aug 10 2012 That is a good solution, but in my experience programmers just throw in ...
- Walter Bright (4/12) Aug 10 2012 Let me amend that. I've never seen anyone use float.nan, or whatever NaN...
- F i L (4/9) Aug 10 2012 Yes, if 'int' had a NaN state it would be great. (Though I
- F i L (22/41) Aug 11 2012 I heard somewhere before there's actually an (Intel?) CPU which
- Andrei Alexandrescu (18/20) Aug 11 2012 Actually there's something that just happened two days ago to me that's
- F i L (25/30) Aug 11 2012 My argument was never against the usefulness of NaN for
- Walter Bright (5/13) Aug 11 2012 I'd rather have a 100 easy to find bugs than 1 unnoticed one that went o...
- F i L (31/33) Aug 11 2012 That's just the thing, bugs are arguably easier to hunt down when
- Walter Bright (11/20) Aug 11 2012 Many, many programming bugs trace back to assumptions that floating poin...
- F i L (20/36) Aug 11 2012 My point was that the majority of the time there wasn't a bug
- dennis luehring (6/11) Aug 11 2012 is makes absolutely no sense to have different initialization stylel in
- Don Clugston (8/20) Aug 13 2012 Exactly. I have come to believe that there are very few algorithms
- Joseph Rushton Wakeling (14/18) Aug 13 2012 ////////
- Walter Bright (3/6) Aug 13 2012 That's called "rounding". But rounding always implies some, small, error...
- Joseph Rushton Wakeling (10/12) Aug 13 2012 Well, yes. I was just remarking on the choice of rounding and the motiv...
- bearophile (4/7) Aug 13 2012 And JavaScript programs that use integers?
- TJB (8/16) Aug 13 2012 Don,
- Don Clugston (8/26) Aug 14 2012 I found that when converting code for Special Functions from C to D, the...
- Era Scarecrow (23/31) Aug 11 2012 The compiler could always have flags specifying if variables
- Walter Bright (3/12) Aug 11 2012 Not so easy. Suppose you pass a pointer to the variable to another funct...
- Era Scarecrow (15/24) Aug 11 2012 I suppose there could be a second hidden pointer/bool as part of
- F i L (29/45) Aug 10 2012 I just want to clarify something here. In C#, only class/struct
- Walter Bright (5/13) Aug 10 2012 However, and I've seen this happen, people will satisfy the compiler com...
- Mehrdad (43/59) Aug 14 2012 Note to Walter:
- Michal Minich (10/20) Aug 14 2012 Completely agree. I find it quite useful in C#. It helps a lot in
- Don Clugston (13/67) Aug 14 2012 DMD detects uninitialized variables if you compile with -O. It's hard to...
- F i L (22/29) Aug 14 2012 I think some here are mis-interpreting Walters position
- Simen Kjaeraas (10/17) Aug 14 2012 Really? We can catch (or, should be able to) missing initialization
- F i L (3/22) Aug 14 2012 You know, I never actually thought about it much, but I think
- Era Scarecrow (20/32) Aug 14 2012 Mmmm... What if you added a command that has a file/local scope?
- Mehrdad (5/34) Aug 14 2012 C# structs, as you might recall, enforce definite initialization.
- Mehrdad (6/25) Aug 14 2012 Ah, well if he's for it, then I misunderstood. I read through the
- Walter Bright (11/19) Aug 14 2012 As I've explained before, user defined types have "default constructors"...
- Mehrdad (10/33) Aug 14 2012 Just because they _have_ a default constructor doesn't mean the
- Mehrdad (2/3) Aug 14 2012 Typo, scratch Java, it's N/A for Java.
- Walter Bright (2/4) Aug 14 2012 I know. How does that fit in with default construction?
- Mehrdad (16/20) Aug 14 2012 They aren't called unless the user calls them.
- Mehrdad (2/4) Aug 14 2012 Er, other way around I mean...
- Walter Bright (4/7) Aug 14 2012 I guess they aren't really default constructors, then.
- Mehrdad (46/51) Aug 14 2012 For arrays, they're called automatically.
- Jakob Ovrum (47/77) Aug 11 2012 The compiler in languages like C# doesn't try to prove that the
- Walter Bright (16/36) Aug 11 2012 Of course it is doing what the language requires, but it is an incorrect...
- Paulo Pinto (14/21) Aug 11 2012 I have to agree here.
- Jakob Ovrum (38/57) Aug 11 2012 It is not meaningless, it's declarative. The same resulting code
- Walter Bright (19/37) Aug 12 2012 No, it is not easier to understand, because there's no way to determine ...
- simendsjo (4/15) Aug 12 2012 I have thought that many times. The same with default non-null class
- dennis luehring (3/11) Aug 12 2012 its never to late - put it back on the list for D 3 - please
- Era Scarecrow (14/21) Aug 12 2012 Agreed. If it is only a signature change then it might have been
- Jakob Ovrum (31/52) Aug 12 2012 If there is an explicit initializer, it means that the intent is
- Adam Wilson (9/58) Aug 12 2012 As a pilot, I completely agree!
- Chad J (29/80) Aug 11 2012 To address the concern of static analysis being too hard: I wish we
- Era Scarecrow (23/60) Aug 11 2012 Let's keep in mind everyone of these truths:
- bearophile (31/35) Aug 11 2012 An alternative possibility is to:
- Walter Bright (4/6) Aug 11 2012 This has been suggested repeatedly, but it is in utter conflict with the...
- Andrei Alexandrescu (32/32) Aug 11 2012 On 8/11/12 7:33 PM, Walter Bright wrote:
- bearophile (13/17) Aug 11 2012 Statistician often use the R language
- dsimcha (22/40) Aug 12 2012 For people with more advanced CS/programming knowledge, though,
- TJB (15/29) Aug 12 2012 This is exactly how I feel, and why I am turning to D. My data
- Joseph Rushton Wakeling (8/14) Aug 12 2012 The main use-case and advantage of both R and MATLAB/Octave seems to me ...
- dsimcha (13/21) Aug 12 2012 I've addressed that, too :).
- TJB (37/72) Aug 11 2012 Andrei,
- Andrei Alexandrescu (12/39) Aug 12 2012 I think this is a great angle. In our lab when I was a grad student in
- bearophile (5/7) Aug 12 2012 In Matlab there is COW:
- F i L (19/24) Aug 12 2012 I'd like to add to this. Right now I'm reworking some libraries
- Walter Bright (7/12) Aug 13 2012 There's a fair amount of low hanging optimization fruit that D makes pos...
- TJB (7/14) Aug 10 2012 How unique to D is this feature? Does this imply that things
- Walter Bright (10/23) Aug 10 2012 I attended a talk given by a physicist a few months ago where he was usi...
- Jonathan M Davis (9/12) Aug 10 2012 I think that it's pretty typical for programmers to think that something...
- TJB (9/43) Aug 10 2012 Hopefully this will help make the case that D is the best choice
- Justin Whear (7/22) Aug 09 2012 Lazy ranges are a lifesaver when dealing with big data. E.g. read a
- Paulo Pinto (2/34) Aug 09 2012 Ah, the beauty of functional programming and streams.
- Minas Mina (8/8) Aug 10 2012 1) I think compile-time function execution is a very big plus for
Hello D Users,

The Software Editor for the Journal of Applied Econometrics has agreed to let me write a review of the D programming language for econometricians (econometrics is where economic theory and statistical analysis meet). I will have only about 6 pages. I have an idea of what I am going to write about, but I thought I would ask here what features are most relevant (in your minds) to numerical programmers writing codes for statistical inference.

I look forward to your suggestions.

Thanks,
TJB
Aug 09 2012
Ok, so IIUC the audience is academic BUT is people interested in using D as a means to an end, not computer scientists? I use D for bioinformatics, which IIUC has similar requirements to econometrics. From my point of view, I'd emphasize the following:

Native efficiency. (Important for large datasets and Monte Carlo simulations.)

Garbage collection. (Important because it makes it much easier to write non-trivial data structures that don't leak memory, and statistical analyses are a lot easier if the data is structured well.)

Ranges/std.range/builtin arrays and associative arrays. (Again, these make data handling a pleasure.)

Templates. (Makes it easier to write algorithms that aren't overly specialized to the data structure they operate on. This can also be done with OO containers but requires more boilerplate and compromises on efficiency.)

Disclaimer: These last two are things I'm the primary designer and implementer of. I intentionally put them last so it doesn't look like a shameless plug.

std.parallelism (Important because you can easily parallelize your simulation, etc.)

dstats (https://github.com/dsimcha/dstats Important because a lot of statistical analysis code is already implemented for you. It's admittedly very basic compared to e.g. R or Matlab, but it's also in many cases better integrated and more efficient. I'd say that it has the 15% of the functionality that covers ~70% of use cases. I welcome contributors to add more stuff to it. I imagine economists would be interested in time series, which is currently a big area of missing functionality.)
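For a flavor of what that style looks like, here is a rough sketch combining ranges and std.parallelism to get a mean out of a text file of observations (the file name and the exact pipeline are purely illustrative):

    import std.algorithm, std.array, std.conv, std.parallelism, std.stdio;

    void main()
    {
        // Lazily read one observation per line, convert, then materialize.
        auto data = File("observations.txt")
                        .byLine
                        .map!(line => line.to!double)
                        .array;

        // Parallel reduction over the samples; the mean follows directly.
        immutable total = taskPool.reduce!"a + b"(data);
        writefln("n = %s, mean = %s", data.length, total / data.length);
    }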
Aug 09 2012
On 8/9/2012 10:40 AM, dsimcha wrote:
> I'd emphasize the following:

I'd like to add to that:

1. Proper support for 80 bit floating point types. Many compilers' libraries have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 80 bit floats reduce the incidence of creeping roundoff error.

2. Support for SIMD vectors as native types.

3. Floating point values are default initialized to NaN.

4. Correct support for NaN and infinity values.

5. Correct support for unordered operations.

6. Array types do not degenerate into pointer types whenever passed to a function. In other words, array types know their dimension.

7. Array loop operations, i.e.:

       for (size_t i = 0; i < a.length; i++)
           a[i] = b[i] + c;

   can be written as:

       a[] = b[] + c;

8. Global data is thread local by default, lessening the risk of unintentional unsynchronized sharing between threads.
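A tiny sketch of points 3 and 7 in action (the values are only for illustration):

    import std.math, std.stdio;

    void main()
    {
        double x;                      // point 3: default-initialized to NaN
        assert(isNaN(x));

        auto a = new double[4];
        auto b = [1.0, 2.0, 3.0, 4.0];
        a[] = b[] + 0.5;               // point 7: array expression, no explicit loop
        writeln(a);                    // [1.5, 2.5, 3.5, 4.5]
    }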
Aug 09 2012
Walter Bright wrote:
> 3. Floating point values are default initialized to NaN.

This isn't a good feature, IMO. C# handles this much more conveniently with just as much optimization/debugging benefit (arguably more so, because it catches NaN issues at compile time):

    class Foo {
        float x; // defaults to 0.0f

        void bar() {
            float y; // doesn't default
            y ++;    // ERROR: use of unassigned local

            float z = 0.0f;
            z ++;    // OKAY
        }
    }

This is the same behavior for any local variable, so where in D you need to explicitly set variables to 'void' to avoid assignment costs, C# catches these mistakes before runtime.

Sorry, I'm not trying to derail this thread. I just think D has other, much better advertising points than this one.
Aug 10 2012
On 8/10/2012 1:38 AM, F i L wrote:
> Walter Bright wrote:
>> 3. Floating point values are default initialized to NaN.
> C# handles this much more conveniently with just as much optimization/debugging benefit (arguably more so, because it catches NaN issues at compile time):
>
>     class Foo {
>         float x; // defaults to 0.0f
>
>         void bar() {
>             float y; // doesn't default
>             y ++;    // ERROR: use of unassigned local
>
>             float z = 0.0f;
>             z ++;    // OKAY
>         }
>     }
>
> This is the same behavior for any local variable,

It catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one:

    float z;
    if (condition1) z = 5;
    ... lotsa code ...
    if (condition2) z++;

To diagnose this correctly, the static analyzer would have to determine that condition1 produces the same result as condition2, or not. This is impossible to prove. So the static analyzer either gives up and lets it pass, or issues an incorrect diagnostic. So our intrepid programmer is forced to write:

    float z = 0;
    if (condition1) z = 5;
    ... lotsa code ...
    if (condition2) z++;

Now, as it may turn out, for your algorithm the value "0" is an out-of-range, incorrect value. Not a problem as it is a dead assignment, right? But then the maintenance programmer comes along and changes condition1 so it is not always the same as condition2, and now the z++ sees the invalid "0" value sometimes, and a silent bug is introduced. This bug will not remain undetected with the default NaN initialization.

> so where in D you need to explicitly set variables to 'void' to avoid assignment costs,

This is incorrect, as the optimizer is perfectly capable of removing dead assignments like:

    f = nan;
    f = 0.0f;

The first assignment is optimized away.

> I just think D has other, much better advertising points than this one.

Whether you agree with it being a good feature or not, it is a feature unique to D and merits discussion when talking about D's suitability for numerical programming.
Aug 10 2012
Walter Bright wrote:It catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one: float z; if (condition1) z = 5; ... lotsa code ... if (condition2) z++; To diagnose this correctly, the static analyzer would have to determine that condition1 produces the same result as condition2, or not. This is impossible to prove. So the static analyzer either gives up and lets it pass, or issues an incorrect diagnostic. So our intrepid programmer is forced to write: float z = 0; if (condition1) z = 5; ... lotsa code ... if (condition2) z++;Yes, but that's not really an issue since the compiler informs the coder of it's limitation. You're simply forced to initialize the variable in this situation.Now, as it may turn out, for your algorithm the value "0" is an out-of-range, incorrect value. Not a problem as it is a dead assignment, right? But then the maintenance programmer comes along and changes condition1 so it is not always the same as condition2, and now the z++ sees the invalid "0" value sometimes, and a silent bug is introduced. This bug will not remain undetected with the default NaN initialization.I had a debate on here a few months ago about the merits of default-to-NaN and others brought up similar situations. but since we can write: float z = float.nan; ... explicitly, then this could be thought of as a debugging feature available to the programmer. The problem I've always had with defaulting to NaN is that it's inconsistent with integer types, and while there may be merit to the idea of defaulting all types to NaN/Null, it's simply unavailable for half of the number spectrum. I can only speak for myself, but I much prefer consistency over anything else because it means there's less discrepancies I need to remember when hacking things together. It also steepens the learning curve. More importantly, what we have now is code where bugs-- like the one you mentioned above --are still possible with Ints, but also easy to miss since "the other number type" behaves differently and programmers may accidentally assume a NaN will propagate where it will not.This is incorrect, as the optimizer is perfectly capable of removing dead assignments like: f = nan; f = 0.0f; The first assignment is optimized away.I thought there was some optimization by avoiding assignment, but IDK enough about memory at that level. Now I'm confused as to the point of 'float x = void' type annotations. :-\Whether you agree with it being a good feature or not, it is a feature unique to D and merits discussion when talking about D's suitability for numerical programming.True, and I misspoke by saying it wasn't a "selling point". I only meant to raise issue with a feature that has been more of an annoyance rather than a boon to me personally. That said, I also agree that this thread was the wrong place to raise issue with it.
Aug 10 2012
On 8/10/2012 9:01 PM, F i L wrote:
> I had a debate on here a few months ago about the merits of default-to-NaN and others brought up similar situations. But since we can write:
>
>     float z = float.nan;
>     ...

That is a good solution, but in my experience programmers just throw in an =0, as it is simple and fast, and they don't normally think about NaN's.

> explicitly, then this could be thought of as a debugging feature available to the programmer. The problem I've always had with defaulting to NaN is that it's inconsistent with integer types, and while there may be merit to the idea of defaulting all types to NaN/Null, it's simply unavailable for half of the number spectrum. I can only speak for myself, but I much prefer consistency over anything else because it means there's less discrepancies I need to remember when hacking things together. It also steepens the learning curve.

It's too bad that ints don't have a NaN value, but interestingly enough, valgrind does default initialize them to some internal NaN, making it a most excellent bug detector.

> More importantly, what we have now is code where bugs -- like the one you mentioned above -- are still possible with Ints, but also easy to miss since "the other number type" behaves differently and programmers may accidentally assume a NaN will propagate where it will not.

Sadly, D has to map onto imperfect hardware :-( We do have NaN values for chars (0xFF) and pointers (the vilified 'null'). Think how many bugs the latter has exposed, and then think of all the floating point code with no such obvious indicator of bad initialization.

> I thought there was some optimization by avoiding assignment, but IDK enough about memory at that level. Now I'm confused as to the point of 'float x = void' type annotations. :-\

It would be used where the static analysis is not able to detect that the initializer is dead.
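For example, a minimal sketch of that '= void' escape hatch (the loop only stands in for code that is guaranteed to assign):

    import std.stdio;

    void main()
    {
        double z = void;      // no initial store at all, not even NaN
        foreach (i; 0 .. 10)
            z = i * 0.5;      // the programmer guarantees assignment before any read
        writeln(z);           // 4.5
    }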
Aug 10 2012
On 8/10/2012 9:32 PM, Walter Bright wrote:On 8/10/2012 9:01 PM, F i L wrote:Let me amend that. I've never seen anyone use float.nan, or whatever NaN is in the language they were using. They always use =0. I doubt that yelling at them will change anything.I had a debate on here a few months ago about the merits of default-to-NaN and others brought up similar situations. but since we can write: float z = float.nan; ...That is a good solution, but in my experience programmers just throw in an =0, as it is simple and fast, and they don't normally think about NaN's.
Aug 10 2012
Walter Bright wrote:Sadly, D has to map onto imperfect hardware :-( We do have NaN values for chars (0xFF) and pointers (the villified 'null'). Think how many bugs the latter has exposed, and then think of all the floating point code with no such obvious indicator of bad initialization.Yes, if 'int' had a NaN state it would be great. (Though I remember hearing about a hardware that did support it.. somewhere).
Aug 10 2012
Walter Bright wrote:That is a good solution, but in my experience programmers just throw in an =0, as it is simple and fast, and they don't normally think about NaN's.See! Programmers just want usable default values :-PIt's too bad that ints don't have a NaN value, but interestingly enough, valgrind does default initialize them to some internal NaN, making it a most excellent bug detector.I heard somewhere before there's actually an (Intel?) CPU which supports NaN ints... but maybe that's just hearsay.Sadly, D has to map onto imperfect hardware :-( We do have NaN values for chars (0xFF) and pointers (the villified 'null'). Think how many bugs the latter has exposed, and then think of all the floating point code with no such obvious indicator of bad initialization.Ya, but I don't think pointers/refs and floats are comparable because one is copy semantics and the other is not. Conceptually, pointers are only references to data while numbers are actual data. It makes sense that one would default to different things. Thought if Int did have a NaN value, I'm not sure which way I would side on this issue. I still think I would prefer having some level of compile-time indication or my errors simply because it saves time when you're making something.It would be used where the static analysis is not able to detect that the initializer is dead.Good to know.However, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.Maybe the perfect solution is to have the compiler initialize the value to NaN, but it also does a bit of static analysis and gives a compiler error when it can determine your variable is being used before being assigned for the sake of productivity. In fact, for the sake of consistency, you could always enforce that (compiler error) rule on every local variable, so even ints would be required to have explicit initialization before use. I still prefer float class members to be defaulted to a usable value, for the sake of consistency with ints.
Aug 11 2012
On 8/11/12 3:11 AM, F i L wrote:I still prefer float class members to be defaulted to a usable value, for the sake of consistency with ints.Actually there's something that just happened two days ago to me that's relevant to this, particularly because it's in a different language (SQL) and different domain (Machine Learning). I was working with an iterative algorithm implemented in SQL, which performs some aggregate computation, on some 30 billions of samples. The algorithm is rather intricate, and each iteration takes the previous one's result as input. Somehow at the end there were NaNs in the sample data I was looking at (there weren't supposed to). So I started investigating; the NaNs could appear only in a rare data corruption case. And indeed before long I found 4 (four) samples out of 30 billion that were corrupt. After one iteration, there were 300K NaNs. After two iterations, a few millions. After four, 800M samples were messed up. NaNs did save the day. Although this case is not about default values but about the result of a computation (in this case 0.0/0.0), I think it still reveals the usefulness of having a singular value in the floating point realm. Andrei
Aug 11 2012
Andrei Alexandrescu wrote:
> [ ... ] Although this case is not about default values but about the result of a computation (in this case 0.0/0.0), I think it still reveals the usefulness of having a singular value in the floating point realm.

My argument was never against the usefulness of NaN for debugging... only that it should be considered a debugging feature and explicitly defined, rather than intruding on convenience and consistency (with Int) by being the default.

I completely agree NaNs are important for debugging floating point math, in fact D's default-to-NaN has caught a couple of my construction mistakes before. The problem is that this sort of construction mistake is bigger than just floating point and NaN. You can mis-set a variable, float or not, or you can not set an int when you should have. So the question becomes not what benefit NaN is for debugging, but what a person's thought process is when creating/debugging code, and herein lies the heart of my qualm.

In D we have a bit of a conceptual double standard within the number community. I have to remember these rules when I'm creating something, not just when I'm debugging it. As often as D may have caught a construction mistake specifically related to floats in my code, 10x more so it's produced NaN's where I intended a number, because I forgot about the double standard when adding a field or creating a variable. A C++ guy might not think twice about this because he's used to having to default values all the time (IDK, I'm not that guy), but coming from C#, that's a paper-cut on someone's opinion of the language.
Aug 11 2012
On 8/11/2012 12:33 PM, F i L wrote:In D we have a bit of a conceptual double standard within the number community. I have to remember these rules when I'm creating something, not just when I'm debugging it. As often as D may have caught a construction mistake specifically related to floats in my code, 10x more so it's produced NaN's where I intended a number, because I forgot about the double standard when adding a field or creating a variable.I'd rather have a 100 easy to find bugs than 1 unnoticed one that went out in the field.A C++ guy might not think twice about this because he's used to having to default values all the time (IDK, I'm not that guy),Only if a default constructor is defined for the type, which it often is not, and you'll get garbage for a default initialization.
Aug 11 2012
Walter Bright wrote:I'd rather have a 100 easy to find bugs than 1 unnoticed one that went out in the field.That's just the thing, bugs are arguably easier to hunt down when things default to a consistent, usable value. When variables are defaulted to Zero, I have a guarantee that any propagated NaN bug is _not_ coming from them (directly). With NaN defaults, I only have a guarantee that the value _might_ be coming said variable. Then, I also have more to be aware of when searching through code, because my ints behave differently than my floats. Arguably, you always have to be aware of this, but at least with explicit sets to NaN, I know the potential culprits earlier (because they'll have distinct assignment). With static analysis warning against local scope NaN issues, there's really only one situation where setting to NaN catches bugs, and that's when you want to guarantee that a member variable is specifically assigned a value (of some kind) during construction. This is a corner case situation because: 1. It makes no guarantees about what value is actually assigned to the variable, only that it's set to something. Which means it's either forgotten in favor of a 'if' statement, or in combination with an if statement. 2. Because of it's singular debugging potential, NaN safeguards are, most often, intentionally put in place (or in D's case, left in place). This is why I think such situations should require an explicit assignment to NaN. The "100 easy bugs" you mentioned weren't actually "bugs", they where times I forgot floats defaulted _differently_. The 10 times where NaN caught legitimate bugs, I would have had to hunt down the mistake either way, and it was trivial to do regardless of the the NaN. Even if it wasn't trivial, I could have very easily assigned NaN to questionable variables explicitly.
Aug 11 2012
On 8/11/2012 3:01 PM, F i L wrote:Walter Bright wrote:Many, many programming bugs trace back to assumptions that floating point numbers act like ints. There's just no way to avoid knowing and understanding the differences.I'd rather have a 100 easy to find bugs than 1 unnoticed one that went out in the field.That's just the thing, bugs are arguably easier to hunt down when things default to a consistent, usable value.When variables are defaulted to Zero, I have a guarantee that any propagated NaN bug is _not_ coming from them (directly). With NaN defaults, I only have a guarantee that the value _might_ be coming said variable.I don't see why this is a bad thing. The fact is, with NaN you know there is a bug. With 0, you may never realize there is a problem. Andrei wrote me about the output of a program he is working on having billions of result values, and he noticed a few were NaNs, which he traced back to a bug. If the bug had set the float value to 0, there's no way he would have ever noticed the issue. It's all about daubing bugs with day-glo orange paint so you know there's a problem. Painting them with camo is not the right solution.
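A tiny illustration of that effect (made-up sample values): a single corrupt sample poisons the whole aggregate, which is exactly what makes it noticeable:

    import std.algorithm, std.math, std.stdio;

    void main()
    {
        double zero = 0.0;
        double[] samples = [1.0, 2.0, zero / zero, 4.0]; // one corrupt sample
        auto total = samples.sum;                        // the NaN poisons the sum
        writeln(total, " ", total.isNaN);                // nan true
    }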
Aug 11 2012
Walter Bright wrote:My point was that the majority of the time there wasn't a bug introduced. Meaning the code was written an functioned as expected after I initialized the value to 0. I was only expecting the value to act similar (in initial value) as it's 'int' relative, but received a NaN in the output because I forgot to be explicit.That's just the thing, bugs are arguably easier to hunt down when things default to a consistent, usable value.Many, many programming bugs trace back to assumptions that floating point numbers act like ints. There's just no way to avoid knowing and understanding the differences.I don't see why this is a bad thing. The fact is, with NaN you know there is a bug. With 0, you may never realize there is a problem. Andrei wrote me about the output of a program he is working on having billions of result values, and he noticed a few were NaNs, which he traced back to a bug. If the bug had set the float value to 0, there's no way he would have ever noticed the issue. It's all about daubing bugs with day-glo orange paint so you know there's a problem. Painting them with camo is not the right solution.Yes, and this is an excellent argument for using NaN as a debugging practice in general, but I don't see anything in favor of defaulting to NaN. If you don't do some kind of check against code, especially with such large data sets, bugs of various kinds are going to go unchecked regardless. A bug where an initial data value was accidentally initialized to 0 (by a third party later on, for instance), could be just as hard to miss, or harder if you're expecting a NaN to appear. In fact, an explicit set to NaN might discourage a third party to assigning without first questioning the original intention. In this situation I imagine best practice would be to write: float dataValue = float.nan; // MUST BE NaN, DO NOT CHANGE! // set to NaN to ensure is-set.
Aug 11 2012
On 12.08.2012 02:43, F i L wrote:
> Yes, and this is an excellent argument for using NaN as a debugging practice in general, but I don't see anything in favor of defaulting to NaN. If you don't do some kind of check against code, especially with such large data sets, bugs of various kinds are going to go unchecked regardless.

It makes absolutely no sense to have different initialization styles in debug and release - and according to Andrei's example: there are many situations where slow debug code isn't capable of reproducing the error in a human timespan - especially when working with million- or billion-element datasets (like I also do...)
Aug 11 2012
On 12/08/12 01:31, Walter Bright wrote:
> On 8/11/2012 3:01 PM, F i L wrote:
>> Walter Bright wrote:
>>> I'd rather have a 100 easy to find bugs than 1 unnoticed one that went out in the field.
>> That's just the thing, bugs are arguably easier to hunt down when things default to a consistent, usable value.
> Many, many programming bugs trace back to assumptions that floating point numbers act like ints. There's just no way to avoid knowing and understanding the differences.

Exactly. I have come to believe that there are very few algorithms originally designed for integers, which also work correctly for floating point.

Integer code nearly always assumes things like x + 1 != x, x == x, (x + y) - y == x.

    for (y = x; y < x + 10; y = y + 1)
    {
        ....
    }

How many times does it loop?
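For a concrete illustration (the starting value is chosen only to make the effect obvious): once x is large enough, the increment is absorbed entirely:

    import std.stdio;

    void main()
    {
        float x = 1.0e8f;     // far beyond float's 24-bit integer precision
        int count = 0;
        for (float y = x; y < x + 10; y = y + 1)
        {
            if (++count > 100)
                break;        // guard: without it this never terminates,
        }                     // because y + 1 rounds straight back to y
        writeln(count);       // prints 101, not 10
    }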
Aug 13 2012
On 13/08/12 11:11, Don Clugston wrote:
> Exactly. I have come to believe that there are very few algorithms originally designed for integers, which also work correctly for floating point.

////////
import std.stdio;

void main()
{
    real x = 1.0/9.0;
    writefln("x = %.128g", x);
    writefln("9x = %.128g", 9.0*x);
}
////////

... well, that doesn't work, does it? Looks like some sort of cheat in place to make sure that the successive division and multiplication will revert to the original number.

> Integer code nearly always assumes things like, x + 1 != x, x == x, (x + y) - y == x.

There's always good old "if(x==0)" :-)
Aug 13 2012
On 8/13/2012 5:38 AM, Joseph Rushton Wakeling wrote:Looks like some sort of cheat in place to make sure that the successive division and multiplication will revert to the original number.That's called "rounding". But rounding always implies some, small, error that can accumulate into being a very large error.
Aug 13 2012
On 13/08/12 20:04, Walter Bright wrote:That's called "rounding". But rounding always implies some, small, error that can accumulate into being a very large error.Well, yes. I was just remarking on the choice of rounding and the motivation behind it. After all, you _could_ round it instead as, x = 1.0/9.0 == 0.11111111111111 ... 111 [finite number of decimal places] but then 9*x == 0.999999999999 ... 9999 [i.e. doesn't multiply back to 1.0]. ... and this is probably more likely to result in undesirable error than the other rounding scheme. (I think the calculator app on Windows used to have this behaviour some years back.)
Aug 13 2012
Don Clugston:I have come to believe that there are very few algorithms originally designed for integers, which also work correctly for floating point.And JavaScript programs that use integers? Bye, bearophile
Aug 13 2012
On Monday, 13 August 2012 at 10:11:06 UTC, Don Clugston wrote:... I have come to believe that there are very few algorithms originally designed for integers, which also work correctly for floating point. Integer code nearly always assumes things like, x + 1 != x, x == x, (x + y) - y == x. for (y = x; y < x + 10; y = y + 1) { .... } How many times does it loop?Don, I would appreciate your thoughts on the issue of re-implementing numeric codes like BLAS and LAPACK in pure D to benefit from the many nice features listed in this discussion. Is it feasible? Worthwhile? Thanks, TJB
Aug 13 2012
On 14/08/12 05:03, TJB wrote:On Monday, 13 August 2012 at 10:11:06 UTC, Don Clugston wrote:I found that when converting code for Special Functions from C to D, the code quality improved enormously. Having 'static if' and things like float.epsilon as built-ins makes a surprisingly large difference. It encourages correct code. (For example, it makes any use of magic numbers in the code look really ugly and wrong). Unit tests help too. That probably doesn't apply so much to LAPACK and BLAS, but it would be interesting to see how far we can get with the new SIMD support.... I have come to believe that there are very few algorithms originally designed for integers, which also work correctly for floating point. Integer code nearly always assumes things like, x + 1 != x, x == x, (x + y) - y == x. for (y = x; y < x + 10; y = y + 1) { .... } How many times does it loop?Don, I would appreciate your thoughts on the issue of re-implementing numeric codes like BLAS and LAPACK in pure D to benefit from the many nice features listed in this discussion. Is it feasible? Worthwhile? Thanks, TJB
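As a toy sketch of the kind of thing 'static if' and the built-in epsilon make tidy (the cutoff constants here are invented, not taken from any real special-function code):

    import std.traits;

    // Pick a series cutoff from the built-in machine epsilon, with a
    // 'static if' specialization for 80-bit reals.
    T seriesCutoff(T)() if (isFloatingPoint!T)
    {
        static if (is(T == real) && real.mant_dig == 64)
            return 64 * T.epsilon;   // x87 80-bit path
        else
            return 16 * T.epsilon;
    }

    unittest
    {
        assert(seriesCutoff!real() <= seriesCutoff!double());
        assert(seriesCutoff!double() < seriesCutoff!float());
    }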
Aug 14 2012
On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:
> It's too bad that ints don't have a NaN value, but interestingly enough, valgrind does default initialize them to some internal NaN, making it a most excellent bug detector.

The compiler could always have flags specifying if variables were used, and if they are false they are as good as NaN. Only downside is a performance hit unless you mark it as a release binary. It really comes down to if it's worth implementing or considered a big change (unless it's a flag you have to specially turn on).

example:

    int a;
    writeln(a++); // compile-time error, or throws an exception at runtime
                  // (read access before being set)

internally translated as:

    int a;
    bool _is_a_used = false;

    if (!_is_a_used)
        throw new exception("a not initialized before use!");
    // passing to functions will throw the exception,
    // unless the signature is 'out'
    writeln(a);
    ++a;
    _is_a_used = true;

> Sadly, D has to map onto imperfect hardware :-(

Not so much imperfect hardware, just the imperfect 'human' variable.

> We do have NaN values for chars (0xFF) and pointers (the villified 'null'). Think how many bugs the latter has exposed, and then think of all the floating point code with no such obvious indicator of bad initialization.
Aug 11 2012
On 8/11/2012 1:30 AM, Era Scarecrow wrote:On Saturday, 11 August 2012 at 04:33:38 UTC, Walter Bright wrote:Not so easy. Suppose you pass a pointer to the variable to another function. Does that function set it?It's too bad that ints don't have a NaN value, but interestingly enough, valgrind does default initialize them to some internal NaN, making it a most excellent bug detector.The compiler could always have flags specifying if variables were used, and if they are false they are as good as NaN. Only downside is a performance hit unless you Mark it as a release binary. It really comes down to if it's worth implementing or considered a big change (unless it's a flag you have to specially turn on)
Aug 11 2012
On Saturday, 11 August 2012 at 09:26:42 UTC, Walter Bright wrote:On 8/11/2012 1:30 AM, Era Scarecrow wrote:I suppose there could be a second hidden pointer/bool as part of calls, but then it's completely incompatible with any C calling convention, meaning that is probably out of the question. Either a) pointers are low level enough that like casting; At which case it's all up to the programmer. or b) same as before that unless it's an 'out' parameter is specified, it would likely throw an exception at that point, (Since attempting to read/pass the address of an uninitialized variable is the same as accessing it directly). Afterall having a false positive is better than not being involved at all right? Of course with that in mind, specifying a variable to begin as void (uninitialized) could be it's own form of initialization? (Meaning it wouldn't be checking those even though they hold known garbage)The compiler could always have flags specifying if variables were used, and if they are false they are as good as NaN. Only downside is a performance hit unless you Mark it as a release binary. It really comes down to if it's worth implementing or considered a big change (unless it's a flag you have to specially turn on)Not so easy. Suppose you pass a pointer to the variable to another function. Does that function set it?
Aug 11 2012
F i L wrote:
> Walter Bright wrote:
>> It catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one:
>>
>>     float z;
>>     if (condition1) z = 5;
>>     ... lotsa code ...
>>     if (condition2) z++;
>> [...]
>
> Yes, but that's not really an issue since the compiler informs the coder of its limitation. You're simply forced to initialize the variable in this situation.

I just want to clarify something here. In C#, only class/struct fields are defaulted to a usable value. Locals have to be explicitly set before they're used. So, expanding on your example above:

    float z;
    if (condition1) z = 5;
    else z = 6; // 'else' required
    ... lotsa code ...
    if (condition2) z++;

On the first condition, without an 'else z = ...', or if the condition was removed at a later time, then you'll get a compiler error and be forced to explicitly assign 'z' somewhere above before using it. C# catches these issues at compile-time, whereas in D you need to:

1. run the program
2. get bad result
3. hunt down bug

Fields, on the other hand, can be ensured to be initialized in a constructor:

    class Foo {
        float f = float.NaN; // Can't use 'f' unless Foo is
                             // properly constructed.
    }
Aug 10 2012
On 8/10/2012 9:55 PM, F i L wrote:
> On the first condition, without an 'else z = ...', or if the condition was removed at a later time, then you'll get a compiler error and be forced to explicitly assign 'z' somewhere above before using it. C# catches these issues at compile-time, whereas in D you need to:
>
> 1. run the program
> 2. get bad result
> 3. hunt down bug

However, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.
Aug 10 2012
On Saturday, 11 August 2012 at 05:41:23 UTC, Walter Bright wrote:
> On 8/10/2012 9:55 PM, F i L wrote:
>> On the first condition, without an 'else z = ...', or if the condition was removed at a later time, then you'll get a compiler error and be forced to explicitly assign 'z' somewhere above before using it. C# catches these issues at compile-time, whereas in D you need to:
>> 1. run the program
>> 2. get bad result
>> 3. hunt down bug
> However, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.

Note to Walter: You're obviously correct that you can make an arbitrarily complex program to make it too difficult for the compiler to enforce this (in 100% of cases).

What you seem to be missing is that the issue you're saying is correct in theory, but too much of a corner case in practice. C# programmers rarely run into the situation you're mentioning, and even when they do, they don't have nearly as much of a problem with fixing it as you seem to think.

The only reason you run into this sort of problem (assuming you do, and it's not just a theoretical discussion) is that you're in the C/C++ mindset, and using variables in the C/C++ fashion. In C#, you simply _wouldn't_ try to make things so complicated when coding, and you simply _wouldn't_ run into these problems the way you /think/ you would, as a C++ programmer.

Regardless, it looks to me like you two are arguing for two orthogonal issues:

F i L: The compiler should detect uninitialized variables.
Walter: The compiler should choose to initialize variables with NaN.

What I'm failing to understand is, why can't we have both?

1. Compiler _warns_ about "uninitialized variables" (or scalars, at least), unless you take the address of the variable, in which case the compiler gives up trying to analyze it. Bonus points: Try to detect a couple of common cases (e.g. if/else) instead of giving up so easily.

2. In any case, the compiler initializes the variable with whatever default value Walter deems useful.

Then you get the best of both worlds:

1. You force the programmer to manually initialize the variable in most cases, forcing him to think about the default value. It's almost no trouble for the programmer.

2. In the cases where it's not possible, the language helps the programmer catch bugs.

Nothing lost, anyway. Only something to be gained.
Aug 14 2012
On Tuesday, 14 August 2012 at 10:31:30 UTC, Mehrdad wrote:
> Note to Walter: You're obviously correct that you can make an arbitrarily complex program to make it too difficult for the compiler to enforce this (in 100% of cases).
>
> What you seem to be missing is that the issue you're saying is correct in theory, but too much of a corner case in practice. C# programmers rarely run into the situation you're mentioning, and even when they do, they don't have nearly as much of a problem with fixing it as you seem to think.

Completely agree. I find it quite useful in C#. It helps a lot in hairy code (nested if/foreach/try) to make sure all cases are handled when initializing a variable. Compilation errors can be simply dismissed by assigning a 'default' value to the variable at the beginning of the function, but that is generally sloppy programming and you lose the useful help of the compiler.

The C# rules for definite assignment could probably be applied to D: http://msdn.microsoft.com/en-us/library/aa691172%28v=vs.71%29.aspx
Aug 14 2012
On 14/08/12 12:31, Mehrdad wrote:On Saturday, 11 August 2012 at 05:41:23 UTC, Walter Bright wrote:DMD detects uninitialized variables if you compile with -O. It's hard to implement the full Monty at the moment, because all that code is in the backend rather than the front-end.On 8/10/2012 9:55 PM, F i L wrote:Note to Walter: You're obviously correct that you can make an arbitrarily complex program to make it too difficult for the compiler to enforce What you seem to be missing is that the issue you're saying is correct in theory, but too much of a corner case in practice. mentioning, and even when they do, they don't have nearly as much of a problem with fixing it as you seem to think. The only reason you run into this sort of problem (assuming you do, and it's not just a theoretical discussion) is that you're in the C/C++ mindset, and using variables in the C/C++ fashion. simply _wouldn't_ try to make things so complicated when coding, and you simply _wouldn't_ run into these problems the way you /think/ you would, as a C++ programmer. Regardless, it looks to me like you two are arguing for two orthogonal issues: F i L: The compiler should detect uninitialized variables. Walter: The compiler should choose initialize variables with NaN. What I'm failing to understand is, why can't we have both? 1. Compiler _warns_ about "uninitialized variables" (or scalars, at address of the variable, in which case the compiler gives up trying to Bonus points: Try to detect a couple of common cases (e.g. if/else) instead of giving up so easily. 2. In any case, the compiler initializes the variable with whatever default value Walter deems useful. Then you get the best of both worlds: 1. You force the programmer to manually initialize the variable in most cases, forcing him to think about the default value. It's almost no trouble for 2. In the cases where it's not possible, the language helps the programmer catch bugs.On the first condition, without an 'else z = ...', or if the condition was removed at a later time, then you'll get a compiler error and be forced to explicitly assign 'z' somewhere above using catches these issues at compile-time, whereas in D you need to: 1. run the program 2. get bad result 3. hunt down bugHowever, and I've seen this happen, people will satisfy the compiler complaint by initializing the variable to any old value (usually 0), because that value will never get used. Later, after other things change in the code, that value suddenly gets used, even though it may be an incorrect value for the use.Completely agree. I always thought the intention was that assigning to NaN was simply a way of catching the difficult cases that slip through compile-time checks. Which includes the situation where the compile-time checking isn't yet implemented at all. This is the first time I've heard the suggestion that it might never be implemented. The thing which is really bizarre though, is float.init. I don't know what the semantics of it are.
Aug 14 2012
Mehrdad wrote:Note to Walter: You're obviously correct that you can make an arbitrarily complex program to make it too difficult for the compiler to cases). [ ... ]I think some here are mis-interpreting Walters position concerning static analysis from our earlier conversation, so I'll share my impression of his thoughts. I can't speak for Walter, of course, but I'm pretty sure that early on in our conversation he agreed that having the compiler catch local scope initialization issues was a good idea, or at least, wasn't a bad one (again, correct me if I'm wrong). I doubt he would be adverse to eventually having DMD perform this sort of static analysis to help developers, though I doubt it's a high priority for him. The majority of the conversation after that was concerning struct/class fields defaults: class Foo { float x; // I think this should be 0.0f // Walter thinks it should be NaN } In this situation static analysis can't help catch issues, and we're forced to rely on a default value of some kind. Both Walter and I have stated our opinion's reasoning previously, so I won't repeat them here.
Aug 14 2012
On Tue, 14 Aug 2012 16:32:25 +0200, F i L <witte2008 gmail.com> wrote:
> class Foo {
>     float x; // I think this should be 0.0f
>              // Walter thinks it should be NaN
> }
>
> In this situation static analysis can't help catch issues, and we're forced to rely on a default value of some kind.

Really? We can catch (or, should be able to) missing initialization of stuff with @disable this(), but not floats?

Classes have constructors, which lend themselves perfectly to doing exactly this (just pretend the member is a local variable). Perhaps there are problems with structs without disabled default constructors, but even those are trivially solvable by requiring a default value at declaration time.

--
Simen
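A rough sketch of what that already looks like today (the type is invented purely for illustration):

    struct Kelvin
    {
        @disable this();          // forbid default construction outright
        double degrees;
        this(double k) { degrees = k; }
    }

    void main()
    {
        // Kelvin t;              // error: cannot default-construct Kelvin
        auto t = Kelvin(293.15);  // fine: value supplied explicitly
    }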
Aug 14 2012
On Tuesday, 14 August 2012 at 14:46:30 UTC, Simen Kjaeraas wrote:On Tue, 14 Aug 2012 16:32:25 +0200, F i L <witte2008 gmail.com> wrote:You know, I never actually thought about it much, but I think you're right. I guess the same rules could apply to type fields.class Foo { float x; // I think this should be 0.0f // Walter thinks it should be NaN } In this situation static analysis can't help catch issues, and we're forced to rely on a default value of some kind.Really? We can catch (or, should be able to) missing initialization of stuff with disable this(), but not floats? Classes have constructors, which lend themselves perfectly to doing exactly this (just pretend the member is a local variable). Perhaps there are problems with structs without disabled default constructors, but even those are trivially solvable by requiring a default value at declaration time.
Aug 14 2012
On Tuesday, 14 August 2012 at 15:24:30 UTC, F i L wrote:
> Really? We can catch (or, should be able to) missing initialization of stuff with @disable this(), but not floats? Classes have constructors, which lend themselves perfectly to doing exactly this (just pretend the member is a local variable). Perhaps there are problems with structs without disabled default constructors, but even those are trivially solvable by requiring a default value at declaration time.
> You know, I never actually thought about it much, but I think you're right. I guess the same rules could apply to type fields.

Mmmm... What if you added a command that has a file/local scope? Perhaps, following @disable this(), it could be @disable init; or @disable .init. This would only work for built-in types, and possibly structs with variables that aren't explicitly set with default values. It sorta already fits with what's there.

    @disable init; //global scope in file, like @safe.

    struct someCipher {
        @disable init; //local scope, in this case the whole struct.
        int[][] tables; //now gives compile-time error unless @disable this() used.
        ubyte[] key = [1,2,3,4]; //explicitly defined as a default

        this(ubyte[] k, int[][] t){key=k;tables=t;}
    }

    void myfun() {
        someCipher x; //compile time error since struct fails (But not at this line unless @disable this() used)
        someCipher y = someCipher([[1,2],[1,2]]); //should work as expected.
    }
Aug 14 2012
On Tuesday, 14 August 2012 at 15:24:30 UTC, F i L wrote:On Tuesday, 14 August 2012 at 14:46:30 UTC, Simen Kjaeraas wrote::) We could do the same for structs and classes... what I said doesn't just apply to local variables.On Tue, 14 Aug 2012 16:32:25 +0200, F i L <witte2008 gmail.com> wrote:You know, I never actually thought about it much, but I think you're right. I guess the same rules could apply to type fields.class Foo { float x; // I think this should be 0.0f // Walter thinks it should be NaN } In this situation static analysis can't help catch issues, and we're forced to rely on a default value of some kind.Really? We can catch (or, should be able to) missing initialization of stuff with disable this(), but not floats? Classes have constructors, which lend themselves perfectly to doing exactly this (just pretend the member is a local variable). Perhaps there are problems with structs without disabled default constructors, but even those are trivially solvable by requiring a default value at declaration time.
Aug 14 2012
On Tuesday, 14 August 2012 at 14:32:26 UTC, F i L wrote:Mehrdad wrote:Ah, well if he's for it, then I misunderstood. I read through the entire thread (but not too carefully, just 1 read) and my impression was that he didn't like the idea because it would fail in some cases (and because D doesn't seem to love emitting compiler warnings in general), but if he likes it, then great. :)Note to Walter: You're obviously correct that you can make an arbitrarily complex program to make it too difficult for the compiler to cases). [ ... ]I think some here are mis-interpreting Walters position concerning static analysis from our earlier conversation, so I'll share my impression of his thoughts. I can't speak for Walter, of course, but I'm pretty sure that early on in our conversation he agreed that having the compiler catch local scope initialization issues was a good idea, or at least, wasn't a bad one (again, correct me if I'm wrong). I doubt he would be adverse to eventually having DMD perform this sort of static analysis to help developers, though I doubt it's a high priority for him.
Aug 14 2012
On 8/14/2012 3:31 AM, Mehrdad wrote:Then you get the best of both worlds: 1. You force the programmer to manually initialize the variable in most cases, forcing him to think about the default value. It's almost no trouble for 2. In the cases where it's not possible, the language helps the programmer catch bugs.As I've explained before, user defined types have "default constructors". If builtin types do not, then you've got a barrier to writing generic code. Default initialization also applies to static arrays, tuples, structs and dynamic allocation. It seems a large inconsistency to complain about them only for local variables of basic types, and not for any aggregate type or user defined type.As for the 'rarity' of the error I mentioned, yes, it is unusual. The trouble is when it creeps unexpectedly into otherwise working code that has been working for a long time.
Aug 14 2012
On Tuesday, 14 August 2012 at 21:13:01 UTC, Walter Bright wrote:On 8/14/2012 3:31 AM, Mehrdad wrote:Just because they _have_ a default constructor doesn't mean the compiler should implicitly _call_ them on your behalf.Then you get the best of both worlds: 1. You force the programmer to manually initialize the variable in most cases, forcing him to think about the default value. It's almost no trouble for 2. In the cases where it's not possible, the language helps the programmer catch bugs.As I've explained before, user defined types have "default constructors". If builtin types do not, then you've got a barrier to writing generic code.Huh? I think you completely misread my post... I was talking about "definite assignment", i.e. the _lack_ of automatic initialization.As for the 'rarity' of the error I mentioned, yes, it is unusual. The trouble is when it creeps unexpectedly into otherwise working code that has been working for a long time.It's no "trouble" in practice, that's what I'm trying to say. It only looks like "trouble" if you look at it from the C/C++
Aug 14 2012
On Tuesday, 14 August 2012 at 21:22:14 UTC, Mehrdad wrote:Typo, scratch Java, it's N/A for Java.
Aug 14 2012
On 8/14/2012 2:22 PM, Mehrdad wrote:I was talking about "definite assignment", i.e. the _lack_ of automatic initialization.I know. How does that fit in with default construction?
Aug 14 2012
On Tuesday, 14 August 2012 at 21:58:20 UTC, Walter Bright wrote:
> On 8/14/2012 2:22 PM, Mehrdad wrote:
>> I was talking about "definite assignment", i.e. the _lack_ of automatic initialization.
> I know. How does that fit in with default construction?

They aren't called unless the user calls them.

    void Bar<T>(T value) { }

    void Foo<T>() where T : new() // generic constraint for default constructor
    {
        T uninitialized;
        T initialized = new T();

        Bar(initialized);   // error
        Bar(uninitialized); // OK
    }

    void Test()
    {
        Foo<int>();
        Foo<Object>();
    }

D could take a similar approach.
Aug 14 2012
On Tuesday, 14 August 2012 at 22:57:26 UTC, Mehrdad wrote:Bar(initialized); // error Bar(uninitialized); // OKEr, other way around I mean...
Aug 14 2012
On 8/14/2012 3:57 PM, Mehrdad wrote:I guess they aren't really default constructors, then <g>. So what happens when you allocate an array of them?I know. How does that fit in with default construction?They aren't called unless the user calls them.D could take a similar approach.It could, but default construction is better (!).
Aug 14 2012
On Wednesday, 15 August 2012 at 00:32:43 UTC, Walter Bright wrote:
> On 8/14/2012 3:57 PM, Mehrdad wrote:
> I guess they aren't really default constructors, then <g>.

I say potayto, you say potahto... :P

> So what happens when you allocate an array of them?

For arrays, they're called automatically.

Well, OK, that's a bit of a simplification. It's what happens from the user perspective, not the compiler's (or runtime's). Here's the full story. And please read it carefully, since I'm __not__ saying D should do the same; this is just how C# handles it:

- You can define a custom default constructor for classes, but not structs.
- Structs _always_ have a zero-initializing default (no-parameter) constructor.
- Therefore, there is no such thing as "copy construction"; it's bitwise-copied.
- Ctors for _structs_ MUST initialize every field (or call the default ctor).
- Ctors for _classes_ don't have this restriction.
- Since initialization is "Cheap", the runtime _always_ does it, for _security_.
- The above^ is IRRELEVANT to the compiler!
  * It enforces initialization where it can.
  * It explicitly tells the runtime to auto-initialize when it can't.
  -- You can ONLY take the address of a variable in unsafe{} blocks.
  -- This implies you know what you're doing, so it's not a problem.

What D would do _ideally_, IMO:

1. Keep the ability to define default (no-args) and postblit constructors.
2. _Always_ force the programmer to initialize _all_ variables explicitly.
   * No, this is NOT what C++ does.
   * Yes, it is tested & DOES work well in practice. But NOT in the C++ mindset.
   * If the programmer _needs_ vars to be uninitialized, he can say = void.
   * If the programmer wants NaNs, he can just say = T.init. Bingo.

It should work pretty darn well, if you actually give it a try. (Don't believe me? Put it behind a compiler switch, and see how many people start using it, and how many of them [don't] complain about it!)

>> D could take a similar approach.
> It could, but default construction is better (!).

Well, that's so convincing, I'm left speechless!
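For reference, a small sketch of what those last two bullets already look like in today's D (a toy function, nothing more):

    import std.math : isNaN;

    void example()
    {
        double a = 0.0;          // a deliberate, meaningful value
        double b = double.init;  // explicitly asking for the NaN default
        double c = void;         // explicitly taking responsibility: no init at all

        c = a + 1.0;             // must be assigned before any read
        assert(c == 1.0);
        assert(b.isNaN);         // double.init is NaN, same as elsewhere in D
    }

    void main() { example(); }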
Aug 14 2012
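An illustrative aside, not from the post: the two escape hatches the proposal mentions already exist in D today, so a minimal sketch of what rule 2 would leave programmers writing looks like this (variable names are invented):

void sketch()
{
    double a = 0.0;          // explicit numeric starting value
    double b = double.init;  // explicitly ask for the usual NaN default
    double c = void;         // explicitly leave it uninitialized (programmer's responsibility)
    // Under the proposed rule, only a bare `double d;` would be rejected.
}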
On Friday, 10 August 2012 at 22:01:46 UTC, Walter Bright wrote:It catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one: float z; if (condition1) z = 5; ... lotsa code ... if (condition2) z++; To diagnose this correctly, the static analyzer would have to determine that condition1 produces the same result as condition2, or not. This is impossible to prove. So the static analyzer either gives up and lets it pass, or issues an incorrect diagnostic. So our intrepid programmer is forced to write: float z = 0; if (condition1) z = 5; ... lotsa code ... if (condition2) z++; Now, as it may turn out, for your algorithm the value "0" is an out-of-range, incorrect value. Not a problem as it is a dead assignment, right? But then the maintenance programmer comes along and changes condition1 so it is not always the same as condition2, and now the z++ sees the invalid "0" value sometimes, and a silent bug is introduced. This bug will not remain undetected with the default NaN initialization.variable is NOT set and then emits an error. It tries to prove that the variable IS set, and if it can't prove that, it's an error. It's not an incorrect diagnostic, it does exactly what it's supposed to do and the programmer has to be explicit when one takes on the responsibility of initialization. I don't see programmers I've talked to love it (I much prefer it too). Leaving a local variable initially uninitialized (or rather, not explicitly initialized) is a good way to portray the intention your program compiles, your variable is guaranteed to be initialized later but before use. This is a useful guarantee when reading/maintaining code. In D, on the other hand, it's possible to write D code like: for(size_t i; i < length; ++i) { ... } And I've actually seen this kind of code a lot in the wild. It boggles my mind that you think that this code should be legal. I think it's lazy - the intention is not clear. Is the default initializer being intentionally relied on, or was it unintentional? I've seen both cases. The for-loop example is an extreme one for demonstrative purposes, most examples are less obvious. Saying that most programmers will explicitly initialize floating point numbers to 0 instead of NaN when taking on initialization responsibility is a cop-out - float.init and float.nan are obviously the values you should be going for. The benefit is easy for programmers to understand, especially if they already understand why float.init is NaN. You say yelling at them probably won't help - why not? I personally use float.init/double.init etc. in my own code, and I'm sure other informed programmers do too. I can understand why people don't do it in, say, C, with NaN being less defined there afaik. D promotes NaN actively and programmers should be eager to leverage NaN explicitly too. non-local variables - they all have a defined default initializer that the local-variable analysis is limited to the scope of a single function body, it does not do inter-procedural analysis. I think this would be a great thing for D, and I believe that all code this change breaks is actually broken to begin with.
Aug 11 2012
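As a small illustration of the loop style criticized above (not code from the post), both forms compile and behave identically in D today; only the second states the intent:

void countTo(size_t length)
{
    for (size_t i; i < length; ++i) { }      // relies on size_t.init == 0
    for (size_t i = 0; i < length; ++i) { }  // same behaviour, intent made explicit
}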
On 8/11/2012 1:57 AM, Jakob Ovrum wrote:set and then emits an error. It tries to prove that the variable IS set, and if it can't prove that, it's an error. It's not an incorrect diagnostic, it does exactly what it's supposed to doOf course it is doing what the language requires, but it is an incorrect diagnostic because a dead assignment is required. And being a dead assignment, it can lead to errors when the code is later modified, as I explained. I also dislike on aesthetic grounds meaningless code being required.In D, on the other hand, it's possible to write D code like: for(size_t i; i < length; ++i) { ... } And I've actually seen this kind of code a lot in the wild. It boggles my mind that you think that this code should be legal. I think it's lazy - the intention is not clear. Is the default initializer being intentionally relied on, or was it unintentional? I've seen both cases. The for-loop example is an extreme one for demonstrative purposes, most examples are less obvious.That perhaps is your experience with other languages (that do not default initialize) showing. I don't think that default initialization is so awful. In fact, C++ enables one to specify default initialization for user defined types. Are you against that, too?Saying that most programmers will explicitly initialize floating point numbers to 0 instead of NaN when taking on initialization responsibility is a cop-out -You can certainly say it's a copout, but it's what I see them do. I've never seen them initialize to NaN, but I've seen the "just throw in a 0" many times.float.init and float.nan are obviously the values you should be going for. The benefit is easy for programmers to understand, especially if they already understand why float.init is NaN. You say yelling at them probably won't help - why not?Because experience shows that even the yellers tend to do the short, convenient one rather than the longer, correct one. Bruce Eckel wrote an article about this years ago in reference to why Java exception specifications were a failure and actually caused people to write bad code, including those who knew better.
Aug 11 2012
On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:On 8/11/2012 1:57 AM, Jakob Ovrum wrote: Because experience shows that even the yellers tend to do the short, convenient one rather than the longer, correct one. Bruce Eckel wrote an article about this years ago in reference to why Java exception specifications were a failure and actually caused people to write bad code, including those who knew better.I have to agree here. I spend my work time between JVM and .NET based languages, and checked exceptions are on my top 5 list of what went wrong with Java. You see lots of try { ... } catch (Exception e) { e.printStackTrace(); } in enterprise code. -- Paulo
Aug 11 2012
On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:Of course it is doing what the language requires, but it is an incorrect diagnostic because a dead assignment is required. And being a dead assignment, it can lead to errors when the code is later modified, as I explained. I also dislike on aesthetic grounds meaningless code being required.It is not meaningless, it's declarative. The same resulting code as now would be generated, but it's easier for the maintainer to understand what's being meant.That perhaps is your experience with other languages (that do not default initialize) showing. I don't think that default initialization is so awful. In fact, C++ enables one to specify default initialization for user defined types. Are you against that, too?No, because user-defined types can have explicitly initialized members. I do think that member fields relying on the default initializer are ambiguous and should be explicit, but flow analysis on aggregate members is not going to work in any current point. even though D is my personal favourite.You can certainly say it's a copout, but it's what I see them do. I've never seen them initialize to NaN, but I've seen the "just throw in a 0" many times.Again, I agree with this - except the examples are not from D, and certainly not from the future D that is being proposed. I don't blame anyone from steering away from NaN in other C-style languages. I do, however, believe that D programmers are perfectly capable of doing the right thing if informed. And let's face it - there's a lot that relies on education in D, like whether to receive a string parameter as const or immutable, and using scope on a subset of callback parameters. Both of these examples require more typing than the intuitive/straight-forward choice (always receive `string` and no `scope` on delegates), but informed D programmers still choose the more lengthy, correct version. Consider `pure` member functions - turns out most of them are actually pure because the implicit `this` parameter is allowed to be mutated and it's rare for a member function to mutate global state, yet we all strive to correctly decorate our methods `pure` when applicable.Because experience shows that even the yellers tend to do the short, convenient one rather than the longer, correct one. Bruce Eckel wrote an article about this years ago in reference to why Java exception specifications were a failure and actually caused people to write bad code, including those who knew better.I don't think the comparison is fair. Compared to Java exception specifications, the difference between '0' and 'float.nan'/'float.init' is negligible, especially in generic functions when the desired initializer would typically be 'T.init'. Java exception specifications have widespread implications for the entire codebase, while the difference between '0' and 'float.nan' is constant and entirely a local improvement.
Aug 11 2012
On 8/11/2012 7:30 AM, Jakob Ovrum wrote:On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:No, it is not easier to understand, because there's no way to determine if the intent is to: 1. initialize to a valid value -or- 2. initialize to get the compiler to stop complainingOf course it is doing what the language requires, but it is an incorrect diagnostic because a dead assignment is required. And being a dead assignment, it can lead to errors when the code is later modified, as I explained. I also dislike on aesthetic grounds meaningless code being required.It is not meaningless, it's declarative. The same resulting code as now would be generated, but it's easier for the maintainer to understand what's being meant.I do, however, believe that D programmers are perfectly capable of doing the right thing if informed.Of course they are capable of it. But experience shows they simply don't.Consider `pure` member functions - turns out most of them are actually pure because the implicit `this` parameter is allowed to be mutated and it's rare for a member function to mutate global state, yet we all strive to correctly decorate our methods `pure` when applicable.A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.Java exception specifications have widespread implications for the entire codebase, while the difference between '0' and 'float.nan' is constant and entirely a local improvement.I believe there's a lot more potential for success when you have a design where the easiest way is the correct way, and you've got to make some effort to do it wrong. Much of my attitude on that goes back to my experience at Boeing on designing things (yes, my boring Boeing anecdotes again), and Boeing's long experience with pilots and mechanics and what they actually do vs what they're trained to do. (And not only are these people professionals, not fools, but their lives depend on doing it right.) Over and over and over again, the easy way had better be the correct way. I could bore you even more with the aviation horror stories I heard that justified that attitude.
Aug 12 2012
On Sun, 12 Aug 2012 12:38:47 +0200, Walter Bright <newshound2 digitalmars.com> wrote:On 8/11/2012 7:30 AM, Jakob Ovrum wrote:I have thought that many times. The same with default non-null class references. I keep adding assert(someClass) everywhere.On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote: Consider `pure` member functions - turns out most of them are actually pure because the implicit `this` parameter is allowed to be mutated and it's rare for a member function to mutate global state, yet we all strive to correctly decorate our methods `pure` when applicable.A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.
Aug 12 2012
On 12.08.2012 12:38, Walter Bright wrote:On 8/11/2012 7:30 AM, Jakob Ovrum wrote:it's never too late - put it back on the list for D 3 - please (and local variables are immutable by default - or something like that)Consider `pure` member functions - turns out most of them are actually pure because the implicit `this` parameter is allowed to be mutated and it's rare for a member function to mutate global state, yet we all strive to correctly decorate our methods `pure` when applicable.A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.
Aug 12 2012
On Sunday, 12 August 2012 at 11:34:20 UTC, dennis luehring wrote:On 12.08.2012 12:38, Walter Bright wrote:Agreed. If it is only a signature change then it might have been possible to accept such a change; as I'm sure it would simplify quite a bit of signatures and only complicate a few. Probably the defaults to try and include are: pure, and safe (others I can't think of offhand). Make a list of all the issues/mistakes that can be done in D3 (be it ten or fifteen years from now); who knows, maybe the future is just around the corner if there's a big enough reason for it. The largest reason not to make big changes is so people don't get fed up and quit (especially while still trying to write library code); that, and this is supposed to be the 'stable' D2 language right now, with language changes having to be weighed heavily.A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.it's never too late - put it back on the list for D 3 - please (and local variables are immutable by default - or something like that)
Aug 12 2012
On Sunday, 12 August 2012 at 10:39:01 UTC, Walter Bright wrote:No, it is not easier to understand, because there's no way to determine if the intent is to: 1. initialize to a valid value -or- 2. initialize to get the compiler to stop complainingIf there is an explicit initializer, it means that the intent is either of those two. The latter case is probably quite rare, and might suggest a problem with the code - if the compiler can't prove your variable to be initialized, then the programmer probably has to spend some time figuring out the real answer. Legitimate cases of the compiler being too conservative can be annotated with a comment to eliminate the ambiguity. The interesting part is that you can be sure that variables *without* initializers are guaranteed to be initialized at a later point, or the program won't compile. Without the guarantee, the default value could be intended as a valid initializer or there could be a bug in the program. The current situation is not bad, I just think the one that allows for catching more errors at compile-time is much, much better.Of course they are capable of it. But experience shows they simply don't.If they do it for contagious attributes like const, immutable and pure, I'm sure they'll do it for a simple fix like using explicit 'float.nan' in the rare case the compiler can't prove initialization before use.A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.I agree, but on the flip side it was easier to port D1 code to D2 this way, and that might have saved D2 from even further alienation by some D1 users during its early stages. The most common complaints I remember from the IRC channel were complaints about const and immutable which was now forced on D programs to some degree due to string literals. This made some people really apprehensive about moving their code to D2, and I can imagine the fallout would be a lot worse if they had to annotate all their impure functions etc.I believe there's a lot more potential for success when you have a design where the easiest way is the correct way, and you've got to make some effort to do it wrong. Much of my attitude on that goes back to my experience at Boeing on designing things (yes, my boring Boeing anecdotes again), and Boeing's long experience with pilots and mechanics and what they actually do vs what they're trained to do. (And not only are these people professionals, not fools, but their lives depend on doing it right.) Over and over and over again, the easy way had better be the correct way. I could bore you even more with the aviation horror stories I heard that justified that attitude.Problem is, we've pointed out the easy way has issues and is not necessarily correct.
Aug 12 2012
On Sun, 12 Aug 2012 03:38:47 -0700, Walter Bright <newshound2 digitalmars.com> wrote:On 8/11/2012 7:30 AM, Jakob Ovrum wrote:As a pilot, I completely agree! -- Adam Wilson IRC: LightBender Project Coordinator The Horizon Project http://www.thehorizonproject.org/On Saturday, 11 August 2012 at 09:40:39 UTC, Walter Bright wrote:No, it is not easier to understand, because there's no way to determine if the intent is to: 1. initialize to a valid value -or- 2. initialize to get the compiler to stop complainingOf course it is doing what the language requires, but it is an incorrect diagnostic because a dead assignment is required. And being a dead assignment, it can lead to errors when the code is later modified, as I explained. I also dislike on aesthetic grounds meaningless code being required.It is not meaningless, it's declarative. The same resulting code as now would be generated, but it's easier for the maintainer to understand what's being meant.I do, however, believe that D programmers are perfectly capable of doing the right thing if informed.Of course they are capable of it. But experience shows they simply don't.Consider `pure` member functions - turns out most of them are actually pure because the implicit `this` parameter is allowed to be mutated and it's rare for a member function to mutate global state, yet we all strive to correctly decorate our methods `pure` when applicable.A better design would be to have pure be the default and impure would require annotation. The same for const/immutable. Unfortunately, it's too late for that now. My fault.Java exception specifications have widespread implications for the entire codebase, while the difference between '0' and 'float.nan' is constant and entirely a local improvement.I believe there's a lot more potential for success when you have a design where the easiest way is the correct way, and you've got to make some effort to do it wrong. Much of my attitude on that goes back to my experience at Boeing on designing things (yes, my boring Boeing anecdotes again), and Boeing's long experience with pilots and mechanics and what they actually do vs what they're trained to do. (And not only are these people professionals, not fools, but their lives depend on doing it right.) Over and over and over again, the easy way had better be the correct way. I could bore you even more with the aviation horror stories I heard that justified that attitude.
Aug 12 2012
On 08/10/2012 06:01 PM, Walter Bright wrote:On 8/10/2012 1:38 AM, F i L wrote:To address the concern of static analysis being too hard: I wish we could have it but limit the amount of static analysis that's done. Something like this: the compiler will test branches of if-else statements and switch-case statements, but it will not drop into function calls with ref parameters nor will it accept initialization in looping constructs (foreach, for, while, etc). A compiler is an incorrect implementation if it implements /too much/ static analysis. The example code you give can be implemented with such limited static analysis: void lotsaCode() { ... lotsa code ... } float z; if ( condition1 ) { z = 5; lotsaCode(); z++; } else { lotsaCode(); } I will, in advance, concede that this does not prevent people from just writing "float z = 0;". In my dream-world the compiler recognizes a set of common mistake-inducing patterns like the one you mentioned and then prints helpful error messages suggesting alternative design patterns. That way, bugs are prevented and users become better programmers.Walter Bright wrote:It catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one: float z; if (condition1) z = 5; ... lotsa code ... if (condition2) z++; To diagnose this correctly, the static analyzer would have to determine that condition1 produces the same result as condition2, or not. This is impossible to prove. So the static analyzer either gives up and lets it pass, or issues an incorrect diagnostic. So our intrepid programmer is forced to write: float z = 0; if (condition1) z = 5; ... lotsa code ... if (condition2) z++; Now, as it may turn out, for your algorithm the value "0" is an out-of-range, incorrect value. Not a problem as it is a dead assignment, right? But then the maintenance programmer comes along and changes condition1 so it is not always the same as condition2, and now the z++ sees the invalid "0" value sometimes, and a silent bug is introduced. This bug will not remain undetected with the default NaN initialization.3. Floating point values are default initialized to NaN.with just as much optimization/debugging benefit (arguably more so, because it catches NaN class Foo { float x; // defaults to 0.0f void bar() { float y; // doesn't default y ++; // ERROR: use of unassigned local float z = 0.0f; z ++; // OKAY } } This is the same behavior for any local variable,
Aug 11 2012
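For reference, a compilable restatement of the restructuring sketched in the post above (lotsaCode and condition1 are placeholders, not real code from the thread):

void lotsaCode() { /* ... lotsa code ... */ }

void example(bool condition1)
{
    float z;
    if (condition1)
    {
        z = 5;
        lotsaCode();
        z++;          // z is only touched on the branch that assigned it
    }
    else
    {
        lotsaCode();
    }
    // A checker limited to following if/else branches can verify this shape
    // without having to prove anything about condition1 itself.
}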
On Saturday, 11 August 2012 at 23:49:18 UTC, Chad J wrote:On 08/10/2012 06:01 PM, Walter Bright wrote:Let's keep in mind everyone of these truths: 1) Programmers are lazy; If you can get away with not initializing something then you'll avoid it. In C I've failed to initialized variables many times until a bug crops up and it's difficult to find sometimes, where a NaN or equiv would have quickly cropped them out before running with any real data. 2) There are a lot of inexperienced programmers. I worked for a company for a short period of time that did minimal training on a language like Java, where I ended up being seen as an utter genius (compared to even the teachers). 3) Bugs in a large environment and/or scenarios are far more difficult if not impossible to debug. I've made a program that handles merging of various dialogs (using double linked-like lists); I can debug them if they are 100 or less to work with, but after 100 (and often it's tens of thousands) it can become such a pain based on it's indirection and how the original structure was built that I refuse based on difficulty vs end results (Plus sanity). We also need to sometimes laugh at our mistakes, and learn from others. I'll recommend everyone read from rinkworks a bit if you have the time and refresh yourselves. http://www.rinkworks.com/stupid/cs_programming.shtmlIt catches only a subset of these at compile time. I can craft any number of ways of getting it to miss diagnosing it. Consider this one: float z; if (condition1) z = 5; ... lotsa code ... if (condition2) z++; To diagnose this correctly, the static analyzer would have to determine that condition1 produces the same result as condition2, or not. This is impossible to prove. So the static analyzer either gives up and lets it pass, or issues an incorrect diagnostic. So our intrepid programmer is forced to write: float z = 0; if (condition1) z = 5; ... lotsa code ... if (condition2) z++; Now, as it may turn out, for your algorithm the value "0" is an out-of-range, incorrect value. Not a problem as it is a dead assignment, right? But then the maintenance programmer comes along and changes condition1 so it is not always the same as condition2, and now the z++ sees the invalid "0" value sometimes, and a silent bug is introduced. This bug will not remain undetected with the default NaN initialization.
Aug 11 2012
F i L:Walter Bright wrote:3. Floating point values are default initialized to NaN.conveniently
An alternative possibility is to:
1) Default initialize variables just as currently done in D, with 0s, NaNs, etc;
2) Where the compiler is certain a variable is read before any possible initialization, it generates a compile-time error;
3) Warnings for unused variables and unused last assignments.
Where the compiler is not sure, not able to tell, or sees there is one or more paths where the variable is initialized, it gives no errors, and eventually the code will use the default initialized values, as currently done in D. The D compiler is already doing this a little, if you compile this with -O:

class Foo { void bar() {} }
void main() {
    Foo f;
    f.bar();
}

You get at compile-time:

temp.d(6): Error: null dereference in function _Dmain

A side effect of those rules is that this code doesn't compile, and similarly a lot of current D code:

class Foo {}
void main() {
    Foo f;
    assert(f is null);
}

Bye, bearophile
Aug 11 2012
On 8/11/2012 2:41 PM, bearophile wrote:2) Where the compiler is certain a variable is read before any possible initialization, it generates a compile-time error;This has been suggested repeatedly, but it is in utter conflict with the whole notion of default initialization, which nobody complains about for user-defined types.
Aug 11 2012
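To make the point about user-defined types concrete (an illustrative aside, not from the post): in D an aggregate's fields pick up their own .init values, so default initialization works the same way for structs as for built-ins.

struct Point { double x, y; }  // fields default to double.nan
struct Index { size_t i; }     // field defaults to 0

unittest
{
    import std.math : isNaN;
    Point p;   // no explicit construction needed; p is Point.init
    Index k;
    assert(p.x.isNaN && p.y.isNaN);
    assert(k.i == 0);
}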
On 8/11/12 7:33 PM, Walter Bright wrote: [snip] Allow me to insert an opinion here. This post illustrates quite well how opinionated our community is (for better or worse). The OP has asked a topical question in a matter that is interesting and also may influence the impact of the language to the larger community. Before long the thread has evolved into the familiar pattern of a debate over a minor issue on which reasonable people may disagree and that's unlikely to change. We should instead do our best to give a balanced high-level view of what D offers for econometrics. To the OP - here are a few aspects that may deserve interest: * Modeling power - from what I understand econometrics is modeling-heavy, which is more difficult to address in languages such as Fortran, C, C++, Java, Python, or the likes of Matlab. * Efficiency - D generates native code for floating point operations and has control over data layout and allocation. Speed of generated code is dependent on the compiler, and the reference compiler (dmd) does a poorer job at it than the gnu-based compiler (gdc) compiler. * Convenience - D is designed to "do what you mean" wherever possible and simplify common programming tasks, numeric or not. That makes the language comfortable to use even by a non-specialist, in particular in conjunction with appropriate libraries. A few minuses I can think of: - Maturity and availability of numeric and econometrics library is an obvious issue. There are some libraries (e.g. https://github.com/kyllingstad/scid/wiki) maintained and extended through volunteer effort. - The language's superior modeling power and level of control comes at an increase in complexity compared to languages such as e.g. Python. So the statistician would need a larger upfront investment in order to reap the associated benefits. Andrei
Aug 11 2012
Andrei Alexandrescu:- The language's superior modeling power and level of control comes at an increase in complexity compared to languages such as e.g. Python. So the statistician would need a larger upfront investment in order to reap the associated benefits.Statistician often use the R language (http://en.wikipedia.org/wiki/R_language ). Python contains much more "computer science" and CS complexity compared to R. Not just advanced stuff like coroutines, metaclasses, decorators, Abstract Base Classes, operator overloading, and so on, but even simpler things, like generators, standard library collections like heaps and deques, and so on. For some statisticians I've seen, even several parts of Python are too much hard to use or understand. I have rewritten several of their Python scripts. Bye, bearophile
Aug 11 2012
On Sunday, 12 August 2012 at 03:30:24 UTC, bearophile wrote:Andrei Alexandrescu:For people with more advanced CS/programming knowledge, though, this is an advantage of D. I find Matlab and R incredibly frustrating to use for anything but very standard matrix/statistics computations on data that's already structured the way I like it. This is mostly because the standard CS concepts you mention are at best awkward and at worst impossible to express and, being aware of them, I naturally want to take advantage of them. Using Matlab or R feels like being forced to program with half the tools in my toolbox either missing or awkwardly misshapen, so I avoid it whenever practical. (Actually, languages like C and Java that don't have much modeling power feel the same way to me now that I've primarily used D and to a lesser extent Python for the past few years. Ironically, these are the languages that are easy to integrate with R and Matlab respectively. Do most serious programmers who work in problem domains relevant to Matlab and R feel this way or is it just me?). This was my motivation for writing Dstats and mentoring Cristi's fork of SciD. D's modeling power is so outstanding that I was able to replace R and Matlab for a lot of use cases with plain old libraries written in D.- The language's superior modeling power and level of control comes at an increase in complexity compared to languages such as e.g. Python. So the statistician would need a larger upfront investment in order to reap the associated benefits.Statistician often use the R language (http://en.wikipedia.org/wiki/R_language ). Python contains much more "computer science" and CS complexity compared to R. Not just advanced stuff like coroutines, metaclasses, decorators, Abstract Base Classes, operator overloading, and so on, but even simpler things, like generators, standard library collections like heaps and deques, and so on. For some statisticians I've seen, even several parts of Python are too much hard to use or understand. I have rewritten several of their Python scripts. Bye, bearophile
Aug 12 2012
On Sunday, 12 August 2012 at 17:22:21 UTC, dsimcha wrote:... I find Matlab and R incredibly frustrating to use for anything but very standard matrix/statistics computations on data that's already structured the way I like it.This is exactly how I feel, and why I am turning to D. My data sets are huge (64 TB for just a few years of data) and my econometric methods computationally intensive and the limitations of Matlab and R are always almost instantly constraining.Using Matlab or R feels like being forced to program with half the tools in my toolbox either missing or awkwardly misshapen, so I avoid it whenever practical. Actually, languages like C and Java that don't have much modeling power feel the same way to me ...Very well put - it expresses my feeling precisely. And C++ is such a complicated beast that I feel caught in between. I'd been dreaming of a language that offers modeling power as well as efficiency.... Do most serious programmers who work in problem domains relevant to Matlab and R feel this way or is it just me?.I certainly feel the same. I only use them when I have to or for very simple prototyping.This was my motivation for writing Dstats and mentoring Cristi's fork of SciD. D's modeling power is so outstanding that I was able to replace R and Matlab for a lot of use cases with plain old libraries written in D.Thanks for your work on these packages! I will for sure be including them in my write up. I think they offer great possibilities for econometrics in D. TJB
Aug 12 2012
On 12/08/12 18:22, dsimcha wrote:For people with more advanced CS/programming knowledge, though, this is an advantage of D. I find Matlab and R incredibly frustrating to use for anything but very standard matrix/statistics computations on data that's already structured the way I like it. This is mostly because the standard CS concepts you mention are at best awkward and at worst impossible to express and, being aware of them, I naturally want to take advantage of them.The main use-case and advantage of both R and MATLAB/Octave seems to me to be the plotting functionality -- I've seen some exceptionally beautiful stuff done with R in particular, although I've not personally explored its capabilities too far. The annoyance of R in particular is the impenetrable thicket of dependencies that can arise among contributed packages; it feels very much like some are thrown over the wall and then built on without much concern for organization. :-(
Aug 12 2012
On Monday, 13 August 2012 at 01:52:28 UTC, Joseph Rushton Wakeling wrote:The main use-case and advantage of both R and MATLAB/Octave seems to me to be the plotting functionality -- I've seen some exceptionally beautiful stuff done with R in particular, although I've not personally explored its capabilities too far. The annoyance of R in particular is the impenetrable thicket of dependencies that can arise among contributed packages; it feels very much like some are thrown over the wall and then built on without much concern for organization. :-(I've addressed that, too :). https://github.com/dsimcha/Plot2kill Obviously this is a one-man project without nearly the same number of features that R and Matlab have, but like Dstats and SciD, it has probably the 20% of functionality that handles 80% of use cases. I've used it for the figures in scientific articles that I've submitted for publication and in my Ph.D. proposal and dissertation. Unlike SciD and Dstats, Plot2kill doesn't highlight D's modeling capabilities that much, but it does get the job done for simple 2D plots.
Aug 12 2012
On Sunday, 12 August 2012 at 02:28:44 UTC, Andrei Alexandrescu wrote:On 8/11/12 7:33 PM, Walter Bright wrote: [snip] Allow me to insert an opinion here. This post illustrates quite well how opinionated our community is (for better or worse). The OP has asked a topical question in a matter that is interesting and also may influence the impact of the language to the larger community. Before long the thread has evolved into the familiar pattern of a debate over a minor issue on which reasonable people may disagree and that's unlikely to change. We should instead do our best to give a balanced high-level view of what D offers for econometrics. To the OP - here are a few aspects that may deserve interest: * Modeling power - from what I understand econometrics is modeling-heavy, which is more difficult to address in languages such as Fortran, C, C++, Java, Python, or the likes of Matlab. * Efficiency - D generates native code for floating point operations and has control over data layout and allocation. Speed of generated code is dependent on the compiler, and the reference compiler (dmd) does a poorer job at it than the gnu-based compiler (gdc) compiler. * Convenience - D is designed to "do what you mean" wherever possible and simplify common programming tasks, numeric or not. That makes the language comfortable to use even by a non-specialist, in particular in conjunction with appropriate libraries. A few minuses I can think of: - Maturity and availability of numeric and econometrics library is an obvious issue. There are some libraries (e.g. https://github.com/kyllingstad/scid/wiki) maintained and extended through volunteer effort. - The language's superior modeling power and level of control comes at an increase in complexity compared to languages such as e.g. Python. So the statistician would need a larger upfront investment in order to reap the associated benefits. AndreiAndrei, Thanks for bringing this back to the original topic and for your thoughts. Indeed, a lot of econometricians are using MATLAB, R, Guass, Ox and the like. But there are a number of econometricians who need the raw power of a natively compiled language (especially financial econometricians whose data are huge) who typically program in either Fortran or C/C++. It is really this group that I am trying to reach. I think D has a lot to offer this group in terms of programmer productivity and reliability of code. I think this applies to statisticians as well, as I see a lot of them in this latter group too. I also want to reach the MATLABers because I think they can get a lot more modeling power (I like how you put that) without too much more difficulty (see Ox - nearly as complicated as C++ but without the power). Many MATLAB and R programmers end up recoding a good part of their algorithms in C++ and calling that code from the interpreted language. I have always found this kind of mixed language programming to be messy, time consuming, and error prone. Special tools are cropping up to handle this (see Rcpp). This just proves to me the usefulness of a productive AND powerful language like D for econometricians! I am sensitive to the drawbacks you mention (especially lack of numeric libraries). I am so sick of wasting my time in C++ though that I have almost decided to just start writing my own econometric library in D. Earlier in this thread there was a discussion of extended precision in D and I mentioned the need to recode things like BLAS and LAPACK in D. Templates in D seem perfect for this problem. 
As an expert in template meta-programming, what are your thoughts? How is this different from what is being done in SciD? It seems they are mostly concerned about wrapping the old CBLAS and CLAPACK libraries. Again, thanks for your thoughts and your TDPL book. Probably the best programming book I've ever read! TJB
Aug 11 2012
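A hedged sketch of the templatization idea raised above — illustration only, not SciD's API and nowhere near a tuned BLAS kernel: one template definition can stand in for the type-specific gemm entry points.

void gemm(T)(size_t m, size_t n, size_t k,
             T alpha, const(T)[] a, const(T)[] b,
             T beta, T[] c)
{
    // C = alpha*A*B + beta*C, row-major, no transposes, no blocking --
    // just enough to show one definition replacing sgemm/dgemm/etc.
    foreach (i; 0 .. m)
        foreach (j; 0 .. n)
        {
            T sum = 0;
            foreach (p; 0 .. k)
                sum += a[i*k + p] * b[p*n + j];
            c[i*n + j] = alpha*sum + beta*c[i*n + j];
        }
}

// gemm!float and gemm!double cover what sgemm and dgemm do; a complex
// instantiation would replace the literal 0 with a suitable zero value.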
On 8/12/12 12:52 AM, TJB wrote:Thanks for bringing this back to the original topic and for your thoughts. Indeed, a lot of econometricians are using MATLAB, R, Guass, Ox and the like. But there are a number of econometricians who need the raw power of a natively compiled language (especially financial econometricians whose data are huge) who typically program in either Fortran or C/C++. It is really this group that I am trying to reach. I think D has a lot to offer this group in terms of programmer productivity and reliability of code. I think this applies to statisticians as well, as I see a lot of them in this latter group too. I also want to reach the MATLABers because I think they can get a lot more modeling power (I like how you put that) without too much more difficulty (see Ox - nearly as complicated as C++ but without the power). Many MATLAB and R programmers end up recoding a good part of their algorithms in C++ and calling that code from the interpreted language. I have always found this kind of mixed language programming to be messy, time consuming, and error prone. Special tools are cropping up to handle this (see Rcpp). This just proves to me the usefulness of a productive AND powerful language like D for econometricians!I think this is a great angle. In our lab when I was a grad student in NLP/ML there was also a very annoying trend going on: people would start with Perl for text preprocessing and Matlab for math, and then, after the proof of concept, would need to recode most parts in C++. (I recall hearing complaints about large overheads in Matlab caused by eager copy semantics, is that true?)I am sensitive to the drawbacks you mention (especially lack of numeric libraries). I am so sick of wasting my time in C++ though that I have almost decided to just start writing my own econometric library in D. Earlier in this thread there was a discussion of extended precision in D and I mentioned the need to recode things like BLAS and LAPACK in D. Templates in D seem perfect for this problem. As an expert in template meta-programming what are your thoughts? How is this different than what is being done in SciD? It seems they are mostly concerned about wrapping the old CBLAS and CLAPACK libraries.There's a large body of experience and many optimizations accumulated in these libraries, which are worth exploiting. The remaining matter is offering a convenient shell. I think Cristi's work on SciD goes that direction. Andrei
Aug 12 2012
Andrei Alexandrescu:(I recall hearing complaints about large overheads in Matlab caused by eager copy semantics, is that true?)In Matlab there is COW: http://www.matlabtips.com/copy-on-write-in-subfunctions/ Bye, bearophile
Aug 12 2012
Andrei Alexandrescu wrote:* Efficiency - D generates native code for floating point operations and has control over data layout and allocation. Speed of generated code is dependent on the compiler, and the reference compiler (dmd) does a poorer job at it than the gnu-based compiler (gdc) compiler.I'd like to add to this. Right now I'm reworking some libraries to include SIMD support using DMD on Linux 64bit. A simple benchmark between DMD and GCC of 2 million SIMD vector addition/subtractions actually runs faster with my DMD D code than the GCC C code. Only by ~0.8 ms, and that could be due to a difference between D's std.datetime.StopWatch() and C's time.h/clock(), but it's consistently faster nonetheless, which is impressive. That said, it's also much easier to "accidentally" slow that figure down significantly in DMD, whereas GCC usually always optimizes very well. Also, and I'm not sure this isn't just me, but I ran a DMD vs ~88ms). Now a similar test compiled with DMD 2.060 runs at optimization improvements in the internal DMD compiler over the last few versions.
Aug 12 2012
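Not the poster's benchmark code — just a hedged sketch of the kind of loop being timed above, using core.simd's float4 (available with DMD on SSE-capable targets):

import core.simd;

float4 accumulate(float4 acc, float4 step, size_t iterations)
{
    foreach (_; 0 .. iterations)
        acc += step;   // one four-wide float addition per iteration
    return acc;
}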
On 8/12/2012 6:38 PM, F i L wrote:Also, and I'm not sure this isn't just me, but I ran a DMD (v2.057 T think) ms 2.060 optimization improvements in the internal DMD compiler over the last few version.There's a fair amount of low hanging optimization fruit that D makes possible that dmd does not take advantage of. I hope to get to this. One thing is I suspect that D can generate much better SIMD code than C/C++ can without compiler extensions. Another is that D allows values to be moved without needing a copyconstruct/destruct operation.
Aug 13 2012
On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:On 8/9/2012 10:40 AM, dsimcha wrote:How unique to D is this feature? Does this imply that things like BLAS and LAPACK, random number generators, statistical distribution functions, and other numerical software should be rewritten in pure D rather than calling out to external C or Fortran codes? TJBI'd emphasize the following:I'd like to add to that: 1. Proper support for 80 bit floating point types. Many compilers' libraries have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 80 bit floats reduce the incidence of creeping roundoff error.
Aug 10 2012
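A small, hedged illustration of the 80-bit point (editorial, not from the post): D exposes the extended type directly as `real`, and the extra precision is visible through the built-in type properties.

import std.stdio : writefln;

void main()
{
    // .dig is the number of decimal digits of precision the type carries.
    writefln("double: %s digits, epsilon = %s", double.dig, double.epsilon);
    writefln("real:   %s digits, epsilon = %s", real.dig, real.epsilon);
    // On x86, real is the 80-bit extended type, so it reports more digits.
}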
On 8/10/2012 8:31 AM, TJB wrote:On Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:I attended a talk given by a physicist a few months ago where he was using C transcendental functions. I pointed out to him that those functions were unreliable, producing wrong bits in a manner that suggested to me that they were internally truncating to double precision. He expressed astonishment and told me I must be mistaken. What can I say? I run across this repeatedly, and that's exactly why Phobos (with Don's help) has its own implementations, rather than simply calling the corresponding C ones. I encourage you to run your own tests, and draw your own conclusions.On 8/9/2012 10:40 AM, dsimcha wrote:How unique to D is this feature? Does this imply that things like BLAS and LAPACK, random number generators, statistical distribution functions, and other numerical software should be rewritten in pure D rather than calling out to external C or Fortran codes?I'd emphasize the following:I'd like to add to that: 1. Proper support for 80 bit floating point types. Many compilers' libraries have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 80 bit floats reduce the incidence of creeping roundoff error.
Aug 10 2012
On Friday, August 10, 2012 15:10:47 Walter Bright wrote:What can I say? I run across this repeatedly, and that's exactly why Phobos (with Don's help) has its own implementations, rather than simply calling the corresponding C ones.I think that it's pretty typical for programmers to think that something like a standard library function is essentially bug-free - especially for an older language like C. And unless you see results that are clearly wrong or someone else points out the problem, I don't know why you'd ever think that there was one. I certainly had no clue that C implementations had issues with floating point arithmetic before it was pointed out here. Regardless though, it's great that D gets it right. - Jonathan M Davis
Aug 10 2012
On Friday, 10 August 2012 at 22:11:23 UTC, Walter Bright wrote:On 8/10/2012 8:31 AM, TJB wrote:Hopefully this will help make the case that D is the best choice for numerical programmers. I want to do my part to convince economists. Another reason to implement BLAS and LAPACK in pure D is that the old routines like dgemm, cgemm, sgemm, and zgemm (all defined for different types) seem ripe for templatization. Almost thou convinceth me ... TJBOn Thursday, 9 August 2012 at 18:35:22 UTC, Walter Bright wrote:I attended a talk given by a physicist a few months ago where he was using C transcendental functions. I pointed out to him that those functions were unreliable, producing wrong bits in a manner that suggested to me that they were internally truncating to double precision. He expressed astonishment and told me I must be mistaken. What can I say? I run across this repeatedly, and that's exactly why Phobos (with Don's help) has its own implementations, rather than simply calling the corresponding C ones. I encourage you to run your own tests, and draw your own conclusions.On 8/9/2012 10:40 AM, dsimcha wrote:How unique to D is this feature? Does this imply that things like BLAS and LAPACK, random number generators, statistical distribution functions, and other numerical software should be rewritten in pure D rather than calling out to external C or Fortran codes?I'd emphasize the following:I'd like to add to that: 1. Proper support for 80 bit floating point types. Many compilers' libraries have inaccurate 80 bit math functions, or don't implement 80 bit floats at all. 80 bit floats reduce the incidence of creeping roundoff error.
Aug 10 2012
On Thu, 09 Aug 2012 17:57:27 +0200, TJB wrote:Hello D Users, The Software Editor for the Journal of Applied Econometrics has agreed to let me write a review of the D programming language for econometricians (econometrics is where economic theory and statistical analysis meet). I will have only about 6 pages. I have an idea of what I am going to write about, but I thought I would ask here what features are most relevant (in your minds) to numerical programmers writing codes for statistical inference. I look forward to your suggestions. Thanks, TJBLazy ranges are a lifesaver when dealing with big data. E.g. read a large csv file, use filter and map to clean and transform the data, collect stats as you go, then output to a destination file. The lazy nature of most of the ranges in Phobos means that you don't need to have the data in memory, but you can write simple imperative code just as if it was.
Aug 09 2012
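A hedged sketch of the pipeline described above (the file names and the cleaning step are invented for illustration): every row streams through lazily, so the data set never has to fit in memory.

import std.algorithm : filter, map;
import std.stdio : File;

void main()
{
    auto cleaned = File("clean.csv", "w");
    foreach (row; File("big.csv").byLine
                                 .filter!(line => line.length != 0)  // drop blank rows
                                 .map!(line => line.idup))           // copy out of byLine's reused buffer
    {
        cleaned.writeln(row);
        // running statistics could be updated here, one row at a time
    }
}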
On Thursday, 9 August 2012 at 18:20:08 UTC, Justin Whear wrote:On Thu, 09 Aug 2012 17:57:27 +0200, TJB wrote:Ah, the beauty of functional programming and streams.Hello D Users, The Software Editor for the Journal of Applied Econometrics has agreed to let me write a review of the D programming language for econometricians (econometrics is where economic theory and statistical analysis meet). I will have only about 6 pages. I have an idea of what I am going to write about, but I thought I would ask here what features are most relevant (in your minds) to numerical programmers writing codes for statistical inference. I look forward to your suggestions. Thanks, TJBLazy ranges are a lifesaver when dealing with big data. E.g. read a large csv file, use filter and map to clean and transform the data, collect stats as you go, then output to a destination file. The lazy nature of most of the ranges in Phobos means that you don't need to have the data in memory, but you can write simple imperative code just as if it was.
Aug 09 2012
1) I think compile-time function execution is a very big plus for people doing calculations. For example:

ulong fibonacci(ulong n) { .... }

static x = fibonacci(50); // calculated at compile time! runtime cost = 0 !!!

2) It has support for a BigInt structure in its standard library (which is really fast!)
Aug 10 2012
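For completeness, a compilable version of the sketch above (the fibonacci body, the checked value, and the BigInt example are mine, not the poster's); the enum forces the call to run during compilation:

ulong fibonacci(ulong n)
{
    ulong a = 0, b = 1;
    foreach (_; 0 .. n)
    {
        immutable t = a + b;
        a = b;
        b = t;
    }
    return a;
}

enum ulong x = fibonacci(50);       // evaluated at compile time; zero runtime cost
static assert(x == 12_586_269_025); // checked by the compiler, not at run time

unittest
{
    import std.bigint : BigInt;
    auto huge = BigInt("123456789012345678901234567890") + 1;  // well past ulong range
    assert(huge == BigInt("123456789012345678901234567891"));
}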