digitalmars.D - Notes from C++ static analysis

reply "bearophile" <bearophileHUGS lycos.com> writes:
An interesting blog post found through Reddit:

http://randomascii.wordpress.com/2013/06/24/two-years-and-thousands-of-bugs-of-/

The post is about the heavy use of static analysis on a lot of 
C++ code. They have a Python script that shows new warnings only 
the first time they appear in the code base. This is a simple but 
very useful form of memory, and it solves one of the most important 
downsides of warnings.

The article groups the bugs into several categories. Some of the 
D code below is derived from the article.

- - - - - - - - - - - - - - - - - -

Format strings:

The most common problems they find are errors in the format strings 
of printf-like functions (even though the code is C++):

The top type of bug that /analyze finds is format string errors 
– mismatches between printf-style format strings and the 
corresponding arguments. Sometimes there is a missing argument, 
sometimes there is an extra argument, and sometimes the 
arguments don’t match, such as printing a float, long or ‘long 
long’ with %d.

Such errors are less harmful in D, because writef("%d", x) is usable
for all kinds of integral values. On the other hand this D program
prints just "10" with no errors, ignoring the second x:

    import std.stdio;
    void main() {
        size_t x = 10;
        writefln("%d", x, x);
    }

In a modern statically typed language I'd like such code to give a
compile-time error.

This is how Rust gets this right:

    println(fmt!("hello, %d", j))

https://github.com/mozilla/rust/blob/master/src/libsyntax/ext/fmt.rs
https://github.com/Aatch/rust-fmt

In D it is possible to write a safe function that needs no extra
static analysis:

    ctWritefln!"%d"(x, x);

- - - - - - - - - - - - - - - - - -

Variable shadowing:

This is a much less common problem in D because this code gives
errors:

    void main() {
        bool result = true;
        if (true) {
            bool result = false;
        }
        foreach (i; 0 .. 10) {
            foreach (i; 0 .. 20) {
            }
        }
        for (int i = 0; i < 10; i++) {
            for (int i = 0; i < 20; i++) {
            }
        }
    }

    test.d(4): Error: is shadowing declaration test.main.result
    test.d(7): Error: is shadowing declaration test.main.i
    test.d(11): Error: is shadowing declaration test.main.i

There are some situations where this doesn't help, but they are not
common in idiomatic D code:

    void main() {
        int i, j;
        for (i = 0; i < 10; i++) {
            for (i = 0; i < 20; i++) {
            }
        }
    }

In D this is one case, similar to variable shadowing, that the
compiler doesn't help you with:

    class Foo {
        int x, y, z, w;
        this(in int x_, in int y_, in int z_, in int w_) {
            this.x = x_;
            this.y = y_;
            this.z = z;
            this.w = w_;
        }
    }
    void main() {
        auto f = new Foo(1, 2, 3, 4);
    }

I believe the compiler should give some warning there:
http://d.puremagic.com/issues/show_bug.cgi?id=3878

- - - - - - - - - - - - - - - - - -

Logic bugs:

    bool someFunction() { return true; }
    uint getFlags() { return uint.max; }
    void main() {
        uint kFlagValue = 2u ^^ 14;
        if (someFunction() || getFlags() | kFlagValue) {}
    }

The D compiler gives no warnings. From the article:

The code above is an expensive and misleading way to go "if ( 
true )". Visual Studio gave a clear warning that described the 
problem well:

warning C6316: Incorrect operator: tested expression is constant and
non-zero. Use bitwise-and to determine whether bits are set.

See: http://msdn.microsoft.com/en-us/library/f921xb29.aspx

A simpler example:

    enum INPUT_VALUE = 2;
    void f(uint flags) {
        if (flags | INPUT_VALUE) {}
    }

I have just added it to Bugzilla:
http://d.puremagic.com/issues/show_bug.cgi?id=10480

Another problem:

    void main() {
        bool bOnFire = true;
        float angle = 20.0f + bOnFire ? 5.0f : 10.0f;
    }

The D compiler gives no warnings. Visual Studio gave:

warning C6336: Arithmetic operator has precedence over question 
operator, use parentheses to clarify intent.

See: http://msdn.microsoft.com/en-us/library/ms182085.aspx

I opened an ER a long time ago, "Require parenthesization of ternary
operator when compounded":
http://d.puremagic.com/issues/show_bug.cgi?id=8757

- - - - - - - - - - - - - - - - - -

Signed, unsigned, and tautologies:

Currently this gives no warnings:

This code would have been fine if both a and b were signed – but 
one of them wasn’t, making this operation nonsensical.

    import std.algorithm: max;
    void main() {
        int a = -10;
        uint b = 5;
        auto result = max(0, a - b);
    }

We had quite a few places where we were checking to see if 
unsigned variables were less than zero -- now we have fewer.

This is a well-known problem. It has been an issue in Bugzilla for a
long time, and it seems there is no simple way to address it in D.

Bye,
bearophile
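
For readers who want to see the signed/unsigned point concretely, here is a minimal sketch. The helper names mixedDifference and clampedDifference are made up for illustration, and widening to long is just one possible workaround, not something proposed in the post.

```d
import std.algorithm : max;

/// Mixed int/uint subtraction as in the post: the int operand is
/// converted to uint, so the result wraps around instead of going negative.
uint mixedDifference(int a, uint b)
{
    return a - b;
}

/// Hypothetical workaround (an assumption, not from the post): widen both
/// operands to long before subtracting, then clamp at zero.
long clampedDifference(int a, uint b)
{
    return max(0L, cast(long) a - cast(long) b);
}

void main()
{
    // The type of int - uint is uint.
    static assert(is(typeof(int.init - uint.init) == uint));

    // -10 - 5 wraps to 2^32 - 15, so clamping with max(0, ...) changes nothing.
    assert(mixedDifference(-10, 5) == 4_294_967_281);
    assert(max(0, mixedDifference(-10, 5)) == 4_294_967_281);

    // With the widened arithmetic the clamp behaves as intended.
    assert(clampedDifference(-10, 5) == 0);
}
```

The widened version costs a 64-bit subtraction; whether that is acceptable depends on the code in question.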
Jun 26 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 26, 2013 at 08:08:08PM +0200, bearophile wrote:
 An interesting blog post found through Reddit:
 
 http://randomascii.wordpress.com/2013/06/24/two-years-and-thousands-of-bugs-of-/

 The most common problem they find are errors in the format string of
 printf-like functions (despite the code is C++):

None of my C++ code uses iostream. I still find stdio.h more comfortable to use, in spite of its many problems. One of the most annoying features of iostream is the abuse of operator<< and operator>> for I/O. Format strings are an ingenious idea sorely lacking in the iostream department (though admittedly the way it was implemented in stdio is rather unsafe, due to the inability of C to do many compile-time checks).
The top type of bug that /analyze finds is format string errors –
mismatches between printf-style format strings and the corresponding
arguments. Sometimes there is a missing argument, sometimes there is
an extra argument, and sometimes the arguments don’t match, such as
printing a float, long or ‘long long’ with %d.<

Such errors in D are less bad, because writef("%d",x) is usable for all kind of integral values.

Less bad? Actually, IME format strings in D are amazingly useful! You can pretty much use %s 99% of the time, because static type inference works so well in D! The only time I actually write anything other than %s is when I need to specify floating-point formatting options, like %precision, or scientific format vs. decimal, etc.. Then throw in the array formatters %(...%), and D format strings will totally blow C's stdio out of the water.
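
A small sketch of the point about %s and the %(...%) array formatter, assuming a reasonably current Phobos where format is importable from std.format. The values are arbitrary examples.

```d
import std.format : format;

void main()
{
    // %s adapts to whatever type the argument has.
    assert(format("%s", 42) == "42");
    assert(format("%s", "text") == "text");
    assert(format("%s", [1, 2, 3]) == "[1, 2, 3]");

    // An explicit specifier is only needed for things like precision.
    assert(format("%.2f", 3.14159) == "3.14");

    // %( ... %) applies the inner format to each element; the text after
    // the inner specifier (", " here) is used as the separator.
    assert(format("%(%s, %)", [1, 2, 3]) == "1, 2, 3");
}
```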
 On the other hand this D program prints
 just "10" with no errors, ignoring the second x:
 
 import std.stdio;
 void main() {
     size_t x = 10;
     writefln("%d", x, x);
 }
 
 In a modern statically typed language I'd like such code to give a
 compile-time error.

This looks like a bug to me. Please file one. :) [...]
 There are some situations where this doesn't help, but they are not
 common in idiomatic D code:
 
 void main() {
     int i, j;
     for (i = 0; i < 10; i++) {
         for (i = 0; i < 20; i++) {
         }
     }
 }

I don't think this particular error is compiler-catchable. Sometimes, you *want* the nested loop to reuse the same index (though probably not in exactly the formulation as above, most likely the inner loop will omit the i=0 part). The compiler can't find such errors unless it reads the programmer's mind.
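
To make the effect of the reused index visible, here is a short sketch. The function name outerIterations and the counter are added for illustration only.

```d
/// Runs the nested loops from the quoted example and returns how many
/// times the outer loop body executed.
int outerIterations()
{
    int count = 0;
    int i;
    for (i = 0; i < 10; i++)
    {
        // The inner loop reuses i and leaves it at 20 ...
        for (i = 0; i < 20; i++) {}
        count++;
        // ... so the outer i++ makes it 21 and the outer condition fails.
    }
    return count;
}

void main()
{
    // The outer loop was written to run ten times but runs only once.
    assert(outerIterations() == 1);
}
```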
 In D this is one case similar to variable shadowing, that the
 compiler doesn't help you with:
 
 class Foo {
     int x, y, z, w;
     this(in int x_, in int y_, in int z_, in int w_) {
         this.x = x_;
         this.y = y_;
         this.z = z;
         this.w = w_;
     }
 }

Yeah, this one bit me before. Really hard. I had code that looked like this:

    class C {
        int x;
        this(int x) {
            x = f(x); // ouch
        }
        int f(int x) { ... }
    }

This failed horribly, so I rewrote the //ouch line to:

    this.x = x;

But that is still very risky, since in a member function that doesn't shadow x, the above line is equivalent to this.x = this.x.

Anyway, in the end I decided that naming member function arguments after member variables is a Very Stupid Idea, and that it should never be done. It would be nice if the D compiler rejected such code.

[...]
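
A compact sketch of the pitfall described above. The class names Shadowed and Qualified and the helper twice are hypothetical stand-ins for the elided f in the post.

```d
/// Stand-in for the elided member function f (an assumption).
int twice(int v)
{
    return v * 2;
}

class Shadowed
{
    int x;
    this(int x)
    {
        x = twice(x); // assigns to the parameter; the member keeps its default 0
    }
}

class Qualified
{
    int x;
    this(int x)
    {
        this.x = twice(x); // explicitly targets the member
    }
}

void main()
{
    auto s = new Shadowed(21);
    auto q = new Qualified(21);
    assert(s.x == 0);   // the member was never written
    assert(q.x == 42);
}
```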
 Logic bugs:

 enum INPUT_VALUE = 2;
 void f(uint flags) {
     if (flags | INPUT_VALUE) {}
 }
 
 
 I have just added it to Bugzilla:
 http://d.puremagic.com/issues/show_bug.cgi?id=10480

Huh? Shouldn't that be (flags & ~INPUT_VALUE)? How would the compiler catch such cases in general, though? I mean, like in arbitrarily complex boolean expressions.

T

--
It said to install Windows 2000 or better, so I installed Linux instead.
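
For reference, a small sketch of why the Visual Studio warning calls the `|` test constant. The function names are made up, and the `&` version follows the wording of warning C6316 ("use bitwise-and to determine whether bits are set"), not any proposal made in this thread.

```d
enum uint INPUT_VALUE = 2;

/// The test from the quoted example: OR-ing in a non-zero constant
/// always yields a non-zero value, whatever flags contains.
bool testedWithOr(uint flags)
{
    return (flags | INPUT_VALUE) != 0;
}

/// The test the warning text suggests: true only when the bit is set.
bool testedWithAnd(uint flags)
{
    return (flags & INPUT_VALUE) != 0;
}

void main()
{
    foreach (uint flags; [0u, 1u, 2u, 3u])
        assert(testedWithOr(flags)); // always true

    assert(!testedWithAnd(0));
    assert(!testedWithAnd(1));
    assert(testedWithAnd(2));
    assert(testedWithAnd(3));
}
```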
Jun 26 2013
next sibling parent dennis luehring <dl.soluz gmx.net> writes:
On 26.06.2013 21:07, Adam D. Ruppe wrote:
 On Wednesday, 26 June 2013 at 18:54:17 UTC, H. S. Teoh wrote:
 import std.stdio;
 void main() {
     size_t x = 10;
     writefln("%d", x, x);
 }

 In a modern statically typed language I'd like such code to
 give a compile-time error.

This looks like a bug to me. Please file one. :)

Not necessarily, since you might want a format string to be a runtime variable, like when doing translations. I could live with there being another function that does runtime though.

Then you normally quote the % with %% or something else to deactivate it. That's much cleaner than allowing it out of the box just for this corner case.
Jun 26 2013
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
26-Jun-2013 22:52, H. S. Teoh wrote:
[snip]
 Then throw in the array formatters %(...%), and D format strings will
 totally blow C's stdio out of the water.

-- Dmitry Olshansky
Jun 26 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
27-Jun-2013 00:24, H. S. Teoh wrote:
 On Thu, Jun 27, 2013 at 12:17:24AM +0400, Dmitry Olshansky wrote:
 26-Jun-2013 22:52, H. S. Teoh wrote:
 [snip]
 Then throw in the array formatters %(...%), and D format strings will
 totally blow C's stdio out of the water.


 Right. Although, I don't think they're *infinitely* more powerful...
 maybe 50 times more,

What if I just dig up yet another, 51st range with element type T that works with writefln? :)

 but to be *infinitely* more powerful requires format strings
 to be Turing-complete. ;-)


 T

-- Dmitry Olshansky
Jun 26 2013
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On 26.06.2013 20:52, H. S. Teoh wrote:
 On Wed, Jun 26, 2013 at 08:08:08PM +0200, bearophile wrote:
 An interesting blog post found through Reddit:

 http://randomascii.wordpress.com/2013/06/24/two-years-and-thousands-of-bugs-of-/

 The most common problem they find are errors in the format string of
 printf-like functions (despite the code is C++):

None of my C++ code uses iostream. I still find stdio.h more comfortable to use, in spite of its many problems. One of the most annoying features of iostream is the abuse of operator<< and operator>> for I/O. Format strings are an ingenious idea sorely lacking in the iostream department (though admittedly the way it was implemented in stdio is rather unsafe, due to the inability of C to do many compile-time checks).

I have been an adept of iostreams since day one and never understood why people complain so much about them or the operator<< and operator>> for that matter. But I try to keep my C++ code clean from C'isms anyway.

--
Paulo
Jun 26 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 2:47 PM, Paulo Pinto wrote:
 I have been an adept of iostreams since day one and never understood why people
 complain so much about them or the operator<< and operator>>
 for that matter.

Even if you can get past the execrable look of it, it suffers from at least 3 terrible technical problems:

1. not thread safe
2. not exception safe
3. having to acquire/release mutexes for every << operation rather than once for the whole expression
Jun 26 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 3:56 PM, Walter Bright wrote:
 On 6/26/2013 2:47 PM, Paulo Pinto wrote:
 I have been an adept of iostreams since day one and never understood why people
 complain so much about them or the operator<< and operator>>
 for that matter.

 Even if you can get past the execrable look of it, it suffers from at least 3 terrible technical problems:

 1. not thread safe
 2. not exception safe
 3. having to acquire/release mutexes for every << operation rather than once for the whole expression

Oh, and the cake topper is IOStreams performs badly, too.
Jun 26 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06/27/2013 01:01 AM, Walter Bright wrote:
 On 6/26/2013 3:56 PM, Walter Bright wrote:
 On 6/26/2013 2:47 PM, Paulo Pinto wrote:
 I have been an adept of iostreams since day one and never understood
 why people
 complain so much about them or the operator<< and operator>>
 for that matter.

 Even if you can get past the execrable look of it, it suffers from at least 3 terrible technical problems:

 1. not thread safe
 2. not exception safe
 3. having to acquire/release mutexes for every << operation rather than once for the whole expression


If it's not thread safe, why does it have to acquire mutexes?
 Oh, and the cake topper is IOStreams performs badly, too.

Yes, but that's just a default.

    std::ios_base::sync_with_stdio(false);
    std::cin.tie(0);
Jun 26 2013
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 4:48 PM, Timon Gehr wrote:
 On 06/27/2013 01:01 AM, Walter Bright wrote:
 On 6/26/2013 3:56 PM, Walter Bright wrote:
 On 6/26/2013 2:47 PM, Paulo Pinto wrote:
 I have been an adept of iostreams since day one and never understood
 why people
 complain so much about them or the operator<< and operator>>
 for that matter.

 Even if you can get past the execrable look of it, it suffers from at least 3 terrible technical problems:

 1. not thread safe
 2. not exception safe
 3. having to acquire/release mutexes for every << operation rather than once for the whole expression


If it's not thread safe, why does it have to acquire mutexes?

It's not thread safe because global state can be set and reset for every << operation:

    a << b << setglobalstate << c << resetglobalstate << d;

 Oh, and the cake topper is IOStreams performs badly, too.

Yes, but that's just a default. std::ios_base::sync_with_stdio(false); std::cin.tie(0);

Yeah, to make it as fast as C stdio you use C stdio. That's a ringing endorsement!
Jun 26 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 4:48 PM, Timon Gehr wrote:
 On 06/27/2013 01:01 AM, Walter Bright wrote:
 Oh, and the cake topper is IOStreams performs badly, too.

Yes, but that's just a default. std::ios_base::sync_with_stdio(false); std::cin.tie(0);

That's the least of iostreams' efficiency problems. Andrei
Jun 26 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 2:47 PM, Paulo Pinto wrote:
 On 26.06.2013 20:52, H. S. Teoh wrote:
 On Wed, Jun 26, 2013 at 08:08:08PM +0200, bearophile wrote:
 An interesting blog post found through Reddit:

 http://randomascii.wordpress.com/2013/06/24/two-years-and-thousands-of-bugs-of-/

 The most common problem they find are errors in the format string of
 printf-like functions (despite the code is C++):

None of my C++ code uses iostream. I still find stdio.h more comfortable to use, in spite of its many problems. One of the most annoying features of iostream is the abuse of operator<< and operator>> for I/O. Format strings are an ingenious idea sorely lacking in the iostream department (though admittedly the way it was implemented in stdio is rather unsafe, due to the inability of C to do many compile-time checks).

I have been an adept of iostreams since day one and never understood why people complain so much about them or the operator<< and operator>> for that matter.

The problems with C++ iostreams are well-known and pernicious:

1. Extremely slow by design.
2. Force mixing representation with data by design.
3. Keep conversion state within, meaning they force very bizarre tricks even for simple things such as printing/scanning hex numbers.
4. Approach to exception safety has the wrong default.
5. Approach to internationalization (locales) has the most byzantine design I've ever seen. Even people who took part in the design can't figure it all out.

Andrei
Jun 26 2013
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 26 June 2013 at 18:08:10 UTC, bearophile wrote:
 In D this is one case similar to variable shadowing, that the 
 compiler doesn't help you with:
         this.z = z;

I'd argue that assigning something to itself is never useful.
Jun 26 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 12:03 PM, H. S. Teoh wrote:
 But yeah, that's bad practice and the compiler should warn about it. The
 reason it doesn't, though, IIRC is because of generic code, where it
 would suck to have to special-case when two template arguments actually
 alias the same thing.

It can also occur in machine-generated code, such as what mixin's do.
Jun 26 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 26, 2013 at 08:57:46PM +0200, Adam D. Ruppe wrote:
 On Wednesday, 26 June 2013 at 18:08:10 UTC, bearophile wrote:
In D this is one case similar to variable shadowing, that the
compiler doesn't help you with:
        this.z = z;

I'd argue that assigning something to itself is never useful.

Unless opAssign does something unusual.

But yeah, that's bad practice and the compiler should warn about it. The reason it doesn't, though, IIRC is because of generic code, where it would suck to have to special-case when two template arguments actually alias the same thing.

T

--
People say I'm arrogant, and so I am!!
Jun 26 2013
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 26 June 2013 at 18:54:17 UTC, H. S. Teoh wrote:
 import std.stdio;
 void main() {
     size_t x = 10;
     writefln("%d", x, x);
 }
 
 In a modern statically typed language I'd like such code to 
 give a compile-time error.

This looks like a bug to me. Please file one. :)

Not necessarily, since you might want a format string to be a runtime variable, like when doing translations. I could live with there being another function that does runtime though.

Things might be confusing too because of positional parameters (%1$d). You might offer something that isn't necessarily used:

    config.dateFormat = "%3$d/%2$d";
    writefln(config.dateFormat, year, month, day);

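
A minimal sketch of the positional-parameter case, using format so the result can be checked. The function name formatDate and the sample date are made up, and the format string is passed as a runtime value to mimic a configuration entry.

```d
import std.format : format;

/// Formats a date with a format string chosen at run time. The caller
/// always offers year, month and day; the format decides what is shown.
string formatDate(string dateFormat, int year, int month, int day)
{
    return format(dateFormat, year, month, day);
}

void main()
{
    // Day/month only: the year (argument 1) is offered but never printed.
    assert(formatDate("%3$d/%2$d", 2013, 6, 26) == "26/6");

    // A different configuration can use all three arguments.
    assert(formatDate("%1$d-%2$d-%3$d", 2013, 6, 26) == "2013-6-26");
}
```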
 Anyway, in the end I decided that naming member function 
 arguments after member variables is a Very Stupid Idea,

Blargh, I do it a lot. But I would be OK with a rule that, when a parameter has the same name as a member, the member on the left-hand side of an assignment must be written as this.x.
 How would the compiler catch such cases in general, though? I 
 mean, like in arbitrarily complex boolean expressions.

The Microsoft compiler warned about it, after constant folding, working out to if(1). I'm a little concerned that it would complain about some false positives though, which can be quite deliberate in D, like if(__ctfe).
Jun 26 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 11:08 AM, bearophile wrote:
 On the other hand this D program prints just
 "10" with no errors, ignoring the second x:

 import std.stdio;
 void main() {
 size_t x = 10;
 writefln("%d", x, x);
 }

 In a modern statically typed language I'd like such code to give a
 compile-time error.

Actually this is good because it allows to customize the format string to print only a subset of available information (I've actually used this).
 This is how how Rust gets this right:

 println(fmt!("hello, %d", j))

 https://github.com/mozilla/rust/blob/master/src/libsyntax/ext/fmt.rs
 https://github.com/Aatch/rust-fmt

This is potentially inefficient because it creates a string instead of formatting straight in the output buffer. Andrei
Jun 26 2013
next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
On 26.06.2013 21:33, Andrei Alexandrescu wrote:
 On 6/26/13 11:08 AM, bearophile wrote:
 On the other hand this D program prints just
 "10" with no errors, ignoring the second x:

 import std.stdio;
 void main() {
 size_t x = 10;
 writefln("%d", x, x);
 }

 In a modern statically typed language I'd like such code to give a
 compile-time error.

Actually this is good because it allows to customize the format string to print only a subset of available information (I've actually used this).

Why is there always a tiny need for such tricky stuff? Isn't that only useful in very rare cases?
Jun 26 2013
next sibling parent dennis luehring <dl.soluz gmx.net> writes:
On 26.06.2013 21:53, dennis luehring wrote:
 On 26.06.2013 21:33, Andrei Alexandrescu wrote:
 On 6/26/13 11:08 AM, bearophile wrote:
 On the other hand this D program prints just
 "10" with no errors, ignoring the second x:

 import std.stdio;
 void main() {
 size_t x = 10;
 writefln("%d", x, x);
 }

 In a modern statically typed language I'd like such code to give a
 compile-time error.

Actually this is good because it allows to customize the format string to print only a subset of available information (I've actually used this).

Why is there always a tiny need for such tricky stuff? Isn't that only useful in very rare cases?

Or better said: could someone then add a description to writefln explaining why writefln needs to be able to "handle" more values than the format string asks for, maybe with an example that really shows the usefulness of this feature, and why a simple enum + if/else can't handle this just as elegantly?
Jun 26 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 12:53 PM, dennis luehring wrote:
 On 26.06.2013 21:33, Andrei Alexandrescu wrote:
 On 6/26/13 11:08 AM, bearophile wrote:
 On the other hand this D program prints just
 "10" with no errors, ignoring the second x:

 import std.stdio;
 void main() {
 size_t x = 10;
 writefln("%d", x, x);
 }

 In a modern statically typed language I'd like such code to give a
 compile-time error.

Actually this is good because it allows to customize the format string to print only a subset of available information (I've actually used this).

Why is there always a tiny need for such tricky stuff? Isn't that only useful in very rare cases?

This is no tricky stuff, simply allows the user to better separate format from data. The call offers the data, the format string chooses what and how to show it. Obvious examples include logging lines with various levels of detail and internationalized/localized/customized messages that don't need to display all data under all circumstances. Checking printf for undefined behavior and mistakes affecting memory safety of the entire program is a noble endeavor, and kudos to the current generation of C and C++ compilers that warn upon misuse. Forcing D's writef to match exactly the format string against the number of arguments passed is a waste of flexibility and caters to programmers who can't bring themselves to unittest or even look at the program output - not even once. Our motivation is to help those out of such habits, not support them. Andrei
Jun 26 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 1:31 PM, Andrej Mitrovic wrote:
 On 6/26/13, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 Actually this is good because it allows to customize the format string
 to print only a subset of available information (I've actually used this).

Note that this works:

    writefln("%d", x, x);

But the following throws since v2.061:

    writeln(format("%d", x, x));

    std.format.FormatException
    C:\dmd-git\dmd2\windows\bin\..\..\src\phobos\std\string.d(2346):
    Orphan format arguments: args[1..2]

I find the latter to be quite useful for debugging code, and wanted this feature for a long time.

I think that's a bug in format that we need to fix. Andrei
Jun 26 2013
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 7:38 PM, Andrej Mitrovic wrote:
 On 6/27/13, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 I think that's a bug in format that we need to fix.

Absolutely. We must remove informative error messages and implement sloppy APIs in the standard library.

Apologies for the overly curt reply, which I have now canceled. OK let's do this: aside from sarcasm, do you have good arguments to line up to back up your opinion? Andrei
Jun 26 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 8:26 PM, Andrej Mitrovic wrote:
 The way I see it, write/writef is primarily used for debugging and
 benefits having some lax features, whereas format is used in more
 heavy-duty work where it's important not to screw things up at the
 call site.

Then I think all the more format should allow more arguments than format specifiers. That does help serious use.
 But the bottom line is I don't think we need to force anything on
 anybody. If anything, we could split up the internal format
 implementation and provide format and safeFormat functions.

 format("%s %s", 1);  // no exceptions

NO! This is exactly the kind of code that is buggy and useless. The right use cases involve more arguments than format specifiers. Andrei
Jun 26 2013
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 8:05 PM, Andrej Mitrovic wrote:
 If you are the type of programmer who often tests their own code, why
 are you passing more arguments than needed to format?

My point is they're needed. Andrei
Jun 26 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 10:35 PM, Peter Williams wrote:
 On 27/06/13 14:20, H. S. Teoh wrote:
 On Thu, Jun 27, 2013 at 01:56:31PM +1000, Peter Williams wrote:
 [...]
 While you're fixing it can you modify it so that the format string
 can specify the order in which the arguments are replaced? This is
 very important for i18n. I apologize if it can already do this but
 I was unable to find any documentation of format()'s format string
 other than examples with %s at the appropriate places.

You can use positional arguments for this purpose. For example:

    writefln("%2$s %1$s", "a", "b");

outputs "b a".

Yes, I eventually found the documentation in std.format while I expected it to be in std.string along with the documentation of the format() function. A reference to std.format in the documentation for format() (in std.string) would be nice.

The only point I'd negotiate would be to not throw with positional arguments, and throw with sequential arguments. All code that cares uses positional specifiers anyway. Andrei
Jun 26 2013
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 1:50 PM, bearophile wrote:
 Andrei Alexandrescu:

 Actually this is good because it allows to customize the format string
 to print only a subset of available information (I've actually used
 this).

Your use case is a special case that breaks a general rule.

There's no special case here.
 That
 behaviour is surprising, and it risks hiding some information silently.

Doesn't surprise me one bit.
 I
 think format() is more correct here.

I think it has a bug that diminishes its usefulness.
 If you want a special behaviour you
 should use a special function as partialWritefln that ignores arguments
 not present in the format string.

That behavior is not special. Andrei
Jun 26 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 8:36 PM, Jonathan M Davis wrote:
 On Wednesday, June 26, 2013 19:18:27 Andrei Alexandrescu wrote:
 On 6/26/13 1:50 PM, bearophile wrote:
 Andrei Alexandrescu:
 Actually this is good because it allows to customize the format string
 to print only a subset of available information (I've actually used
 this).

Your use case is a special case that breaks a general rule.

There's no special case here.

I have never heard anyone other than you even suggest this sort of behavior. Granted, I may just not talk with the right people, but that at least makes it sound like what you're suggesting is a very special case.

I'm not surprised. Not many people deal with things like localization or internationalization.
 That
 behaviour is surprising, and it risks hiding some information silently.

Doesn't surprise me one bit.

Well, it shocks most of us.

I'm also not moved by argumentum ad populum.
 We expect the number of arguments to a function to
 match the number of parameters, and with format strings, you're basically
 declaring what the parameters are, and then the other arguments to format or
 writefln are the arguments to the format string. Most of us don't even think
 about swapping out the format string at runtime.

That is the smoking gun. Andrei
Jun 26 2013
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 11:59 PM, Jonathan M Davis wrote:
 On Wednesday, June 26, 2013 23:47:15 Andrei Alexandrescu wrote:
 That
 behaviour is surprising, and it risks hiding some information silently.

Doesn't surprise me one bit.

Well, it shocks most of us.

I'm also not moved by argumentum ad populum.

ad populum obviously isn't enough. But if we make a design decision that favors 1% of our user base and causes problems for the other 99%

My point is it doesn't cause any problem except for the 0.0001% who refuse to even take ONE look at the output of the format. Andrei
Jun 27 2013
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/26/13 11:59 PM, Jonathan M Davis wrote:
 I'm just pointing out that ignoring what the majority
 thinks is not necessarily a good idea.

Of course. In this case it is. Andrei
Jun 27 2013
prev sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 06/27/13 13:16, Nicolas Sicard wrote:
 On Wednesday, 26 June 2013 at 20:50:03 UTC, bearophile wrote:
 If you want a special behaviour you should use a special function such as
partialWritefln that ignores arguments not present in the format string.

Or maybe just define a new format specifier (%z, for 'zap'?) to ignore one or more arguments?

%.0s

artur
Jun 27 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 26, 2013 at 09:07:30PM +0200, Adam D. Ruppe wrote:
 On Wednesday, 26 June 2013 at 18:54:17 UTC, H. S. Teoh wrote:
import std.stdio;
void main() {
    size_t x = 10;
    writefln("%d", x, x);
}

In a modern statically typed language I'd like such code to give
a compile-time error.

This looks like a bug to me. Please file one. :)

Not necessarily, since you might want a format string to be a runtime variable, like when doing translations. I could live with there being another function that does runtime though.

Wait, I thought we were talking about *compile-time* warnings for extraneous arguments to writefln. If the format string is not known at compile-time, then there's nothing to be done, and as you said, it's arguably better to allow more arguments than format specifiers if you're doing i18n.

But if the format string is known at compile-time, and there are extraneous arguments, then it should be a warning / error.

T

--
People tell me that I'm skeptical, but I don't believe it.
Jun 26 2013
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 26 June 2013 at 20:06:43 UTC, H. S. Teoh wrote:
 But if the format string is known at compile-time, and there are
 extraneous arguments, then it should be a warning / error.

We can't do that in D today, unless we do a writefln!"fmt"(args) in addition to writefln(fmt, args...);

tbh I kinda wish we could overload functions on literals though.
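
A rough sketch of what passing the format string as a template argument makes possible. The name ctWritefln is taken from the first post, but this implementation and the helper countSpecs are assumptions for illustration only. It merely counts specifiers and does not understand positional (%1$d) or compound (%(...%)) formats.

```d
import std.stdio : writefln;

/// Counts the '%' specifiers in fmt, treating "%%" as a literal percent
/// sign. Deliberately naive: positional and compound specifiers are not
/// handled.
size_t countSpecs(string fmt) pure nothrow @safe
{
    size_t n = 0;
    for (size_t i = 0; i < fmt.length; i++)
    {
        if (fmt[i] != '%')
            continue;
        if (i + 1 < fmt.length && fmt[i + 1] == '%')
            i++; // skip the escaped percent sign
        else
            n++;
    }
    return n;
}

/// Hypothetical checked wrapper: the format string is a compile-time
/// value, so a mismatch in argument count is rejected by the compiler.
void ctWritefln(string fmt, Args...)(Args args)
{
    static assert(countSpecs(fmt) == Args.length,
        "number of format specifiers does not match number of arguments");
    writefln(fmt, args);
}

void main()
{
    size_t x = 10;
    ctWritefln!"%d"(x); // accepted

    // The call from the first post, with one argument too many, is rejected.
    static assert(!__traits(compiles, ctWritefln!"%d"(x, x)));
    static assert(countSpecs("100%% of %d") == 1);
}
```

The runtime writefln(fmt, args) stays available for format strings that are only known at run time, such as translated messages.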
Jun 26 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jun 27, 2013 at 12:17:24AM +0400, Dmitry Olshansky wrote:
 26-Jun-2013 22:52, H. S. Teoh пишет:
 [snip]
Then throw in the array formatters %(...%), and D format strings will
totally blow C's stdio out of the water.


Right. Although, I don't think they're *infinitely* more powerful... maybe 50 times more, but to be *infinitely* more powerful requires format strings to be Turing-complete. ;-)

T

-- 
If you look at a thing nine hundred and ninety-nine times, you are perfectly safe; if you look at it the thousandth time, you are in frightful danger of seeing it for the first time. -- G. K. Chesterton
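For readers who have not met the `%(...%)` array formatters mentioned in the quote, here is a small sketch using `std.format.format`. The variable names are made up for illustration.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    auto values = [1, 2, 3];

    // %( opens the per-element format and %) closes it. The text after
    // the element specifier (", " here) is used as the separator and is
    // not emitted after the last element.
    string plain = format("%(%s, %)", values);
    assert(plain == "1, 2, 3");

    // Any element specifier works, e.g. zero-padded two-digit hex.
    string hex = format("[%(%02x:%)]", values);
    assert(hex == "[01:02:03]");

    writeln(plain);
    writeln(hex);
}
```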
Jun 26 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/26/13, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Actually this is good because it allows to customize the format string
 to print only a subset of available information (I've actually used this).

Note that this works:

    writefln("%d", x, x);

But the following throws since v2.061:

    writeln(format("%d", x, x));

    std.format.FormatException C:\dmd-git\dmd2\windows\bin\..\..\src\phobos\std\string.d(2346): Orphan format arguments: args[1..2]

I find the latter to be quite useful for debugging code, and wanted this feature for a long time.
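The difference can be reproduced with a short program. This is only a sketch: `formatRejectsExtraArg` is a made-up helper name, and whether `format` throws depends on the Phobos version in use, since the thread is debating exactly that behavior.

```d
import std.format : format, FormatException;
import std.stdio : writefln, writeln;

// Hypothetical helper: reports whether format() rejects an extra argument
// by throwing a FormatException.
bool formatRejectsExtraArg()
{
    size_t x = 10;
    try
    {
        auto s = format("%d", x, x);   // one specifier, two arguments
        return false;
    }
    catch (FormatException e)
    {
        writeln("format threw: ", e.msg);
        return true;
    }
}

void main()
{
    size_t x = 10;
    writefln("%d", x, x);              // the extra argument is ignored
    writeln("format rejects extra argument: ", formatRejectsExtraArg());
}
```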
Jun 26 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 Actually this is good because it allows to customize the format 
 string to print only a subset of available information (I've 
 actually used this).

Your use case is a special case that breaks a general rule. That behaviour is surprising, and it risks hiding some information silently. I think format() is more correct here.

If you want a special behaviour you should use a special function such as partialWritefln that ignores arguments not present in the format string.

Bye,
bearophile
Jun 26 2013
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 26 Jun 2013 22:13:29 +0200
schrieb "Adam D. Ruppe" <destructionator gmail.com>:

 On Wednesday, 26 June 2013 at 20:06:43 UTC, H. S. Teoh wrote:
 But if the format string is known at compile-time, and there are
 extraneous arguments, then it should be a warning / error.

We can't do that in D today, unless we do a writefln!"fmt"(args) in addition to writefln(fmt, args...); tbh I kinda wish we could overload functions on literals though.

So the compiler would eagerly turn arguments into compile-time parameters and offer some trait to check if a particular instantiation of writefln made 'fmt' a template argument?

    static if (__traits(ctKnown, fmt)) {
        // do static analysis of format specifiers
    } else {
        // regular code path
    }

-- 
Marco
Jun 26 2013
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 26 June 2013 at 20:51:54 UTC, Marco Leise wrote:
 So the compiler would eagerly turn arguments into compile-time
 parameters and offer some trait to check if a particular
 instantiation of writefln made 'fmt' a template argument ?

Yeah, something like that. Or making literals a different type that we can overload on (which would also be kinda cool for user defined replacements for them).

    writefln(T...)(__string_literal fmt, T t)

distinct from string fmt. The literals would always be available at compile time, whether the function is a template or not.
Jun 26 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 26, 2013 at 11:47:32PM +0200, Paulo Pinto wrote:
 Am 26.06.2013 20:52, schrieb H. S. Teoh:

None of my C++ code uses iostream. I still find stdio.h more
comfortable to use, in spite of its many problems. One of the most
annoying features of iostream is the abuse of operator<< and
operator>> for I/O. Format strings are an ingenious idea sorely
lacking in the iostream department (though admittedly the way it was
implemented in stdio is rather unsafe, due to the inability of C to
do many compile-time checks).

I have been an adept of iostreams since day one and never understood why people complain so much about them or the operator<< and operator>> for that matter.

They're ugly, that's why. :) And misleading to anyone familiar with bit operators. But that's beside the point. The main problem is that format strings are inadequately replaced by operator<< and its ilk; C++ tried to get around them by introducing the concept of manipulators and whatnot, but that only increased the ugliness of it all. Plus, it made a string of <<'s stateful, (if the previous line sends a manipulator to cout, then subsequent <<'s are subtly modified from their usual behaviour) making it harder to read. D's writefln is far superior to C's stdio and C++'s iostream, in any case.
 But I try to keep my C++ code clean from C'isms anyway.

I tried doing that once. It was a rather painful experience. That's the problem with C++: it started out as being C + classes, but then wanted really badly to assume its own identity, so it accumulated a whole bunch of other stuff, but then it never really cut its ties with C, and the old C + classes heritage lives on. As a result, its OO system leaves a lot to be desired when compared with, say, Java, but using it for just C + classes seems underwhelming when there's so much more to the language than just that. So you end up in this limbo where it's more than C + classes, but doesn't quite make it to the level of real OO like Java, and lots of hacks and ugly corner cases creep in to try to hold the tower of cards together. Trying to be free of C'isms only exposed the flaws of C++'s OO system even more. I found that writing C + classes is still the least painful way to use C++. Fortunately, there's D to turn to. ;-) T -- Too many people have open minds but closed eyes.
Jun 26 2013
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/26/2013 11:08 AM, bearophile wrote:
 A simpler example:

 enum INPUT_VALUE = 2;
 void f(uint flags) {
      if (flags | INPUT_VALUE) {}
 }


 I have just added it to Bugzilla:
 http://d.puremagic.com/issues/show_bug.cgi?id=10480

We've discussed this one before. I oppose it, on grounds I added to the enhancement request.
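To make the example in the quote concrete: OR-ing with a non-zero constant can never produce zero, so the condition is always true, while the test that was probably intended uses `&`. A small sketch, with made-up function names:

```d
enum uint INPUT_VALUE = 2;

// Mirrors the quoted code: (flags | 2) is non-zero for every value of
// flags, so this always returns true.
bool checkWithOr(uint flags)
{
    return (flags | INPUT_VALUE) != 0;
}

// Presumably the intended test: true only when the bit is actually set.
bool checkWithAnd(uint flags)
{
    return (flags & INPUT_VALUE) != 0;
}

void main()
{
    assert(checkWithOr(0));      // true even though no bit is set
    assert(!checkWithAnd(0));
    assert(checkWithAnd(2));
    assert(checkWithAnd(6));
    assert(!checkWithAnd(4));
}
```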
Jun 26 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, June 26, 2013 23:47:32 Paulo Pinto wrote:
 I have been an adept of iostreams since day one and never understood why
 people complain so much about them or the operator<< and operator>>
 for that matter.

I actually like them (particularly because I really don't like how primitive printf is with regards to types), but they fail completely when it comes to formatting, and you have to break up your strings too much. e.g.

    cout << "[" << value1 << ", " << value2 << "]" << endl;

instead of

    printf("[%d, %d]\n", value1, value2);

writefln fixes the typing problem that printf has in that it's not only type safe, but it allows you to just use %s everywhere (and you don't have to use c_str() on strings all over the place like with C++'s std::string), so I definitely think that what we have with D is far better than either iostreams or printf.

But with C++, which I use depends on my mood and the code, because both iostreams and printf suck. They just suck differently. But I've never actually minded that they used the shift operators for I/O beyond the fact that putting a bunch of values in the middle of a string with iostreams breaks things up too much (unlike with format strings).

- Jonathan M Davis
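A short illustration of the "%s everywhere" point, assuming nothing beyond `std.format` and `std.stdio`. The `describe` function and its parameter names are invented for the example.

```d
import std.format : format;
import std.stdio : writeln;

// %s picks the default formatting for whatever type is passed, so the
// format string does not have to repeat the argument types.
string describe(int value1, double value2, string label)
{
    return format("[%s, %s] %s", value1, value2, label);
}

void main()
{
    writeln(describe(1, 2.5, "ok"));
    assert(describe(1, 2.5, "ok") == "[1, 2.5] ok");
}
```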
Jun 26 2013
prev sibling next sibling parent Peter Williams <pwil3058 bigpond.net.au> writes:
On 27/06/13 10:33, Walter Bright wrote:
 On 6/26/2013 4:48 PM, Timon Gehr wrote:
 On 06/27/2013 01:01 AM, Walter Bright wrote:
 On 6/26/2013 3:56 PM, Walter Bright wrote:
 On 6/26/2013 2:47 PM, Paulo Pinto wrote:
 I have been an adept of iostreams since day one and never understood
 why people
 complain so much about them or the operator<< and operator>>
 for that matter.

Even if you can get past the execrable look of it, it suffers from at least 3 terrible technical problems:

1. not thread safe
2. not exception safe
3. having to acquire/release mutexes for every << operation rather than once for the whole expression


If it's not thread safe, why does it have to acquire mutexes?

It's not thread safe because global state can be set and reset for every << operation:

    a << b << setglobalstate << c << resetglobalstate << d;
 Oh, and the cake topper is IOStreams performs badly, too.

Yes, but that's just a default.

    std::ios_base::sync_with_stdio(false);
    std::cin.tie(0);

Yeah, to make it as fast as C stdio you use C stdio. That's a ringing endorsement!

This form of output usually causes problems with i18n as not all languages have the same types of grammar and sometimes the order of items needs to be changed to achieve a valid grammatical form in the translation.

Peter
Jun 26 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/27/13, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 I think that's a bug in format that we need to fix.

Absolutely. We must remove informative error messages and implement sloppy APIs in the standard library.
Jun 26 2013
prev sibling next sibling parent "Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Thursday, 27 June 2013 at 02:40:53 UTC, Andrei Alexandrescu 
wrote:
 You are wrong.

format has thrown exceptions with such code since v2.000 (that's the year 2007). It's only in v2.061 that it has finally gotten an informative error message in the exception it throws.
 Forcing D's writef to match exactly the format string against 
 the number of arguments passed is a waste of flexibility and 
 caters to programmers who can't bring themselves to unittest or 
 even look at the program output - not even once.

If you are the type of programmer who often tests their own code, why are you passing more arguments than needed to format?
Jun 26 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/27/13, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Apologies for the overly curt reply, which I have now canceled. OK let's
 do this: aside from sarcasm, do you have good arguments to line up to
 back up your opinion?

format has thrown exceptions with such code since v2.000 (that's the year 2007). It's only in v2.061 that it has finally gotten an informative error message in the exception it throws.

I'm not against having the current lax argument count handling feature for writef, but for format it's enabled me to catch bugs at the call site. The way I see it, write/writef is primarily used for debugging and benefits from having some lax features, whereas format is used in more heavy-duty work where it's important not to screw things up at the call site.

As for the unittesting argument (if the argument applies to format), in theory the argument is perfectly sound and I agree with you, but in the real world convenience trumps manual labor. We should unittest more, but it's super-convenient when format tells you you've screwed something up when you're e.g. writing some D-style shell scripts or when porting C code to D, maybe you don't have a lot of time to unittest constantly.

---

But the bottom line is I don't think we need to force anything on anybody. If anything, we could split up the internal format implementation and provide format and safeFormat functions.

    format("%s %s", 1);  // no exceptions
    safeFormat("%s %s", 1);  // exception thrown

All current code will de facto still work (because otherwise it would already get runtime exceptions), and all future code could start using safer formatting functions.

We could have that, or some getopt-style configuration options, such as:

    format("%s %s", std.format.config.safe, 1);

Or some other form of wizardry (perhaps a compile-time argument). Anything goes. Let's not break too much of a sweat arguing over what should ultimately be a customization point.
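One way the lax/strict split could be sketched on top of `std.format.formattedWrite`, which returns how many arguments it consumed. The names `laxFormat` and `strictFormat` are hypothetical stand-ins for the proposed pair, not Phobos functions, and the sketch covers the "more arguments than specifiers" case only.

```d
import std.array : appender;
import std.exception : enforce;
import std.format : formattedWrite;

// Hypothetical lax variant: arguments without a matching specifier are
// silently ignored.
string laxFormat(Args...)(string fmt, Args args)
{
    auto w = appender!string();
    formattedWrite(w, fmt, args);
    return w.data;
}

// Hypothetical strict variant: throws if some arguments were not consumed
// by the format string.
string strictFormat(Args...)(string fmt, Args args)
{
    auto w = appender!string();
    immutable used = formattedWrite(w, fmt, args);
    enforce(used == Args.length, "orphan format arguments");
    return w.data;
}

void main()
{
    assert(laxFormat("%s", 1, 2) == "1");       // second argument dropped
    assert(strictFormat("%s %s", 1, 2) == "1 2");

    bool threw = false;
    try
        strictFormat("%s", 1, 2);               // one argument left over
    catch (Exception e)
        threw = true;
    assert(threw);
}
```

Keeping both behind one shared implementation means the choice becomes a customization point at the call site rather than a property of `format` itself.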
Jun 26 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, June 26, 2013 19:18:27 Andrei Alexandrescu wrote:
 On 6/26/13 1:50 PM, bearophile wrote:
 Andrei Alexandrescu:
 Actually this is good because it allows to customize the format string
 to print only a subset of available information (I've actually used
 this).

Your use case is a special case that breaks a general rule.

There's no special case here.

I have never heard anyone other than you even suggest this sort of behavior. Granted, I may just not talk with the right people, but that at least makes it sound like what you're suggesting is a very special case.
 That
 behaviour is surprising, and it risks hiding some information silently.

Doesn't surprise me one bit.

Well, it shocks most of us. We expect the number of arguments to a function to match the number of parameters, and with format strings, you're basically declaring what the parameters are, and then the other arguments to format or writefln are the arguments to the format string. Most of us don't even think about swapping out the format string at runtime.
 I
 think format() is more correct here.

I think it has a bug that diminishes its usefulness.

It's a difference in design. It's only a bug if it's not what it was designed to do. And I think that it's clear that format was never designed to accept more arguments than format specifiers given that it's never worked that way. That doesn't necessarily mean that it _shouldn't_ work that way, but the only bug I see here is that the designs of writefln and format don't match. Which one is better designed is up for debate.
 If you want a special behaviour you
 should use a special function as partialWritefln that ignores arguments
 not present in the format string.

That behavior is not special.

Well, it's special enough that most of us seem to have never even thought of it, let alone thought that it was useful or a good idea.

I don't know whether it's really better to have format and writefln ignore extra arguments or not. My gut reaction is definitely that it's a very bad idea and will just lead to bugs. But clearly you have use cases for it and think that it's very useful. So, maybe it _is_ worth doing. But I'd be inclined to go with Bearophile's suggestion and make it so that a wrapper function or alternate implementation handled the ignoring of extra arguments. Then it would be clear in the code that that's what was intended, and we would get the default behavior that most of us expect.

An alternative would be a template argument to writeln and format which allowed you to choose which behavior you wanted and defaulted to not ignoring arguments.

- Jonathan M Davis
Jun 26 2013
prev sibling next sibling parent Peter Williams <pwil3058 bigpond.net.au> writes:
On 27/06/13 12:17, Andrei Alexandrescu wrote:
 On 6/26/13 1:31 PM, Andrej Mitrovic wrote:
 On 6/26/13, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 Actually this is good because it allows to customize the format string
 to print only a subset of available information (I've actually used
 this).

Note that this works:

    writefln("%d", x, x);

But the following throws since v2.061:

    writeln(format("%d", x, x));

    std.format.FormatException C:\dmd-git\dmd2\windows\bin\..\..\src\phobos\std\string.d(2346): Orphan format arguments: args[1..2]

I find the latter to be quite useful for debugging code, and wanted this feature for a long time.

I think that's a bug in format that we need to fix.

While you're fixing it can you modify it so that the format string can specify the order in which the arguments are replaced? This is very important for i18n. I apologize if it can already do this but I was unable to find any documentation of format()'s format string other than examples with %s at the appropriate places.

Peter
Jun 26 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jun 27, 2013 at 01:56:31PM +1000, Peter Williams wrote:
[...]
 While you're fixing it can you modify it so that the format string
 can specify the order in which the arguments are replaced?  This is
 very important for i18n.  I apologize if it can already do this but
 I was unable to find any documentation of format()'s format string
 other than examples with %s at the appropriate places.

You can use positional arguments for this purpose. For example:

    writefln("%2$s %1$s", "a", "b");

outputs "b a".

T

-- 
The easy way is the wrong way, and the hard way is the stupid way. Pick one.
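A slightly larger sketch of the same idea in an i18n setting: the format string is chosen at run time and each translation decides the argument order. The `translate` helper and both format strings are invented for illustration.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical lookup: returns a format string for the requested word
// order. A real program would load these from translation files.
string translate(bool reversedOrder)
{
    return reversedOrder ? "%2$s, %1$s" : "%1$s %2$s";
}

void main()
{
    // The same arguments are passed in the same order both times; only
    // the format string changes where each one appears.
    assert(format(translate(false), "Peter", "Williams") == "Peter Williams");
    assert(format(translate(true), "Peter", "Williams") == "Williams, Peter");

    writeln(format(translate(true), "Peter", "Williams"));
}
```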
Jun 26 2013
prev sibling next sibling parent Peter Williams <pwil3058 bigpond.net.au> writes:
On 27/06/13 14:20, H. S. Teoh wrote:
 On Thu, Jun 27, 2013 at 01:56:31PM +1000, Peter Williams wrote:
 [...]
 While you're fixing it can you modify it so that the format string
 can specify the order in which the arguments are replaced?  This is
 very important for i18n.  I apologize if it can already do this but
 I was unable to find any documentation of format()'s format string
 other than examples with %s at the appropriate places.

You can use positional arguments for this purpose. For example:

    writefln("%2$s %1$s", "a", "b");

outputs "b a".

Yes, I eventually found the documentation in std.format, while I expected it to be in std.string along with the documentation of the format() function. A reference to std.format in the documentation for format() (in std.string) would be nice.

Peter
Jun 26 2013
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, June 26, 2013 23:49:41 Andrei Alexandrescu wrote:
 The only point I'd negotiate would be to not throw with positional
 arguments, and throw with sequential arguments. All code that cares uses
 positional specifiers anyway.

That sounds like a good compromise. - Jonathan M Davis
Jun 26 2013
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, June 26, 2013 23:47:15 Andrei Alexandrescu wrote:
 That
 behaviour is surprising, and it risks hiding some information silently.

Doesn't surprise me one bit.

Well, it shocks most of us.

I'm also not moved by argumentum ad populum.

ad populum obviously isn't enough. But if we make a design decision that favors 1% of our user base and causes problems for the other 99%, then I think that we've made a big mistake. And while having most everyone disagree with one person does not make that person wrong, it _does_ make it more likely that they're wrong. So, while ad populum should not be the sole reason to make a decision, I think that it's generally a bad idea to ignore it. I'm not trying to say anything about this particular discussion and whether we should go with the majority on this; I'm just pointing out that ignoring what the majority thinks is not necessarily a good idea. - Jonathan M Davis
Jun 26 2013
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Thursday, 27 June 2013 at 02:25:54 UTC, Andrei Alexandrescu 
wrote:
 On 6/26/13 2:47 PM, Paulo Pinto wrote:
 Am 26.06.2013 20:52, schrieb H. S. Teoh:
 On Wed, Jun 26, 2013 at 08:08:08PM +0200, bearophile wrote:
 An interesting blog post found through Reddit:

 http://randomascii.wordpress.com/2013/06/24/two-years-and-thousands-of-bugs-of-/

 The most common problem they find are errors in the format 
 string of
 printf-like functions (even though the code is C++):

None of my C++ code uses iostream. I still find stdio.h more comfortable to use, in spite of its many problems. One of the most annoying features of iostream is the abuse of operator<< and operator>> for I/O. Format strings are an ingenious idea sorely lacking in the iostream department (though admittedly the way it was implemented in stdio is rather unsafe, due to the inability of C to do many compile-time checks).

I have been an adept of iostreams since day one and never understood why people complain so much about them or the operator<< and operator>> for that matter.

The problems with C++ iostreams are well-known and pernicious:

1. Extremely slow by design.
2. Force mixing representation with data by design.
3. Keep conversion state within, meaning they force very bizarre tricks even for simple things such as printing/scanning hex numbers.
4. Approach to exception safety has the wrong default.
5. Approach to internationalization (locales) has the most byzantine design I've ever seen. Even people who took part in the design can't figure it all out.

Andrei

I always liked their OO model, and for the type of applications we write performance never was a problem. My iostreams experience is mostly coupled to serialization of data structures and simple console applications.

Exception safety might be an issue; sadly, I was never able to write portable C++ code at work that used either RTTI or exceptions. Just too many issues, which always led to the architects forbidding their use.

Thanks for the explanation.

-- 
Paulo
Jun 27 2013
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 26 June 2013 at 22:56:41 UTC, Walter Bright wrote:
 On 6/26/2013 2:47 PM, Paulo Pinto wrote:
 I have been an adept of iostreams since day one and never 
 understood why people
 complain so much about them or the operator<< and operator>>
 for that matter.

Even if you can get past the execrable look of it, it suffers from at least 3 terrible technical problems:

1. not thread safe
2. not exception safe
3. having to acquire/release mutexes for every << operation rather than once for the whole expression

Thanks for listing those issues; they actually represent areas where I never used iostreams directly.

1. our iostream usage tends to be done from a central place
2. I am yet to write portable C++ code with exceptions turned on
3. wasn't aware of it
Jun 27 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 27 June 2013 at 06:59:49 UTC, Jonathan M Davis wrote:
 ad populum obviously isn't enough. But if we make a design 
 decision that
 favors 1% of our user base and causes problems for the other 
 99%, then I think
 that we've made a big mistake. And while having most everyone 
 disagree with
 one person does not make that person wrong, it _does_ make it 
 more likely that
 they're wrong. So, while ad populum should not be the sole 
 reason to make a
 decision, I think that it's generally a bad idea to ignore it. 
 I'm not trying
 to say anything about this particular discussion and whether we 
 should go with
 the majority on this; I'm just pointing out that ignoring what 
 the majority
 thinks is not necessarily a good idea.

I second Andrei on that one. It is useful in all cases where the format string is passed in as an argument, configuration, or whatever.
Jun 27 2013
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 26 June 2013 at 22:04:39 UTC, H. S. Teoh wrote:
 On Wed, Jun 26, 2013 at 11:47:32PM +0200, Paulo Pinto wrote:
 Am 26.06.2013 20:52, schrieb H. S. Teoh:

None of my C++ code uses iostream. I still find stdio.h more
comfortable to use, in spite of its many problems. One of the 
most
annoying features of iostream is the abuse of operator<< and
operator>> for I/O. Format strings are an ingenious idea 
sorely
lacking in the iostream department (though admittedly the way 
it was
implemented in stdio is rather unsafe, due to the inability 
of C to
do many compile-time checks).

I have been an adept of iostreams since day one and never understood why people complain so much about them or the operator<< and operator>> for that matter.

They're ugly, that's why. :) And misleading to anyone familiar with bit operators.

You just have to get used to the concept that operators are just abstract method names, like in abstract math. Operators are method calls, never think of them as anything else. This is no different from languages like Eiffel, Smalltalk and all the others that allow special characters as method names.
 But that's beside the point. The main problem is that format 
 strings are
 inadequately replaced by operator<< and its ilk; C++ tried to 
 get around
 them by introducing the concept of manipulators and whatnot, 
 but that
 only increased the ugliness of it all. Plus, it made a string 
 of <<'s
 stateful, (if the previous line sends a manipulator to cout, 
 then
 subsequent <<'s are subtly modified from their usual behaviour) 
 making
 it harder to read.

Yeah, the statefulness is a bit verbose, but I can live with it.
 D's writefln is far superior to C's stdio and C++'s iostream, 
 in any
 case.

Agreed.
 But I try to keep my C++ code clean from C'isms anyway.

I tried doing that once. It was a rather painful experience. That's the problem with C++: it started out as being C + classes, but then wanted really badly to assume its own identity, so it accumulated a whole bunch of other stuff, but then it never really cut its ties with C, and the old C + classes heritage lives on. As a result, its OO system leaves a lot to be desired when compared with, say, Java, but using it for just C + classes seems underwhelming when there's so much more to the language than just that. So you end up in this limbo where it's more than C + classes, but doesn't quite make it to the level of real OO like Java, and lots of hacks and ugly corner cases creep in to try to hold the tower of cards together. Trying to be free of C'isms only exposed the flaws of C++'s OO system even more. I found that writing C + classes is still the least painful way to use C++. Fortunately, there's D to turn to. ;-) T

When I got to use C for the first time, back in 1992, I already knew a few Basic and Pascal dialects. So C felt really primitive, with no added value over Object Pascal other than being more portable to systems I could not afford anyway.

The year thereafter I got my hands on Turbo C++, and then I found a world where I could have C's portability and, with a bit of effort, some of Pascal's safety back.

Since those days, I only touched C when required for university work and on my first job. Nowadays it is mostly JVM/.NET at work, with alternative languages in my free time, including D. :)

-- 
Paulo
Jun 27 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 27 June 2013 at 07:33:16 UTC, Paulo Pinto wrote:
 On Wednesday, 26 June 2013 at 22:56:41 UTC, Walter Bright wrote:
 On 6/26/2013 2:47 PM, Paulo Pinto wrote:
 I have been an adept of iostreams since day one and never 
 understood why people
 complain so much about them or the operator<< and operator>>
 for that matter.

Even if you can get past the execrable look of it, it suffers from at least 3 terrible technical problems:

1. not thread safe
2. not exception safe
3. having to acquire/release mutexes for every << operation rather than once for the whole expression

Thanks for listing those issues; they actually represent areas where I never used iostreams directly.

1. our iostream usage tends to be done from a central place
2. I am yet to write portable C++ code with exceptions turned on
3. wasn't aware of it

I *used* to like C++'s streams. Very recently, I did a project with heavy input/output (not in terms of speed, but in terms of complexity), and it was *horrible!*

Streams have over printf the ability to statically extract the type, which is a win. However, they don't have format strings. What they have is global state, where you say "now I'm printing in hex", "now I'm printing with '0' as filler", "my next print will have 6 width"!

Long story short, to print a 2-width '0'-padded hex, followed by a 4-width ' '-padded decimal, takes some 120 characters, completely obliterating what the original string was about. Trying to re-read the damn thing is near impossible.

If you are serious about your stream manipulation, it also means you should save the state of your stream before each write, and then restore it at the end of your write (or in your catch...)

IMO, the concept of manipulators and global state is plain retarded. It would have been better to have a format *object* that references the object to be printed, and then the stream handles printing the formatted object. E.g.:

    std::cout << "0x" << std::format(5).w(2).f('0').hex() << std::endl;

It is still a bit verbose, but much less so, and much less intrusive, and it keeps the formats tied to the object, rather than the stream.

Fun fact, this should actually be pretty easy to implement :D...
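For comparison, the case described above (a 2-width zero-padded hex value followed by a 4-width space-padded decimal) fits in one D format string, and the formatting state lives in each specifier rather than in the stream. A small sketch; `hexThenDecimal` is a made-up name.

```d
import std.format : format;
import std.stdio : writeln;

// Each specifier carries its own width, fill and base, so nothing has to
// be saved or restored around the call.
string hexThenDecimal(int a, int b)
{
    return format("0x%02x %4d", a, b);
}

void main()
{
    assert(hexThenDecimal(5, 42) == "0x05   42");

    // A later call is unaffected by the hex/padding used above.
    assert(format("%s", 255) == "255");

    writeln(hexThenDecimal(5, 42));
}
```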
Jun 27 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 27 June 2013 at 02:17:09 UTC, Andrei Alexandrescu 
wrote:
 On 6/26/13 1:31 PM, Andrej Mitrovic wrote:
 On 6/26/13, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>
  wrote:
 Actually this is good because it allows to customize the 
 format string
 to print only a subset of available information (I've 
 actually used this).

Note that this works:

    writefln("%d", x, x);

But the following throws since v2.061:

    writeln(format("%d", x, x));

    std.format.FormatException C:\dmd-git\dmd2\windows\bin\..\..\src\phobos\std\string.d(2346): Orphan format arguments: args[1..2]

I find the latter to be quite useful for debugging code, and wanted this feature for a long time.

I think that's a bug in format that we need to fix. Andrei

I wanted to react to this: I'm surprised that writef and format *could* even have different behaviors (!) Don't both just forward to some sort of "format(sink, fmt, args...)" function that handles all the common code?

Or are there other subtle differences, bugs, where writef would produce different output from format?
Jun 27 2013
prev sibling next sibling parent "renoX" <renozyx gmail.com> writes:
On Wednesday, 26 June 2013 at 18:08:10 UTC, bearophile wrote:
[cut]
 The most common problem they find are errors in the format 
 string of printf-like functions (even though the code is C++):

The top type of bug that /analyze finds is format string errors 
– mismatches between printf-style format strings and the 
corresponding arguments. Sometimes there is a missing argument, 
sometimes there is an extra argument, and sometimes the 
arguments don’t match, such as printing a float, long or ‘long 
long’ with %d.

Such errors in D are less bad, because writef("%d",x) is usable for all kinds of integral values. On the other hand this D program prints just "10" with no errors, ignoring the second x:

    import std.stdio;
    void main() {
        size_t x = 10;
        writefln("%d", x, x);
    }

In a modern statically typed language I'd like such code to give a compile-time error.

An even better thing would be a design that greatly reduces the probability of format string errors; see Scala:

    val name = "James"
    println(s"Hello, $name")

renoX
Jun 27 2013
prev sibling next sibling parent "Nicolas Sicard" <dransic gmail.com> writes:
On Wednesday, 26 June 2013 at 20:50:03 UTC, bearophile wrote:
 If you want a special behaviour you should use a special 
 function such as partialWritefln that ignores arguments not present 
 in the format string.

Or maybe just define a new format specifier (%z, for 'zap'?) to ignore one or more arguments?
Jun 27 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 Andrei Alexandrescu:
 The only point I'd negotiate would be to not throw with 
 positional arguments, and throw with sequential arguments.
 All code that cares uses positional specifiers anyway.

That sounds like a good compromise.

OK :-) Bye, bearophile
Jun 27 2013
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 27 June 2013 at 06:59:49 UTC, Jonathan M Davis wrote:
 But if we make a design decision that favors 1% of our userbase

I really think we all need to be more careful about these kinds of statements. I often see posts on the newsgroup where someone says "feature/function X is totally useless".... and it is something I actually use.

In this thread, there are, I think, three people who said the extra arguments are a good thing (myself, Andrei, and Peter). And there's what, maybe a dozen participants in the thread (I didn't count, I think it is less though)?

That's not a big enough sample to be statistically significant, but what are the odds that this thread is so skewed that only 1% of D's userbase feels this way, when 25% of the thread disagrees?
Jun 27 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/27/13, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 NO! This is exactly the kind of code that is buggy and useless. The
 right use cases involve more arguments than format specifiers.

I mistyped that, I meant:

    format("%s", 1, 2);  // no exceptions in future release
    safeFormat("%s", 1, 2);  // exception thrown
Jun 27 2013
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 27 June 2013 at 13:11:55 UTC, Andrej Mitrovic wrote:
 I mistyped that, I meant:

 format("%s", 1, 2);  // no exceptions in future release
 safeFormat("%s", 1, 2);  // exception thrown

I think if there's going to be a new function anyway, it might as well be more like the ctFormat bearophile mentioned, and check it at compile time.
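
A minimal sketch of what such a compile-time check could look like, assuming the format string is passed as a template argument. The names `countSpecs` and `ctFormat` are hypothetical, and the specifier counter is deliberately naive: it only understands sequential specifiers and the `%%` escape, not positional ones.

```d
import std.format : format;

// Hypothetical helper: counts '%' specifiers, treating "%%" as a literal percent.
size_t countSpecs(string fmt) pure @safe
{
    size_t n = 0;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (i + 1 < fmt.length && fmt[i + 1] == '%')
            ++i;    // skip the second character of the "%%" escape
        else
            ++n;    // an actual format specifier
    }
    return n;
}

// Hypothetical ctFormat: the argument count is verified during compilation.
string ctFormat(string fmt, Args...)(Args args)
{
    static assert(countSpecs(fmt) == Args.length,
        "number of format specifiers and arguments differ");
    return format(fmt, args);
}

unittest
{
    assert(ctFormat!"%s and %s"(1, 2) == "1 and 2");
    // One specifier, two arguments: rejected at compile time.
    static assert(!__traits(compiles, ctFormat!"%s"(1, 2)));
}
```

Since `countSpecs` runs through CTFE, a mismatch never reaches run time; the price is that the format string has to be known when the template is instantiated.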
Jun 27 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/27/13, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 On 6/27/13, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 NO! This is exactly the kind of code that is buggy and useless. The
 right use cases involve more arguments than format specifiers.

 I mistyped that, I meant:

 format("%s", 1, 2);  // no exceptions in future release
 safeFormat("%s", 1, 2);  // exception thrown

I'll add that the exception throwing was not introduced in v2.000 (my test-case was wrong), but in 2.062:

    std.format.FormatException C:\dmd-git\dmd2\windows\bin\..\..\src\phobos\std\string.d(2346): Orphan format arguments: args[1..2]

I'd like to keep this behavior in either a new separate format function or as a customization point of format.
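
To make the split concrete, here is a rough sketch of two separate functions built on `std.format.formattedWrite`, which returns how many arguments it consumed. `lenientFormat` and this `safeFormat` are hypothetical names for illustration, not existing Phobos functions, and the sketch only considers sequential specifiers.

```d
import std.array : appender;
import std.format : formattedWrite, FormatException;

// Hypothetical lenient variant: surplus arguments are silently ignored.
string lenientFormat(Args...)(string fmt, Args args)
{
    auto buf = appender!string();
    formattedWrite(buf, fmt, args);
    return buf.data;
}

// Hypothetical strict variant: surplus arguments raise a FormatException.
string safeFormat(Args...)(string fmt, Args args)
{
    auto buf = appender!string();
    immutable used = formattedWrite(buf, fmt, args);
    if (used != args.length)
        throw new FormatException("Orphan format arguments");
    return buf.data;
}

unittest
{
    import std.exception : assertThrown;

    assert(lenientFormat("%s", 1, 2) == "1");
    assert(safeFormat("%s %s", 1, 2) == "1 2");
    assertThrown!FormatException(safeFormat("%s", 1, 2));
}
```

Both variants still throw when there are fewer arguments than specifiers, because `formattedWrite` itself rejects that case.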
Jun 27 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/27/13, Adam D. Ruppe <destructionator gmail.com> wrote:
 I think if there's going to be a new function anyway, it might as
 well be more like the ctFormat bearophile mentioned, and check it
 at compile time.

Yeah but it's not always possible to know what the formatting string is. For example, maybe you have an enum array of format strings but a runtime index into this array which you pass to format at runtime. I've ported C samples before that used this style of formatting.
Jun 27 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 Yeah but it's not always possible to know what the formatting 
 string
 is. For example, maybe you have an enum array of format strings 
 but a
 runtime index into this array which you pass to format at 
 runtime.
 I've ported C samples before that used this style of formatting.

In some cases the format string is computed at run-time, so it can't be a template argument. A ctWritefln is for the other cases. Bye, bearophile
Jun 27 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 But the bottom line is I don't think we need to force anything 
 on anybody. If anything, we could split up the internal format
 implementation and provide format and safeFormat functions.

 format("%s %s", 1);  // no exceptions

 NO! This is exactly the kind of code that is buggy and useless. The right use cases involve more arguments than format specifiers.

Currently this code is accepted (and it prints "A B10"), but I think it should not be accepted (also why is it 1-based?):

    import std.stdio;
    void main() {
        writefln("A%2$s B%1$s", 10);
    }

 The only point I'd negotiate would be to not throw with 
 positional
 arguments, and throw with sequential arguments. All code that 
 cares uses
 positional specifiers anyway.

I have opened this ER: http://d.puremagic.com/issues/show_bug.cgi?id=10489 Bye, bearophile
Jun 27 2013
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 27 June 2013 at 19:22:08 UTC, bearophile wrote:
 (also why is it 1-based?):

It is specified that way in the Single Unix Specification for format strings. I'm not sure why they did it that way, but if we changed it, that would be surprising since the format string is otherwise similar to printf, but this would be different.
Jun 27 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, June 27, 2013 13:47:53 Adam D. Ruppe wrote:
 On Thursday, 27 June 2013 at 06:59:49 UTC, Jonathan M Davis wrote:
 But if we make a design decision that favors 1% of our userbase

I really think we all need to be more careful about these kinds of statements. I often see posts on the newsgroup where someone says "feature/function X is totally useless".... and it is something I actually use. In this thread, there's I think three people who said the extra arguments are a good thing (myself, Andrei, and Peter). And there's what, maybe a dozen participants in the thread (I didn't count, I think it is less though)? That's not a big enough sample to be statistically significant, but what are the odds that this thread is so skewed that only 1% of D's userbase feels this way, when 25% of the thread disagrees?

I wasn't arguing that only 1% of the users care about this particular feature. What I was objecting to was that Andrei seemed to think that argumentum ad populum was an invalid argument, and when you're talking about an API and the userbase for that API, I really don't think that argumentum ad populum is invalid. If you make a design decision that causes problems for 99% of your users, then it's a bad design decision, and I think that the fact that the majority of the users would then be against it should hold weight.

For this particular feature, I don't know how many people want format to ignore extra arguments. Certainly, prior to this thread, I'd heard Andrei discuss it once, and I've never heard anyone else even mention it. And initially, everyone else in this thread thought that it was a bad idea. So, it at least looked like the majority thought that it was a bad idea. And if that held true, then I think that that would at least be an argument for making format require that the number of arguments match the number of format specifiers. It wouldn't necessarily be enough in and of itself (particularly if the use case for allowing more is valid and doesn't really harm the people who don't use it), but I think that it would still be a valid argument.

So, I was objecting to Andrei's assertion that what the majority thought was not a valid argument rather than trying to specifically assert that we definitely shouldn't do this because of how many people were against it. What we do ultimately depends on what all of the arguments are and what all of the various pros and cons are.

But your point is well taken about 1% vs 99% and whatnot. We certainly don't want to decide that only 1% of users care about something based on the half-a-dozen or so people who happen to have posted in a discussion on it over the course of a few hours.

- Jonathan M Davis
Jun 27 2013
prev sibling next sibling parent Peter Williams <pwil3058 bigpond.net.au> writes:
On 27/06/13 23:33, bearophile wrote:
 Andrej Mitrovic:

 Yeah but it's not always possible to know what the formatting string
 is. For example, maybe you have an enum array of format strings but a
 runtime index into this array which you pass to format at runtime.
 I've ported C samples before that used this style of formatting.

In some cases the format string is computed at run-time, so it can't be a template argument. A ctWritefln is for the other cases.

In internationalized/localized code the string is highly likely to be determined at run time. Peter
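
A small sketch of that situation, assuming a hypothetical `pattern` lookup that stands in for a real translation catalog. The format string is only known at run time, and the 1-based positional specifiers let each translation reorder the arguments:

```d
import std.format : format;

// Hypothetical catalog lookup: picks a format string by locale at run time.
string pattern(string locale)
{
    switch (locale)
    {
        case "de": return "%2$s hat %1$s neue Nachrichten";
        default:   return "%1$s new messages for %2$s";
    }
}

// The format string is a run-time value, so it cannot be a template argument.
string notice(string locale, int count, string user)
{
    return format(pattern(locale), count, user);
}

unittest
{
    assert(notice("en", 3, "Ann") == "3 new messages for Ann");
    assert(notice("de", 3, "Ann") == "Ann hat 3 neue Nachrichten");
}
```

Any checking of such strings has to happen at run time (or in a separate tool that validates the catalog), which is where the choice between throwing and ignoring surplus arguments matters.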
Jun 27 2013
prev sibling next sibling parent Peter Williams <pwil3058 bigpond.net.au> writes:
On 28/06/13 05:52, Jonathan M Davis wrote:
 On Thursday, June 27, 2013 13:47:53 Adam D. Ruppe wrote:
 On Thursday, 27 June 2013 at 06:59:49 UTC, Jonathan M Davis wrote:
 But if we make a design decision that favors 1% of our userbase

I really think we all need to be more careful about these kinds of statements. I often see posts on the newsgroup where someone says "feature/function X is totally useless".... and it is something I actually use. In this thread, there's I think three people who said the extra arguments are a good thing (myself, Andrei, and Peter). And there's what, maybe a dozen participants in the thread (I didn't count, I think it is less though)? That's not a big enough sample to be statistically significant, but what are the odds that this thread is so skewed that only 1% of D's userbase feels this way, when 25% of the thread disagrees?

I wasn't arguing that only 1% of the users care about this particular feature. What I was objecting to was that Andrei seemed to think that argumentum ad populum was an invalid argument,

Plato would agree with Andrei. Peter
Jun 27 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, June 28, 2013 10:44:36 Peter Williams wrote:
 On 28/06/13 05:52, Jonathan M Davis wrote:
 On Thursday, June 27, 2013 13:47:53 Adam D. Ruppe wrote:
 On Thursday, 27 June 2013 at 06:59:49 UTC, Jonathan M Davis wrote:
 But if we make a design decision that favors 1% of our userbase

I really think we all need to be more careful about these kinds of statements. I often see posts on the newsgroup where someone says "feature/function X is totally useless".... and it is something I actually use. In this thread, there's I think three people who said the extra arguments are a good thing (myself, Andrei, and Peter). And there's what, maybe a dozen participants in the thread (I didn't count, I think it is less though)? That's not a big enough sample to be statistically significant, but what are the odds that this thread is so skewed that only 1% of D's userbase feels this way, when 25% of the thread disagrees?

I wasn't arguing that only 1% of the users care about this particular feature. What I was objecting to was that Andrei seemed to think that argumentum ad populum was an invalid argument,

Plato would agree with Andrei.

It's definitely true that just because a lot of people think something does not make it true (e.g. having the majority of people think that the sun goes around the earth does not make it so). But when you're debating an API, you're debating what a lot of people are going to be using, and if the majority of them don't think that it's user-friendly or otherwise well-designed, then I really don't think that it makes sense to say that the fact that most of the users think that doesn't mean anything or that it's not relevant.

I think that majority opinion is _very_ relevant when discussing APIs or any type of user interface. It may be the case that they're wrong and that after using a new API or user interface, they'll eventually come to the conclusion that they're wrong, but their opinion is _very_ relevant IMHO.

- Jonathan M Davis
Jun 27 2013
prev sibling next sibling parent reply Peter Williams <pwil3058 bigpond.net.au> writes:
On 28/06/13 11:47, Jonathan M Davis wrote:
 On Friday, June 28, 2013 10:44:36 Peter Williams wrote:
 On 28/06/13 05:52, Jonathan M Davis wrote:
 On Thursday, June 27, 2013 13:47:53 Adam D. Ruppe wrote:
 On Thursday, 27 June 2013 at 06:59:49 UTC, Jonathan M Davis wrote:
 But if we make a design decision that favors 1% of our userbase

I really think we all need to be more careful about these kinds of statements. I often see posts on the newsgroup where someone says "feature/function X is totally useless".... and it is something I actually use. In this thread, there's I think three people who said the extra arguments are a good thing (myself, Andrei, and Peter). And there's what, maybe a dozen participants in the thread (I didn't count, I think it is less though)? That's not a big enough sample to be statistically significant, but what are the odds that this thread is so skewed that only 1% of D's userbase feels this way, when 25% of the thread disagrees?

I wasn't arguing that only 1% of the users care about this particular feature. What I was objecting to was that Andrei seemed to think that argumentum ad populum was an invalid argument,

Plato would agree with Andrei.

It's definitely true that just because a lot of people think something does not make it true (e.g. having the majority of people think that the sun goes around the earth does not make it so). But when you're debating an API, you're debating what a lot of people are going to be using, and if the majority of them don't think that it's user-friendly or otherwise well-designed, then I really don't think that it makes sense to say that the fact that most of the users think that doesn't mean anything or that it's not relevant. I think that majority opinion is _very_ relevant when discussing APIs or any type of user interface. It may be the case that they're wrong and that after using a new API or user interface, they'll eventually come to the conclusion that they're wrong, but their opinion is _very_ relevant IMHO.

Yes, but voting is very seldom the best way to decide a technical issue. You want the best technical solution, not the one supported by the best lobbyists.

Peter
Jun 27 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/27/2013 8:22 PM, Peter Williams wrote:
 Yes, but voting is very seldom the best way to decide a technical issue.  You
 want the best technical solution not the one supported by the best lobbyists.

A sound technical argument can trump votes. There are enough cases of votes overriding technical arguments to everyone's later regret, like exported templates in C++ :-)
Jun 28 2013
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, June 28, 2013 00:48:20 Walter Bright wrote:
 On 6/27/2013 8:22 PM, Peter Williams wrote:
 Yes, but voting is very seldom the best way to decide a technical issue. 
 You want the best technical solution not the one supported by the best
 lobbyists.

 A sound technical argument can trump votes. There are enough cases of votes overriding technical arguments to everyone's later regret, like exported templates in C++ :-)

Agreed. I just disagree with the idea that what the majority thinks is irrelevant. It _is_ relevant, but it's just one of the things to consider. The place where it's likely to matter most is when you have multiple choices which are all more or less equal. Where it's likely to matter the least is when you have strong technical arguments against the majority opinion, and the majority opinion does not have similarly strong arguments in its favor. - Jonathan M Davis
Jun 28 2013