digitalmars.D - Google Code Jam 2011 Language Usage

reply Peter Alexander <peter.alexander.au gmail.com> writes:
The Google Code Jam is a programming competition where you have to solve 
algorithmic problems using whatever programming language you like.

The stats of what programming languages were used in the first round 
were collected:

http://www.go-hero.net/jam/11/languages

Some select figures for languages used to solve the first question:

C++	5032
Java	2321

C	532
Haskell	100
Clojure	13
GO	13
D	5
Scheme	5

(In previous 3 years, D had between 2-4 entries for the first question, 
so not much change, despite total contestant counts increasing quite 
dramatically)

Generally, I believe people tend to use the language they are most 
familiar with, and for people that know more than one language they will 
choose the one that is most expressive. Stability of implementations 
could also be an issue.

Obviously you can't draw too many conclusions from this alone, but more 
data is always better. Take what you will from it.
May 08 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Peter Alexander:

 Some select figures for languages used to solve the first question:
 
 C++	5032
 Java	2321

 C	532
 Haskell	100
 Clojure	13
 GO	13
 D	5
 Scheme	5
The third most used language is Python.
 (In previous 3 years, D had between 2-4 entries for the first question, 
 so not much change, despite total contestant counts increasing quite 
 dramatically)
But a person from Japan has used D to place among the top ten, which is good:
http://www.go-hero.net/jam/11/name/hos.lyric

The first, second and third persons are using the most used, second most used and third most used languages (C++, Java, Python) :-)
 Obviously you can't draw too many conclusions from this alone, but more 
 data is always better. Take what you will from it.
From those numbers it looks like D isn't gaining mindshare, unfortunately. Go is appreciated, even if much less than Python. Among the supported languages there are Cobol and Fortran, and many others, but I don't see Ada.

Bye,
bearophile
May 08 2011
next sibling parent Peter Alexander <peter.alexander.au gmail.com> writes:
On 8/05/11 12:39 PM, bearophile wrote:
 But a person from Japan has used D to be among the top ten, this is good:
 http://www.go-hero.net/jam/11/name/hos.lyric
Unfortunately the ranks in the first round don't mean much at all. Most rounds last only a few hours, so everyone competes at the same time, but the first round lasts 24 hours, so most participants just come in and solve the problems whenever they want. What that means is that the people at the top of the board in the first round are just those who began as soon as the competition started.

---

Interestingly, that contestant barely used any of D's features. The code he wrote may as well have been C++.
May 08 2011
prev sibling parent Keywan Ghadami <k.ghadami ibson.com> writes:
just an idea: new name for d -> d2lang
May 08 2011
prev sibling parent reply Andrew Wiley <wiley.andrew.j gmail.com> writes:
On Sun, May 8, 2011 at 6:10 AM, Peter Alexander <peter.alexander.au 
gmail.com> wrote:

 The Google Code Jam is a programming competition where you have to solve
 algorithmic problems using whatever programming language you like.

 The stats of what programming languages were used in the first round were
 collected:

 http://www.go-hero.net/jam/11/languages

 Some select figures for languages used to solve the first question:

 C++     5032
 Java    2321

 C       532
 Haskell 100
 Clojure 13
 GO      13
 D       5
 Scheme  5

 (In previous 3 years, D had between 2-4 entries for the first question, so
 not much change, despite total contestant counts increasing quite
 dramatically)

 Generally, I believe people tend to use the language they are most familiar
 with, and for people that know more than one language they will choose the
 one that is most expressive. Stability of implementations could also be an
 issue.

 Obviously you can't draw too many conclusions from this alone, but more
 data is always better. Take what you will from it.
I was one of the D users, although I wasn't really worried about competing. I just wanted to see how D would compare after doing so many programming contests in Java.

The main thing that frustrated me was that getting input in D wasn't anywhere near as straightforward as it is in Java. For the first problem, I'd do something like this in Java:

Scanner in = new Scanner(System.in);
int numTests = in.nextInt();
for(int test = 0; test < numTests; tests++) { //need the test index for output
    int numSteps = in.nextInt();
    for(; numSteps < 0; numSteps--)
        char robot = in.nextChar();
        int button = in.nextInt();
        //solve the problem!
    }
    //print the output!
}

In D, that looked like this:

string line;
int num;
stdin.readln(line);
formattedRead(line, "%s", &num);
for(int casen = 0; casen < num; casen++) {

...

In a few places, I could have used stdin.readf instead of readln/formattedRead, but not many because the number of items within a test is on the same line as the items.

I could have just been missing something, but something that was trivial in Java became brittle in D because I had to exactly match the whitespace for things to work. I suppose I could have read a line and used splitter to split on whitespace, but that would make me have to watch more state and would wind up looking like this:

string line;
stdin.readln(line);
auto split = split(line);
int num = to!int(split[0]);
split = split[1..$];
...

Actually... now that I'm looking at that, if I wrote a Scanner-like class based on this, is there any chance it could go into Phobos? Seems like between split and to, we could get something much less brittle working.
May 08 2011
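The split-and-convert approach described in the post above can be wrapped in a small function so that the extra state stays in one place. This is only a minimal sketch: `parseCountedLine` is a hypothetical helper name, not a Phobos function, and it assumes a line of the form "<count> <item> <item> ..." where every item is an integer.

```d
import std.array : split;
import std.conv : to;
import std.exception : enforce;

/// Hypothetical helper: parses one line of the form "<count> <item> <item> ...".
int[] parseCountedLine(string line)
{
    // split() with no separator splits on runs of whitespace,
    // so the exact amount of whitespace in the input does not matter.
    auto tokens = line.split();
    enforce(tokens.length > 0, "empty line");

    immutable count = tokens[0].to!size_t;
    auto items = tokens[1 .. $];
    enforce(items.length == count, "item count does not match the header");

    int[] result;
    foreach (token; items)
        result ~= token.to!int;   // throws ConvException on a malformed token
    return result;
}
```

A caller would pass each line obtained from readln to this function; whether that is less brittle than readf depends on how strictly the input format needs to be validated.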
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Andrew Wiley wrote:
 I was one of the D users, although I wasn't really worried about competing.
 I just wanted to see how D would compare after doing so many programming
 contests in Java.
 The main thing that frustrated me was that getting input in D wasn't
 anywhere near as straightforward as it is in Java. For the first problem,
 I'd do something like this in Java:
 Scanner in = new Scanner(System.in);
 int numTests = in.nextInt();
 for(int test = 0; test < numTests; tests++) { //need the test index for
 output
 int numSteps = in.nextInt();
 for(; numSteps < 0; numSteps--)
 char robot = in.nextChar();
 int button = in.nextInt();
 //solve the problem!
 }
 //print the output!
 }
Well, I don't like D's readf either (I use scanf, 2-3x faster and better whitespace handling).

That said, you really made my day. The problem is not that reading input in D is less straightforward than in Java, the problem is, that you are used to Java's way of doing IO. (which I pretty much dislike, I guess it is a matter of taste)

You do not actually have to bother with string handling at all when doing IO in C/C++/D.

Reading array of integers:

int[100000] array; //somewhere in static storage, faster
...
scanf("%d",&n);
foreach(ref x;array) scanf("%d",&x);

Or, some heap activity involved, and actually more keystrokes, but some people like this way:

readf("%s",&n);//read number of items
int[] array=to!(int[])(split(strip(readln())));

How I would have written your example in D.

int numTests; scanf("%d", &numTests);
foreach(test;0..numTests){
    int numSteps; scanf("%d", &numSteps);
    foreach(step;0..numSteps){ //you have a bug in this line of your Java code introducing a looooong loop
        char robot; scanf("%c", &robot);
        int button; scanf("%d", &button);
        //solve the problem!
    }
    //print the output
}
 In D, that looked like this:
 string line;
 int num;
 stdin.readln(line);
 formattedRead(line, "%s", &num);
 for(int casen = 0; casen < num; casen++) {

 ...

 In a few places, I could have used stdin.readf instead of
 readln/formattedRead, but not many because the number of items within a test
 is on the same line as the items.
That is not a problem at all, you can read the first few elements with readf and the rest of the line with readln
 I could have just been missing something, but something that was trivial in
 Java became brittle in D because I had to exactly match the whitespace for
I actually think Java's way is brittle. You have to instantiate a class just to read IO.
 things to work. I suppose I could have read a line and used splitter to
 split on whitespace, but that would make me have to watch more state and
 would wind up looking like this:
 string line;
 stdin.readln(line);
 auto split = split(line);
 int num = to!int(split[0]);
 split = split[1..$];
I don't get this.
 ...

 Actually... now that I'm looking at that, if I wrote a Scanner-like class
 based on this, is there any chance it could go into Phobos? Seems like
 between split and to, we could get something much less brittle working.
No chance, that is not the way D/Phobos works. You do not have a class for everything that would not need one. (just like Phobos does not have a writer class for output)

However I agree that Phobos has to provide some better input handling, since using possibly unsafe C functions is the best way to do it by now. (I think readf is severely crippled) I may try to implement a meaningful "read" function.

Timon
May 08 2011
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
Whoops, there was a mistake:

Reading array of integers:

int[100000] array; //somewhere in static storage, faster
...
scanf("%d",&n);
foreach(ref x;array[0..n]) scanf("%d",&x); // note the slice


Timon
May 08 2011
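The scanf-based approach from the two posts above can be exercised on an in-memory string by using C's sscanf together with `%n`, which reports how many characters were consumed. This is a sketch under assumptions: `Step` and `scanSteps` are hypothetical names, and the input is taken to be a count followed by that many "robot button" pairs. One detail worth noting is that `%c`, unlike `%d`, does not skip leading whitespace, so the format below puts a space in front of it.

```d
import core.stdc.stdio : sscanf;
import std.string : toStringz;

/// Hypothetical record for one "robot button" pair.
struct Step
{
    char robot;
    int button;
}

/// Hypothetical helper: reads "<n> <robot> <button> ..." from a string.
/// Returns null if the text does not match.
Step[] scanSteps(string text)
{
    const(char)* p = text.toStringz;
    int n, used;

    // %d skips leading whitespace; %n stores the number of characters consumed.
    if (sscanf(p, "%d%n", &n, &used) != 1 || n < 0)
        return null;
    p += used;

    auto steps = new Step[](n);
    foreach (ref s; steps)
    {
        // The space before %c skips whitespace, including newlines.
        // A bare %c would read the separator character itself.
        if (sscanf(p, " %c %d%n", &s.robot, &s.button, &used) != 2)
            return null;
        p += used;
    }
    return steps;
}
```

The same format strings would apply to plain scanf on stdin; the string-based variant is used here only so the behaviour can be checked without piping input.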
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/8/11 3:04 PM, Timon Gehr wrote:
 However I agree that Phobos has to provide some better input handling, since
using
 possibly unsafe C functions is the best way to do it by now. (I think readf is
 severely crippled) I may try to implement a meaningful "read" function.
Looking forward to detailed feedback about readf. It was implemented in a hurry so definitely it has a long way to go.

Andrei
May 08 2011
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I'm very happy with using Jesse's interact library for user input:
https://github.com/he-the-great/JPDLibs/tree/cmdln

Last time I've used it I combined it with std.conv since I needed
either a number or a "q" from the user, e.g.:

int input;
auto line = userInput!string("Enter value:");
if (line == "q")
{
    quit();
}
else if (!throws!(ConvException)( { input = to!int(line); } ))  // try converting to int
{
    if (input >= -127 && input <= 127)
    {
        // do something
    }
}

Here throws() is just a custom function that asserts that a delegate throws.
May 08 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
*that checks if a delegate throws and returns true if so*
May 08 2011
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Andrei Alexandrescu wrote:
 On 5/8/11 3:04 PM, Timon Gehr wrote:
 However I agree that Phobos has to provide some better input handling, since
using
 possibly unsafe C functions is the best way to do it by now. (I think readf is
 severely crippled) I may try to implement a meaningful "read" function.
Looking forward to detailed feedback about readf. It was implemented in a hurry so definitely it has a long way to go. Andrei
What I consider the most important points about readf:

1. Whitespace handling is different than scanf. It is much stricter and even feels inconsistent, Eg:

int a,b;

readf("%s %s",&a,&b);//input "1 2\n" read.
readf("%s %s",&a,&b);//input "1  2\n" read (and a==1 && b==2).

readf("%s",&a);//input "1\n" read. yay.
readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.

readf("%s ",&a);//input "1 \n" read.
readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space (!) is missing.

readf(" %s",&a);//input "1\n" read.
readf("\t%s",&a);//input "1\n": exception is thrown.

readf("%s\n",&a);//input "1\n" read.
readf("%s\n",&a);//input "1 \n": exception is thrown.

readf("%s\t\n",&a);//input "1\t\n" read.
readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any further input.

And some more, I do not remember all of them. Exceptions are most of the time only as useful as "Enforcement failed".

You (almost?) never want this behavior, even at the points it marginally makes sense. It would be nice to have an optional whitespace-enforcing version that _really_ enforces it (as opposed to the current implementation), but that should not be the default. And then it should be consistent (also on skipping or exception throwing).

2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its '>>' more. scanf has that problem too, but it is a C function, you _cannot_ expect it to do any better than that. D has variadic template functions that may take ref parameters. It can be done entirely pointer-free.

3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why did you throw away the idea of static overloads? It would have been a powerful feature, and very useful for this case. scanf in C/C++ does not have this problem, because most modern compilers generate warnings for this. But that is making some functions "more equal than the others".

4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I mistakenly claimed before). I think this is just a quality of implementation issue, but it is important. Especially for programming competitions where there are time limits, you do not want IO to unnecessarily become a major bottleneck. (Input files can be huge) Other than that, D is WAY the most convenient language I have ever tried to solve small algorithmic tasks in.

5. Not really readf related: There's writef(ln) and there is write(ln). And then there is readf. I will provide a proof-of-concept for the read function soon.

Timon
May 08 2011
next sibling parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 8/05/11 11:57 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 Looking forward to detailed feedback about readf. It was implemented in
 a hurry so definitely it has a long way to go.

 Andrei
What I consider the most important points about readf: 1. Whitespace handling is different than scanf. It is much stricter and even feels inconsistent, Eg:
std.readf is broken. http://d.puremagic.com/issues/show_bug.cgi?id=4656 This bug makes it quite difficult to evaluate readf. I just use scanf now.
May 09 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/9/11 2:53 AM, Peter Alexander wrote:
 On 8/05/11 11:57 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 Looking forward to detailed feedback about readf. It was implemented in
 a hurry so definitely it has a long way to go.

 Andrei
What I consider the most important points about readf: 1. Whitespace handling is different than scanf. It is much stricter and even feels inconsistent, Eg:
std.readf is broken. http://d.puremagic.com/issues/show_bug.cgi?id=4656 This bug makes it quite difficult to evaluate readf. I just use scanf now.
That's not a bug, see my comment in http://d.puremagic.com/issues/show_bug.cgi?id=4656. The error message _is_ a bug though! Andrei
May 09 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Andrei Alexandrescu wrote:
 I've implemented readf to be a fair amount more Nazi about whitespace than
 scanf in an attempt to improve its precision. Scanf has been famously difficult
 to use for complex input parsing and validation, and I attribute some of that
 to its laissez-faire attitude toward whitespace. I'd be glad to relax some of
 readf's insistence on precise whitespace handling if there's enough evidence
 that that serves most of our users. I personally believe that the current
 behavior (strict by default, easy to relax) is best.
In my experience readf behavior is not very useful for routine coding tasks that involve some IO. If you really need to have very strict requirements about the input format, readf does not serve you well, because a ' ' still skips all whitespace, a failure to read leaves the file pointer in an undefined position etc. All carryovers from scanf. I never want to use scanf when there is a valid chance of invalid input. As far as I can see, neither readf nor scanf can be used for sophisticated input validation or parsing of non-trivial input. You have to do it manually. How does readf make things better with strict(er) whitespace handling? What behavior is by design, what behavior is caused by bugs? Can you give a real-world example where readf design clearly beats scanf design? (as it is the default it should be almost always better, but I fail to see it) Apart from that, what about the other points I mentioned? Timon
May 09 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/9/11 12:43 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 I've implemented readf to be a fair amount more Nazi about whitespace than
 scanf in an attempt to improve its precision. Scanf has been famously difficult
 to use for complex input parsing and validation, and I attribute some of that
 to its laissez-faire attitude toward whitespace. I'd be glad to relax some of
 readf's insistence on precise whitespace handling if there's enough evidence
 that that serves most of our users. I personally believe that the current
 behavior (strict by default, easy to relax) is best.
In my experience readf behavior is not very useful for routine coding tasks that involve some IO.
If this assessment would be reverted by simply inserting spaces in the formatting string, I'd be hard pressed to agree. I do agree that readf behavior is surprising if you expect 100% scanf compatibility. This is intentional and beneficial as I believe scanf is wanting in more than one way.
 If you really need to have very strict requirements about the input format,
readf
 does not serve you well, because a ' ' still skips all whitespace, a failure to
 read leaves the file pointer in an undefined position etc.
That is not an issue (albeit some of the underlying machinery is not yet implemented). If you want to skip at most one space but no other whitespace, insert "%*1[ ]" in the formatting string. To skip any number of spaces, insert "%*[ ]".

Skipping exactly one space is not supported at the formatting string level, but you can always read one character with %c and then enforce the character is ' '. I agree that that could be improved. What's needed is a specification for the minimum number of characters read, e.g. "%*1.1[ ]" for scanning and skipping exactly one space.

In contrast, having e.g. %d skipping all whitespace is a losing proposition if you want to do precision parsing. This is because that behavior can't be disabled. That's why I excised it.

Reading is greedy. Failure to read leaves the pointer in a defined position, but we need to improve documentation.
 All carryovers from
 scanf. I never want to use scanf when there is a valid chance of invalid input.
I agree, but that's a problem with scanf that should and could be fixed. There's almost always a chance of invalid input.
 As
 far as I can see, neither readf nor scanf can be used for sophisticated input
 validation or parsing of non-trivial input. You have to do it manually. How
does
 readf make things better with strict(er) whitespace handling?
Far as I can see, implementing Posix %[charset] extension would make readf a powerful one-stop shop for parsing input. Of course its speed needs to be up to snuff too. And of course its specification can be improved, which is where your input is very valuable.
 What behavior is by design, what behavior is caused by bugs? Can you give a
 real-world example where readf design clearly beats scanf design? (as it is the
 default it should be almost always better, but I fail to see it)

 Apart from that, what about the other points I mentioned?
I answered all of these in my other, longer post. Andrei
May 09 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/8/11 5:57 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 On 5/8/11 3:04 PM, Timon Gehr wrote:
 However I agree that Phobos has to provide some better input handling, since
using
 possibly unsafe C functions is the best way to do it by now. (I think readf is
 severely crippled) I may try to implement a meaningful "read" function.
Looking forward to detailed feedback about readf. It was implemented in a hurry so definitely it has a long way to go. Andrei
What I consider the most important points about readf:
Thanks very much for providing detailed feedback.
 1. Whitespace handling is different than scanf. It is much stricter and even
feels
 inconsistent, Eg:

 int a,b;

 readf("%s %s",&a,&b);//input "1 2\n" read.
 readf("%s %s",&a,&b);//input "1  2\n" read (and a==1&&  b==2).
So far so good. By design one space in readf means "skip all whitespace".
 readf("%s",&a);//input "1\n" read. yay.
 readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.
I'm not seeing skipping in my tests; I do see an exception being thrown. Here's how I test:

import std.stdio;
void main()
{
     int a, b;
     readf("%s",&a);
     assert(a == 1);
     readf("%s",&b);
     assert(b == 2);
}

dmd ./test && echo '1\n 2' | ./test

The first input is read into 'a' and reading stops just at the \n. Next you're trying to read "\n 2" into b, which fails due to the strict whitespace handling. To fix this, you'd need to insert a space before the second "%s".

I'm not hooked on this strict whitespace handling, but I think it makes a lot of sense particularly when you want to make sure the input looks exactly as you think it should. With scanf you can't have precise parsing even if you wanted; with readf all you need is to insert a space.

Precision is important. For example, Hive uses a \t for field separation when streaming to a file. It is very important to figure that you have one tab there versus two (two means a NULL field was in between).
 readf("%s ",&a);//input "1 \n" read.
 readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space
(!)
 is missing.
On my machine this passes:

import std.stdio;
void main()
{
     int a, b;
     readf("%s ",&a);
     assert(a == 1);
     readf("%s ",&b);
     assert(b == 2);
}

dmd ./test && echo '1\n 2' | ./test

The explanation is that, again, a space means "skip all whitespace". So the first space eats the "\n " and the second space eats the final "\n" in the input (produced by echo). Please adjust this example so it unduly fails.
 readf(" %s",&a);//input "1\n" read.
 readf("\t%s",&a);//input "1\n": exception is thrown.
A "\t" in the formatting string for readf simply requires a tab. To skip over an optional tab, do this:

readf("%*1[\t]%s",&a);

That instructs readf to read, but not store, a string consisting of at most one tab. (To skip any number of tabs drop the "1".) This functionality is not yet implemented.
 readf("%s\n",&a);//input "1\n" read.
 readf("%s\n",&a);//input "1 \n": exception is thrown.
That is as expected - if you specify \n readf expects a \n.
 readf("%s\t\n",&a);//input "1\t\n" read.
 readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
 further input.
My testbed:

import std.stdio;
void main()
{
     int a, b;
     readf("%s\t\n",&a);
     assert(a == 1);
     readf("%s \n",&b);
     assert(b == 2);
}

dmd ./test && echo "1\t\n2 " | ./test

It fails because it can't find the last \n. That's a bug.
 And some more, I do not remember all of them. Exceptions are most of the time
only
 as useful as "Enforcement failed".


 You (almost?) never want this behavior, even at the points it marginally makes
 sense. It would be nice to have an optional whitespace-enforcing version that
 _really_ enforces it
 (as opposed to the current implementation), but that should not be the default.
 And then it should be consistent (also on skipping or exception throwing).
Except for one bug and one lacking implementation artifact, I find the current behavior consistent with a strict approach to whitespace handling.
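The whitespace rules discussed in this reply can be tried without a terminal by feeding in-memory strings to std.format's formattedRead. The sketch below assumes that formattedRead follows the same format-string rules as readf (a space in the format skips any run of whitespace, while a bare "%s" does not skip anything). `tryReadInt` is a hypothetical helper, not part of Phobos.

```d
import std.format : formattedRead;

/// Hypothetical helper: tries to read one int from `input` using `fmt`.
/// Returns false if nothing was read or if parsing throws.
bool tryReadInt(string input, string fmt, out int value)
{
    try
    {
        // formattedRead returns the number of values it filled in.
        return formattedRead(input, fmt, &value) == 1;
    }
    catch (Exception e)
    {
        // Strict whitespace handling surfaces as an exception here.
        return false;
    }
}
```

Under those assumptions, reading " 1\n" with "%s" is rejected because of the leading space, while " %s" accepts the same input, which matches the "insert a space" advice given above.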
 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its
'>>'
 more.
     scanf has that problem too, but it is a C function, you _cannot_ expect it
to
 do any better than that.
     D has variadic template functions that may take ref parameters. It can be
done
 entirely pointer-free.
When I implemented readf, ref variadic arguments weren't working. I'd be hesitant to change it right now as it does not improve actual functionality and disrupts current uses. But I agree ideally it should accept parameters by reference.
 3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why
did
 you throw away the idea of static overloads? It would have been a powerful
feature,
     and very useful for this case. scanf in C/C++ does not have this problem,
 because most modern compilers generate warnings for this. But that is making
some
 functions
     "more equal than the others"
One early version I had was doing that and spelled readf!"format string"(arguments); Unfortunately, sometimes runtime-computed formatting strings are needed and useful (see the recent std.log discussion...) so I decided to go with dynamic formatting for now. Once we get that right, providing an optional compile-time-checked formatting function shouldn't be too difficult with CTFE.
 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
 mistakenly claimed before). I think this is just a quality of implementation
 issue, but it is important.
I agree. I'm amazed readf is not slower actually. It uses by character file iteration, by far the slowest (and most embarrassing) code I wrote in Phobos: each character read entails one call to getc() to fetch the character, one call to ungetc() to restore the stream position, and finally one more call to getc() to move forward. The code is correct but very slow. Some C APIs provide undocumented means to peek at the next character in the stream without actually advancing the stream, which is what we need. I know how to do it on most Unixen and Walter knows how to do it on his own cstdlib implementation. We didn't have the time yet, and I'm glad the matter is under spotlight.
     Especially for programming competitions where there are time limits, you
do not
 want IO to unnecessarily become a mayor bottleneck. (Input files can be huge)
Agreed.
     Other than that, D is WAY the most convenient language I have ever tried to
 solve small algorithmic tasks in.
 5. Not really readf related: There's writef(ln) and there is write(ln). And
then
 there is readf. I will provide a proof-of-concept for the read function soon.
Good idea. I suggest you provide a template read(T)() that mimics the functionality of Java's nextInt, nextFloat etc:

auto a = stdin.next!int();
auto b = stdin.next!double();
auto s = stdin.next!string("\n"); // read a string up to \n
...

Andrei
May 09 2011
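The `next!T` interface suggested at the end of the post above could look roughly like the following. This is a sketch only: `TokenReader` is a hypothetical type, it works on an in-memory string as a stand-in for stdin, and it handles only whitespace-delimited tokens (the delimiter argument shown in the suggestion is not implemented here).

```d
import std.ascii : isWhite;
import std.conv : to;

/// Hypothetical token reader over an in-memory string (a stand-in for stdin).
struct TokenReader
{
    string input;

    /// Skips leading whitespace, takes the next whitespace-delimited token,
    /// and converts it to T. Throws ConvException if the token is malformed
    /// or if the input is exhausted and T cannot be built from an empty token.
    T next(T)()
    {
        size_t start = 0;
        while (start < input.length && isWhite(input[start]))
            ++start;

        size_t end = start;
        while (end < input.length && !isWhite(input[end]))
            ++end;

        auto token = input[start .. end];
        input = input[end .. $];   // advance past the consumed token
        return token.to!T;
    }
}
```

A struct with a template method keeps the call sites close to the Java Scanner style without needing a separate method per type.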
parent Timon Gehr <timon.gehr gmx.ch> writes:
Sry, overlooked this post.

Andrei Alexandrescu wrote:
 On 5/8/11 5:57 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 On 5/8/11 3:04 PM, Timon Gehr wrote:
 However I agree that Phobos has to provide some better input handling, since
using
 possibly unsafe C functions is the best way to do it by now. (I think readf is
 severely crippled) I may try to implement a meaningful "read" function.
Looking forward to detailed feedback about readf. It was implemented in a hurry so definitely it has a long way to go. Andrei
What I consider the most important points about readf:
Thanks very much for providing detailed feedback.
 1. Whitespace handling is different than scanf. It is much stricter and even
feels
 inconsistent, Eg:

 int a,b;

 readf("%s %s",&a,&b);//input "1 2\n" read.
 readf("%s %s",&a,&b);//input "1  2\n" read (and a==1&&  b==2).
So far so good. By design one space in readf means "skip all whitespace".
 readf("%s",&a);//input "1\n" read. yay.
 readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.
I'm not seeing skipping in my tests; I do see an exception being thrown. Here's how I test: import std.stdio; void main() { int a, b; readf("%s",&a); assert(a == 1); readf("%s",&b); assert(b == 2); } dmd ./test && echo '1\n 2' | ./test
I tested inputting manually in terminal. The exception is thrown only when I provide an EOF. Seems like the input is not being skipped after all, but readf does not return until there is an EOF.
 I'm not hooked on this strict whitespace handling, but I think it makes
 a lot of sense particularly when you want to make sure the input looks
 exactly as you think it should. With scanf you can't have precise
 parsing even if you wanted; with readf all you need is to insert a space.

 Precision is important. For example, Hive uses a \t for field separation
 when streaming to a file. It is very important to figure that you have
 one tab there versus two (two means a NULL field was in between).
It should be possible to do that with scanf using %[] if I'm not mistaken.
 readf("%s ",&a);//input "1 \n" read.
 readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space
(!)
 is missing.

 On my machine this passes:

 import std.stdio;
 void main()
 {
      int a, b;
      readf("%s ",&a);
      assert(a == 1);
      readf("%s ",&b);
      assert(b == 2);
 }

 dmd ./test && echo '1\n 2' | ./test

 The explanation is that, again, a space means "skip all whitespace". So
 the first space eats the "\n " and the second space eats the final "\n"
 in the input (produced by echo). Please adjust this example so it unduly
 fails.
Again, misinterpretation on my side. Typing into the terminal expects new input until a non-whitespace character is inserted. Should be fine, but can be surprising.
 readf(" %s",&a);//input "1\n" read.
 readf("\t%s",&a);//input "1\n": exception is thrown.
A "\t" in the formatting string for readf simply requires a tab. To skip over any number of tabs, do this: readf("%*1[\t]%s",&a); That instructs readf to read, but not store, a string consisting of at most one tab. (To skip multiple tabs drop the "1".) This functionality is not yet implemented.
I did not know it would ever be! That removes many of my concerns. (and the 'read' function removes the rest)
 readf("%s\n",&a);//input "1\n" read.
 readf("%s\n",&a);//input "1 \n": exception is thrown.
That is as expected - if you specify \n readf expects a \n.
 readf("%s\t\n",&a);//input "1\t\n" read.
 readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
 further input.
My testbed: import std.stdio; void main() { int a, b; readf("%s\t\n",&a); assert(a == 1); readf("%s \n",&b); assert(b == 2); } dmd ./test && echo "1\t\n2 " | ./test It fails because it can't find the last \n. That's a bug.
At least I found one. =)
 And some more, I do not remember all of them. Exceptions are most of the time
only
 as useful as "Enforcement failed".


 You (almost?) never want this behavior, even at the points it marginally makes
 sense. It would be nice to have an optional whitespace-enforcing version that
 _really_ enforces it
 (as opposed to the current implementation), but that should not be the default.
 And then it should be consistent (also on skipping or exception throwing).
 Except for one bug and one lacking implementation artifact, I find the
 current behavior consistent with a strict approach to whitespace handling.
Agreed. Thanks for your explanations!
 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its
'>>'
 more.
     scanf has that problem too, but it is a C function, you _cannot_ expect it
to
 do any better than that.
     D has variadic template functions that may take ref parameters. It can be
done
 entirely pointer-free.
When I implemented readf, ref variadic arguments weren't working. I'd be hesitant to change it right now as it does not improve actual functionality and disrupts current uses. But I agree ideally it should accept parameters by reference.
We can have both, since it will never be possible to read in raw pointers:

import std.stdio;
import std.conv;

private bool containsPointersImpl(T...)(){ //nesting this inside containsPointer template removes eponymous template trick. Is this a bug?
    foreach(t;T) static if(is(t U:U*)) return true;
    return false;
}
template containsPointers(T...){enum containsPointers=containsPointersImpl!T();}

private bool onlyPointersImpl(T...)(){
    foreach(t;T) static if(!is(t U:U*)) return false;
    return true;
}
template onlyPointers(T...){enum onlyPointers=onlyPointersImpl!T();}

private string _readfImpl(int len){
    string res="return std.stdio.stdin.readf(format,";
    foreach(t;0..len) res~="&args["~to!string(t)~"], ";
    res~=");";
    return res;
}

int _readf(T...)(string format, ref T args) if(!containsPointers!T){mixin(_readfImpl(T.length));}

//classic definition for backwards compatibility.
int _readf(T...)(string format, T args) if(onlyPointers!T){
    return std.stdio.stdin.readf(format, args);
}

void main(){
    int a;
    _readf(" %s",&a);
    writeln(a);
    _readf(" %s",a);
    writeln(a);
}
 3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why did you throw away the idea of static overloads? It would have been a powerful feature,
     and very useful for this case. scanf in C/C++ does not have this problem, because most modern compilers generate warnings for this. But that is making some functions
     "more equal than the others"
One early version I had was doing that and spelled readf!"format string"(arguments); Unfortunately, sometimes runtime-computed formatting strings are needed and useful (see the recent std.log discussion...) so I decided to go with dynamic formatting for now. Once we get that right, providing an optional compile-time-checked formatting function shouldn't be too difficult with CTFE.
The problem I see here is that the dynamic version still cannot be checked when passed a statically known format string. Why did you drop the idea of allowing something like int readf(T...)(static string format, T args) ?
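
As a point of reference, later Phobos releases offer a form where the format string is a template argument, close to the readf!"format string" spelling mentioned above. The sketch below assumes such a release; Pair and parsePair are hypothetical names used only for illustration, and the input comes from a string rather than stdin.

```d
import std.format : formattedRead;

// Hypothetical result type for the example.
struct Pair
{
    int n;
    double x;
}

// Hypothetical helper: parses "<int> <double>" with a compile-time format.
Pair parsePair(string input)
{
    Pair p;
    // The format string is a template argument, so it can be validated
    // against the argument types at compile time instead of at run time.
    // Arguments are passed by reference here, not as pointers.
    input.formattedRead!"%s %s"(p.n, p.x);
    return p;
}

void main()
{
    assert(parsePair("42 3.5") == Pair(42, 3.5));
}
```

The run-time form stays available for format strings that are computed dynamically, which was the concern raised about std.log.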
 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
 mistakenly claimed before). I think this is just a quality of implementation
 issue, but it is important.
I agree. I'm amazed readf is not slower actually. It uses by character file iteration, by far the slowest (and most embarrassing) code I wrote in Phobos: each character read entails one call to getc() to fetch the character, one call to ungetc() to restore the stream position, and finally one more call to getc() to move forward. The code is correct but very slow. Some C APIs provide undocumented means to peek at the next character in the stream without actually advancing the stream, which is what we need. I know how to do it on most Unixen and Walter knows how to do it on his own cstdlib implementation. We didn't have the time yet, and I'm glad the matter is under spotlight.
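
To make the cost concrete, here is a rough sketch of the three-call pattern described above, written against the C stdio functions exposed by core.stdc.stdio. It is not the Phobos source; peek and slowNext are hypothetical names, and a temporary file stands in for stdin so the example can run on its own.

```d
import core.stdc.stdio : EOF, FILE, fclose, fputs, getc, rewind, tmpfile, ungetc;

// Look at the next character without consuming it: one getc plus one ungetc.
int peek(FILE* f)
{
    int c = getc(f);
    if (c != EOF)
        ungetc(c, f);
    return c;
}

// Consume one character the slow way: peek first, then getc again.
int slowNext(FILE* f)
{
    int c = peek(f);   // getc + ungetc
    if (c != EOF)
        getc(f);       // third call, the one that actually advances
    return c;
}

void main()
{
    FILE* f = tmpfile();
    if (f is null)
        return;        // no temporary file available on this system
    fputs("ab", f);
    rewind(f);
    assert(peek(f) == 'a');      // does not advance
    assert(slowNext(f) == 'a');
    assert(slowNext(f) == 'b');
    assert(slowNext(f) == EOF);
    fclose(f);
}
```

Three library calls per character is what a direct peek into the stream buffer would avoid.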
     Especially for programming competitions where there are time limits, you do not
 want IO to unnecessarily become a major bottleneck. (Input files can be huge)
Agreed.
     Other than that, D is WAY the most convenient language I have ever tried to
 solve small algorithmic tasks in.
 5. Not really readf related: There's writef(ln) and there is write(ln). And then
 there is readf. I will provide a proof-of-concept for the read function soon.
Good idea. I suggest you provide a template read(T)() that mimics the functionality of Java's nextInt, nextFloat etc:

auto a = stdin.next!int();
auto b = stdin.next!double();
auto s = stdin.next!string("\n"); // read a string up to \n
...

Andrei
Yes, I think it should support:

auto a = read!int;
auto b = read!double;
auto s = read!string("\n"); // this could be an overload on immutability. The alternative would be read!(string,"\n"); I do not know.
auto x = read!(int[])(50); // read an array of 50 integers separated by whitespace
auto y = read!(int[],",")(50); // read an array of 50 integers separated by commas
auto z = read!(int[],", ")(50); // read an array of 50 integers separated by commas and whitespace

Plus the same for every type that can be to!type(string)'d.

But also: read should replace readf wherever possible in the following forms:

int a; double b; string s;
read(a,b,s);//reads whitespace-separated a, b and s in turn. (delimiter could be changed by template argument or so)

char[] c=new char[1000];
read(c); // only relocates c if the number of read characters exceeds 1000.

One problem I see: An evildoer could provide a huge input, filling up the whole RAM. I think this vulnerability is also present in readln. Any ideas?

Non-string arrays are handled this way:

int[100] arr;
read(arr); // reads 100 integers and stores in arr
read(arr[0..20]); //reads 20 integers into the first 20 slots of arr
int[] arr = new int[100];
read(arr); //ditto

Rationale: reading input should not /require/ heap activity.

The read function would cover all cases where no strict whitespace handling is required, and readf would take the rest! I think that would be a very nice solution.

Timon
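
A minimal sketch of how such a read might look, under the assumption that the input is already available as a string (the proposal above reads from stdin, which this sketch does not attempt). The names read and readInto are hypothetical; std.conv.parse and std.string.stripLeft are existing Phobos functions.

```d
import std.conv : parse;
import std.string : stripLeft;

// Hypothetical read!T: skip leading whitespace, then parse one value.
T read(T)(ref string src)
{
    src = src.stripLeft();
    return parse!T(src);   // parse advances src past what it consumed
}

// Hypothetical slice-filling variant: writes into caller-owned storage,
// so a static array can be filled without any heap allocation.
void readInto(T)(ref string src, T[] dst)
{
    foreach (ref slot; dst)
        slot = read!T(src);
}

void main()
{
    string input = "7 2.5\n10 20 30";
    assert(read!int(input) == 7);
    assert(read!double(input) == 2.5);

    int[3] arr;
    readInto(input, arr[]);
    assert(arr == [10, 20, 30]);
}
```

Separator handling and the unbounded-input concern raised above are left out of this sketch.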
May 09 2011
prev sibling next sibling parent Andrew Wiley <wiley.andrew.j gmail.com> writes:
On Sun, May 8, 2011 at 3:04 PM, Timon Gehr <timon.gehr gmx.ch> wrote:

 Andrew Wiley wrote:
 I was one of the D users, although I wasn't really worried about
competing.
 I just wanted to see how D would compare after doing so many programming
 contests in Java.
 The main thing that frustrated me was that getting input in D wasn't
 anywhere near as straightforward as it is in Java. For the first problem,
 I'd do something like this in Java:
 Scanner in = new Scanner(System.in);
 int numTests = in.nextInt();
 for(int test = 0; test < numTests; tests++) { //need the test index for
 output
 int numSteps = in.nextInt();
 for(; numSteps < 0; numSteps--)
 char robot = in.nextChar();
 int button = in.nextInt();
 //solve the problem!
 }
 //print the output!
 }
Well, I don't like D's readf either (I use scanf, 2-3x faster and better whitespace handling). That said, you really made my day. The problem is not that reading input in D is less straightforward than in Java, the problem is, that you are used to Java's way of doing IO. (which I pretty much dislike, I guess it is a matter of taste) You do not actually have to bother with string handling at all when doing IO in C/C++/D.

Reading array of integers:

int[100000] array; //somewhere in static storage, faster
...
scanf("%d",&n);
foreach(ref x;array) scanf("%d",&x);

What bothers me about that code is that you had to write a string to represent something that should be implicit. It may just be that formattedRead is more strict than scanf, but I had problems getting whitespace to behave properly with format code strings. Plus, when you just type %d, what if I want a long? What if I want an infinite precision integer? These things aren't solved by C function calls, and trying to come up with a string format code for every possible input would needlessly complicate things.

 Or, some heap activity involved, and actually more keystrokes, but some people
 like this way:
 readf("%s",&n);//read number of items

 int[] array=to!(int[])(split(strip(readln())));


 How I would have written your example in D.
 int numTests; scanf("%d", &numTests);
 foreach(test;0..numTests){
    int numSteps; scanf("%d", &numSteps);
    foreach(step;0..numSteps){ //you have a bug in this line of your Java
 code
 introducing a looooong loop
        char robot; scanf("%c", &robot);
        int button; scanf("%d", &button);
         //solve the problem!
    }
    //print the output
 }
As a note, I recently discovered while running through some D1 code that %c isn't a format code recognized by the D2 formatting functions. I realize this is C though.
 In D, that looked like this:
 string line;
 int num;
 stdin.readln(line);
 formattedRead(line, "%s", &num);
 for(int casen = 0; casen < num; casen++) {

 ...

 In a few places, I could have used stdin.readf instead of
 readln/formattedRead, but not many because the number of items within a
test
 is on the same line as the items.
That is not a problem at all, you can read the first few elements with readf and the rest of the line with readln
The documentation seems to imply that readf reads an entire line. Was I just misunderstanding it?
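
A small sketch related to that question: formattedRead consumes only the part of the input that the format string matches and leaves the rest in place, so a leading count can be read with a format and the remainder handled by split and to. The example works on a string; the same consume-only-what-matches behavior is what the readf plus readln suggestion above relies on. The helper name readCounted is made up for illustration.

```d
import std.array : split;
import std.conv : to;
import std.exception : enforce;
import std.format : formattedRead;

// Hypothetical helper: a count followed by that many integers on one line.
int[] readCounted(string line)
{
    int count;
    // Consumes only the leading integer; the rest of `line` stays available.
    formattedRead(line, " %s", &count);
    // split with no separator breaks on runs of whitespace.
    int[] items = to!(int[])(split(line));
    enforce(items.length == count, "item count does not match the header");
    return items;
}

void main()
{
    assert(readCounted("4 3 2 67 5") == [3, 2, 67, 5]);
}
```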


 I could have just been missing something, but something that was trivial in
 Java became brittle in D because I had to exactly match the whitespace for
I actually think Java's way is brittle. You have to instantiate a class just to read IO.
That doesn't make it brittle, that makes it heavy and/or overkill. What's brittle is when I have to exactly match whitespace, write strings for things that should be implicit, and keep track of more state than is strictly necessary. Java's Scanner is nice because you ask for an integer and get an integer, and as long as you ask for the right things in the right order, you don't have to track any state whatsoever. Keeping track of where you are in the input stream is something better left to the code doing the reading rather than the user. Your way doesn't involve state, but it also doesn't generalize to other types of streams.
 things to work. I suppose I could have read a line and used splitter to
 split on whitespace, but that would make me have to watch more state and
 would wind up looking like this:
 string line;
 stdin.readln(line);
 auto split = split(line);
 int num = to!int(split[0]);
 split = split[1..$];
I don't get this.
It's simple. I have a line that looks like this:

4 3 2 67 5

The first number is the number of numbers that follow, and the code looks like this:

string line = "4 3 2 67 5";
auto parts = split(line);
int num = to!int(parts[0]);
parts = parts[1..$];
foreach(index; 0..num) {
    int cur = to!int(parts[0]);
    parts = parts[1..$];
    // do things
}

I realize this is just a more complicated version of your heap code above, but suppose I needed to read an integer, a string, and a floating point number for each item. This scales up quite nicely to that sort of thing.
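
To illustrate the scaling claim, here is a sketch that reads a count followed by (string, integer, floating point) triples from one line using only split and to. The Item struct, its field names, and parseItems are invented for the example; an index into the token array replaces the repeated re-slicing.

```d
import std.array : split;
import std.conv : to;

// Hypothetical record type for the example.
struct Item
{
    string name;
    int button;
    double weight;
}

// Hypothetical helper: "<count> <name> <int> <double> ..." on a single line.
Item[] parseItems(string line)
{
    auto tokens = split(line);   // splits on runs of whitespace
    size_t pos = 0;              // the only state that has to be tracked
    immutable count = to!size_t(tokens[pos++]);
    Item[] items;
    foreach (i; 0 .. count)
    {
        Item item;
        item.name = tokens[pos++];
        item.button = to!int(tokens[pos++]);
        item.weight = to!double(tokens[pos++]);
        items ~= item;
    }
    return items;
}

void main()
{
    auto items = parseItems("2 O 4 1.5 B 7 0.25");
    assert(items == [Item("O", 4, 1.5), Item("B", 7, 0.25)]);
}
```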
 ...

 Actually... now that I'm looking at that, if I wrote a Scanner-like class
 based on this, is there any chance it could go into Phobos? Seems like
 between split and to, we could get something much less brittle working.
No chance, that is not the way D/Phobos works. You do not have a class for everything that would not need one. (just like Phobos does not have a writer class for output)
Yes, if I had thought a bit more, I wouldn't have said class. This could just be implemented as a few simple methods for reading primitives from string ranges (or character ranges, actually, as that would be more general). I would expect something like this to appear with the stream API that we'll hopefully build at some point. A class would probably be overkill.
 However I agree that Phobos has to provide some better input handling,
 since using
 possibly unsafe C functions is the best way to do it by now. (I think readf
 is
 severely crippled) I may try to implement a meaningful "read" function.
I think that input handling like this should be built on top of a stream API, and because that API isn't here yet, improving input may be premature. Or it may be too useful to wait.
May 08 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
 Andrew Wiley wrote:
 I was one of the D users, although I wasn't really worried about
 competing. I just wanted to see how D would compare after doing so many
 programming contests in Java.
 The main thing that frustrated me was that getting input in D wasn't
 anywhere near as straightforward as it is in Java. For the first problem,
 I'd do something like this in Java:
 Scanner in = new Scanner(System.in);
 int numTests = in.nextInt();
 for(int test = 0; test < numTests; tests++) { //need the test index for
 output
 int numSteps = in.nextInt();
 for(; numSteps < 0; numSteps--)
 char robot = in.nextChar();
 int button = in.nextInt();
 //solve the problem!
 }
 //print the output!
 }
Well, I don't like D's readf either (I use scanf, 2-3x faster and better whitespace handling). That said, you really made my day. The problem is not that reading input in D is less straightforward than in Java, the problem is, that you are used to Java's way of doing IO. (which I pretty much dislike, I guess it is a matter of taste) You do not actually have to bother with string handling at all when doing IO in C/C++/D.

Reading array of integers:

int[100000] array; //somewhere in static storage, faster
...
scanf("%d",&n);
foreach(ref x;array) scanf("%d",&x);

Or, some heap activity involved, and actually more keystrokes, but some people like this way:

readf("%s",&n);//read number of items
int[] array=to!(int[])(split(strip(readln())));

How I would have written your example in D.

int numTests; scanf("%d", &numTests);
foreach(test;0..numTests){
    int numSteps; scanf("%d", &numSteps);
    foreach(step;0..numSteps){ //you have a bug in this line of your Java code introducing a looooong loop
        char robot; scanf("%c", &robot);
        int button; scanf("%d", &button);
        //solve the problem!
    }
    //print the output
}
 In D, that looked like this:
 string line;
 int num;
 stdin.readln(line);
 formattedRead(line, "%s", &num);
 for(int casen = 0; casen < num; casen++) {
 
 ...
 
 In a few places, I could have used stdin.readf instead of
 readln/formattedRead, but not many because the number of items within a
 test is on the same line as the items.
That is not a problem at all, you can read the first few elements with readf and the rest of the line with readln
 I could have just been missing something, but something that was trivial
 in Java became brittle in D because I had to exactly match the
 whitespace for
I actually think Java's way is brittle. You have to instantiate a class just to read IO.
 things to work. I suppose I could have read a line and used splitter to
 split on whitespace, but that would make me have to watch more state and
 would wind up looking like this:
 string line;
 stdin.readln(line);
 auto split = split(line);
 int num = to!int(split[0]);
 split = split[1..$];
I don't get this.
 ...
 
 Actually... now that I'm looking at that, if I wrote a Scanner-like class
 based on this, is there any chance it could go into Phobos? Seems like
 between split and to, we could get something much less brittle working.
No chance, that is not the way D/Phobos works. You do not have a class for everything that would not need one. (just like Phobos does not have a writer class for output) However I agree that Phobos has to provide some better input handling, since using possibly unsafe C functions is the best way to do it by now. (I think readf is severely crippled) I may try to implement a meaningful "read" function.
stdin is already a struct in D. To do it in a more Java-like manner would likely involve having a templated read function which is templated on the type that you want to get out of stdin next. Essentially, you'd do something like std.conv.parse directly on stdin by having it as part of std.stdio.File. Now, personally, I just always read in the whole line and then use std.conv.parse on it. I'm not sure if that actually costs you anything in terms of functionality, though it might be possible to implement a templated read function on std.stdio.File more efficiently. And using parse like that, you can get much friendlier I/O which is closer to what you'd get with Scanner in Java. - Jonathan M Davis
May 08 2011
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrew Wiley:

 The main thing that frustrated me was that getting input in D wasn't
 anywhere near as straightforward as it is in Java. For the first problem,
I have tried to implement a D solution to the first problem, because its input is a bit more complex. I have used C++ code written by the winner as a starting point. After several failed D versions (this is BAD for D2/Phobos), I've written a Python prototype and then I have translated it to D2:

import std.stdio, std.math, std.conv, std.string, std.array, std.algorithm;

auto next(R)(ref R range) {
    auto result = range.front();
    range.popFront();
    return result;
}

void main() {
    auto fin = File("input.txt");
    auto fout = File("output.txt", "w");

    foreach (i; 0 .. to!int(fin.readln().strip())) {
        int[2] lastP = 1;
        int[2] lastT = 0;
        int t = 0;
        auto parts = splitter(fin.readln().strip(), " ");
        foreach (_; 0 .. to!int(next(parts))) {
            string s = next(parts);
            int q = to!int(next(parts));
            int id = cast(int)(s == "B");
            t = max(t, abs(q - lastP[id]) + lastT[id]) + 1;
            lastP[id] = q;
            lastT[id] = t;
        }
    }
}

Three problems I've found in translating the prototype:

- A next() function/method is missing, but I needed it, so I have had to define it, to keep code from becoming hairy and quite less readable.

- to!int expects a stripped string. In my code I am never sure to have a stripped string coming from input, so I have to always add a strip(), this is dumb:

foreach (i; 0 .. to!int(fin.readln().strip())) {
==>
foreach (i; 0 .. to!int(fin.readln())) {

- std.algorithm.splitter() doesn't default to splitting on whitespace as std.string.split() does. This is bad because in this program I need to add a strip() and in general it's bad because if there are two spaces, or a newline, it causes a mess, so I'd like a new overload of splitter() that acts as split():

auto parts = splitter(fin.readln().strip(), " ");
==>
auto parts = splitter(fin.readln());

Bye,
bearophile
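
For comparison, here is a sketch of the per-case logic from the post above, assuming a Phobos release newer than the one discussed in this thread, in which splitter called without a separator splits lazily on runs of whitespace. The function name solveLine is made up; next is the same helper as in the post. The test line is the first sample case of the problem, whose answer is 6.

```d
import std.algorithm.comparison : max;
import std.algorithm.iteration : splitter;
import std.conv : to;
import std.math : abs;

// Same helper as in the post above: return the front element and advance.
auto next(R)(ref R range)
{
    auto result = range.front;
    range.popFront();
    return result;
}

// Hypothetical helper: solves one test case given as a single input line.
int solveLine(string line)
{
    int[2] lastP = 1;   // last button position of each robot
    int[2] lastT = 0;   // time of each robot's last button press
    int t = 0;
    // No separator argument: split on whitespace, so no strip() is needed.
    auto parts = splitter(line);
    foreach (_; 0 .. to!int(next(parts)))
    {
        string s = next(parts);
        int q = to!int(next(parts));
        int id = (s == "B") ? 1 : 0;
        t = max(t, abs(q - lastP[id]) + lastT[id]) + 1;
        lastP[id] = q;
        lastT[id] = t;
    }
    return t;
}

void main()
{
    // Trailing newline and doubled spaces do not disturb the parse.
    assert(solveLine("4 O 2  B 1 B 2 O 4\n") == 6);
}
```

to!int still rejects surrounding whitespace, so splitting on whitespace first is what keeps the strip() calls out of this version.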
May 08 2011