digitalmars.D - Google Code Jam 2011 Language Usage

Peter Alexander <peter.alexander.au gmail.com> writes:

The Google Code Jam is a programming competition where you have to solve 
algorithmic problems using whatever programming language you like.

The stats of what programming languages were used in the first round 
were collected:

http://www.go-hero.net/jam/11/languages

Some select figures for languages used to solve the first question:

C++	5032
Java	2321

C	532
Haskell	100
Clojure	13
GO	13
D	5
Scheme	5

(In previous 3 years, D had between 2-4 entries for the first question, 
so not much change, despite total contestant counts increasing quite 
dramatically)

Generally, I believe people tend to use the language they are most 
familiar with, and for people that know more than one language they will 
choose the one that is most expressive. Stability of implementations 
could also be an issue.

Obviously you can't draw too many conclusions from this alone, but more 
data is always better. Take what you will from it.

May 08 2011

bearophile <bearophileHUGS lycos.com> writes:

Peter Alexander:

 Some select figures for languages used to solve the first question:
 
 C++	5032
 Java	2321

 C	532
 Haskell	100
 Clojure	13
 GO	13
 D	5
 Scheme	5

The third most used language is Python.


 (In previous 3 years, D had between 2-4 entries for the first question, 
 so not much change, despite total contestant counts increasing quite 
 dramatically)

But a person from Japan has used D to be among the top ten, this is good:
http://www.go-hero.net/jam/11/name/hos.lyric

The first, second and third persons are using the most used language, second
most used and third most used (C++, Java, Python) :-)


 Obviously you can't draw too many conclusions from this alone, but more 
 data is always better. Take what you will from it.

From those numbers it looks like D isn't gaining mindshare, unfortunately. Go

appreciated, even if much less than Python.
Among the supported languages there is Cobol and Fortran, and many others, but
I don't see Ada.

Bye,
bearophile

May 08 2011

Peter Alexander <peter.alexander.au gmail.com> writes:

On 8/05/11 12:39 PM, bearophile wrote:
 But a person from Japan has used D to be among the top ten, this is good:
 http://www.go-hero.net/jam/11/name/hos.lyric

Unfortunately the ranks in the first round don't mean much at all.

Most rounds last only a few hours, so everyone competes at the same 
time, but the first round lasts 24 hours, so most participants just come 
in and solve the problems whenever they want. What that means is that 
people at the top of the board in the first round are just those who 
started as soon as the competition opened.

---

Interestingly, that contestant barely used any of D's features. The code 
he wrote may as well have been C++.

May 08 2011

Keywan Ghadami <k.ghadami ibson.com> writes:

Just an idea: a new name for D -> d2lang

May 08 2011

Andrew Wiley <wiley.andrew.j gmail.com> writes:

On Sun, May 8, 2011 at 6:10 AM, Peter Alexander <peter.alexander.au 
gmail.com> wrote:

 The Google Code Jam is a programming competition where you have to solve
 algorithmic problems using whatever programming language you like.

 The stats of what programming languages were used in the first round were
 collected:

 http://www.go-hero.net/jam/11/languages

 Some select figures for languages used to solve the first question:

 C++     5032
 Java    2321

 C       532
 Haskell 100
 Clojure 13
 GO      13
 D       5
 Scheme  5

 (In previous 3 years, D had between 2-4 entries for the first question, so
 not much change, despite total contestant counts increasing quite
 dramatically)

 Generally, I believe people tend to use the language they are most familiar
 with, and for people that know more than one language they will choose the
 one that is most expressive. Stability of implementations could also be an
 issue.

 Obviously you can't draw too many conclusions from this alone, but more
 data is always better. Take what you will from it.

I was one of the D users, although I wasn't really worried about competing.
I just wanted to see how D would compare after doing so many programming
contests in Java.
The main thing that frustrated me was that getting input in D wasn't
anywhere near as straightforward as it is in Java. For the first problem,
I'd do something like this in Java:
Scanner in = new Scanner(System.in);
int numTests = in.nextInt();
for(int test = 0; test < numTests; tests++) { //need the test index for output
int numSteps = in.nextInt();
for(; numSteps < 0; numSteps--)
char robot = in.nextChar();
int button = in.nextInt();
//solve the problem!
}
//print the output!
}


In D, that looked like this:
string line;
int num;
stdin.readln(line);
formattedRead(line, "%s", &num);
for(int casen = 0; casen < num; casen++) {

...

In a few places, I could have used stdin.readf instead of
readln/formattedRead, but not many because the number of items within a test
is on the same line as the items.
I could have just been missing something, but something that was trivial in
Java became brittle in D because I had to exactly match the whitespace for
things to work. I suppose I could have read a line and used splitter to
split on whitespace, but that would make me have to watch more state and
would wind up looking like this:
string line;
stdin.readln(line);
auto split = split(line);
int num = to!int(split[0]);
split = split[1..$];

...

Actually... now that I'm looking at that, if I wrote a Scanner-like class
based on this, is there any chance it could go into Phobos? Seems like
between split and to, we could get something much less brittle working.
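
A minimal sketch of what such a Scanner-like helper might look like, built only on split and to as suggested above. The names TokenReader and tokenReader are hypothetical and not part of Phobos. The reader takes any range of lines, so the example feeds it an array of strings instead of stdin, and the input layout is only loosely modelled on the first problem.

```d
import std.array : split;
import std.conv : to;
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

/// Hypothetical token reader (not a Phobos type): hands out
/// whitespace-separated tokens from a range of lines, one at a time.
struct TokenReader(Lines)
{
    private Lines lines;      // remaining input lines
    private string[] tokens;  // unread tokens of the current line

    /// Returns the next token converted to T; throws at end of input.
    T next(T)()
    {
        while (tokens.length == 0)
        {
            if (lines.empty)
                throw new Exception("unexpected end of input");
            // split() without a separator splits on any run of whitespace
            tokens = split(to!string(lines.front));
            lines.popFront();
        }
        string tok = tokens[0];
        tokens = tokens[1 .. $];
        return to!T(tok);
    }
}

/// Convenience constructor so the line-range type is inferred.
auto tokenReader(Lines)(Lines lines)
{
    return TokenReader!Lines(lines);
}

void main()
{
    // Assumed layout: case count, then per case a step count followed
    // by (robot, button) pairs, with arbitrary spacing and line breaks.
    auto input = tokenReader(["2", "  2 O 2   B 1", "1 O 4"]);

    foreach (test; 0 .. input.next!int())
    {
        int sum = 0;
        foreach (step; 0 .. input.next!int())
        {
            string robot = input.next!string();
            sum += input.next!int();
        }
        writeln("Case #", test + 1, ": ", sum);
    }
}
```

Because tokens are pulled one at a time, it does not matter whether the count sits on the same line as the items or on a line of its own, which is the brittleness described above.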

May 08 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Andrew Wiley wrote:
 I was one of the D users, although I wasn't really worried about competing.
 I just wanted to see how D would compare after doing so many programming
 contests in Java.
 The main thing that frustrated me was that getting input in D wasn't
 anywhere near as straightforward as it is in Java. For the first problem,
 I'd do something like this in Java:
 Scanner in = new Scanner(System.in);
 int numTests = in.nextInt();
 for(int test = 0; test < numTests; tests++) { //need the test index for output
 int numSteps = in.nextInt();
 for(; numSteps < 0; numSteps--)
 char robot = in.nextChar();
 int button = in.nextInt();
 //solve the problem!
 }
 //print the output!
 }


Well, I don't like D's readf either (I use scanf, 2-3x faster and better
whitespace handling). That said, you really made my day.
The problem is not that reading input in D is less straightforward than in Java;
the problem is that you are used to Java's way of doing IO (which I pretty much
dislike, but I guess it is a matter of taste).

You do not actually have to bother with string handling at all when doing IO in
C/C++/D.

Reading array of integers:

int[100000] array; //somewhere in static storage, faster
...
scanf("%d",&n);
foreach(ref x;array) scanf("%d",&x);

Or, some heap activity involved, and actually more keystrokes, but some people
like this way:
readf("%s",&n);//read number of items

int[] array=to!(int[])(split(strip(readln())));


How I would have written your example in D.
int numTests; scanf("%d", &numTests);
foreach(test;0..numTests){
    int numSteps; scanf("%d", &numSteps);
    foreach(step;0..numSteps){ //you have a bug in this line of your Java code, introducing a looooong loop
        char robot; scanf("%c", &robot);
        int button; scanf("%d", &button);
        //solve the problem!
    }
    //print the output
}

 In D, that looked like this:
 string line;
 int num;
 stdin.readln(line);
 formattedRead(line, "%s", &num);
 for(int casen = 0; casen < num; casen++) {

 ...

 In a few places, I could have used stdin.readf instead of
 readln/formattedRead, but not many because the number of items within a test
 is on the same line as the items.

That is not a problem at all, you can read the first few elements with readf and
the rest of the line with readln

 I could have just been missing something, but something that was trivial in
 Java became brittle in D because I had to exactly match the whitespace for

I actually think Java's way is brittle. You have to instantiate a class just to
read IO.

 things to work. I suppose I could have read a line and used splitter to
 split on whitespace, but that would make me have to watch more state and
 would wind up looking like this:
 string line;
 stdin.readln(line);
 auto split = split(line);
 int num = to!int(split[0]);
 split = split[1..$];

I don't get this.

 ...

 Actually... now that I'm looking at that, if I wrote a Scanner-like class
 based on this, is there any chance it could go into Phobos? Seems like
 between split and to, we could get something much less brittle working.

No chance, that is not the way D/Phobos works. You do not have a class for
everything that would not need one (just like Phobos does not have a writer class
for output).

However, I agree that Phobos has to provide some better input handling, since using
possibly unsafe C functions is the best way to do it by now. (I think readf is
severely crippled.) I may try to implement a meaningful "read" function.


Timon

May 08 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Whoops, there was a mistake:

Reading array of integers:

int[100000] array; //somewhere in static storage, faster
...
scanf("%d",&n);
foreach(ref x;array[0..n]) scanf("%d",&x); // note the slice


Timon

May 08 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/8/11 3:04 PM, Timon Gehr wrote:
 However I agree that Phobos has to provide some better input handling, since
 using possibly unsafe C functions is the best way to do it by now. (I think
 readf is severely crippled) I may try to implement a meaningful "read" function.

Looking forward to detailed feedback about readf. It was implemented in 
a hurry so definitely it has a long way to go.

Andrei

May 08 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

I'm very happy with using Jesse's interact library for user input:
https://github.com/he-the-great/JPDLibs/tree/cmdln

Last time I've used it I combined it with std.conv since I needed
either a number or a "q" from the user, e.g.:

int input;
auto line = userInput!string("Enter value:");
if (line == "q")
{
    quit();
}
else if (!throws!(ConvException)( { input = to!int(line); } ))  // try converting to int
{
    if (input >= -127 && input <= 127)
    {
        // do something
    }
}

Here throws() is just a custom function that asserts that a delegate throws.

May 08 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

*that checks if a delegate throws and returns true if so*

May 08 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Andrei Alexandrescu wrote:
 On 5/8/11 3:04 PM, Timon Gehr wrote:
 However I agree that Phobos has to provide some better input handling, since
 using possibly unsafe C functions is the best way to do it by now. (I think
 readf is severely crippled) I may try to implement a meaningful "read" function.

 Looking forward to detailed feedback about readf. It was implemented in
 a hurry so definitely it has a long way to go.

 Andrei

What I consider the most important points about readf:

1. Whitespace handling is different from scanf's. It is much stricter and even
feels inconsistent, e.g.:

int a,b;

readf("%s %s",&a,&b);//input "1 2\n" read.
readf("%s %s",&a,&b);//input "1  2\n" read (and a==1 && b==2).

readf("%s",&a);//input "1\n" read. yay.
readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.

readf("%s ",&a);//input "1 \n" read.
readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space (!) is missing.

readf(" %s",&a);//input "1\n" read.
readf("\t%s",&a);//input "1\n": exception is thrown.

readf("%s\n",&a);//input "1\n" read.
readf("%s\n",&a);//input "1 \n": exception is thrown.

readf("%s\t\n",&a);//input "1\t\n" read.
readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any further input.

And some more; I do not remember all of them. Exceptions are most of the time only
as useful as "Enforcement failed".


You (almost?) never want this behavior, even at the points it marginally makes
sense. It would be nice to have an optional whitespace-enforcing version that
_really_ enforces it
(as opposed to the current implementation), but that should not be the default.
And then it should be consistent (also on skipping or exception throwing).

2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its '>>'
more.
   scanf has that problem too, but it is a C function, you _cannot_ expect it to
do any better than that.
   D has variadic template functions that may take ref parameters. It can be done
entirely pointer-free.

3. Nonsense like readf("mooh",&a); cannot be caught at compile time. When/why did
you throw away the idea of static overloads? It would have been a powerful feature,
and very useful for this case. scanf in C/C++ does not have this problem,
because most modern compilers generate warnings for this. But that is making some
functions "more equal than the others".

4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
mistakenly claimed before). I think this is just a quality of implementation
issue, but it is important.
   Especially for programming competitions where there are time limits, you do not
want IO to unnecessarily become a major bottleneck. (Input files can be huge.)
   Other than that, D is WAY the most convenient language I have ever tried to
solve small algorithmic tasks in.

5. Not really readf related: There's writef(ln) and there is write(ln). And then
there is readf. I will provide a proof-of-concept for the read function soon.
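
As an illustration of point 2, here is a rough sketch of a pointer-free interface using a variadic template with ref parameters. The name readInto is made up for this example and is not the proof-of-concept mentioned above. To stay self-contained it parses a string that is already in memory rather than stdin.

```d
import std.array : split;
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical helper (not in Phobos): fills each argument from the
/// successive whitespace-separated tokens of `line`. No pointers needed,
/// because the variadic parameters are taken by reference.
void readInto(Args...)(string line, ref Args args)
{
    auto tokens = split(line); // any run of whitespace separates tokens
    if (tokens.length < Args.length)
        throw new Exception("not enough values in input");

    // Compile-time loop over the argument types; i is a constant index.
    foreach (i, T; Args)
        args[i] = to!T(tokens[i]);
}

void main()
{
    int n;
    double x;
    string name;
    readInto("  3\t2.5   robot\n", n, x, name);
    writeln(n, " ", x, " ", name);
}
```

The target types are deduced from the arguments, so the call site needs neither '&' nor a format string.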


Timon

May 08 2011

Peter Alexander <peter.alexander.au gmail.com> writes:

On 8/05/11 11:57 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 Looking forward to detailed feedback about readf. It was implemented in
 a hurry so definitely it has a long way to go.

 Andrei

 What I consider the most important points about readf:

 1. Whitespace handling is different than scanf. It is much stricter and even
 feels inconsistent, Eg:

std.readf is broken.

http://d.puremagic.com/issues/show_bug.cgi?id=4656

This bug makes it quite difficult to evaluate readf. I just use scanf now.

May 09 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/9/11 2:53 AM, Peter Alexander wrote:
 On 8/05/11 11:57 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 Looking forward to detailed feedback about readf. It was implemented in
 a hurry so definitely it has a long way to go.

 Andrei

 What I consider the most important points about readf:

 1. Whitespace handling is different than scanf. It is much stricter
 and even feels
 inconsistent, Eg:

 std.readf is broken.

 http://d.puremagic.com/issues/show_bug.cgi?id=4656

 This bug makes it quite difficult to evaluate readf. I just use scanf now.

That's not a bug, see my comment in 
http://d.puremagic.com/issues/show_bug.cgi?id=4656. The error message 
_is_ a bug though!

Andrei

May 09 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Andrei Alexandrescu wrote:
 I've implemented readf to be a fair amount more Nazi about whitespace than
 scanf in an attempt to improve its precision. Scanf has been famously difficult
 to use for complex input parsing and validation, and I attribute some of that
 to its laissez-faire attitude toward whitespace. I'd be glad to relax some of
 readf's insistence on precise whitespace handling if there's enough evidence
 that that serves most of our users. I personally believe that the current
 behavior (strict by default, easy to relax) is best.

In my experience readf behavior is not very useful for routine coding tasks that
involve some IO.

If you really need to have very strict requirements about the input format, readf
does not serve you well, because a ' ' still skips all whitespace, a failure to
read leaves the file pointer in an undefined position, etc. All carryovers from
scanf. I never want to use scanf when there is a valid chance of invalid input. As
far as I can see, neither readf nor scanf can be used for sophisticated input
validation or parsing of non-trivial input. You have to do it manually. How does
readf make things better with strict(er) whitespace handling?

What behavior is by design, what behavior is caused by bugs? Can you give a
real-world example where readf design clearly beats scanf design? (as it is the
default it should be almost always better, but I fail to see it)

Apart from that, what about the other points I mentioned?

Timon

May 09 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/9/11 12:43 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 I've implemented readf to be a fair amount more Nazi about whitespace than
 scanf in an attempt to improve its precision. Scanf has been famously difficult
 to use for complex input parsing and validation, and I attribute some of that
 to its laissez-faire attitude toward whitespace. I'd be glad to relax some of
 readf's insistence on precise whitespace handling if there's enough evidence
 that that serves most of our users. I personally believe that the current
 behavior (strict by default, easy to relax) is best.

 In my experience readf behavior is not very useful for routine coding tasks
 that involve some IO.

If this assessment would be reverted by simply inserting spaces in the 
formatting string, I'd be hard pressed to agree.

I do agree that readf behavior is surprising if you expect 100% scanf 
compatibility. This is intentional and beneficial as I believe scanf is 
wanting in more than one way.

 If you really need to have very strict requirements about the input format,
 readf does not serve you well, because a ' ' still skips all whitespace, a
 failure to read leaves the file pointer in an undefined position etc.

That is not an issue (albeit some of the underlying machinery is not yet 
implemented). If you want to skip at most one space but no other 
whitespace, insert "%*1[ ]" in the formatting string. To skip any number 
of spaces, insert "%*[ ]". Skipping exactly one space is not supported 
at the formatting string level, but you can always read one character 
with %c and then enforce the character is ' '. I agree that that could 
be improved. What's needed is a specification for the minimum number of 
characters read, e.g. "%*1.1[ ]" for scanning and skipping exactly one 
space.

In contrast, having e.g. %d skipping all whitespace is a losing 
proposition if you want to do precision parsing. This is because that 
behavior can't be disabled. That's why I excised it.

Reading is greedy. Failure to read leaves the pointer in a defined 
position, but we need to improve documentation.

 All carryovers from
 scanf. I never want to use scanf when there is a valid chance of invalid input.

I agree, but that's a problem with scanf that should and could be fixed. 
There's almost always a chance of invalid input.

 As far as I can see, neither readf nor scanf can be used for sophisticated
 input validation or parsing of non-trivial input. You have to do it manually.
 How does readf make things better with strict(er) whitespace handling?

Far as I can see, implementing Posix %[charset] extension would make 
readf a powerful one-stop shop for parsing input. Of course its speed 
needs to be up to snuff too. And of course its specification can be 
improved, which is where your input is very valuable.

 What behavior is by design, what behavior is caused by bugs? Can you give a
 real-world example where readf design clearly beats scanf design? (as it is the
 default it should be almost always better, but I fail to see it)

 Apart from that, what about the other points I mentioned?

I answered all of these in my other, longer post.


Andrei

May 09 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 5/8/11 5:57 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 On 5/8/11 3:04 PM, Timon Gehr wrote:
 However I agree that Phobos has to provide some better input handling, since
 using possibly unsafe C functions is the best way to do it by now. (I think
 readf is severely crippled) I may try to implement a meaningful "read" function.

 Looking forward to detailed feedback about readf. It was implemented in
 a hurry so definitely it has a long way to go.

 Andrei

 What I consider the most important points about readf:

Thanks very much for providing detailed feedback.

 1. Whitespace handling is different than scanf. It is much stricter and even
 feels inconsistent, Eg:

 int a,b;

 readf("%s %s",&a,&b);//input "1 2\n" read.
 readf("%s %s",&a,&b);//input "1  2\n" read (and a==1&&  b==2).

So far so good. By design one space in readf means "skip all whitespace".
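
To illustrate the rule stated here, a small sketch using formattedRead on in-memory strings (the same call already used earlier in this thread together with readln). readPair is a hypothetical helper name. The sketch assumes, as I understand the std.format documentation, that whitespace in the format string matches zero or more whitespace characters in the input.

```d
import std.format : formattedRead;
import std.stdio : writeln;

/// Hypothetical helper: parses two integers separated by whitespace.
int[2] readPair(string input)
{
    int a, b;
    // The single space between the two specifiers stands for
    // "skip any run of whitespace", not "exactly one space".
    formattedRead(input, "%s %s", &a, &b);
    return [a, b];
}

void main()
{
    // One space, two spaces, and a newline plus tab are all accepted.
    writeln(readPair("1 2"));
    writeln(readPair("1  2"));
    writeln(readPair("1\n\t2"));
}
```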

 readf("%s",&a);//input "1\n" read. yay.
 readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.

I'm not seeing skipping in my tests; I do see an exception being thrown. 
Here's how I test:

import std.stdio;
void main()
{
     int a, b;
     readf("%s",&a);
     assert(a == 1);
     readf("%s",&b);
     assert(b == 2);
}

dmd ./test && echo '1\n 2' | ./test

The first input is read into 'a' and reading stops just at the \n. Next 
you're trying to read "\n 2" into b, which fails due to the strict 
whitespace handling. To fix this, you'd need to insert a space before 
the second "%s".

I'm not hooked on this strict whitespace handling, but I think it makes 
a lot of sense particularly when you want to make sure the input looks 
exactly as you think it should. With scanf you can't have precise 
parsing even if you wanted; with readf all you need is to insert a space.

Precision is important. For example, Hive uses a \t for field separation 
when streaming to a file. It is very important to figure that you have 
one tab there versus two (two means a NULL field was in between).

 readf("%s ",&a);//input "1 \n" read.
 readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space (!) is missing.

On my machine this passes:

import std.stdio;
void main()
{
     int a, b;
     readf("%s ",&a);
     assert(a == 1);
     readf("%s ",&b);
     assert(b == 2);
}

dmd ./test && echo '1\n 2' | ./test

The explanation is that, again, a space means "skip all whitespace". So 
the first space eats the "\n " and the second space eats the final "\n" 
in the input (produced by echo). Please adjust this example so it unduly 
fails.

 readf(" %s",&a);//input "1\n" read.
 readf("\t%s",&a);//input "1\n": exception is thrown.

A "\t" in the formatting string for readf simply requires a tab. To skip 
over at most one tab, do this:

readf("%*1[\t]%s",&a);

That instructs readf to read, but not store, a string consisting of at 
most one tab. (To skip multiple tabs drop the "1".) This functionality 
is not yet implemented.

 readf("%s\n",&a);//input "1\n" read.
 readf("%s\n",&a);//input "1 \n": exception is thrown.

That is as expected - if you specify \n readf expects a \n.

 readf("%s\t\n",&a);//input "1\t\n" read.
 readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
 further input.

My testbed:

import std.stdio;

void main()
{
     int a, b;
     readf("%s\t\n",&a);
     assert(a == 1);
     readf("%s \n",&b);
     assert(b == 2);
}

dmd ./test && echo "1\t\n2 " | ./test

It fails because it can't find the last \n. That's a bug.

 And some more, I do not remember all of them. Exceptions are most of the time
 only as useful as "Enforcement failed".


 You (almost?) never want this behavior, even at the points it marginally makes
 sense. It would be nice to have an optional whitespace-enforcing version that
 _really_ enforces it
 (as opposed to the current implementation), but that should not be the default.
 And then it should be consistent (also on skipping or exception throwing).

Except for one bug and one lacking implementation artifact, I find the 
current behavior consistent with a strict approach to whitespace handling.

 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its
 '>>' more.
     scanf has that problem too, but it is a C function, you _cannot_ expect it
 to do any better than that.
     D has variadic template functions that may take ref parameters. It can be
 done entirely pointer-free.

When I implemented readf, ref variadic arguments weren't working. I'd be 
hesitant to change it right now as it does not improve actual 
functionality and disrupts current uses. But I agree ideally it should 
accept parameters by reference.

 3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why
 did you throw away the idea of static overloads? It would have been a powerful
 feature, and very useful for this case. scanf in C/C++ does not have this problem,
 because most modern compilers generate warnings for this. But that is making
 some functions "more equal than the others"

One early version I had was doing that and spelled

readf!"format string"(arguments);

Unfortunately, sometimes runtime-computed formatting strings are needed 
and useful (see the recent std.log discussion...) so I decided to go 
with dynamic formatting for now. Once we get that right, providing an 
optional compile-time-checked formatting function shouldn't be too 
difficult with CTFE.
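
A rough sketch of how such a compile-time check could work with CTFE. Both countSpecs and checkedRead are hypothetical names, and the specifier counting is deliberately simplified: it only knows that "%%" is a literal percent sign and that "%*..." reads without storing. It is not how readf is implemented.

```d
import std.stdio : writeln;

/// Counts the conversion specifiers that store a value.
/// Simplified on purpose; plain function so it can run under CTFE.
size_t countSpecs(string fmt)
{
    size_t n = 0;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (i + 1 < fmt.length && (fmt[i + 1] == '%' || fmt[i + 1] == '*'))
        {
            ++i; // literal percent or non-storing specifier: skip it
            continue;
        }
        ++n;
    }
    return n;
}

/// Hypothetical front end: the format string is a template argument,
/// so a mismatch with the argument count is a compile-time error.
void checkedRead(string fmt, Args...)(ref Args args)
{
    static assert(countSpecs(fmt) == Args.length,
        "format string and argument count do not match");
    // A real implementation would now parse the input according to fmt.
}

void main()
{
    int a, b;
    checkedRead!"%s %s"(a, b); // two specifiers, two arguments: compiles

    // The "mooh" case from point 3 is rejected before the program runs.
    static assert(!__traits(compiles, checkedRead!"mooh"(a)));

    writeln(countSpecs("%s %*[ ]%s"));
}
```

A runtime-format overload could live alongside this one, so computed format strings would still be possible.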

 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
 mistakenly claimed before). I think this is just a quality of implementation
 issue, but it is important.

I agree. I'm amazed readf is not slower actually. It uses by character 
file iteration, by far the slowest (and most embarrassing) code I wrote 
in Phobos: each character read entails one call to getc() to fetch the 
character, one call to ungetc() to restore the stream position, and 
finally one more call to getc() to move forward. The code is correct but 
very slow. Some C APIs provide undocumented means to peek at the next 
character in the stream without actually advancing the stream, which is 
what we need. I know how to do it on most Unixen and Walter knows how to 
do it on his own cstdlib implementation. We didn't have the time yet, 
and I'm glad the matter is under spotlight.

     Especially for programming competitions where there are time limits, you
 do not want IO to unnecessarily become a major bottleneck. (Input files can be huge)

Agreed.

     Other than that, D is WAY the most convenient language I have ever tried to
 solve small algorithmic tasks in.
 5. Not really readf related: There's writef(ln) and there is write(ln). And
 then there is readf. I will provide a proof-of-concept for the read function soon.

Good idea. I suggest you provide a template read(T)() that mimics the 
functionality of Java's nextInt, nextFloat etc:

auto a = stdin.next!int();
auto b = stdin.next!double();
auto s = stdin.next!string("\n"); // read a string up to \n
...


Andrei

May 09 2011

Timon Gehr <timon.gehr gmx.ch> writes:

Sorry, I overlooked this post.

Andrei Alexandrescu wrote:
 On 5/8/11 5:57 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 On 5/8/11 3:04 PM, Timon Gehr wrote:
 However I agree that Phobos has to provide some better input handling, since
 using possibly unsafe C functions is the best way to do it by now. (I think
 readf is severely crippled) I may try to implement a meaningful "read" function.

 Looking forward to detailed feedback about readf. It was implemented in
 a hurry so definitely it has a long way to go.

 Andrei

 What I consider the most important points about readf:

Thanks very much for providing detailed feedback.

 1. Whitespace handling is different than scanf. It is much stricter and even
 feels inconsistent, Eg:

 int a,b;

 readf("%s %s",&a,&b);//input "1 2\n" read.
 readf("%s %s",&a,&b);//input "1  2\n" read (and a==1&&  b==2).

 So far so good. By design one space in readf means "skip all whitespace".

 readf("%s",&a);//input "1\n" read. yay.
 readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.

 I'm not seeing skipping in my tests; I do see an exception being thrown.
 Here's how I test:

 import std.stdio;
 void main()
 {
      int a, b;
      readf("%s",&a);
      assert(a == 1);
      readf("%s",&b);
      assert(b == 2);
 }

 dmd ./test && echo '1\n 2' | ./test

I tested inputting manually in terminal. The exception is thrown only when I
provide an EOF. Seems like the input is not being skipped after all, but readf
does not return until there is an EOF.

 I'm not hooked on this strict whitespace handling, but I think it makes
 a lot of sense particularly when you want to make sure the input looks
 exactly as you think it should. With scanf you can't have precise
 parsing even if you wanted; with readf all you need is to insert a space.

 Precision is important. For example, Hive uses a \t for field separation
 when streaming to a file. It is very important to figure that you have
 one tab there versus two (two means a NULL field was in between).

It should be possible to do that with scanf using %[] if I'm not mistaken.

 readf("%s ",&a);//input "1 \n" read.
 readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space (!) is missing.

 On my machine this passes:

 import std.stdio;
 void main()
 {
      int a, b;
      readf("%s ",&a);
      assert(a == 1);
      readf("%s ",&b);
      assert(b == 2);
 }

 dmd ./test && echo '1\n 2' | ./test

 The explanation is that, again, a space means "skip all whitespace". So
 the first space eats the "\n " and the second space eats the final "\n"
 in the input (produced by echo). Please adjust this example so it unduly
 fails.

Again, misinterpretation on my side. Typing into the terminal expects new input
until a non-whitespace character is inserted. Should be fine, but can be
surprising.

 readf(" %s",&a);//input "1\n" read.
 readf("\t%s",&a);//input "1\n": exception is thrown.

 A "\t" in the formatting string for readf simply requires a tab. To skip
 over at most one tab, do this:

 readf("%*1[\t]%s",&a);

 That instructs readf to read, but not store, a string consisting of at
 most one tab. (To skip multiple tabs drop the "1".) This functionality
 is not yet implemented.

I did not know it would ever be! That removes many of my concerns. (and the
'read' function removes the rest)

 readf("%s\n",&a);//input "1\n" read.
 readf("%s\n",&a);//input "1 \n": exception is thrown.

 That is as expected - if you specify \n readf expects a \n.

 readf("%s\t\n",&a);//input "1\t\n" read.
 readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
 further input.

 My testbed:

 import std.stdio;

 void main()
 {
      int a, b;
      readf("%s\t\n",&a);
      assert(a == 1);
      readf("%s \n",&b);
      assert(b == 2);
 }

 dmd ./test && echo "1\t\n2 " | ./test

 It fails because it can't find the last \n. That's a bug.

At least I found one. =)

 And some more, I do not remember all of them. Exceptions are most of the time only
 as useful as "Enforcement failed".


 You (almost?) never want this behavior, even at the points it marginally makes
 sense. It would be nice to have an optional whitespace-enforcing version that
 _really_ enforces it
 (as opposed to the current implementation), but that should not be the default.
 And then it should be consistent (also on skipping or exception throwing).


 Except for one bug and one lacking implementation artifact, I find the
 current behavior consistent with a strict approach to whitespace handling.

Agreed. Thanks for your explanations!
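
The whitespace rule discussed above can be tried without a terminal by feeding the same kind of format string to std.format.formattedRead on an in-memory string. This is only a minimal sketch, assuming formattedRead follows the same rule as readf (a space in the format skips any run of whitespace, while "%s" alone skips nothing); the helper name readPair is made up for illustration.

```d
import std.format : formattedRead;

// Hypothetical helper: parse two integers separated by arbitrary whitespace.
int[2] readPair(string input)
{
    int a, b;
    // Each space in the format consumes any run of whitespace in the input,
    // newlines included; "%s" on its own does not skip anything.
    formattedRead(input, "%s %s ", &a, &b);
    return [a, b];
}

void main()
{
    assert(readPair("1\n 2\n") == [1, 2]);
    assert(readPair("1\t2\n") == [1, 2]);
}
```

Testing on a string sidesteps the terminal effect mentioned above, where readf keeps waiting for input until it sees a non-whitespace character or EOF.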

 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its '>>' more.
     scanf has that problem too, but it is a C function, you _cannot_ expect it to
 do any better than that.
     D has variadic template functions that may take ref parameters. It can be done
 entirely pointer-free.

 When I implemented readf, ref variadic arguments weren't working. I'd be
 hesitant to change it right now as it does not improve actual
 functionality and disrupts current uses. But I agree ideally it should
 accept parameters by reference.

We can have both, since it will never be possible to read in raw pointers:

import std.stdio;
import std.conv;

// Nesting this inside the containsPointers template removes the eponymous
// template trick. Is this a bug?
private bool containsPointersImpl(T...)(){
		foreach(t;T) static if(is(t U:U*)) return true;
		return false;
}

template containsPointers(T...){enum containsPointers=containsPointersImpl!T();}

private bool onlyPointersImpl(T...)(){
		foreach(t;T) static if(!is(t U:U*)) return false;
		return true;
}

template onlyPointers(T...){enum onlyPointers=onlyPointersImpl!T();}


private string _readfImpl(int len){
	string res="return std.stdio.stdin.readf(format,";
	foreach(t;0..len) res~="&args["~to!string(t)~"], ";
	res~=");";
	return res;
}

int _readf(T...)(string format, ref T args)
if(!containsPointers!T){mixin(_readfImpl(T.length));}

//classic definition for backwards compatibility.
int _readf(T...)(string format, T args) if(onlyPointers!T){
	return std.stdio.stdin.readf(format, args);
}

void main(){
	int a;
	_readf(" %s",&a);
	writeln(a);
	_readf(" %s",a);
	writeln(a);
}



 3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why did
 you throw away the idea of static overloads? It would have been a powerful feature,
 and very useful for this case. scanf in C/C++ does not have this problem,
 because most modern compilers generate warnings for this. But that is making some
 functions "more equal than the others"

 One early version I had was doing that and spelled

 readf!"format string"(arguments);

 Unfortunately, sometimes runtime-computed formatting strings are needed
 and useful (see the recent std.log discussion...) so I decided to go
 with dynamic formatting for now. Once we get that right, providing an
 optional compile-time-checked formatting function shouldn't be too
 difficult with CTFE.

The problem I see here is that the dynamic version still cannot be checked when
passed a statically known format string.

Why did you drop the idea of allowing something like

int readf(T...)(static string format, T args) ?


 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
 mistakenly claimed before). I think this is just a quality of implementation
 issue, but it is important.

 I agree. I'm amazed readf is not slower actually. It uses by character
 file iteration, by far the slowest (and most embarrassing) code I wrote
 in Phobos: each character read entails one call to getc() to fetch the
 character, one call to ungetc() to restore the stream position, and
 finally one more call to getc() to move forward. The code is correct but
 very slow. Some C APIs provide undocumented means to peek at the next
 character in the stream without actually advancing the stream, which is
 what we need. I know how to do it on most Unixen and Walter knows how to
 do it on his own cstdlib implementation. We didn't have the time yet,
 and I'm glad the matter is under spotlight.

     Especially for programming competitions where there are time limits, you do not
 want IO to unnecessarily become a major bottleneck. (Input files can be huge)

 Agreed.

     Other than that, D is WAY the most convenient language I have ever tried to
 solve small algorithmic tasks in.
 5. Not really readf related: There's writef(ln) and there is write(ln). And then
 there is readf. I will provide a proof-of-concept for the read function soon.

 Good idea. I suggest you provide a template read(T)() that mimics the
 functionality of Java's nextInt, nextFloat etc:

 auto a = stdin.next!int();
 auto b = stdin.next!double();
 auto s = stdin.next!string("\n"); // read a string up to \n
 ...


 Andrei


Yes, I think it should support:

auto a = read!int;
auto b = read!double;
auto s = read!string("\n"); // this could be an overload on immutability;
                            // an alternative would be read!(string,"\n"). I do not know.

auto x = read!(int[])(50); // read an array of 50 integers separated by whitespace
auto y = read!(int[],",")(50); // read an array of 50 integers separated by commas
auto z = read!(int[],", ")(50); // read an array of 50 integers separated by commas and whitespace

Plus the same for every type that can be to!type(string)'d.

But also: read should replace readf wherever possible in the following forms:

int a; double b; string s;
read(a,b,s); // reads whitespace-separated a, b and s in turn
             // (the delimiter could be changed by a template argument or so)

char[] c=new char[1000];
read(c); // only relocates c if the number of read characters exceeds 1000.

One problem I see: An evildoer could provide a huge input, filling up the whole
RAM. I think this vulnerability is also present in readln. Any ideas?


Non-string arrays are handled this way:

int[100] arr;
read(arr); // reads 100 integers and stores in arr

read(arr[0..20]); //reads 20 integers into the first 20 slots of arr

int[] arr = new int[100];
read(arr); //ditto

Rationale: reading input should not /require/ heap activity.

The read function would cover all cases where no strict whitespace handling is
required, and readf would take the rest! I think that would be a very nice
solution.
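
As a rough illustration of the read!T idea, here is a minimal sketch built on std.format.formattedRead. It works on an in-memory string rather than stdin so that it can be tried in isolation; the name readValue is hypothetical and nothing like it is claimed to exist in Phobos. It is meant for numeric types only, since a string target would need delimiter handling of its own.

```d
import std.format : formattedRead;

// Hypothetical helper (not part of Phobos): take the next whitespace-delimited
// value of type T off the front of `input`, advancing `input` past it.
T readValue(T)(ref string input)
{
    T value;
    // The leading space in the format skips any whitespace before the value.
    formattedRead(input, " %s", &value);
    return value;
}

void main()
{
    string input = "3 4.5\n 7";
    assert(readValue!int(input) == 3);
    assert(readValue!double(input) == 4.5);
    assert(readValue!int(input) == 7);
}
```

The caller never writes a format string and never tracks whitespace, which is the property the proposed read function is after.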


Timon

May 09 2011

Andrew Wiley <wiley.andrew.j gmail.com> writes:

On Sun, May 8, 2011 at 3:04 PM, Timon Gehr <timon.gehr gmx.ch> wrote:

 Andrew Wiley wrote:
 I was one of the D users, although I wasn't really worried about competing.
 I just wanted to see how D would compare after doing so many programming
 contests in Java.
 The main thing that frustrated me was that getting input in D wasn't
 anywhere near as straightforward as it is in Java. For the first problem,
 I'd do something like this in Java:
 Scanner in = new Scanner(System.in);
 int numTests = in.nextInt();
 for(int test = 0; test < numTests; test++) { //need the test index for output
 int numSteps = in.nextInt();
 for(; numSteps < 0; numSteps--) {
 char robot = in.nextChar();
 int button = in.nextInt();
 //solve the problem!
 }
 //print the output!
 }


 Well, I don't like D's readf either (I use scanf, 2-3x faster and better
 whitespace handling). That said, you really made my day.
 The problem is not that reading input in D is less straightforward than in Java,
 the problem is, that you are used to Java's way of doing IO. (which I pretty much
 dislike, I guess it is a matter of taste)

 You do not actually have to bother with string handling at all when doing IO in
 C/C++/D.

 Reading array of integers:

 int[100000] array; //somewhere in static storage, faster
 ...
 scanf("%d",&n);
 foreach(ref x;array) scanf("%d",&x);

What bothers me about that code is that you had to write a string to
represent something that should be implicit. It may just be that
formattedRead is more strict than scanf, but I had problems getting
whitespace to behave properly with format code strings.
Plus, when you just type %d, what if I want a long? What if I want an
infinite precision integer? These things aren't solved by C function calls,
and trying to come up with a string format code for every possible input
would needlessly complicate things.

 Or, some heap activity involved, and actually more keystrokes, but some people
 like this way:
 readf("%s",&n);//read number of items

 int[] array=to!(int[])(split(strip(readln())));


 How I would have written your example in D.
 int numTests; scanf("%d", &numTests);
 foreach(test;0..numTests){
    int numSteps; scanf("%d", &numSteps);
    foreach(step;0..numSteps){ //you have a bug in this line of your Java code introducing a looooong loop
        char robot; scanf("%c", &robot);
        int button; scanf("%d", &button);
         //solve the problem!
    }
    //print the output
 }

As a note, I recently discovered while running through some D1 code that %c
isn't a format code recognized by the D2 formatting functions. I realize
this is C though.

 In D, that looked like this:
 string line;
 int num;
 stdin.readln(line);
 formattedRead(line, "%s", &num);
 for(int casen = 0; casen < num; casen++) {

 ...

 In a few places, I could have used stdin.readf instead of
 readln/formattedRead, but not many because the number of items within a test
 is on the same line as the items.

 That is not a problem at all, you can read the first few elements with readf and
 the rest of the line with readln


The documentation seems to imply that readf reads an entire line. Was I just
misunderstanding it?



 I could have just been missing something, but something that was trivial in
 Java became brittle in D because I had to exactly match the whitespace for

 I actually think Java's way is brittle. You have to instantiate a class just to
 read IO.

That doesn't make it brittle, that makes it heavy and/or overkill. What's
brittle is when I have to exactly match whitespace, write strings for things
that should be implicit, and keep track of more state than is strictly
necessary. Java's Scanner is nice because you ask for an integer and get an
integer, and as long as you ask for the right things in the right order, you
don't have to track any state whatsoever. Keeping track of where you are in
the input stream is something better left to the code doing the reading
rather than the user.
Your way doesn't involve state, but it also doesn't generalize to other
types of streams.


 things to work. I suppose I could have read a line and used splitter to
 split on whitespace, but that would make me have to watch more state and
 would wind up looking like this:
 string line;
 stdin.readln(line);
 auto split = split(line);
 int num = to!int(split[0]);
 split = split[1..$];

 I don't get this.

It's simple. I have a line that looks like this:
4 3 2 67 5
The first number is the number of numbers that follow, and the code looks
like this:
string line = "4 3 2 67 5";
auto parts = split(line); // named parts so it does not shadow split()
int num = to!int(parts[0]);
parts = parts[1..$];
foreach(index; 0..num) {
    int cur = to!int(parts[0]);
    parts = parts[1..$];
    // do things
}

I realize this is just a more complicated version of your heap code above,
but suppose I needed to read an integer, a string, and a floating point
number for each item. This scales up quite nicely to that sort of thing.


 ...

 Actually... now that I'm looking at that, if I wrote a Scanner-like class
 based on this, is there any chance it could go into Phobos? Seems like
 between split and to, we could get something much less brittle working.

 No chance, that is not the way D/Phobos works. You do not have a class for
 everything that would not need one. (just like Phobos does not have a writer
 class for output)

Yes, if I had thought a bit more, I wouldn't have said class. This could
just be implemented as a few simple methods for reading primitives from
string ranges (or character ranges, actually, as that would be more
general). I would expect something like this to appear with the stream API
that we'll hopefully build at some point. A class would probably be
overkill.


 However I agree that Phobos has to provide some better input handling, since
 using possibly unsafe C functions is the best way to do it by now. (I think
 readf is severely crippled) I may try to implement a meaningful "read" function.


I think that input handling like this should be built on top of a stream
API, and because that API isn't here yet, improving input may be premature.
Or it may be too useful to wait.

May 08 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

 Andrew Wiley wrote:
 I was one of the D users, although I wasn't really worried about
 competing. I just wanted to see how D would compare after doing so many
 programming contests in Java.
 The main thing that frustrated me was that getting input in D wasn't
 anywhere near as straightforward as it is in Java. For the first problem,
 I'd do something like this in Java:
 Scanner in = new Scanner(System.in);
 int numTests = in.nextInt();
 for(int test = 0; test < numTests; test++) { //need the test index for output
 int numSteps = in.nextInt();
 for(; numSteps < 0; numSteps--) {
 char robot = in.nextChar();
 int button = in.nextInt();
 //solve the problem!
 }
 //print the output!
 }

 
 Well, I don't like D's readf either (I use scanf, 2-3x faster and better
 whitespace handling). That said, you really made my day.
 The problem is not that reading input in D is less straightforward than in
 Java, the problem is, that you are used to Java's way of doing IO. (which
 I pretty much dislike, I guess it is a matter of taste)
 
 You do not actually have to bother with string handling at all when doing
 IO in C/C++/D.
 
 Reading array of integers:
 
 int[100000] array; //somewhere in static storage, faster
 ...
 scanf("%d",&n);
 foreach(ref x;array) scanf("%d",&x);
 
 Or, some heap activity involved, and actually more keystrokes, but some
 people like this way:
 readf("%s",&n);//read number of items
 
 int[] array=to!(int[])(split(strip(readln())));
 
 
 How I would have written your example in D.
 int numTests; scanf("%d", &numTests);
 foreach(test;0..numTests){
     int numSteps; scanf("%d", &numSteps);
     foreach(step;0..numSteps){ //you have a bug in this line of your Java code introducing a looooong loop
         char robot; scanf("%c", &robot);
         int button; scanf("%d", &button);
         //solve the problem!
     }
     //print the output
 }
 
 In D, that looked like this:
 string line;
 int num;
 stdin.readln(line);
 formattedRead(line, "%s", &num);
 for(int casen = 0; casen < num; casen++) {
 
 ...
 
 In a few places, I could have used stdin.readf instead of
 readln/formattedRead, but not many because the number of items within a
 test is on the same line as the items.

 
 That is not a problem at all, you can read the first few elements with
 readf and the rest of the line with readln
 
 I could have just been missing something, but something that was trivial
 in Java became brittle in D because I had to exactly match the
 whitespace for

 
 I actually think Java's way is brittle. You have to instantiate a class
 just to read IO.
 
 things to work. I suppose I could have read a line and used splitter to
 split on whitespace, but that would make me have to watch more state and
 would wind up looking like this:
 string line;
 stdin.readln(line);
 auto split = split(line);
 int num = to!int(split[0]);
 split = split[1..$];

 
 I don't get this.
 
 ...
 
 Actually... now that I'm looking at that, if I wrote a Scanner-like class
 based on this, is there any chance it could go into Phobos? Seems like
 between split and to, we could get something much less brittle working.

 
 No chance, that is not the way D/Phobos works. You do not have a class for
 everything that would not need one. (just like Phobos does not have a
 writer class for output)
 
 However I agree that Phobos has to provide some better input handling,
 since using possibly unsafe C functions is the best way to do it by now.
 (I think readf is severely crippled) I may try to implement a meaningful
 "read" function.

stdin is already a struct in D. To do it in a more Java-like manner would 
likely involve having a templated read function which is templated on the type 
that you want to get out of stdin next. Essentially, you'd do something like 
std.conv.parse directly on stdin by having it as part of std.stdio.File.

Now, personally, I just always read in the whole line and then use 
std.conv.parse on it. I'm not sure if that actually costs you anything in 
terms of functionality, though it might be possible to implement a templated 
read function on std.stdio.File more efficiently. And using parse like that, 
you can get much friendlier I/O which is closer to what you'd get with Scanner 
in Java.
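
A minimal sketch of that read-the-line-then-parse approach, assuming the input line starts with a count followed by that many integers (the layout described earlier in the thread). The helper name parseCounted is made up for illustration.

```d
import std.conv : parse;
import std.string : stripLeft;

// Hypothetical helper: the first number on the line says how many follow.
int[] parseCounted(string line)
{
    int[] items;
    immutable count = parse!int(line); // parse advances `line` past the digits
    foreach (i; 0 .. count)
    {
        line = line.stripLeft(); // parse does not skip leading whitespace itself
        items ~= parse!int(line);
    }
    return items;
}

void main()
{
    assert(parseCounted("4 3 2 67 5") == [3, 2, 67, 5]);
}
```

Because parse takes its source by reference and consumes what it reads, the only state the caller carries is the remaining slice of the line.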

- Jonathan M Davis

May 08 2011

bearophile <bearophileHUGS lycos.com> writes:

Andrew Wiley:

 The main thing that frustrated me was that getting input in D wasn't
 anywhere near as straightforward as it is in Java. For the first problem,

I have tried to implement a D solution to the first problem, because its input
is a bit more complex. I have used C++ code written by the winner as a starting
point. After several failed D versions (this is BAD for D2/Phobos), I've
written a Python prototype and then I have translated it to D2:

import std.stdio, std.math, std.conv, std.string, std.array, std.algorithm;

auto next(R)(ref R range) {
   auto result = range.front();
   range.popFront();
   return result;
}

void main() {
    auto fin = File("input.txt");
    auto fout = File("output.txt", "w");
    foreach (i; 0 .. to!int(fin.readln().strip())) {
        int[2] lastP = 1;
        int[2] lastT = 0;
        int t = 0;
        auto parts = splitter(fin.readln().strip(), " ");
        foreach (_; 0 .. to!int(next(parts))) {
            string s = next(parts);
            int q = to!int(next(parts));
            int id = cast(int)(s == "B");
            t = max(t, abs(q - lastP[id]) + lastT[id]) + 1;
            lastP[id] = q;
            lastT[id] = t;
        }

    }
}


Three problems I've found in translating the prototype:

- A next() function/method is missing, but I needed it, so I had to define
it, to keep the code from becoming hairy and much less readable.


- to!int expects a stripped string. In my code I am never sure to have a stripped
string coming from input, so I always have to add a strip(), this is dumb:
foreach (i; 0 .. to!int(fin.readln().strip())) {
==>
foreach (i; 0 .. to!int(fin.readln())) {


- std.algorithm.splitter() doesn't default to splitting on whitespace as
std.string.split() does. This is bad because in this program I need to add a
strip() and in general it's bad because if there are two spaces, or a newline,
it causes a mess, so I'd like a new overload of splitter() that acts as split():
auto parts = splitter(fin.readln().strip(), " ");
==>
auto parts = splitter(fin.readln());
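
Until such an overload exists, the eager split() already behaves this way. A minimal sketch, assuming an eagerly allocated array of tokens is acceptable; the helper name toInts is made up for illustration.

```d
import std.array : split;
import std.conv : to;

// Hypothetical helper: convert every whitespace-separated token to int.
int[] toInts(string line)
{
    int[] result;
    // With no separator argument, split() breaks on runs of whitespace,
    // so doubled spaces, tabs and the trailing newline need no strip().
    foreach (token; line.split())
        result ~= to!int(token);
    return result;
}

void main()
{
    assert(toInts("4  3\t2 67 5\n") == [4, 3, 2, 67, 5]);
}
```

The cost compared with a lazy splitter is one array allocation per line.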

Bye,
bearophile

May 08 2011
