
digitalmars.D - stdio performance in tango, stdlib, and perl

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
I've run a couple of simple tests comparing Perl, D's stdlib (the coming
release), and Tango.

First, I realize I should make an account on dsource.org and post the 
following there, but I'll mention here that it's quite disappointing 
that Tango's idiomatic method of reading a line from the console 
(Cin.nextLine(line) unless I missed something) chose to chop the newline 
automatically. The Perl book spends half a page or so explaining why 
it's _good_ that the newline is included in the line, and I've been 
thankful for that on numerous occasions when writing Perl. Please put 
the newline back in the line.
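Andrei's argument for keeping the newline can be made concrete. With the newline retained, echoing a line is just write(line), and the caller can still tell whether the last line of input was terminated. Below is a minimal C++ sketch of a readln-style wrapper over std::getline; read_line_keep_newline and lines_keep_newline are hypothetical helper names, not part of D's std.stdio or Tango.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical readln-style helper: reads one line into `line` and keeps the
// trailing '\n' whenever the input actually had one.
// Returns false only when nothing at all could be read.
bool read_line_keep_newline(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    // std::getline stops at the first '\n', so none can be left in `line`.
    assert(line.find('\n') == std::string::npos);
    // getline consumes and discards the '\n'. It sets eofbit only when it
    // stopped at end of input rather than at a '\n', so a clear eofbit
    // means a newline was consumed and belongs at the end of the line.
    if (!in.eof()) line += '\n';
    return true;
}

// Usage helper (also hypothetical): splits text into lines, newlines kept.
std::vector<std::string> lines_keep_newline(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (read_line_keep_newline(in, line)) lines.push_back(line);
    return lines;
}
```

For the input "a\n\nlast" this yields "a\n", "\n" and "last": the missing newline on the final line stays visible, which a chomping reader followed by an unconditional newline cannot reproduce.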

Anyhow, here's the code. The D up-and-coming stdio version:

import std.stdio;
void main() {
   char[] line;
   while (readln(line)) {
     write(line);
   }
}

The Tango version:

import tango.io.Console;
void main() {
   char[] line;
   while (Cin.nextLine(line)) {
     Cout(line).newline;
   }
}

(The .newline adds back the information that nextLine promptly lost, 
sigh.) I'm not sure whether this is the idiomatic way of reading and 
writing lines in Tango, but tango.io.Stdout seems to say so: "If you 
don't need formatted output or unicode translation, consider using the 
module tango.io.Console directly." - which suggests that Console would 
be the most primitive stdio library.

The Perl version:
#!/usr/bin/env perl
while (<>) {
   print;
}

All programs operate in the same exact boring way: read a line from 
stdin, print it, lather, rinse, repeat.

I passed a 31 MB text file (containing a dictionary that I'm using in my 
research) through each of the programs above. The output was set to 
/dev/null. I've run the same program multiple times before the actual
test, so everything is cached and the process becomes 
computationally-bound. Here are the results summed for 10 consecutive 
runs (averaged over 5 epochs):

13.9s		Tango
6.6s		Perl
5.0s		std.stdio
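The figures above combine two levels of repetition. One reading of "summed for 10 consecutive runs (averaged over 5 epochs)", and this is an assumption about the method rather than something the post spells out, is: time each run, sum the 10 runs of an epoch, then take the mean of the 5 epoch sums. A small C++ sketch of that arithmetic (mean_of_epoch_sums is a hypothetical name):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Assumed reporting scheme: each inner vector holds the run times (seconds)
// of one epoch; the epoch's runs are summed, and the reported figure is the
// mean of those per-epoch sums.
double mean_of_epoch_sums(const std::vector<std::vector<double>>& epochs) {
    assert(!epochs.empty());
    double total = 0.0;
    for (const auto& runs : epochs) {
        total += std::accumulate(runs.begin(), runs.end(), 0.0);
    }
    return total / static_cast<double>(epochs.size());
}
```

Under this reading, 13.9s for Tango would correspond to roughly 1.39s per single pass over the 31 MB file.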


Andrei
Mar 21 2007
Walter Bright <newshound digitalmars.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 I've run a couple of simple tests comparing Perl, D's stdlib (the coming 
 release), and Tango.

Can you add a C++ <iostream> to the mix? I think that would be a very useful additional data point.
Mar 21 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I've run a couple of simple tests comparing Perl, D's stdlib (the 
 coming release), and Tango.

Can you add a C++ <iostream> to the mix? I think that would be a very useful additional data point.

Obliged. Darn, I had to wait a *lot* longer.

#include <string>
#include <iostream>

int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
}

(C++ makes the same mistake wrt newline.)

35.7s        cppcat

I seem to remember a trick that puts some more wind into iostream's
sails, so I tried that as well:

#include <string>
#include <iostream>
using namespace std;

int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
}

Result:

13.3s        cppcat

Andrei
Mar 21 2007
Walter Bright <newshound digitalmars.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Obliged. Darn, I had to wait a *lot* longer.
 
 #include <string>
 #include <iostream>
 
 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }
 
 (C++ makes the same mistake wrt newline.)
 
 35.7s        cppcat

This is awesomely bad. Although it's possible to get very fast code out of C++, it rarely seems to happen when you write straightforward code.
 I seem to remember a trick that puts some more wind into iostream's 
 sails, so I tried that as well:
 
 #include <string>
 #include <iostream>
 using namespace std;
 
 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }
 
 Result:
 
 13.3s        cppcat

Turning off sync is cheating - D's readln does syncing.
Mar 21 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 <snip>

Turning off sync is cheating - D's readln does syncing.

I don't know exactly what sync'ing does in C++, but probably it isn't the locking that you are thinking of.

Andrei
Mar 21 2007
Walter Bright <newshound digitalmars.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Turning off sync is cheating - D's readln does syncing.

I don't know exactly what sync'ing does in C++, but probably it isn't the locking that you are thinking of.

I think it means bringing the iostream I/O buffer into sync with the stdio I/O buffer, i.e. you can mix printf and iostream output and it will appear in the same order the calls happen in the code. D's readln is inherently synced in this manner.
Mar 21 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Turning off sync is cheating - D's readln does syncing.

I don't know exactly what sync'ing does in C++, but probably it isn't the locking that you are thinking of.

I think it means bringing the iostream I/O buffer into sync with the stdio I/O buffer, i.e. you can mix printf and iostream output and it will appear in the same order the calls happen in the code. D's readln is inherently synced in this manner.

Aha, so readln is better _and_ more compatible. Great!

Andrei
Mar 21 2007
kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
[snip]
 (C++ makes the same mistake wrt newline.)
 
 35.7s        cppcat
 
 I seem to remember a trick that puts some more wind into iostream's 
 sails, so I tried that as well:
 
 #include <string>
 #include <iostream>
 using namespace std;
 
 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }
 
 Result:
 
 13.3s        cppcat

Out of interest, how does the currently shipping Phobos fare in this test?
Mar 21 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 <snip>

Out of interest, how does the currently shipping Phobos fare in this test?

I don't have it anymore. I couldn't write a test anyway, because the current Phobos does not offer readln.

Andrei
Mar 21 2007
James Dennett <jdennett acm.org> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 <snip>

Try the way IOStreams would be used if you didn't want
it to go slowly:

#include <string>
#include <iostream>

int main() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);
    std::string s;
    while (std::getline(std::cin, s)) {
        std::cout << s << '\n';
    }
}

(Excuse the lack of a using directive there; I find the code more readable without them. YMMV.)

I don't have your sample file or your machine, but for the quick tests I just ran on this one machine, the code above runs more than 60% faster.

Without using tie(), each read from standard input causes a flush of standard output (so that, by default, they work appropriately for console I/O).

It's certainly true that making efficient use of IOStreams needs some specific knowledge, and that writing an efficient implementation of IOStreams is far from trivial. But if we're comparing to C++, we should probably compare to some reasonably efficient idiomatic C++.

-- James
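The two settings James uses can be inspected directly. Below is a C++ sketch (make_iostreams_fast and PrevIoState are hypothetical names) that applies them and reports what they replaced. By the standard, the first call to std::ios_base::sync_with_stdio returns true, and std::cin is tied to std::cout by default, which is why reading from std::cin normally flushes std::cout first.

```cpp
#include <cassert>
#include <iostream>

// Previous iostream configuration, as reported by the standard calls.
struct PrevIoState {
    bool was_synced;        // previous sync_with_stdio setting
    std::ostream* old_tie;  // what std::cin was tied to (normally &std::cout)
};

// Applies the two settings from the code above. Intended to run at the
// start of main(), before any I/O on the standard streams.
PrevIoState make_iostreams_fast() {
    PrevIoState prev;
    // Stop keeping the standard C++ streams in step with C stdio's buffers.
    // sync_with_stdio is a static member, so this is the same call as
    // cin.sync_with_stdio(false) in the earlier version.
    prev.was_synced = std::ios_base::sync_with_stdio(false);
    // Untie cin from cout so a read no longer flushes cout first.
    prev.old_tie = std::cin.tie(nullptr);
    return prev;
}
```

After this call, mixing printf and std::cout output is no longer guaranteed to preserve call order, which is the trade-off discussed earlier in the thread.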
Mar 21 2007
torhu <fake address.dude> writes:
James Dennett wrote:
<snip>
 Try the way IOStreams would be used if you didn't want
 it to go slowly:
 
 #include <string>
 #include <iostream>
 
 int main() {
     std::ios_base::sync_with_stdio(false);
     std::cin.tie(NULL);
     std::string s;
     while (std::getline(std::cin, s)) {
         std::cout << s << '\n';
     }
 }

I did some tests with a 58 MB file containing one million lines. I'm on WinXP. I ran each test a few times, timing them with a stopwatch. I threw in a naive C version and a std.cstream version, just out of curiosity.

It seems that using cin.tie(NULL) doesn't matter with msvc 7.1, but with mingw it does.

Basically, Tango wins hands down on my system. Whether the Tango version flushes after each line or not doesn't seem to matter much on Windows.

Compiled with:
dmd -O -release -inline
gcc -O2 (mingw 3.4.2)
cl /O2 /GX

Fastest first:

tango.io.Console, no flushing (Andrei's): ca 1.5s

C, reusing buffer, gcc & msvc71: ca 3s

James' C++, gcc: 3.5s

Phobos std.cstream, reused buffer: 11s

C w/malloc and free each line, msvc71: 23s

Andrei's C++, gcc: 27s

C w/malloc and free each line, gcc: 37s

Andrei's C++, msvc71: 50s

James' C++, msvc: 51s

---
// Tango
import tango.io.Console;

void main() {
   char[] line;
   while (Cin.nextLine(line)) {
     //Cout(line).newline;
     Cout(line)("\n");
   }
}
---

---
// Phobos std.cstream test
import std.cstream;

void main() {
   char[] buf = new char[1000];
   char[] line;
   while (!din.eof()) {
     line = din.readLine(buf);
     dout.writeLine(line);
   }
}
---

---
/* C, reusing buffer */
#include <stdio.h>
#include <stdlib.h>

char buf[1000];

int main() {
   while (fgets(buf, sizeof(buf), stdin)) {
     fputs(buf, stdout);
   }
   return 0;
}
---

---
/* C test w/malloc and free */
#include <stdio.h>
#include <stdlib.h>

int main() {
   char *buf = malloc(1000);
   /* note: buf is a char *, so sizeof(buf) is the pointer size (4 or 8),
      not 1000; each fgets call reads at most sizeof(buf) - 1 chars */
   while (fgets(buf, sizeof(buf), stdin)) {
     fputs(buf, stdout);
     free(buf);
     buf = malloc(1000);
   }
   free(buf);
   return 0;
}
---

---
// Andrei's
#include <string>
#include <iostream>

int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
   return 0;
}
---

---
// James'
#include <string>
#include <iostream>

int main() {
   std::ios_base::sync_with_stdio(false);
   std::cin.tie(NULL);
   std::string s;
   while (std::getline(std::cin, s)) {
     std::cout << s << '\n';
   }
}
---
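One detail in the malloc variant likely affects its timing: buf is a char *, so sizeof(buf) is the size of a pointer, not 1000, and each fgets call then reads at most sizeof(char *) - 1 characters. A C++ sketch that counts fgets calls for a given buffer size (count_fgets_calls is a hypothetical helper; a tmpfile stands in for stdin):

```cpp
#include <cassert>
#include <cstdio>

// Counts how many fgets() calls it takes to consume `in` when fgets is told
// the buffer holds `size` bytes (so it reads at most size - 1 chars per call).
int count_fgets_calls(std::FILE* in, int size) {
    assert(size >= 2 && size <= 1024);
    char buf[1024];
    int calls = 0;
    while (std::fgets(buf, size, in)) {
        ++calls;
    }
    return calls;
}
```

With an 8-byte pointer, one 20-character line takes three calls instead of one, and the malloc variant also does a malloc/free pair per call, so its 23s and 37s figures probably measure more than allocation cost alone.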
Mar 21 2007
torhu <fake address.dude> writes:
torhu wrote:
<snip>
 Fastest first:
 
 tango.io.Console, no flushing (Andrei's): ca 1.5s
 
 C, reusing buffer, gcc & msvc71: ca 3s
 
 James' C++, gcc: 3.5s
 
 Phobos std.cstream, reused buffer: 11s
 
 C w/malloc and free each line, msvc71: 23s
 
 Andrei's C++, gcc: 27s
 
 C w/malloc and free each line, gcc: 37s
 
 Andrei's C++, msvc71: 50s
 
 James' C++,  msvc: 51s

I've run some of the tests with more accurate timing. Andrei's Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3 seconds.
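The gap between the flushing and non-flushing Tango runs most likely comes from how often the output buffer is pushed out. The same effect can be observed in C++ by counting flushes: std::endl writes '\n' and then flushes, while a plain '\n' does not. The sketch below uses a hypothetical FlushCountingBuf stream buffer whose sync() counts flushes; it models the idea and is not Tango's Cout.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Unbuffered stream buffer that records the bytes written and counts sync()
// calls, i.e. how many times the stream was flushed.
class FlushCountingBuf : public std::streambuf {
public:
    std::string data;
    int flushes = 0;
protected:
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            data.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }
    int sync() override { ++flushes; return 0; }
};

// Writes `n` copies of `line`, ending each with std::endl (flush per line)
// or with a plain '\n' (no flush requested). Returns the flush count.
int write_lines(int n, const std::string& line, bool flush_each_line) {
    FlushCountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < n; ++i) {
        if (flush_each_line) {
            out << line << std::endl;  // '\n' followed by out.flush()
        } else {
            out << line << '\n';       // no flush
        }
    }
    // Both variants write exactly the same bytes; only the flushes differ.
    assert(buf.data.size() == static_cast<std::size_t>(n) * (line.size() + 1));
    return buf.flushes;
}
```

Here every sync() is only a counter increment; against a real file descriptor each flush typically costs a write system call, which is where a per-line cost would come from.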
Mar 22 2007
kris <foo bar.com> writes:
torhu wrote:
 <snip>

I've run some of the tests with more accurate timing. Andrei's Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3 seconds.

Just for jollies, a briefly optimized tango.io was tried also: it came in at around 0.7 seconds. On a tripled file size (3 million lines), that version is around 23% faster than bog-standard tango.io.

Thanks for giving it a whirl, torhu :)

p.s. perhaps Andrei should be using tango for processing those vast files he has?
Mar 22 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 <snip>

Just for jollies, a briefly optimized tango.io was tried also: it came in at around 0.7 seconds. On a tripled file-size (3 million lines), that version is around 23% faster than bog-standard tango.io

That's great news!
 Thanks for giving it a whirl, torhu :)
 
 
 p.s. perhaps Andrei should be using tango for processing those vast 
 files he has?

Is it compatible with C's stdio? IOW, would this sequence work?

readln(line);
int c = getchar();

Is 'c' the first character on the next line?

Andrei
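The question comes down to who owns the read-ahead buffer. If a reader pulls data from a shared stream into a private buffer, then after it returns one line the shared stream is already positioned past bytes that belong to the next line, so a direct getchar-style read skips them. Below is a C++ sketch of that hazard, with std::istringstream standing in for stdin and get() standing in for getchar(). PrivateBufferLineReader is hypothetical and only models the general issue; it is not Tango's code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical line reader that keeps its own private read-ahead buffer on
// top of a shared stream, standing in for any library that buffers
// independently of the stream it reads from.
class PrivateBufferLineReader {
public:
    PrivateBufferLineReader(std::istream& src, std::size_t chunk)
        : src_(src), chunk_(chunk) {
        assert(chunk > 0);
    }

    // Returns the next line, including its '\n' if present.
    bool next_line(std::string& line) {
        for (;;) {
            std::size_t nl = buf_.find('\n', pos_);
            if (nl != std::string::npos) {
                line.assign(buf_, pos_, nl - pos_ + 1);
                pos_ = nl + 1;
                return true;
            }
            // No complete line buffered: read ahead another chunk. This is
            // the step that takes bytes out of the shared stream early.
            std::string chunk(chunk_, '\0');
            src_.read(&chunk[0], static_cast<std::streamsize>(chunk_));
            const std::streamsize got = src_.gcount();
            if (got <= 0) {
                // End of input: return any unterminated tail.
                line.assign(buf_, pos_, std::string::npos);
                pos_ = buf_.size();
                return !line.empty();
            }
            buf_.erase(0, pos_);
            pos_ = 0;
            buf_.append(chunk, 0, static_cast<std::size_t>(got));
        }
    }

private:
    std::istream& src_;
    std::size_t chunk_;
    std::string buf_;      // bytes already taken from src_
    std::size_t pos_ = 0;  // start of the unread part of buf_
};
```

With an 8-byte read-ahead over "first\nsecond\nthird\n", the first direct get() after reading "first\n" returns 'c' rather than 's'. Reading the line with std::getline on the stream itself, which shares the stream's own buffer, leaves 's' in place.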
Mar 22 2007
kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 <snip>

Is it compatible with C's stdio? IOW, would this sequence work?

readln(line);
int c = getchar();

Is 'c' the first character on the next line?

Nope. Tango is for D, not C. In order to make an arguably better library, one often has to step away from the norm. Both yourself and Walter have been saying "it needs to be fast and simple", and that's exactly what Tango is showing: for those who care deeply about such things, tango.io is shown to be around four times faster than the fastest C implementation tried (for Andrei's test under Win32), and a notable fourteen or fifteen times faster than the shipping phobos equivalent.

If "interaction" between D & C on a shared, global file-handle becomes some kind of issue due to buffering (and only if), we'll cross that bridge at that point in time. I'm sure there are a number of solutions that don't involve restricting D to a lowest-common-denominator approach. There are lots of smart people here who would be willing to help resolve that if necessary.

The tango.io package is intended to be clean, extensible, simple, and a whole lot more coherent than certain others. We feel it meets those goals, and it happens to be quite efficient at the same time. It seems a bit like sour grapes to start looking for "issues" with that intent, particularly when compared to an implementation that proclaims "It peeks under the hood of C's stdio implementation, meaning it's customized for Digital Mars' stdio, and gcc's stdio"?

Tango is not meant to be a phobos clone; it doesn't make the same claims as phobos and it doesn't follow the same rules as phobos. If you need phobos rules, then use phobos. If you don't like tango.io speed, extensibility and simplicity, without all the special cases of C IO, then use phobos. If you want both then, at some point, we'll consider figuring out how to make your C-oriented corner-cases work with tango.io.

Walter wrote: "One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed."

Andrei wrote: "Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code."

That sentiment is very much what Tango itself is about.

You began this thread by titling it "stdio and Tango IO performance" and noting the following: "has anyone verified that Tango's I/O performance is up to snuff? I see it imposes the dynamic-polymorphic approach, and unless there was some serious performance work going on, it's possible it's even slower than stdio."

Given the results shown above, I hope we can put that to rest at this time.
Mar 22 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 <snip>

Nope. Tango is for D, not C. In order to make an arguably better library, one often has to step away from the norm. Both yourself and Walter have been saying "it needs to be fast and simple", and that's exactly what Tango is showing: for those who care deeply about such things, tango.io is shown to be around four times faster than the fastest C implementation tried (for Andrei's test under Win32), and a notable fourteen or fifteen times faster than the shipping phobos equivalent.

That's not what my tests show on Linux, where Perl and readln beat Tango by a large margin.
 If "interaction" between D & C on a shared, global file-handle becomes 
 some kind of issue due to buffering (and only if) we'll cross that 
 bridge at that point in time. I'm sure there's a number of solutions 
 that don't involve restricting D to using a lowest common denominator 
 approach. There's lots of smart people here who would be willing to help 
 resolve that if necessary.

Exactly. What I argue for is not adding _gratuitous_ incompatibility. I'm seeing that using read instead of getline on Linux does not add any speed. Then why not use getline and be done with it? Everybody would be happy.
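For comparison, here is a minimal sketch of the line-copy loop written against POSIX getline(3) on a C FILE*, the getline mentioned here (as opposed to raw read(2)). It assumes a POSIX system where getline is declared in <stdio.h> (glibc, the BSDs, macOS); copy_lines is a hypothetical name. Two properties matter for this thread: getline keeps the trailing '\n', and because it reads through the FILE*'s own buffer, a following fgetc on the same FILE* returns the first character of the next line, which is the readln-then-getchar sequence asked about earlier.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdio.h>      // POSIX getline(3)
#include <sys/types.h>  // ssize_t

// Copies `in` to `out` line by line with POSIX getline(3) and returns the
// number of lines copied. The trailing '\n' of each line is preserved.
std::size_t copy_lines(std::FILE* in, std::FILE* out) {
    assert(in != nullptr && out != nullptr);
    char* buf = nullptr;   // getline allocates and grows this as needed
    std::size_t cap = 0;   // capacity of buf, maintained by getline
    std::size_t count = 0;
    ssize_t len;
    while ((len = ::getline(&buf, &cap, in)) != -1) {
        // len includes the '\n' (if any); write exactly len bytes, since a
        // line may also contain NUL characters that fputs would stop at.
        std::fwrite(buf, 1, static_cast<std::size_t>(len), out);
        ++count;
    }
    std::free(buf);
    return count;
}
```

Passing stdin and stdout turns this into the cat-like filter benchmarked in this thread.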
 The tango.io package is intended to be clean, extensible, simple, and a 
 whole lot more coherent than certain others. We feel it meets those 
 goals, and it happens to be quite efficient at the same time. Seems a 
 bit like sour-grapes to start looking for "issues" with that intent, 
 particularly when compared to an implementation that proclaims "It peeks 
 under the hood of C's stdio implementation, meaning it's customized for 
 Digital Mars' stdio, and gcc's stdio" ?

I'm not sure I understand this. For what it's worth, there are no sour grapes in the mix. I *wanted* to switch to Tango to save me future aggravation.
 Tango is not meant to be a phobos clone; it doesn't make the same claims 
 as phobos and it doesn't follow the same rules as phobos. If you need 
 phobos rules, then use phobos. If you don't like tango.io speed, 
 extensibility and simplicity, without all the special cases of C IO, 
 then use phobos. If you want both then, at some point, we'll consider 
 figuring out how to make your C-oriented corner-cases work with tango.io

They aren't C-oriented. They are stream-oriented. It just so happens that the OS opens some streams and serves them to you in FILE* format. I have programs that read standard input and write to standard output. They are extremely easy to combine, parallelize, and run on a cluster. After switching from Perl to D for performance reasons, I found myself at a net loss. Then I went to hell and back figuring out what the problem was and fixing it. Then I thought, hmmm, maybe I could have avoided all that by switching to Tango. So I tried Tango and it was again a net loss. Perl's I/O beats Tango's Cin.
 Walter wrote: "One of my goals with D is to fix that - the 
 straightforward, untuned code should get you most of the possible speed."
 
 Andrei wrote: "Just make the clear and simple code fastest. One thing I 
 like about D is that it clearly strives to achieve best performance for 
 simply-written code."
 
 That sentiment is very much what Tango itself is about.
 
 You began this thread by titling it "stdio and Tango IO performance" and 
 noting the following: "has anyone verified that Tango's I/O performance 
 is up to snuff? I see it imposes the dynamic-polymorphic approach, and 
 unless there was some serious performance work going on, it's possible 
 it's even slower than stdio. "
 
 Given the results shown above, I hope we can put that to rest at this time.

Of course you can, it's your library. You look at the results that please you most, I look at the results of my concrete application. I simply can't afford a 50%+ loss in I/O throughput, so I need to stay with Phobos. Why, I don't understand.

Andrei
Mar 22 2007
kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 <snip>

Of course you can, it's your library. You look at the results that please you most, I look at the results of my concrete application. I simply can't afford a 50%+ loss in I/O throughput, so I need to stay with Phobos. Why, I don't understand.

Oh, come now. Yesterday Tango was the "fastest" on your machine, and today it is not. And you're now claiming a 50% loss in throughput?

I put it to you that you're not being very forthcoming in allowing for changes in tango.io to address this anomaly in your timings? Yesterday I pointed out where to make the change so that you could try tango without the automatic chomp; you didn't bother to do that. There is a change in SVN implementing your request, but you're not bothering to try that either.

Instead, you appear to be using empty rhetoric and exaggeration to pit one library against another. That's hardly being helpful, Andrei.

Tango has been shown to be very efficient on Win32, and there's no reason to assert that it can't be so on linux. We've seen that flush() is a no-no for linux, and that it has some impact on Win32 also. That can be rectified, as Walter kindly pointed out. If you're serious about giving Tango a shot, then give it some time for the different platform specifics to be addressed. Is that really too much to ask? Of a beta release?
Mar 22 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Oh, come now. Yesterday Tango was the "fastest" on your machine, and 
 today it is not. And you're now claiming a 50% loss in throughput?

Probably it's a misunderstanding. Yesterday the Tango that did not output the newlines was fastest. I don't have Tango code to test a version that reads lines including the newline, so I tried the Cout(line)("\n") thing, which was slow. I'd of course be happy to use something that is faster, no matter where it comes from.
 I put it to you that you're not being very forthcoming in allowing for 
 changes in tango.io to address this anomaly in your timings? Yesterday I 
 pointed out where to make the change so that you could try tango without 
 the automatic chomp; you didn't bother to do that. There is a change in 
 SVN implementing your request, but you're not bothering to try that either.

It's not that I didn't bother; just getting my app to link with Tango was hard for me, so recompiling and rebuilding libtango.a was likely to take me a long time. Furthermore, I don't have svn installed nor admin access on the cluster I work on. If you put a libtango.a somewhere to be found with http or ftp, I'd be glad to download it.
 Instead, you appear to be using empty rhetoric and exaggeration to pit 
 one library against another. That's hardly being helpful, Andrei.
 
 Tango has been shown to be very efficient on Win32, and there's no 
 reason to assert that it can't be so on linux. We've seen that flush() 
 is a no-no for linux, and that it has some impact on Win32 also. That 
 can be rectified, as Walter kindly pointed out. If you're serious about 
 giving Tango a shot, then give it some time for the different platform 
 specifics to be addressed. Is that really too much to ask? Of a beta 
 release?

Of course this is great news. There's only one guy using rhetoric in this thread, and that's not me :o). Andrei
Mar 22 2007
parent Sean Kelly <sean f4.ca> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 I put it to you that you're not being very forthcoming in allowing for 
 changes in tango.io to address this anomaly in your timings? Yesterday 
 I pointed out where to make the change so that you could try tango 
 without the automatic chomp; you didn't bother to do that. There is a 
 change in SVN implementing your request, but you're not bothering to 
 try that either.

It's not that I didn't bother; just getting my app to link with Tango was hard for me, so recompiling and rebuilding libtango.a was likely to take me a long time. Furthermore, I don't have svn installed nor admin access on the cluster I work on.

We're in the process of getting an automated nightly snapshot process set up. The scripts are actually written, and we're sorting out hosting and such. I'm sure someone would be willing to put one online somewhere in the interim. I'll do it myself if I can track down the Linux build scripts. Sean
Mar 22 2007
prev sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Tango is not meant to be a phobos clone; it doesn't make the same claims 
 as phobos and it doesn't follow the same rules as phobos. If you need 
 phobos rules, then use phobos. If you don't like tango.io speed, 
 extensibility and simplicity, without all the special cases of C IO, 
 then use phobos. If you want both then, at some point, we'll consider 
 figuring out how to make your C-oriented corner-cases work with tango.io

I think you'd make a lot of people happy. Several documented attempts at installing Tango failed for me, so in the end I figured some way to get programs to compile with a special command line and a modification of dmd.conf. I need to modify dmd.conf whenever I switch between Phobos programs and Tango programs. Andrei
Mar 22 2007
parent reply Sean Kelly <sean f4.ca> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Several documented attempts of 
 installing Tango failed for me, so in the end I figured some way to get 
 programs to compile with a special command line and a modification of 
 dmd.conf. I need to modify dmd.conf whenever I switch between Phobos 
 programs and Tango programs.

This page describes one way to use Tango and Phobos together: http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation It's Win32-oriented, but the approach should be essentially the same for Linux. One issue with install instructions is that the installation procedure is in flux as we try to simplify/automate it, and some of the documentation is lagging behind. Sean
Mar 22 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Several documented attempts of installing Tango failed for me, so in 
 the end I figured some way to get programs to compile with a special 
 command line and a modification of dmd.conf. I need to modify dmd.conf 
 whenever I switch between Phobos programs and Tango programs.

This page describes one way to use Tango and Phobos together: http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation It's Win32-oriented, but the approach should be essentially the same for Linux. One issue with install instructions is that the installation procedure is in flux as we try to simplify/automate it, and some of the documentation is lagging behind.

Here's what worked for me. The script also allows compiling dmd programs on the fly. For some reason I needed to include libtango.a in the DFLAGS variable.

-----------------------------------------
#!/bin/sh

D_BIN=$(dirname $(which dmd))
WHICH=$1

if [ "$WHICH" = "phobos" ]; then
    DFLAGS="-I$D_BIN/../src/phobos -L-L$D_BIN/../lib -L-L$D_BIN/../../dm/lib"
elif [ "$WHICH" = "tango" ]; then
    DFLAGS="-I$D_BIN/../../tango-0.96-bin -version=Tango -version=Posix"
    DFLAGS="$DFLAGS -L-L$D_BIN/../../tango-0.96-bin/lib libtango.a"
else
    echo "Please pass either phobos or tango as the first argument"
    WHICH=""
fi

if [ ! -z "$WHICH" ]; then
    shift
    if [ "$*" != "" ]; then
        dmd $*
    else
        export DFLAGS
        echo "dmd configured for $WHICH"
    fi
fi
-----------------------------------------

Andrei
Mar 22 2007
parent Sean Kelly <sean f4.ca> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Several documented attempts of installing Tango failed for me, so in 
 the end I figured some way to get programs to compile with a special 
 command line and a modification of dmd.conf. I need to modify 
 dmd.conf whenever I switch between Phobos programs and Tango programs.

This page describes one way to use Tango and Phobos together: http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation It's Win32-oriented, but the approach should be essentially the same for Linux. One issue with install instructions is that the installation procedure is in flux as we try to simplify/automate it, and some of the documentation is lagging behind.

Here's what worked for me. The script also allows compiling dmd programs on the fly. For some reason I needed to include libtango.a in the DFLAGS variable.

This is intentional, though it may change later based on user feedback. That said, my personal belief is that only the compiler runtime code should be implicitly linked, and the rest should be linked via DFLAGS or by some other means. In Tango parlance, this would mean implicitly linking the compiler runtime (libdmd.a), but not the GC code, the Tango runtime, or Tango user code. This is currently quite possible--it just isn't the default configuration because it's unnecessarily complex for most users. For those who are interested however, the process is outlined here: http://www.dsource.org/projects/tango/wiki/TopicAdvancedConfiguration
 -----------------------------------------
 #!/bin/sh
 
 D_BIN=$(dirname $(which dmd))
 WHICH=$1
 
 if [ "$WHICH" = "phobos" ]; then
     DFLAGS="-I$D_BIN/../src/phobos -L-L$D_BIN/../lib -L-L$D_BIN/../../dm/lib"
 elif [ "$WHICH" = "tango" ]; then
     DFLAGS="-I$D_BIN/../../tango-0.96-bin -version=Tango -version=Posix"
     DFLAGS="$DFLAGS -L-L$D_BIN/../../tango-0.96-bin/lib libtango.a"
 else
     echo "Please pass either phobos or tango as the first argument"
     WHICH=""
 fi
 
 if [ ! -z "$WHICH" ]; then
     shift
     if [ "$*" != "" ]; then
     dmd $*
     else
     export DFLAGS
     echo "dmd configured for $WHICH"
     fi
 fi
 -----------------------------------------

Thanks. I'll look this over and see about adding it to the wiki. Sean
Mar 22 2007
prev sibling next sibling parent "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
torhu wrote:
 torhu wrote:
 <snip>
 Fastest first:

 tango.io.Console, no flushing (Andrei's): ca 1.5s

 C, reusing buffer, gcc & msvc71: ca 3s

 James' C++, gcc: 3.5s

 Phobos std.cstream, reused buffer: 11s

 C w/malloc and free each line, msvc71: 23s

 Andrei's C++, gcc: 27s

 C w/malloc and free each line, gcc: 37s

 Andrei's C++, msvc71: 50s

 James' C++,  msvc: 51s

I've run some of the tests with more accurate timing. Andrei's Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3 seconds.

cat is not comparable. Besides, there must be some overhead associated with that cat, because Linux's cat consistently clocks way faster than all line-oriented tests. Andrei
Mar 22 2007
prev sibling parent torhu <fake address.dude> writes:
torhu wrote:
 torhu wrote:
 <snip>
 Fastest first:
 
 tango.io.Console, no flushing (Andrei's): ca 1.5s
 
 C, reusing buffer, gcc & msvc71: ca 3s
 
 James' C++, gcc: 3.5s
 
 Phobos std.cstream, reused buffer: 11s
 
 C w/malloc and free each line, msvc71: 23s
 
 Andrei's C++, gcc: 27s
 
 C w/malloc and free each line, gcc: 37s
 
 Andrei's C++, msvc71: 50s
 
 James' C++,  msvc: 51s

I've run some of the tests with more accurate timing. Andrei's Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3 seconds.

A couple more results:

ActiveState Perl 5.8.8: 3.8s.
Python 2.5: 3.6s.

cat.py:
---
#!/usr/bin/env python
import sys
sys.stdout.writelines(sys.stdin.xreadlines())

# to process each line:
#sys.stdout.writelines(do_stuff_with_each_line(sys.stdin.xreadlines()))

# possibly slower:
#sys.stdout.writelines(do_stuff_with_each_line(s) for s in sys.stdin)
---

cat.pl:
---
#!/usr/bin/env perl
while (<>) {
  print;
}
---

I guess that's enough benchmarking for now.
Mar 22 2007
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
torhu wrote:
 James Dennett wrote:
 <snip>
 Try the way IOStreams would be used if you didn't want
 it to go slowly:

 #include <string>
 #include <iostream>

 int main() {
     std::ios_base::sync_with_stdio(false);
     std::cin.tie(NULL);
     std::string s;
     while (std::getline(std::cin, s)) {
         std::cout << s << '\n';
     }
 }

I did some tests with a 58 MB file, containing one million lines. I'm on winxp. I ran each test a few times, timing them with a stopwatch. I threw in a naive C version, and a std.cstream version, just out of curiosity. It seems that using cin.tie(NULL) doesn't matter with msvc 7.1, but with mingw it does. Basically, Tango wins hands down on my system. Whether the Tango version flushes after each line or not doesn't seem to matter much on Windows.

 ---
 // Tango
 import tango.io.Console;
 
 void main() {
   char[] line;
 
   while (Cin.nextLine(line)) {
     //Cout(line).newline;
     Cout(line)("\n");
   }
 }
 ---

Oh good. I was hoping someone would test Tango without flushing every line :-) Basically, Tango's 'newline' method is equivalent to C++'s 'endl' mutator function. It should not be used for every carriage return in normal output for performance-critical applications. Rather, it should be used as the trailing newline after writing a block of data that should be displayed immediately ('flush' is another option if no newline is desired). Sean
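In C stdio terms, the distinction Sean draws is between writing a bare '\n', which only lands in the stream's buffer, and writing '\n' followed by fflush(), which is roughly what C++'s std::endl does. A minimal sketch; put_line and put_line_flush are hypothetical helper names, not part of Tango or any library:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append `line` and a '\n' to the stream's buffer. Nothing is forced out;
   stdio writes the buffer when it fills (or per line, if the stream is
   line-buffered). Returns 0 on success, -1 on error. */
int put_line(FILE *out, const char *line)
{
    assert(out != NULL && line != NULL);
    size_t n = strlen(line);
    if (fwrite(line, 1, n, out) != n) return -1;
    return fputc('\n', out) == EOF ? -1 : 0;
}

/* Same, followed by fflush(): roughly the endl-style "newline". In a loop
   over millions of lines this typically adds a write() system call per
   line, which is the cost the benchmarks above keep running into. */
int put_line_flush(FILE *out, const char *line)
{
    if (put_line(out, line) != 0) return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

The usual pattern is put_line() inside the loop and a single fflush() (or put_line_flush()) when a block of output must become visible immediately.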
Mar 22 2007
prev sibling parent reply torhu <fake address.dude> writes:
torhu wrote:
 ---
 /* C test w/malloc and free */
 #include <stdio.h>
 #include <stdlib.h>
 
 int main() {
 
     char *buf = malloc(1000);
     while (fgets(buf, sizeof(buf), stdin)) {
        fputs(buf, stdout);
        free(buf);
        buf = malloc(1000);
     }
     free(buf);
     return 0;
 ---

Whoops, can anyone spot the bug? When I fixed it, the time it took to run my test went down from about 23 to about 3 seconds.
Mar 24 2007
next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
torhu wrote:
 torhu wrote:
 ---
 /* C test w/malloc and free */
 #include <stdio.h>
 #include <stdlib.h>

 int main() {

     char *buf = malloc(1000);
     while (fgets(buf, sizeof(buf), stdin)) {
        fputs(buf, stdout);
        free(buf);
        buf = malloc(1000);
     }
     free(buf);
     return 0;
 ---

Whoops, can anyone spot the bug? When I fixed it, the time it took to run my test went down from about 23 to about 3 seconds.

I'm guessing the fact that sizeof(buf) != 1000 ?
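Since buf is a char *, sizeof(buf) is the size of a pointer (typically 4 or 8 bytes), not the 1000 bytes allocated, so fgets() reads only a few characters per call and the loop does a malloc/free pair every few bytes instead of once per line. The output is still correct, which is why the bug only shows up as slowness. A corrected sketch that keeps the per-line malloc/free of the original test but passes the real capacity; copy_lines_fgets and LINE_CAP are hypothetical names, and the streams are parameters only so the function can be tested:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define LINE_CAP 1000

/* Copy `in` to `out` line by line, allocating a fresh buffer per line as in
   the original benchmark. Returns 0 on success, -1 on error. */
int copy_lines_fgets(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    char *buf = malloc(LINE_CAP);
    if (buf == NULL) return -1;
    /* Pass the allocated capacity, not sizeof(buf): buf is a pointer. */
    while (fgets(buf, LINE_CAP, in)) {
        if (fputs(buf, out) == EOF) { free(buf); return -1; }
        free(buf);
        buf = malloc(LINE_CAP);
        if (buf == NULL) return -1;
    }
    free(buf);
    return ferror(in) ? -1 : 0;
}
```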
Mar 24 2007
parent torhu <fake address.dude> writes:
Frits van Bommel wrote:
 torhu wrote:
 
 Whoops, can anyone spot the bug?  When I fixed it, the time it took to 
 run my test went down from about 23 to about 3 seconds.

I'm guessing the fact that sizeof(buf) != 1000 ?

I think you were the first to post. Go buy yourself a lollipop, you've earned it.
Mar 25 2007
prev sibling parent Sean Kelly <sean f4.ca> writes:
torhu wrote:
 torhu wrote:
 ---
 /* C test w/malloc and free */
 #include <stdio.h>
 #include <stdlib.h>

 int main() {

     char *buf = malloc(1000);
     while (fgets(buf, sizeof(buf), stdin)) {
        fputs(buf, stdout);
        free(buf);
        buf = malloc(1000);
     }
     free(buf);
     return 0;
 ---

Whoops, can anyone spot the bug? When I fixed it, the time it took to run my test went down from about 23 to about 3 seconds.

The fgets(sizeof(buf)) looks like it could affect read performance a tad :-) Sean
Mar 24 2007
prev sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I've ran a couple of simple tests comparing Perl, D's stdlib (the
 coming release), and Tango.

useful additional data point.

#include <string>
#include <iostream>

int main() {
  std::string s;
  while (getline(std::cin, s)) {
    std::cout << s << '\n';
  }
}

(C++ makes the same mistake wrt newline.)

35.7s cppcat

I seem to remember a trick that puts some more wind into iostream's sails, so I tried that as well:

#include <string>
#include <iostream>
using namespace std;

int main() {
  cin.sync_with_stdio(false);
  cout.sync_with_stdio(false);
  string s;
  while (getline(std::cin, s)) {
    cout << s << '\n';
  }
}

Result:

13.3s cppcat

Try the way IOStreams would be used if you didn't want it to go slowly:

#include <string>
#include <iostream>

int main() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);
    std::string s;
    while (std::getline(std::cin, s)) {
        std::cout << s << '\n';
    }
}

(Excuse the lack of a using directive there; I find the code more readable without them. YMMV.)

With your code pasted and wind from behind: 13.5s cppcat
 I don't have your sample file or your machine, but for
 the quick tests I just ran on this one machine, the code
 above runs more than 60% faster.  Without using tie(),
 each read from standard input causes a flush of standard
 output (so that, by default, they work appropriately for
 console I/O).
 
 It's certainly true that making efficient use of IOStreams
 needs some specific knowledge, and that writing an
 efficient implementation of IOStreams is far from trivial.
 But if we're comparing to C++, we should probably compare
 to some reasonably efficient idiomatic C++.

The sync_with_stdio and tie tricks are already unknown to most programmers, so it would be an uphill battle to characterize them as idiomatic. They are idiomatic for a small group at best. But, obviously not enough. Perl does way better. (Again: gcc on Linux.) Andrei
Mar 22 2007
parent reply James Dennett <jdennett acm.org> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I've ran a couple of simple tests comparing Perl, D's stdlib (the
 coming release), and Tango.

useful additional data point.

#include <string>
#include <iostream>

int main() {
  std::string s;
  while (getline(std::cin, s)) {
    std::cout << s << '\n';
  }
}

(C++ makes the same mistake wrt newline.)

35.7s cppcat

I seem to remember a trick that puts some more wind into iostream's sails, so I tried that as well:

#include <string>
#include <iostream>
using namespace std;

int main() {
  cin.sync_with_stdio(false);
  cout.sync_with_stdio(false);
  string s;
  while (getline(std::cin, s)) {
    cout << s << '\n';
  }
}

Result:

13.3s cppcat

Try the way IOStreams would be used if you didn't want it to go slowly:

#include <string>
#include <iostream>

int main() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);
    std::string s;
    while (std::getline(std::cin, s)) {
        std::cout << s << '\n';
    }
}

(Excuse the lack of a using directive there; I find the code more readable without them. YMMV.)

With your code pasted and wind from behind: 13.5s cppcat

Blasted weather. Never a hurricane when you need one.
 I don't have your sample file or your machine, but for
 the quick tests I just ran on this one machine, the code
 above runs more than 60% faster.  Without using tie(),
 each read from standard input causes a flush of standard
 output (so that, by default, they work appropriately for
 console I/O).

 It's certainly true that making efficient use of IOStreams
 needs some specific knowledge, and that writing an
 efficient implementation of IOStreams is far from trivial.
 But if we're comparing to C++, we should probably compare
 to some reasonably efficient idiomatic C++.

The sync_with_stdio and tie tricks are already unknown to most programmers, so it would be an uphill battle to characterize them as idiomatic. They are idiomatic for a small group at best.

IOStreams is a terrible chunk of library design, and its effective use is fiendishly difficult even for fairly trivial tasks. I've implemented large chunks of the C++ standard library, but IOStreams scares me.
 But, obviously not enough. Perl does way better.
 
 (Again: gcc on Linux.)

Most of the time I do large text processing jobs in Perl or inside a database; once in a while I use C++, primarily if I need to do trickier calculations. No good reason D shouldn't be able to handle the jobs I use C++ for in this area (though I'd have to get D working on Solaris, and 64-bit support would probably be necessary). -- James
Mar 22 2007
parent "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
James Dennett wrote:
[snip]
 Most of the time I do large text processing jobs in
 Perl or inside a database; once in a while I use C++,
 primarily if I need to do trickier calculations.
 No good reason D shouldn't be able to handle the
 jobs I use C++ for in this area (though I'd have
 to get D working on Solaris, and 64-bit support
 would probably be necessary).

Indeed. Then you'll be glad to hear that D will soon accommodate smarter string literals and probably here-documents, all with interpolation, which should make scripting jobs a snap. Andrei
Mar 22 2007
prev sibling parent reply Roberto Mariottini <rmariottini mail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
[...]
 
 #include <string>
 #include <iostream>
 
 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }

The portable way to write a newline in C++ is to use the 'endl' modifier. Your program is not portable, on Windows it will generate Unix text files. Ciao
Mar 22 2007
next sibling parent reply torhu <fake address.dude> writes:
Roberto Mariottini wrote:
<snip>
 The portable way to write a newline in C++ is to use the 'endl'
 modifier.
 Your program is not portable, on Windows it will generate Unix text files.
 
 Ciao

Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows. And stdin, stdout and stderr are by default in text (not binary) mode.
Mar 22 2007
parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
torhu wrote:
 Unless a file is opened in binary mode, '\n' will be translated into
 '\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii
 (not binary) mode.

But I don't think this is the case in Tango, so Cout(line)("\n") should also be changed for the benchmarks.
Mar 22 2007
parent reply kris <foo bar.com> writes:
Deewiant wrote:
 torhu wrote:
 
Unless a file is opened in binary mode, '\n' will be translated into
'\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii
(not binary) mode.

But I don't think this is the case in Tango, so Cout(line)("\n") should also be changed for the benchmarks.

At the behest of Andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?
Mar 22 2007
parent Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
kris wrote:
 Deewiant wrote:
 torhu wrote:

 Unless a file is opened in binary mode, '\n' will be translated into
 '\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii
 (not binary) mode.

But I don't think this is the case in Tango, so Cout(line)("\n") should also be changed for the benchmarks.

At the behest of andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?

Only if you've got the latest SVN revision of Tango. If not, use tango.io.FileConst.NewlineString (side note: for easier access, perhaps Print.Eol should be public and assigned to this) in place of "\n".
Mar 22 2007
prev sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Roberto Mariottini wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [...]
 #include <string>
 #include <iostream>

 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }

The portable way to write a newline in C++ is to use the 'endl' modifier. Your program is not portable, on Windows it will generate Unix text files.

Wrong. Newline translation will be correct on both systems. Andrei
Mar 22 2007
parent reply Roberto Mariottini <rmariottini mail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Roberto Mariottini wrote:
 The portable way to write a newline in C++ is to use the 'endl'
 modifier.
 Your program is not portable, on Windows it will generate Unix text 
 files.

Wrong. Newline translation will be correct on both systems.

It depends on how you open the file: 'endl' works even with files open in binary mode (the default on most platforms, the default on the average programmer). Or else, say that 'endl' is yet another design error in C++. Ciao
Mar 23 2007
parent James Dennett <jdennett acm.org> writes:
Roberto Mariottini wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Roberto Mariottini wrote:
 The portable way to write a newline in C++ is to use the 'endl'
 modifier.
 Your program is not portable, on Windows it will generate Unix text
 files.

Wrong. Newline translation will be correct on both systems.

It depends on how you open the file: 'endl' works even with files open in binary mode (the default on most platforms, the default on the average programmer). Or else, say that 'endl' is yet another design error in C++. Ciao

The difference between '\n' and std::endl in C++ is only that std::endl flushes the stream after writing a newline (well, and uses widen to convert to the character type of the stream, but binary mode makes no difference to that, it's a property of the template parameters of the stream type to which you are writing). C++ doesn't default to binary mode, though on many platforms that's of academic concern only as there is no distinction between text and binary modes. And this is somewhat off-topic for d.D, I think, except in that we'd like D's IO to be better than C++'s. -- James
Mar 23 2007
prev sibling next sibling parent reply kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming 
 release), and Tango.
 
 First, I realize I should make an account on dsource.org and post the 
 following there, but I'll mention here that it's quite disappointing 
 that Tango's idiomatic method of reading a line from the console 
 (Cin.nextLine(line) unless I missed something) chose to chop the newline 
 automatically. The Perl book spends half a page or so explaining why 
 it's _good_ that the newline is included in the line, and I've been 
 thankful for that on numerous occasions when writing Perl. Please put 
 the newline back in the line.
 
 Anyhow, here's the code. The D up-and-coming stdio version:
 
 import std.stdio;
 void main() {
   char[] line;
   while (readln(line)) {
     write(line);
   }
 }
 
 The Tango version:
 
 import tango.io.Console;
 void main() {
   char[] line;
   while (Cin.nextLine(line)) {
     Cout(line).newline;
   }
 }
 
 (The .newline adds back the information that nextLine promptly lost, 
 sigh.) I'm not sure whether this is the idiomatic way of reading and 
 writing lines in Tango, but tango.io.Stdout seems to say so: "If you 
 don't need formatted output or unicode translation, consider using the 
 module tango.io.Console directly." - which suggests that Console would 
 be the most primitive stdio library.
 
 The Perl version:
 #!/usr/bin/env perl
 while (<>) {
   print;
 }
 
 All programs operate in the same exact boring way: read a line from 
 stdin, print it, lather, rinse, repeat.
 
 I passed a 31 MB text file (containing a dictionary that I'm using in my 
 research) through each of the programs above. The output was set to 
 /dev/null. I've ran the same program multiple times before the actual 
 test, so everything is cached and the process becomes 
 computationally-bound. Here are the results summed for 10 consecutive 
 runs (averaged over 5 epochs):
 
 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

There are a couple of things to look at here:

1) if there's an idiom in tango.io, it would be rewriting the example like this:

   Cout.conduit.copy (Cin.conduit)

2) the output.newline on each line will cause a flush ~ this may or may not have something to do with it

3) the test would appear to be stressing the parsing of lines just as much as (if not more than) the io system itself. All part-and-parcel to a degree, but it may be worth investigating

In order to track this down, we'd be interested to see the results of:

a) Cout.conduit.copy (Cin.conduit);

b) foregoing the output .newline, purely as an experiment

c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)

Just a heads-up: Console is not the lowest IO level. It wraps both a streaming-buffer and console idioms around the raw IO. Raw IO in tango is based around two virtual methods: read(void[]) and write(void[])
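For comparison, a raw copy in the spirit of item a) involves no line parsing at all, just block reads and writes. A minimal POSIX C sketch over file descriptors rather than Tango conduits; copy_raw and the 64 KiB block size are illustrative choices, not Tango's implementation:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy everything from file descriptor `in` to `out` in fixed-size blocks,
   without looking for line boundaries. Handles short writes and EINTR.
   Returns the number of bytes copied, or -1 on error. */
long copy_raw(int in, int out)
{
    assert(in >= 0 && out >= 0);
    char block[64 * 1024];
    long total = 0;
    for (;;) {
        ssize_t got = read(in, block, sizeof block);
        if (got == 0) return total;                 /* end of input */
        if (got < 0) {
            if (errno == EINTR) continue;           /* interrupted: retry */
            return -1;
        }
        for (ssize_t done = 0; done < got; ) {      /* write may be partial */
            ssize_t put = write(out, block + done, (size_t)(got - done));
            if (put < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            done += put;
        }
        total += got;
    }
}
```

This measures only the I/O path; it says nothing about the cost of finding line terminators, which is what a line-oriented benchmark also pays for.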
Mar 21 2007
next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

There's a couple of things to look at here: 1) if there's an idiom in tango.io, it would be rewriting the example like this: Cout.conduit.copy (Cin.conduit)

The test code assumed taking a look at each line before printing it, so speed of line reading and writing was deemed as important, not speed of raw I/O, which we all know how to get.
 2) the output.newline on each line will cause a flush ~ this may or may 
 not have something to do with it

Probably.
 3) the test would appear to be stressing the parsing of lines just as 
 much (if not more) than the io system itself. All part-and-parcel to a 
 degree, but it may be worth investigating

I don't understand this.
 In order to track this down, we'd be interested to see the results of:
 
 a) Cout.conduit.copy (Cin.conduit);

The program wouldn't be comparable with the others.
 b) foregoing the output .newline, purely as an experiment

4.7s tcat
 c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that 
 what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)

Then probably that could be filed as a bug in Tango. The nextLine function should lock the file only once, thus giving each thread an entire line, not a portion of a line. Also, using block-oriented read for reading lines makes Tango incompatible with standard C usage (Tango might read more than one line into its buffers; if a C-level function tries to read from the file, it will be too late). Unfortunately there's no public API for such stuff, so system-specific approaches must be taken.

readln on Linux uses GNU's getline(), which locks the file only once per line. See: http://www.gnu.org/software/libc/manual/html_node/Line-Input.html

Unfortunately there's one extra copy going on - from the malloc'd buffer into D's gc'd array. That copy could be optimized away by using GNU's malloc hooks: http://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html

Andrei
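The "lock once per line" idea can be sketched in C with the POSIX flockfile() and getc_unlocked() calls: take the stream lock, read the whole line (newline included), then release it, so two threads sharing a FILE never interleave partial lines. This is a hypothetical helper (read_line_locked), not Tango's or Phobos's code, and it assumes a POSIX system; as far as I know, glibc's getline() takes the stream lock once per call in much the same way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one line (including its '\n', if any) from `in` while holding the
   stream lock for the whole line. *buf / *cap form a growable malloc'd
   buffer, as with getline(). Returns the line length, or -1 at end of file
   or on allocation failure. The result is NUL-terminated. */
long read_line_locked(FILE *in, char **buf, size_t *cap)
{
    assert(in != NULL && buf != NULL && cap != NULL);
    long len = 0;
    int c;
    flockfile(in);                          /* one lock acquisition per line */
    while ((c = getc_unlocked(in)) != EOF) {
        if ((size_t)len + 2 > *cap) {       /* room for c and the NUL */
            size_t ncap = *cap ? *cap * 2 : 128;
            char *p = realloc(*buf, ncap);
            if (p == NULL) { funlockfile(in); return -1; }
            *buf = p;
            *cap = ncap;
        }
        (*buf)[len++] = (char)c;
        if (c == '\n') break;               /* keep the terminator */
    }
    funlockfile(in);
    if (len == 0) return -1;                /* nothing read: end of file */
    (*buf)[len] = '\0';
    return len;
}
```

Because the buffer is reused across calls, a caller that wants to keep each line must copy it, which is the same extra copy Andrei mentions for readln.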
Mar 21 2007
parent reply kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

There's a couple of things to look at here: 1) if there's an idiom in tango.io, it would be rewriting the example like this: Cout.conduit.copy (Cin.conduit)

The test code assumed taking a look at each line before printing it, so speed of line reading and writing was deemed as important, not speed of raw I/O, which we all know how to get.

Yep, just trying to isolate things
 3) the test would appear to be stressing the parsing of lines just as 
 much (if not more) than the io system itself. All part-and-parcel to a 
 degree, but it may be worth investigating

I don't understand this.

Just suggesting that the scanning for [\r]\n patterns is likely a good chunk of the CPU time
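To make the scanning cost concrete: in a buffered reader, finding the end of each line is typically a memchr() over the buffered bytes, and whether the returned slice stops at the terminator or just past it is a one-index difference. A hypothetical C sketch (next_line is an illustrative name, not Tango's code); it only handles '\n', and stripping a preceding '\r' would need one more check:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the next line in buf[*pos .. len). Sets *start to the line's offset
   and returns its length, including the '\n' when keep_newline is nonzero.
   A final line without a terminator is returned as-is. Returns -1 when no
   bytes remain. */
ptrdiff_t next_line(const char *buf, size_t len, size_t *pos,
                    int keep_newline, size_t *start)
{
    assert(buf != NULL && pos != NULL && start != NULL);
    if (*pos >= len) return -1;
    const char *nl = memchr(buf + *pos, '\n', len - *pos);  /* the scan */
    size_t end = nl ? (size_t)(nl - buf) : len;   /* index of '\n', or len */
    size_t next = nl ? end + 1 : len;             /* resume after the '\n' */
    *start = *pos;
    *pos = next;
    /* Keeping the newline slices up to `next`; chopping slices up to `end`. */
    return (ptrdiff_t)((keep_newline ? next : end) - *start);
}
```

With keep_newline == 0, a final "c" and a terminated "c\n" both come back with length 1, so the caller can no longer tell whether the input ended with a newline.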
 b) foregoing the output .newline, purely as an experiment

4.7s tcat

Thanks. If tango.io were to retain CR on readln, then it would come out ahead of everything else in this particular test.

Can you distill the benefits of retaining CR on a readline, please?
Mar 21 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Andrei Alexandrescu (See Website For Email) wrote:

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

There's a couple of things to look at here: 1) if there's an idiom in tango.io, it would be rewriting the example like this: Cout.conduit.copy (Cin.conduit)

The test code assumed taking a look at each line before printing it, so speed of line reading and writing was deemed as important, not speed of raw I/O, which we all know how to get.

Yep, just trying to isolate things
 3) the test would appear to be stressing the parsing of lines just as 
 much (if not more) than the io system itself. All part-and-parcel to 
 a degree, but it may be worth investigating

I don't understand this.

Just suggesting that the scanning for [\r]\n patterns is likely a good chunk of the CPU time
 b) foregoing the output .newline, purely as an experiment

4.7s tcat

Thanks. If tango.io were to retain CR on readln, then it would come out ahead of everything else in this particular test

Well probably but must be tested. Newlines comprise about 3% of the file size.
 Can you distill the benefits of retaining CR on a readline, please?

I am pasting fragments from an email to Walter. He suggested this at one point, and I managed to persuade him to keep the newline in there.

Essentially it's about information. The naive loop:

while (readln(line)) {
  write(line);
}

is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like:

while (readln(line)) {
  writeln(line);
}

This may or may not add a newline to the output, possibly creating a file larger by one byte. This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input, because readln essentially loses information.

Also, stdio offers a readln() that creates a new line on every call. That is useful if you want fresh lines every read:

char[] line;
while ((line = readln()).length > 0) {
  ++dictionary[line];
}

The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) An API that uses automated chopping should not offer such a function, because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code.

In the couple of years I've used Perl I've thanked the Perl folks for their readline decision numerous times. Ever tried to do cin or fscanf? You can't do any intelligent input with them because they skip whitespace and newlines like it's out of style. All of my C++ applications use getline() or fgets() (both of which thankfully do include the newline) and then process the line in-situ.

Andrei
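The one-byte difference can be demonstrated with a short C sketch built on POSIX getline(), which keeps the newline: copying lines back verbatim reproduces the input exactly, while chopping on input and always re-adding '\n' on output (the writeln pattern) grows a file whose last line had no terminator. roundtrip is a hypothetical name; the sketch assumes a POSIX 2008 system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Copy `in` to `out` line by line. With chop == 0 each line is written back
   exactly as read (the write(line) loop). With chop != 0 a trailing '\n' is
   stripped on input and always re-added on output (the writeln(line) loop).
   Returns the number of bytes written, or -1 on error. */
long roundtrip(FILE *in, FILE *out, int chop)
{
    assert(in != NULL && out != NULL);
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long written = 0;
    /* getline() never returns 0: an empty input line is "\n" (length 1),
       so only -1 signals end of file (or an error, checked below). */
    while ((len = getline(&line, &cap, in)) != -1) {
        if (chop && line[len - 1] == '\n')
            --len;                                  /* information lost here */
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) { written = -1; break; }
        written += len;
        if (chop) {                                 /* writeln: always add '\n' */
            if (fputc('\n', out) == EOF) { written = -1; break; }
            ++written;
        }
    }
    free(line);
    return (written >= 0 && ferror(in)) ? -1 : written;
}
```

For the 4-byte input "x\n\ny", the verbatim loop writes 4 bytes and the chopping loop writes 5.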
Mar 21 2007
next sibling parent reply kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
[snip]

 4.7s    tcat

Thanks. If tango.io were to retain CR on readln, then it would come out ahead of everything else in this particular test

Well probably but must be tested. Newlines comprise about 3% of the file size.

Yeah, I can imagine. Module tango.io.Console at line 119 should have a slice in it... if you change 'j' to be 'i+1' instead, that should remove the chop.

Tango should still come out in front, although I have to say that benchmarks don't really tell very much in general, i.e. it doesn't mean much of anything important whether tango "wins" this or not (IMO).

Having said that, I'm very glad you ran this, since it shows how much overhead there is in a flush operation (on *nix); that's very useful to know.
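The suggested change is about slice bounds. Here is a minimal sketch of the idea. This is not Tango's actual Console code, which isn't shown here. `sliceLine` is a hypothetical helper, and `i`/`j` are named only to echo the post: `i` is the index of the '\n', and `j` is where the text ends once the terminator (and a preceding '\r') is dropped.

```d
// Hypothetical sketch: return the first line of `buf`, chopped or intact.
// Assumes `buf` holds at least one complete, '\n'-terminated line.
const(char)[] sliceLine(const(char)[] buf, bool keepNewline)
{
    import std.string : indexOf;

    auto found = buf.indexOf('\n');
    assert(found >= 0, "buffer must contain a complete line");
    size_t i = cast(size_t) found;
    size_t j = (i > 0 && buf[i - 1] == '\r') ? i - 1 : i;

    // buf[0 .. j] chops the line ending; buf[0 .. i + 1] keeps it.
    return keepNewline ? buf[0 .. i + 1] : buf[0 .. j];
}
```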
 
 Can you distill the benefits of retaining CR on a readline, please?

I am pasting fragments from an email to Walter. He suggested this at one point, and I managed to persuade him to keep the newline in there.

Essentially it's about information. The naive loop:

while (readln(line)) { write(line); }

is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like:

while (readln(line)) { writeln(line); }

This may or may not add a newline to the output, possibly creating a file larger by one byte. This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with automated chopping it is basically impossible to write a program that exactly reproduces its input, because readln essentially loses information.

That's a valid point [snip]
Mar 21 2007
next sibling parent reply Walter Bright <newshound digitalmars.com> writes:
kris wrote:
 Having said that, I'm very glad you ran this since it shows how much 
 overhead there is in a flush operation (on *nix) that's very useful to know

The flush on newline should only be done if isatty() returns !=0.
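A minimal sketch of that policy in today's D is shown below. It assumes a POSIX system: `isatty` comes from druntime's `core.sys.posix.unistd`, and Windows would need a different check. `LineWriter` is a hypothetical type. The tty test runs once at construction, so piping output to a file or /dev/null skips the per-line flush entirely.

```d
import core.sys.posix.unistd : isatty;
import std.stdio : File;

// Hypothetical writer: flush after every line only if the descriptor
// is a terminal (POSIX-only; isatty is from core.sys.posix.unistd).
struct LineWriter
{
    File file;
    bool flushOnNewline;

    this(File f)
    {
        file = f;
        flushOnNewline = isatty(f.fileno) != 0;
    }

    void writeLine(const(char)[] line)
    {
        file.write(line, "\n");
        if (flushOnNewline)
            file.flush(); // interactive: show the line immediately
        // Otherwise the line stays in stdio's buffer until it fills or closes.
    }
}
```

C's own stdio follows a similar rule: stdout is fully buffered only when it can be determined not to refer to an interactive device.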
Mar 21 2007
parent kris <foo bar.com> writes:
Walter Bright wrote:
 kris wrote:
 
 Having said that, I'm very glad you ran this since it shows how much 
 overhead there is in a flush operation (on *nix) that's very useful to 
 know

The flush on newline should only be done if isatty() returns !=0.

yep; if you were to submit a ticket for that, it would be appreciated :) http://www.dsource.org/projects/tango/newticket
Mar 21 2007
prev sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [snip]
 
 4.7s    tcat

Thanks. If tango.io were to retain CR on readln, then it would come out ahead of everything else in this particular test

Well probably but must be tested. Newlines comprise about 3% of the file size.

Yeah, I can imagine. Module tango.io.Console at line 119 should have a slice in it ... if you change 'j' to be 'i+1' instead, that should remove the chop

Yum.
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean much 
 of anything important whether tango "wins" this or not (IMO)

Why not? Programs using the standard input and output are ubiquitous, efficient, and extremely easy to combine. I write them all the time for processing huge amounts of data. I didn't run the tests willy-nilly. I had a Perl script that took a night to run (it scrambles through some 20 GB of data), so I decided to give D a shot. The D equivalent was two times slower. With the new readln, it takes 98 minutes; parallelized, it is hand over fist another five times faster (which was impossible in the previous version because it used 98% CPU). I was actually surprised that nobody noticed phobos' low I/O speed in years. It's a maker or breaker for me and many others. If there's any chance that automated chopping could be removed from Tango, that would be awesome. Also it would be great to fix the incompatibility created by using read/write instead of getline. Andrei
Mar 21 2007
next sibling parent reply Derek Parnell <derek nomail.afraid.org> writes:
On Wed, 21 Mar 2007 17:11:56 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 I was actually surprised that nobody noticed phobos' low I/O speed in 
 years. It's a maker or breaker for me and many others.

Most programs I run that do lots of I/O only take seconds to run, so if they run 50% slower or faster, not only wouldn't I notice, I wouldn't care. Taking a sip of coffee takes longer than that. That is why I haven't noticed. (Maybe I should continue working on my mini-DataBase library project and give it a good "real world" workout <G>) By the way, I do appreciate you doing this performance comparison and improving Phobos' I/O routine. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 11:19:16 AM
Mar 21 2007
parent reply Davidl <Davidl 126.com> writes:
u r working on database?
i have a feeling that SQL ain't really suitable for
database-related development, any better idea?

 On Wed, 21 Mar 2007 17:11:56 -0700, Andrei Alexandrescu (See Website For
 Email) wrote:

 I was actually surprised that nobody noticed phobos' low I/O speed in
 years. It's a maker or breaker for me and many others.

Most programs I run that do lots of I/O only take seconds to run, so if they run 50% slower or faster, not only wouldn't I notice, I wouldn't care. Taking a sip of coffee takes longer than that. That is why I haven't noticed. (Maybe I should continue working on my mini-DataBase library project and give it a good "real world" workout <G>) By the way, I do appreciate you doing this performance comparison and improving Phobos' I/O routine.

Mar 21 2007
parent Derek Parnell <derek nomail.afraid.org> writes:
On Thu, 22 Mar 2007 10:22:28 +0800, Davidl wrote:

 u r working on database?
 i have a feeling that SQL ain't really suitable for
 database-related development, any better idea?

Yep. A light-weight, single-user D/B suitable for "home" applications. It has its own DSL so I'm hoping to eventually use some of D's new mixin goodies to help generate optimal code from high-level Database statements. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 1:49:41 PM
Mar 21 2007
prev sibling parent reply kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

Why not?

If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular [snip]
 I was actually surprised that nobody noticed phobos' low I/O speed in 
 years. It's a maker or breaker for me and many others.

That assumes IO performance wasn't brought up as an issue before ;)
 If there's any chance that automated chopping could be removed from 
 Tango, that would be awesome. Also it would be great to fix the 
 incompatibility created by using read/write instead of getline.

Sure; could you submit a ticket for it, please, lest it fall by the wayside? http://www.dsource.org/projects/tango/newticket
Mar 21 2007
next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

Why not?

If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular

That's great, but by and large, the attitude that "this is the simple version; if you want performance, you gotta work for it" is precisely what I don't like about certain languages and APIs. This is, for example, why not everybody really condemns C++ iostreams in spite of them being a pinnacle of counter-performance in any contest, be it beauty, size, or speed. People know that C++ can do fast I/O and are driven by the attitude that you gotta work for it - there's no other way. Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code.
 [snip]
 
 I was actually surprised that nobody noticed phobos' low I/O speed in 
 years. It's a maker or breaker for me and many others.

That assumes IO performance wasn't brought up as an issue before ;)
 If there's any chance that automated chopping could be removed from 
 Tango, that would be awesome. Also it would be great to fix the 
 incompatibility created by using read/write instead of getline.

Sure; could you submit a ticket for it, please, lest it fall by the wayside? http://www.dsource.org/projects/tango/newticket

For the \n, read/write, or both? :o) Andrei
Mar 21 2007
parent reply kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

[snip]
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

Why not?

If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular

That's great, but by and large, the attitude that "this is the simple version; if you want performance, you gotta work for it" is precisely what I don't like about certain languages and APIs. This is, for example, why not everybody really condemns C++ iostreams in spite of them being a pinnacle of counter-performance in any contest, be it beauty, size, or speed. People know that C++ can do fast I/O and are driven by the attitude that you gotta work for it - there's no other way. Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code.

Oh, if there's any implication that Tango ought to be "faster" than it is, then I suspect you're being unjust, Andrei. You'll be hard pressed to find, for example, some routine that hits the heap where that should be avoided. The library was built to avoid such pitfalls. That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;) [snip]
 Sure; could you submit a ticket for it, please, lest it fall by the 
 wayside?

 http://www.dsource.org/projects/tango/newticket

For the \n, read/write, or both? :o)

Both, if you prefer?
Mar 21 2007
next sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

[snip]
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

Why not?

If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular

That's great, but by and large, the attitude that "this is the simple version; if you want performance, you gotta work for it" is precisely what I don't like about certain languages and APIs. This is, for example, why not everybody really condemns C++ iostreams in spite of them being a pinnacle of counter-performance in any contest, be it beauty, size, or speed. People know that C++ can do fast I/O and are driven by the attitude that you gotta work for it - there's no other way. Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code.

Oh, if there's any implication that Tango ought to be "faster" than it is, then I suspect you're being unjust, Andrei. You'll be hard pressed to find, for example, some routine that hits the heap where that should be avoided. The library was built to avoid such pitfalls. That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;)

Do it and let's test. Andrei
Mar 22 2007
parent reply kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the 
 CR chop, .newline is adjusted, or "\n" is used instead ;)

Do it and let's test.

you can try it right now with a Cout(line)("\n"); The option to eschew the chop is checked in also. You'll perhaps see from the Win32 tests that tango.io is pretty darned fast anyway?
Mar 22 2007
parent "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the 
 CR chop, .newline is adjusted, or "\n" is used instead ;)

Do it and let's test.

you can try it right now with a Cout(line)("\n"); The option to eschew the chop is checked in also. You'll perhaps see from the Win32 tests that tango.io is pretty darned fast anyway?

On my Linux box:

import tango.io.Console;
void main() {
   char[] line;
   while (Cin.nextLine(line)) {
     Cout(line)("\n");
   }
}

7.8s    tcat

Andrei
Mar 22 2007
prev sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

[snip]
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

Why not?

If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular

That's great, but by and large, the attitude that "this is the simple version; if you want performance, you gotta work for it" is precisely what I don't like about certain languages and APIs. This is, for example, why not everybody really condemns C++ iostreams in spite of them being a pinnacle of counter-performance in any contest, be it beauty, size, or speed. People know that C++ can do fast I/O and are driven by the attitude that you gotta work for it - there's no other way. Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code.

Oh, if there's any implication that Tango ought to be "faster" than it is, then I suspect you're being unjust, Andrei. You'll be hard pressed to find, for example, some routine that hits the heap where that should be avoided. The library was built to avoid such pitfalls. That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;)

Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is. Andrei
Mar 22 2007
next sibling parent reply kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
[snip]
 Oh, but I forgot it's cheating: uses read/write so it's incompatible 
 with C's stdio, which phobos is.

How can it possibly be "cheating" when the code was in place before you contrived this test ;) I think you have to stretch a bit to find some /common/ and truly valid cases where what you refer to is important enough to warrant such attention. If it truly were to become an issue (people actually run into problems with it on a regular basis) then tango.io could be changed to special-case this kind of thing; but at this time we prefer to avoid such things and adhere to the KISS principal instead. FWIW, tango.io could trivially be sped up significantly on this 'test' -- as it stands, the implementation is quite pedestrian in nature ;)
Mar 22 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [snip]
 Oh, but I forgot it's cheating: uses read/write so it's incompatible 
 with C's stdio, which phobos is.

How can it possibly be "cheating" when the code was in place before you contrived this test ;) I think you have to stretch a bit to find some /common/ and truly valid cases where what you refer to is important enough to warrant such attention. If it truly were to become an issue (people actually run into problems with it on a regular basis) then tango.io could be changed to special-case this kind of thing; but at this time we prefer to avoid such things and adhere to the KISS principal instead.

"Principle" I guess.

That sounds great. My opinion in the matter is simple - D's stdio uses C's FILE*, stdio lib, and all. Moreover, it gives the programmer full access to them. It would only be nice, if it does not cost too much, not to be gratuitously incompatible with them. That's all. If you want to take the other route, you'd better disable access to C's getchar et al.
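The incompatibility can be seen by mixing buffered C stdio with raw read(2) on the same descriptor. Each keeps its own notion of "where we are". The sketch below is POSIX-only and uses a temporary file instead of stdin so it can be checked. `nextBytesAfterStdioLine` is a hypothetical helper. On a typical libc with default buffering, the first `readln` pulls the whole small file into the FILE* buffer. A subsequent `read` on the descriptor then sees end of file, even though stdio still holds the second line.

```d
import core.sys.posix.unistd : read;
import std.stdio : File;

// Hypothetical demo: read one line through C stdio (FILE*), then ask the
// descriptor directly, via read(2), for whatever comes next.
// Returns the bytes read(2) delivered, or "" if it reported end of file.
string nextBytesAfterStdioLine(File f)
{
    f.readln();                                   // stdio refills its buffer from the fd
    char[64] raw;
    auto n = read(f.fileno, raw.ptr, raw.length); // bypasses that buffer
    return n > 0 ? raw[0 .. cast(size_t) n].idup : "";
}
```

The same mismatch hits a library that calls read/write on descriptors 0 and 1 while user code calls getchar or printf on stdin/stdout.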
 FWIW, tango.io could trivially be sped up significantly on this 'test' 
 -- as it stands, the implementation is quite pedestrian in nature ;)

The 'test' is not a 'test', it's a test deriving from my attempts to find the bottleneck in a real D program. Andrei
Mar 22 2007
parent kris <foo bar.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:
 [snip]

 Oh, but I forgot it's cheating: uses read/write so it's incompatible 
 with C's stdio, which phobos is.

How can it possibly be "cheating" when the code was in place before you contrived this test ;) I think you have to stretch a bit to find some /common/ and truly valid cases where what you refer to is important enough to warrant such attention. If it truly were to become an issue (people actually run into problems with it on a regular basis) then tango.io could be changed to special-case this kind of thing; but at this time we prefer to avoid such things and adhere to the KISS principal instead.

"Principle" I guess.

Yep. A thousand pardons for my late night spelling mistake. I'll be sure to reciprocate in future also, if that would be helpful?
 That sounds great. My opinion in the matter is 
 simple - D's stdio uses C's FILE*, stdio lib, and all. Moreover, it gives 
 the programmer full access to them. It would only be nice, if it does 
 not cost too much, not to be gratuitously incompatible with them. That's 
 all. If you want to take the other route, you'd better disable access to 
 C's getchar et al.

Yes, thanks for that option. It is certainly one approach that has been considered before, and a trivial one to implement. We'll probably cross that bridge when we reach it. It's worth noting, however, that Tango is focused on usage with D programs, not C.

BTW: the use of "gratuitous" here is wholly out of context; some might interpret the usage as an implication that Tango is based upon a whim ;)
 
 FWIW, tango.io could trivially be sped up significantly on this 'test' 
 -- as it stands, the implementation is quite pedestrian in nature ;)

The 'test' is not a 'test', it's a test deriving from my attempts to find the bottleneck in a real D program.

It's being referred to as a "benchmark", Andrei; I was trying to be somewhat less political by calling it a 'test'. Many pardons.
Mar 22 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the 
 CR chop, .newline is adjusted, or "\n" is used instead ;)

Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is.

If I understand you correctly, you're saying that all IO packages must go through the standard C library so they stay in sync with the C IO routines? What is the point of read/write, ReadFile/WriteFile, etc, then? Sean
Mar 22 2007
parent "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the 
 CR chop, .newline is adjusted, or "\n" is used instead ;)

Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is.

If I understand you correctly, you're saying that all IO packages must go through the standard C library so they stay in sync with the C IO routines? What is the point of read/write, ReadFile/WriteFile, etc, then?

I think for stdio, going through the standard C library would be very advisable. If, on the other hand, a library chooses to implement a file abstraction not exposing FILE*, it could use whichever means. Andrei
Mar 22 2007
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

Why not?

If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular

One problem with C++, as I mentioned before, is that the straightforward, out-of-the-box coding techniques don't get you fast code. One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed. I think the wc benchmark shows this off.

Having to recode one's programs to speed them up is a big productivity sapper. (The most egregious examples of this are people forced to recode bits of their python/java/ruby app in C++.) What makes stdio so worth the effort to speed up is that the payoff is evident in 80-90% of the programs out there. Optimizing your own program speeds up only your own program - optimizing the library speeds everyone up.

Tango doesn't need to be terribly, terribly slow to be a cause for concern. It only needs to be slower than C++/Perl/Java to be a problem, because then it is a convenient excuse for people to not switch to D.

The conventional wisdom with C++ is that:

1) C++ code is inherently faster than in any other language
2) iostream has a great design
3) iostream is uber fast because it uses templates to inline everything

Andrei's benchmark blows that out of the water. Even interpreted Perl beats the pants off of C++ iostreams.
Mar 21 2007
next sibling parent "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

Why not?

If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular

One problem with C++, as I mentioned before, is that the straightforward, out of the box coding techniques don't get you fast code. One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed. I think the wc benchmark shows this off.

You could tell from this and my (almost identical) post that Walter's propaganda got me thoroughly brainwashed :o). Andrei
Mar 21 2007
prev sibling next sibling parent kris <foo bar.com> writes:
Walter Bright wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

[snip]
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

Why not?

If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular

One problem with C++, as I mentioned before, is that the straightforward, out of the box coding techniques don't get you fast code. One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed. I think the wc benchmark shows this off. Having to recode one's programs to speed them up is a big productivity sapper. (The most egregious examples of this are people forced to recode bits of their python/java/ruby app in C++.) What makes stdio so worth the effort to speed up is because the payoff is evident in 80-90% of the programs out there. Optimizing your own program speeds up only your own program - optimizing the library speeds everyone up. Tango doesn't need to be terribly, terribly slow to be a cause for concern. It only needs to be slower than C++/Perl/Java to be a problem, because then it is a convenient excuse for people to not switch to D. The conventional wisdom with C++ is that: 1) C++ code is inherently faster than in any other language 2) iostream has a great design 3) iostream is uber fast because it uses templates to inline everything Andrei's benchmark blows that out of the water. Even interpreted Perl beats the pants off of C++ iostreams.

tango.io is not even optimized for this case (unlike the new Phobos code), and yet it is still faster than all others once the flush() is removed? The earlier point is only that optimization can easily be premature and misguided; it is typically better to get a flexible and effective design instead. This should not have given anyone cause to assume, assert, or imply that tango is in any way inefficient -- apparently that needs to be clarified ;) For the record, my perspective of "terribly, terribly slow" is pretty much where C++ landed in this particular case.
Mar 21 2007
prev sibling parent reply James Dennett <jdennett acm.org> writes:
Walter Bright wrote:

[snip]

 The conventional wisdom with C++ is that:
 
 1) C++ code is inherently faster than in any other language
 2) iostream has a great design
 3) iostream is uber fast because it uses templates to inline everything
 
 Andrei's benchmark blows that out of the water. Even interpreted Perl
 beats the pants off of C++ iostreams.

This kind of simplistic bashing of a language or library design based on testing of some unnamed implementation(s) of that library doesn't give D a good image. There are other benchmarks that show C++ IOStreams beating C's stdio on performance. Those are also meaningless out of context.

There are real issues with some of the design of IOStreams. There are very real problems with many implementations of IOStreams. There are also good implementations that perform pretty well, but overall IOStreams is not widely viewed in the C++ community as "having a great design", just as having a design that's OK, and a lot safer and more cleanly extensible than C's stdio.

Of course C++ code isn't inherently faster than any other language, and I've not come across anyone saying that it is. And one of the main problems with IOStreams is that it makes excessive use of virtual functions in ways that inhibit inlining, particularly in typical implementations which drag in locale support even for programs that do not use it. The C++ community recognizes these problems, and the C++ committee has addressed some of them (through exposition) in its Technical Report on C++ performance.

I'm at a loss to understand why you would write what you did. It seems to be a straw man, but maybe there was something else to it -- frustration that people assume that D must be slower than C++?

-- James
Mar 21 2007
next sibling parent "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
James Dennett wrote:
 Walter Bright wrote:
 
 [snip]
 
 The conventional wisdom with C++ is that:

 1) C++ code is inherently faster than in any other language
 2) iostream has a great design
 3) iostream is uber fast because it uses templates to inline everything

 Andrei's benchmark blows that out of the water. Even interpreted Perl
 beats the pants off of C++ iostreams.

This kind of simplistic bashing of a language or library design based on testing of some unnamed implementation(s) of that library doesn't give D a good image. There are other benchmarks that show C++ IOStreams beating C's stdio on performance. Those are also meaningless out of context.

For the record, I used gcc 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5).
 There are real issues with some of the design of IOStreams.
 There are very real problems with many implementations of
 IOStreams.  There are also good implementations that
 perform pretty well, but overall IOStreams is not widely
 viewed in the C++ community as "having a great design",
 just as having a design that's OK, and a lot safer and
 more cleanly extensible than C's stdio.
 
 Of course C++ code isn't inherently faster than any other
 language, and I've not come across anyone saying that it
 is.  And one of the main problems with IOStreams is that
 it makes excessive use of virtual functions in ways that
 inhibit inlining, particularly in typical implementations
 which drag in locale support even for programs that do
 not use it.  The C++ community recognizes these problems,
 and the C++ committee has addressed some of them (through
 exposition) in its Technical Report on C++ performance.
 
 I'm at a loss to understand why you would write what you
 did.  It seems to be a straw man, but maybe there was
 something else to it -- frustration that people assume
 that D must be slower than C++?

I don't know why he wrote that, but my perception is that iostreams have been "on the verge of an efficient implementation" for eight years now. What I've seen repeatedly, year after year, whenever I sat down to run a test was performance that made iostreams practically unusable for any serious coding. I'd be faster at moving molasses upstream on a cold day. I am amazed how iostreams managed to maintain this clout for so long. If they were a guy, I'd love to know his trick. :o)

Andrei
Mar 22 2007
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
James Dennett wrote:
 I'm at a loss to understand why you would write what you
 did.  It seems to be a straw man, but maybe there was
 something else to it -- frustration that people assume
 that D must be slower than C++?

Maybe it is a bit of frustration on my part. I often run into people who, when faced with benchmarks showing that conventional D code runs faster than conventional C++, tell me in various ways that it can't be true. I must have:

1) written bad C++ code
2) lied
3) used a sabotaged C++ compiler
4) written some magic optimization that only works on that carefully crafted benchmark

So I have some justification for saying what I did about the conventional wisdom of C++. I also know that the top tier of experienced C++ programmers are well aware that such conventional wisdom is not true.

I have a lot of experience in making C++ code run fast. It doesn't come easy; it takes a lot of work back and forth with a profiler, and it usually involves going around the C++ runtime library. That experience has certainly strongly influenced the design of D. I don't wish to have to write custom I/O just to get good I/O performance. I don't wish to keep doing all the clever string hacks needed to make zero-terminated strings fast. I want the natural, straightforward D code to be (at least close to) the best-performing way to implement an algorithm.
Mar 22 2007
James Dennett <jdennett acm.org> writes:
Walter Bright wrote:
 James Dennett wrote:
 I'm at a loss to understand why you would write what you
 did.  It seems to be a straw man, but maybe there was
 something else to it -- frustration that people assume
 that D must be slower than C++?

Maybe it is a bit of frustration on my part. I often run into people who, when faced with benchmarks showing that conventional D runs code faster than conventional C++, tell me in various ways that it can't be true. I must have: 1) written bad C++ code 2) lied 3) used a sabotaged C++ compiler 4) written some magic optimization that only works on that carefully crafted benchmark So, I have some justification in saying what I did about the conventional wisdom of C++. I also know that the top tier of experienced C++ programmers are well aware such conventional wisdom is not true. I have a lot of experience in making C++ code run fast. It doesn't come easy, it takes a lot of work back and forth with a profiler. It usually involves going around the C++ runtime library. That experience has certainly strongly influenced the design of D. I don't wish to have to write custom I/O just to get good I/O performance. I don't wish to keep doing all the clever string hacks trying to make 0 terminated strings fast. I want the natural, straightforward D code to be (at least close to) the best performing way to implement an algorithm.

Good answer. (Yes, seriously.)

It's certainly true that for code doing large amounts of I/O where performance was an issue, I've always avoided IOStreams; no implementation I've used has been anywhere near fast enough. IOStreams is also a pain where robustness is required. It's most useful for simple tools that are used in tame environments.

The last time I had to get out a profiler to optimize C++ code, it turned out to be mostly an exercise in avoiding (a) a terribly inefficient implementation of std::string, and (b) a mind-bogglingly inefficient implementation of strftime. Which I guess illustrates how important it is that the out-of-the-box, natural ways to write code should have performance that is not too far removed from optimal.

It might be harsh, but not entirely unjustified, to say that the "conventional wisdom" of many communities of programmers is a long, long way from being wise. As the community behind a language grows larger, there is a natural tendency for it not to have as high a density of experts; if D amasses a million users it's a safe bet that most of them won't be as sharp as the average D user is today.

-- James
Mar 22 2007
Bill Baxter <dnewsgroup billbaxter.com> writes:
James Dennett wrote:
 Walter Bright wrote:

 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have as high a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.

I think there is a tendency to assume that APIs and languages which have
(A) been around a long time and
(B) been used by millions of people
will probably be close to optimal. It just makes sense that that would be the case. Unfortunately, it's all too often just not true.

--bb
Mar 23 2007
Walter Bright <newshound digitalmars.com> writes:
Bill Baxter wrote:
 James Dennett wrote:
 Walter Bright wrote:

 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have as high a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.


D bucks conventional wisdom in more than one way. There's a current debate going on among people involved in the next C++ standardization effort about whether to include garbage collection or not. The people involved are arguably the top tier of C++ programmers. But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.
 I think there is a tendency to assume that APIs and languages which have 
 (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that would 
 be the case.  Unfortunately, it's all too often just not true.

I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.
Mar 23 2007
James Dennett <jdennett acm.org> writes:
Walter Bright wrote:
 Bill Baxter wrote:
 James Dennett wrote:
 Walter Bright wrote:

 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have as high a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.


D bucks conventional wisdom in more than one way. There's a current debate going on among people involved in the next C++ standardization effort about whether to include garbage collection or not. The people involved are arguably the top tier of C++ programmers. But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.

Which "wrong" assertions are those?
 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.

I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment. Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading. Is there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables? -- James
Mar 24 2007
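James's point that IOStreams is thread-safe "in the same way that the STL is" can be made concrete: distinct stream objects may be used from different threads without locking, while a shared object such as `cout` needs external synchronization. A minimal sketch under that reading, in which each worker formats into its own `std::ostringstream`; `format_in_parallel` is a hypothetical name, not part of any library discussed in this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: each worker formats into its own std::ostringstream.
// No stream object is shared between threads, so no locking is required.
std::vector<std::string> format_in_parallel(int workers) {
    assert(workers >= 0);
    std::vector<std::string> results(workers);  // sized up front: no reallocation
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([i, &results] {
            std::ostringstream local;           // thread-private stream
            local << "worker " << i << " done\n";
            results[i] = local.str();           // each thread writes its own slot
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

Printing the collected strings afterwards from a single thread keeps the shared `cout` out of the concurrent part entirely.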
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
James Dennett wrote:
 Walter Bright wrote:
 Bill Baxter wrote:
 James Dennett wrote:
 Walter Bright wrote:
 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have as high a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.


D bucks conventional wisdom in more than one way. There's a current debate going on among people involved in the next C++ standardization effort about whether to include garbage collection or not. The people involved are arguably the top tier of C++ programmers. But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.

Which "wrong" assertions are those?
 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.

I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment. Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading. Is there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables?

    cout << a << b;

can't guarantee that a and b will be adjacent in the output. In contrast,

    printf(format, a, b);

does give that guarantee. Moreover, that guarantee holds not only between separate threads in the same process, but between whole processes! Guess which of the two is usable :o)

Btw, does Tango provide such a guarantee for code such as Cout(a)(b)? From the construct, my understanding is that it doesn't.

Andrei
Mar 24 2007
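Andrei's claim about printf rests on the per-call locking that POSIX requires of stdio: every function operating on a `FILE*` behaves as if it locks the stream for the duration of the call. A minimal sketch of the in-process half of that claim, assuming a POSIX system: several threads write lines to one `FILE*` with exactly one `fprintf` per line, and a checker confirms every line came out whole. The helper names are hypothetical, and the sketch says nothing about the cross-process guarantee that James goes on to question.

```cpp
#include <cassert>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each write `lines` lines of the form
// "t<id> <n>\n" to f, using exactly one fprintf call per line.
void write_lines_concurrently(std::FILE* f, int threads, int lines) {
    assert(f != nullptr);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([f, t, lines] {
            for (int n = 0; n < lines; ++n)
                std::fprintf(f, "t%d %d\n", t, n);  // one call, one locked unit
        });
    }
    for (auto& w : workers) w.join();
}

// Re-reads f from the start; returns the number of lines, or -1 if any line
// is not exactly "t<id> <n>\n" (i.e. if output from two calls interleaved).
int count_intact_lines(std::FILE* f) {
    std::rewind(f);
    char buf[64];
    int count = 0;
    while (std::fgets(buf, static_cast<int>(sizeof buf), f)) {
        int t = 0, n = 0;
        char end = 0;
        if (std::sscanf(buf, "t%d %d%c", &t, &n, &end) != 3 || end != '\n')
            return -1;
        ++count;
    }
    return count;
}
```

Writing the same line as two separate calls (say, `fprintf` of the id followed by `fprintf` of the number) would lose this property, which is the `cout << a << b` situation.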
James Dennett <jdennett acm.org> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:

[snip]
 I'm intrigued by your claim that IOStreams is not thread-safe;
 the IOStreams framework is thread-safe in the same way that
 the STL is thread-safe.  The one minor difference is that
 IOStreams exposes some global variables, which is unfortunate
 as they can easily be used in inappropriate ways in a
 multi-threaded environment.  Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.  Is there something deeper in IOStreams that
 you consider to be thread-unsafe, or is it just the matter of
 its global variables?

cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o).

As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone. It's also really hard to implement such a guarantee on most platforms without using some kind of process-shared mutex, file lock, or similar. Does printf really incur that kind of overhead every time something is written to a stream, or does its implementation make use of platform-specific knowledge on which writes are atomic at the OS level?

Within a process, this level of safety could be achieved with only a little (usually redundant) synchronization. Which is useful for debugging or simplistic logging, but not for anything else I've seen.

(IOStreams has this wrong, in different ways: it's not just the order of output that's ill-defined if a stream is used concurrently across multiple threads. Nasal demons are also possible, I hear.)
 Btw, does tango provide such a guarantee for code such as Cout(a)(b)?
 From the construct, my understanding is that it doesn't.

I'll leave that for the Tango experts to answer. -- James
Mar 24 2007
Walter Bright <newshound digitalmars.com> writes:
James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 cout << a << b;

 can't guarantee that a and b will be adjacent in the output. In contrast,

 printf(format, a, b);

 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.

In order for printf to work right, it does not need to flush every time (you're right that that would lead to terrible performance). Usually printf only flushes if isatty() comes back true. In fact, flushing the output at the end of each printf would not mitigate multithreading problems at all.

In order for printf to be thread safe, all that's necessary is for it to acquire/release the C stream lock (C's implementation of stdio has a lock associated with each stream). D's implementation of writef does the same thing. D's writef also wraps the whole thing in a try-finally, making it exception safe.

Iostreams'
    cout << a << b;
results in the equivalent of:
    (cout->out(a))->out(b);
The trouble is, there's no place to hang the lock acquire/release, nor the try-finally. It's a fundamental design problem.
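Walter's design (one lock acquire/release per call, released on the way out even if something throws) can be sketched in C++ with a variadic function, so that a whole sequence of arguments is one call rather than a chain of separate `<<` calls. This is an illustration only, not D's `writef` or any standard API: `write_all` and `g_out_mutex` are hypothetical names, the sketch assumes C++17 fold expressions, and `std::lock_guard` stands in for the try-finally.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>

std::mutex g_out_mutex;  // assumption: one mutex guards one shared stream

// One lock per call, like the per-call stream lock in C stdio. The
// lock_guard releases the mutex when the function returns or throws.
template <typename... Args>
void write_all(std::ostream& os, const Args&... args) {
    std::lock_guard<std::mutex> guard(g_out_mutex);
    (os << ... << args);  // C++17 left fold: ((os << a1) << a2) << ...
}
```

A call such as `write_all(std::cout, "a=", a, " b=", b, '\n')` holds the lock for the whole line, which is the hook that a chain of `operator<<` calls does not provide.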
 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream,

It does exactly one lock acquire/release for each printf, not for each character written.
 or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?
 
 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.

The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.
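For C stdio, POSIX exposes the per-stream lock through `flockfile()` and `funlockfile()`. Because stdio functions take that same lock internally and it is recursive, code that groups several calls this way stays atomic with respect to every other stdio call on that stream, including calls made by unrelated libraries, so nobody has to invent a private mutex. A minimal sketch, assuming a POSIX system; `write_record` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>
#include <stdio.h>  // flockfile, funlockfile (POSIX)

// Writes "key = value\n" as one unit using the stream's own (recursive) lock.
void write_record(std::FILE* f, const char* key, int value) {
    assert(f != nullptr && key != nullptr);
    flockfile(f);                    // same lock each stdio call takes internally
    std::fputs(key, f);              // these calls re-acquire it recursively
    std::fputs(" = ", f);
    std::fprintf(f, "%d\n", value);
    funlockfile(f);                  // other threads may write only after this
}
```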
 Which is useful for debugging or simplistic logging, but
 not for anything else I've seen.
 
 (IOStreams has this wrong, in different ways: it's not
 just the order of output that's ill-defined if a stream
 is used concurrently across multiple threads.  Nasal
 demons are also possible, I hear.)

Mar 24 2007
James Dennett <jdennett acm.org> writes:
Walter Bright wrote:
 James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 cout << a << b;

 can't guarantee that a and b will be adjacent in the output. In
 contrast,

 printf(format, a, b);

 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.

In order for printf to work right it does not need to flush every time (you're right that that would lead to terrible performance). The usual thing that printf does is only do a flush if isatty() comes back with true. In fact, flushing the output at the end of each printf would not mitigate multithreading problems at all. In order for printf to be thread safe, all that's necessary is for it to acquire/release the C stream lock (C's implementation of stdio has a lock associated with each stream).

That would be true, except that Andrei wrote that the guarantee applied to separate processes, and that can only be guaranteed if you both use some kind of synchronization between the processes *and* flush the stream. Andrei's claim went beyond mere thread-safety, and that was what I responded to.
 D's implementation of writef does the same thing. D's writef also wraps
 the whole thing in a try-finally, making it exception safe.
 
 Iostreams'
     cout << a << b;
 results in the equivalent of:
     (cout->out(a))->out(b);
 The trouble is, there's no place to hang the lock acquire/release, nor
 the try-finally. It's a fundamental design problem.

There's a place:

    locked(cout) << a << b;

can be made to do the job, using RAII to lock at the start of the expression and unlock at the end.
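The `locked(cout) << a << b` idea can be sketched as a temporary proxy that holds a lock for the lifetime of the full expression. Everything here is hypothetical (`locked`, `LockedStream` and `stream_mutex` are illustration names, with one process-wide mutex assumed), and it only serializes writers that go through `locked()`; a plain `cout << x` elsewhere bypasses it, which is part of why this is harder than it looks.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>

std::mutex& stream_mutex() {  // assumption: one process-wide output mutex
    static std::mutex m;
    return m;
}

// Temporary returned by locked(): it acquires the mutex on construction and
// releases it in its destructor, at the end of the full expression.
class LockedStream {
public:
    explicit LockedStream(std::ostream& os) : os_(os), lock_(stream_mutex()) {}

    template <typename T>
    LockedStream& operator<<(const T& value) {
        os_ << value;  // still inside the lock
        return *this;
    }

private:
    std::ostream& os_;
    std::unique_lock<std::mutex> lock_;
};

inline LockedStream locked(std::ostream& os) { return LockedStream(os); }
```

Manipulators such as `std::endl` do not deduce through the template `operator<<` above, so the sketch writes `'\n'` instead; supporting them needs an extra overload.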
 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream,

It does exactly one lock acquire/release for each printf, not for each character written.

Right. I certainly did not intend to imply that any serious design would be silly enough to lock for each character written (which would be fairly useless synchronization in any case).
 or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.

The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.

Most libraries ought not to do so; coding dependencies on globals into libraries is generally poor design.

The problem is not that users would have to write synchronization. Usually they need to do that. A problem would be if some low-level locking inside the I/O subsystems gave the impression that the user did *not* need to synchronize their own code.

It's not quite as simple as this. One (possibly killer) argument for building synchronization into low-level libraries is to reduce the cost of dealing with support issues from bemused users who expected not to have to consider thread-safety when sharing streams between threads.

-- James
Mar 24 2007
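The advice that libraries should not hard-code global streams usually translates into taking the destination stream as a parameter. A minimal sketch; `report_progress` is a hypothetical library function. The application then decides whether to pass `std::cerr`, a string buffer, or some synchronized wrapper of its own.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical library function: writes to the caller's stream rather than
// to a global such as std::cerr.
void report_progress(std::ostream& sink, const std::string& task, int percent) {
    assert(percent >= 0 && percent <= 100);
    sink << task << ": " << percent << "%\n";
}
```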
Walter Bright <newshound digitalmars.com> writes:
James Dennett wrote:
 Walter Bright wrote:
 That would be true, except that Andrei wrote that
 the guarantee applied to separate processes, and
 that can only be guaranteed if you both use some
 kind of synchronization between the processes *and*
 flush the stream.
 
 Andrei's claim went beyond mere thread-safety, and
 that was what I responded to.

Ok, but since it is typical to do a flush on newline if isatty(), that seems to resolve these inter-process problems.
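The isatty() policy is the conventional stdio default: the C standard requires stdout to be fully buffered only when it can be determined not to refer to an interactive device, and implementations typically line-buffer terminals. A minimal sketch that applies the same policy explicitly to any stream, assuming POSIX `isatty()` and `fileno()`; `choose_buffering` is a hypothetical name, and it has to be called before any other I/O on the stream.

```cpp
#include <cassert>
#include <cstdio>
#include <unistd.h>  // isatty (POSIX); fileno comes from <stdio.h>

// Line-buffer a stream attached to a terminal, fully buffer it otherwise.
// Returns the mode chosen (_IOLBF or _IOFBF).
int choose_buffering(std::FILE* f) {
    assert(f != nullptr);
    int mode = isatty(fileno(f)) ? _IOLBF : _IOFBF;
    std::setvbuf(f, nullptr, mode, BUFSIZ);
    return mode;
}
```

With line buffering, each newline triggers a flush, so interactive output appears line by line while redirected output is written in larger blocks.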
 There's a place:
 
 locked(cout) << a << b;
 
 can be made to do the job, using RAII to lock at the
 start of the expression and unlock at the end.

I don't think it is that easy, see: http://docs.sun.com/source/819-3690/Multithread.html and http://www.atnf.csiro.au/computing/software/sol2docs/manuals/c++/lib_ref/MT.html
 Right.  I certainly did not intend to imply that any
 serious design would be silly enough to lock for each
 character written (which would be fairly useless
 synchronization in any case).

It's needed if only to avoid corrupting the I/O buffer itself.
 The problem is such synchronization would be invented and added on by
 the user, making it impossible to combine disparate libraries that write
 to stderr, for example, in a multithreading environment.

Most libraries ought not to do so; coding dependencies on globals into libraries is generally poor design.

I think it is unreasonable to tell users they cannot use standard cin/cout/cerr in standard ways in their library code.
 The problem is not that users would have to write
 synchronization.  Usually they need to do that. A
 problem would be if some low-level locking inside
 the I/O subsystems gave the impression that the
 user did *not* need to synchronize their own code.
 
 It's not quite as simple as this.  One (possibly
 killer) argument for building synchronization into
 low-level libraries is to reduce the cost of
 dealing with support issues from bemused users
 who expected not to have to consider thread-safety
 when sharing streams between threads.

I think it is a killer argument. Multithreaded programming is hard enough without heaping more burdens on the user.
Mar 24 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
James Dennett wrote:
 Walter Bright wrote:
 James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 cout << a << b;

 can't guarantee that a and b will be adjacent in the output. In
 contrast,

 printf(format, a, b);

 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.

In order for printf to work right it does not need to flush every time (you're right that that would lead to terrible performance). The usual thing that printf does is only do a flush if isatty() comes back with true. In fact, flushing the output at the end of each printf would not mitigate multithreading problems at all. In order for printf to be thread safe, all that's necessary is for it to acquire/release the C stream lock (C's implementation of stdio has a lock associated with each stream).

That would be true, except that Andrei wrote that the guarantee applied to separate processes, and that can only be guaranteed if you both use some kind of synchronization between the processes *and* flush the stream. Andrei's claim went beyond mere thread-safety, and that was what I responded to.

Lines don't have to appear at exact times; they just must not interleave. So printf does not have to flush often. I've used printf-level atomicity for a long time on various systems and it works perfectly. Is it a system-dependent assumption? I don't know. It sure is there and is very helpful on all systems I've used it with.

Andrei
Mar 24 2007
James Dennett <jdennett acm.org> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 Walter Bright wrote:
 James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 cout << a << b;

 can't guarantee that a and b will be adjacent in the output. In
 contrast,

 printf(format, a, b);

 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.

In order for printf to work right it does not need to flush every time (you're right that that would lead to terrible performance). The usual thing that printf does is only do a flush if isatty() comes back with true. In fact, flushing the output at the end of each printf would not mitigate multithreading problems at all. In order for printf to be thread safe, all that's necessary is for it to acquire/release the C stream lock (C's implementation of stdio has a lock associated with each stream).

That would be true, except that Andrei wrote that the guarantee applied to separate processes, and that can only be guaranteed if you both use some kind of synchronization between the processes *and* flush the stream. Andrei's claim went beyond mere thread-safety, and that was what I responded to.

Lines don't have to appear at exact times; they just must not interleave. So printf does not have to flush often. I've used printf-level atomicity for a long time on various systems and it works perfectly.

With sufficiently short lines, where the value of "sufficiently" depends on which platform and which kind of file descriptor you're writing to. printf is likely to end up calling write with no locking; write isn't atomic past a certain (or uncertain) size, and has no reason to make the boundary coincide with the end of a line.
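The size limit has a precise form for pipes and FIFOs: POSIX guarantees that a single `write()` of at most `PIPE_BUF` bytes is not interleaved with data from other writers, while larger writes may be split. Regular files and terminals are not covered by that rule. A minimal sketch, assuming a Linux/glibc-style system where `PIPE_BUF` is a compile-time constant (elsewhere `fpathconf(fd, _PC_PIPE_BUF)` is the portable query); `write_line_atomically` is a hypothetical name.

```cpp
#include <cassert>
#include <climits>   // PIPE_BUF (POSIX; present on Linux/glibc)
#include <cstddef>
#include <string>
#include <unistd.h>  // write, pipe, read, close

// Emits the line with one write() only when POSIX's pipe atomicity promise
// applies; returns false for lines longer than PIPE_BUF instead of silently
// risking interleaving.
bool write_line_atomically(int fd, const std::string& line) {
    if (line.size() > static_cast<std::size_t>(PIPE_BUF)) return false;
    ssize_t n = write(fd, line.data(), line.size());
    return n == static_cast<ssize_t>(line.size());
}
```

printf does not perform this check itself: how a formatted line maps onto `write()` calls depends on the stream's buffer size and buffering mode.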
 Is it a system-dependent assumption? I don't know. It sure is there and is
 very helpful on all systems I used it with.

Can you name one specific system where this is documented as working reliably, or where it can be shown to do so? I've *seen* interleaving between processes, and lived with it in debugging code for performance reasons, but for reliable output have used other mechanisms. I understood this to be a widely known problem with printf, write et al. -- James
Mar 24 2007
Sean Kelly <sean f4.ca> writes:
James Dennett wrote:
 
 It's not quite as simple as this.  One (possibly
 killer) argument for building synchronization into
 low-level libraries is to reduce the cost of
 dealing with support issues from bemused users
 who expected not to have to consider thread-safety
 when sharing streams between threads.

...since they obviously don't have to consider thread-safety when sharing other objects between threads. I'll admit that a global output object might be seen as somehow magic by those who don't really understand what 'cout' represents, for example, but how much of a problem would this really be? The argument against building locking into C++ containers seems fairly well settled, so why does there seem to be so much contention about output? Is it that producing predictable behavior is easier, or that the cost of locking is less of an issue since I/O is expensive anyway?

Sean
Mar 24 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 James Dennett wrote:
 It's not quite as simple as this.  One (possibly
 killer) argument for building synchronization into
 low-level libraries is to reduce the cost of
 dealing with support issues from bemused users
 who expected not to have to consider thread-safety
 when sharing streams between threads.

...since they obviously don't have to consider thread-safety when sharing other objects between threads. I'll admit that a global output object might be seen as somehow magic to those who don't really understand what 'cout' represents, for example, how much of a problem would this really be? The argument against building locking into C++ containers seems fairly well-settled, so why does there seem to be so much contention about output? Is it that producing predictable behavior is easier or that the cost of locking is less of an issue since IO is expensive anyway?

Good question(s). It might also be that the I/O interface is considerably simpler than the container interface. The classic example of the failure of method-level synchronization with containers is

    if (!cont.empty()) cont.pop();

With I/O, most of the time, covert synchronization at the call level is all you need.

Andrei
Mar 24 2007
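The container example is the standard check-then-act race: even if `empty()` and `pop()` each lock internally, another thread can pop between the two calls. The usual remedy is one call that checks and removes under a single lock, the container analogue of one lock per `printf`. A minimal sketch, assuming C++17 for `std::optional`; `SyncQueue` and `try_pop` are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

template <typename T>
class SyncQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> guard(m_);
        items_.push_back(std::move(value));
    }

    // Check-and-remove as one locked operation, so no other thread can
    // empty the queue between the check and the pop.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> guard(m_);
        if (items_.empty()) return std::nullopt;
        T front = std::move(items_.front());
        items_.pop_front();
        return front;
    }

private:
    std::mutex m_;
    std::deque<T> items_;
};
```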
Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 
 D's implementation of writef does the same thing. D's writef also wraps 
 the whole thing in a try-finally, making it exception safe.
 
 Iostreams'
     cout << a << b;
 results in the equivalent of:
     (cout->out(a))->out(b);
 The trouble is, there's no place to hang the lock acquire/release, nor 
 the try-finally. It's a fundamental design problem.

The stream could acquire a lock and pass it to a proxy object that releases the lock on destruction. This would work fine in C++, where the lifetime of such objects is deterministic, but the design is incredibly awkward.
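This variant moves the locking into the stream itself: the first `<<` takes the lock and hands it to a proxy, which forwards later insertions and releases the lock when it is destroyed at the end of the full expression. A minimal sketch, assuming C++17; `SyncOStream`, its nested `Proxy` and `guard_mutex()` are hypothetical names. It also shows some of the awkwardness: two sets of `operator<<`, and template overloads that will not accept manipulators such as `std::endl` without extra work.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <utility>

class SyncOStream {
public:
    explicit SyncOStream(std::ostream& os) : os_(os) {}

    // Holds the lock taken by the first insertion until the end of the
    // full expression, then releases it in its destructor.
    class Proxy {
    public:
        Proxy(std::ostream& os, std::unique_lock<std::mutex> lock)
            : os_(os), lock_(std::move(lock)) {}
        template <typename T>
        Proxy& operator<<(const T& value) {
            os_ << value;
            return *this;
        }
    private:
        std::ostream& os_;
        std::unique_lock<std::mutex> lock_;
    };

    // First insertion: lock, write, then move the lock into the proxy.
    // If the write throws, the unique_lock releases the mutex on unwinding.
    template <typename T>
    Proxy operator<<(const T& value) {
        std::unique_lock<std::mutex> lock(mutex_);
        os_ << value;
        return Proxy(os_, std::move(lock));
    }

    // Exposed for illustration, e.g. to group a larger unit by hand.
    std::mutex& guard_mutex() { return mutex_; }

private:
    std::ostream& os_;
    std::mutex mutex_;
};
```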
 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream,

It does exactly one lock acquire/release for each printf, not for each character written.

This is still far too granular for most uses. About the only time I actually use output without explicit synchronization is for throwaway debug output.
 or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.

The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.

This is a valid point, but how often is it actually used in practice? Libraries generally do not perform error output of their own, and applications typically have a coherent approach for output. In my time as a programmer, I can't think of a single instance where default synchronization to an output device actually mattered. I can certainly appreciate this for its predictable behavior, but I don't know how often that predictability would actually matter to me.
 Which is useful for debugging or simplistic logging, but
 not for anything else I've seen.


Exactly. Sean
Mar 24 2007
Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 It does exactly one lock acquire/release for each printf, not for each 
 character written.

 This is still far too granular for most uses.

I disagree. It's been working fine for nearly 20 years now. gcc implements it the same way, and it's hardly unusable for most uses.
 The problem is such synchronization would be invented and added on by 
 the user, making it impossible to combine disparate libraries that 
 write to stderr, for example, in a multithreading environment.

This is a valid point, but how often is it actually used in practice? Libraries generally do not perform error output of their own, and applications typically have a coherent approach for output. In my time as a programmer, I can't think of a single instance where default synchronization to an output device actually mattered. I can certainly appreciate this for its predictable behavior, but I don't know how often that predictability would actually matter to me.

It apparently comes up often enough in C++ to merit 59,000 hits on "multithreaded iostreams" and many web pages outlining attempts to solve the problem. It is a problem that is solved by every C stdio for multithreaded environments, although the C standard does not mention the word "thread". Multithreading threatens to become far more common, not less, as we move to multicore machines. If that isn't compelling, ok, but I suggest at a minimum that Tango not lock into a design that *precludes* adding thread synchronization without changing user code.
Mar 24 2007
Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 
 The problem is such synchronization would be invented and added on by 
 the user, making it impossible to combine disparate libraries that 
 write to stderr, for example, in a multithreading environment.

This is a valid point, but how often is it actually used in practice? Libraries generally do not perform error output of their own, and applications typically have a coherent approach for output. In my time as a programmer, I can't think of a single instance where default synchronization to an output device actually mattered. I can certainly appreciate this for its predictable behavior, but I don't know how often that predictability would actually matter to me.

It apparently comes up often enough in C++ to merit 59,000 hits on "multithreaded iostreams" and many web pages outlining attempts to solve the problem.

True enough. Though I wonder how much of a factor it is that C++ has no built-in support for multithreading, and whether this has a positive or negative effect on the number of questions.
 It is a problem that is solved by every C stdio for multithreaded 
 environments, although the C standard does not mention the word "thread".
 
 Multithreading threatens to become far more common, not less, as we move 
 to multicore machines.
 
 If that isn't compelling, ok, but I suggest at a minimum that Tango not 
 lock into a design that *precludes* adding thread synchronization 
 without changing user code.

True enough. I suppose that if nothing else, the option for synchronized output to stdout, stderr, and stdlog should be somehow available without user changes, as you say. Sean
Mar 25 2007
prev sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
James Dennett wrote:
 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

Numbers clearly tell the above is wrong. Here's the thing: I write programs that write lines to files. If I use cout, they don't work. If I use fprintf, they do work, and 10 times faster. And that's that.
 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream, or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

The C standard library takes care of it without me having to do anything in particular.
 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.
 Which is useful for debugging or simplistic logging, but
 not for anything else I've seen.

I do not concur. Andrei
Mar 24 2007
parent reply James Dennett <jdennett acm.org> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

Numbers clearly tell the above is wrong.

Only if they apply to the above.
 Here's the thing: I write
 programs that write lines to files. If I use cout, they don't work. If I
 use fprintf, they do work, and 10 times faster. And that's that.

Except that your test wasn't of the right thing; you probably didn't test code that guaranteed atomicity of writes between different processes.
 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream, or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

The C standard library takes care of it without me having to do anything in particular.

I've never seen a C library that guarantees atomicity of writes between processes on a Unix-like system. The documentation of some systems does guarantee atomicity of sufficiently small writes to certain types of file descriptors, but I've not seen any Unix-like system that guarantees atomicity for writes of unlimited sizes; in some cases they can even be interrupted before the full amount is written. I've certainly seen the result of C's *printf *not* being synchronized between processes on a wide variety of systems.
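The "sufficiently small writes to certain types of file descriptors" James mentions are, on POSIX systems, writes of at most PIPE_BUF bytes to a pipe or FIFO. Those writes are not interleaved with data from other writers. The following is a minimal sketch that relies only on that guarantee. It assumes a POSIX platform, and `write_record_atomically` is a hypothetical name rather than part of any library discussed here.

```cpp
#include <cassert>
#include <climits>     // PIPE_BUF (POSIX; provided via <limits.h> on Linux)
#include <cstddef>
#include <string>
#include <sys/types.h> // ssize_t
#include <unistd.h>    // write, read, pipe, close (POSIX)

// Hypothetical helper: emits one whole record with a single write(2) call.
// POSIX promises that a write of at most PIPE_BUF bytes to a pipe or FIFO
// is not interleaved with other writers' data. Larger records get no such
// promise, so this sketch refuses them instead of splitting them.
// It makes no claim about regular files or terminals.
bool write_record_atomically(int fd, const std::string& record) {
    if (record.size() > static_cast<std::size_t>(PIPE_BUF)) {
        return false;  // too large for the pipe atomicity guarantee
    }
    ssize_t n = ::write(fd, record.data(), record.size());
    return n == static_cast<ssize_t>(record.size());
}
```

Note the contrast with printf: the guarantee comes from the kernel and is bounded in size. A C library cannot extend it to arbitrary output.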
 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.
 Which is useful for debugging or simplistic logging, but
 not for anything else I've seen.

I do not concur.

With my description of my own experience? ;) -- James
Mar 24 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.


Only if they apply to the above.
 Here's the thing: I write
 programs that write lines to files. If I use cout, they don't work. If I
 use fprintf, they do work, and 10 times faster. And that's that.

Except that your test wasn't of the right thing; you probably didn't test code that guaranteed atomicity of writes between different processes.
 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream, or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

The C standard library takes care of it without me having to do anything in particular.

I've never seen a C library that guarantees atomicity of writes between processes on a Unix-like system. The documentation of some systems does guarantee atomicity of sufficiently small writes to certain types of file descriptors, but I've not seen any Unix-like system that guarantees atomicity for writes of unlimited sizes; in some cases they can even be interrupted before the full amount is written. I've certainly seen the result of C's *printf *not* being synchronized between processes on a wide variety of systems.

If you did, fine. I take that part of my argument back. I'll also note that that doesn't make iostreams any more defensible :o). Andrei
Mar 24 2007
parent James Dennett <jdennett acm.org> writes:
Andrei Alexandrescu (See Website For Email) wrote:

[snip]

 I'll also note
 that that doesn't make iostreams any more defensible :o).

Trying to defend IOStreams is certainly a challenge. I think I've tried enough, given what a sick puppy it is, and now should leave it to suffer in peace. -- James
Mar 24 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 Walter Bright wrote:
 Bill Baxter wrote:
 James Dennett wrote:
 Walter Bright wrote:
 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have such a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.


debate going on among people involved in the next C++ standardization effort about whether to include garbage collection or not. The people involved are arguably the top tier of C++ programmers. But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.

Which "wrong" assertions are those?
 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.

I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment. Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading. Is there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables?

cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o).
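On POSIX systems, printf's "a and b stay adjacent" property comes from the per-FILE lock that stdio takes on every call. flockfile/funlockfile let a caller hold that same lock across several calls, so another thread's stdio output cannot land in between. The following is a minimal sketch, assuming a POSIX platform. `write_pair` is a hypothetical name. As James notes earlier in the thread, this lock orders output only among threads of one process; it says nothing about other processes writing to the same file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // FILE, fputs; flockfile/funlockfile are POSIX
#include <string>

// Hypothetical helper: writes a and b so that no other thread's stdio
// output on the same FILE* can appear between them.
void write_pair(std::FILE* out, const std::string& a, const std::string& b) {
    flockfile(out);               // take the stream's internal (recursive) lock
    std::fputs(a.c_str(), out);   // fputs re-enters the lock we already hold
    std::fputs(b.c_str(), out);
    funlockfile(out);             // let other threads write again
}
```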

stringstream s;
s << a << b;
cout << s.str();

;-)
 Btw, does tango provide such a guarantee for code such as Cout(a)(b)? 
  From the construct, my understanding is that it doesn't.

No. There really isn't any way to do automatic locking with chained opCall barring the use of proxy objects or something equally nasty. Also, it hurts efficiency to always lock regardless of whether the user is performing IO in multiple threads. The preferred method here is:

synchronized( Cout )
    Cout( a )( b )( c )();

Sean
Mar 24 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 There really isn't any way to do automatic locking with chained 
 opCall barring the use of proxy objects or something equally nasty. 
 Also, it hurts efficiency to always lock regardless of whether the user 
 is performing IO in multiple threads.  The preferred method here is:
 
 synchronized( Cout )
     Cout( a )( b )( c )();

The trouble with that design is people working on subsystems or libraries, which will be combined by others into a working whole. Since it is extra work to add the synchronized statement, odds are pretty good it won't happen. Then, the whole gets erratic multithreading performance.

Ideally, things should be inverted so that thread safety is the default behavior, and the extra-efficiency-dammit-I-know-what-I'm-doing is the extra work.

One way to solve this problem is to use variadic templates as outlined in http://www.digitalmars.com/d/variadic-function-templates.html

Back in the early days of Windows NT, when multithreaded programming was introduced to a mass platform, C compilers typically shipped with two runtime libraries - a single threaded one "for efficiency", and a multithreaded one. Also, to do multithreaded code, one had to predefine _MT or throw a command line switch. Inevitably, this was overlooked, and endless bugs consumed endless time. I made the decision early on to only ship threadsafe libraries, and have _MT always on. I've never regretted it, I'm sure it saved me a lot of tech support time, and avoided the perception that the compiler didn't work with multithreading.
Mar 24 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 There really isn't any way to do automatic locking with chained opCall 
 barring the use of proxy objects or something equally nasty. Also, it 
 hurts efficiency to always lock regardless of whether the user is 
 performing IO in multiple threads.  The preferred method here is:

 synchronized( Cout )
     Cout( a )( b )( c )();

The trouble with that design is people working on subsystems or libraries, which will be combined by others into a working whole. Since it is extra work to add the synchronized statement, odds are pretty good it won't happen. Then, the whole gets erratic multithreading performance.

Ideally, things should be inverted so that thread safety is the default behavior, and the extra-efficiency-dammit-I-know-what-I'm-doing is the extra work.

One way to solve this problem is to use variadic templates as outlined in http://www.digitalmars.com/d/variadic-function-templates.html

Back in the early days of Windows NT, when multithreaded programming was introduced to a mass platform, C compilers typically shipped with two runtime libraries - a single threaded one "for efficiency", and a multithreaded one. Also, to do multithreaded code, one had to predefine _MT or throw a command line switch. Inevitably, this was overlooked, and endless bugs consumed endless time. I made the decision early on to only ship threadsafe libraries, and have _MT always on. I've never regretted it, I'm sure it saved me a lot of tech support time, and avoided the perception that the compiler didn't work with multithreading.

MS does the same now if I remember correctly: all of its libraries are MT by default. I agree with Walter's sentiment that Cout(a)(b) is a design mistake. Fortunately, now we have compile-time variadic functions, which will make it easy to correct the design - Cout(a, b) can be made just as good without having to chase typeinfo's at runtime. Andrei
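Walter and Andrei suggest a variadic Cout(a, b) that takes the lock once per call. That idea can be sketched in C++17 with a fold expression. All names here (`write_all`, `output_mutex`) are hypothetical and are not Tango's or Phobos's API. Formatting happens into a local buffer outside the lock, so the critical section is a single write. The lock covers only writes made through `write_all`, so code that writes to the same stream directly can still interleave.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// One lock shared by every write_all() call (hypothetical).
std::mutex& output_mutex() {
    static std::mutex m;
    return m;
}

// Formats all arguments first, then writes the result under one lock,
// so one call's output is never split by another thread's write_all().
template <typename... Args>
void write_all(std::ostream& out, const Args&... args) {
    std::ostringstream buf;
    (buf << ... << args);                         // C++17 left fold
    const std::string text = buf.str();
    std::lock_guard<std::mutex> lock(output_mutex());
    out << text;                                  // single write while locked
}
```

Usage looks like `write_all(std::cout, "x=", x, '\n');`. The caller gets the safe behavior by default, which is the inversion Walter argues for.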
Mar 24 2007
parent Sean Kelly <sean f4.ca> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Back in the early days of Windows NT, when multithreaded programming 
 was introduced to a mass platform, C compilers typically shipped with 
 two runtime libraries - a single threaded one "for efficiency", and a 
 multithreaded one. Also, to do multithreaded code, one had to 
 predefine _MT or throw a command line switch. Inevitably, this was 
 overlooked, and endless bugs consumed endless time. I made the 
 decision early on to only ship threadsafe libraries, and have _MT 
 always on. I've never regretted it, I'm sure it saved me a lot of tech 
 support time, and avoided the perception that the compiler didn't work 
 with multithreading.

MS does the same now if I remember correctly: all of its libraries are MT by default.

Yup. In fact, I just discovered that Visual Studio 2005 doesn't even provide a single-threaded build option any more. In some ways it's a relief because it's allowed me to drop two build options and remove a bunch of #if defined(_MT) clauses.
 I agree with Walter's sentiment that Cout(a)(b) is a design mistake. 
 Fortunately, now we have compile-time variadic functions, which will 
 make it easy to correct the design - Cout(a, b) can be made just as good 
 without having to chase typeinfo's at runtime.

Agreed. Sean
Mar 24 2007
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
James Dennett wrote:
 Walter Bright wrote:
 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.


gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.
 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.

I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment.

Note the reliance here on global state that is neither thread nor exception safe:

std::ios_base::fmtflags flags_save = std::cout.flags();
std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" << 789 << std::endl;
std::cout.flags(flags_save);
 Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.

That's not an excuse, as 1) multithreading was common long before C++98 was written and 2) multithreading and exception safety were thought about and accounted for in much of the rest of the library design, despite threading not being official.
 Is there something deeper in IOStreams that
 you consider to be thread-unsafe, or is it just the matter of
 its global variables?

All I can do is point to the example above.
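The exception-safety half of the example above has a conventional C++ fix: save the flags in an RAII object so they are restored on every exit path. Boost.IO's `ios_flags_saver` does the same job. The following is a minimal sketch; `FlagsGuard` and `print_row` are hypothetical names. It does nothing for the threading half: a std::cout shared between threads still needs external locking, which is the point under dispute here.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Restores a stream's format flags when it goes out of scope,
// including when an exception propagates.
class FlagsGuard {
public:
    explicit FlagsGuard(std::ios_base& s) : stream_(s), saved_(s.flags()) {}
    ~FlagsGuard() { stream_.flags(saved_); }
    FlagsGuard(const FlagsGuard&) = delete;
    FlagsGuard& operator=(const FlagsGuard&) = delete;
private:
    std::ios_base& stream_;
    std::ios_base::fmtflags saved_;
};

// Walter's output line, with the manual save/restore replaced by the guard.
void print_row(std::ostream& out, bool fail) {
    FlagsGuard guard(out);
    out << 123 << '|' << std::left << std::setw(8) << 456 << '|' << 789 << '\n';
    if (fail) throw std::runtime_error("simulated failure after output");
}
```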
Mar 24 2007
next sibling parent reply James Dennett <jdennett acm.org> writes:
Walter Bright wrote:
 James Dennett wrote:
 Walter Bright wrote:
 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.


gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.

I've seen only a minority of those claims made as part of the C++ committee discussions of GC. However:

GC *is* often used as a crutch by programmers who cannot or do not want to take time to make a design in which ownership is clear.

GC is unsuitable for *some* types of mission critical applications.

These are true. It's also true that:

Effective use of GC is not restricted to lazy/sloppy/less capable programmers, and can be used by experts to produce software that is more reliable in certain ways.

GC is suitable for some types of mission critical applications.

GC can affect performance, either positively or negatively.

GC can affect memory footprint.

Working with GC can cause resource management issues because many programmers are often tempted to think less carefully about these issues when there is a garbage collector to mitigate some of the damage.

Almost all discussions of the pros and cons of GC are simplistic and unbalanced.
 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.

I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment.

Note the reliance here on global state that is neither thread nor exception safe:

std::ios_base::fmtflags flags_save = std::cout.flags();
std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" << 789 << std::endl;
std::cout.flags(flags_save);

True, exception-safety is an issue. There is not a threading issue in the code above *except* that it uses a global variable without synchronization; you explicitly coded reliance on global state, by using a global variable. Unfortunately that's easily done with the IOStreams interface.
 Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.

That's not an excuse, as 1) multithreading was common long before C++98 was written and 2) multithreading and exception safety were thought about and accounted for in much of the rest of the library design, despite threading not being official.

I wasn't aiming to make an excuse. I was merely noting that it's not surprising. IOStreams was old before the 1998 standard was published; this was a case of the standards committee doing what it was supposed to do, i.e., standardizing existing practice.
 Is there something deeper in IOStreams that
 you consider to be thread-unsafe, or is it just the matter of
 its global variables?

All I can do is point to the example above.

I see exception-safety issues, but no threading issue apart from *if* your code fails to synchronize access to a global variable. So far as I can tell, there are not thread-safety issues unless multiple threads share a stream without synchronization (which is just as much of a defect as if they shared a container without synchronization). Automatic synchronization tends to be at the wrong level, just as in the case of containers etc. Most often in robust code it's redundant to make a stream synchronize itself. Anyway, I was just hoping to find out something I didn't already know. One thing we do know is that IOStreams is not the gold standard for I/O interfaces, though it does have strengths in extensibility and type-safety compared to the alternatives in most C-like languages. -- James
Mar 24 2007
parent Walter Bright <newshound digitalmars.com> writes:
James Dennett wrote:
 Walter Bright wrote:
 James Dennett wrote:
 Walter Bright wrote:
 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.


gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.

I've seen only a minority of those claims made as part of the C++ committee discussions of GC.

I think we're in agreement, as I said "one or two", and that such claims are not made in general by the top tier of C++ programmers.
 Almost all discussions of the pros and cons of GC are
 simplistic and unbalanced.

It's not humanly possible to mention every pro and every con in every discussion. Nobody is making a claim of absolutes, either. For every example, sure, you can find a counter-example. That doesn't mean one cannot have a meaningful discussion about the pros and cons of adding gc, and it doesn't mean we can't dismiss certain arguments against gc, like it being a crutch for lazy programmers.
 I'm intrigued by your claim that IOStreams is not thread-safe;
 the IOStreams framework is thread-safe in the same way that
 the STL is thread-safe.  The one minor difference is that
 IOStreams exposes some global variables, which is unfortunate
 as they can easily be used in inappropriate ways in a
 multi-threaded environment.

Note the reliance here on global state that is neither thread nor exception safe:

std::ios_base::fmtflags flags_save = std::cout.flags();
std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" << 789 << std::endl;
std::cout.flags(flags_save);

True, exception-safety is an issue. There is not a threading issue in the code above *except* that it uses a global variable without synchronization; you explicitly coded reliance on global state, by using a global variable. Unfortunately that's easily done with the IOStreams interface.

That's the design of iostreams - reliance on global state with no multithreading protection. Using std::left is not a mistake on my part, it is a feature of iostreams.

Also,

cout << a << b;

has multithreading problems as well, as if two threads are writing to stdout, the output of a and b can be interleaved with the other thread's output. Note that:

writefln(a, b);

is both exception safe and thread safe - there will be no interleaving of output.
 Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.

That's not an excuse, as 1) multithreading was common long before C++98 was written and 2) multithreading and exception safety were thought about and accounted for in much of the rest of the library design, despite threading not being official.

I wasn't aiming to make an excuse. I was merely noting that it's not surprising. IOStreams was old before the 1998 standard was published; this was a case of the standards committee doing what it was supposed to do, i.e., standardizing existing practice.

Iostreams was substantially redesigned for C++98. Iostreams has undergone two major, incompatible overhauls since it originally debuted. You can see the old ones in DMC++'s <iostream.h> and <oldstr/stream.h>.
 I see exception-safety issues, but no threading issue
 apart from *if* your code fails to synchronize access
 to a global variable.  So far as I can tell, there are
 not thread-safety issues unless multiple threads share
 a stream without synchronization (which is just as
 much of a defect as if they shared a container without
 synchronization).

You can use C's stdio and D's stdio (and even mix them) without exception safety problems or need for the user to supply any synchronization.
 Automatic synchronization tends to be at the wrong
 level, just as in the case of containers etc.  Most
 often in robust code it's redundant to make a stream
 synchronize itself.
 
 Anyway, I was just hoping to find out something I
 didn't already know.  One thing we do know is that
 IOStreams is not the gold standard for I/O interfaces,
 though it does have strengths in extensibility and
 type-safety compared to the alternatives in most
 C-like languages.

I agree it has strengths in extensibility and type-safety. But I set that against its poor performance, exception unsafety, and threading problems, and conclude it is not a design that should be emulated.
Mar 24 2007
prev sibling parent reply 0ffh <spam frankhirsch.net> writes:
Walter Bright wrote:
 Which "wrong" assertions are those?

gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.

I admit I used to think similarly, a somewhat longer while ago. What made me change my mind was that Greenspun's Tenth Rule also includes GC: I find that doing the dynamic memory management myself results not only in bigger and more fragile source code, but also may perform worse than GC unless I go about it very warily.

I think it is just not efficient to put a lot of work into that with every application - it's much more efficient if somebody solves the problem *once*, and properly, and that's that.

Happy hacking, Frank

p.s. Thanks for your work... ;-)
Mar 25 2007
parent Dan <murpsoft hotmail.com> writes:
 Walter Bright wrote:
 Which "wrong" assertions are those?

gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.


0ffh Wrote:
 I admit I used to think similarly, a somewhat longer while ago.
 What made me change my mind was that Greenspun's Tenth Rule also
 includes GC: I find that doing the dynamic memory management myself
 results not only in bigger and more fragile source code, but also
 may perform worse than GC unless I go about it very warily.
 
 I think it is just not efficient to put a lot of work into that
 with every application - it's much more efficient if somebody
 solves the problem *once*, and properly, and that's that.

I totally agree that GC is a solid way of cutting bad code, which performs far worse than the usually trivial overhead of having a GC. I do think though that it should be somewhat easier to declare something as not being under the gc's influence so that when we want to be wary and we're scratching for an extra 10% performance in a loop, we can do so more readily. ~~ At first I was astonished to see my 26kb source compiled to a whopping 82kb. I was wondering if it imported all of phobos... Now I've realized that that extra mass did all the dynamic array stuff, associative array stuff, gc and phobos. Things that would have taken me just as much in source to write...
Mar 26 2007
prev sibling next sibling parent reply Derek Parnell <derek nomail.afraid.org> writes:
On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 
 Can you distill the benefits of retaining CR on a readline, please?

I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there.

Essentially it's about information. The naive loop:

while (readln(line)) {
   write(line);
}

is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like:

while (readln(line)) {
   writeln(line);
}

This may or may not add a newline to the output, possibly creating a file larger by one byte. This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.
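The information argument can be made concrete in C++, where std::getline chops the newline the way a chopping readln would. This sketch uses hypothetical function names. The keep-the-newline loop reproduces input that lacks a final newline byte for byte. The chop-then-writeln loop cannot tell whether the last line had a newline, so it always appends one.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads one line *including* its trailing '\n' (if present).
// Returns false only when nothing at all could be read.
bool readln_keep(std::istream& in, std::string& line) {
    line.clear();
    char c;
    while (in.get(c)) {
        line.push_back(c);
        if (c == '\n') break;
    }
    return !line.empty();
}

// The write(line) loop: output is an exact copy of the input.
std::string copy_keep(std::istream& in) {
    std::string out, line;
    while (readln_keep(in, line)) out += line;
    return out;
}

// The writeln(line) loop: std::getline discards '\n', so one is
// always added back, even after a last line that had none.
std::string copy_chop(std::istream& in) {
    std::string out, line;
    while (std::getline(in, line)) out += line + '\n';
    return out;
}
```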

And exactly how often do people need to write this program? I would have thought that the need to exactly reproduce the input is kind of rare, because most programs read stuff to manipulate or deduce things from it, and not to replicate it.
 Also, stdio also offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:
 
 char[] line;
 while ((line = readln()).length > 0) {
    ++dictionary[line];
 }
 
 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is another 
 important point.) An API that uses automated chopping should not offer 
 such a function because an empty line may mean that an empty line was 
 read, or that it's eof time. So the API would force people to write 
 convoluted code.

By "convoluted", you mean something like this ...

char[] line;
while ( io.readln(line) == io.Success ) {
   ++dictionary[line];
}
 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

And yet my code nearly always looks like ... line = trim_right(readln()); because I then have to parse the data contained in the line and white space (blank, tab and new line) at the end of a line is just usually cruft. On the other hand, as I have to trim the line anyhow, I guess it doesn't matter if the routine ensures a new line or not. Another interesting twist is that some text files omit the new-line on the last line in the file.
 Ever tried to do cin or fscanf? You can't do any intelligent input with 
 them because they skip whitespace and newlines like it's out of style. 
 All of my C++ applications use getline() or fgets() (both of which 
 thankfully do include the newline) and then process the line in-situ.

I conclude that we tend to write different types of apps. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 10:55:43 AM
Mar 21 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For
 Email) wrote:
 
 Can you distill the benefits of retaining CR on a readline, please?

I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there.

Essentially it's about information. The naive loop:

while (readln(line)) {
   write(line);
}

is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like:

while (readln(line)) {
   writeln(line);
}

This may or may not add a newline to the output, possibly creating a file larger by one byte. This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.

And exactly how often do people need to write this program? I would have thought that the need to exactly reproduce the input is kind of rare, because most programs read stuff to manipulate or deduce things from it, and not to replicate it.

Of course. It's not about reproducing the input exactly, but about having all of the information in the input available to the program.
 Also, stdio also offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
    ++dictionary[line];
 }

 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is another 
 important point.) An API that uses automated chopping should not offer 
 such a function because an empty line may mean that an empty line was 
 read, or that it's eof time. So the API would force people to write 
 convoluted code.

By "convoluted", you mean something like this ...

char[] line;
while ( io.readln(line) == io.Success ) {
   ++dictionary[line];
}

I said that the API would force people to write convoluted code if it wanted to offer char[] readln(). Consequently, your code is buggy in the likely case io.readln overwrites its buffer, which is mute testimony to the validity of my point :o).
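The "empty result means end of input" property of a newline-keeping readln can be sketched in C++ as well. `readln` and `count_lines` here are hypothetical names, not Phobos's API. A blank line still comes back as "\n", so only end of input produces an empty string. One difference from the D code under discussion: std::map copies its std::string keys, so the buffer-aliasing problem Andrei describes does not arise in this version.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Returns a fresh string holding the next line, newline included.
// Every line present in the input is non-empty (a blank line is "\n"),
// so an empty result can only mean end of input.
std::string readln(std::istream& in) {
    std::string line;
    char c;
    while (in.get(c)) {
        line.push_back(c);
        if (c == '\n') break;
    }
    return line;
}

// The dictionary-counting loop from the post.
std::map<std::string, int> count_lines(std::istream& in) {
    std::map<std::string, int> dictionary;
    std::string line;
    while (!(line = readln(in)).empty()) {
        ++dictionary[line];   // the map stores its own copy of the key
    }
    return dictionary;
}
```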
 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

And yet my code nearly always looks like ... line = trim_right(readln());

I often do that too. And I'm glad I can remove information I don't need, because clearly I couldn't add back information I've lost.

It should be pointed out that my point generalizes to more than newlines. I plan to add to phobos two routines that efficiently and atomically implement the following:

read_delim(FILE*, char[] buf, dchar delim);

and

read_delim(FILE*, char[] buf, char delim[]);

For such functions, particularly the last one, it is vital that the delimiter is KEPT in the resulting buffer.

Andrei
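Here is a rough C++ analogue of the proposed string-delimiter read_delim. The name is borrowed and the signature adapted for illustration. The sketch makes no attempt at the efficiency or atomicity mentioned in the post. Keeping the delimiter lets the caller distinguish a terminated record from a final, unterminated one.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads up to and including the next occurrence of `delim`, or to end of
// input, into `buf`. Returns false only when nothing could be read.
bool read_delim(std::istream& in, std::string& buf, const std::string& delim) {
    buf.clear();
    char c;
    while (in.get(c)) {
        buf.push_back(c);
        // Stop as soon as the buffer ends with the whole delimiter.
        if (!delim.empty() && buf.size() >= delim.size() &&
            buf.compare(buf.size() - delim.size(), delim.size(), delim) == 0) {
            break;
        }
    }
    return !buf.empty();
}
```

Checking the buffer's suffix after every character handles overlapping cases such as "aab" against the delimiter "ab" without a separate matching state machine. The cost is O(length of delimiter) per character.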
Mar 21 2007
parent reply Derek Parnell <derek nomail.afraid.org> writes:
On Wed, 21 Mar 2007 17:21:40 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 Derek Parnell wrote:
 Also, stdio offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
    ++dictionary[line];
 }

 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is another 
 important point.) An API that uses automated chopping should not offer 
 such a function because an empty line may mean that an empty line was 
 read, or that it's eof time. So the API would force people to write 
 convoluted code.

By "convoluted", you mean something like this ...

   char[] line;
   while ( io.readln(line) == io.Success ) {
      ++dictionary[line];
   }

I said that the API would force people to write convoluted code if it wanted to offer char[] readln(). Consequently, your code is buggy in the likely case io.readln overwrites its buffer, which is mute testimony to the validity of my point :o).

Actually you said "stdio also offers a readln() that creates a new line on every call" and so does my fictitious "io.readln(line)". It cannot overwrite its buffer because it creates the buffer.

   io.Status readln(out char[] pBuffer)
   {
      pBuffer.length = io.FirstGuessLength;

      // Note: This routine expands/contracts the buffer as required.
      fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer);

      // If I get this far then the low-level I/O system didn't fail me.
      return io.Success;
   }
 It should be pointed out that my point generalizes to more than 
 newlines. I plan to add to phobos two routines that efficiently and 
 atomically implement the following:
 
 read_delim(FILE*, char[] buf, dchar delim);

 and
 
 read_delim(FILE*, char[] buf, char delim[]);
 
 For such functions, particularly the last one, it is vital that the 
 delimiter is KEPT in the resulting buffer.

And that would be because it stops at the leftmost 'delim' that is contained in "char[] delim" so the caller needs to know which one stopped the input stream? I presume that this would support Unicode characters too? -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 11:26:34 AM
Mar 21 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 Actually you said "stdio also offers a readln() that creates a new line on
 every call" and so does my fictitious "io.readln(line)".  It cannot
 overwrite its buffer because it creates the buffer. 
 
   io.Status readln(out char[] pBuffer)
   {
      pBuffer.length = io.FirstGuessLength;
      
      // Note: This routine expands/contracts the buffer as required.
      fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer);
 
      // If I get this far then the low-level I/O system didn't fail me.
      return io.Success;
   }

Fine. It's just not clear what readln does from its signature. In contrast, stdio offers size_t readln(char[]) and char[] readln(), with clear semantics.
 It should be pointed out that my point generalizes to more than 
 newlines. I plan to add to phobos two routines that efficiently and 
 atomically implement the following:

 read_delim(FILE*, char[] buf, dchar delim);

 and

 read_delim(FILE*, char[] buf, char delim[]);

 For such functions, particularly the last one, it is vital that the 
 delimiter is KEPT in the resulting buffer.

And that would be because it stops at the leftmost 'delim' that is contained in "char[] delim" so the caller needs to know which one stopped the input stream? I presume that this would support Unicode characters too?

It's the other way around: you read until the _last_ character of the delimiter, and you look back in the buffer. If the buffer has the delimiter as a suffix, you're done. Otherwise, repeat (while appending to the buffer). This should work with Unicode streams too, although I'm not an expert in the matter.

My point is that at end-of-file you may want to know whether the delimiter was correctly present, as is required in certain protocols.

Andrei
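The suffix-checking loop described above can be sketched in C. This is a hypothetical `read_delim`, not the Phobos routine, and its name and signature are assumptions. It grows a malloc'd buffer and keeps the delimiter. It tests for the suffix only after the delimiter's last byte arrives. It also reports whether the delimiter was actually present when end of file is reached. Because it compares bytes, a UTF-8 encoded delimiter also works on valid UTF-8 input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read_delim: reads from fp into *buf (capacity *cap, grown
   with realloc) until the buffer ends with delim or end of file is hit.
   The delimiter is kept in the buffer. *found is 1 if the record was
   terminated by delim, 0 if it was cut off by end of file.
   Returns the record length, 0 at end of file, or -1 on error. */
long read_delim(FILE *fp, char **buf, size_t *cap, const char *delim, int *found)
{
    size_t dlen = strlen(delim);
    size_t len = 0;
    int c;

    assert(dlen > 0);
    *found = 0;
    while ((c = getc(fp)) != EOF) {
        if (len + 2 > *cap) {                  /* room for c and a '\0' */
            size_t ncap = *cap ? *cap * 2 : 64;
            char *p = realloc(*buf, ncap);
            if (p == NULL)
                return -1;
            *buf = p;
            *cap = ncap;
        }
        (*buf)[len++] = (char)c;
        /* Only look back once the delimiter's last byte has been read. */
        if (c == (unsigned char)delim[dlen - 1] && len >= dlen &&
            memcmp(*buf + len - dlen, delim, dlen) == 0) {
            *found = 1;
            break;
        }
    }
    if (ferror(fp))
        return -1;
    if (len > 0)
        (*buf)[len] = '\0';
    return (long)len;
}
```

On the input "one\r\ntwo\r\nthr" with delimiter "\r\n", successive calls return "one\r\n" and "two\r\n" with `found` set, then "thr" with `found` cleared. That last result is the information a protocol needs to reject an incomplete final record.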
Mar 21 2007
parent Derek Parnell <derek nomail.afraid.org> writes:
On Wed, 21 Mar 2007 17:57:51 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 Derek Parnell wrote:
 Actually you said "stdio also offers a readln() that creates a new line on
 every call" and so does my fictitious "io.readln(line)".  It cannot
 overwrite its buffer because it creates the buffer. 
 
   io.Status readln(out char[] pBuffer)
   {
      pBuffer.length = io.FirstGuessLength;
      
      // Note: This routine expands/contracts the buffer as required.
      fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer);
 
      // If I get this far then the low-level I/O system didn't fail me.
      return io.Success;
   }

Fine. It's just not clear what readln does from its signature. In contrast, stdio offers size_t readln(char[]) and char[] readln(), with clear semantics.

 read_delim(FILE*, char[] buf, char delim[]);



 It's the other way around: 

Right ... it was the "from its signature ... with clear semantics" that had me fooled.
 My point is that at end-of-file you may want to know whether the 
 delimiter was correctly present, as is required in certain protocols.

Yes. A very good point indeed. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 12:07:34 PM
Mar 21 2007
prev sibling next sibling parent reply Roberto Mariottini <rmariottini mail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
[...]
 Can you distill the benefits of retaining CR on a readline, please?

I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there.

I suspect Walter was thinking of something else at the time.
 Essentially it's about information. The naive loop:
 
 while (readln(line)) {
   write(line);
 }

I'm completely against that awful mess of code.
 is guaranteed 100% to produce an accurate copy of its input. The version 
 that chops lines looks like:
 
 while (readln(line)) {
   writeln(line);
 }
 
 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte.

Are you sure? Can you elaborate more on this?
 This is the kind of imprecision that makes the
 difference between a well-designed API and an almost-good one. Moreover, 
 with the automated chopping it is basically impossible to write a 
 program that exactly reproduces its input because readln essentially 
 loses information.

Same here.
 Also, stdio offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:
 
 char[] line;
 while ((line = readln()).length > 0) {
   ++dictionary[line];
 }

This way you'll get two different dictionaries on Windows and on Unix. Wrong, very wrong.
 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is another 
 important point.) An API that uses automated chopping should not offer 
 such a function because an empty line may mean that an empty line was 
 read, or that it's eof time. So the API would force people to write 
 convoluted code.

What is your definition of "convoluted"? I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.
 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

Perl is something the world should get rid of, quickly. Perl is wrong, Perl is evil, Perl is useless. You don't need Perl, try to stop using it. The fact that this narrow-minded idea comes from Perl is not surprising.
 Ever tried to do cin or fscanf? You can't do any intelligent input with 
 them because they skip whitespace and newlines like it's out of style. 

I use them, and I find them very comfortable. Again, your definition of 'intelligent' is peculiar. If you find Perl 'intelligent', that says a lot.
 All of my C++ applications use getline() or fgets() (both of which 
 thankfully do include the newline) and then process the line in-situ.

You obviously program only for one single platform. Being portable is way more complex than this. Ciao
Mar 22 2007
parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Roberto Mariottini wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [...]
 Can you distill the benefits of retaining CR on a readline, please?

I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there.

I suspect Walter was thinking of something else at the time.
 Essentially it's about information. The naive loop:

 while (readln(line)) {
   write(line);
 }

I'm completely against that awful mess of code.

What exactly would be bad about it?
 is guaranteed 100% to produce an accurate copy of its input. The 
 version that chops lines looks like:

 while (readln(line)) {
   writeln(line);
 }

 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte.

Are you sure? Can you elaborate more on this?

Very simple. If the file ends with a newline, the code reproduces it. If not, the code gratuitously appends a newline.
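The byte-count argument can be checked without any I/O. Below is a minimal C sketch using the hypothetical helper `copied_size`. It assumes '\n' line endings with no newline translation, and counts the bytes each of the two loops would write for an in-memory input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns how many bytes a line-copy loop would write for `input`.
   keep_newlines != 0: readln keeps '\n' and the loop calls write(line).
   keep_newlines == 0: readln chops '\n' and the loop calls writeln(line),
   which always appends one '\n'. */
size_t copied_size(const char *input, int keep_newlines)
{
    size_t out = 0;
    const char *p = input;

    assert(input != NULL);
    while (*p != '\0') {
        const char *nl = strchr(p, '\n');
        size_t line_len = nl ? (size_t)(nl - p) + 1 : strlen(p);
        if (keep_newlines) {
            out += line_len;                    /* write(line) */
        } else {
            size_t chopped = nl ? line_len - 1 : line_len;
            out += chopped + 1;                 /* writeln(line) */
        }
        p += line_len;
    }
    return out;
}
```

For "a\nb\n" both loops write 4 bytes. For "a\nb", which has no final newline, the chopping loop writes 4 bytes against 3 in the input. On a platform that translates '\n' to "\r\n", the extra terminator would take two bytes on disk.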
 This is the kind of imprecision that makes the
 difference between a well-designed API and an almost-good one. 
 Moreover, with the automated chopping it is basically impossible to 
 write a program that exactly reproduces its input because readln 
 essentially loses information.

Same here.
 Also, stdio offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
   ++dictionary[line];
 }

This way you'll get two different dictionaries on Windows and on Unix. Wrong, very wrong.

Yes, wrong, very wrong. Except it's not me who's wrong :o).
 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is 
 another important point.) An API that uses automated chopping should 
 not offer such a function because an empty line may mean that an empty 
 line was read, or that it's eof time. So the API would force people to 
 write convoluted code.

What is your definition of "convoluted"? I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.

You are objectively wrong. The code is portable. Newline translation takes care of it. Just try it.
 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

Perl is something the world should get rid of, quickly. Perl is wrong, Perl is evil, Perl is useless. You don't need Perl, try to stop using it. The fact that this narrow-minded idea comes from Perl is not surprising.

What can I say? Thanks! I'm enlightened!
 Ever tried to do cin or fscanf? You can't do any intelligent input 
 with them because they skip whitespace and newlines like it's out of 
 style. 

I use them, and I find them very comfortable. Again, your definition of 'intelligent' is peculiar. If you find Perl 'intelligent', that says a lot.

To each their own :o). Oh, probably you could explain how I can read a string containing spaces, followed by ":" and a number with scanf. Takes one line in Perl and D's readfln (not yet distributed).
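For reference, the C scanf family can capture both parts of such a line with a scanset conversion. Below is a minimal sketch that applies sscanf to a line that has already been read. The helper name `parse_name_number` and the 127-character limit are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses "text with spaces : 42". %127[^:] reads up to 127 characters
   other than ':' (spaces included), the literal ':' must then match,
   and %d skips whitespace and reads an int. Trailing spaces before the
   ':' are trimmed from name. Returns 1 on success, 0 otherwise. */
int parse_name_number(const char *line, char name[128], int *number)
{
    size_t len;

    assert(line != NULL && name != NULL && number != NULL);
    if (sscanf(line, "%127[^:]:%d", name, number) != 2)
        return 0;
    len = strlen(name);
    while (len > 0 && name[len - 1] == ' ')
        name[--len] = '\0';
    return 1;
}
```

Given "hello big world : 42\n", this stores "hello big world" and 42. A line without a ':' is rejected.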
 All of my C++ applications use getline() or fgets() (both of which 
 thankfully do include the newline) and then process the line in-situ.

You obviously program only for one single platform. Being portable is way more complex than this.

Yep, I saw that :o). Andrei
Mar 22 2007
parent Roberto Mariottini <rmariottini mail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 Roberto Mariottini wrote:

 Essentially it's about information. The naive loop:

 while (readln(line)) {
   write(line);
 }

I'm completely against that awful mess of code.

What exactly would be bad about it?

It's not evident to a non-expert programmer that a newline is kept at the end of each line. Take any programmer from any language of your choice and ask what this snippet is supposed to do. This works against immediate comprehension of code.
 is guaranteed 100% to produce an accurate copy of its input. The 
 version that chops lines looks like:

 while (readln(line)) {
   writeln(line);
 }

 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte.

Are you sure? Can you elaborate more on this?

Very simple. If the file ends with a newline, the code reproduces it. If not, the code gratuitously appends a newline.

A newline is two bytes here.
 Moreover, with the automated chopping it is basically impossible to 
 write a program that exactly reproduces its input because readln 
 essentially loses information.



A text file is not a binary file. A newline at end of file is completely irrelevant. On the other hand, no code should break whether or not the last newline is there. The problem with your code is that the last line comes out different from the others.
 Also, stdio offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
   ++dictionary[line];
 }

This way you'll get two different dictionaries on Windows and on Unix. Wrong, very wrong.

Yes, wrong, very wrong. Except it's not me who's wrong :o).

Ehm, can you elaborate on how good it is to put a '\n' at the end of every string when working with:
- databases
- communication programs
- interprocess communication
- distributed computing
 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is 
 another important point.) An API that uses automated chopping should 
 not offer such a function because an empty line may mean that an 
 empty line was read, or that it's eof time. So the API would force 
 people to write convoluted code.

What is your definition of "convoluted"? I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.

You are objectively wrong.

Say 'subjectively'. Assignments in boolean expressions should be avoided. The average programmer knows something about this magic, but is afraid to touch it and never completely understands it. Still, any programmer from any language would think that this code ends at the first empty line. Here is one of the many possible non-convoluted versions:

   char[] line = readln();
   while (line.length > 0) {
      ++dictionary[chomp(line)];
      line = readln();
   }

And this is how it should be:

   char[] line = readln();
   while (line != null) {
      ++dictionary[line];
      line = readln();
   }
 The code is portable. Newline translation 
 takes care of it. Just try it.

Newline translation is an old problem with C, C++ and now with D. Nothing can be resolved with newline translation. Opening a file in binary mode on Unix and treating it like a text file works only as long as the program is run on Unix. Newline translation is prone to portability errors, thus non-portable. In my experience, newline translation poses more portability problems than it solves.
 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

Perl is something the world should get rid of, quickly. Perl is wrong, Perl is evil, Perl is useless. You don't need Perl, try to stop using it. The fact that this narrow-minded idea comes from Perl is not surprising.

What can I say? Thanks! I'm enlightened!

You'd be more enlightened if you had had to work with big CGI scripts written in Perl, and eventually had to convert them to JSP so that the average (available) programmers could work on them. Sure, with Perl you can do many things in fewer than 10 lines. But keep it under 10 lines, or you are in trouble.
 Ever tried to do cin or fscanf? You can't do any intelligent input 
 with them because they skip whitespace and newlines like it's out of 
 style. 

I use them, and I find them very comfortable. Again, your definition of 'intelligent' is peculiar. If you find Perl 'intelligent', that says a lot.

To each their own :o). Oh, probably you could explain how I can read a string containing spaces, followed by ":" and a number with scanf. Takes one line in Perl and D's readfln (not yet distributed).

   scanf(" :%d", &i);

Ciao
Mar 23 2007
prev sibling parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For Email)
<SeeWebsiteForEmail erdani.org> wrote:

 Essentially it's about information. The naive loop:

 while (readln(line)) {
    write(line);
 }

 is guaranteed 100% to produce an accurate copy of its input. The version
 that chops lines looks like:

 while (readln(line)) {
    writeln(line);
 }

I'd just like to say that the chosen naming convention seems a bit unintuitive to me, for the following reasons:

1) it seems odd that what you read with readln(), you need to write with write() and not writeln().
2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's ReadLn doesn't preserve line endings.
3) in my personal experience (of a number of smaller and larger console applications), it's much more often that I need to work with the contents of lines (without line endings), rather than with them. If you need to copy data while preserving line endings, I would recommend using binary buffers for files - and I've no idea why you would use standard input/output for binary data anyway.
4) it's much easier to add a line ending than to remove it.

Based on the above reasons, I would like to suggest letting readln() chop line endings, and perhaps having another function (getline?) which keeps them.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
Mar 22 2007
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Vladimir Panteleev wrote:
 On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For
Email) <SeeWebsiteForEmail erdani.org> wrote:
 
 Essentially it's about information. The naive loop:

 while (readln(line)) {
    write(line);
 }

 is guaranteed 100% to produce an accurate copy of its input. The version
 that chops lines looks like:

 while (readln(line)) {
    writeln(line);
 }

I'd just like to say that the chosen naming convention seems a bit unintuitive to me, for the following reasons: 1) it seems odd that what you read with readln(), you need to write with write() and not writeln().

I suppose it is a little, but I think that's more an issue with text IO in general; for instance, even *if* readln discarded the line ending, readln and writeln wouldn't be symmetric anyway! If you expect them to be, then you're in for a nasty surprise :P
 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's
ReadLn doesn't preserve line endings.

Well, that's Pascal/Delphi/etc., not D.
 3) in my personal experience (of a number of smaller and larger console
applications), it's much more often that I need to work with the contents of
lines (without line endings), rather than with. If you need to copy data while
preserving line endings, I would recommend using binary buffers for files - and
I've no idea why you would use standard input/output for binary data anyway.

That's a valid point; I rarely need the line endings, that said, see [1] :)
 4) it's much easier to add a line ending than to remove it.

Actually, it's not. Removing a line ending is as simple as slicing the string. *Adding* a line ending could involve a heap allocation, at least a full copy. What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?
 Based on the above reasons, I would like to suggest letting readln() chop line
endings, and perhaps have another function (getline?) which keeps them.

[1] There have been a few times I've needed the line-ending, and it's a major pain when your IO library simply refuses to give it to you. It should be that the call gives you the whole line *including* line-endings, but since stripping the line of its ending is so common there should be either another function to do that, or a nice shortcut to get it done. Maybe we need readln and readlt for "read line and trim"... </2c> -- Daniel -- int getRandomNumber() { return 4; // chosen by fair dice roll. // guaranteed to be random. } http://xkcd.com/ v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Mar 22 2007
parent reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists gmail.com>
wrote:

 4) it's much easier to add a line ending than to remove it.

Actually, it's not. Removing a line ending is as simple as slicing the string. *Adding* a line ending could involve a heap allocation, at least a full copy.

I was actually talking about the complexity of the source, not the efficiency of the generated code. When readln gives you the line with a line ending, you have three cases:

1) a CR/LF line ending (Windows)
2) LF line ending (Unix)
3) no line ending at all (EOF)

You'd need to account for each of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.
 What's more, how can you be sure there was a line-ending there at all?
 What if it's the last line and it didn't have a line ending before EOF?

IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :) -- Best regards, Vladimir mailto:thecybershadow gmail.com
Mar 22 2007
next sibling parent Daniel Keep <daniel.keep.lists gmail.com> writes:
Vladimir Panteleev wrote:
 On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists gmail.com>
wrote:
 
 4) it's much easier to add a line ending than to remove it.

Actually, it's not. Removing a line ending is as simple as slicing the string. *Adding* a line ending could involve a heap allocation, at least a full copy.

I was actually talking about the complexity of the source, not the efficiency of the generated code. When readln gives you the line with a line ending, you have three cases:

1) a CR/LF line ending (Windows)
2) LF line ending (Unix)
3) no line ending at all (EOF)

You'd need to account for each of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.

   import std.string;
   auto line = readln().chomp();

:)
 What's more, how can you be sure there was a line-ending there at all?
 What if it's the last line and it didn't have a line ending before EOF?

IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :)

Mar 22 2007
prev sibling parent reply =?UTF-8?B?QW5kZXJzIEYgQmrDtnJrbHVuZA==?= <afb algonet.se> writes:
Vladimir Panteleev wrote:

 When readln gives you the line with a line ending, you have three cases:
 1) a CR/LF line ending (Windows)
 2) LF line ending (Unix)
 3) no line ending at all (EOF)

Actually it is even four:

4) CR line ending (Mac)

But that's just for files coming from the old Mac OS (9); Mac OS X normally uses Unix linefeeds for line endings...

--anders
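Stripping the terminator handles all four cases listed above, and, as noted earlier in the thread, it amounts to shortening the length without any copy. Below is a minimal C sketch using the hypothetical helper `chomp_len`. It is not Phobos's std.string.chomp, whose exact behaviour is not restated here.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of line[0..len) without its line terminator.
   Handles "\r\n" (Windows), "\n" (Unix), "\r" (classic Mac OS) and a
   missing terminator (last line at end of file). Nothing is copied or
   allocated; the caller just uses the shorter length. */
size_t chomp_len(const char *line, size_t len)
{
    assert(line != NULL || len == 0);
    if (len > 0 && line[len - 1] == '\n')
        len--;
    if (len > 0 && line[len - 1] == '\r')
        len--;
    return len;
}
```

All of "abc\r\n", "abc\n", "abc\r" and "abc" reduce to length 3.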
Mar 22 2007
parent Roberto Mariottini <rmariottini mail.com> writes:
Anders F Björklund wrote:
 Vladimir Panteleev wrote:
 When readln gives you the line with a line ending, you have three cases:
 1) a CR/LF line ending (Windows)
 2) LF line ending (Unix)
 3) no line ending at all (EOF)

Actually it is even four: 4) CR line ending (Mac). But that's just for files coming from the old Mac OS (9); Mac OS X normally uses Unix linefeeds for line endings...

I have some of these too. Legacy applications are not the majority, but they work, and for me that's it.

Ciao
Mar 22 2007
prev sibling parent reply "Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Vladimir Panteleev wrote:
 On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For
Email) <SeeWebsiteForEmail erdani.org> wrote:
 
 Essentially it's about information. The naive loop:

 while (readln(line)) {
    write(line);
 }

 is guaranteed 100% to produce an accurate copy of its input. The version
 that chops lines looks like:

 while (readln(line)) {
    writeln(line);
 }

I'd just like to say that the chosen naming convention seems a bit unintuitive to me, for the following reasons: 1) it seems odd that what you read with readln(), you need to write with write() and not writeln().

"Read a line. Write what you've read. Rinse. Lather. Repeat."
 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's
ReadLn doesn't preserve line endings.

That's a mistake, simple as that. Pascal has made many other similar mistakes, see http://www.lysator.liu.se/c/bwk-on-pascal.html.
 3) in my personal experience (of a number of smaller and larger console
applications), it's much more often that I need to work with the contents of
lines (without line endings), rather than with. If you need to copy data while
preserving line endings, I would recommend using binary buffers for files - and
I've no idea why you would use standard input/output for binary data anyway.

I understand that. But again, getting rid of information when you have it is a much better proposition than regaining information when you have irremediably lost it. Think of a file that is produced by a utility or transmission that sends messages separated by a single-char or multi-char separator. If your reading primitive omits the separator, you don't know whether the last line is a fragment of a broken transmission or a valid line. "Just call chomp."
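The truncated-transmission case can be detected only because the separator is kept. Below is a minimal sketch that assumes a POSIX system, where getdelim() (declared in the std.stdio source posted later in this thread) keeps the delimiter in its buffer. `count_records` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Counts records terminated by delim and sets *has_fragment when the
   input ends with an unterminated record (e.g. a cut-off transmission).
   A record is complete exactly when its last byte is the delimiter,
   which is only knowable because getdelim() keeps it.
   Returns -1 on a read error. */
long count_records(FILE *fp, int delim, int *has_fragment)
{
    char *buf = NULL;
    size_t cap = 0;
    ssize_t len;
    long complete = 0;

    assert(fp != NULL && has_fragment != NULL);
    *has_fragment = 0;
    while ((len = getdelim(&buf, &cap, delim, fp)) != -1) {
        if ((unsigned char)buf[len - 1] == (unsigned char)delim)
            complete++;
        else
            *has_fragment = 1;   /* can only happen on the last record */
    }
    free(buf);
    return ferror(fp) ? -1 : complete;
}
```

For "rec1\nrec2\nrec3" this reports two complete records plus a fragment. For "rec1\nrec2\n" it reports two records and no fragment.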
 4) it's much easier to add a line ending than to remove it.

It's been already said: it's cheaper to remove it in all circumstances.
 Based on the above reasons, I would like to suggest letting readln() chop line
endings, and perhaps have another function (getline?) which keeps them.

For more balkanization, cognitive load, and confusion? "Just call chomp." Andrei
Mar 22 2007
next sibling parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Thu, 22 Mar 2007 18:14:14 +0200, Andrei Alexandrescu (See Website For Email)
<SeeWebsiteForEmail erdani.org> wrote:

 Vladimir Panteleev wrote:
 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's
ReadLn doesn't preserve line endings.

That's a mistake, simple as that. Pascal has made many other similar mistakes, see http://www.lysator.liu.se/c/bwk-on-pascal.html.

<offtopic> That article has been written a quarter of a century ago, and doesn't really represent the state of the latest Pascal versions/implementations out there (the most prominent being Borland Delphi and FreePascal). That said, switching from Pascal to D is still quite a great experience for me nevertheless. </offtopic>
 "Just call chomp."

Ah, yes, missed that one. <nitpick> But even so, you'd have to check for line endings twice - when reading the stdin stream, and when calling chomp ;) </nitpick> -- Best regards, Vladimir mailto:thecybershadow gmail.com
Mar 22 2007
prev sibling parent Roberto Mariottini <rmariottini mail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
[...]
 "Just call chomp."

Just add a call to chomp to your benchmarks. Ciao
Mar 23 2007
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
kris wrote:
 c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that 
 what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)

Here's the new std.stdio work in progress (it doesn't yet include write()). Feel free to leverage it as you see fit for Tango. Some features of note:

1) It peeks under the hood of C's stdio implementation, meaning it's customized for Digital Mars' stdio and gcc's stdio.
2) It throws on I/O errors.
3) Unlike C's stdio, it can handle streams of either wide or regular chars.
4) It does not go as far as directly using Posix read/write functions or Windows API functions. We wished to avoid that in the interests of interoperability with C's stdio.
5) It is fully interoperable with, and is synced with, C's stdio.
6) Note how nicely scope(exit) makes the code more readable!

----------------------------------------
// Written in the D programming language.

/* Written by Walter Bright and Andrei Alexandrescu
 * www.digitalmars.com
 * Placed in the Public Domain.
 */

/********************************
 * Standard I/O functions that extend $(B std.c.stdio).
 * $(B std.c.stdio) is automatically imported when importing
 * $(B std.stdio).
 * Macros:
 *      WIKI=Phobos/StdStdio
 */

module std.stdio;

public import std.c.stdio;

import std.format;
import std.utf;
import std.string;
import std.gc;
import std.c.stdlib;
import std.c.string;
import std.c.stddef;

version (DigitalMars)
{
    version (Windows)
    {
        // Specific to the way Digital Mars C does stdio
        version = DIGITAL_MARS_STDIO;
    }
}

version (DIGITAL_MARS_STDIO)
{
}
else
{
    // Specific to the way Gnu C does stdio
    version = GCC_IO;
    import std.c.linux.linux;
}

version (DIGITAL_MARS_STDIO)
{
    extern (C)
    {
        /***
         * Digital Mars under-the-hood C I/O functions
         */
        int _fputc_nlock(int, FILE*);
        int _fputwc_nlock(int, FILE*);
        int _fgetc_nlock(FILE*);
        int _fgetwc_nlock(FILE*);
        int __fp_lock(FILE*);
        void __fp_unlock(FILE*);
    }
    alias _fputc_nlock FPUTC;
    alias _fputwc_nlock FPUTWC;
    alias _fgetc_nlock FGETC;
    alias _fgetwc_nlock FGETWC;

    alias __fp_lock FLOCK;
    alias __fp_unlock FUNLOCK;
}
else version (GCC_IO)
{
    /***
     * Gnu under-the-hood C I/O functions; see
     * http://www.gnu.org/software/libc/manual/html_node/I_002fO-on-Streams.html#I_002fO-on-Streams
     */
    extern (C)
    {
        int fputc_unlocked(int, FILE*);
        int fputwc_unlocked(wchar_t, FILE*);
        int fgetc_unlocked(FILE*);
        int fgetwc_unlocked(FILE*);
        void flockfile(FILE*);
        void funlockfile(FILE*);
        ssize_t getline(char**, size_t*, FILE*);
        ssize_t getdelim (char**, size_t*, int, FILE*);
    }

    alias fputc_unlocked FPUTC;
    alias fputwc_unlocked FPUTWC;
    alias fgetc_unlocked FGETC;
    alias fgetwc_unlocked FGETWC;

    alias flockfile FLOCK;
    alias funlockfile FUNLOCK;
}
else
{
    static assert(0, "unsupported C I/O system");
}

/*********************
 * Thrown if I/O errors happen.
 */
class StdioException : Exception
{
    uint errno;                 // operating system error code

    this(char[] msg)
    {
        super(msg);
    }

    this(uint errno)
    {
        char* s = strerror(errno);
        super(std.string.toString(s).dup);
    }

    static void opCall(char[] msg)
    {
        throw new StdioException(msg);
    }

    static void opCall()
    {
        throw new StdioException(getErrno());
    }
}

private
void writefx(FILE* fp, TypeInfo[] arguments, void* argptr, int newline=false)
{
    int orientation;

    orientation = fwide(fp, 0);

    /* Do the file stream locking at the outermost level
     * rather than character by character.
     */
    FLOCK(fp);
    scope(exit) FUNLOCK(fp);

    if (orientation <= 0)               // byte orientation or no orientation
    {
        void putc(dchar c)
        {
            if (c <= 0x7F)
            {
                FPUTC(c, fp);
            }
            else
            {
                char[4] buf;
                char[] b;

                b = std.utf.toUTF8(buf, c);
                for (size_t i = 0; i < b.length; i++)
                    FPUTC(b[i], fp);
            }
        }

        std.format.doFormat(&putc, arguments, argptr);
        if (newline)
            FPUTC('\n', fp);
    }
    else if (orientation > 0)           // wide orientation
    {
        version (Windows)
        {
            void putcw(dchar c)
            {
                assert(isValidDchar(c));
                if (c <= 0xFFFF)
                {
                    FPUTWC(c, fp);
                }
                else
                {
                    wchar[2] buf;

                    buf[0] = cast(wchar) ((((c - 0x10000) >> 10) & 0x3FF) + 0xD800);
                    buf[1] = cast(wchar) (((c - 0x10000) & 0x3FF) + 0xDC00);
                    FPUTWC(buf[0], fp);
                    FPUTWC(buf[1], fp);
                }
            }
        }
        else version (linux)
        {
            void putcw(dchar c)
            {
                FPUTWC(c, fp);
            }
        }
        else
        {
            static assert(0);
        }

        std.format.doFormat(&putcw, arguments, argptr);
        if (newline)
            FPUTWC('\n', fp);
    }
}

/***********************************
 * Arguments are formatted per the
 * $(LINK2 std_format.html#format-string, format strings)
 * and written to $(B stdout).
 */

void writef(...)
{
    writefx(stdout, _arguments, _argptr, 0);
}

/***********************************
 * Same as $(B writef), but a newline is appended
 * to the output.
 */

void writefln(...)
{
    writefx(stdout, _arguments, _argptr, 1);
}

/***********************************
 * Same as $(B writef), but output is sent to the
 * stream fp instead of $(B stdout).
 */

void fwritef(FILE* fp, ...)
{ writefx(fp, _arguments, _argptr, 0); } /*********************************** * Same as $(B writefln), but output is sent to the * stream fp instead of $(B stdout). */ void fwritefln(FILE* fp, ...) { writefx(fp, _arguments, _argptr, 1); } /********************************** * Read line from stream fp. * Returns: * null for end of file, * char[] for line read from fp, including terminating '\n' * Params: * fp = input stream * Throws: * $(B StdioException) on error * Example: * Reads $(B stdin) and writes it to $(B stdout). --- import std.stdio; int main() { char[] buf; while ((buf = readln()) != null) writef("%s", buf); return 0; } --- */ char[] readln(FILE* fp = stdin) { char[] buf; readln(fp, buf); return buf; } /********************************** * Read line from stream fp and write it to buf[], * including terminating '\n'. * * This is often faster than readln(FILE*) because the buffer * is reused each call. Note that reusing the buffer means that * the previous contents of it need to be copied if needed. * Params: * fp = input stream * buf = buffer used to store the resulting line data. buf * is resized as necessary. * Returns: * 0 for end of file, otherwise * number of characters read * Throws: * $(B StdioException) on error * Example: * Reads $(B stdin) and writes it to $(B stdout). --- import std.stdio; int main() { char[] buf; while (readln(stdin, buf)) writef("%s", buf); return 0; } --- */ size_t readln(FILE* fp, inout char[] buf) { version (DIGITAL_MARS_STDIO) { FLOCK(fp); scope(exit) FUNLOCK(fp); if (__fhnd_info[fp._file] & FHND_WCHAR) { /* Stream is in wide characters. * Read them and convert to chars. 
*/ static assert(wchar_t.sizeof == 2); buf.length = 0; int c2; for (int c; (c = FGETWC(fp)) != -1; ) { if ((c & ~0x7F) == 0) { buf ~= c; if (c == '\n') break; } else { if (c >= 0xD800 && c <= 0xDBFF) { if ((c2 == FGETWC(fp)) != -1 || c2 < 0xDC00 && c2 > 0xDFFF) { StdioException("unpaired UTF-16 surrogate"); } c = ((c - 0xD7C0) << 10) + (c2 - 0xDC00); } std.utf.encode(buf, c); } } if (ferror(fp)) StdioException(); return buf.length; } auto sz = std.gc.capacity(buf.ptr); //auto sz = buf.length; buf = buf.ptr[0 .. sz]; if (fp._flag & _IONBF) { /* Use this for unbuffered I/O, when running * across buffer boundaries, or for any but the common * cases. */ L1: char *p; if (sz) { p = buf.ptr; } else { sz = 64; p = cast(char*) std.gc.malloc(sz); std.gc.hasNoPointers(p); buf = p[0 .. sz]; } size_t i = 0; for (int c; (c = FGETC(fp)) != -1; ) { if ((p[i] = c) != '\n') { i++; if (i < sz) continue; buf = p[0 .. i] ~ readln(fp); return buf.length; } else { buf = p[0 .. i + 1]; return i + 1; } } if (ferror(fp)) StdioException(); buf = p[0 .. i]; return i; } else { int u = fp._cnt; char* p = fp._ptr; int i; if (fp._flag & _IOTRAN) { /* Translated mode ignores \r and treats ^Z as end-of-file */ char c; while (1) { if (i == u) // if end of buffer goto L1; // give up c = p[i]; i++; if (c != '\r') { if (c == '\n') break; if (c != 0x1A) continue; goto L1; } else { if (i != u && p[i] == '\n') break; goto L1; } } if (i > sz) { buf = cast(char[])std.gc.malloc(i); std.gc.hasNoPointers(buf.ptr); } if (i - 1) memcpy(buf.ptr, p, i - 1); buf[i - 1] = '\n'; if (c == '\r') i++; } else { while (1) { if (i == u) // if end of buffer goto L1; // give up auto c = p[i]; i++; if (c == '\n') break; } if (i > sz) { buf = cast(char[])std.gc.malloc(i); std.gc.hasNoPointers(buf.ptr); } memcpy(buf.ptr, p, i); } fp._cnt -= i; fp._ptr += i; buf = buf[0 .. i]; return i; } } else version (GCC_IO) { if (fwide(fp, 0) > 0) { /* Stream is in wide characters. * Read them and convert to chars. 
*/ FLOCK(fp); scope(exit) FUNLOCK(fp); version (Windows) { buf.length = 0; int c2; for (int c; (c = FGETWC(fp)) != -1; ) { if ((c & ~0x7F) == 0) { buf ~= c; if (c == '\n') break; } else { if (c >= 0xD800 && c <= 0xDBFF) { if ((c2 == FGETWC(fp)) != -1 || c2 < 0xDC00 && c2 > 0xDFFF) { StdioException("unpaired UTF-16 surrogate"); } c = ((c - 0xD7C0) << 10) + (c2 - 0xDC00); } std.utf.encode(buf, c); } } if (ferror(fp)) StdioException(); return buf.length; } else version (linux) { buf.length = 0; for (int c; (c = FGETWC(fp)) != -1; ) { if ((c & ~0x7F) == 0) buf ~= c; else std.utf.encode(buf, cast(dchar)c); if (c == '\n') break; } if (ferror(fp)) StdioException(); return buf.length; } else { static assert(0); } } char *lineptr = null; size_t n = 0; auto s = getdelim(&lineptr, &n, '\n', fp); if (s < 0) { if (ferror(fp)) StdioException(); buf.length = 0; // end of file return 0; } scope(exit) free(lineptr); buf = buf.ptr[0 .. std.gc.capacity(buf.ptr)]; if (s <= buf.length) { buf.length = s; buf[] = lineptr[0 .. s]; } else { buf = lineptr[0 .. s].dup; } return s; } else { static assert(0); } } /** ditto */ size_t readln(inout char[] buf) { return readln(stdin, buf); }
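The GCC_IO path of readln above relies on getdelim() and keeps the terminating '\n'. Below is a minimal C sketch of the property that matters for the rest of this thread. It assumes a POSIX 2008 libc, and count_lines is a hypothetical name. A blank line comes back as "\n" (length 1), so a result of -1 can only mean end of file or an I/O error, never an empty line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in fp using getline(), which,
 * like readln above, keeps the terminating '\n'. A blank line is
 * returned as "\n", so it is never confused with end of file.
 * *blank receives the number of lines consisting of just "\n".
 * Returns the line count, or -1 if an I/O error occurred. */
static long count_lines(FILE *fp, long *blank)
{
    char *line = NULL;   /* reused across calls; getline grows it as needed */
    size_t cap = 0;
    ssize_t len;
    long total = 0;

    *blank = 0;
    while ((len = getline(&line, &cap, fp)) != -1) {
        total++;
        if (len == 1 && line[0] == '\n')
            (*blank)++;
    }
    free(line);          /* required even though the last call returned -1 */
    return ferror(fp) ? -1 : total;
}
```

For the input "alpha\n\nbeta", the three lines come back as "alpha\n", "\n" and "beta". Only the exhausted stream produces -1.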
Mar 21 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 kris wrote:
 c) on Linux, tango.io uses the c-lib posix.read/write functions. Is 
 that what phobos uses also? (on Win32, Tango uses direct Win32 calls 
 instead)

Here's the new std.stdio work in progress (doesn't yet include write()). Feel free to leverage it as you see fit for Tango. Some features of note:

1) It peeks under the hood of C's stdio implementation, meaning it's customized for Digital Mars' stdio, and gcc's stdio.
2) It throws on I/O errors.
3) Unlike C's stdio, it can handle streams of either wide or regular chars.
4) It does not go as far as directly using Posix read/write functions or Windows API functions. We wished to avoid that in the interests of interoperability with C's stdio.
5) It is fully interoperable with, and is synced with, C's stdio.
6) Note how nicely scope(exit) makes the code more readable!
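Point 1 refers to the unlocked character functions that writefx and readln alias as FPUTC/FGETC. The stream lock (FLOCK/FUNLOCK) is taken once per call instead of once per character. Below is a minimal C sketch of that pattern, assuming POSIX flockfile() and putc_unlocked(). The function name is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes s to fp one byte at a time, the way writefx
 * feeds FPUTC. The stream lock is taken once for the whole string, and
 * putc_unlocked() skips the per-call locking that putc() performs.
 * Returns 0 on success, EOF if a byte could not be written. */
static int put_string_locked_once(const char *s, FILE *fp)
{
    int rc = 0;

    flockfile(fp);                    /* one lock for the whole string */
    for (; *s != '\0'; s++) {
        if (putc_unlocked((unsigned char)*s, fp) == EOF) {
            rc = EOF;
            break;
        }
    }
    funlockfile(fp);                  /* released on every path, like scope(exit) */
    return rc;
}
```

With this structure, another thread writing to the same FILE* cannot interleave its output in the middle of the string.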

[snip]
 private
 void writefx(FILE* fp, TypeInfo[] arguments, void* argptr, int 
 newline=false)

Oh, I meant to say that a while ago: some experiments I've done show that doing formatting with templates and direct calls is significantly faster than the way writefx does it (with a delegate). Probably that should be changed.

writefln is still very slow. Changing the loop to:

import std.stdio;
void main() {
   char[] line;
   while (readln(line)) {
     writef("%s", line);
   }
}

yields:

17.8s dcat

Fortunately, up-and-coming template features will allow the library to detect statically known format strings and parse them to render the most efficient writing method. And reading, too. I already have a prototype readfln function that statically figures out the correctness of its format string.

Andrei
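The point here is about call structure. writefx routes every character through a delegate, whereas a statically known "%s" could be lowered to a single bulk write. Below is a C sketch of the two shapes, with a function pointer standing in for the D delegate. All names are hypothetical, and the sketch implies nothing about actual timings.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-character sink, standing in for writefx's putc delegate. */
typedef void (*char_sink)(int c, void *ctx);

static void file_sink(int c, void *ctx)
{
    fputc(c, (FILE *)ctx);
}

/* Delegate style: one indirect call (plus one fputc) per character. */
static void emit_via_sink(const char *s, char_sink sink, void *ctx)
{
    while (*s != '\0')
        sink((unsigned char)*s++, ctx);
}

/* Direct style: when the format is known to be a plain "%s", the whole
 * string can go out in a single fwrite. Returns the bytes written. */
static size_t emit_direct(const char *s, FILE *fp)
{
    return fwrite(s, 1, strlen(s), fp);
}
```

Both functions produce identical bytes. The difference is that the first makes one indirect call for every character, while the second makes one call for the whole line.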
Mar 21 2007
Roberto Mariottini <rmariottini mail.com> writes:
Walter Bright wrote:
 /**********************************
  * Read line from stream fp and write it to buf[],
  * including terminating '\n'.

Nooo! Please get rid of such an awful Perl-ish hack! Ciao
Mar 22 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Roberto Mariottini wrote:
 Walter Bright wrote:
 /**********************************
  * Read line from stream fp and write it to buf[],
  * including terminating '\n'.

Nooo! Please get rid of such an awful Perl-ish hack!

Please justify your statements instead of using emotion, rhetoric, and implied assumptions. Andrei
Mar 22 2007
Roberto Mariottini <rmariottini mail.com> writes:
Andrei Alexandrescu (See Website For Email) wrote:
[...]
 Please justify your statements instead of using emotion, rhetoric, and 
 implied assumptions.

See my previous post. Ciao
Mar 22 2007
Derek Parnell <derek nomail.afraid.org> writes:
On Wed, 21 Mar 2007 14:36:10 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 I'll mention here that it's quite disappointing 
 that Tango's idiomatic method of reading a line from the console 
 (Cin.nextLine(line) unless I missed something) chose to chop the newline 
 automatically. The Perl book spends half a page or so explaining why 
 it's _good_ that the newline is included in the line, and I've been 
 thankful for that on numerous occasions when writing Perl. 

LOL ... That is odd because in nearly every program I ever write that reads text lines, the first thing I need to do after I read in the line is to strip off the bloody newline character.
 Please put the newline back in the line.

... but leave in the option of reading it without a newline attached.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/03/2007 10:12:54 AM
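When the reader keeps the newline, Derek's request costs very little. Stripping it is a small post-processing step, whereas a reader that has already discarded the newline cannot restore that information. Below is a C sketch of such a chomp, assuming the line came from something like getline. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes one trailing "\n" or "\r\n" from line
 * (of length len) in place and returns the new length. A line with no
 * terminator is left unchanged, so it is safe to apply to the last line
 * of a file that lacks a final newline. */
static size_t chomp_line(char *line, size_t len)
{
    if (len > 0 && line[len - 1] == '\n') {
        len--;
        if (len > 0 && line[len - 1] == '\r')
            len--;                 /* also accept CRLF line endings */
        line[len] = '\0';
    }
    return len;
}
```

A caller that wants chopped lines can apply this to each line, while a caller that needs an exact copy of the input simply skips it.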
Mar 21 2007
Sean Kelly <sean f4.ca> writes:
Andrei Alexandrescu (See Website For Email) wrote:
 
 I passed a 31 MB text file (containing a dictionary that I'm using in my 
 research) through each of the programs above. The output was set to 
 /dev/null. I've ran the same program multiple times before the actual 
 test, so everything is cached and the process becomes 
 computationally-bound. Here are the results summed for 10 consecutive 
 runs (averaged over 5 epochs):
 
 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

For what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.
Mar 22 2007
Walter Bright <newshound digitalmars.com> writes:
Sean Kelly wrote:
 For what it's worth, I created a Win32 version of the Unix 'time' 
 command recently.  Not too complicated, but if anyone is interested, I 
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
 quick and dirty implementation, but works for how I typically use it.

Alternatively, http://www.digitalmars.com/techtips/timing_code.html
Mar 22 2007
"Kristian Kilpi" <kjkilpi gmail.com> writes:
On Thu, 22 Mar 2007 19:45:16 +0200, Walter Bright
<newshound digitalmars.com> wrote:
 Sean Kelly wrote:
 For what it's worth, I created a Win32 version of the Unix 'time'
 command recently.  Not too complicated, but if anyone is interested, I
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a
 quick and dirty implementation, but works for how I typically use it.

 Alternatively,
   http://www.digitalmars.com/techtips/timing_code.html

BTW, the following line (printed in bold in 'timing_code.html'):

    auto Timer t = new Timer();

uses 'auto' instead of 'scope'. There is also another identical line at the bottom of the page.

I think I should also mention that DMD v1.007 uses 'auto' in the error message 'Error: variable XXX reference to auto class must be auto' (happens when a scope class object is declared without the 'scope' keyword). Is this minor glitch corrected in DMD v1.009?
Mar 22 2007
Walter Bright <newshound digitalmars.com> writes:
Thanks for the tip, that needs to be fixed.
Mar 22 2007
torhu <fake address.dude> writes:
Sean Kelly wrote:
<snip>
 For what it's worth, I created a Win32 version of the Unix 'time' 
 command recently.  Not too complicated, but if anyone is interested, I 
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
 quick and dirty implementation, but works for how I typically use it.

Looks useful; my own tool only measures 'real' time. But it breaks when using redirection, either way:

redirect stdin:
---
c:\prog\test\linetest>ptime cat <test.txt
cat: -: Bad file descriptor

real    0m0.000s
user    0m0.010s
sys     0m0.000s
---

redirect stdout:
---
c:\prog\test\linetest>ptime cat test.txt >NUL
---

The last one outputs nothing. Printing to stderr would fix that.
Mar 23 2007
Sean Kelly <sean f4.ca> writes:
torhu wrote:
 Sean Kelly wrote:
 <snip>
 For what it's worth, I created a Win32 version of the Unix 'time' 
 command recently.  Not too complicated, but if anyone is interested, I 
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
 quick and dirty implementation, but works for how I typically use it.

Looks useful; my own tool only measures 'real' time. But it breaks when using redirection, either way:

redirect stdin:
---
c:\prog\test\linetest>ptime cat <test.txt
cat: -: Bad file descriptor

real    0m0.000s
user    0m0.010s
sys     0m0.000s
---

redirect stdout:
---
c:\prog\test\linetest>ptime cat test.txt >NUL
---

The last one outputs nothing. Printing to stderr would fix that.

Hm, I suspect IO redirection must be a feature of the shell. It's a bit of a hack, but this may work "ptime cmd /c cat < test.txt." I'll see how complicated a real fix would be. Sean
Mar 23 2007
torhu <fake address.dude> writes:
Sean Kelly wrote:
 Hm, I suspect IO redirection must be a feature of the shell.  It's a bit 
 of a hack, but this may work "ptime cmd /c cat < test.txt."  I'll see 
 how complicated a real fix would be.

I get the same error. My own tool doesn't have such problems, but it only uses the standard C system() function, which might be too limited for what your tool does.
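In `ptime cat <test.txt`, the shell applies the redirection to ptime itself. The timed child therefore sees test.txt only if it inherits ptime's standard input. One plausible cause of the error, not confirmed in the thread, is that the child started via CreateProcess does not inherit those redirected handles. system(), by contrast, hands the whole command line to the shell, and the child inherits the standard streams.

Below is a POSIX C sketch of the system()-based approach. It reports its timings on stderr, as suggested above, and uses getrusage(RUSAGE_CHILDREN) for user and sys time. time_command is a hypothetical name, and this is not ptime's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: runs cmd through system(), i.e. via /bin/sh, so
 * redirections inside cmd are handled by the shell and the child inherits
 * our standard streams. Real, user and sys time go to stderr so that
 * redirecting stdout does not hide them. Returns the command's exit
 * status, or -1 if it could not be run or did not exit normally. */
static int time_command(const char *cmd)
{
    struct timespec start, end;
    struct rusage before, after;

    getrusage(RUSAGE_CHILDREN, &before);   /* totals for already-reaped children */
    clock_gettime(CLOCK_MONOTONIC, &start);
    int status = system(cmd);              /* waits for the shell to finish */
    clock_gettime(CLOCK_MONOTONIC, &end);
    getrusage(RUSAGE_CHILDREN, &after);

    double real = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    double user = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    double sys  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    fprintf(stderr, "real %.3fs\nuser %.3fs\nsys  %.3fs\n", real, user, sys);

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called as time_command("cat < test.txt > /dev/null"), the shell performs both redirections, and the report still appears on the terminal because it goes to stderr.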
Mar 23 2007
Sean Kelly <sean f4.ca> writes:
torhu wrote:
 Sean Kelly wrote:
 Hm, I suspect IO redirection must be a feature of the shell.  It's a 
 bit of a hack, but this may work "ptime cmd /c cat < test.txt."  I'll 
 see how complicated a real fix would be.

I get the same error. My own tool doesn't have such problems, but it only uses the standard C system() function. Which might be too limited for what your tool does.

Yeah, mine uses CreateProcess and then GetProcessTimes. I'll give the docs a look later and see if I can figure out why it's not working.
Mar 23 2007
Bill Baxter <dnewsgroup billbaxter.com> writes:
Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I passed a 31 MB text file (containing a dictionary that I'm using in 
 my research) through each of the programs above. The output was set to 
 /dev/null. I've ran the same program multiple times before the actual 
 test, so everything is cached and the process becomes 
 computationally-bound. Here are the results summed for 10 consecutive 
 runs (averaged over 5 epochs):

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

For what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.

I was looking for something like this just the other day. Link seems to be dead these days. Is there a new URL for it? --bb
Apr 20 2008
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I passed a 31 MB text file (containing a dictionary that I'm using in
 my research) through each of the programs above. The output was set to
 /dev/null. I've ran the same program multiple times before the actual
 test, so everything is cached and the process becomes
 computationally-bound. Here are the results summed for 10 consecutive
 runs (averaged over 5 epochs):

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

For what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.

Link seems to be dead these days. Is there a new URL for it?

I switched web hosts and have yet to re-upload all my old content. I'll see about getting this zipfile up in the next few days. Sean
Apr 21 2008
Sean Kelly <sean invisibleduck.org> writes:
Sean Kelly wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I passed a 31 MB text file (containing a dictionary that I'm using in
 my research) through each of the programs above. The output was set to
 /dev/null. I've ran the same program multiple times before the actual
 test, so everything is cached and the process becomes
 computationally-bound. Here are the results summed for 10 consecutive
 runs (averaged over 5 epochs):

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

For what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.

Link seems to be dead these days. Is there a new URL for it?

I switched web hosts and have yet to re-upload all my old content. I'll see about getting this zipfile up in the next few days.

Okay, I've uploaded it here: http://invisibleduck.org/sean/tmp/ptime.zip Sean
Apr 21 2008
Lars Ivar Igesund <larsivar igesund.net> writes:
Andrei Alexandrescu (See Website For Email) wrote:

 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.

I have uploaded a snapshot with prebuilt libraries to
http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz

The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual, and add libtango.a to your compile command. The test program (io.d below) should be compiled using the line

dmd -O -release -inline io.d libtango.a

io.d
-------
import tango.io.Console;

void main() {
    char[] line;
    // true means that newlines are retained
    while (Cin.nextLine(line, true))
        Cout(line);
}
--------

For the sake of reference, I created a file with 1.8 million (equal) lines, at a total of 133 Megabytes. I ran it through the above program, and your Perl program. System is a PentiumM-1.86GHz, 1.5GB RAM, Kubuntu 7.04, DMD 1.009.

Average times perl program : 1.65 seconds (real), 1.45 seconds (user)
Average times tango program: 1.08 seconds (real), 0.91 seconds (user)

Note that I also tried without the optimization flags to DMD, which resulted in times that were about 10% faster than Perl.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
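The program above writes each line back with Cout(line) because nextLine(line, true) keeps the newline. Below is the same line-copy loop as a C sketch, assuming POSIX getline. copy_lines is a hypothetical name. Because the newline stays in the line, the output is byte-for-byte identical to the input, even when the last line has no terminator.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: copies in to out line by line. Each line keeps
 * its '\n' (a final unterminated line is returned as is), so writing
 * the bytes back verbatim reproduces the input exactly.
 * Returns 0 on success, -1 on a read or write error. */
static int copy_lines(FILE *in, FILE *out)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    int rc = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* fwrite rather than fputs: a line may contain NUL bytes */
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    free(line);
    return rc;
}
```

A version that chopped the newline and re-added it with a newline call would instead append a '\n' that the input never had.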
Mar 22 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Lars Ivar Igesund wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 
 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.

I have uploaded a snapshot with prebuilt libraries to
http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz

The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual, and add libtango.a to your compile command. The test program (io.d below) should be compiled using the line

dmd -O -release -inline io.d libtango.a

io.d
-------
import tango.io.Console;

void main() {
    char[] line;
    // true means that newlines are retained
    while (Cin.nextLine(line, true))
        Cout(line);
}
--------

5.0s tcat

Neat! Now that we got the performance problem out of the way, let's discuss stdio compatibility. I suggest you use getline on GNU platforms.

Andrei
Mar 22 2007
Lars Ivar Igesund <larsivar igesund.net> writes:
Andrei Alexandrescu (See Website For Email) wrote:

 Lars Ivar Igesund wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 
 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.

I have uploaded a snapshot with prebuilt libraries to
http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz

The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual, and add libtango.a to your compile command. The test program (io.d below) should be compiled using the line

dmd -O -release -inline io.d libtango.a

io.d
-------
import tango.io.Console;

void main() {
    char[] line;
    // true means that newlines are retained
    while (Cin.nextLine(line, true))
        Cout(line);
}
--------

5.0s tcat

Neat! Now that we got the performance problem out of the way, let's discuss stdio compatibility. I suggest you use getline on GNU platforms.

Andrei

Maybe we should first discuss why stdio compatibility is needed? Is the equivalent functionality missing in Tango, and if so, would implementing it in Tango remove this need for compatibility?

Then consider the hypothetical situation where all of the libc functionality (including the POSIX functionality currently used in Tango, system calls, etc.) is replaced by an equivalent libd. Depending somewhat on the answer above, would the same reasoning apply?

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Mar 23 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
Lars Ivar Igesund wrote:
 Neat! Now that we got the performance problem out of the way, let's
 discuss stdio compatibility. I suggest you use getline on GNU platforms.

 Andrei

Maybe discuss first why stdio compatibility is needed? Is the equivalent functionality missing in Tango, and if so, would implementing it in Tango remove this need for compatibility?

As long as the global "stdin" symbol is a FILE*, this would be highly recommendable. And given that phobos does offer stdin as a FILE*, stdio compatibility is important for programs that want to use phobos and tango simultaneously (e.g., a library using phobos linked with another one using tango). Andrei
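The compatibility concern is easiest to see at the file-descriptor level. Phobos reads stdin through the C FILE*, whereas, per kris's note earlier in the thread, Tango on Linux calls read() directly. Below is a C sketch of what happens when the two styles are mixed on one descriptor. It assumes POSIX pipe() and fdopen() and illustrates buffering behaviour only; it is not code from either library, and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: puts "a\nb\n" in a pipe, reads one line through a
 * FILE*, then calls read() on the same descriptor. The first fgets()
 * pulls all available bytes into the FILE's buffer, so the raw read()
 * finds nothing even though "b\n" was never consumed.
 * Returns the byte count seen by the raw read(), or -1 on setup failure. */
static ssize_t raw_read_after_stdio(char *first, size_t firstcap,
                                    char *second, size_t secondcap)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    const char data[] = "a\nb\n";
    if (write(fds[1], data, strlen(data)) != (ssize_t)strlen(data)) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                       /* writer done: reads will hit EOF */

    FILE *in = fdopen(fds[0], "r");
    if (in == NULL) {
        close(fds[0]);
        return -1;
    }

    if (fgets(first, (int)firstcap, in) == NULL)   /* "a\n"; "b\n" stays buffered */
        first[0] = '\0';

    char raw[16];
    ssize_t n = read(fds[0], raw, sizeof raw);     /* bypasses the FILE buffer */

    if (fgets(second, (int)secondcap, in) == NULL) /* stdio still has "b\n" */
        second[0] = '\0';
    fclose(in);                          /* also closes fds[0] */
    return n;
}
```

On glibc, the raw read() returns 0 while the FILE* still holds the second line. A library that reads the descriptor directly and a library that reads the FILE* therefore see different data.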
Mar 23 2007
Lars Ivar Igesund <larsivar igesund.net> writes:
Andrei Alexandrescu (See Website For Email) wrote:

 Lars Ivar Igesund wrote:
 Neat! Now that we got the performance problem out of the way, let's
 discuss stdio compatibility. I suggest you use getline on GNU platforms.

 Andrei

Maybe discuss first why stdio compatibility is needed? Is the equivalent functionality missing in Tango, and if so, would implementing it in Tango remove this need for compatibility?

As long as the global "stdin" symbol is a FILE*, this would be highly recommendable. And given that phobos does offer stdin as a FILE*, stdio compatibility is important for programs that want to use phobos and tango simultaneously (e.g., a library using phobos linked with another one using tango).

May I then suggest that you create an enhancement/wishlist ticket for this? Thanks :)

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Mar 23 2007
Davidl <Davidl 126.com> writes:
great job!
I didn't know I/O performance could vary over such a wide range.
And thanks to the Tango team for the great job.
heh, so is D's I/O now as fast as C's?
Or is Tango even faster than C's I/O?

 Andrei Alexandrescu (See Website For Email) wrote:

 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.

I have uploaded a snapshot with prebuilt libraries to
http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz

The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual, and add libtango.a to your compile command. The test program (io.d below) should be compiled using the line

dmd -O -release -inline io.d libtango.a

io.d
-------
import tango.io.Console;

void main() {
    char[] line;
    // true means that newlines are retained
    while (Cin.nextLine(line, true))
        Cout(line);
}
--------

For the sake of reference, I created a file with 1.8 million (equal) lines, at a total of 133 Megabytes. I ran it through the above program, and your Perl program. System is a PentiumM-1.86GHz, 1.5GB RAM, Kubuntu 7.04, DMD 1.009.

Average times perl program : 1.65 seconds (real), 1.45 seconds (user)
Average times tango program: 1.08 seconds (real), 0.91 seconds (user)

Note that I also tried without the optimization flags to DMD, which resulted in times that were about 10% faster than Perl.

Mar 22 2007
Sean Kelly <sean f4.ca> writes:
Davidl wrote:
 great job!
 I didn't know I/O performance could vary over such a wide range.
 And thanks to the Tango team for the great job.
 heh, so is D's I/O now as fast as C's?
 Or is Tango even faster than C's I/O?

Tango is faster, at least for this particular test. Sean
Mar 22 2007
Roberto Mariottini <rmariottini mail.com> writes:
Hi,
I haven't had any replies to my questions.
Can somebody answer them?

Ciao

-------- Original Message --------
Subject: Re: stdio performance in tango, stdlib, and perl
Date: Fri, 23 Mar 2007 10:08:24 +0100
From: Roberto Mariottini <rmariottini mail.com>
Organization: Digital Mars
Newsgroups: digitalmars.D
References: <4601A54A.8050307 erdani.org> 
<etsbup$2c5t$1 digitalmars.com> <4601B819.6080001 erdani.org> 
<etse2m$2fa2$1 digitalmars.com> <4601C25F.9050107 erdani.org> 
<ettem8$qgl$1 digitalmars.com> <4602C66E.4020100 erdani.org>

Andrei Alexandrescu (See Website For Email) wrote:
> Roberto Mariottini wrote:
[...]
>>> Essentially it's about information. The naive loop:
>>>
>>> while (readln(line)) {
>>>   write(line);
>>> }
>>
>> I'm completely against that awful mess of code.
>
> What exactly would be bad about it?

It's not clearly evident to a non-expert programmer that a newline is
appended to each line.
Take any programmer from any language of your choice and ask what this
snippet is supposed to do.
This is against immediate comprehension of code.

>>> is guaranteed 100% to produce an accurate copy of its input. The
>>> version that chops lines looks like:
>>>
>>> while (readln(line)) {
>>>   writeln(line);
>>> }
>>>
>>> This may or may not add a newline to the output, possibly creating a
>>> file larger by one byte.
>>
>> Are you sure? Can you elaborate more on this?
>
> Very simple. If the file ends with a newline, the code reproduces it. If
> not, the code gratuitously appends a newline.

A newline is two bytes here.

>>> Moreover, with the automated chopping it is basically impossible to
>>> write a program that exactly reproduces its input because readln
>>> essentially loses information.

A text file is not a binary file.
A newline at end of file is completely irrelevant.

On the other hand, no code should break whether the last newline is there
or not. The problem with your code is that the last line comes out
different from the others.

>>> Also, stdio also offers a readln() that creates a new line on every
>>> call. That is useful if you want fresh lines every read:
>>>
>>> char[] line;
>>> while ((line = readln()).length > 0) {
>>>   ++dictionary[line];
>>> }
>>
>> This way you'll get two different dictionaries on Windows and on Unix.
>> Wrong, very wrong.
>
> Yes, wrong, very wrong. Except it's not me who's wrong :o).

Ehm, can you elaborate on how good it is to put a '\n' at the end of every
string when working with:

 - databases
 - communication programs
 - interprocess communication
 - distributed computing

>>> The code _just works_ because an empty line means _precisely_ and
>>> without the shadow of a doubt that the file has ended. (An I/O error
>>> throws an exception, and does NOT return an empty line; that is
>>> another important point.) An API that uses automated chopping should
>>> not offer such a function because an empty line may mean that an
>>> empty line was read, or that it's eof time. So the API would force
>>> people to write convoluted code.
>>
>> What is your definition of "convolute"?
>> I find your code 'convolute', 'unclear', 'buggy' and 'unportable'.
>
> You are objectively wrong.

Say 'subjectively'.
Assignments in boolean expressions should be avoided. The average
programmer knows something about this magic, but fears to touch it, and
never completely understands it.

Still, any programmer from any language would think that this code ends
at the first empty line.

Here is one of the many possible non-convoluted versions:

char[] line = readln();
while (line.length > 0) {
  ++dictionary[chomp(line)];
  line = readln();
}

And this is how it should be:

char[] line = readln();
while (line != null) {
  ++dictionary[line];
  line = readln();
}

> The code is portable. Newline translation
> takes care of it. Just try it.

Newline translation is an old problem with C, C++ and now with D.
Nothing can be resolved with newline translation.

Opening a file in binary mode on Unix and treating it like a text file
works only as long as the program is run on Unix.
Newline translation is prone to portability errors, thus non-portable.
In my experience, newline translation poses more portability problems than
it solves.

>>> In the couple of years I've used Perl I've thanked the Perl folks for
>>> their readline decision numerous times.
>>
>> Perl is something the world should get rid of, quickly. Perl is wrong,
>> Perl is evil, Perl is useless. You don't need Perl, try to cease using it.
>> The fact that this narrow-minded idea comes from Perl is not surprising.
>
> What can I say? Thanks! I'm enlightened!

You'd be more enlightened if you had had to work with big CGI scripts
written in Perl, and eventually had to convert them to JSP to make the
average (available) programmers able to work on them.
Sure, with Perl you can do many things in less than 10 lines. But keep it
under 10 lines, or you're in trouble.

>>> Ever tried to do cin or fscanf? You can't do any intelligent input
>>> with them because they skip whitespace and newlines like it's out of
>>> style.
>>
>> I use them, and I find them very comfortable. Again your definition of
>> 'intelligent' is particular. If you find Perl 'intelligent', this says a
>> lot.
>
> To each their own :o). Oh, probably you could explain how I can read a
> string containing spaces, followed by ":" and a number with scanf. Takes
> one line in Perl and D's readfln (not yet distributed).
scanf(" :%d", &i); Ciao
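The one-liner above matches the colon and the number but discards the text before the colon. If that string is wanted too, C's scanf family can capture it with a %[ scanset conversion. Below is a sketch that uses sscanf so it can be exercised without stdin. parse_label_number is a hypothetical name, and the 63-character limit is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "<text, spaces allowed>:<number>".
 * %63[^:] stores up to 63 characters that are not ':' (spaces included),
 * the literal ':' must match next, and %d reads the number.
 * Returns 1 if both fields were converted, 0 otherwise. */
static int parse_label_number(const char *s, char label[64], int *value)
{
    return sscanf(s, "%63[^:]:%d", label, value) == 2;
}
```

The same format string works with scanf() on stdin. The field width is what keeps the scanset conversion from overflowing label.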
Mar 27 2007
"David B. Held" <dheld codelogicconsulting.com> writes:
Roberto Mariottini wrote:
 Hi,
 I have got no reply to my questions.
 Can somebody answer them?

Your "questions" hardly seem sincere. Were you not simply posturing for your position? Or do you want to see endless debate on chomp() vs. no chomp()? Dave
 -------- Original Message --------
 Subject: Re: stdio performance in tango, stdlib, and perl
 Date: Fri, 23 Mar 2007 10:08:24 +0100
 From: Roberto Mariottini <rmariottini mail.com>
 Organization: Digital Mars
 Newsgroups: digitalmars.D
 References: <4601A54A.8050307 erdani.org> 
 <etsbup$2c5t$1 digitalmars.com> <4601B819.6080001 erdani.org> 
 <etse2m$2fa2$1 digitalmars.com> <4601C25F.9050107 erdani.org> 
 <ettem8$qgl$1 digitalmars.com> <4602C66E.4020100 erdani.org>
 
 Andrei Alexandrescu (See Website For Email) wrote:
  > Roberto Mariottini wrote:
 [...]
  >>> Essentially it's about information. The naive loop:
  >>>
  >>> while (readln(line)) {
  >>>   write(line);
  >>> }
  >>
  >> I'm completely against that awful mess of code.
  >
  > What exactly would be bad about it?
 
 It's not clearly evident to a non-expert programmer that a newline is
 appended to each line.
 Take any programmer from any language of your choice and ask what this
 snippet is supposed to do.
 This is against immediate comprehension of code.
 
  >>> is guaranteed 100% to produce an accurate copy of its input. The
  >>> version that chops lines looks like:
  >>>
  >>> while (readln(line)) {
  >>>   writeln(line);
  >>> }
  >>>
  >>> This may or may not add a newline to the output, possibly creating a
  >>> file larger by one byte.
  >>
  >> Are you sure? Can you elaborate more on this?
  >
  > Very simple. If the file ends with a newline, the code reproduces it. If
  > not, the code gratuitously appends a newline.
 
 A newline is two bytes here.
 
  >>> Moreover, with the automated chopping it is basically impossible to
  >>> write a program that exactly reproduces its input because readln
  >>> essentially loses information.
 
 A text file is not a binary file.
 A newline at end of file is completely irrelevant.
 
 On the other hand, no code should break whether the last newline is there
 or not. The problem with your code is that the last line comes out
 different from the others.
 
  >>> Also, stdio also offers a readln() that creates a new line on every
  >>> call. That is useful if you want fresh lines every read:
  >>>
  >>> char[] line;
  >>> while ((line = readln()).length > 0) {
  >>>   ++dictionary[line];
  >>> }
  >>
  >> This way you'll get two different dictionaries on Windows and on Unix.
  >> Wrong, very wrong.
  >
  > Yes, wrong, very wrong. Except it's not me who's wrong :o).
 
 Ehm, can you elaborate on how good it is to put a '\n' at the end of every
 string when working with:
 
  - databases
  - communication programs
  - interprocess communication
  - distributed computing
 
  >>> The code _just works_ because an empty line means _precisely_ and
  >>> without the shadow of a doubt that the file has ended. (An I/O error
  >>> throws an exception, and does NOT return an empty line; that is
  >>> another important point.) An API that uses automated chopping should
  >>> not offer such a function because an empty line may mean that an
  >>> empty line was read, or that it's eof time. So the API would force
  >>> people to write convoluted code.
  >>
  >> What is your definition of "convoluted"?
  >> I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.
  >
  > You are objectively wrong.
 
 Say 'subjectively'.
 Assignments in boolean expressions should be avoided. The average
 programmer knows something about this magic, but is afraid to touch it,
 and never completely understands it.
 
 Still, any programmer from any language would think that this code ends
 at the first empty line.
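Whether the `.length > 0` loop quoted above stops at the first blank line depends on what readln() returns for a blank line. A minimal sketch, assuming a terminator-keeping readln(); `KeepingReader` is a hypothetical in-memory stand-in for stdin, not a Phobos type:

```d
import std.array : array;
import std.string : KeepTerminator, lineSplitter;

/// Hypothetical reader over an in-memory buffer whose readln() keeps
/// line terminators. A blank line comes back as "\n", so only end of
/// input produces an empty result.
struct KeepingReader
{
    string[] lines;
    size_t next;

    this(string input)
    {
        lines = input.lineSplitter!(KeepTerminator.yes).array;
    }

    string readln()
    {
        return next < lines.length ? lines[next++] : "";
    }
}

/// Counts lines using the `.length > 0` loop condition from the thread.
size_t countLinesKeeping(string input)
{
    auto reader = KeepingReader(input);
    size_t count;
    string line;
    while ((line = reader.readln()).length > 0)
        ++count;
    return count;
}

unittest
{
    // The blank middle line reads as "\n" and does not end the loop.
    assert(countLinesKeeping("first\n\nthird\n") == 3);
}
```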
 
 Here is one of the many possible non-convoluted versions:
 
 char[] line = readln();
 while (line.length > 0) {
   ++dictionary[chomp(line)];
   line = readln();
 }
 
 And this is how it should be:
 
 char[] line = readln();
 while (line !is null) {
   ++dictionary[line];
   line = readln();
 }
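Roberto's first loop applies chomp() to every line before using it as a key. A minimal sketch of that approach in present-day D, working on an in-memory string instead of stdin (the name `countLines` is hypothetical), also shows that chomp() maps Windows and Unix line endings to the same key:

```d
import std.string : KeepTerminator, chomp, lineSplitter;

/// Counts occurrences of each line. Lines are read with their terminators
/// kept and stripped with chomp() before use, as in Roberto's first loop.
int[string] countLines(string input)
{
    int[string] dictionary;
    foreach (line; input.lineSplitter!(KeepTerminator.yes))
        ++dictionary[line.chomp];   // removes one trailing "\r\n", "\n" or "\r"
    return dictionary;
}

unittest
{
    // Windows-style and Unix-style endings produce the same key.
    auto counts = countLines("apple\r\napple\nbanana");
    assert(counts["apple"] == 2);
    assert(counts["banana"] == 1);
    assert(counts.length == 2);
}
```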
 
  > The code is portable. Newline translation
  > takes care of it. Just try it.
 
 Newline translation is an old problem with C, C++ and now with D.
 Nothing can be resolved with newline translation.
 
 Opening a file in binary mode on Unix and treating it like a text file
 works only as long as the program is run on Unix.
 Newline translation is prone to portability errors, thus non-portable.
 
 In my experience, newline translation poses more portability problems
 than it solves.
 
  >>> In the couple of years I've used Perl I've thanked the Perl folks for
  >>> their readline decision numerous times.
  >>
  >> Perl is something the world should get rid of, quickly.
  >> Perl is wrong, Perl is evil, Perl is useless.
  >> You don't need Perl, try to cease using it.
  >>
  >> The fact that this narrow-minded idea comes from Perl is not
  >> surprising.
  >
  > What can I say? Thanks! I'm enlightened!
 
 You would be more enlightened if you had had to work with big CGI scripts
 written in Perl, and eventually had to convert them to JSP so that the
 average (available) programmers could work on them.
 
 Sure, with Perl you can do many things in fewer than 10 lines.
 But keep it under 10 lines, or you are in trouble.
 
  >>> Ever tried to do cin or fscanf? You can't do any intelligent input
  >>> with them because they skip whitespace and newlines like it's out of
  >>> style.
  >>
  >> I use them, and I find them very comfortable.
  >> Again your definition of 'intelligent' is particular.
  >> If you find Perl 'intelligent', this says a lot.
  >
  > To each their own :o). Oh, probably you could explain how I can read a
  > string containing spaces, followed by ":" and a number with scanf. Takes
  > one line in Perl and D's readfln (not yet distributed).
 
 scanf("%[^:]:%d", buf, &i);
 
 Ciao
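For comparison with the scanf approach, here is a minimal sketch in present-day D of splitting a line such as "some text: 42" at the first ':' without a format string. It assumes Phobos' `findSplit`, `strip` and `to!int`; `parseEntry` is a hypothetical name and is not the readfln Andrei mentions.

```d
import std.algorithm.searching : findSplit;
import std.conv : to;
import std.string : strip;
import std.typecons : Tuple, tuple;

/// Splits a line such as "some text with spaces: 42" at the first ':'
/// into the text before it and the integer after it.
/// Throws if there is no ':' or the remainder is not an integer.
Tuple!(string, int) parseEntry(string line)
{
    auto parts = line.findSplit(":");
    if (!parts)
        throw new Exception("no ':' in line: " ~ line);
    return tuple(parts[0], parts[2].strip.to!int);
}

unittest
{
    auto entry = parseEntry("hello big world: 42\n");
    assert(entry[0] == "hello big world");
    assert(entry[1] == 42);
}
```

Because strip() removes surrounding whitespace including a trailing newline, the function works the same whether or not the line kept its terminator.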

Mar 27 2007
Derek Parnell <derek psych.ward> writes:
On Tue, 27 Mar 2007 16:27:57 +0200, Roberto Mariottini wrote:

 Hi,
 I have received no reply to my questions.
 Can somebody answer them?
 
 Ciao
 
 -------- Original Message --------
 Subject: Re: stdio performance in tango, stdlib, and perl
 Date: Fri, 23 Mar 2007 10:08:24 +0100
 From: Roberto Mariottini <rmariottini mail.com>
 Organization: Digital Mars
 Newsgroups: digitalmars.D
 References: <4601A54A.8050307 erdani.org> 
 <etsbup$2c5t$1 digitalmars.com> <4601B819.6080001 erdani.org> 
 <etse2m$2fa2$1 digitalmars.com> <4601C25F.9050107 erdani.org> 
 <ettem8$qgl$1 digitalmars.com> <4602C66E.4020100 erdani.org>
 
 Andrei Alexandrescu (See Website For Email) wrote:
  > Roberto Mariottini wrote:
 [...]
  >>> Essentially it's about information. The naive loop:
  >>>
  >>> while (readln(line)) {
  >>>   write(line);
  >>> }
  >>
  >> I'm completely against that awful mess of code.
  >
  > What exactly would be bad about it?
 
 It's not evident to a non-expert programmer that each line read keeps
 its trailing newline.
 Take any programmer from any language of your choice and ask what this
 snippet is supposed to do.
 This is against immediate comprehension of code.

One of the small issues I have with 'readln' including the newline character(s) at the end of a line is that such characters are not actually part of the text line; they are delimiters that separate one line from another. In essence they are the same type of thing as the null byte that marks the end of a C-style string. If the purpose of returning the newline character(s) from readln() is to inform the caller that a complete line was actually read in, then I would have thought that this is 'optional' data that the caller could choose to know about or not. If I call readln() and a complete line was not read in, I would consider this an exception. And by the way, a text file that does not end with a newline is not an exception from my point of view, as this could just be a situation in which a delimiting newline is not required (there is no following line to delimit the last one from).

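Derek's suggestion, that whether a line was terminated is optional information the caller may ask for, could look like the following minimal sketch. `Line` and `splitTerminator` are hypothetical names; the raw line is assumed to come from a terminator-keeping readln().

```d
import std.string : chomp;

/// Hypothetical result type: the line text without its delimiter, plus
/// whether a delimiter was present at all.
struct Line
{
    string text;
    bool terminated;
}

/// Splits a raw line (as returned by a terminator-keeping readln) into
/// its text and the optional "was it terminated" information.
Line splitTerminator(string raw)
{
    auto text = raw.chomp;   // strips one trailing line terminator, if any
    return Line(text, text.length != raw.length);
}

unittest
{
    assert(splitTerminator("abc\n") == Line("abc", true));
    assert(splitTerminator("abc\r\n") == Line("abc", true));
    // Last line of a file with no final newline: text only, no delimiter.
    assert(splitTerminator("abc") == Line("abc", false));
}
```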
  >>> is guaranteed 100% to produce an accurate copy of its input. The
  >>> version that chops lines looks like:
  >>>
  >>> while (readln(line)) {
  >>>   writeln(line);
  >>> }
  >>>
  >>> This may or may not add a newline to the output, possibly creating a
  >>> file larger by one byte.
  >>
  >> Are you sure? Can you elaborate more on this?
  >
  > Very simple. If the file ends with a newline, the code reproduces it. If
  > not, the code gratuitously appends a newline.
 
 A newline is two bytes here.

Some readln() implementations disregard the actual newline as supplied by the operating system and just end each line with a single 0x0A byte on all operating systems. When it comes to outputting this, it is transformed back into the appropriate newline sequence for the running opsys.

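The translation scheme Derek describes can be sketched as two string functions. This illustrates the idea only, assuming translation is done on whole in-memory strings; the function names are hypothetical and do not correspond to any particular readln() implementation.

```d
import std.array : replace;
import std.ascii : newline;

/// Input side: every "\r\n" or lone "\r" becomes a single '\n' (0x0A).
string toInternal(string text)
{
    return text.replace("\r\n", "\n").replace("\r", "\n");
}

/// Output side: each '\n' becomes the running system's newline sequence
/// (std.ascii.newline is "\r\n" on Windows and "\n" elsewhere).
string toPlatform(string text)
{
    return text.replace("\n", newline);
}

unittest
{
    assert(toInternal("a\r\nb\rc\n") == "a\nb\nc\n");
    assert(toPlatform("a\nb") == "a" ~ newline ~ "b");
}
```

Replacing "\r\n" before the lone "\r" matters: doing it the other way round would turn each Windows newline into two '\n' characters.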
  >>> Moreover, with the automated chopping it is basically impossible to
  >>> write a program that exactly reproduces its input because readln
  >>> essentially loses information.
 
 A text file is not a binary file.
 A newline at end of file is completely irrelevant.

Exactly. It is merely a delimiter *between* lines.

 On the other hand, no code should break whether or not the last newline
 is there. The problem with your code is that the last line comes out
 different from the others.

The last line does not need a delimiter, so some systems make it optional.

  >>> Also, stdio offers a readln() that creates a new line on every
  >>> call. That is useful if you want fresh lines every read:
  >>>
  >>> char[] line;
  >>> while ((line = readln()).length > 0) {
  >>>   ++dictionary[line];
  >>> }
  >>
  >> This way you'll get two different dictionaries on Windows and on Unix.
  >> Wrong, very wrong.
  >
  > Yes, wrong, very wrong. Except it's not me who's wrong :o).
 
 Ehm, can you elaborate on how good it is to put a '\n' at the end of every
 string when working with:
 
   - databases
   - communication programs
   - interprocess communication
   - distributed computing

Does not make a lot of sense to me either. Like I said earlier, the first thing I usually do when reading a line is to remove the damned newline character(s).

  >>> The code _just works_ because an empty line means _precisely_ and
  >>> without the shadow of a doubt that the file has ended. (An I/O error
  >>> throws an exception, and does NOT return an empty line; that is
  >>> another important point.) An API that uses automated chopping should
  >>> not offer such a function because an empty line may mean that an
  >>> empty line was read, or that it's eof time. So the API would force
  >>> people to write convoluted code.
  >>
  >> What is your definition of "convoluted"?
  >> I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.
  >
  > You are objectively wrong.
 
 Say 'subjectively'.
 Assignments in boolean expressions should be avoided. The average
 programmer knows something about this magic, but is afraid to touch it,
 and never completely understands it.
 
 Still, any programmer from any language would think that this code ends
 at the first empty line.
 
 Here is one of the many possible non-convoluted versions:
 
 char[] line = readln();
 while (line.length > 0) {
    ++dictionary[chomp(line)];
    line = readln();
 }
 
 And this is how it should be:
 
 char[] line = readln();
 while (line !is null) {
    ++dictionary[line];
    line = readln();
 }

This depends on distinguishing between an empty line and a null line.

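Derek's point can be made concrete. In D, `==` and `!=` on arrays compare contents, so an empty but non-null string compares equal to null, while `is` and `!is` compare identity. The sketch below assumes a hypothetical chopping reader (`ChoppingReader`, not a real library type) that returns null only at end of input:

```d
import std.array : array;
import std.string : lineSplitter;

/// Hypothetical reader whose readln() chops terminators and returns null
/// (as opposed to an empty, non-null string) at end of input.
struct ChoppingReader
{
    string[] lines;
    size_t next;

    this(string input) { lines = input.lineSplitter.array; }

    string readln()
    {
        if (next == lines.length)
            return null;                 // end of input
        auto line = lines[next++];
        return line.length ? line : "";  // blank line: empty but non-null
    }
}

/// Loop condition tested by identity: stops only at end of input.
size_t countByIdentity(string input)
{
    auto reader = ChoppingReader(input);
    size_t count;
    for (auto line = reader.readln(); line !is null; line = reader.readln())
        ++count;
    return count;
}

/// Loop condition tested by value: "" compares equal to null,
/// so a blank line ends the loop early.
size_t countByValue(string input)
{
    auto reader = ChoppingReader(input);
    size_t count;
    for (auto line = reader.readln(); line != null; line = reader.readln())
        ++count;
    return count;
}

unittest
{
    assert(countByIdentity("first\n\nthird\n") == 3);
    assert(countByValue("first\n\nthird\n") == 1);
}
```

With a chopping API, `!is null` is therefore required; `!= null` silently stops at the first blank line.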
  > The code is portable. Newline translation
  > takes care of it. Just try it.
 
 Newline translation is an old problem with C, C++ and now with D.
 Nothing can be resolved with newline translation.
 
 Opening a file in binary mode on Unix and treating it like a text file
 works only as long as the program is run on Unix.
 Newline translation is prone to portability errors, thus non-portable.
 
 In my experience, newline translation poses more portability problems
 than it solves.

Unless done right by the compiler/language and not having to be done by the code writer each time. Much like a GC system.

--
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
Mar 27 2007