digitalmars.D - stdio performance in tango, stdlib, and perl

Andrei Alexandrescu (See Website For Email) (49/49) Mar 21 2007 I've run a couple of simple tests comparing Perl, D's stdlib (the coming...

Walter Bright (3/5) Mar 21 2007 Can you add a C++ to the mix? I think that would be a very

Andrei Alexandrescu (See Website For Email) (28/34) Mar 21 2007 Obliged. Darn, I had to wait a *lot* longer.

Walter Bright (4/38) Mar 21 2007 This is awesomely bad. Although it's possible to get very fast code out

Andrei Alexandrescu (See Website For Email) (4/46) Mar 21 2007 I don't know exactly what sync'ing does in C++, but probably it isn't

Walter Bright (5/10) Mar 21 2007 I think it means bringing the iostream I/O buffer in to sync with the

Andrei Alexandrescu (See Website For Email) (3/15) Mar 21 2007 Aha, so readln is better _and_ more compatible. Great!

kris (3/26) Mar 21 2007 Out of interest, how does the currently shipping Phobos fare in this tes...

Andrei Alexandrescu (See Website For Email) (4/32) Mar 21 2007 I don't have it anymore. Couldn't write a test anyway, because currently...

James Dennett (27/70) Mar 21 2007 Try the way IOStreams would be used if you didn't want

torhu (99/113) Mar 21 2007

torhu (6/25) Mar 22 2007 I've run some of the tests with more accurate timing. Andrei's Tango

kris (7/35) Mar 22 2007 Just for jollies, a briefly optimized tango.io was tried also: it came

Andrei Alexandrescu (See Website For Email) (7/46) Mar 22 2007 Is it compatible with C's stdio? IOW, would this sequence work?

kris (39/95) Mar 22 2007 Nope. Tango is for D, not C. In order to make an arguably better library,...

Andrei Alexandrescu (See Website For Email) (23/124) Mar 22 2007 That's not what my tests show on Linux, where Perl and readln beat Tango...

kris (17/164) Mar 22 2007 Oh, come now. Yesterday Tango was the "fastest" on your machine, and

Andrei Alexandrescu (See Website For Email) (16/33) Mar 22 2007 Probably it's a misunderstanding. Yesterday the Tango that did not

Sean Kelly (7/20) Mar 22 2007 We're in the process of getting an automated nightly snapshot process

Andrei Alexandrescu (See Website For Email) (7/13) Mar 22 2007 I think you'd make a lot of people happy. Several documented attempts of...

Sean Kelly (8/13) Mar 22 2007 This page describes one way to use Tango and Phobos together:

Andrei Alexandrescu (See Website For Email) (29/41) Mar 22 2007 Here's what worked for me. The script also allows compiling dmd programs...

Sean Kelly (13/57) Mar 22 2007 This is intentional, though it may change later based on user feedback.

Andrei Alexandrescu (See Website For Email) (5/31) Mar 22 2007 cat is not comparable. Besides, there must be some overhead associated
torhu (22/48) Mar 22 2007 Couple of more results:

Sean Kelly (10/51) Mar 22 2007 Oh good. I was hoping someone would test Tango without flushing every
torhu (3/19) Mar 24 2007 Whoops, can anyone spot the bug? When I fixed it, the time it took to

Frits van Bommel (2/22) Mar 24 2007 I'm guessing the fact that sizeof(buf) != 1000 ?

torhu (3/9) Mar 25 2007 I think you were the first to post. Go buy yourself a lollipop, you've

Sean Kelly (3/23) Mar 24 2007 The fgets(sizeof(buf)) looks like it could affect read performance a tad...

Andrei Alexandrescu (See Website For Email) (9/81) Mar 22 2007 With your code pasted and wind from behind:

James Dennett (14/98) Mar 22 2007 IOStreams is a terrible chunk of library design, and

Andrei Alexandrescu (See Website For Email) (6/13) Mar 22 2007 Indeed. Then you'll be glad to hear that D will soon accommodate smarter...

Roberto Mariottini (6/16) Mar 22 2007 The portable way to write a newline in C++ is to use the 'endl'

torhu (5/10) Mar 22 2007 Unless a file is opened in binary mode, '\n' will be translated into

Deewiant (3/6) Mar 22 2007 But I don't think this is the case in Tango, so Cout(line)("\n") should ...

kris (3/12) Mar 22 2007 At the behest of andrei, Cin line-parsing now has an option to include

Deewiant (4/18) Mar 22 2007 Only if you've got the latest SVN revision of Tango. If not, use

Andrei Alexandrescu (See Website For Email) (3/19) Mar 22 2007 Wrong. Newline translation will be correct on both systems.

Roberto Mariottini (6/13) Mar 23 2007 It depends on how you open the file: 'endl' works even with files open

James Dennett (13/29) Mar 23 2007 The difference between '\n' and std::endl in C++ is only

kris (17/75) Mar 21 2007 There's a couple of things to look at here:

Andrei Alexandrescu (See Website For Email) (23/44) Mar 21 2007 The test code assumed taking a look at each line before printing it, so

kris (7/35) Mar 21 2007 Just suggesting that the scanning for [\r]\n patterns is likely a good

Andrei Alexandrescu (See Website For Email) (40/80) Mar 21 2007 Well probably but must be tested. Newlines comprise about 3% of the file...

kris (12/47) Mar 21 2007 Yeah, I can imagine. Module tango.io.Console at line 119 should have a

Walter Bright (2/4) Mar 21 2007 The flush on newline should only be done if isatty() returns !=0.

kris (3/11) Mar 21 2007 yep; if you were to submit a ticket for that, it would be appreciated :)

Andrei Alexandrescu (See Website For Email) (17/36) Mar 21 2007 Why not? Programs using the standard input and output are ubiquitous,

Derek Parnell (15/17) Mar 21 2007 Most programs I run that do lots of I/O only take seconds to run, so if

Davidl (3/16) Mar 21 2007 u r working on database?

Derek Parnell (10/13) Mar 21 2007 Yep. A light-weight, single-user D/B suitable for "home" applications.

kris (10/22) Mar 21 2007 If tango were terribly terribly slow instead, then it would be cause for...

Andrei Alexandrescu (See Website For Email) (12/41) Mar 21 2007 That's great, but by and large, the attitude that "this is the simple

kris (10/50) Mar 21 2007 Oh, if there's any implication that Tango ought to be "faster" than it

Andrei Alexandrescu (See Website For Email) (3/47) Mar 22 2007 Do it and let's test.

kris (5/12) Mar 22 2007 you can try it right now with a Cout(line)("\n");

Andrei Alexandrescu (See Website For Email) (12/26) Mar 22 2007 On my Linux box:

Andrei Alexandrescu (See Website For Email) (4/48) Mar 22 2007 Oh, but I forgot it's cheating: uses read/write so it's incompatible

kris (13/15) Mar 22 2007 How can it possibly be "cheating" when the code was in place before you

Andrei Alexandrescu (See Website For Email) (10/28) Mar 22 2007 "Principle" I guess. That sounds great. My opinion in the matter is

kris (11/47) Mar 22 2007 Yep. A thousand pardons for my late night spelling mistake. I'll be sure...

Sean Kelly (5/13) Mar 22 2007 If I understand you correctly, you're saying that all IO packages must

Andrei Alexandrescu (See Website For Email) (5/18) Mar 22 2007 I think for stdio, going through the standard C library would be very

Walter Bright (22/35) Mar 21 2007 One problem with C++, as I mentioned before, is that the

Andrei Alexandrescu (See Website For Email) (4/24) Mar 21 2007 You could tell from this and my (almost identical) post that Walter's
kris (11/60) Mar 21 2007 tango.io is not even optimized for this case (unlike the new Phobos
James Dennett (29/37) Mar 21 2007 This kind of simplistic bashing of a language or library

Andrei Alexandrescu (See Website For Email) (11/52) Mar 22 2007 For the record, I used gcc 4.1.2 20060928 (prerelease) (Ubuntu
Walter Bright (21/25) Mar 22 2007 Maybe it is a bit of frustration on my part. I often run into people

James Dennett (25/55) Mar 22 2007 Good answer. (Yes, seriously.)

Bill Baxter (7/16) Mar 23 2007 I think there is a tendency to assume that APIs and languages which have...

Walter Bright (11/27) Mar 23 2007 D bucks conventional wisdom in more than one way. There's a current

James Dennett (13/43) Mar 24 2007 I'm intrigued by your claim that IOStreams is not thread-safe;

Andrei Alexandrescu (See Website For Email) (10/52) Mar 24 2007 cout << a << b;

James Dennett (21/44) Mar 24 2007 As you appear to be saying that printf has to flush every

Walter Bright (22/54) Mar 24 2007 In order for printf to work right it does not need to flush every time

James Dennett (30/82) Mar 24 2007 That would be true, except that Andrei wrote that

Walter Bright (12/48) Mar 24 2007 Ok, but since it is typical to do a flush on newline if isatty(), that
Andrei Alexandrescu (See Website For Email) (8/41) Mar 24 2007 Lines don't have to appear at exact times, they only must not

James Dennett (17/58) Mar 24 2007 With sufficiently short lines, where the value of

Sean Kelly (11/18) Mar 24 2007 ...since they obviously don't have to consider thread-safety when

Andrei Alexandrescu (See Website For Email) (8/26) Mar 24 2007 Good question(s). Might be also that I/O interface is considerably

Sean Kelly (17/47) Mar 24 2007 The stream could acquire a lock and pass it to a proxy object which

Walter Bright (13/26) Mar 24 2007 I disagree. It's been working fine for nearly 20 years now. gcc

Sean Kelly (8/33) Mar 25 2007 True enough. Though I wonder how much of a factor it is that C++ has no...

Andrei Alexandrescu (See Website For Email) (8/22) Mar 24 2007 Numbers clearly tell the above is wrong. Here's the thing: I write

James Dennett (17/42) Mar 24 2007 Except that your test wasn't of the right thing; you

Andrei Alexandrescu (See Website For Email) (4/41) Mar 24 2007 If you did, fine. I take that part of my argument back. I'll also note

James Dennett (6/8) Mar 24 2007 Trying to defend IOStreams is certainly a challenge.

Sean Kelly (12/67) Mar 24 2007 stringstream s;

Walter Bright (19/26) Mar 24 2007 The trouble with that design is people working on subsystems or

Andrei Alexandrescu (See Website For Email) (8/38) Mar 24 2007 MS does the same now if I remember correctly: all of its libraries are

Sean Kelly (7/26) Mar 24 2007 Yup. In fact, I just discovered that Visual Studio 2005 doesn't even

Walter Bright (15/41) Mar 24 2007 gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for

James Dennett (53/99) Mar 24 2007 I've seen only a minority of those claims made as part

Walter Bright (28/94) Mar 24 2007 I think we're in agreement, as I said "one or two", and that such claims...

0ffh (13/17) Mar 25 2007 I admit I used to think similar to that, a somewhat longer while ago.

Dan (6/20) Mar 26 2007 I totally agree that GC is a solid way of cutting bad code, which perfor...

Derek Parnell (29/75) Mar 21 2007 And exactly how often do people need to write this program? I would have

Andrei Alexandrescu (See Website For Email) (18/79) Mar 21 2007 Of course. It's not about reproducing the input exactly, but about

Derek Parnell (24/64) Mar 21 2007 Actually you said "stdio also offers a readln() that creates a new line ...

Andrei Alexandrescu (See Website For Email) (12/42) Mar 21 2007 Fine. It's just not clear what readln does from its signature. In

Derek Parnell (13/36) Mar 21 2007

Roberto Mariottini (20/63) Mar 22 2007 I suspect Walter was thinking on something else at the time.

Andrei Alexandrescu (See Website For Email) (13/89) Mar 22 2007 Very simple. If the file ends with a newline, the code reproduces it. If...

Roberto Mariottini (51/125) Mar 23 2007 It's not clearly evident for a non-expert programmer that a new-line is

Vladimir Panteleev (10/19) Mar 22 2007 I'd just like to say that the chosen naming convention seems a bit unint...

Daniel Keep (31/53) Mar 22 2007 I suppose it is a little, but I think that's more an issue with text IO

Vladimir Panteleev (11/17) Mar 22 2007 I was actually talking about the complexity of the source, not the effic...

Daniel Keep (13/31) Mar 22 2007 import std.string;
Anders F Björklund (6/10) Mar 22 2007 Actually it is even four:

Roberto Mariottini (5/17) Mar 22 2007 I have some of these also. Legacy applications are not the most, but

Andrei Alexandrescu (See Website For Email) (16/38) Mar 22 2007 That's a mistake, simple as that. Pascal has made many other similar

Vladimir Panteleev (7/12) Mar 22 2007 Ah, yes, missed that one.
Roberto Mariottini (4/5) Mar 23 2007 Just add a call to chomp to your benchmarks.

Walter Bright (518/520) Mar 21 2007 Here's the new std.stdio work in progress (doesn't yet include write())....

Andrei Alexandrescu (See Website For Email) (21/41) Mar 21 2007 [snip]
Roberto Mariottini (4/7) Mar 22 2007 Nooo!

Andrei Alexandrescu (See Website For Email) (4/11) Mar 22 2007 Please justify your statements instead of using emotion, rhetoric, and

Roberto Mariottini (4/6) Mar 22 2007 See my previous post.

Derek Parnell (12/19) Mar 21 2007 LOL ... That is odd because in nearly every program I ever write that re...
Sean Kelly (5/16) Mar 22 2007 For what it's worth, I created a Win32 version of the Unix 'time'

Walter Bright (3/7) Mar 22 2007 Alternatively,

Kristian Kilpi (13/20) Mar 22 2007 I

Walter Bright (1/1) Mar 22 2007 Thanks for the tip, that needs to be fixed.

torhu (17/21) Mar 23 2007 Looks useful, my own tool just measures 'real' time. But it breaks when...

Sean Kelly (5/33) Mar 23 2007 Hm, I suspect IO redirection must be a feature of the shell. It's a bit...

torhu (4/7) Mar 23 2007 I get the same error. My own tool doesn't have such problems, but it

Sean Kelly (3/11) Mar 23 2007 Yeah, mine uses CreateProcess and then GetProcessTimes. I'll give the

Bill Baxter (4/21) Apr 20 2008 I was looking for something like this just the other day.

Sean Kelly (4/24) Apr 21 2008 I switched web hosts and have yet to re-upload all my old content. I'll...

Sean Kelly (4/26) Apr 21 2008 Okay, I've uploaded it here:

Lars Ivar Igesund (30/32) Mar 22 2007 I have uploaded a snapshot with prebuilt libraries to

Andrei Alexandrescu (See Website For Email) (5/33) Mar 22 2007 5.0s tcat

Lars Ivar Igesund (13/49) Mar 23 2007 Maybe discuss first why stdio compatibility is needed? Is the equivalent

Andrei Alexandrescu (See Website For Email) (7/15) Mar 23 2007 As long as the global "stdin" symbol is a FILE*, this would be highly

Lars Ivar Igesund (8/23) Mar 23 2007 May I then suggest that you create a enhancement/wishlist ticket for thi...

Davidl (5/35) Mar 22 2007 great job!

Sean Kelly (3/8) Mar 22 2007 Tango is faster, at least for this particular test.

Dave (3/15) Mar 22 2007 Which of course begs the question -- Could an overload be added so it do...

Walter Bright (4/20) Mar 22 2007 Since the data has to be buffered anyway, might as well use stdio's

Roberto Mariottini (65/133) Mar 27 2007 Hi,

David B. Held (5/171) Mar 27 2007 Your "questions" hardly seem sincere. Were you not simply posturing for...
Derek Parnell (35/163) Mar 27 2007 One of the small issues I have with 'readln' appending a newline

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

I've run a couple of simple tests comparing Perl, D's stdlib (the coming 
release), and Tango.

First, I realize I should make an account on dsource.org and post the 
following there, but I'll mention here that it's quite disappointing 
that Tango's idiomatic method of reading a line from the console 
(Cin.nextLine(line) unless I missed something) chose to chop the newline 
automatically. The Perl book spends half a page or so explaining why 
it's _good_ that the newline is included in the line, and I've been 
thankful for that on numerous occasions when writing Perl. Please put 
the newline back in the line.
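
To make the newline argument concrete, here is a minimal C++ sketch. It is not Phobos or Tango code; read_line_keep_newline and copy_lines are hypothetical helpers written only for this illustration. When the reader keeps the terminator, a copy loop reproduces the input byte for byte, including a final line that has no newline.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line into `line`, keeping the trailing '\n'
// if the input had one. Returns false only when nothing could be read.
bool read_line_keep_newline(std::istream& in, std::string& line) {
    line.clear();
    if (!std::getline(in, line)) return false;
    // getline sets eofbit when it stopped at end of input instead of at '\n',
    // so the newline is restored only when one was actually consumed.
    if (!in.eof()) line += '\n';
    return true;
}

// Hypothetical helper: copies `input` line by line and returns the result.
std::string copy_lines(const std::string& input) {
    std::istringstream in(input);
    std::string line, out;
    while (read_line_keep_newline(in, line)) out += line;
    return out;
}
```

A reader that strips the newline cannot tell "a\nb\n" from "a\nb", so the writer has to guess whether to add a terminator to the last line.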

Anyhow, here's the code. The D up-and-coming stdio version:

import std.stdio;
void main() {
   char[] line;
   while (readln(line)) {
     write(line);
   }
}

The Tango version:

import tango.io.Console;
void main() {
   char[] line;
   while (Cin.nextLine(line)) {
     Cout(line).newline;
   }
}

(The .newline adds back the information that nextLine promptly lost, 
sigh.) I'm not sure whether this is the idiomatic way of reading and 
writing lines in Tango, but tango.io.Stdout seems to say so: "If you 
don't need formatted output or unicode translation, consider using the 
module tango.io.Console directly." - which suggests that Console would 
be the most primitive stdio library.

The Perl version:

while (<>) {
   print;
}

All programs operate in the same exact boring way: read a line from 
stdin, print it, lather, rinse, repeat.

I passed a 31 MB text file (containing a dictionary that I'm using in my 
research) through each of the programs above. The output was set to 
/dev/null. I've run the same program multiple times before the actual 
test, so everything is cached and the process becomes 
computationally-bound. Here are the results summed for 10 consecutive 
runs (averaged over 5 epochs):

13.9s		Tango
6.6s		Perl
5.0s		std.stdio
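
The measurement scheme (sum the time of 10 consecutive runs, then average that sum over 5 epochs) can be sketched in C++ as below. This is an in-process analogue only: the figures above were taken from whole program runs, and mean_summed_seconds is a hypothetical name, not a tool used in the test.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: for each epoch, times `runs` consecutive calls of
// `work` as one block, then returns the mean of those per-epoch totals
// in seconds.
double mean_summed_seconds(const std::function<void()>& work,
                           int runs, int epochs) {
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int e = 0; e < epochs; ++e) {
        const auto start = clock::now();
        for (int r = 0; r < runs; ++r) work();
        const std::chrono::duration<double> elapsed = clock::now() - start;
        total += elapsed.count();
    }
    return total / epochs;
}
```

Calling mean_summed_seconds(work, 10, 5) mirrors the 10 runs and 5 epochs described above.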


Andrei

Mar 21 2007

Walter Bright <newshound digitalmars.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 I've run a couple of simple tests comparing Perl, D's stdlib (the coming 
 release), and Tango.

Can you add a C++ <iostream> to the mix? I think that would be a very 
useful additional data point.

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I've run a couple of simple tests comparing Perl, D's stdlib (the 
 coming release), and Tango.

 
 Can you add a C++ <iostream> to the mix? I think that would be a very 
 useful additional data point.

Obliged. Darn, I had to wait a *lot* longer.

#include <string>
#include <iostream>

int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
}

(C++ makes the same mistake wrt newline.)

35.7s		cppcat

I seem to remember a trick that puts some more wind into iostream's 
sails, so I tried that as well:

#include <string>
#include <iostream>
using namespace std;

int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
}

Result:

13.3s		cppcat


Andrei

Mar 21 2007

Walter Bright <newshound digitalmars.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Obliged. Darn, I had to wait a *lot* longer.
 
 #include <string>
 #include <iostream>
 
 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }
 
 (C++ makes the same mistake wrt newline.)
 
 35.7s        cppcat

This is awesomely bad. Although it's possible to get very fast code out 
of C++, it rarely seems to happen when you write straightforward code.


 I seem to remember a trick that puts some more wind into iostream's 
 sails, so I tried that as well:
 
 #include <string>
 #include <iostream>
 using namespace std;
 
 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }
 
 Result:
 
 13.3s        cppcat

Turning off sync is cheating - D's readln does syncing.

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Obliged. Darn, I had to wait a *lot* longer.

 #include <string>
 #include <iostream>

 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }

 (C++ makes the same mistake wrt newline.)

 35.7s        cppcat

 
 This is awesomely bad. Although it's possible to get very fast code out 
 of C++, it rarely seems to happen when you write straightforward code.
 
 
 I seem to remember a trick that puts some more wind into iostream's 
 sails, so I tried that as well:

 #include <string>
 #include <iostream>
 using namespace std;

 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }

 Result:

 13.3s        cppcat

 
 Turning off sync is cheating - D's readln does syncing.

I don't know exactly what sync'ing does in C++, but probably it isn't 
the locking that you are thinking of.

Andrei

Mar 21 2007

Walter Bright <newshound digitalmars.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Turning off sync is cheating - D's readln does syncing.

 
 I don't know exactly what sync'ing does in C++, but probably it isn't 
 the locking that you are thinking of.

I think it means bringing the iostream I/O buffer in to sync with the 
stdio I/O buffer, i.e. you can mix printf and iostream output and it 
will appear in the same order the calls happen in the code.

D's readln is inherently synced in this manner.
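
A minimal C++ sketch of what that synchronization buys, assuming the default state in which sync_with_stdio has not been turned off. write_interleaved is a hypothetical name used only here.

```cpp
#include <cstdio>
#include <iostream>

// Hypothetical helper: writes three lines, alternating between C stdio and
// C++ iostreams. With synchronization left on (the default), the lines reach
// stdout in the order of the calls. Returns the number of lines written.
int write_interleaved() {
    std::printf("1: printf\n");
    std::cout << "2: cout\n";
    std::printf("3: printf\n");
    return 3;
}
```

After std::ios_base::sync_with_stdio(false), the two libraries may buffer independently, so the relative order of the three lines is no longer guaranteed.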

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Turning off sync is cheating - D's readln does syncing.

 I don't know exactly what sync'ing does in C++, but probably it isn't 
 the locking that you are thinking of.

 
 I think it means bringing the iostream I/O buffer in to sync with the 
 stdio I/O buffer, i.e. you can mix printf and iostream output and it 
 will appear in the same order the calls happen in the code.
 
 D's readln is inherently synced in this manner.

Aha, so readln is better _and_ more compatible. Great!

Andrei

Mar 21 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
[snip]
 (C++ makes the same mistake wrt newline.)
 
 35.7s        cppcat
 
 I seem to remember a trick that puts some more wind into iostream's 
 sails, so I tried that as well:
 
 #include <string>
 #include <iostream>
 using namespace std;
 
 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }
 
 Result:
 
 13.3s        cppcat


Out of interest, how does the currently shipping Phobos fare in this test?

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [snip]
 (C++ makes the same mistake wrt newline.)

 35.7s        cppcat

 I seem to remember a trick that puts some more wind into iostream's 
 sails, so I tried that as well:

 #include <string>
 #include <iostream>
 using namespace std;

 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }

 Result:

 13.3s        cppcat

 
 
 Out of interest, how does the currently shipping Phobos fare in this test?

I don't have it anymore. Couldn't write a test anyway, because currently 
Phobos does not offer readln.

Andrei

Mar 21 2007

James Dennett <jdennett acm.org> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I've run a couple of simple tests comparing Perl, D's stdlib (the
 coming release), and Tango.

 Can you add a C++ <iostream> to the mix? I think that would be a very
 useful additional data point.

 
 Obliged. Darn, I had to wait a *lot* longer.
 
 #include <string>
 #include <iostream>
 
 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }
 
 (C++ makes the same mistake wrt newline.)
 
 35.7s        cppcat
 
 I seem to remember a trick that puts some more wind into iostream's
 sails, so I tried that as well:
 
 #include <string>
 #include <iostream>
 using namespace std;
 
 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }
 
 Result:
 
 13.3s        cppcat

Try the way IOStreams would be used if you didn't want
it to go slowly:

#include <string>
#include <iostream>

int main() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);
    std::string s;
    while (std::getline(std::cin, s)) {
        std::cout << s << '\n';
    }
}

(Excuse the lack of a using directive there; I find the
code more readable without them.  YMMV.)

I don't have your sample file or your machine, but for
the quick tests I just ran on this one machine, the code
above runs more than 60% faster.  Without using tie(),
each read from standard input causes a flush of standard
output (so that, by default, they work appropriately for
console I/O).

It's certainly true that making efficient use of IOStreams
needs some specific knowledge, and that writing an
efficient implementation of IOStreams is far from trivial.
But if we're comparing to C++, we should probably compare
to some reasonably efficient idiomatic C++.
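
The tie relationship can be inspected directly. The sketch below uses only standard calls; untie_cin is a hypothetical wrapper name. At program start std::cin is tied to std::cout, which is why each read flushes pending output until the tie is removed.

```cpp
#include <iostream>

// Hypothetical helper: removes the tie on std::cin and returns the stream it
// was previously tied to (std::cout at program start), or a null pointer if
// it was not tied to anything.
std::ostream* untie_cin() {
    return std::cin.tie(nullptr);
}
```

Passing the returned pointer back to std::cin.tie() restores the interactive behavior, which is the one you want for a prompt-and-read console program.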

-- James

Mar 21 2007

torhu <fake address.dude> writes:

James Dennett wrote:
<snip>
 Try the way IOStreams would be used if you didn't want
 it to go slowly:
 
 #include <string>
 #include <iostream>
 
 int main() {
     std::ios_base::sync_with_stdio(false);
     std::cin.tie(NULL);
     std::string s;
     while (std::getline(std::cin, s)) {
         std::cout << s << '\n';
     }
 }

<snip>


I did some tests with a 58 MB file, containing one million lines.  I'm 
on winxp.  I ran each test a few times, timing them with a stopwatch.  I 
threw in a naive C version, and a std.cstream version, just out of 
curiosity.

It seems that using cin.tie(NULL) doesn't matter with msvc 7.1, but with 
mingw it does.  Basically, Tango wins hands down on my system.  Whether 
the Tango version flushes after each line or not, doesn't seem to matter 
much on Windows.


Compiled with:
dmd -O -release -inline
gcc -O2  (mingw 3.4.2)
cl /O2 /GX


Fastest first:

tango.io.Console, no flushing (Andrei's): ca 1.5s

C, reusing buffer, gcc & msvc71: ca 3s

James' C++, gcc: 3.5s

Phobos std.cstream, reused buffer: 11s

C w/malloc and free each line, msvc71: 23s

Andrei's C++, gcc: 27s

C w/malloc and free each line, gcc: 37s

Andrei's C++, msvc71: 50s

James' C++,  msvc: 51s


---
// Tango
import tango.io.Console;

void main() {
   char[] line;

   while (Cin.nextLine(line)) {
     //Cout(line).newline;
     Cout(line)("\n");
   }
}
---

---
// Phobos std.cstream test
import std.cstream;

void main() {
    char[] buf = new char[1000];
    char[] line;
    while (!din.eof()) {
       line = din.readLine(buf);
       dout.writeLine(line);
    }
}
---

---
/* C, reusing buffer */
#include <stdio.h>
#include <stdlib.h>

char buf[1000];

int main() {
    while (fgets(buf, sizeof(buf), stdin)) {
       fputs(buf, stdout);
    }
    return 0;
}
---

---
/* C test w/malloc and free */
#include <stdio.h>
#include <stdlib.h>

int main() {

    char *buf = malloc(1000);
    while (fgets(buf, sizeof(buf), stdin)) {
       fputs(buf, stdout);
       free(buf);
       buf = malloc(1000);
    }
    free(buf);
    return 0;
}
---

---
// Andrei's
#include <string>
#include <iostream>

int main() {
    std::string s;
    while (getline(std::cin, s)) {
      std::cout << s << '\n';
    }

    return 0;
}
---

---
// James'
#include <string>
#include <iostream>

int main() {
     std::ios_base::sync_with_stdio(false);
     std::cin.tie(NULL);
     std::string s;
     while (std::getline(std::cin, s)) {
         std::cout << s << '\n';
     }
}
---
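
Both C listings above pass sizeof(buf) to fgets, once with an array and once with a pointer returned by malloc. As a side note, the small C++ sketch below shows that sizeof means different things in the two cases; array_capacity and pointer_capacity are hypothetical names for this illustration.

```cpp
#include <cstddef>

// sizeof applied to an array yields the size of the whole array in bytes.
std::size_t array_capacity() {
    char buf[1000];
    return sizeof(buf);
}

// sizeof applied to a pointer yields the size of the pointer itself,
// regardless of how much memory it points to.
std::size_t pointer_capacity() {
    char* buf = nullptr;
    return sizeof(buf);
}
```

On common platforms the second value is 4 or 8, so a call such as fgets(buf, sizeof(buf), stdin) with a pointer reads only a few characters per call.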

Mar 21 2007

torhu <fake address.dude> writes:

torhu wrote:
<snip>
 Fastest first:
 
 tango.io.Console, no flushing (Andrei's): ca 1.5s
 
 C, reusing buffer, gcc & msvc71: ca 3s
 
 James' C++, gcc: 3.5s
 
 Phobos std.cstream, reused buffer: 11s
 
 C w/malloc and free each line, msvc71: 23s
 
 Andrei's C++, gcc: 27s
 
 C w/malloc and free each line, gcc: 37s
 
 Andrei's C++, msvc71: 50s
 
 James' C++,  msvc: 51s

I've run some of the tests with more accurate timing. Andrei's Tango 
code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. 
  I also tried cat itself, from the gnuwin32 project.  cat clocks in at 
1.3 seconds.
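
For comparison, the same flush-per-line versus no-flush distinction exists in C++ as std::endl versus '\n'. The sketch below is a C++ analogue, not Tango code; CountingBuf and count_flushes are hypothetical names. It counts how often the underlying buffer is asked to flush.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical buffer: a string buffer that counts flush requests.
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Hypothetical helper: writes `lines` lines and reports how many flushes
// the stream requested along the way.
int count_flushes(bool flush_each_line, int lines) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        if (flush_each_line) {
            out << "line" << std::endl;  // newline plus flush
        } else {
            out << "line" << '\n';       // newline only
        }
    }
    return buf.flushes;
}
```

On a real file or pipe each flush typically costs a system call, which is one plausible source of the gap between the two timings.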

Mar 22 2007

kris <foo bar.com> writes:

torhu wrote:
 torhu wrote:
 <snip>
 
 Fastest first:

 tango.io.Console, no flushing (Andrei's): ca 1.5s

 C, reusing buffer, gcc & msvc71: ca 3s

 James' C++, gcc: 3.5s

 Phobos std.cstream, reused buffer: 11s

 C w/malloc and free each line, msvc71: 23s

 Andrei's C++, gcc: 27s

 C w/malloc and free each line, gcc: 37s

 Andrei's C++, msvc71: 50s

 James' C++,  msvc: 51s

 
 
 I've run some of the tests with more accurate timing. Andrei's Tango 
 code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. 
  I also tried cat itself, from the gnuwin32 project.  cat clocks in at 
 1.3 seconds.


Just for jollies, a briefly optimized tango.io was tried also: it came 
in at around 0.7 seconds. On a tripled file-size (3 million lines), that 
version is around 23% faster than bog-standard tango.io

Thanks for giving it a whirl, torhu :)


p.s. perhaps Andrei should be using tango for processing those vast 
files he has?

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 torhu wrote:
 torhu wrote:
 <snip>

 Fastest first:

 tango.io.Console, no flushing (Andrei's): ca 1.5s

 C, reusing buffer, gcc & msvc71: ca 3s

 James' C++, gcc: 3.5s

 Phobos std.cstream, reused buffer: 11s

 C w/malloc and free each line, msvc71: 23s

 Andrei's C++, gcc: 27s

 C w/malloc and free each line, gcc: 37s

 Andrei's C++, msvc71: 50s

 James' C++,  msvc: 51s


 I've run some of the tests with more accurate timing. Andrei's Tango 
 code uses 0.9 seconds, with no flushing, and 1.6 seconds with 
 flushing.  I also tried cat itself, from the gnuwin32 project.  cat 
 clocks in at 1.3 seconds.

 
 
 Just for jollies, a briefly optimized tango.io was tried also: it came 
 in at around 0.7 seconds. On a tripled file-size (3 million lines), that 
 version is around 23% faster than bog-standard tango.io

That's great news!

 Thanks for giving it a whirl, torhu :)
 
 
 p.s. perhaps Andrei should be using tango for processing those vast 
 files he has?

Is it compatible with C's stdio? IOW, would this sequence work?

readln(line);
int c = getchar();

Is 'c' the first character on the next line?
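
For reference, this is the behavior the question probes, sketched with plain C stdio called from C++ (char_after_first_line is a hypothetical name). When both reads go through the same FILE*, they share one buffer, so the character read after the line is the first character of the next line.

```cpp
#include <cstdio>

// Hypothetical helper: reads one line with fgets, then one character with
// getc, both from the same FILE*. Returns that character, or EOF if either
// read fails.
int char_after_first_line(std::FILE* f) {
    char line[256];
    if (!std::fgets(line, sizeof line, f)) return EOF;
    return std::getc(f);
}
```

For a stream containing "one\ntwo\n" the function returns 't'.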


Andrei

Mar 22 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 torhu wrote:

 torhu wrote:
 <snip>

 Fastest first:

 tango.io.Console, no flushing (Andrei's): ca 1.5s

 C, reusing buffer, gcc & msvc71: ca 3s

 James' C++, gcc: 3.5s

 Phobos std.cstream, reused buffer: 11s

 C w/malloc and free each line, msvc71: 23s

 Andrei's C++, gcc: 27s

 C w/malloc and free each line, gcc: 37s

 Andrei's C++, msvc71: 50s

 James' C++,  msvc: 51s



 I've run some of the tests with more accurate timing. Andrei's Tango 
 code uses 0.9 seconds, with no flushing, and 1.6 seconds with 
 flushing.  I also tried cat itself, from the gnuwin32 project.  cat 
 clocks in at 1.3 seconds.



 Just for jollies, a briefly optimized tango.io was tried also: it came 
 in at around 0.7 seconds. On a tripled file-size (3 million lines), 
 that version is around 23% faster than bog-standard tango.io

 
 
 That's great news!
 
 Thanks for giving it a whirl, torhu :)


 p.s. perhaps Andrei should be using tango for processing those vast 
 files he has?

 
 
 Is it compatible with C's stdio? IOW, would this sequence work?
 
 readln(line);
 int c = getchar();
 
 Is 'c' the first character on the next line?


Nope. Tango is for D, not C. In order to make an arguably better library, 
one often has to step away from the norm. Both yourself and Walter have 
been saying "it needs to be fast and simple", and that's exactly what 
Tango is showing: for those who care deeply about such things, tango.io 
is shown to be around four times faster than the fastest C 
implementation tried (for Andrei's test under Win32), and a notable 
fourteen or fifteen times faster than the shipping phobos equivalent.

If "interaction" between D & C on a shared, global file-handle becomes 
some kind of issue due to buffering (and only if) we'll cross that 
bridge at that point in time. I'm sure there's a number of solutions 
that don't involve restricting D to using a lowest common denominator 
approach. There's lots of smart people here who would be willing to help 
resolve that if necessary.

The tango.io package is intended to be clean, extensible, simple, and a 
whole lot more coherent than certain others. We feel it meets those 
goals, and it happens to be quite efficient at the same time. Seems a 
bit like sour-grapes to start looking for "issues" with that intent, 
particularly when compared to an implementation that proclaims "It peeks 
under the hood of C's stdio implementation, meaning it's customized for 
Digital Mars' stdio, and gcc's stdio" ?

Tango is not meant to be a phobos clone; it doesn't make the same claims 
as phobos and it doesn't follow the same rules as phobos. If you need 
phobos rules, then use phobos. If you don't like tango.io speed, 
extensibility and simplicity, without all the special cases of C IO, 
then use phobos. If you want both then, at some point, we'll consider 
figuring out how to make your C-oriented corner-cases work with tango.io

Walter wrote: "One of my goals with D is to fix that - the 
straightforward, untuned code should get you most of the possible speed."

Andrei wrote: "Just make the clear and simple code fastest. One thing I 
like about D is that it clearly strives to achieve best performance for 
simply-written code."

That sentiment is very much what Tango itself is about.

You began this thread by titling it "stdio and Tango IO performance" and 
noting the following: "has anyone verified that Tango's I/O performance 
is up to snuff? I see it imposes the dynamic-polymorphic approach, and 
unless there was some serious performance work going on, it's possible 
it's even slower than stdio. "

Given the results shown above, I hope we can put that to rest at this time.

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 torhu wrote:

 torhu wrote:
 <snip>

 Fastest first:

 tango.io.Console, no flushing (Andrei's): ca 1.5s

 C, reusing buffer, gcc & msvc71: ca 3s

 James' C++, gcc: 3.5s

 Phobos std.cstream, reused buffer: 11s

 C w/malloc and free each line, msvc71: 23s

 Andrei's C++, gcc: 27s

 C w/malloc and free each line, gcc: 37s

 Andrei's C++, msvc71: 50s

 James' C++,  msvc: 51s



 I've run some of the tests with more accurate timing. Andrei's Tango 
 code uses 0.9 seconds, with no flushing, and 1.6 seconds with 
 flushing.  I also tried cat itself, from the gnuwin32 project.  cat 
 clocks in at 1.3 seconds.



 Just for jollies, a briefly optimized tango.io was tried also: it 
 came in at around 0.7 seconds. On a tripled file-size (3 million 
 lines), that version is around 23% faster than bog-standard tango.io


 That's great news!

 Thanks for giving it a whirl, torhu :)


 p.s. perhaps Andrei should be using tango for processing those vast 
 files he has?


 Is it compatible with C's stdio? IOW, would this sequence work?

 readln(line);
 int c = getchar();

 Is 'c' the first character on the next line?

 
 
 Nope. Tango is for D, not C. In order to make an arguably better library, 
 one often has to step away from the norm. Both yourself and Walter have 
 been saying "it needs to be fast and simple", and that's exactly what 
 Tango is showing: for those who care deeply about such things, tango.io 
 is shown to be around four times faster than the fastest C 
 implementation tried (for Andrei's test under Win32), and a notable 
 fourteen or fifteen times faster than the shipping phobos equivalent.

That's not what my tests show on Linux, where Perl and readln beat Tango 
by a large margin.

 If "interaction" between D & C on a shared, global file-handle becomes 
 some kind of issue due to buffering (and only if) we'll cross that 
 bridge at that point in time. I'm sure there's a number of solutions 
 that don't involve restricting D to using a lowest common denominator 
 approach. There's lots of smart people here who would be willing to help 
 resolve that if necessary.

Exactly. What I argue for is not adding _gratuitous_ incompatibility. 
I'm seeing that using read instead of getline on Linux does not add any 
speed. Then why not use getline and be done with it? Everybody would be 
happy.
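
To make the compatibility question concrete, here is a minimal C++ sketch of the readln-then-getchar sequence, written against C's stdio through <cstdio>. It uses fgets as a stand-in for getline/readln, and the helper name first_char_after_line is made up for illustration. Because both calls go through the same FILE buffer, the character returned is the first one on the next line; a library that reads ahead into its own private buffer on the same file descriptor would typically not be able to promise that.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: read one line with fgets, then one more character
// with fgetc, both on the same FILE*. Returns that character, or EOF.
// Assumes the first line fits in the 256-byte buffer.
int first_char_after_line(std::FILE* in) {
    assert(in != nullptr);
    char line[256];
    if (std::fgets(line, sizeof line, in) == nullptr) {
        return EOF;
    }
    // fgets and fgetc share the stream's buffer, so this is the first
    // character of the next line (or EOF if there is none).
    return std::fgetc(in);
}
```

Called as first_char_after_line(stdin), this corresponds to the two-line sequence asked about earlier in the thread.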

 The tango.io package is intended to be clean, extensible, simple, and a 
 whole lot more coherent than certain others. We feel it meets those 
 goals, and it happens to be quite efficient at the same time. Seems a 
 bit like sour-grapes to start looking for "issues" with that intent, 
 particularly when compared to an implementation that proclaims "It peeks 
 under the hood of C's stdio implementation, meaning it's customized for 
 Digital Mars' stdio, and gcc's stdio" ?

I'm not sure I understand this. For what it's worth, there are no sour 
grapes in the mix. I *wanted* to switch to Tango to save me future aggravation.

 Tango is not meant to be a phobos clone; it doesn't make the same claims 
 as phobos and it doesn't follow the same rules as phobos. If you need 
 phobos rules, then use phobos. If you don't like tango.io speed, 
 extensibility and simplicity, without all the special cases of C IO, 
 then use phobos. If you want both then, at some point, we'll consider 
 figuring out how to make your C-oriented corner-cases work with tango.io

They aren't C-oriented. They are stream-oriented. It just so happens 
that the OS opens some streams and serves them to you in FILE* format. I 
have programs that read standard input and write to standard output. 
They are extremely easy to combine, parallelize, and run on a cluster. 
After switching from Perl to D for performance considerations, I was in 
a position of a net loss. Then I went to hell and back figuring out what 
the problem was and fixing it. Then I thought, hmmm, maybe I could have 
avoided all that by switching to Tango. So I tried Tango and it was 
again a net loss. Perl's I/O beats Tango's Cin.

 Walter wrote: "One of my goals with D is to fix that - the 
 straightforward, untuned code should get you most of the possible speed."
 
 Andrei wrote: "Just make the clear and simple code fastest. One thing I 
 like about D is that it clearly strives to achieve best performance for 
 simply-written code."
 
 That sentiment is very much what Tango itself is about.
 
 You began this thread by titling it "stdio and Tango IO performance" and 
 noting the following: "has anyone verified that Tango's I/O performance 
 is up to snuff? I see it imposes the dynamic-polymorphic approach, and 
 unless there was some serious performance work going on, it's possible 
 it's even slower than stdio. "
 
 Given the results shown above, I hope we can put that to rest at this time.

Of course you can, it's your library. You look at the results that 
please you most, I look at the results of my concrete application. I 
simply can't afford a 50%+ loss in I/O throughput, so I need to stay 
with Phobos. Why, I don't understand.


Andrei

Mar 22 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

 torhu wrote:

 torhu wrote:
 <snip>

 Fastest first:

 tango.io.Console, no flushing (Andrei's): ca 1.5s

 C, reusing buffer, gcc & msvc71: ca 3s

 James' C++, gcc: 3.5s

 Phobos std.cstream, reused buffer: 11s

 C w/malloc and free each line, msvc71: 23s

 Andrei's C++, gcc: 27s

 C w/malloc and free each line, gcc: 37s

 Andrei's C++, msvc71: 50s

 James' C++,  msvc: 51s




 I've run some of the tests with more accurate timing. Andrei's 
 Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with 
 flushing.  I also tried cat itself, from the gnuwin32 project.  cat 
 clocks in at 1.3 seconds.




 Just for jollies, a briefly optimized tango.io was tried also: it 
 came in at around 0.7 seconds. On a tripled file-size (3 million 
 lines), that version is around 23% faster than bog-standard tango.io



 That's great news!

 Thanks for giving it a whirl, torhu :)


 p.s. perhaps Andrei should be using tango for processing those vast 
 files he has?



 Is it compatible with C's stdio? IOW, would this sequence work?

 readln(line);
 int c = getchar();

 Is 'c' the first character on the next line?



 Nope. Tango is for D, not C. In order to make an arguably better 
 library, one often has to step away from the norm. Both yourself and 
 Walter have been saying "it needs to be fast and simple", and that's 
 exactly what Tango is showing: for those who care deeply about such 
 things, tango.io is shown to be around four times faster than the 
 fastest C implementation tried (for Andrei's test under Win32), and a 
 notable fourteen or fifteen times faster than the shipping phobos 
 equivalent.

 
 
 That's not what my tests show on Linux, where Perl and readln beat Tango 
 by a large margin.
 
 If "interaction" between D & C on a shared, global file-handle becomes 
 some kind of issue due to buffering (and only if) we'll cross that 
 bridge at that point in time. I'm sure there's a number of solutions 
 that don't involve restricting D to using a lowest common denominator 
 approach. There's lots of smart people here who would be willing to 
 help resolve that if necessary.

 
 
 Exactly. What I argue for is not adding _gratuitous_ incompatibility. 
 I'm seeing that using read instead of getline on Linux does not add any 
 speed. Then why not use getline and be done with it? Everybody would be 
 happy.
 
 The tango.io package is intended to be clean, extensible, simple, and 
 a whole lot more coherent than certain others. We feel it meets those 
 goals, and it happens to be quite efficient at the same time. Seems a 
 bit like sour-grapes to start looking for "issues" with that intent, 
 particularly when compared to an implementation that proclaims "It 
 peeks under the hood of C's stdio implementation, meaning it's 
 customized for Digital Mars' stdio, and gcc's stdio" ?

 
 
 I'm not sure I understand this. For what it's worth, there are no sour 
 grapes in the mix. I *wanted* to switch to Tango to save me future aggravation.
 
 Tango is not meant to be a phobos clone; it doesn't make the same 
 claims as phobos and it doesn't follow the same rules as phobos. If 
 you need phobos rules, then use phobos. If you don't like tango.io 
 speed, extensibility and simplicity, without all the special cases of 
 C IO, then use phobos. If you want both then, at some point, we'll 
 consider figuring out how to make your C-oriented corner-cases work 
 with tango.io

 
 
 They aren't C-oriented. They are stream-oriented. It just so happens 
 that the OS opens some streams and serves them to you in FILE* format. I 
 have programs that read standard input and write to standard output. 
 They are extremely easy to combine, parallelize, and run on a cluster. 
 After switching from Perl to D for performance considerations, I was in 
 a position of a net loss. Then I went to hell and back figuring out what 
 the problem was and fixing it. Then I thought, hmmm, maybe I could have 
 avoided all that by switching to Tango. So I tried Tango and it was 
 again a net loss. Perl's I/O beats Tango's Cin.
 
 Walter wrote: "One of my goals with D is to fix that - the 
 straightforward, untuned code should get you most of the possible speed."

 Andrei wrote: "Just make the clear and simple code fastest. One thing 
 I like about D is that it clearly strives to achieve best performance 
 for simply-written code."

 That sentiment is very much what Tango itself is about.

 You began this thread by titling it "stdio and Tango IO performance" 
 and noting the following: "has anyone verified that Tango's I/O 
 performance is up to snuff? I see it imposes the dynamic-polymorphic 
 approach, and unless there was some serious performance work going on, 
 it's possible it's even slower than stdio. "

 Given the results shown above, I hope we can put that to rest at this 
 time.

 
 
 Of course you can, it's your library. You look at the results that 
 please you most, I look at the results of my concrete application. I 
 simply can't afford a 50%+ loss in I/O throughput, so I need to stay 
 with Phobos. Why, I don't understand.

Oh, come now. Yesterday Tango was the "fastest" on your machine, and 
today it is not. And you're now claiming a 50% loss in throughput?

I put it to you that you're not being very forthcoming in allowing for 
changes in tango.io to address this anomaly in your timings? Yesterday I 
pointed out where to make the change so that you could try tango without 
the automatic chomp; you didn't bother to do that. There is a change in 
SVN implementing your request, but you're not bothering to try that either.

Instead, you appear to be using empty rhetoric and exaggeration to pit 
one library against another. That's hardly being helpful, Andrei.

Tango has been shown to be very efficient on Win32, and there's no 
reason to assert that it can't be so on linux. We've seen that flush() 
is a no-no for linux, and that it has some impact on Win32 also. That 
can be rectified, as Walter kindly pointed out. If you're serious about 
giving Tango a shot, then give it some time for the different platform 
specifics to be addressed. Is that really too much to ask? Of a beta 
release?

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Oh, come now. Yesterday Tango was the "fastest" on your machine, and 
 today it is not. And you're now claiming a 50% loss in throughput?

Probably it's a misunderstanding. Yesterday the Tango that did not 
output the newlines was fastest. I don't have Tango code to test a 
version that reads lines including the newline, so I tried the 
Cout(line)("\n") thing, which was slow.

I'd be of course happy to use something that is faster, no matter where 
it comes from.

 I put it to you that you're not being very forthcoming in allowing for 
 changes in tango.io to address this anomaly in your timings? Yesterday I 
 pointed out where to make the change so that you could try tango without 
 the automatic chomp; you didn't bother to do that. There is a change in 
 SVN implementing your request, but you're not bothering to try that either.

It's not that I didn't bother; just getting my app to link with Tango 
was hard for me, so recompiling and rebuilding libtango.a was likely to 
take me a long time. Furthermore, I don't have svn installed nor admin 
access on the cluster I work on.

If you put a libtango.a somewhere to be found with http or ftp, I'd be 
glad to download it.

 Instead, you appear to be using empty rhetoric and exaggeration to pit 
 one library against another. That's hardly being helpful, Andrei.
 
 Tango has been shown to be very efficient on Win32, and there's no 
 reason to assert that it can't be so on linux. We've seen that flush() 
 is a no-no for linux, and that it has some impact on Win32 also. That 
 can be rectified, as Walter kindly pointed out. If you're serious about 
 giving Tango a shot, then give it some time for the different platform 
 specifics to be addressed. Is that really too much to ask? Of a beta 
 release?

Of course this is great news. There's only one guy using rhetoric in 
this thread, and that's not me :o).


Andrei

Mar 22 2007

Sean Kelly <sean f4.ca> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 I put it to you that you're not being very forthcoming in allowing for 
 changes in tango.io to address this anomaly in your timings? Yesterday 
 I pointed out where to make the change so that you could try tango 
 without the automatic chomp; you didn't bother to do that. There is a 
 change in SVN implementing your request, but you're not bothering to 
 try that either.

 
 It's not that I didn't bother; just getting my app to link with Tango 
 was hard for me, so recompiling and rebuilding libtango.a was likely to 
 take me a long time. Furthermore, I don't have svn installed nor admin 
 access on the cluster I work on.

We're in the process of getting an automated nightly snapshot process 
set up.  The scripts are actually written, and we're sorting out hosting 
and such.  I'm sure someone would be willing to put one online somewhere 
in the interim.  I'll do it myself if I can track down the Linux build 
scripts.


Sean

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Tango is not meant to be a phobos clone; it doesn't make the same claims 
 as phobos and it doesn't follow the same rules as phobos. If you need 
 phobos rules, then use phobos. If you don't like tango.io speed, 
 extensibility and simplicity, without all the special cases of C IO, 
 then use phobos. If you want both then, at some point, we'll consider 
 figuring out how to make your C-oriented corner-cases work with tango.io

I think you'd make a lot of people happy. Several attempts at installing 
Tango by following the documentation failed for me, so in the end I figured 
out a way to get programs to compile with a special command line and a 
modification of dmd.conf. I need to modify dmd.conf whenever I switch 
between Phobos programs and Tango programs.


Andrei

Mar 22 2007

Sean Kelly <sean f4.ca> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Several documented attempts of 
 installing Tango failed for me, so in the end I figured some way to get 
 programs to compile with a special command line and a modification of 
 dmd.conf. I need to modify dmd.conf whenever I switch between Phobos 
 programs and Tango programs.

This page describes one way to use Tango and Phobos together: 
http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation  It's 
Win32-oriented, but the approach should be essentially the same for 
Linux.  One issue with install instructions is that the installation 
procedure is in flux as we try to simplify/automate it, and some of the 
documentation is lagging behind.


Sean

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Several documented attempts of installing Tango failed for me, so in 
 the end I figured some way to get programs to compile with a special 
 command line and a modification of dmd.conf. I need to modify dmd.conf 
 whenever I switch between Phobos programs and Tango programs.

 
 This page describes one way to use Tango and Phobos together: 
 http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation  It's 
 Win32-oriented, but the approach should be essentially the same for 
 Linux.  One issue with install instructions is that the installation 
 procedure is in flux as we try to simplify/automate it, and some of the 
 documentation is lagging behind.

Here's what worked for me. The script also allows compiling dmd programs 
on the fly. For some reason I needed to include libtango.a in the DFLAGS 
variable.

-----------------------------------------


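# Usage: pass "phobos" or "tango" as the first argument; any remaining
# arguments are forwarded to dmd. With no extra arguments the script only
# exports DFLAGS, so source it if the setting should persist in your shell.
D_BIN=$(dirname "$(which dmd)")
WHICH=$1

if [ "$WHICH" = "phobos" ]; then
     DFLAGS="-I$D_BIN/../src/phobos -L-L$D_BIN/../lib -L-L$D_BIN/../../dm/lib"
elif [ "$WHICH" = "tango" ]; then
     DFLAGS="-I$D_BIN/../../tango-0.96-bin -version=Tango -version=Posix"
     DFLAGS="$DFLAGS -L-L$D_BIN/../../tango-0.96-bin/lib libtango.a"
else
     echo "Please pass either phobos or tango as the first argument"
     WHICH=""
fi

if [ -n "$WHICH" ]; then
     shift
     if [ "$#" -gt 0 ]; then
          # Quote the arguments so file names with spaces survive.
          dmd "$@"
     else
          export DFLAGS
          echo "dmd configured for $WHICH"
     fi
fi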
-----------------------------------------


Andrei

Mar 22 2007

Sean Kelly <sean f4.ca> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Several documented attempts of installing Tango failed for me, so in 
 the end I figured some way to get programs to compile with a special 
 command line and a modification of dmd.conf. I need to modify 
 dmd.conf whenever I switch between Phobos programs and Tango programs.

 This page describes one way to use Tango and Phobos together: 
 http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation  
 It's Win32-oriented, but the approach should be essentially the same 
 for Linux.  One issue with install instructions is that the 
 installation procedure is in flux as we try to simplify/automate it, 
 and some of the documentation is lagging behind.

 
 Here's what worked for me. The script also allows compiling dmd programs 
 on the fly. For some reason I needed to include libtango.a in the DFLAGS 
 variable.

This is intentional, though it may change later based on user feedback. 
  That said, my personal belief is that only the compiler runtime code 
should be implicitly linked, and the rest should be linked via DFLAGS or 
by some other means.  In Tango parlance, this would mean implicitly 
linking the compiler runtime (libdmd.a), but not the GC code, the Tango 
runtime, or Tango user code.  This is currently quite possible--it just 
isn't the default configuration because it's unnecessarily complex for 
most users.  For those who are interested however, the process is 
outlined here:

http://www.dsource.org/projects/tango/wiki/TopicAdvancedConfiguration

 -----------------------------------------

 
 D_BIN=$(dirname $(which dmd))
 WHICH=$1
 
 if [ "$WHICH" = "phobos" ]; then
     DFLAGS="-I$D_BIN/../src/phobos -L-L$D_BIN/../lib -L-L$D_BIN/../../dm/lib"
 elif [ "$WHICH" = "tango" ]; then
     DFLAGS="-I$D_BIN/../../tango-0.96-bin -version=Tango -version=Posix"
     DFLAGS="$DFLAGS -L-L$D_BIN/../../tango-0.96-bin/lib libtango.a"
 else
     echo "Please pass either phobos or tango as the first argument"
     WHICH=""
 fi
 
 if [ ! -z "$WHICH" ]; then
     shift
     if [ "$*" != "" ]; then
     dmd $*
     else
     export DFLAGS
     echo "dmd configured for $WHICH"
     fi
 fi
 -----------------------------------------

Thanks.  I'll look this over and see about adding it to the wiki.


Sean

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

torhu wrote:
 torhu wrote:
 <snip>
 Fastest first:

 tango.io.Console, no flushing (Andrei's): ca 1.5s

 C, reusing buffer, gcc & msvc71: ca 3s

 James' C++, gcc: 3.5s

 Phobos std.cstream, reused buffer: 11s

 C w/malloc and free each line, msvc71: 23s

 Andrei's C++, gcc: 27s

 C w/malloc and free each line, gcc: 37s

 Andrei's C++, msvc71: 50s

 James' C++,  msvc: 51s

 
 I've run some of the tests with more accurate timing. Andrei's Tango 
 code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. 
  I also tried cat itself, from the gnuwin32 project.  cat clocks in at 
 1.3 seconds.

cat is not comparable. Besides, there must be some overhead associated 
with that cat, because Linux' cat consistently clocks way faster than 
all line-oriented tests.

Andrei

Mar 22 2007

torhu <fake address.dude> writes:

torhu wrote:
 torhu wrote:
 <snip>
 Fastest first:
 
 tango.io.Console, no flushing (Andrei's): ca 1.5s
 
 C, reusing buffer, gcc & msvc71: ca 3s
 
 James' C++, gcc: 3.5s
 
 Phobos std.cstream, reused buffer: 11s
 
 C w/malloc and free each line, msvc71: 23s
 
 Andrei's C++, gcc: 27s
 
 C w/malloc and free each line, gcc: 37s
 
 Andrei's C++, msvc71: 50s
 
 James' C++,  msvc: 51s

 
 I've run some of the tests with more accurate timing. Andrei's Tango 
 code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. 
   I also tried cat itself, from the gnuwin32 project.  cat clocks in at 
 1.3 seconds.

A couple more results:

ActiveState Perl 5.8.8: 3.8s.
Python 2.5: 3.6s.


cat.py:
---

import sys
sys.stdout.writelines(sys.stdin.xreadlines())


#sys.stdout.writelines(do_stuff_with_each_line(sys.stdin.xreadlines()))

#sys.stdout.writelines(do_stuff_with_each_line(s) for s in sys.stdin)
---

cat.pl:
---

while (<>) {
    print;
}
---

I guess that's enough benchmarking for now.

Mar 22 2007

Sean Kelly <sean f4.ca> writes:

torhu wrote:
 James Dennett wrote:
 <snip>
 Try the way IOStreams would be used if you didn't want
 it to go slowly:

 #include <string>
 #include <iostream>

 int main() {
     std::ios_base::sync_with_stdio(false);
     std::cin.tie(NULL);
     std::string s;
     while (std::getline(std::cin, s)) {
         std::cout << s << '\n';
     }
 }

 <snip>
 
 
 I did some tests with a 58 MB file, containing one million lines.  I'm 
 on winxp.  I ran each test a few times, timing them with a stopwatch.  I 
 threw in a naive C version, and a std.cstream version, just out of 
 curiosity.
 
 It seems that using cin.tie(NULL) doesn't matter with msvc 7.1, but with 
 mingw it does.  Basically, Tango wins hands down on my system.  Whether 
 the Tango version flushes after each line or not, doesn't seem to matter 
 much on Windows.

...
 ---
 // Tango
 import tango.io.Console;
 
 void main() {
   char[] line;
 
   while (Cin.nextLine(line)) {
     //Cout(line).newline;
     Cout(line)("\n");
   }
 }
 ---

Oh good.  I was hoping someone would test Tango without flushing every 
line :-)  Basically, Tango's 'newline' method is equivalent to C++'s 
'endl' mutator function.  It should not be used for every carriage 
return in normal output for performance-critical applications.  Rather, 
it should be used as the trailing newline after writing a block of data 
that should be displayed immediately ('flush' is another option if no 
newline is desired).
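
A C++ analogue of the same point, as a rough sketch: std::endl writes '\n' and then flushes, so using it on every line forces one flush per line. The SyncCounter buffer and the copy_lines function below are invented for illustration and are not Tango or Phobos code; they only count how often the output stream is asked to flush.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical string buffer that counts flush requests (calls to sync()).
class SyncCounter : public std::stringbuf {
public:
    int syncs = 0;

protected:
    int sync() override {
        ++syncs;
        return 0;
    }
};

// Copy lines from `in` to `out`, ending each with std::endl (which flushes)
// or with a plain '\n' (which does not). Returns the number of lines copied.
int copy_lines(std::istream& in, std::ostream& out, bool flush_each_line) {
    std::string line;
    int lines = 0;
    while (std::getline(in, line)) {
        if (flush_each_line) {
            out << line << std::endl;
        } else {
            out << line << '\n';
        }
        ++lines;
    }
    return lines;
}
```

With flush_each_line set, the counter ends up equal to the number of lines; with plain '\n' it stays at zero, and the data is only pushed out when the stream is flushed explicitly or its buffer fills.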


Sean

Mar 22 2007

torhu <fake address.dude> writes:

torhu wrote:
 ---
 /* C test w/malloc and free */
 #include <stdio.h>
 #include <stdlib.h>
 
 int main() {
 
     char *buf = malloc(1000);
     while (fgets(buf, sizeof(buf), stdin)) {
        fputs(buf, stdout);
        free(buf);
        buf = malloc(1000);
     }
     free(buf);
     return 0;
 }
 ---

Whoops, can anyone spot the bug?  When I fixed it, the time it took to 
run my test went down from about 23 to about 3 seconds.

Mar 24 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

torhu wrote:
 torhu wrote:
 ---
 /* C test w/malloc and free */
 #include <stdio.h>
 #include <stdlib.h>

 int main() {

     char *buf = malloc(1000);
     while (fgets(buf, sizeof(buf), stdin)) {
        fputs(buf, stdout);
        free(buf);
        buf = malloc(1000);
     }
     free(buf);
     return 0;
 ---

 
 Whoops, can anyone spot the bug?  When I fixed it, the time it took to 
 run my test went down from about 23 to about 3 seconds.

I'm guessing the fact that sizeof(buf) != 1000 ?

Mar 24 2007

torhu <fake address.dude> writes:

Frits van Bommel wrote:
 torhu wrote:
 
 Whoops, can anyone spot the bug?  When I fixed it, the time it took to 
 run my test went down from about 23 to about 3 seconds.

 
 I'm guessing the fact that sizeof(buf) != 1000 ?

I think you were the first to post.  Go buy yourself a lollipop, you've 
earned it.
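
For anyone skimming: buf is a char*, so sizeof(buf) is the size of a pointer (typically 4 or 8 bytes), not the 1000 bytes that were allocated. fgets therefore returns at most 3 or 7 characters per call, and the loop pays for a free/malloc pair every few bytes. A small C++ sketch of the effect follows; count_fgets_calls is a made-up helper, not part of the benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: count how many successful fgets calls are needed to
// consume `in` when fgets is told the buffer holds `size` bytes.
std::size_t count_fgets_calls(std::FILE* in, std::size_t size) {
    assert(size >= 2);  // fgets needs room for one character plus '\0'
    char* buf = static_cast<char*>(std::malloc(size));
    if (buf == nullptr) {
        return 0;
    }
    std::size_t calls = 0;
    while (std::fgets(buf, static_cast<int>(size), in)) {
        ++calls;
    }
    std::free(buf);
    return calls;
}
```

Passing sizeof(char*) as size reproduces the slow case, where one line takes several calls; passing the real allocation size (1000) reads a typical line in a single call.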

Mar 25 2007

Sean Kelly <sean f4.ca> writes:

torhu wrote:
 torhu wrote:
 ---
 /* C test w/malloc and free */
 #include <stdio.h>
 #include <stdlib.h>

 int main() {

     char *buf = malloc(1000);
     while (fgets(buf, sizeof(buf), stdin)) {
        fputs(buf, stdout);
        free(buf);
        buf = malloc(1000);
     }
     free(buf);
     return 0;
 ---

 
 Whoops, can anyone spot the bug?  When I fixed it, the time it took to 
 run my test went down from about 23 to about 3 seconds.

The fgets(sizeof(buf)) looks like it could affect read performance a tad :-)


Sean

Mar 24 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I've run a couple of simple tests comparing Perl, D's stdlib (the
 coming release), and Tango.

 Can you add a C++ <iostream> to the mix? I think that would be a very
 useful additional data point.

 Obliged. Darn, I had to wait a *lot* longer.

 #include <string>
 #include <iostream>

 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }

 (C++ makes the same mistake wrt newline.)

 35.7s        cppcat

 I seem to remember a trick that puts some more wind into iostream's
 sails, so I tried that as well:

 #include <string>
 #include <iostream>
 using namespace std;

 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }

 Result:

 13.3s        cppcat

 
 Try the way IOStreams would be used if you didn't want
 it to go slowly:
 
 #include <string>
 #include <iostream>
 
 int main() {
     std::ios_base::sync_with_stdio(false);
     std::cin.tie(NULL);
     std::string s;
     while (std::getline(std::cin, s)) {
         std::cout << s << '\n';
     }
 }
 
 (Excuse the lack of a using directive there; I find the
 code more readable without them.  YMMV.)

With your code pasted and wind from behind:

13.5s		cppcat

 I don't have your sample file or your machine, but for
 the quick tests I just ran on this one machine, the code
 above runs more than 60% faster.  Without using tie(),
 each read from standard input causes a flush of standard
 output (so that, by default, they work appropriately for
 console I/O).
 
 It's certainly true that making efficient use of IOStreams
 needs some specific knowledge, and that writing an
 efficient implementation of IOStreams is far from trivial.
 But if we're comparing to C++, we should probably compare
 to some reasonably efficient idiomatic C++.

The sync_with_stdio and tie tricks are already unknown to most 
programmers, so it would be an uphill battle to characterize them as 
idiomatic. They are idiomatic for a small group at best.
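
For those who have not met the tie trick: std::cin is tied to std::cout by default, so the library flushes std::cout before each read from std::cin, and cin.tie(NULL) removes that link. The sketch below imitates this with string streams so the flushes can be counted. FlushCounter and copy_counting_flushes are hypothetical names, and the exact number of flushes with a tie in place is implementation-dependent (an implementation may skip the flush when nothing is pending), so treat the counts as indicative only.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical string buffer that counts flush requests (calls to sync()).
class FlushCounter : public std::stringbuf {
public:
    int syncs = 0;

protected:
    int sync() override {
        ++syncs;
        return 0;
    }
};

// Copy the lines of `text` to an output stream using '\n' (never std::endl)
// and report how many flush requests the output received. When `tied` is
// true the input is tied to the output, mimicking the default relationship
// between std::cin and std::cout.
int copy_counting_flushes(const std::string& text, bool tied) {
    FlushCounter buf;
    std::ostream out(&buf);
    std::istringstream in(text);
    in.tie(tied ? &out : nullptr);
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';
    }
    return buf.syncs;
}
```

Without the tie no flush is requested at all; with it, the common implementations request one for each read attempt, which is the per-line cost that cin.tie(NULL) avoids.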

But, obviously not enough. Perl does way better.

(Again: gcc on Linux.)


Andrei

Mar 22 2007

James Dennett <jdennett acm.org> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I've run a couple of simple tests comparing Perl, D's stdlib (the
 coming release), and Tango.

 Can you add a C++ <iostream> to the mix? I think that would be a very
 useful additional data point.

 Obliged. Darn, I had to wait a *lot* longer.

 #include <string>
 #include <iostream>

 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }

 (C++ makes the same mistake wrt newline.)

 35.7s        cppcat

 I seem to remember a trick that puts some more wind into iostream's
 sails, so I tried that as well:

 #include <string>
 #include <iostream>
 using namespace std;

 int main() {
   cin.sync_with_stdio(false);
   cout.sync_with_stdio(false);
   string s;
   while (getline(std::cin, s)) {
     cout << s << '\n';
   }
 }

 Result:

 13.3s        cppcat

 Try the way IOStreams would be used if you didn't want
 it to go slowly:

 #include <string>
 #include <iostream>

 int main() {
     std::ios_base::sync_with_stdio(false);
     std::cin.tie(NULL);
     std::string s;
     while (std::getline(std::cin, s)) {
         std::cout << s << '\n';
     }
 }

 (Excuse the lack of a using directive there; I find the
 code more readable without them.  YMMV.)

 
 With your code pasted and wind from behind:
 
 13.5s        cppcat

Blasted weather.  Never a hurricane when you need one.

 I don't have your sample file or your machine, but for
 the quick tests I just ran on this one machine, the code
 above runs more than 60% faster.  Without using tie(),
 each read from standard input causes a flush of standard
 output (so that, by default, they work appropriately for
 console I/O).

 It's certainly true that making efficient use of IOStreams
 needs some specific knowledge, and that writing an
 efficient implementation of IOStreams is far from trivial.
 But if we're comparing to C++, we should probably compare
 to some reasonably efficient idiomatic C++.

 
 The sync_with_stdio and tie tricks are already unknown to most
 programmers, so it would be an uphill battle to characterize them as
 idiomatic. They are idiomatic for a small group at best.

IOStreams is a terrible chunk of library design, and
its effective use is fiendishly difficult even for
fairly trivial tasks.  I've implemented large chunks
of the C++ standard library, but IOStreams scares me.

 But, obviously not enough. Perl does way better.
 
 (Again: gcc on Linux.)

Most of the time I do large text processing jobs in
Perl or inside a database; once in a while I use C++,
primarily if I need to do trickier calculations.
No good reason D shouldn't be able to handle the
jobs I use C++ for in this area (though I'd have
to get D working on Solaris, and 64-bit support
would probably be necessary).

-- James

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

James Dennett wrote:
[snip]
 Most of the time I do large text processing jobs in
 Perl or inside a database; once in a while I use C++,
 primarily if I need to do trickier calculations.
 No good reason D shouldn't be able to handle the
 jobs I use C++ for in this area (though I'd have
 to get D working on Solaris, and 64-bit support
 would probably be necessary).

Indeed. Then you'll be glad to hear that D will soon accommodate smarter 
string literals and probably here-documents, all with interpolation, 
which should make scripting jobs a snap.


Andrei

Mar 22 2007

Roberto Mariottini <rmariottini mail.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
[...]
 
 #include <string>
 #include <iostream>
 
 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }

The portable way to write a newline in C++ is to use the 'endl'
modifier.
Your program is not portable, on Windows it will generate Unix text files.

Ciao

Mar 22 2007

torhu <fake address.dude> writes:

Roberto Mariottini wrote:
<snip>
 The portable way to write a newline in C++ is to use the 'endl'
 modifier.
 Your program is not portable, on Windows it will generate Unix text files.
 
 Ciao

Unless a file is opened in binary mode, '\n' will be translated into 
'\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii 
(not binary) mode.

Mar 22 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

torhu wrote:
 Unless a file is opened in binary mode, '\n' will be translated into
 '\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii
 (not binary) mode.

But I don't think this is the case in Tango, so Cout(line)("\n") should also be
changed for the benchmarks.

Mar 22 2007

kris <foo bar.com> writes:

Deewiant wrote:
 torhu wrote:
 
Unless a file is opened in binary mode, '\n' will be translated into
'\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii
(not binary) mode.

 
 
 But I don't think this is the case in Tango, so Cout(line)("\n") should also be
 changed for the benchmarks.

At the behest of andrei, Cin line-parsing now has an option to include 
the incoming line-terminator. That makes the "\n" somewhat redundant?

Mar 22 2007

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

kris wrote:
 Deewiant wrote:
 torhu wrote:

 Unless a file is opened in binary mode, '\n' will be translated into
 '\r\n' on Windows.  And stdin, stdout, stderr is by default in ascii
 (not binary) mode.


 But I don't think this is the case in Tango, so Cout(line)("\n")
 should also be
 changed for the benchmarks.

 
 At the behest of andrei, Cin line-parsing now has an option to include
 the incoming line-terminator. That makes the "\n" somewhat redundant?

Only if you've got the latest SVN revision of Tango. If not, use
tango.io.FileConst.NewlineString (side note: for easier access, perhaps
Print.Eol should be public and assigned to this) in place of "\n".

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Roberto Mariottini wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [...]
 #include <string>
 #include <iostream>

 int main() {
   std::string s;
   while (getline(std::cin, s)) {
     std::cout << s << '\n';
   }
 }

 
 The portable way to write a newline in C++ is to use the 'endl'
 modifier.
 Your program is not portable, on Windows it will generate Unix text files.

Wrong. Newline translation will be correct on both systems.

Andrei

Mar 22 2007

Roberto Mariottini <rmariottini mail.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Roberto Mariottini wrote:
 The portable way to write a newline in C++ is to use the 'endl'
 modifier.
 Your program is not portable, on Windows it will generate Unix text 
 files.

 
 Wrong. Newline translation will be correct on both systems.

It depends on how you open the file: 'endl' works even with files open 
in binary mode (the default on most platforms, the default on the 
average programmer).

Or else, say that 'endl' is yet another design error in C++.

Ciao

Mar 23 2007

James Dennett <jdennett acm.org> writes:

Roberto Mariottini wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Roberto Mariottini wrote:
 The portable way to write a newline in C++ is to use the 'endl'
 modifier.
 Your program is not portable, on Windows it will generate Unix text
 files.

 Wrong. Newline translation will be correct on both systems.

 
 It depends on how you open the file: 'endl' works even with files open
 in binary mode (the default on most platforms, the default on the
 average programmer).
 
 Or else, say that 'endl' is yet another design error in C++.
 
 Ciao

The difference between '\n' and std::endl in C++ is only
that std::endl flushes the stream after writing a newline
(well, and uses widen to convert to the character type of
the stream, but binary mode makes no difference to that,
it's a property of the template parameters of the stream
type to which you are writing).

C++ doesn't default to binary mode, though on many
platforms that's of academic concern only as there is
no distinction between text and binary modes.
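
A small sketch that shows the difference, assuming nothing beyond the standard library. CountingBuf, WriteResult and write_lines are names invented for this illustration. The buffer counts how often the stream asks it to synchronize, which is what a flush does.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// A stringbuf that counts how often the stream asks it to flush (sync).
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

struct WriteResult {
    std::string bytes;   // what ended up in the buffer
    int flushes;         // how many flush requests were made
};

// Writes `count` lines, terminated either by '\n' or by std::endl.
WriteResult write_lines(int count, bool use_endl) {
    assert(count >= 0);
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < count; ++i) {
        if (use_endl) {
            out << "line" << std::endl;   // newline plus flush
        } else {
            out << "line" << '\n';        // newline only
        }
    }
    return {buf.str(), buf.flushes};
}
```

Both variants produce the same bytes; only the number of flush requests differs (one per std::endl, none for '\n' on this kind of stream).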

And this is somewhat off-topic for d.D, I think, except
in that we'd like D's IO to be better than C++'s.

-- James

Mar 23 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 I've run a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.
 
 First, I realize I should make an account on dsource.org and post the 
 following there, but I'll mention here that it's quite disappointing 
 that Tango's idiomatic method of reading a line from the console 
 (Cin.nextLine(line) unless I missed something) chose to chop the newline 
 automatically. The Perl book spends half a page or so explaining why 
 it's _good_ that the newline is included in the line, and I've been 
 thankful for that on numerous occasions when writing Perl. Please put 
 the newline back in the line.
 
 Anyhow, here's the code. The D up-and-coming stdio version:
 
 import std.stdio;
 void main() {
   char[] line;
   while (readln(line)) {
     write(line);
   }
 }
 
 The Tango version:
 
 import tango.io.Console;
 void main() {
   char[] line;
   while (Cin.nextLine(line)) {
     Cout(line).newline;
   }
 }
 
 (The .newline adds back the information that nextLine promptly lost, 
 sigh.) I'm not sure whether this is the idiomatic way of reading and 
 writing lines in Tango, but tango.io.Stdout seems to say so: "If you 
 don't need formatted output or unicode translation, consider using the 
 module tango.io.Console directly." - which suggests that Console would 
 be the most primitive stdio library.
 
 The Perl version:

 while (<>) {
   print;
 }
 
 All programs operate in the same exact boring way: read a line from 
 stdin, print it, lather, rinse, repeat.
 
 I passed a 31 MB text file (containing a dictionary that I'm using in my 
 research) through each of the programs above. The output was set to 
 /dev/null. I've run the same program multiple times before the actual
 test, so everything is cached and the process becomes 
 computationally-bound. Here are the results summed for 10 consecutive 
 runs (averaged over 5 epochs):
 
 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio


There's a couple of things to look at here:

1) if there's an idiom in tango.io, it would be rewriting the example 
like this:  Cout.conduit.copy (Cin.conduit)

2) the output.newline on each line will cause a flush ~ this may or may 
not have something to do with it

3) the test would appear to be stressing the parsing of lines just as 
much (if not more) than the io system itself. All part-and-parcel to a 
degree, but it may be worth investigating

In order to track this down, we'd be interested to see the results of:

a) Cout.conduit.copy (Cin.conduit);

b) foregoing the output .newline, purely as an experiment

c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that 
what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)

Just a heads-up: Console is not the lowest IO level. It wraps both a
streaming-buffer and console idioms around the raw IO. Raw IO in tango 
is based around two virtual methods: read(void[]) and write(void[])
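
For readers who want to separate line parsing from raw I/O in the C++ variant of this test, a rough analogue of the bulk copy suggested in (a) might look like the sketch below. This is standard C++ rather than Tango, and bulk_copy, line_copy and run_bulk are names invented for illustration.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Bulk copy: hand the input's stream buffer to the output in one expression.
// No line boundaries are searched for, so line parsing is out of the picture.
void bulk_copy(std::istream& in, std::ostream& out) {
    out << in.rdbuf();
}

// Line copy for comparison: each line is found, stored, and written back.
void line_copy(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';
    }
}

// In-memory wrapper so the bulk path can be tried without real files.
std::string run_bulk(const std::string& text) {
    assert(!text.empty());   // inserting an empty streambuf sets failbit on `out`
    std::istringstream in(text);
    std::ostringstream out;
    bulk_copy(in, out);
    return out.str();
}
```

Timing bulk_copy(std::cin, std::cout) against line_copy(std::cin, std::cout) on the same file would give a rough idea of how much of the cost is line scanning, though that split will vary by library.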

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

 
 
 There's a couple of things to look at here:
 
 1) if there's an idiom in tango.io, it would be rewriting the example 
 like this:  Cout.conduit.copy (Cin.conduit)

The test code assumed taking a look at each line before printing it, so 
speed of line reading and writing was deemed as important, not speed of 
raw I/O, which we all know how to get.

 2) the output.newline on each line will cause a flush ~ this may or may 
 not have something to do with it

Probably.

 3) the test would appear to be stressing the parsing of lines just as 
 much (if not more) than the io system itself. All part-and-parcel to a 
 degree, but it may be worth investigating

I don't understand this.

 In order to track this down, we'd be interested to see the results of:
 
 a) Cout.conduit.copy (Cin.conduit);

The program wouldn't be comparable with the others.

 b) foregoing the output .newline, purely as an experiment

4.7s	tcat

 c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that 
 what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)

Then probably that could be filed as a bug in Tango. The nextLine 
function should lock the file only once, thus giving each thread an 
entire line, not a portion of a line. Also, using block-oriented read 
for reading lines makes Tango incompatible with standard C usage (Tango 
might read more than one line into its buffers; if a C-level function 
tries to read from the file, it will be too late). Unfortunately there's 
no public API for such stuff so system-specific approaches must be 
taken. readln on Linux uses Gnu's getline(), which locks the file only 
once per line. See:

http://www.gnu.org/software/libc/manual/html_node/Line-Input.html
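
As an illustration of the getline() interface documented at that link, here is a sketch of a line reader built on it, written in C++ against the POSIX/GNU C API. It is not the Phobos implementation; read_all_lines is a hypothetical name, and the std::string copy merely stands in for the copy into a garbage-collected array.

```cpp
#include <cassert>
#include <stdio.h>       // FILE, getline (POSIX/GNU)
#include <stdlib.h>      // free
#include <sys/types.h>   // ssize_t
#include <string>
#include <vector>

// Reads every line from `in` with getline(), keeping line terminators.
std::vector<std::string> read_all_lines(FILE* in) {
    assert(in != nullptr);
    std::vector<std::string> lines;
    char* buffer = nullptr;   // getline() allocates and grows this with malloc
    size_t capacity = 0;
    ssize_t length;
    while ((length = getline(&buffer, &capacity, in)) != -1) {
        // One copy out of the malloc'ed buffer per line.
        lines.emplace_back(buffer, static_cast<size_t>(length));
    }
    free(buffer);
    return lines;
}
```

Because the reads go through the FILE* itself, other C stdio calls on the same stream keep seeing a consistent position.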

Unfortunately there's one extra copy going on - from the mallocated 
buffer into D's gc'd array. That copy could be optimized away by using 
Gnu's malloc hooks:

http://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html


Andrei

Mar 21 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio



 There's a couple of things to look at here:

 1) if there's an idiom in tango.io, it would be rewriting the example 
 like this:  Cout.conduit.copy (Cin.conduit)

 
 The test code assumed taking a look at each line before printing it, so 
 speed of line reading and writing was deemed as important, not speed of 
 raw I/O, which we all know how to get.

Yep, just trying to isolate things

 3) the test would appear to be stressing the parsing of lines just as 
 much (if not more) than the io system itself. All part-and-parcel to a 
 degree, but it may be worth investigating

 
 
 I don't understand this.

Just suggesting that the scanning for [\r]\n patterns is likely a good 
chunk of the CPU time

 b) foregoing the output .newline, purely as an experiment

 
 
 4.7s    tcat

Thanks. If tango.io were to retain CR on readln, then it would come out 
ahead of everything else in this particular test

Can you distill the benefits of retaining CR on a readline, please?

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Andrei Alexandrescu (See Website For Email) wrote:

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio



 There's a couple of things to look at here:

 1) if there's an idiom in tango.io, it would be rewriting the example 
 like this:  Cout.conduit.copy (Cin.conduit)

 The test code assumed taking a look at each line before printing it, 
 so speed of line reading and writing was deemed as important, not 
 speed of raw I/O, which we all know how to get.

 
 Yep, just trying to isolate things
 
 3) the test would appear to be stressing the parsing of lines just as 
 much (if not more) than the io system itself. All part-and-parcel to 
 a degree, but it may be worth investigating


 I don't understand this.

 
 Just suggesting that the scanning for [\r]\n patterns is likely a good 
 chunk of the CPU time
 
 b) foregoing the output .newline, purely as an experiment


 4.7s    tcat

 
 Thanks. If tango.io were to retain CR on readln, then it would come out 
 ahead of everything else in this particular test

Well probably but must be tested. Newlines comprise about 3% of the file 
size.

 Can you distill the benefits of retaining CR on a readline, please?

I am pasting fragments from an email to Walter. He suggested this at a 
point, and I managed to persuade him to keep the newline in there.

Essentially it's about information. The naive loop:

while (readln(line)) {
   write(line);
}

is guaranteed 100% to produce an accurate copy of its input. The version 
that chops lines looks like:

while (readln(line)) {
   writeln(line);
}

This may or may not add a newline to the output, possibly creating a 
file larger by one byte. This is the kind of imprecision that makes the 
difference between a well-designed API and an almost-good one. Moreover, 
with the automated chopping it is basically impossible to write a 
program that exactly reproduces its input because readln essentially 
loses information.
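
The same effect can be reproduced in C++, where std::getline drops the delimiter. The sketch below is only an illustration; read_line_keep_newline, copy_keeping and copy_chopping are invented names, and the eof() check is one possible way to tell whether the last line really ended in a newline.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read one line and keep its '\n' if the input had one.
// Returns false only when nothing at all could be read.
bool read_line_keep_newline(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    // std::getline discards the delimiter; eof() here means it never saw one.
    if (!in.eof()) line += '\n';
    return true;
}

// Copy that keeps terminators: reproduces its input byte for byte.
std::string copy_keeping(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    std::string line;
    while (read_line_keep_newline(in, line)) out << line;
    assert(out.str() == text);
    return out.str();
}

// Copy that chops and re-adds terminators: may append a newline.
std::string copy_chopping(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) out << line << '\n';
    return out.str();
}
```

For an input without a trailing newline, copy_chopping returns one byte more than it was given, while copy_keeping returns the input unchanged.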

Also, stdio also offers a readln() that creates a new line on every 
call. That is useful if you want fresh lines every read:

char[] line;
while ((line = readln()).length > 0) {
   ++dictionary[line];
}

The code _just works_ because an empty line means _precisely_ and 
without the shadow of a doubt that the file has ended. (An I/O error 
throws an exception, and does NOT return an empty line; that is another 
important point.) An API that uses automated chopping should not offer 
such a function because an empty line may mean that an empty line was 
read, or that it's eof time. So the API would force people to write 
convoluted code.
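
A C++ rendering of that loop, as a sketch only: next_line and count_lines are hypothetical names, and unlike the readln described here this version does not throw on I/O errors, it simply reports them as end of input.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Returns the next line including its terminator.
// An empty string can only mean that no more input is available.
std::string next_line(std::istream& in) {
    std::string line;
    if (!std::getline(in, line)) return std::string();
    if (!in.eof()) line += '\n';   // the delimiter was seen, so put it back
    assert(!line.empty());         // a successful read is never empty
    return line;
}

// Counts how often each line (terminator included) occurs in `text`.
std::map<std::string, int> count_lines(const std::string& text) {
    std::istringstream in(text);
    std::map<std::string, int> dictionary;
    std::string line;
    while (!(line = next_line(in)).empty()) {
        ++dictionary[line];
    }
    return dictionary;
}
```

A blank line in the input arrives as "\n", so it is counted like any other line instead of ending the loop early.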

In the couple of years I've used Perl I've thanked the Perl folks for 
their readline decision numerous times.

Ever tried to do cin or fscanf? You can't do any intelligent input with 
them because they skip whitespace and newlines like it's out of style. 
All of my C++ applications use getline() or fgets() (both of which 
thankfully do include the newline) and then process the line in-situ.


Andrei

Mar 21 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
[snip]

 4.7s    tcat


 Thanks. If tango.io were to retain CR on readln, then it would come 
 out ahead of everything else in this particular test

 
 
 Well probably but must be tested. Newlines comprise about 3% of the file 
 size.

Yeah, I can imagine. Module tango.io.Console at line 119 should have a 
slice in it ... if you change 'j' to be 'i+1' instead, that should 
remove the chop

Tango should still come out in front, although I have to say that 
benchmarks don't really tell very much in general i.e. doesn't mean much 
of anything important whether tango "wins" this or not (IMO)

Having said that, I'm very glad you ran this since it shows how much 
overhead there is in a flush operation (on *nix) that's very useful to know


 
 Can you distill the benefits of retaining CR on a readline, please?

 
 
 I am pasting fragments from an email to Walter. He suggested this at a 
 point, and I managed to persuade him to keep the newline in there.
 
 Essentially it's about information. The naive loop:
 
 while (readln(line)) {
   write(line);
 }
 
 is guaranteed 100% to produce an accurate copy of its input. The version 
 that chops lines looks like:
 
 while (readln(line)) {
   writeln(line);
 }
 
 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte. This is the kind of imprecision that makes the 
 difference between a well-designed API and an almost-good one. Moreover, 
 with the automated chopping it is basically impossible to write a 
 program that exactly reproduces its input because readln essentially 
 loses information.


That's a valid point

[snip]

Mar 21 2007

Walter Bright <newshound digitalmars.com> writes:

kris wrote:
 Having said that, I'm very glad you ran this since it shows how much 
 overhead there is in a flush operation (on *nix) that's very useful to know

The flush on newline should only be done if isatty() returns !=0.
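
A sketch of that policy using the POSIX isatty() and fileno() calls, written in C++ rather than D. The helper names should_flush_on_newline and write_line are invented for illustration and are not Tango or Phobos functions.

```cpp
#include <cassert>
#include <stdio.h>    // FILE, fputs, fputc, fflush, fileno
#include <unistd.h>   // isatty (POSIX)

// Hypothetical policy helper: flush after each newline only for terminals.
bool should_flush_on_newline(int fd) {
    return isatty(fd) != 0;
}

// Writes one line to `out`, flushing only when `out` is an interactive terminal.
void write_line(FILE* out, const char* text) {
    assert(out != nullptr && text != nullptr);
    fputs(text, out);
    fputc('\n', out);
    if (should_flush_on_newline(fileno(out))) {
        fflush(out);
    }
}
```

With output redirected to a file, a pipe, or /dev/null, the flush is skipped and the stream's own buffering decides when data is written.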

Mar 21 2007

kris <foo bar.com> writes:

Walter Bright wrote:
 kris wrote:
 
 Having said that, I'm very glad you ran this since it shows how much 
 overhead there is in a flush operation (on *nix) that's very useful to 
 know

 
 
 The flush on newline should only be done if isatty() returns !=0.

yep; if you were to submit a ticket for that, it would be appreciated :)

http://www.dsource.org/projects/tango/newticket

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [snip]
 
 4.7s    tcat


 Thanks. If tango.io were to retain CR on readln, then it would come 
 out ahead of everything else in this particular test


 Well probably but must be tested. Newlines comprise about 3% of the 
 file size.

 
 Yeah, I can imagine. Module tango.io.Console at line 119 should have a 
 slice in it ... if you change 'j' to be 'i+1' instead, that should 
 remove the chop

Yum.

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean much 
 of anything important whether tango "wins" this or not (IMO)

Why not? Programs using the standard input and output are ubiquitous, 
efficient, and extremely easy to combine. I write them all the time for 
processing huge amounts of data.

I didn't run the tests willy-nilly. I had a Perl script that took a 
night to run (it scrambles through some 20 GB of data), so I decided to 
give D a shot. The D equivalent was two times slower. With the new 
readln, it takes 98 minutes; parallelized, it is hand over fist another 
five times faster (which was impossible in the previous version because 
it used 98% CPU).

I was actually surprised that nobody noticed phobos' low I/O speed in 
years. It's a maker or breaker for me and many others.

If there's any chance that automated chopping could be removed from 
Tango, that would be awesome. Also it would be great to fix the 
incompatibility created by using read/write instead of getline.


Andrei

Mar 21 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Wed, 21 Mar 2007 17:11:56 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 I was actually surprised that nobody noticed phobos' low I/O speed in 
 years. It's a maker or breaker for me and many others.

Most programs I run that do lots of I/O only take seconds to run, so if
they run 50% slower or faster, not only wouldn't I notice, I wouldn't care.
Taking a sip of coffee takes longer than that. 

That is why I haven't noticed. (Maybe I should continue working on my
mini-DataBase library project and give it a good "real world" workout <G>)

By the way, I do appreciate you doing this performance comparison and
improving Phobos' I/O routine.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/03/2007 11:19:16 AM

Mar 21 2007

Davidl <Davidl 126.com> writes:

u r working on database?
i have a feeling that SQL ain't really suitable for
database-related development, any better idea?

 On Wed, 21 Mar 2007 17:11:56 -0700, Andrei Alexandrescu (See Website For
 Email) wrote:

 I was actually surprised that nobody noticed phobos' low I/O speed in
 years. It's a maker or breaker for me and many others.

 Most programs I run that do lots of I/O only take seconds to run, so if
 they run 50% slower or faster, not only wouldn't I notice, I wouldn't  
 care.
 Taking a sip of coffee takes longer than that.

 That is why I haven't noticed. (Maybe I should continue working on my
 mini-DataBase library project and give it a good "real world" workout  
 <G>)

 By the way, I do appreciate you doing this performance comparison and
 improving Phobos' I/O routine.

Mar 21 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Thu, 22 Mar 2007 10:22:28 +0800, Davidl wrote:

 u r working on database?
 i have a feeling that SQL ain't really suitable for
 database-related development, any better idea?

Yep. A light-weight, single-user D/B suitable for "home" applications. 

It has its own DSL so I'm hoping to eventually use some of D's new mixin
goodies to help generate optimal code from high-level Database statements.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/03/2007 1:49:41 PM

Mar 21 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

[snip]
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)

 
 
 Why not? 

If tango were terribly terribly slow instead, then it would be cause for 
concern. If I have some program that needs to run faster I'll find a way 
to do just that; another reason why tango.io is fairly modular


[snip]

 I was actually surprised that nobody noticed phobos' low I/O speed in 
 years. It's a maker or breaker for me and many others.

That assumes IO performance wasn't brought up as an issue before ;)


 If there's any chance that automated chopping could be removed from 
 Tango, that would be awesome. Also it would be great to fix the 
 incompatibility created by using read/write instead of getline.

Sure; could you submit a ticket for it, please, lest it fall by the 
wayside?

http://www.dsource.org/projects/tango/newticket

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 [snip]
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)


 Why not? 

 
 If tango were terribly terribly slow instead, then it would be cause for 
 concern. If I have some program that needs to run faster I'll find a way 
 to do just that; another reason why tango.io is fairly modular

That's great, but by and large, the attitude that "this is the simple 
version; if you want performance, you gotta work for it" is precisely 
what I don't like about certain languages and APIs. This is, for 
example, why not everybody really condemns C++ iostreams in spite of 
them being a pinnacle of counter-performance in any contest, be it 
beauty, size, or speed. People know that C++ can do fast I/O and are 
driven by the attitude that you gotta work for it - there's no other way.

Just make the clear and simple code fastest. One thing I like about D is 
that it clearly strives to achieve best performance for simply-written code.

 [snip]
 
 I was actually surprised that nobody noticed phobos' low I/O speed in 
 years. It's a maker or breaker for me and many others.

 
 That assumes IO performance wasn't brought up as an issue before ;)
 
 
 If there's any chance that automated chopping could be removed from 
 Tango, that would be awesome. Also it would be great to fix the 
 incompatibility created by using read/write instead of getline.

 
 Sure; could you submit a ticket for it, please, lest it fall by the 
 wayside?
 
 http://www.dsource.org/projects/tango/newticket

For the \n, read/write, or both? :o)


Andrei

Mar 21 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

 [snip]

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)



 Why not? 


 If tango were terribly terribly slow instead, then it would be cause 
 for concern. If I have some program that needs to run faster I'll find 
 a way to do just that; another reason why tango.io is fairly modular

 
 
 That's great, but by and large, the attitude that "this is the simple 
 version; if you want performance, you gotta work for it" is precisely 
 what I don't like about certain languages and APIs. This is, for 
 example, why not everybody really condemns C++ iostreams in spite of 
 them being a pinnacle of counter-performance in any contest, be it 
 beauty, size, or speed. People know that C++ can do fast I/O and are 
 driven by the attitude that you gotta work for it - there's no other way.
 
 Just make the clear and simple code fastest. One thing I like about D is 
 that it clearly strives to achieve best performance for simply-written 
 code.

Oh, if there's any implication that Tango ought to be "faster" than it 
is, then I suspect you're being unjust, Andrei. You'll be hard pressed 
to find, for example, some routine that hits the heap where that should 
be avoided. The library was built to avoid such pitfalls

That aside, tango.io appears to be fast enough and simple enough. The 
fastest in this case, even, assuming we do something useful about the CR 
chop, .newline is adjusted, or "\n" is used instead ;)


[snip]
 Sure; could you submit a ticket for it, please, lest it fall by the 
 wayside?

 http://www.dsource.org/projects/tango/newticket

 
 
 For the \n, read/write, or both? :o)

Both, if you prefer?

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

 [snip]

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)



 Why not? 


 If tango were terribly terribly slow instead, then it would be cause 
 for concern. If I have some program that needs to run faster I'll 
 find a way to do just that; another reason why tango.io is fairly 
 modular


 That's great, but by and large, the attitude that "this is the simple 
 version; if you want performance, you gotta work for it" is precisely 
 what I don't like about certain languages and APIs. This is, for 
 example, why not everybody really condemns C++ iostreams in spite of 
 them being a pinnacle of counter-performance in any contest, be it 
 beauty, size, or speed. People know that C++ can do fast I/O and are 
 driven by the attitude that you gotta work for it - there's no other way.

 Just make the clear and simple code fastest. One thing I like about D 
 is that it clearly strives to achieve best performance for 
 simply-written code.

 
 Oh, if there's any implication that Tango ought to be "faster" than it 
 is, then I suspect you're being unjust, Andrei. You'll be hard pressed 
 to find, for example, some routine that hits the heap where that should 
 be avoided. The library was built to avoid such pitfalls
 
 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the CR 
 chop, .newline is adjusted, or "\n" is used instead ;)

Do it and let's test.

Andrei

Mar 22 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

[snip]
 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the 
 CR chop, .newline is adjusted, or "\n" is used instead ;)

 
 
 Do it and let's test.

you can try it right now with a Cout(line)("\n");

The option to eschew the chop is checked in also. You'll perhaps see 
from the Win32 tests that tango.io is pretty darned fast anyway?

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 [snip]
 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the 
 CR chop, .newline is adjusted, or "\n" is used instead ;)


 Do it and let's test.

 
 you can try it right now with a Cout(line)("\n");
 
 The option to eschew the chop is checked in also. You'll perhaps see 
 from the Win32 tests that tango.io is pretty darned fast anyway?

On my Linux box:

import tango.io.Console;
void main()
{
   char[] line;
   while (Cin.nextLine(line)) {
     Cout(line)("\n");
   }
}

7.8s		tcat


Andrei

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

 [snip]

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)



 Why not? 


 If tango were terribly terribly slow instead, then it would be cause 
 for concern. If I have some program that needs to run faster I'll 
 find a way to do just that; another reason why tango.io is fairly 
 modular


 That's great, but by and large, the attitude that "this is the simple 
 version; if you want performance, you gotta work for it" is precisely 
 what I don't like about certain languages and APIs. This is, for 
 example, why not everybody really condemns C++ iostreams in spite of 
 them being a pinnacle of counter-performance in any contest, be it 
 beauty, size, or speed. People know that C++ can do fast I/O and are 
 driven by the attitude that you gotta work for it - there's no other way.

 Just make the clear and simple code fastest. One thing I like about D 
 is that it clearly strives to achieve best performance for 
 simply-written code.

 
 Oh, if there's any implication that Tango ought to be "faster" than it 
 is, then I suspect you're being unjust, Andrei. You'll be hard pressed 
 to find, for example, some routine that hits the heap where that should 
 be avoided. The library was built to avoid such pitfalls
 
 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the CR 
 chop, .newline is adjusted, or "\n" is used instead ;)

Oh, but I forgot it's cheating: uses read/write so it's incompatible 
with C's stdio, which phobos is.
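
A minimal sketch of the incompatibility being referred to, assuming a POSIX system. It is written in C++ rather than D, and the names `MixResult` and `mix_stdio_and_read` are made up for illustration. Once a `FILE*` has buffered input from a descriptor, a raw `read()` on the same descriptor no longer sees those bytes, which is why a library that bypasses stdio cannot be freely interleaved with `getchar` and friends:

```cpp
#include <stdio.h>
#include <unistd.h>
#include <string>

struct MixResult {
    std::string via_stdio;  // line returned by fgets()
    long via_read;          // byte count returned by a later raw read()
};

// Puts two lines into a pipe, reads the first one through a FILE*,
// then asks the descriptor directly for whatever is left.
MixResult mix_stdio_and_read() {
    MixResult result{"", -1};
    int fds[2];
    if (pipe(fds) != 0) return result;

    const char data[] = "first\nsecond\n";
    const ssize_t len = static_cast<ssize_t>(sizeof data - 1);
    if (write(fds[1], data, sizeof data - 1) != len) {
        close(fds[0]);
        close(fds[1]);
        return result;
    }
    close(fds[1]);  // writer is done: the reader sees EOF after the data

    FILE* in = fdopen(fds[0], "r");
    if (!in) {
        close(fds[0]);
        return result;
    }

    char line[64];
    if (fgets(line, sizeof line, in)) result.via_stdio = line;

    // stdio normally pulled everything the pipe held into its own buffer,
    // so the second line now lives in the FILE*, not in the descriptor.
    char raw[64];
    result.via_read = static_cast<long>(read(fds[0], raw, sizeof raw));

    fclose(in);  // also closes fds[0]
    return result;
}
```

On a typical implementation the raw `read()` reports end of file even though the program has only consumed the first line; the second line can only be recovered through the `FILE*`.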

Andrei

Mar 22 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
[snip]
 Oh, but I forgot it's cheating: uses read/write so it's incompatible 
 with C's stdio, which phobos is.

How can it possibly be "cheating" when the code was in place before you 
contrived this test ;)

I think you have to stretch a bit to find some /common/ and truly valid 
cases where what you refer to is important enough to warrant such 
attention.

If it truly were to become an issue (people actually run into problems 
with it on a regular basis) then tango.io could be changed to 
special-case this kind of thing; but at this time we prefer to avoid 
such things and adhere to the KISS principal instead.

FWIW, tango.io could trivially be sped up significantly on this 'test' 
-- as it stands, the implementation is quite pedestrian in nature ;)

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [snip]
 Oh, but I forgot it's cheating: uses read/write so it's incompatible 
 with C's stdio, which phobos is.

 
 How can it possibly be "cheating" when the code was in place before you 
 contrived this test ;)
 
 I think you have to stretch a bit to find some /common/ and truly valid 
 cases where what you refer to is important enough to warrant such 
 attention.
 
 If it truly were to become an issue (people actually run into problems 
 with it on a regular basis) then tango.io could be changed to 
 special-case this kind of thing; but at this time we prefer to avoid 
 such things and adhere to the KISS principal instead.

"Principle" I guess. That sounds great. My opinion in the matter is 
simple - D's stdio uses C's FILE*, stdio lib, and all. Moreover, it gives 
the programmer full access to them. It would be only nice, if it does 
not cost too much, to not be gratuitously incompatible with them. That's 
all. If you want to take the other route, you better disable access to 
C's getchar et al.

 FWIW, tango.io could trivially be sped up significantly on this 'test' 
 -- as it stands, the implementation is quite pedestrian in nature ;)

The 'test' is not a 'test', it's a test deriving from my attempts to 
find the bottleneck in a real D program.


Andrei

Mar 22 2007

kris <foo bar.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:
 [snip]

 Oh, but I forgot it's cheating: uses read/write so it's incompatible 
 with C's stdio, which phobos is.


 How can it possibly be "cheating" when the code was in place before 
 you contrived this test ;)

 I think you have to stretch a bit to find some /common/ and truly 
 valid cases where what you refer to is important enough to warrant 
 such attention.

 If it truly were to become an issue (people actually run into problems 
 with it on a regular basis) then tango.io could be changed to 
 special-case this kind of thing; but at this time we prefer to avoid 
 such things and adhere to the KISS principal instead.

 
 
 "Principle" I guess. 

Yep. A thousand pardons for my late night spelling mistake. I'll be sure 
to reciprocate in future also, if that would be helpful?


 That sounds great. My opinion in the matter is 
 simple - D's stdio uses C's FILE*, stdio lib, and all. Moreover, it gives 
 the programmer full access to them. It would be only nice, if it does 
 not cost too much, to not be gratuitously incompatible with them. That's 
 all. If you want to take the other route, you better disable access to 
 C's getchar et al.

Yes, thanks for that option. It is certainly one approach that has been 
considered before, and a trivial one to implement. We'll probably cross 
that bridge when we reach it. It's worth noting, however, that Tango is 
focused on use with D programs, not C

BTW: the use of gratuitous here is wholly out of context; some might 
interpret the usage as an implication that Tango is based upon a whim ;)

 
 FWIW, tango.io could trivially be sped up significantly on this 'test' 
 -- as it stands, the implementation is quite pedestrian in nature ;)

 
 
 The 'test' is not a 'test', it's a test deriving from my attempts to 
 find the bottleneck in a real D program.

It's being referred to as a "benchmark" Andrei; I was trying to be 
somewhat less political by calling it a 'test'. Many pardons

Mar 22 2007

Sean Kelly <sean f4.ca> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the 
 CR chop, .newline is adjusted, or "\n" is used instead ;)

 
 Oh, but I forgot it's cheating: uses read/write so it's incompatible 
 with C's stdio, which phobos is.

If I understand you correctly, you're saying that all IO packages must 
go through the standard C library so they stay in sync with the C IO 
routines?  What is the point of read/write, ReadFile/WriteFile, etc, then?


Sean

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:
 That aside, tango.io appears to be fast enough and simple enough. The 
 fastest in this case, even, assuming we do something useful about the 
 CR chop, .newline is adjusted, or "\n" is used instead ;)

 Oh, but I forgot it's cheating: uses read/write so it's incompatible 
 with C's stdio, which phobos is.

 
 If I understand you correctly, you're saying that all IO packages must 
 go through the standard C library so they stay in sync with the C IO 
 routines?  What is the point of read/write, ReadFile/WriteFile, etc, then?

I think for stdio, going through the standard C library would be very 
advisable. If, on the other hand, a library chooses to implement a file 
abstraction not exposing FILE*, it could use whichever means.

Andrei

Mar 22 2007

Walter Bright <newshound digitalmars.com> writes:

kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 [snip]
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)


 Why not? 

 
 If tango were terribly terribly slow instead, then it would be cause for 
 concern. If I have some program that needs to run faster I'll find a way 
 to do just that; another reason why tango.io is fairly modular

One problem with C++, as I mentioned before, is that the 
straightforward, out of the box coding techniques don't get you fast 
code. One of my goals with D is to fix that - the straightforward, 
untuned code should get you most of the possible speed. I think the wc 
benchmark shows this off.

Having to recode one's programs to speed them up is a big productivity 
sapper. (The most egregious examples of this are people forced to recode 
bits of their python/java/ruby app in C++.)

What makes stdio so worth the effort to speed up is that the payoff 
is evident in 80-90% of the programs out there. Optimizing your own 
program speeds up only your own program - optimizing the library speeds 
everyone up.

Tango doesn't need to be terribly, terribly slow to be a cause for 
concern. It only needs to be slower than C++/Perl/Java to be a problem, 
because then it is a convenient excuse for people to not switch to D.

The conventional wisdom with C++ is that:

1) C++ code is inherently faster than in any other language
2) iostream has a great design
3) iostream is uber fast because it uses templates to inline everything

Andrei's benchmark blows that out of the water. Even interpreted Perl 
beats the pants off of C++ iostreams.

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 kris wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 kris wrote:

 [snip]
 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)


 Why not? 

 If tango were terribly terribly slow instead, then it would be cause 
 for concern. If I have some program that needs to run faster I'll find 
 a way to do just that; another reason why tango.io is fairly modular

 
 One problem with C++, as I mentioned before, is that the 
 straightforward, out of the box coding techniques don't get you fast 
 code. One of my goals with D is to fix that - the straightforward, 
 untuned code should get you most of the possible speed. I think the wc 
 benchmark shows this off.

You could tell from this and my (almost identical) post that Walter's 
propaganda got me thoroughly brainwashed :o).

Andrei

Mar 21 2007

kris <foo bar.com> writes:

Walter Bright wrote:
 kris wrote:
 
 Andrei Alexandrescu (See Website For Email) wrote:

 kris wrote:

 [snip]

 Tango should still come out in front, although I have to say that 
 benchmarks don't really tell very much in general i.e. doesn't mean 
 much of anything important whether tango "wins" this or not (IMO)



 Why not? 


 If tango were terribly terribly slow instead, then it would be cause 
 for concern. If I have some program that needs to run faster I'll find 
 a way to do just that; another reason why tango.io is fairly modular

 
 
 One problem with C++, as I mentioned before, is that the 
 straightforward, out of the box coding techniques don't get you fast 
 code. One of my goals with D is to fix that - the straightforward, 
 untuned code should get you most of the possible speed. I think the wc 
 benchmark shows this off.
 
 Having to recode one's programs to speed them up is a big productivity 
 sapper. (The most egregious examples of this are people forced to recode 
 bits of their python/java/ruby app in C++.)
 
 What makes stdio so worth the effort to speed up is that the payoff 
 is evident in 80-90% of the programs out there. Optimizing your own 
 program speeds up only your own program - optimizing the library speeds 
 everyone up.
 
 Tango doesn't need to be terribly, terribly slow to be a cause for 
 concern. It only needs to be slower than C++/Perl/Java to be a problem, 
 because then it is a convenient excuse for people to not switch to D.
 
 The conventional wisdom with C++ is that:
 
 1) C++ code is inherently faster than in any other language
 2) iostream has a great design
 3) iostream is uber fast because it uses templates to inline everything
 
 Andrei's benchmark blows that out of the water. Even interpreted Perl 
 beats the pants off of C++ iostreams.


tango.io is not even optimized for this case (unlike the new Phobos 
code), and yet it is still faster than all others once the flush() is 
removed?

The earlier point is only that optimization can easily be premature and 
misguided; typically better to get a flexible and effective design 
instead. This should not have given anyone cause to assume, assert, or 
imply that tango is in any way inefficient -- apparently that needs to 
be clarified ;)

For the record, my perspective of "terribly, terribly slow" is pretty 
much where C++ landed in this particular case

Mar 21 2007

James Dennett <jdennett acm.org> writes:

Walter Bright wrote:

[snip]

 The conventional wisdom with C++ is that:
 
 1) C++ code is inherently faster than in any other language
 2) iostream has a great design
 3) iostream is uber fast because it uses templates to inline everything
 
 Andrei's benchmark blows that out of the water. Even interpreted Perl
 beats the pants off of C++ iostreams.

This kind of simplistic bashing of a language or library
design based on testing of some unnamed implementation(s)
of that library doesn't give D a good image.  There are
other benchmarks that show C++ IOStreams beating C's
stdio on performance.  Those are also meaningless out of
context.

There are real issues with some of the design of IOStreams.
There are very real problems with many implementations of
IOStreams.  There are also good implementations that
perform pretty well, but overall IOStreams is not widely
viewed in the C++ community as "having a great design",
just as having a design that's OK, and a lot safer and
more cleanly extensible than C's stdio.

Of course C++ code isn't inherently faster than any other
language, and I've not come across anyone saying that it
is.  And one of the main problems with IOStreams is that
it makes excessive use of virtual functions in ways that
inhibit inlining, particularly in typical implementations
which drag in locale support even for programs that do
not use it.  The C++ community recognizes these problems,
and the C++ committee has addressed some of them (through
exposition) in its Technical Report on C++ performance.

I'm at a loss to understand why you would write what you
did.  It seems to be a straw man, but maybe there was
something else to it -- frustration that people assume
that D must be slower than C++?

-- James

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

James Dennett wrote:
 Walter Bright wrote:
 
 [snip]
 
 The conventional wisdom with C++ is that:

 1) C++ code is inherently faster than in any other language
 2) iostream has a great design
 3) iostream is uber fast because it uses templates to inline everything

 Andrei's benchmark blows that out of the water. Even interpreted Perl
 beats the pants off of C++ iostreams.

 
 This kind of simplistic bashing of a language or library
 design based on testing of some unnamed implementation(s)
 of that library doesn't give D a good image.  There are
 other benchmarks that show C++ IOStreams beating C's
 stdio on performance.  Those are also meaningless out of
 context.

For the record, I used gcc 4.1.2 20060928 (prerelease) (Ubuntu 
4.1.1-13ubuntu5).

 There are real issues with some of the design of IOStreams.
 There are very real problems with many implementations of
 IOStreams.  There are also good implementations that
 perform pretty well, but overall IOStreams is not widely
 viewed in the C++ community as "having a great design",
 just as having a design that's OK, and a lot safer and
 more cleanly extensible than C's stdio.
 
 Of course C++ code isn't inherently faster than any other
 language, and I've not come across anyone saying that it
 is.  And one of the main problems with IOStreams is that
 it makes excessive use of virtual functions in ways that
 inhibit inlining, particularly in typical implementations
 which drag in locale support even for programs that do
 not use it.  The C++ community recognizes these problems,
 and the C++ committee has addressed some of them (through
 exposition) in its Technical Report on C++ performance.
 
 I'm at a loss to understand why you would write what you
 did.  It seems to be a straw man, but maybe there was
 something else to it -- frustration that people assume
 that D must be slower than C++?

I don't know why he wrote that, but my perception is that iostreams have 
always been "on the verge of an efficient implementation" for eight 
years now. What I've seen repeatedly year after year whenever I sat down 
to run a test was performance that makes iostream practically unusable 
for any serious coding. I'd be faster at moving molasses upstream on a 
cold day. I am amazed how iostreams managed to maintain this clout for 
so long. If they were a guy, I'd love to know his trick. :o)


Andrei

Mar 22 2007

Walter Bright <newshound digitalmars.com> writes:

James Dennett wrote:
 I'm at a loss to understand why you would write what you
 did.  It seems to be a straw man, but maybe there was
 something else to it -- frustration that people assume
 that D must be slower than C++?

Maybe it is a bit of frustration on my part. I often run into people 
who, when faced with benchmarks showing that conventional D runs code 
faster than conventional C++, tell me in various ways that it can't be 
true. I must have:

1) written bad C++ code
2) lied
3) used a sabotaged C++ compiler
4) written some magic optimization that only works on that carefully 
crafted benchmark

So, I have some justification in saying what I did about the 
conventional wisdom of C++. I also know that the top tier of experienced 
C++ programmers are well aware such conventional wisdom is not true.

I have a lot of experience in making C++ code run fast. It doesn't come 
easy, it takes a lot of work back and forth with a profiler. It usually 
involves going around the C++ runtime library. That experience has 
certainly strongly influenced the design of D. I don't wish to have to 
write custom I/O just to get good I/O performance. I don't wish to keep 
doing all the clever string hacks trying to make 0 terminated strings fast.

I want the natural, straightforward D code to be (at least close to) the 
best performing way to implement an algorithm.

Mar 22 2007

James Dennett <jdennett acm.org> writes:

Walter Bright wrote:
 James Dennett wrote:
 I'm at a loss to understand why you would write what you
 did.  It seems to be a straw man, but maybe there was
 something else to it -- frustration that people assume
 that D must be slower than C++?

 
 Maybe it is a bit of frustration on my part. I often run into people
 who, when faced with benchmarks showing that conventional D runs code
 faster than conventional C++, tell me in various ways that it can't be
 true. I must have:
 
 1) written bad C++ code
 2) lied
 3) used a sabotaged C++ compiler
 4) written some magic optimization that only works on that carefully
 crafted benchmark
 
 So, I have some justification in saying what I did about the
 conventional wisdom of C++. I also know that the top tier of experienced
 C++ programmers are well aware such conventional wisdom is not true.
 
 I have a lot of experience in making C++ code run fast. It doesn't come
 easy, it takes a lot of work back and forth with a profiler. It usually
 involves going around the C++ runtime library. That experience has
 certainly strongly influenced the design of D. I don't wish to have to
 write custom I/O just to get good I/O performance. I don't wish to keep
 doing all the clever string hacks trying to make 0 terminated strings fast.
 
 I want the natural, straightforward D code to be (at least close to) the
 best performing way to implement an algorithm.

Good answer.  (Yes, seriously.)

It's certainly true that for code doing large amounts of
I/O where performance was an issue, I've always avoided
IOStreams; no implementation I've used has been anywhere
near fast enough.  IOStreams is also a
pain where robustness is required.  It's most useful for
simple tools that are used in tame environments.

The last time I had to get out a profiler to optimize
C++ code, it turned out to mostly be an exercise in
avoiding (a) a terribly inefficient implementation of
std::string, and (b) a mind-bogglingly inefficient
implementation of strftime.  Which I guess illustrates
how important it is that the out-of-the-box, natural
ways to write code should have performance that is not
too far removed from optimal.

It might be harsh, but not entirely unjustified, to say
that the "conventional wisdom" of many communities of
programmers is a long, long way from being wise.  As
the community behind a language grows larger, there is
a natural tendency for it not to have such a density
of experts; if D amasses a million users it's a safe
bet that most of them won't be as sharp as the average
D user is today.

-- James

Mar 22 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

James Dennett wrote:
 Walter Bright wrote:

 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have such a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.

I think there is a tendency to assume that APIs and languages which have 
(A) been around a long time and
(B) been used by millions of people
will probably be close to optimal.  It just makes sense that that would 
be the case.  Unfortunately, it's all too often just not true.

--bb

Mar 23 2007

Walter Bright <newshound digitalmars.com> writes:

Bill Baxter wrote:
 James Dennett wrote:
 Walter Bright wrote:

 
 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have such a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.


D bucks conventional wisdom in more than one way. There's a current 
debate going on among people involved in the next C++ standardization 
effort about whether to include garbage collection or not. The people 
involved are arguably the top tier of C++ programmers.

But still, there are one or two that repeat the conventional (and wrong) 
wisdom about garbage collection. Such conventional wisdom is much more 
common among the general population of C++ programmers.


 I think there is a tendency to assume that APIs and languages which have 
 (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that would 
 be the case.  Unfortunately, it's all too often just not true.

I just find it strange that C++, a language meant for building speedy 
applications, would incorporate iostreams, which is slow, not thread 
safe, and not exception safe.

Mar 23 2007

James Dennett <jdennett acm.org> writes:

Walter Bright wrote:
 Bill Baxter wrote:
 James Dennett wrote:
 Walter Bright wrote:

 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have such a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.


 
 D bucks conventional wisdom in more than one way. There's a current
 debate going on among people involved in the next C++ standardization
 effort about whether to include garbage collection or not. The people
 involved are arguably the top tier of C++ programmers.
 
 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.

Which "wrong" assertions are those?

 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

 
 I just find it strange that C++, a language meant for building speedy
 applications, would incorporate iostreams, which is slow, not thread
 safe, and not exception safe.

I'm intrigued by your claim that IOStreams is not thread-safe;
the IOStreams framework is thread-safe in the same way that
the STL is thread-safe.  The one minor difference is that
IOStreams exposes some global variables, which is unfortunate
as they can easily be used in inappropriate ways in a
multi-threaded environment.  Then again, that is unsurprising
as C++ does not yet officially incorporate support for
multi-threading.  Is there something deeper in IOStreams that
you consider to be thread-unsafe, or is it just the matter of
its global variables?

-- James

Mar 24 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

James Dennett wrote:
 Walter Bright wrote:
 Bill Baxter wrote:
 James Dennett wrote:
 Walter Bright wrote:
 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have such a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.


 D bucks conventional wisdom in more than one way. There's a current
 debate going on among people involved in the next C++ standardization
 effort about whether to include garbage collection or not. The people
 involved are arguably the top tier of C++ programmers.

 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.

 
 Which "wrong" assertions are those?
 
 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

 I just find it strange that C++, a language meant for building speedy
 applications, would incorporate iostreams, which is slow, not thread
 safe, and not exception safe.

 
 I'm intrigued by your claim that IOStreams is not thread-safe;
 the IOStreams framework is thread-safe in the same way that
 the STL is thread-safe.  The one minor difference is that
 IOStreams exposes some global variables, which is unfortunate
 as they can easily be used in inappropriate ways in a
 multi-threaded environment.  Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.  Is there something deeper in IOStreams that
 you consider to be thread-unsafe, or is it just the matter of
 its global variables?

cout << a << b;

can't guarantee that a and b will be adjacent in the output. In contrast,

printf(format, a, b);

does give that guarantee. Moreover, that guarantee is not between 
separate threads in the same process, it's between whole processes! 
Guess which of the two is usable :o).

Btw, does tango provide such a guarantee for code such as Cout(a)(b)? 
From the construct, my understanding is that it doesn't.


Andrei

Mar 24 2007

James Dennett <jdennett acm.org> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:

[snip]

 I'm intrigued by your claim that IOStreams is not thread-safe;
 the IOStreams framework is thread-safe in the same way that
 the STL is thread-safe.  The one minor difference is that
 IOStreams exposes some global variables, which is unfortunate
 as they can easily be used in inappropriate ways in a
 multi-threaded environment.  Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.  Is there something deeper in IOStreams that
 you consider to be thread-unsafe, or is it just the matter of
 its global variables?

 
 cout << a << b;
 
 can't guarantee that a and b will be adjacent in the output. In contrast,
 
 printf(format, a, b);
 
 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

As you appear to be saying that printf has to flush every
time it's used, I'd guess that it's unusable for performance
reasons alone.  It's also really hard to implement such a
guarantee on most platforms without using some kind of
process-shared mutex, file lock, or similar.  Does printf
really incur that kind of overhead every time something is
written to a stream, or does its implementation make use
of platform-specific knowledge on which writes are atomic
at the OS level?

Within a process, this level of safety could be achieved
with only a little (usually redundant) synchronization.
Which is useful for debugging or simplistic logging, but
not for anything else I've seen.

(IOStreams has this wrong, in different ways: it's not
just the order of output that's ill-defined if a stream
is used concurrently across multiple threads.  Nasal
demons are also possible, I hear.)

 Btw, does tango provide such a guarantee for code such as Cout(a)(b)?
 From the construct, my understanding is that it doesn't.

I'll leave that for the Tango experts to answer.

-- James

Mar 24 2007

Walter Bright <newshound digitalmars.com> writes:

James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 cout << a << b;

 can't guarantee that a and b will be adjacent in the output. In contrast,

 printf(format, a, b);

 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

 
 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

In order for printf to work right it does not need to flush every time 
(you're right that doing so would lead to terrible performance). The usual 
thing that printf does is only do a flush if isatty() comes back with 
true. In fact, flushing the output at the end of each printf would not 
mitigate multithreading problems at all. In order for printf to be 
thread safe, all that's necessary is for it to acquire/release the C 
stream lock (C's implementation of stdio has a lock associated with each 
stream).
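
A small sketch of the per-call locking described above, assuming a POSIX stdio where each call takes the stream lock once. It is written in C++, and the function name `count_intact_lines` is invented for this example. Several threads each emit their two pieces with a single `fprintf`, and the file is then checked for lines that were split by another thread's output:

```cpp
#include <stdio.h>
#include <string.h>
#include <thread>
#include <vector>

// Returns how many lines in the shared stream came out unbroken,
// or -1 if the temporary file could not be created.
int count_intact_lines(int threads, int lines_per_thread) {
    FILE* out = tmpfile();
    if (!out) return -1;

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([out, lines_per_thread] {
            for (int i = 0; i < lines_per_thread; ++i) {
                // One call, hence one acquire/release of the stream lock:
                // "left-" and "right" cannot be separated by another thread.
                fprintf(out, "%s%s\n", "left-", "right");
            }
        });
    }
    for (auto& worker : pool) worker.join();

    rewind(out);
    char line[64];
    int intact = 0;
    while (fgets(line, sizeof line, out)) {
        if (strcmp(line, "left-right\n") == 0) ++intact;
    }
    fclose(out);
    return intact;
}
```

Calling `count_intact_lines(4, 1000)` is expected to report every line intact on such a system. Splitting the `fprintf` into two separate calls would remove that guarantee, since another thread could take the lock between them.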

D's implementation of writef does the same thing. D's writef also wraps 
the whole thing in a try-finally, making it exception safe.

Iostreams'
	cout << a << b;
results in the equivalent of:
	(cout->out(a))->out(b);
The trouble is, there's no place to hang the lock acquire/release, nor 
the try-finally. It's a fundamental design problem.
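
To make the contrast concrete, here is a hedged C++ sketch of a function-style writer in the spirit of printf/writef, assuming POSIX `flockfile`/`funlockfile`. The names `StreamLock`, `put`, `write_all` and `slurp` are hypothetical and only serve the illustration. Because all arguments arrive in one call, the function body is the natural place for a single lock, and a destructor plays the role of the finally block:

```cpp
#include <stdio.h>
#include <string>

// Holds the stdio stream lock for the lifetime of the object.
class StreamLock {
public:
    explicit StreamLock(FILE* f) : f_(f) { flockfile(f_); }
    ~StreamLock() { funlockfile(f_); }  // runs on normal exit and on exceptions
    StreamLock(const StreamLock&) = delete;
    StreamLock& operator=(const StreamLock&) = delete;
private:
    FILE* f_;
};

// A deliberately tiny set of formatters.
inline void put(FILE* f, const char* s) { fputs(s, f); }
inline void put(FILE* f, const std::string& s) { fwrite(s.data(), 1, s.size(), f); }
inline void put(FILE* f, int v) { fprintf(f, "%d", v); }

// Writes every argument under ONE acquisition of the stream lock.
template <class... Args>
void write_all(FILE* f, const Args&... args) {
    StreamLock guard(f);
    int expand[] = {0, (put(f, args), 0)...};  // calls put() left to right
    (void)expand;
}

// Helper for the usage example: reads back everything written to f.
std::string slurp(FILE* f) {
    std::string text;
    char buf[256];
    size_t n;
    rewind(f);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) text.append(buf, n);
    return text;
}
```

For example, `write_all(f, "a=", 1, std::string(", b="), 2, "\n")` emits `a=1, b=2` as one locked unit. The stdio calls inside still take the stream lock themselves, which is fine because that lock is recursive.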

 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream,

It does exactly one lock acquire/release for each printf, not for each 
character written.

 or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?
 
 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.

The problem is such synchronization would be invented and added on by 
the user, making it impossible to combine disparate libraries that write 
to stderr, for example, in a multithreading environment.

 Which is useful for debugging or simplistic logging, but
 not for anything else I've seen.
 
 (IOStreams has this wrong, in different ways: it's not
 just the order of output that's ill-defined if a stream
 is used concurrently across multiple threads.  Nasal
 demons are also possible, I hear.)

Mar 24 2007

James Dennett <jdennett acm.org> writes:

Walter Bright wrote:
 James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 cout << a << b;

 can't guarantee that a and b will be adjacent in the output. In
 contrast,

 printf(format, a, b);

 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

 
 In order for printf to work right it does not need to flush every time
 (you're right in that would lead to terrible performance). The usual
 thing that printf does is only do a flush if isatty() comes back with
 true. In fact, flushing the output at the end of each printf would not
 mitigate multithreading problems at all. In order for printf to be
 thread safe, all that's necessary is for it to acquire/release the C
 stream lock (C's implementation of stdio has a lock associated with each
 stream).

That would be true, except that Andrei wrote that
the guarantee applied to separate processes, and
that can only be guaranteed if you both use some
kind of synchronization between the processes *and*
flush the stream.

Andrei's claim went beyond mere thread-safety, and
that was what I responded to.

 D's implementation of writef does the same thing. D's writef also wraps
 the whole thing in a try-finally, making it exception safe.
 
 Iostreams'
     cout << a << b;
 results in the equivalent of:
     (cout->out(a))->out(b);
 The trouble is, there's no place to hang the lock acquire/release, nor
 the try-finally. It's a fundamental design problem.

There's a place:

locked(cout) << a << b;

can be made to do the job, using RAII to lock at the
start of the expression and unlock at the end.

 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream,

 
 It does exactly one lock acquire/release for each printf, not for each
 character written.

Right.  I certainly did not intend to imply that any
serious design would be silly enough to lock for each
character written (which would be fairly useless
synchronization in any case).

 or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.

 
 The problem is such synchronization would be invented and added on by
 the user, making it impossible to combine disparate libraries that write
 to stderr, for example, in a multithreading environment.

Most libraries ought not to do so; coding dependencies
on globals into libraries is generally poor design.

The problem is not that users would have to write
synchronization.  Usually they need to do that. A
problem would be if some low-level locking inside
the I/O subsystems gave the impression that the
user did *not* need to synchronize their own code.

It's not quite as simple as this.  One (possibly
killer) argument for building synchronization into
low-level libraries is to reduce the cost of
dealing with support issues from bemused users
who expected not to have to consider thread-safety
when sharing streams between threads.

-- James

Mar 24 2007

Walter Bright <newshound digitalmars.com> writes:

James Dennett wrote:
 Walter Bright wrote:
 That would be true, except that Andrei wrote that
 the guarantee applied to separate processes, and
 that can only be guaranteed if you both use some
 kind of synchronization between the processes *and*
 flush the stream.
 
 Andrei's claim went beyond mere thread-safety, and
 that was what I responded to.

Ok, but since it is typical to do a flush on newline if isatty(), that 
seems to resolve these inter-process problems.

 There's a place:
 
 locked(cout) << a << b;
 
 can be made to do the job, using RAII to lock at the
 start of the expression and unlock at the end.

I don't think it is that easy, see: 
http://docs.sun.com/source/819-3690/Multithread.html
and 
http://www.atnf.csiro.au/computing/software/sol2docs/manuals/c++/lib_ref/MT.html


 Right.  I certainly did not intend to imply that any
 serious design would be silly enough to lock for each
 character written (which would be fairly useless
 synchronization in any case).

It's needed if only to avoid corrupting the I/O buffer itself.

 The problem is such synchronization would be invented and added on by
 the user, making it impossible to combine disparate libraries that write
 to stderr, for example, in a multithreading environment.

 Most libraries ought not to do so; coding dependencies
 on globals into libraries is generally poor design.

I think it is unreasonable to tell users they cannot use standard 
cin/cout/cerr in standard ways in their library code.

 The problem is not that users would have to write
 synchronization.  Usually they need to do that. A
 problem would be if some low-level locking inside
 the I/O subsystems gave the impression that the
 user did *not* need to synchronize their own code.
 
 It's not quite as simple as this.  One (possibly
 killer) argument for building synchronization into
 low-level libraries is to reduce the cost of
 dealing with support issues from bemused users
 who expected not to have to consider thread-safety
 when sharing streams between threads.

I think it is a killer argument. Multithreaded programming is hard 
enough without heaping more burdens on the user.

Mar 24 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

James Dennett wrote:
 Walter Bright wrote:
 James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 cout << a << b;

 can't guarantee that a and b will be adjacent in the output. In
 contrast,

 printf(format, a, b);

 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

 In order for printf to work right it does not need to flush every time
 (you're right in that would lead to terrible performance). The usual
 thing that printf does is only do a flush if isatty() comes back with
 true. In fact, flushing the output at the end of each printf would not
 mitigate multithreading problems at all. In order for printf to be
 thread safe, all that's necessary is for it to acquire/release the C
 stream lock (C's implementation of stdio has a lock associated with each
 stream).

 
 That would be true, except that Andrei wrote that
 the guarantee applied to separate processes, and
 that can only be guaranteed if you both use some
 kind of synchronization between the processes *and*
 flush the stream.
 
 Andrei's claim went beyond mere thread-safety, and
 that was what I responded to.

Lines don't have to appear at exact times, they only must not 
interleave. So printf does not have to flush often. I've used 
printf-level atomicity for a long time on various systems and it works 
perfectly.

Is it a system-dependent assumption? I don't know. It sure is there and is 
very helpful on all systems I used it with.


Andrei

Mar 24 2007

James Dennett <jdennett acm.org> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 Walter Bright wrote:
 James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 cout << a << b;

 can't guarantee that a and b will be adjacent in the output. In
 contrast,

 printf(format, a, b);

 does give that guarantee. Moreover, that guarantee is not between
 separate threads in the same process, it's between whole processes!
 Guess which of the two is usable :o).

 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

 In order for printf to work right it does not need to flush every time
 (you're right in that would lead to terrible performance). The usual
 thing that printf does is only do a flush if isatty() comes back with
 true. In fact, flushing the output at the end of each printf would not
 mitigate multithreading problems at all. In order for printf to be
 thread safe, all that's necessary is for it to acquire/release the C
 stream lock (C's implementation of stdio has a lock associated with each
 stream).

 That would be true, except that Andrei wrote that
 the guarantee applied to separate processes, and
 that can only be guaranteed if you both use some
 kind of synchronization between the processes *and*
 flush the stream.

 Andrei's claim went beyond mere thread-safety, and
 that was what I responded to.

 
 Lines don't have to appear at exact times, they only must not
 interleave. So printf does not have to flush often. I've used
 printf-level atomicity for a long time on various systems and it works
 perfectly.

With sufficiently short lines, where the value of
"sufficiently" depends on which platform and which
kind of file descriptor you're writing to.  printf
is likely to end up calling write with no locking;
write isn't atomic past a certain (or uncertain)
size, and has no reason to make the boundary
coincide with the end of a line.

 Is it a system-dependent assumption? I don't know. It sure is there and is
 very helpful on all systems I used it with.

Can you name one specific system where this is
documented as working reliably, or where it can
be shown to do so?  I've *seen* interleaving
between processes, and lived with it in debugging
code for performance reasons, but for reliable
output have used other mechanisms.  I understood
this to be a widely known problem with printf,
write et al.

-- James

Mar 24 2007

Sean Kelly <sean f4.ca> writes:

James Dennett wrote:
 
 It's not quite as simple as this.  One (possibly
 killer) argument for building synchronization into
 low-level libraries is to reduce the cost of
 dealing with support issues from bemused users
 who expected not to have to consider thread-safety
 when sharing streams between threads.

...since they obviously don't have to consider thread-safety when 
sharing other objects between threads.  I'll admit that a global output 
object might be seen as somehow magic to those who don't really 
understand what 'cout' represents, for example, but how much of a problem 
would this really be?  The argument against building locking into C++ 
containers seems fairly well-settled, so why does there seem to be so 
much contention about output?  Is it that producing predictable behavior 
is easier or that the cost of locking is less of an issue since IO is 
expensive anyway?


Sean

Mar 24 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Sean Kelly wrote:
 James Dennett wrote:
 It's not quite as simple as this.  One (possibly
 killer) argument for building synchronization into
 low-level libraries is to reduce the cost of
 dealing with support issues from bemused users
 who expected not to have to consider thread-safety
 when sharing streams between threads.

 
 ...since they obviously don't have to consider thread-safety when 
 sharing other objects between threads.  I'll admit that a global output 
 object might be seen as somehow magic to those who don't really 
 understand what 'cout' represents, for example, but how much of a problem 
 would this really be?  The argument against building locking into C++ 
 containers seems fairly well-settled, so why does there seem to be so 
 much contention about output?  Is it that producing predictable behavior 
 is easier or that the cost of locking is less of an issue since IO is 
 expensive anyway?

Good question(s). It might also be that the I/O interface is considerably 
simpler than a container interface. The classic example of failure of 
method-level synchronization with containers is

if (!cont.empty()) cont.pop();

With I/O, most of the time, covert synchronization at the call level is 
all you need.
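
The container failure mode can be sketched in C++ as follows, assuming C++11's `std::mutex`; `SyncStack` and `try_pop` are hypothetical names used only for illustration. Even if `empty()` and `pop()` each took the lock, another thread could empty the container between the two calls, so the check and the pop have to be fused into one operation under one lock.

```cpp
#include <mutex>
#include <stack>

// Hypothetical wrapper around std::stack with per-call locking.
template <typename T>
class SyncStack {
public:
    void push(const T& value) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push(value);
    }

    // Safe on its own, but the answer may be stale by the time the caller acts on it.
    bool empty() const {
        std::lock_guard<std::mutex> lock(m_);
        return items_.empty();
    }

    // Check and pop under a single lock acquisition.
    // Returns false and leaves 'out' untouched if the stack was empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        out = items_.top();
        items_.pop();
        return true;
    }

private:
    mutable std::mutex m_;
    std::stack<T> items_;
};
```

An output call such as printf has no comparable check-then-act pair, which is one reading of why call-level locking tends to be enough there.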


Andrei

Mar 24 2007

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 
 D's implementation of writef does the same thing. D's writef also wraps 
 the whole thing in a try-finally, making it exception safe.
 
 Iostreams'
     cout << a << b;
 results in the equivalent of:
     (cout->out(a))->out(b);
 The trouble is, there's no place to hang the lock acquire/release, nor 
 the try-finally. It's a fundamental design problem.

The stream could acquire a lock and pass it to a proxy object which 
closes the lock on destruction.  This would work fine in C++ where the 
lifetime of such objects is deterministic, but the design is incredibly 
awkward.

 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream,

 
 It does exactly one lock acquire/release for each printf, not for each 
 character written.

This is still far too granular for most uses.  About the only time I 
actually use output without explicit synchronization is for throw-away 
debug output.

 or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.

 
 The problem is such synchronization would be invented and added on by 
 the user, making it impossible to combine disparate libraries that write 
 to stderr, for example, in a multithreading environment.

This is a valid point, but how often is it actually used in practice? 
Libraries generally do not perform error output of their own, and 
applications typically have a coherent approach for output.  In my time 
as a programmer, I can't think of a single instance where default 
synchronization to an output device actually mattered.  I can certainly 
appreciate this for its predictable behavior, but I don't know how often 
that predictability would actually matter to me.

 Which is useful for debugging or simplistic logging, but
 not for anything else I've seen.


Exactly.


Sean

Mar 24 2007

Walter Bright <newshound digitalmars.com> writes:

Sean Kelly wrote:
 It does exactly one lock acquire/release for each printf, not for each 
 character written.

 This is still far too granular for most uses.

I disagree. It's been working fine for nearly 20 years now. gcc 
implements it the same way, and it's hardly unusable for most uses.

 The problem is such synchronization would be invented and added on by 
 the user, making it impossible to combine disparate libraries that 
 write to stderr, for example, in a multithreading environment.

 This is a valid point, but how often is it actually used in practice? 
 Libraries generally do not perform error output of their own, and 
 applications typically have a coherent approach for output.  In my time 
 as a programmer, I can't think of a single instance where default 
 synchronization to an output device actually mattered.  I can certainly 
 appreciate this for its predictable behavior, but I don't know how often 
 that predictability would actually matter to me.

It apparently comes up often enough in C++ to merit 59,000 hits on 
"multithreaded iostreams" and many web pages outlining attempts to solve 
the problem.

It is a problem that is solved by every C stdio for multithreaded 
environments, although the C standard does not mention the word "thread".

Multithreading threatens to become far more common, not less, as we move 
to multicore machines.

If that isn't compelling, ok, but I suggest at a minimum that Tango not 
lock into a design that *precludes* adding thread synchronization 
without changing user code.

Mar 24 2007

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 Sean Kelly wrote:
 
 The problem is such synchronization would be invented and added on by 
 the user, making it impossible to combine disparate libraries that 
 write to stderr, for example, in a multithreading environment.

 This is a valid point, but how often is it actually used in practice? 
 Libraries generally do not perform error output of their own, and 
 applications typically have a coherent approach for output.  In my 
 time as a programmer, I can't think of a single instance where default 
 synchronization to an output device actually mattered.  I can 
 certainly appreciate this for its predictable behavior, but I don't 
 know how often that predictability would actually matter to me.

 
 It apparently comes up often enough in C++ to merit 59,000 hits on 
 "multithreaded iostreams" and many web pages outlining attempts to solve 
 the problem.

True enough.  Though I wonder how much of a factor it is that C++ has no 
built-in support for multithreading, and if this has a positive or 
negative effect on the number of questions.

 It is a problem that is solved by every C stdio for multithreaded 
 environments, although the C standard does not mention the word "thread".
 
 Multithreading threatens to become far more common, not less, as we move 
 to multicore machines.
 
 If that isn't compelling, ok, but I suggest at a minimum that Tango not 
 lock into a design that *precludes* adding thread synchronization 
 without changing user code.

True enough.  I suppose that if nothing else, the option for 
synchronized output to stdout, stderr, and stdlog should be somehow 
available without user changes, as you say.


Sean

Mar 25 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

James Dennett wrote:
 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

Numbers clearly show the above is wrong. Here's the thing: I write 
programs that write lines to files. If I use cout, they don't work. If I 
use fprintf, they do work, and 10 times faster. And that's that.

 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream, or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

The C standard library takes care of it without me having to do anything 
in particular.

 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.
 Which is useful for debugging or simplistic logging, but
 not for anything else I've seen.

I do not concur.


Andrei

Mar 24 2007

James Dennett <jdennett acm.org> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

 
 Numbers clearly show the above is wrong. 

Only if they apply to the above.

 Here's the thing: I write
 programs that write lines to files. If I use cout, they don't work. If I
 use fprintf, they do work, and 10 times faster. And that's that.

Except that your test wasn't of the right thing; you
probably didn't test code that guaranteed atomicity
of writes between different processes.

 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream, or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

 
 The C standard library takes care of it without me having to do anything
 in particular.

I've never seen a C library that guarantees atomicity of
writes between processes on a Unix-like system.  The
documentation of some systems does guarantee atomicity
of sufficiently small writes to certain types of file
descriptors, but I've not seen any Unix-like system
that guarantees atomicity for writes of unlimited sizes;
in some cases they can even be interrupted before the
full amount is written.  I've certainly seen the result
of C's *printf *not* being synchronized between processes
on a wide variety of systems.

 Within a process, this level of safety could be achieved
 with only a little (usually redundant) synchronization.
 Which is useful for debugging or simplistic logging, but
 not for anything else I've seen.

 
 I do not concur.

With my description of my own experience?  ;)

-- James

Mar 24 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

James Dennett wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 As you appear to be saying that printf has to flush every
 time it's used, I'd guess that it's unusable for performance
 reasons alone.

 Numbers clearly show the above is wrong. 

 
 Only if they apply to the above.
 
 Here's the thing: I write
 programs that write lines to files. If I use cout, they don't work. If I
 use fprintf, they do work, and 10 times faster. And that's that.

 
 Except that your test wasn't of the right thing; you
 probably didn't test code that guaranteed atomicity
 of writes between different processes.
 
 It's also really hard to implement such a
 guarantee on most platforms without using some kind of
 process-shared mutex, file lock, or similar.  Does printf
 really incur that kind of overhead every time something is
 written to a stream, or does its implementation make use
 of platform-specific knowledge on which writes are atomic
 at the OS level?

 The C standard library takes care of it without me having to do anything
 in particular.

 
 I've never seen a C library that guarantees atomicity of
 writes between processes on a Unix-like system.  The
 documentation of some systems does guarantee atomicity
 of sufficiently small writes to certain types of file
 descriptors, but I've not seen any Unix-like system
 that guarantees atomicity for writes of unlimited sizes;
 in some cases they can even be interrupted before the
 full amount is written.  I've certainly seen the result
 of C's *printf *not* being synchronized between processes
 on a wide variety of systems.

If you did, fine. I take that part of my argument back. I'll also note 
that that doesn't make iostreams any more defensible :o).

Andrei

Mar 24 2007

James Dennett <jdennett acm.org> writes:

Andrei Alexandrescu (See Website For Email) wrote:

[snip]

 I'll also note
 that that doesn't make iostreams any more defensible :o).

Trying to defend IOStreams is certainly a challenge.

I think I've tried enough, given what a sick puppy it
is, and now should leave it to suffer in peace.

-- James

Mar 24 2007

Sean Kelly <sean f4.ca> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 James Dennett wrote:
 Walter Bright wrote:
 Bill Baxter wrote:
 James Dennett wrote:
 Walter Bright wrote:
 It might be harsh, but not entirely unjustified, to say
 that the "conventional wisdom" of many communities of
 programmers is a long, long way from being wise.  As
 the community behind a language grows larger, there is
 a natural tendency for it not to have such a density
 of experts; if D amasses a million users it's a safe
 bet that most of them won't be as sharp as the average
 D user is today.


 D bucks conventional wisdom in more than one way. There's a current
 debate going on among people involved in the next C++ standardization
 effort about whether to include garbage collection or not. The people
 involved are arguably the top tier of C++ programmers.

 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.

 Which "wrong" assertions are those?

 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

 I just find it strange that C++, a language meant for building speedy
 applications, would incorporate iostreams, which is slow, not thread
 safe, and not exception safe.

 I'm intrigued by your claim that IOStreams is not thread-safe;
 the IOStreams framework is thread-safe in the same way that
 the STL is thread-safe.  The one minor difference is that
 IOStreams exposes some global variables, which is unfortunate
 as they can easily be used in inappropriate ways in a
 multi-threaded environment.  Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.  Is there something deeper in IOStreams that
 you consider to be thread-unsafe, or is it just the matter of
 its global variables?

 
 cout << a << b;
 
 can't guarantee that a and b will be adjacent in the output. In contrast,
 
 printf(format, a, b);
 
 does give that guarantee. Moreover, that guarantee is not between 
 separate threads in the same process, it's between whole processes! 
 Guess which of the two is usable :o).

stringstream s;
s << a << b;
cout << s.str();

;-)

 Btw, does tango provide such a guarantee for code such as Cout(a)(b)? 
  From the construct, my understanding is that it doesn't.

No.  There really isn't any way to do automatic locking with chained 
opCall barring the use of proxy objects or something equally nasty. 
Also, it hurts efficiency to always lock regardless of whether the user 
is performing IO in multiple threads.  The preferred method here is:

synchronized( Cout )
     Cout( a )( b )( c )();


Sean

Mar 24 2007

Walter Bright <newshound digitalmars.com> writes:

Sean Kelly wrote:
 There really isn't any way to do automatic locking with chained 
 opCall barring the use of proxy objects or something equally nasty. 
 Also, it hurts efficiency to always lock regardless of whether the user 
 is performing IO in multiple threads.  The preferred method here is:
 
 synchronized( Cout )
     Cout( a )( b )( c )();

The trouble with that design is people working on subsystems or 
libraries, which will be combined by others into a working whole. Since 
it is extra work to add the synchronized statement, odds are pretty good 
it won't happen. Then, the whole gets erratic multithreading performance.

Ideally, things should be inverted so that thread safety is the default 
behavior, and the extra-efficiency-dammit-I-know-what-I'm-doing is the 
extra work.

One way to solve this problem is to use variadic templates as outlined 
in http://www.digitalmars.com/d/variadic-function-templates.html

Back in the early days of Windows NT, when multithreaded programming was 
introduced to a mass platform, C compilers typically shipped with two 
runtime libraries - a single threaded one "for efficiency", and a 
multithreaded one. Also, to do multithreaded code, one had to predefine 
_MT or throw a command line switch. Inevitably, this was overlooked, and 
endless bugs consumed endless time. I made the decision early on to only 
ship threadsafe libraries, and have _MT always on. I've never regretted 
it, I'm sure it saved me a lot of tech support time, and avoided the 
perception that the compiler didn't work with multithreading.

Mar 24 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Sean Kelly wrote:
 There really isn't any way to do automatic locking with chained opCall 
 barring the use of proxy objects or something equally nasty. Also, it 
 hurts efficiency to always lock regardless of whether the user is 
 performing IO in multiple threads.  The preferred method here is:

 synchronized( Cout )
     Cout( a )( b )( c )();

 
 The trouble with that design is people working on subsystems or 
 libraries, which will be combined by others into a working whole. Since 
 it is extra work to add the synchronized statement, odds are pretty good 
 it won't happen. Then, the whole gets erratic multithreading performance.
 
 Ideally, things should be inverted so that thread safety is the default 
 behavior, and the extra-efficiency-dammit-I-know-what-I'm-doing is the 
 extra work.
 
 One way to solve this problem is to use variadic templates as outlined 
 in http://www.digitalmars.com/d/variadic-function-templates.html
 
 Back in the early days of Windows NT, when multithreaded programming was 
 introduced to a mass platform, C compilers typically shipped with two 
 runtime libraries - a single threaded one "for efficiency", and a 
 multithreaded one. Also, to do multithreaded code, one had to predefine 
 _MT or throw a command line switch. Inevitably, this was overlooked, and 
 endless bugs consumed endless time. I made the decision early on to only 
 ship threadsafe libraries, and have _MT always on. I've never regretted 
 it, I'm sure it saved me a lot of tech support time, and avoided the 
 perception that the compiler didn't work with multithreading.

MS does the same now if I remember correctly: all of its libraries are 
MT by default.

I agree with Walter's sentiment that Cout(a)(b) is a design mistake. 
Fortunately, now we have compile-time variadic functions, which will 
make it easy to correct the design - Cout(a, b) can be made just as good 
without having to chase typeinfo's at runtime.


Andrei

Mar 24 2007

Sean Kelly <sean f4.ca> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Back in the early days of Windows NT, when multithreaded programming 
 was introduced to a mass platform, C compilers typically shipped with 
 two runtime libraries - a single threaded one "for efficiency", and a 
 multithreaded one. Also, to do multithreaded code, one had to 
 predefine _MT or throw a command line switch. Inevitably, this was 
 overlooked, and endless bugs consumed endless time. I made the 
 decision early on to only ship threadsafe libraries, and have _MT 
 always on. I've never regretted it, I'm sure it saved me a lot of tech 
 support time, and avoided the perception that the compiler didn't work 
 with multithreading.

 
 MS does the same now if I remember correctly: all of its libraries are 
 MT by default.

Yup.  In fact, I just discovered that Visual Studio 2005 doesn't even 
provide a single-threaded build option any more.  In some ways it's a 
relief because it's allowed me to drop two build options and remove a 
bunch of #if defined(_MT) clauses.

 I agree with Walter's sentiment that Cout(a)(b) is a design mistake. 
 Fortunately, now we have compile-time variadic functions, which will 
 make it easy to correct the design - Cout(a, b) can be made just as good 
 without having to chase typeinfo's at runtime.

Agreed.


Sean

Mar 24 2007

Walter Bright <newshound digitalmars.com> writes:

James Dennett wrote:
 Walter Bright wrote:
 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.

 Which "wrong" assertions are those?

gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for 
mission critical industrial apps, gc is for academic unusable languages, 
etc.

 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

 I just find it strange that C++, a language meant for building speedy
 applications, would incorporate iostreams, which is slow, not thread
 safe, and not exception safe.

 
 I'm intrigued by your claim that IOStreams is not thread-safe;
 the IOStreams framework is thread-safe in the same way that
 the STL is thread-safe.  The one minor difference is that
 IOStreams exposes some global variables, which is unfortunate
 as they can easily be used in inappropriate ways in a
 multi-threaded environment.

Note the reliance here on global state that is neither thread nor 
exception safe:

std::ios_base::fmtflags flags_save = std::cout.flags();
std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" << 
789 << std::endl;
std::cout.flags(flags_save);


 Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.

That's not an excuse, as 1) multithreading was common long before C++98 
was written and 2) multithreading and exception safety were thought about 
and accounted for in much of the rest of the library design, despite 
threading not being official.

 Is there something deeper in IOStreams that
 you consider to be thread-unsafe, or is it just the matter of
 its global variables?

All I can do is point to the example above.

Mar 24 2007

James Dennett <jdennett acm.org> writes:

Walter Bright wrote:
 James Dennett wrote:
 Walter Bright wrote:
 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.

 Which "wrong" assertions are those?

 
 gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for
 mission critical industrial apps, gc is for academic unusable languages,
 etc.

I've seen only a minority of those claims made as part
of the C++ committee discussions of GC.  However:

GC *is* often used as a crutch by programmers who cannot
or do not want to take time to make a design in which
ownership is clear.

GC is unsuitable for *some* types of mission critical
applications.

These are true.  It's also true that:

Effective use of GC is not restricted to lazy/sloppy/
less capable programmers, and can be used by experts
to produce software that is more reliable in certain
ways.

GC is suitable for some types of mission critical
applications.

GC can affect performance, either positively or
negatively.

GC can affect memory footprint.

Working with GC can cause resource management issues
because many programmers are often tempted to think
less carefully about these issues when there is a
garbage collector to mitigate some of the damage.

Almost all discussions of the pros and cons of GC are
simplistic and unbalanced.

 I think there is a tendency to assume that APIs and languages which
 have (A) been around a long time and
 (B) been used by millions of people
 will probably be close to optimal.  It just makes sense that that
 would be the case.  Unfortunately, it's all too often just not true.

 I just find it strange that C++, a language meant for building speedy
 applications, would incorporate iostreams, which is slow, not thread
 safe, and not exception safe.

 I'm intrigued by your claim that IOStreams is not thread-safe;
 the IOStreams framework is thread-safe in the same way that
 the STL is thread-safe.  The one minor difference is that
 IOStreams exposes some global variables, which is unfortunate
 as they can easily be used in inappropriate ways in a
 multi-threaded environment.

 
 Note the reliance here on global state that is neither thread nor
 exception safe:
 
 std::ios_base::fmtflags flags_save = std::cout.flags();
 std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" <<
 789 << std::endl;
 std::cout.flags(flags_save);

True, exception-safety is an issue.  There is not
a threading issue in the code above *except* that
it uses a global variable without synchronization;
you explicitly coded reliance on global state, by
using a global variable.  Unfortunately that's
easily done with the IOStreams interface.
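
The exception-safety half of this exchange can be illustrated with a scope guard. What follows is a minimal sketch, not anything proposed in the thread: `FlagsGuard`, `print_fields` and `fields_as_string` are hypothetical names, and the guard only restores the format flags. It does nothing about two threads sharing one stream.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the standard library): saves a stream's
// format flags on construction and restores them on destruction, so the
// restore also runs when an exception unwinds the stack.
class FlagsGuard {
public:
    explicit FlagsGuard(std::ios_base& s) : stream_(s), saved_(s.flags()) {}
    ~FlagsGuard() { stream_.flags(saved_); }
    FlagsGuard(const FlagsGuard&) = delete;
    FlagsGuard& operator=(const FlagsGuard&) = delete;

private:
    std::ios_base& stream_;
    std::ios_base::fmtflags saved_;
};

// Writes the same fields as the snippet quoted above, to any output stream.
void print_fields(std::ostream& out) {
    const std::ios_base::fmtflags before = out.flags();
    {
        FlagsGuard guard(out);
        out << 123 << '|' << std::left << std::setw(8) << 456 << "|" << 789 << '\n';
    }  // guard restores the flags here, even if an operator<< had thrown
    assert(out.flags() == before);
}

// Formats into a string so the result is easy to inspect.
std::string fields_as_string() {
    std::ostringstream os;
    print_fields(os);
    return os.str();
}
```

Calling `print_fields(std::cout)` behaves like the quoted snippet, except that the flags are restored on every exit path.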

 Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.

 
 That's not an excuse, as 1) multithreading was common long before C++98
 was written and 2) multithreading and exception safety were thought about
 and accounted for in much of the rest of the library design, despite
 threading not being official.

I wasn't aiming to make an excuse.  I was merely noting
that it's not surprising.  IOStreams was old before the
1998 standard was published; this was a case of the
standards committee doing what it was supposed to do,
i.e., standardizing existing practice.

 Is there something deeper in IOStreams that
 you consider to be thread-unsafe, or is it just the matter of
 its global variables?

 
 All I can do is point to the example above.

I see exception-safety issues, but no threading issue
apart from *if* your code fails to synchronize access
to a global variable.  So far as I can tell, there are
not thread-safety issues unless multiple threads share
a stream without synchronization (which is just as
much of a defect as if they shared a container without
synchronization).

Automatic synchronization tends to be at the wrong
level, just as in the case of containers etc.  Most
often in robust code it's redundant to make a stream
synchronize itself.

Anyway, I was just hoping to find out something I
didn't already know.  One thing we do know is that
IOStreams is not the gold standard for I/O interfaces,
though it does have strengths in extensibility and
type-safety compared to the alternatives in most
C-like languages.

-- James

Mar 24 2007

Walter Bright <newshound digitalmars.com> writes:

James Dennett wrote:
 Walter Bright wrote:
 James Dennett wrote:
 Walter Bright wrote:
 But still, there are one or two that repeat the conventional (and wrong)
 wisdom about garbage collection. Such conventional wisdom is much more
 common among the general population of C++ programmers.

 Which "wrong" assertions are those?

 gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for
 mission critical industrial apps, gc is for academic unusable languages,
 etc.

 
 I've seen only a minority of those claims made as part
 of the C++ committee discussions of GC.

I think we're in agreement, as I said "one or two", and that such claims 
are not made in general by the top tier of C++ programmers.

 Almost all discussions of the pros and cons of GC are
 simplistic and unbalanced.

It's not humanly possible to mention every pro and every con in every 
discussion. Nobody is making a claim of absolutes, either. For every 
example, sure, you can find a counter-example.

That doesn't mean one cannot have a meaningful discussion about the pros 
and cons of adding gc, and it doesn't mean we can't dismiss certain 
arguments against gc, like it being a crutch for lazy programmers.

 I'm intrigued by your claim that IOStreams is not thread-safe;
 the IOStreams framework is thread-safe in the same way that
 the STL is thread-safe.  The one minor difference is that
 IOStreams exposes some global variables, which is unfortunate
 as they can easily be used in inappropriate ways in a
 multi-threaded environment.

 Note the reliance here on global state that is neither thread nor
 exception safe:

 std::ios_base::fmtflags flags_save = std::cout.flags();
 std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" <<
 789 << std::endl;
 std::cout.flags(flags_save);

 
 True, exception-safety is an issue.  There is not
 a threading issue in the code above *except* that
 it uses a global variable without synchronization;
 you explicitly coded reliance on global state, by
 using a global variable.  Unfortunately that's
 easily done with the IOStreams interface.

That's the design of iostreams - reliance on global state with no 
multithreading protection. Using std::left is not a mistake on my part, 
it is a feature of iostreams. Also,
	cout << a << b;
has multithreading problems as well: if two threads are writing to 
stdout, the output of a and b can be interleaved with the other thread's 
output. Note that:
	writefln(a, b);
is both exception safe and thread safe - there will be no interleaving 
of output.
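
For comparison, one common way to get the same no-interleaving guarantee with iostreams is to format the whole line privately and emit it in a single locked write. This is a sketch under assumptions: `write_line` and `output_mutex` are hypothetical names, and the guarantee only holds if every thread writes through this function.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical process-wide lock shared by all callers of write_line.
inline std::mutex& output_mutex() {
    static std::mutex m;
    return m;
}

// Hypothetical helper: formats all arguments into a private buffer, then
// emits the finished line with one insertion while holding the lock.
template <typename... Args>
void write_line(std::ostream& out, const Args&... args) {
    std::ostringstream line;  // formatting state is local to this call
    using expand = int[];
    (void)expand{0, ((void)(line << args), 0)...};  // line << a << b << ...
    line << '\n';
    const std::string text = line.str();
    assert(!text.empty() && text.back() == '\n');  // exactly one finished line
    std::lock_guard<std::mutex> lock(output_mutex());
    out << text;
}
```

With this, `write_line(std::cout, a, b)` plays the role of `writefln(a, b)` in the sense discussed here. The difference is that the synchronization is opt-in rather than built into the stream.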

 Then again, that is unsurprising
 as C++ does not yet officially incorporate support for
 multi-threading.

 That's not an excuse, as 1) multithreading was common long before C++98
 was written and 2) multithreading and exception safety were thought about
 and accounted for in much of the rest of the library design, despite
 threading not being official.

 
 I wasn't aiming to make an excuse.  I was merely noting
 that it's not surprising.  IOStreams was old before the
 1998 standard was published; this was a case of the
 standards committee doing what it was supposed to do,
 i.e., standardizing existing practice.

Iostreams was substantially redesigned for C++98. Iostreams has 
undergone two major, incompatible overhauls since it originally debuted. 
You can see the old ones in DMC++'s <iostream.h> and <oldstr/stream.h>.

 I see exception-safety issues, but no threading issue
 apart from *if* your code fails to synchronize access
 to a global variable.  So far as I can tell, there are
 not thread-safety issues unless multiple threads share
 a stream without synchronization (which is just as
 much of a defect as if they shared a container without
 synchronization).

You can use C's stdio and D's stdio (and even mix them) without 
exception safety problems or need for the user to supply any 
synchronization.

 Automatic synchronization tends to be at the wrong
 level, just as in the case of containers etc.  Most
 often in robust code it's redundant to make a stream
 synchronize itself.
 
 Anyway, I was just hoping to find out something I
 didn't already know.  One thing we do know is that
 IOStreams is not the gold standard for I/O interfaces,
 though it does have strengths in extensibility and
 type-safety compared to the alternatives in most
 C-like languages.

I agree it has strengths in extensibility and type-safety. But I set 
that against its poor performance, exception unsafety, and threading 
problems, and conclude it is not a design that should be emulated.

Mar 24 2007

0ffh <spam frankhirsch.net> writes:

Walter Bright wrote:
 Which "wrong" assertions are those?

 gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for 
 mission critical industrial apps, gc is for academic unusable languages, 
 etc.

I admit I used to think along similar lines, quite a while ago.
What made me change my mind was that Greenspun's Tenth Rule also
includes GC: I find that doing the dynamic memory management myself
results not only in bigger and more fragile source code, but also
may perform worse than GC unless I go about it very warily.

I think it is just not efficient to put a lot of work into that
with every application - it's much more efficient if somebody
solves the problem *once*, and properly, and that's that.

Happy hacking,

   Frank

p.s.
Thanks for your work... ;-)

Mar 25 2007

Dan <murpsoft hotmail.com> writes:

 Walter Bright wrote:
 Which "wrong" assertions are those?

 gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for 
 mission critical industrial apps, gc is for academic unusable languages, 
 etc.


0ffh Wrote:
 I admit I used to think along similar lines, quite a while ago.
 What made me change my mind was that Greenspun's Tenth Rule also
 includes GC: I find that doing the dynamic memory management myself
 results not only in bigger and more fragile source code, but also
 may perform worse than GC unless I go about it very warily.
 
 I think it is just not efficient to put a lot of work into that
 with every application - it's much more efficient if somebody
 solves the problem *once*, and properly, and that's that.

I totally agree that GC is a solid way of cutting bad code, which performs far
worse than the usually trivial overhead of having a GC.  

I do think though that it should be somewhat easier to declare something as not
being under the gc's influence so that when we want to be wary and we're
scratching for an extra 10% performance in a loop, we can do so more readily.

~~

At first I was astonished to see my 26kb source compiled to a whopping 82kb.  I
was wondering if it imported all of phobos...
Now I've realized that that extra mass did all the dynamic array stuff,
associative array stuff, gc and phobos.  Things that would have taken me just
as much in source to write...

Mar 26 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 
 Can you distill the benefits of retaining CR on a readline, please?

 
 I am pasting fragments from an email to Walter. He suggested this at a 
 point, and I managed to persuade him to keep the newline in there.
 
 Essentially it's about information. The naive loop:
 
 while (readln(line)) {
    write(line);
 }
 
 is guaranteed 100% to produce an accurate copy of its input. The version 
 that chops lines looks like:
 
 while (readln(line)) {
    writeln(line);
 }
 
 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte. This is the kind of imprecision that makes the 
 difference between a well-designed API and an almost-good one. Moreover, 
 with the automated chopping it is basically impossible to write a 
 program that exactly reproduces its input because readln essentially 
 loses information.


And exactly how often do people need to write this program? I would have
thought that the need to exactly reproduce the input is kind of rare,
because most programs read stuff to manipulate or deduce things from it,
and not to replicate it.
 
 Also, stdio also offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:
 
 char[] line;
 while ((line = readln()).length > 0) {
    ++dictionary[line];
 }
 
 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is another 
 important point.) An API that uses automated chopping should not offer 
 such a function because an empty line may mean that an empty line was 
 read, or that it's eof time. So the API would force people to write 
 convoluted code.

By "convoluted", you mean something like this ...

  char[] line;
  while ( io.readln(line) == io.Success )
  {
     ++dictionary[line];
  }
   


 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

And yet my code nearly always looks like ...

   line = trim_right(readln());

because I then have to parse the data contained in the line and white space
(blank, tab and new line) at the end of a line is just usually cruft. On
the other hand, as I have to trim the line anyhow, I guess it doesn't
matter if the routine ensures a new line or not. 

Another interesting twist is that some text files omit the new-line on the
last line in the file.

 Ever tried to do cin or fscanf? You can't do any intelligent input with 
 them because they skip whitespace and newlines like it's out of style. 
 All of my C++ applications use getline() or fgets() (both of which 
 thankfully do include the newline) and then process the line in-situ.

I conclude that we tend to write different types of apps.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/03/2007 10:55:43 AM

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Derek Parnell wrote:
 On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For
 Email) wrote:
 
 Can you distill the benefits of retaining CR on a readline, please?

 I am pasting fragments from an email to Walter. He suggested this at a 
 point, and I managed to persuade him to keep the newline in there.

 Essentially it's about information. The naive loop:

 while (readln(line)) {
    write(line);
 }

 is guaranteed 100% to produce an accurate copy of its input. The version 
 that chops lines looks like:

 while (readln(line)) {
    writeln(line);
 }

 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte. This is the kind of imprecision that makes the 
 difference between a well-designed API and an almost-good one. Moreover, 
 with the automated chopping it is basically impossible to write a 
 program that exactly reproduces its input because readln essentially 
 loses information.

 
 
 And exactly how often do people need to write this program? I would have
 thought that the need to exactly reproduce the input is kind of rare,
 because most programs read stuff to manipulate or deduce things from it,
 and not to replicate it.

Of course. It's not about reproducing the input exactly, but about 
having all of the information in the input available to the program.

 Also, stdio also offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
    ++dictionary[line];
 }

 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is another 
 important point.) An API that uses automated chopping should not offer 
 such a function because an empty line may mean that an empty line was 
 read, or that it's eof time. So the API would force people to write 
 convoluted code.

 
 By "convoluted", you mean something like this ...
 
   char[] line;
   while ( io.readln(line) == io.Success )
   {
      ++dictionary[line];
   }

I said that the API would force people to write convoluted code if it 
wanted to offer char[] readln(). Consequently, your code is buggy in the 
likely case io.readln overwrites its buffer, which is mute testimony to 
the validity of my point :o).

 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

 
 And yet my code nearly always looks like ...
 
    line = trim_right(readln());

I often do that too. And I'm glad I can remove information I don't need, 
because clearly I couldn't add back information I've lost.

It should be pointed out that my point generalizes to more than 
newlines. I plan to add to phobos two routines that efficiently and 
atomically implement the following:

read_delim(FILE*, char[] buf, dchar delim);

and

read_delim(FILE*, char[] buf, char delim[]);

For such functions, particularly the last one, it is vital that the 
delimiter is KEPT in the resulting buffer.


Andrei

Mar 21 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Wed, 21 Mar 2007 17:21:40 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 Derek Parnell wrote:
 Also, stdio also offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
    ++dictionary[line];
 }

 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is another 
 important point.) An API that uses automated chopping should not offer 
 such a function because an empty line may mean that an empty line was 
 read, or that it's eof time. So the API would force people to write 
 convoluted code.

 
 By "convoluted", you mean something like this ...
 
   char[] line;
   while ( io.readln(line) == io.Success )
   {
      ++dictionary[line];
   }

 
 I said that the API would force people to write convoluted code if it 
 wanted to offer char[] readln(). Consequently, your code is buggy in the 
 likely case io.readln overwrites its buffer, which is mute testimony to 
 the validity of my point :o).

Actually you said "stdio also offers a readln() that creates a new line on
every call" and so does my fictitious "io.readln(line)".  It cannot
overwrite its buffer because it creates the buffer. 

  io.Status readln(out char[] pBuffer)
  {
     pBuffer.length = io.FirstGuessLength;
     
     // Note: This routine expands/contracts the buffer as required.
     fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer);

     // If I get this far then the low-level I/O system didn't fail me.
     return io.Success;
  }
 
 It should be pointed out that my point generalizes to more than 
 newlines. I plan to add to phobos two routines that efficiently and 
 atomically implement the following:
 
 read_delim(FILE*, char[] buf, dchar delim);

 and
 
 read_delim(FILE*, char[] buf, char delim[]);
 
 For such functions, particularly the last one, it is vital that the 
 delimiter is KEPT in the resulting buffer.

And that would be because it stops at the leftmost 'delim' that is
contained in "char[] delim" so the caller needs to know which one stopped
the input stream? I presume that this would support Unicode characters too?

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/03/2007 11:26:34 AM

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Derek Parnell wrote:
 Actually you said "stdio also offers a readln() that creates a new line on
 every call" and so does my fictitious "io.readln(line)".  It cannot
 overwrite its buffer because it creates the buffer. 
 
   io.Status readln(out char[] pBuffer)
   {
      pBuffer.length = io.FirstGuessLength;
      
      // Note: This routine expands/contracts the buffer as required.
      fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer);
 
      // If I get this far then the low-level I/O system didn't fail me.
      return io.Success;
   }

Fine. It's just not clear what readln does from its signature. In 
contrast, stdio offers size_t readln(char[]) and char[] readln(), with 
clear semantics.

 It should be pointed out that my point generalizes to more than 
 newlines. I plan to add to phobos two routines that efficiently and 
 atomically implement the following:

 read_delim(FILE*, char[] buf, dchar delim);

 and

 read_delim(FILE*, char[] buf, char delim[]);

 For such functions, particularly the last one, it is vital that the 
 delimiter is KEPT in the resulting buffer.

 
 And that would be because it stops at the leftmost 'delim' that is
 contained in "char[] delim" so the caller needs to know which one stopped
 the input stream? I presume that this would support Unicode characters too?

It's the other way around: you read until the _last_ character of the 
delimiter, and you look back in the buffer. If the buffer has the 
delimiter as suffix, you're done. Otherwise, repeat (while appending to 
the buffer). This should work with Unicode streams too, although I'm not 
an expert in the matter.

My point is that at end-of-file you may want to know whether the 
delimiter was correctly present, as is required in certain protocols.
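
The procedure described above can be sketched in C++ as follows. This is an illustration under assumptions, not the planned Phobos routine: the name `read_delim` is borrowed from the post, but the signature (a `std::istream` and a `std::string` buffer, with a `bool` result) is made up for the example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical C++ analogue of the proposed read_delim: appends characters
// from `in` to `buf` until `buf` ends with `delim` or the input runs out.
// The delimiter is KEPT in `buf`. Returns true only if it was found.
bool read_delim(std::istream& in, std::string& buf, const std::string& delim) {
    assert(!delim.empty());
    buf.clear();
    const char last = delim.back();
    std::string chunk;
    // Read up to the last character of the delimiter, then look back.
    while (std::getline(in, chunk, last)) {
        buf += chunk;
        if (in.eof()) {
            return false;  // input ended before `last` was seen
        }
        buf += last;  // getline consumed `last` but did not store it
        if (buf.size() >= delim.size() &&
            buf.compare(buf.size() - delim.size(), delim.size(), delim) == 0) {
            return true;  // buffer has the delimiter as suffix: done
        }
        // Otherwise `last` was a false alarm: keep appending.
    }
    return false;  // nothing left to read
}
```

Because the delimiter stays in the buffer, a caller at end-of-file can tell a properly terminated final record (`buf` ends with `delim`) from a truncated one (`buf` is non-empty but does not).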


Andrei

Mar 21 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Wed, 21 Mar 2007 17:57:51 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 Derek Parnell wrote:
 Actually you said "stdio also offers a readln() that creates a new line on
 every call" and so does my fictitious "io.readln(line)".  It cannot
 overwrite its buffer because it creates the buffer. 
 
   io.Status readln(out char[] pBuffer)
   {
      pBuffer.length = io.FirstGuessLength;
      
      // Note: This routine expands/contracts the buffer as required.
      fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer);
 
      // If I get this far then the low-level I/O system didn't fail me.
      return io.Success;
   }

 
 Fine. It's just not clear what readln does from its signature. In 
 contrast, stdio offers size_t readln(char[]) and char[] readln(), with 
 clear semantics.

 
 read_delim(FILE*, char[] buf, char delim[]);



 
 It's the other way around: 

Right ... it was the "from its signature ... with clear semantics" that had
me fooled.

 My point is that at end-of-file you may want to know whether the 
 delimiter was correctly present, as is required in certain protocols.

Yes. A very good point indeed.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/03/2007 12:07:34 PM

Mar 21 2007

Roberto Mariottini <rmariottini mail.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
[...]
 Can you distill the benefits of retaining CR on a readline, please?

 
 I am pasting fragments from an email to Walter. He suggested this at a 
 point, and I managed to persuade him to keep the newline in there.

I suspect Walter was thinking of something else at the time.

 Essentially it's about information. The naive loop:
 
 while (readln(line)) {
   write(line);
 }

I'm completely against that awful mess of code.

 is guaranteed 100% to produce an accurate copy of its input. The version 
 that chops lines looks like:
 
 while (readln(line)) {
   writeln(line);
 }
 
 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte.

Are you sure? Can you elaborate more on this?

 This is the kind of imprecision that makes the
 difference between a well-designed API and an almost-good one. Moreover, 
 with the automated chopping it is basically impossible to write a 
 program that exactly reproduces its input because readln essentially 
 loses information.

Same here.

 Also, stdio also offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:
 
 char[] line;
 while ((line = readln()).length > 0) {
   ++dictionary[line];
 }

This way you'll get two different dictionaries on Windows and on Unix.
Wrong, very wrong.

 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is another 
 important point.) An API that uses automated chopping should not offer 
 such a function because an empty line may mean that an empty line was 
 read, or that it's eof time. So the API would force people to write 
 convoluted code.

What is your definition of "convoluted"?
I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.

 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

Perl is something the world should get rid of, quickly.
Perl is wrong, Perl is evil, Perl is useless.
You don't need Perl, try to cease using it.

The fact that this narrow-minded idea comes from Perl is not surprising.

 Ever tried to do cin or fscanf? You can't do any intelligent input with 
 them because they skip whitespace and newlines like it's out of style. 

I use them, and I find them very comfortable.
Again your definition of 'intelligent' is particular.
If you find Perl 'intelligent', this says a lot.

 All of my C++ applications use getline() or fgets() (both of which 
 thankfully do include the newline) and then process the line in-situ.

You obviously program only for one single platform.
Being portable is way more complex than this.

Ciao

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Roberto Mariottini wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 [...]
 Can you distill the benefits of retaining CR on a readline, please?

 I am pasting fragments from an email to Walter. He suggested this at a 
 point, and I managed to persuade him to keep the newline in there.

 I suspect Walter was thinking of something else at the time.

 Essentially it's about information. The naive loop:

 while (readln(line)) {
   write(line);
 }

 I'm completely against that awful mess of code.

What exactly would be bad about it?

 is guaranteed 100% to produce an accurate copy of its input. The 
 version that chops lines looks like:

 while (readln(line)) {
   writeln(line);
 }

 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte.

 Are you sure? Can you elaborate more on this?

Very simple. If the file ends with a newline, the code reproduces it. If 
not, the code gratuitously appends a newline.
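
The difference can be reproduced in C++ with a small sketch. Assumptions: `std::getline` stands in for a chopping reader, and `read_line_keep`, `copy_chopped` and `copy_kept` are hypothetical helpers written for this illustration. Strings are used instead of files so the results are easy to compare.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Chopping reader: std::getline drops the '\n', so the loop puts one back
// after every line, like the writeln(line) version.
std::string copy_chopped(const std::string& input) {
    std::istringstream in(input);
    std::string line, out;
    while (std::getline(in, line)) {
        out += line;
        out += '\n';
    }
    return out;
}

// Hypothetical keeping reader: stores the '\n' if there was one.
// An empty result therefore can only mean end of input.
bool read_line_keep(std::istream& in, std::string& line) {
    line.clear();
    char c;
    while (in.get(c)) {
        line += c;
        if (c == '\n') {
            break;
        }
    }
    return !line.empty();
}

// Keeping reader plus plain write, like the write(line) version.
std::string copy_kept(const std::string& input) {
    std::istringstream in(input);
    std::string line, out;
    while (read_line_keep(in, line)) {
        out += line;
    }
    assert(out == input);  // the copy is exact, terminator or not
    return out;
}
```

For the input `"a\nb"` (no final newline), `copy_chopped` returns `"a\nb\n"`, one byte longer, while `copy_kept` returns the input unchanged. For `"a\nb\n"` both return the input.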

  > This is the kind of imprecision that makes the
 difference between a well-designed API and an almost-good one. 
 Moreover, with the automated chopping it is basically impossible to 
 write a program that exactly reproduces its input because readln 
 essentially loses information.

 Same here.

 Also, stdio also offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
   ++dictionary[line];
 }

 This way you'll get two different dictionaries on Windows and on Unix.
 Wrong, very wrong.

Yes, wrong, very wrong. Except it's not me who's wrong :o).

 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is 
 another important point.) An API that uses automated chopping should 
 not offer such a function because an empty line may mean that an empty 
 line was read, or that it's eof time. So the API would force people to 
 write convoluted code.

 What is your definition of "convoluted"?
 I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.

You are objectively wrong. The code is portable. Newline translation 
takes care of it. Just try it.

 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

 Perl is something the world should get rid of, quickly.
 Perl is wrong, Perl is evil, Perl is useless.
 You don't need Perl, try to cease using it.

 The fact that this narrow-minded idea comes from Perl is not surprising.

What can I say? Thanks! I'm enlightened!

 Ever tried to do cin or fscanf? You can't do any intelligent input 
 with them because they skip whitespace and newlines like it's out of 
 style. 

 I use them, and I find them very comfortable.
 Again your definition of 'intelligent' is particular.
 If you find Perl 'intelligent', this says a lot.

To each their own :o). Oh, probably you could explain how I can read a 
string containing spaces, followed by ":" and a number with scanf. Takes 
one line in Perl and D's readfln (not yet distributed).

 All of my C++ applications use getline() or fgets() (both of which 
 thankfully do include the newline) and then process the line in-situ.

 You obviously program only for one single platform.
 Being portable is way more complex than this.

Yep, I saw that :o).

Andrei

Mar 22 2007

Roberto Mariottini <rmariottini mail.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 Roberto Mariottini wrote:

[...]
 Essentially it's about information. The naive loop:

 while (readln(line)) {
   write(line);
 }

 I'm completely against that awful mess of code.

 
 What exactly would be bad about it?

It's not evident to a non-expert programmer that a newline is kept at 
the end of each line.
Take any programmer from any language of your choice and ask what this 
snippet is supposed to do.
This goes against immediate comprehension of code.

 is guaranteed 100% to produce an accurate copy of its input. The 
 version that chops lines looks like:

 while (readln(line)) {
   writeln(line);
 }

 This may or may not add a newline to the output, possibly creating a 
 file larger by one byte.

 Are you sure? Can you elaborate more on this?

 
 Very simple. If the file ends with a newline, the code reproduces it. If 
 not, the code gratuitously appends a newline.

A newline is two bytes here.

 Moreover, with the automated chopping it is basically impossible to 
 write a program that exactly reproduces its input because readln 
 essentially loses information.



A text file is not a binary file.
A newline at end of file is completely irrelevant.

On the other hand, no code should break whether the last newline is there or 
not. The problem with your code is that the last line comes out different 
from the others.

 Also, stdio also offers a readln() that creates a new line on every 
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
   ++dictionary[line];
 }

 This way you'll get two different dictionaries on Windows and on Unix.
 Wrong, very wrong.

 
 Yes, wrong, very wrong. Except it's not me who's wrong :o).

Ehm, can you elaborate on how good it is to put a '\n' at the end of 
every string when working with:

  - databases
  - communication programs
  - interprocess communication
  - distributed computing

 The code _just works_ because an empty line means _precisely_ and 
 without the shadow of a doubt that the file has ended. (An I/O error 
 throws an exception, and does NOT return an empty line; that is 
 another important point.) An API that uses automated chopping should 
 not offer such a function because an empty line may mean that an 
 empty line was read, or that it's eof time. So the API would force 
 people to write convoluted code.

 What is your definition of "convoluted"?
 I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.

 
 You are objectively wrong. 

Say 'subjectively'.
Assignments in boolean expressions should be avoided. The average 
programmer knows something about this magic, but fears to touch it, and 
never completely understands it.

Still, any programmer from any language would think that this code ends 
at the first empty line.

Here is one of the many possible non-convoluted versions:

char[] line = readln();
while (line.length > 0) {
   ++dictionary[chomp(line)];
   line = readln();
}

And this is how it should be:

char[] line = readln();
while (line != null) {
   ++dictionary[line];
   line = readln();
}
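
A side note on the `line != null` test in the loop above, as a minimal sketch assuming current D array semantics: `==` and `!=` compare array contents, so an empty (chopped) line compares equal to null, while `is` compares pointer and length. The helper names below are hypothetical and only illustrate the difference.

```d
import std.stdio;

// Hypothetical helpers: two ways a loop could test for "no more input".
bool stopsByEquality(const(char)[] line) { return line == null; } // compares contents
bool stopsByIdentity(const(char)[] line) { return line is null; } // compares pointer and length

void main()
{
    const(char)[] blank = "";    // a chopped blank line
    const(char)[] atEof = null;  // what a chopping readln might return at end of file

    writeln(stopsByEquality(blank)); // true: a != null loop would stop at a blank line
    writeln(stopsByIdentity(blank)); // false
    writeln(stopsByIdentity(atEof)); // true
}
```

Under that assumption, a chopping readln would need an identity test (or a separate eof flag) to tell a blank line from the end of the file.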

 The code is portable. Newline translation 
 takes care of it. Just try it.

Newline translation is an old problem with C, C++ and now with D.
Nothing can be resolved with newline translation.

Opening a file in binary mode on Unix and treating it like a text file 
works only as long as the program is run on Unix.
Newline translation is prone to portability errors, thus non-portable.

In my experience, newline translations pose more portability problems 
than they solve.

 In the couple of years I've used Perl I've thanked the Perl folks for 
 their readline decision numerous times.

 Perl is something the world should get rid of, quickly.
 Perl is wrong, Perl is evil, Perl is useless.
 You don't need Perl, try to cease using it.

 The fact that this narrow-minded idea comes from Perl is not surprising.

 
 What can I say? Thanks! I'm enlightened!

You'd be more enlightened if you had to work with big CGI scripts 
written in Perl, and eventually had to convert them to JSP so that the 
average (available) programmers could work on them.

Sure, with Perl you can do many things in less than 10 lines.
But keep it under 10 lines, or you are in trouble.

 Ever tried to do cin or fscanf? You can't do any intelligent input 
 with them because they skip whitespace and newlines like it's out of 
 style. 

 I use them, and I find them very comfortable.
 Again your definition of 'intelligent' is particular.
 If you find Perl 'intelligent', this says a lot.

 
 To each their own :o). Oh, probably you could explain how I can read a 
 string containing spaces, followed by ":" and a number with scanf. Takes 
 one line in Perl and D's readfln (not yet distributed).

scanf(" :%d", &i);
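
For reference, one way to capture the text before the colon as well is a scanset. The following is only a sketch: it calls C's sscanf through D's core.stdc.stdio binding, and the function name `parseLabelled`, the 64-byte buffer, and the sample input are assumptions made for illustration.

```d
import core.stdc.stdio : sscanf;
import std.stdio : writefln;
import std.string : fromStringz;

/// Parses "<text>:<number>" with a scanf scanset; returns true on success.
/// `text` receives a slice of `buf`, so `buf` must outlive it.
bool parseLabelled(const(char)* input, ref char[64] buf,
                   out const(char)[] text, out int number)
{
    buf[] = '\0';
    // " %63[^:]" skips leading whitespace, then copies up to 63 characters
    // that are not ':'; the literal ':' and %d consume the rest.
    if (sscanf(input, " %63[^:]:%d", buf.ptr, &number) != 2)
        return false;
    text = fromStringz(buf.ptr);
    return true;
}

void main()
{
    char[64] buf;
    const(char)[] text;
    int number;
    if (parseLabelled("hello world:42", buf, text, number))
        writefln("text=[%s] number=%s", text, number); // text=[hello world] number=42
}
```

The width limit (63) keeps the scanset from overrunning the buffer, which is the usual hazard with `%[`.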

Ciao

Mar 23 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For Email)
<SeeWebsiteForEmail erdani.org> wrote:

 Essentially it's about information. The naive loop:

 while (readln(line)) {
    write(line);
 }

 is guaranteed 100% to produce an accurate copy of its input. The version
 that chops lines looks like:

 while (readln(line)) {
    writeln(line);
 }

I'd just like to say that the chosen naming convention seems a bit unintuitive
to me for the following reasons:

1) it seems odd that what you read with readln(), you need to write with
write() and not writeln().
2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's
ReadLn doesn't preserve line endings.
3) in my personal experience (of a number of smaller and larger console
applications), it's much more often that I need to work with the contents of
lines (without line endings), rather than with. If you need to copy data while
preserving line endings, I would recommend using binary buffers for files - and
I've no idea why you would use standard input/output for binary data anyway.
4) it's much easier to add a line ending than to remove it.

Based on the above reasons, I would like to suggest to let readln() chop line
endings, and perhaps have another function (getline?) which keeps them.

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Mar 22 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Vladimir Panteleev wrote:
 On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For
Email) <SeeWebsiteForEmail erdani.org> wrote:
 
 Essentially it's about information. The naive loop:

 while (readln(line)) {
    write(line);
 }

 is guaranteed 100% to produce an accurate copy of its input. The version
 that chops lines looks like:

 while (readln(line)) {
    writeln(line);
 }

 
 I'd just like to say that the chosen naming convention seems a bit unintuitive
to me for the following reasons:
 
 1) it seems odd that what you read with readln(), you need to write with
write() and not writeln().

I suppose it is a little, but I think that's more an issue with text IO
in general; for instance, even *if* readln discarded the line ending,
readln and writeln wouldn't be symmetric anyway!  If you expect them to
be, then you're in for a nasty surprise :P

 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's
ReadLn doesn't preserve line endings.

Well, that's Pascal/Delphi/etc., not D.

 3) in my personal experience (of a number of smaller and larger console
applications), it's much more often that I need to work with the contents of
lines (without line endings), rather than with. If you need to copy data while
preserving line endings, I would recommend using binary buffers for files - and
I've no idea why you would use standard input/output for binary data anyway.

That's a valid point; I rarely need the line endings. That said, see [1] :)

 4) it's much easier to add a line ending than to remove it.

Actually, it's not.  Removing a line ending is as simple as slicing the
string.  *Adding* a line ending could involve a heap allocation, at
least a full copy.

What's more, how can you be sure there was a line-ending there at all?
What if it's the last line and it didn't have a line ending before EOF?
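
To make the slicing-versus-copying point concrete, here is a small sketch in present-day D. `stripEol` is a hypothetical helper, not a library function; it handles only "\n" and "\r\n".

```d
import std.stdio;

/// Returns `line` without one trailing "\n" or "\r\n". The result is a slice
/// of the original array, so nothing is allocated or copied.
const(char)[] stripEol(const(char)[] line)
{
    if (line.length && line[$ - 1] == '\n')
    {
        line = line[0 .. $ - 1];
        if (line.length && line[$ - 1] == '\r')
            line = line[0 .. $ - 1];
    }
    return line;
}

void main()
{
    const(char)[] raw = "hello\r\n";
    auto content = stripEol(raw);
    writeln(content.ptr is raw.ptr);   // true: same memory, shorter view

    // Putting a terminator back uses ~, which builds a new array.
    auto restored = content ~ "\n";
    writeln(restored.ptr is raw.ptr);  // false: a copy was made
}
```

A line with no terminator at all (the last line before EOF) passes through `stripEol` unchanged.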

 Based on the above reasons, I would like to suggest to let readln() chop line
endings, and perhaps have another function (getline?) which keeps them.

[1]

There have been a few times I've needed the line-ending, and it's a
major pain when your IO library simply refuses to give it to you.  It
should be that the call gives you the whole line *including*
line-endings, but since stripping the line of its ending is so common
there should be either another function to do that, or a nice shortcut
to get it done.

Maybe we need readln and readlt for "read line and trim"...

</2c>

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Mar 22 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists gmail.com>
wrote:

 4) it's much easier to add a line ending than to remove it.

 Actually, it's not.  Removing a line ending is as simple as slicing the
 string.  *Adding* a line ending could involve a heap allocation, at
 least a full copy.

I was actually talking about the complexity of the source, not the efficiency
of the generated code.
When readln gives you the line with a line ending, you have three cases:
1) a CR/LF line ending (Windows)
2) LF line ending (Unix)
3) no line ending at all (EOF)

You'd need to account for each of these when removing the line endings - and
write this code every time you're writing an app which just needs the contents
of lines from standard input - which, as you have agreed, is quite common.

 What's more, how can you be sure there was a line-ending there at all?
 What if it's the last line and it didn't have a line ending before EOF?

IMHO, most tools which work with standard input don't really need to know if
the last line has a line break at the end :)

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Mar 22 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Vladimir Panteleev wrote:
 On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists gmail.com>
wrote:
 
 4) it's much easier to add a line ending than to remove it.

 Actually, it's not.  Removing a line ending is as simple as slicing the
 string.  *Adding* a line ending could involve a heap allocation, at
 least a full copy.

 
 I was actually talking about the complexity of the source, not the efficiency
of the generated code.
 When readln gives you the line with a line ending, you have three cases:
 1) a CR/LF line ending (Windows)
 2) LF line ending (Unix)
 3) no line ending at all (EOF)
 
 You'd need to account for each of these when removing the line endings - and
write this code every time you're writing an app which just needs the contents
of lines from standard input - which, as you have agreed, is quite common.

import std.string;

auto line = readln().chomp();

:)

 What's more, how can you be sure there was a line-ending there at all?
 What if it's the last line and it didn't have a line ending before EOF?

 
 IMHO, most tools which work with standard input don't really need to know if
the last line has a line break at the end :)


Mar 22 2007

Anders F Björklund <afb algonet.se> writes:

Vladimir Panteleev wrote:

 When readln gives you the line with a line ending, you have three cases:
 1) a CR/LF line ending (Windows)
 2) LF line ending (Unix)
 3) no line ending at all (EOF)

Actually it is even four:
4) CR line ending (Mac)

But that's just for files coming from the old Mac OS (9),
normally Mac OS X uses Unix linefeeds for line endings...
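
As a quick check of the four cases, assuming the chomp from today's std.string (which strips a single trailing "\r\n", "\n" or "\r"):

```d
import std.stdio : writeln;
import std.string : chomp;

void main()
{
    // Windows, Unix, old Mac OS, and an unterminated last line.
    foreach (line; ["text\r\n", "text\n", "text\r", "text"])
        writeln(chomp(line) == "text");   // prints true four times

    // Only one terminator is removed per call.
    writeln(chomp("text\n\n") == "text\n"); // true
}
```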

--anders

Mar 22 2007

Roberto Mariottini <rmariottini mail.com> writes:

Anders F Björklund wrote:
 Vladimir Panteleev wrote:
 
 When readln gives you the line with a line ending, you have three cases:
 1) a CR/LF line ending (Windows)
 2) LF line ending (Unix)
 3) no line ending at all (EOF)

 
 Actually it is even four:
 4) CR line ending (Mac)
 
 But that's just for files coming from the old Mac OS (9),
 normally Mac OS X uses Unix linefeeds for line endings...

I have some of these also. Legacy applications are not the most, but 
they work, and for me that's it.

Ciao

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Vladimir Panteleev wrote:
 On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For
Email) <SeeWebsiteForEmail erdani.org> wrote:
 
 Essentially it's about information. The naive loop:

 while (readln(line)) {
    write(line);
 }

 is guaranteed 100% to produce an accurate copy of its input. The version
 that chops lines looks like:

 while (readln(line)) {
    writeln(line);
 }

 
 I'd just like to say that the chosen naming convention seems a bit unintuitive
to me for the following reasons:
 
 1) it seems odd that what you read with readln(), you need to write with
write() and not writeln().

"Read a line. Write what you've read. Rinse. Lather. Repeat."

 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's
ReadLn doesn't preserve line endings.

That's a mistake, simple as that. Pascal has made many other similar 
mistakes, see http://www.lysator.liu.se/c/bwk-on-pascal.html.

 3) in my personal experience (of a number of smaller and larger console
applications), it's much more often that I need to work with the contents of
lines (without line endings), rather than with. If you need to copy data while
preserving line endings, I would recommend using binary buffers for files - and
I've no idea why you would use standard input/output for binary data anyway.

I understand that. But again, discarding information when you have 
it is a much better proposition than trying to regain information that 
you have irremediably lost.

Think that a file is produced by a utility or transmission that sends 
messages separated by a single-char or multi-char separator. If your 
reading primitive omits the separator, you don't know whether the last 
line is a fragment of a broken transmission or a valid line.
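
A minimal sketch of that check, assuming a readln that keeps the terminator (as the readln in today's std.stdio does); `isTerminated` is a hypothetical helper name:

```d
import std.stdio;

/// True when `line` still carries its '\n', i.e. the record was terminated.
bool isTerminated(const(char)[] line)
{
    return line.length > 0 && line[$ - 1] == '\n';
}

void main()
{
    size_t complete, partial;
    char[] line;
    // readln(buf) keeps the terminator and returns 0 at end of input.
    while (stdin.readln(line))
    {
        if (isTerminated(line))
            ++complete;
        else
            ++partial;   // only the final line can land here
    }
    stderr.writefln("%s complete, %s unterminated", complete, partial);
}
```

With a reading primitive that chops the separator, `partial` could never be computed, because the two kinds of last line would look identical.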

"Just call chomp."

 4) it's much easier to add a line ending than to remove it.

It's already been said: it's cheaper to remove it in all circumstances.

 Based on the above reasons, I would like to suggest to let readln() chop line
endings, and perhaps have another function (getline?) which keeps them.

For more balkanization, cognitive load, and confusion?

"Just call chomp."


Andrei

Mar 22 2007

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Thu, 22 Mar 2007 18:14:14 +0200, Andrei Alexandrescu (See Website For Email)
<SeeWebsiteForEmail erdani.org> wrote:

 Vladimir Panteleev wrote:
 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's
ReadLn doesn't preserve line endings.

 That's a mistake, simple as that. Pascal has made many other similar
 mistakes, see http://www.lysator.liu.se/c/bwk-on-pascal.html.

<offtopic> That article was written a quarter of a century ago, and
doesn't really represent the state of the latest Pascal
versions/implementations out there (the most prominent being Borland Delphi and
FreePascal). That said, switching from Pascal to D is still quite a great
experience for me. </offtopic>

 "Just call chomp."

Ah, yes, missed that one.

<nitpick> But even so, you'd have to check for line endings twice - when
reading the stdin stream, and when calling chomp ;) </nitpick>

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Mar 22 2007

Roberto Mariottini <rmariottini mail.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
[...]
 "Just call chomp."

Just add a call to chomp to your benchmarks.

Ciao

Mar 23 2007

Walter Bright <newshound digitalmars.com> writes:

kris wrote:
 c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that 
 what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)

Here's the new std.stdio work in progress (doesn't yet include write()). 
Feel free to leverage it as you see fit for Tango.
Some features of note:
1) It peeks under the hood of C's stdio implementation, meaning it's 
customized for Digital Mars' stdio, and gcc's stdio.
2) It throws on I/O errors.
3) Unlike C's stdio, it can handle streams of either wide or regular chars.
4) It does not go as far as directly using Posix read/write functions or 
Windows API functions. We wished to avoid that in the interests of 
interoperability with C's stdio.
5) It is fully interoperable with, and is synced with, C's stdio.
6) Note how nicely scope(exit) makes the code more readable!
----------------------------------------

// Written in the D programming language.

/* Written by Walter Bright and Andrei Alexandrescu
  * www.digitalmars.com
  * Placed in the Public Domain.
  */

/********************************
  * Standard I/O functions that extend $(B std.c.stdio).
  * $(B std.c.stdio) is automatically imported when importing
  * $(B std.stdio).
  * Macros:
  *	WIKI=Phobos/StdStdio
  */

module std.stdio;

public import std.c.stdio;

import std.format;
import std.utf;
import std.string;
import std.gc;
import std.c.stdlib;
import std.c.string;
import std.c.stddef;


version (DigitalMars)
{
     version (Windows)
     {
	// Specific to the way Digital Mars C does stdio
	version = DIGITAL_MARS_STDIO;
     }
}

version (DIGITAL_MARS_STDIO)
{
}
else
{
     // Specific to the way Gnu C does stdio
     version = GCC_IO;
     import std.c.linux.linux;
}

version (DIGITAL_MARS_STDIO)
{
     extern (C)
     {
	/* **
	 * Digital Mars under-the-hood C I/O functions
	 */
	int _fputc_nlock(int, FILE*);
	int _fputwc_nlock(int, FILE*);
	int _fgetc_nlock(FILE*);
	int _fgetwc_nlock(FILE*);
	int __fp_lock(FILE*);
	void __fp_unlock(FILE*);
     }
     alias _fputc_nlock FPUTC;
     alias _fputwc_nlock FPUTWC;
     alias _fgetc_nlock FGETC;
     alias _fgetwc_nlock FGETWC;

     alias __fp_lock FLOCK;
     alias __fp_unlock FUNLOCK;
}
else version (GCC_IO)
{
     /* **
      * Gnu under-the-hood C I/O functions; see
      * http://www.gnu.org/software/libc/manual/html_node/I_002fO-on-Streams.html#I_002fO-on-Streams
      */
     extern (C)
     {
	int fputc_unlocked(int, FILE*);
	int fputwc_unlocked(wchar_t, FILE*);
	int fgetc_unlocked(FILE*);
	int fgetwc_unlocked(FILE*);
	void flockfile(FILE*);
	void funlockfile(FILE*);
	ssize_t getline(char**, size_t*, FILE*);
	ssize_t getdelim (char**, size_t*, int, FILE*);
     }

     alias fputc_unlocked FPUTC;
     alias fputwc_unlocked FPUTWC;
     alias fgetc_unlocked FGETC;
     alias fgetwc_unlocked FGETWC;

     alias flockfile FLOCK;
     alias funlockfile FUNLOCK;
}
else
{
     static assert(0, "unsupported C I/O system");
}


/*********************
  * Thrown if I/O errors happen.
  */
class StdioException : Exception
{
     uint errno;			// operating system error code

     this(char[] msg)
     {
	super(msg);
     }

     this(uint errno)
     {	char* s = strerror(errno);
	super(std.string.toString(s).dup);
	this.errno = errno;		// remember the OS error code
     }

     static void opCall(char[] msg)
     {
	throw new StdioException(msg);
     }

     static void opCall()
     {
	throw new StdioException(getErrno());
     }
}

private
void writefx(FILE* fp, TypeInfo[] arguments, void* argptr, int newline=false)
{   int orientation;

     orientation = fwide(fp, 0);

     /* Do the file stream locking at the outermost level
      * rather than character by character.
      */
     FLOCK(fp);
     scope(exit) FUNLOCK(fp);

     if (orientation <= 0)		// byte orientation or no orientation
     {
	void putc(dchar c)
	{
	    if (c <= 0x7F)
	    {
		FPUTC(c, fp);
	    }
	    else
	    {   char[4] buf;
		char[] b;

		b = std.utf.toUTF8(buf, c);
		for (size_t i = 0; i < b.length; i++)
		    FPUTC(b[i], fp);
	    }
	}

	std.format.doFormat(&putc, arguments, argptr);
	if (newline)
	    FPUTC('\n', fp);
     }
     else if (orientation > 0)		// wide orientation
     {
	version (Windows)
	{
	    void putcw(dchar c)
	    {
		assert(isValidDchar(c));
		if (c <= 0xFFFF)
		{
		    FPUTWC(c, fp);
		}
		else
		{   wchar[2] buf;

		    buf[0] = cast(wchar) ((((c - 0x10000) >> 10) & 0x3FF) + 0xD800);
		    buf[1] = cast(wchar) (((c - 0x10000) & 0x3FF) + 0xDC00);
		    FPUTWC(buf[0], fp);
		    FPUTWC(buf[1], fp);
		}
	    }
	}
	else version (linux)
	{
	    void putcw(dchar c)
	    {
		FPUTWC(c, fp);
	    }
	}
	else
	{
	    static assert(0);
	}

	std.format.doFormat(&putcw, arguments, argptr);
	if (newline)
	    FPUTWC('\n', fp);
     }
}


/***********************************
  * Arguments are formatted per the
  * $(LINK2 std_format.html#format-string, format strings)
  * and written to $(B stdout).
  */

void writef(...)
{
     writefx(stdout, _arguments, _argptr, 0);
}

/***********************************
  * Same as $(B writef), but a newline is appended
  * to the output.
  */

void writefln(...)
{
     writefx(stdout, _arguments, _argptr, 1);
}

/***********************************
  * Same as $(B writef), but output is sent to the
  * stream fp instead of $(B stdout).
  */

void fwritef(FILE* fp, ...)
{
     writefx(fp, _arguments, _argptr, 0);
}

/***********************************
  * Same as $(B writefln), but output is sent to the
  * stream fp instead of $(B stdout).
  */

void fwritefln(FILE* fp, ...)
{
     writefx(fp, _arguments, _argptr, 1);
}

/**********************************
  * Read line from stream fp.
  * Returns:
  *	null for end of file,
  *	char[] for line read from fp, including terminating '\n'
  * Params:
  *	fp = input stream
  * Throws:
  *	$(B StdioException) on error
  * Example:
  *	Reads $(B stdin) and writes it to $(B stdout).
---
import std.stdio;

int main()
{
     char[] buf;
     while ((buf = readln()) != null)
	writef("%s", buf);
     return 0;
}
---
  */
char[] readln(FILE* fp = stdin)
{
     char[] buf;
     readln(fp, buf);
     return buf;
}

/**********************************
  * Read line from stream fp and write it to buf[],
  * including terminating '\n'.
  *
  * This is often faster than readln(FILE*) because the buffer
  * is reused each call. Note that reusing the buffer means that
  * the previous contents of it need to be copied if needed.
  * Params:
  *	fp = input stream
  *	buf = buffer used to store the resulting line data. buf
  *		is resized as necessary.
  * Returns:
  *	0 for end of file, otherwise
  *	number of characters read
  * Throws:
  *	$(B StdioException) on error
  * Example:
  *	Reads $(B stdin) and writes it to $(B stdout).
---
import std.stdio;

int main()
{
     char[] buf;
     while (readln(stdin, buf))
	writef("%s", buf);
     return 0;
}
---
  */
size_t readln(FILE* fp, inout char[] buf)
{
     version (DIGITAL_MARS_STDIO)
     {
	FLOCK(fp);
	scope(exit) FUNLOCK(fp);

	if (__fhnd_info[fp._file] & FHND_WCHAR)
	{   /* Stream is in wide characters.
	     * Read them and convert to chars.
	     */
	    static assert(wchar_t.sizeof == 2);
	    buf.length = 0;
	    int c2;
	    for (int c; (c = FGETWC(fp)) != -1; )
	    {
		if ((c & ~0x7F) == 0)
		{   buf ~= c;
		    if (c == '\n')
			break;
		}
		else
		{
		    if (c >= 0xD800 && c <= 0xDBFF)
		    {
			if ((c2 = FGETWC(fp)) == -1 ||
			    c2 < 0xDC00 || c2 > 0xDFFF)
			{
			    StdioException("unpaired UTF-16 surrogate");
			}
			c = ((c - 0xD7C0) << 10) + (c2 - 0xDC00);
		    }
		    std.utf.encode(buf, c);
		}
	    }
	    if (ferror(fp))
		StdioException();
	    return buf.length;
	}

	auto sz = std.gc.capacity(buf.ptr);
	//auto sz = buf.length;
	buf = buf.ptr[0 .. sz];
	if (fp._flag & _IONBF)
	{
	    /* Use this for unbuffered I/O, when running
	     * across buffer boundaries, or for any but the common
	     * cases.
	     */
	 L1:
	    char *p;

	    if (sz)
	    {
		p = buf.ptr;
	    }
	    else
	    {
		sz = 64;
		p = cast(char*) std.gc.malloc(sz);
		std.gc.hasNoPointers(p);
		buf = p[0 .. sz];
	    }
	    size_t i = 0;
	    for (int c; (c = FGETC(fp)) != -1; )
	    {
		if ((p[i] = c) != '\n')
		{
		    i++;
		    if (i < sz)
			continue;
		    buf = p[0 .. i] ~ readln(fp);
		    return buf.length;
		}
		else
		{
		    buf = p[0 .. i + 1];
		    return i + 1;
		}
	    }
	    if (ferror(fp))
		StdioException();
	    buf = p[0 .. i];
	    return i;
	}
	else
	{
	    int u = fp._cnt;
	    char* p = fp._ptr;
	    int i;
	    if (fp._flag & _IOTRAN)
	    {   /* Translated mode ignores \r and treats ^Z as end-of-file
		 */
		char c;
		while (1)
		{
		    if (i == u)		// if end of buffer
			goto L1;	// give up
		    c = p[i];
		    i++;
		    if (c != '\r')
		    {
			if (c == '\n')
			    break;
			if (c != 0x1A)
			    continue;
			goto L1;
		    }
		    else
		    {   if (i != u && p[i] == '\n')
			    break;
			goto L1;
		    }
		}
		if (i > sz)
		{
		    buf = cast(char[])std.gc.malloc(i);
		    std.gc.hasNoPointers(buf.ptr);
		}
		if (i - 1)
		    memcpy(buf.ptr, p, i - 1);
		buf[i - 1] = '\n';
		if (c == '\r')
		    i++;
	    }
	    else
	    {
		while (1)
		{
		    if (i == u)		// if end of buffer
			goto L1;	// give up
		    auto c = p[i];
		    i++;
		    if (c == '\n')
			break;
		}
		if (i > sz)
		{
		    buf = cast(char[])std.gc.malloc(i);
		    std.gc.hasNoPointers(buf.ptr);
		}
		memcpy(buf.ptr, p, i);
	    }
	    fp._cnt -= i;
	    fp._ptr += i;
	    buf = buf[0 .. i];
	    return i;
	}
     }
     else version (GCC_IO)
     {
	if (fwide(fp, 0) > 0)
	{   /* Stream is in wide characters.
	     * Read them and convert to chars.
	     */
	    FLOCK(fp);
	    scope(exit) FUNLOCK(fp);
	    version (Windows)
	    {
		buf.length = 0;
		int c2;
		for (int c; (c = FGETWC(fp)) != -1; )
		{
		    if ((c & ~0x7F) == 0)
		    {   buf ~= c;
			if (c == '\n')
			    break;
		    }
		    else
		    {
			if (c >= 0xD800 && c <= 0xDBFF)
			{
			    if ((c2 = FGETWC(fp)) == -1 ||
				c2 < 0xDC00 || c2 > 0xDFFF)
			    {
				StdioException("unpaired UTF-16 surrogate");
			    }
			    c = ((c - 0xD7C0) << 10) + (c2 - 0xDC00);
			}
			std.utf.encode(buf, c);
		    }
		}
		if (ferror(fp))
		    StdioException();
		return buf.length;
	    }
	    else version (linux)
	    {
		buf.length = 0;
		for (int c; (c = FGETWC(fp)) != -1; )
		{
		    if ((c & ~0x7F) == 0)
			buf ~= c;
		    else
			std.utf.encode(buf, cast(dchar)c);
		    if (c == '\n')
			break;
		}
		if (ferror(fp))
		    StdioException();
		return buf.length;
	    }
	    else
	    {
		static assert(0);
	    }
	}

	char *lineptr = null;
	size_t n = 0;
	auto s = getdelim(&lineptr, &n, '\n', fp);
	if (s < 0)
	{
	    if (ferror(fp))
		StdioException();
	    buf.length = 0;		// end of file
	    return 0;
	}
	scope(exit) free(lineptr);
	buf = buf.ptr[0 .. std.gc.capacity(buf.ptr)];
	if (s <= buf.length)
	{
	    buf.length = s;
	    buf[] = lineptr[0 .. s];
	}
	else
	{
	    buf = lineptr[0 .. s].dup;
	}
	return s;
     }
     else
     {
	static assert(0);
     }
}

/** ditto */
size_t readln(inout char[] buf)
{
     return readln(stdin, buf);
}
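
For readers following the UTF-16 handling in the listing, the surrogate arithmetic can be tried in isolation. This is a sketch in present-day D; the function names `combineSurrogates` and `splitToSurrogates` are hypothetical, while the expressions mirror the ones used in readln and writefx above.

```d
import std.stdio : writefln;

/// Combines a UTF-16 high/low surrogate pair into one code point.
dchar combineSurrogates(wchar hi, wchar lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF, "expected a high surrogate");
    assert(lo >= 0xDC00 && lo <= 0xDFFF, "expected a low surrogate");
    // (hi - 0xD7C0) << 10 equals ((hi - 0xD800) << 10) + 0x10000.
    return cast(dchar)(((hi - 0xD7C0) << 10) + (lo - 0xDC00));
}

/// Splits a code point above U+FFFF into its surrogate pair.
wchar[2] splitToSurrogates(dchar c)
{
    assert(c > 0xFFFF && c <= 0x10FFFF, "expected a supplementary code point");
    wchar[2] pair;
    pair[0] = cast(wchar)((((c - 0x10000) >> 10) & 0x3FF) + 0xD800);
    pair[1] = cast(wchar)(((c - 0x10000) & 0x3FF) + 0xDC00);
    return pair;
}

void main()
{
    auto pair = splitToSurrogates(0x1F600);
    writefln("%04X %04X", cast(uint) pair[0], cast(uint) pair[1]);  // D83D DE00
    writefln("%X", cast(uint) combineSurrogates(pair[0], pair[1])); // 1F600
}
```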

Mar 21 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 kris wrote:
 c) on Linux, tango.io uses the c-lib posix.read/write functions. Is 
 that what phobos uses also? (on Win32, Tango uses direct Win32 calls 
 instead)

 
 Here's the new std.stdio work in progress (doesn't yet include write()). 
 Feel free to leverage it as you see fit for Tango.
 Some features of note:
 1) It peeks under the hood of C's stdio implementation, meaning it's 
 customized for Digital Mars' stdio, and gcc's stdio.
 2) It throws on I/O errors.
 3) Unlike C's stdio, it can handle streams of either wide or regular chars.
 4) It does not go as far as directly using Posix read/write functions or 
 Windows API functions. We wished to avoid that in the interests of 
 interoperability with C's stdio.
 5) It is fully interoperable with, and is synced with, C's stdio.
 6) Note how nicely scope(exit) makes the code more readable!

[snip]
 private
 void writefx(FILE* fp, TypeInfo[] arguments, void* argptr, int 
 newline=false)

[snip]

Oh, I meant to say that a while ago: some experiments I've done show 
that doing formatting with templates and direct calls is significantly 
faster than the way writefx does it (with a delegate). Probably that 
should be changed. writefln is still very slow. Changing the loop to:

void main() {
   char[] line;
   while (readln(line)) {
     writef("%s", line);
   }
}

yields:

17.8s		dcat

Fortunately, up-and-coming template features will allow the library to 
detect statically known format strings and parse them to render the most 
efficient writing method. And reading, too. I already have a prototype 
readfln function that statically figures out the correctness of its 
format string.


Andrei

Mar 21 2007

Roberto Mariottini <rmariottini mail.com> writes:

Walter Bright wrote:
 /**********************************
  * Read line from stream fp and write it to buf[],
  * including terminating '\n'.

Nooo!
Please get rid of such an awful Perl-ish hack!

Ciao

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Roberto Mariottini wrote:
 Walter Bright wrote:
 /**********************************
  * Read line from stream fp and write it to buf[],
  * including terminating '\n'.

 
 Nooo!
 Please get rid of such an awful Perl-ish hack!

Please justify your statements instead of using emotion, rhetoric, and 
implied assumptions.

Andrei

Mar 22 2007

Roberto Mariottini <rmariottini mail.com> writes:

Andrei Alexandrescu (See Website For Email) wrote:
[...]
 Please justify your statements instead of using emotion, rhetoric, and 
 implied assumptions.

See my previous post.

Ciao

Mar 22 2007

Derek Parnell <derek nomail.afraid.org> writes:

On Wed, 21 Mar 2007 14:36:10 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

 I'll mention here that it's quite disappointing 
 that Tango's idiomatic method of reading a line from the console 
 (Cin.nextLine(line) unless I missed something) chose to chop the newline 
 automatically. The Perl book spends half a page or so explaining why 
 it's _good_ that the newline is included in the line, and I've been 
 thankful for that on numerous occasions when writing Perl. 

LOL ... That is odd because in nearly every program I ever write that reads
text lines, the first thing I need to do after I read in the line is to
strip off the bloody newline character.

 Please put the newline back in the line.

... but leave in the option of reading it without a newline attached.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/03/2007 10:12:54 AM

Mar 21 2007

Sean Kelly <sean f4.ca> writes:

Andrei Alexandrescu (See Website For Email) wrote:
 
 I passed a 31 MB text file (containing a dictionary that I'm using in my 
 research) through each of the programs above. The output was set to 
 /dev/null. I've run the same program multiple times before the actual 
 test, so everything is cached and the process becomes 
 computationally-bound. Here are the results summed for 10 consecutive 
 runs (averaged over 5 epochs):
 
 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

For what it's worth, I created a Win32 version of the Unix 'time' 
command recently.  Not too complicated, but if anyone is interested, I 
have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
quick and dirty implementation, but works for how I typically use it.

Mar 22 2007

Walter Bright <newshound digitalmars.com> writes:

Sean Kelly wrote:
 For what it's worth, I created a Win32 version of the Unix 'time' 
 command recently.  Not too complicated, but if anyone is interested, I 
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
 quick and dirty implementation, but works for how I typically use it.


Alternatively,
  http://www.digitalmars.com/techtips/timing_code.html

Mar 22 2007

"Kristian Kilpi" <kjkilpi gmail.com> writes:

On Thu, 22 Mar 2007 19:45:16 +0200, Walter Bright 
<newshound digitalmars.com> wrote:
 Sean Kelly wrote:
 For what it's worth, I created a Win32 version of the Unix 'time' 
 command recently.  Not too complicated, but if anyone is interested, I 
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
 quick and dirty implementation, but works for how I typically use it.

 Alternatively,
   http://www.digitalmars.com/techtips/timing_code.html

BTW, the following line (printed in bold in 'timing_code.html'):

   auto Timer t = new Timer();

uses 'auto' instead of 'scope'. There is also another identical line at 
the bottom of the page.

I think I should also mention that DMD v1.007 uses 'auto' in the error 
message 'Error: variable XXX reference to auto class must be auto' 
(happens when a scope class object is declared without the 'scope' 
keyword). Is this minor glitch corrected in DMD v1.009?

Mar 22 2007

Walter Bright <newshound digitalmars.com> writes:

Thanks for the tip, that needs to be fixed.

Mar 22 2007

torhu <fake address.dude> writes:

Sean Kelly wrote:
<snip>
 For what it's worth, I created a Win32 version of the Unix 'time' 
 command recently.  Not too complicated, but if anyone is interested, I 
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
 quick and dirty implementation, but works for how I typically use it.

Looks useful, my own tool just measures 'real' time.  But it breaks when 
using redirection, either way:

redirect stdin:
---
c:\prog\test\linetest>ptime cat <test.txt
cat: -: Bad file descriptor

real    0m0.000s
user    0m0.010s
sys     0m0.000s

---

redirect stdout:
---
c:\prog\test\linetest>ptime cat test.txt >NUL

---

The last one outputs nothing.  Printing to stderr would fix that.

Mar 23 2007

Sean Kelly <sean f4.ca> writes:

torhu wrote:
 Sean Kelly wrote:
 <snip>
 For what it's worth, I created a Win32 version of the Unix 'time' 
 command recently.  Not too complicated, but if anyone is interested, I 
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
 quick and dirty implementation, but works for how I typically use it.

 
 Looks useful, my own tool just measures 'real' time.  But it breaks when 
 using redirection, either way:
 
 redirect stdin:
 ---
 c:\prog\test\linetest>ptime cat <test.txt
 cat: -: Bad file descriptor
 
 real    0m0.000s
 user    0m0.010s
 sys     0m0.000s
 
 ---
 
 redirect stdout:
 ---
 c:\prog\test\linetest>ptime cat test.txt >NUL
 
 ---
 
 The last one outputs nothing.  Printing to stderr would fix that.

Hm, I suspect IO redirection must be a feature of the shell.  It's a bit 
of a hack, but this may work "ptime cmd /c cat < test.txt."  I'll see 
how complicated a real fix would be.


Sean

Mar 23 2007

torhu <fake address.dude> writes:

Sean Kelly wrote:
 Hm, I suspect IO redirection must be a feature of the shell.  It's a bit 
 of a hack, but this may work "ptime cmd /c cat < test.txt."  I'll see 
 how complicated a real fix would be.

I get the same error.  My own tool doesn't have such problems, but it 
only uses the standard C system() function.  Which might be too limited 
for what your tool does.
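
A minimal C++ sketch of the system()-based approach described above, assuming only the standard library: it measures wall-clock ('real') time around the call and prints the report to stderr, so redirecting stdout (e.g. `>NUL`) does not swallow it. The name `time_command` is hypothetical, and unlike ptime this does not report the user/sys split.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: runs `cmd` through the shell via std::system and
// returns the elapsed wall-clock ("real") time in seconds.
// The raw return value of std::system is stored in *status if non-null.
double time_command(const char* cmd, int* status) {
    auto start = std::chrono::steady_clock::now();
    int rc = std::system(cmd);  // the shell handles any < or > redirection
    auto stop = std::chrono::steady_clock::now();
    if (status) *status = rc;
    double secs = std::chrono::duration<double>(stop - start).count();
    // Report on stderr so a redirected stdout does not hide the timing.
    std::fprintf(stderr, "real    %.3fs\n", secs);
    return secs;
}
```

Because the command string goes to the shell, something like `time_command("cat < test.txt", nullptr)` leaves the redirection to the shell rather than to the timing tool.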

Mar 23 2007

Sean Kelly <sean f4.ca> writes:

torhu wrote:
 Sean Kelly wrote:
 Hm, I suspect IO redirection must be a feature of the shell.  It's a 
 bit of a hack, but this may work "ptime cmd /c cat < test.txt."  I'll 
 see how complicated a real fix would be.

 
 I get the same error.  My own tool doesn't have such problems, but it 
 only uses the standard C system() function.  Which might be too limited 
 for what your tool does.

Yeah, mine uses CreateProcess and then GetProcessTimes.  I'll give the 
docs a look later and see if I can figure out why it's not working.

Mar 23 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I passed a 31 MB text file (containing a dictionary that I'm using in 
 my research) through each of the programs above. The output was set to 
 /dev/null. I've ran the same program multiple times before the actual 
 test, so everything is cached and the process becomes 
 computationally-bound. Here are the results summed for 10 consecutive 
 runs (averaged over 5 epochs):

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

 
 For what it's worth, I created a Win32 version of the Unix 'time' 
 command recently.  Not too complicated, but if anyone is interested, I 
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a 
 quick and dirty implementation, but works for how I typically use it.

I was looking for something like this just the other day.
Link seems to be dead these days.  Is there a new URL for it?

--bb

Apr 20 2008

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I passed a 31 MB text file (containing a dictionary that I'm using in
 my research) through each of the programs above. The output was set to
 /dev/null. I've ran the same program multiple times before the actual
 test, so everything is cached and the process becomes
 computationally-bound. Here are the results summed for 10 consecutive
 runs (averaged over 5 epochs):

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

 For what it's worth, I created a Win32 version of the Unix 'time'
 command recently.  Not too complicated, but if anyone is interested, I
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a
 quick and dirty implementation, but works for how I typically use it.

 I was looking for something like this just the other day.
 Link seems to be dead these days.  Is there a new URL for it?

I switched web hosts and have yet to re-upload all my old content.  I'll see
about getting this zipfile up in the next few days.


Sean

Apr 21 2008

Sean Kelly <sean invisibleduck.org> writes:

Sean Kelly wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Sean Kelly wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 I passed a 31 MB text file (containing a dictionary that I'm using in
 my research) through each of the programs above. The output was set to
 /dev/null. I've ran the same program multiple times before the actual
 test, so everything is cached and the process becomes
 computationally-bound. Here are the results summed for 10 consecutive
 runs (averaged over 5 epochs):

 13.9s        Tango
 6.6s        Perl
 5.0s        std.stdio

 For what it's worth, I created a Win32 version of the Unix 'time'
 command recently.  Not too complicated, but if anyone is interested, I
 have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip  It's a
 quick and dirty implementation, but works for how I typically use it.

 I was looking for something like this just the other day.
 Link seems to be dead these days.  Is there a new URL for it?

 
 I switched web hosts and have yet to re-upload all my old content.  I'll see
 about getting this zipfile up in the next few days.

Okay, I've uploaded it here:

http://invisibleduck.org/sean/tmp/ptime.zip


Sean

Apr 21 2008

Lars Ivar Igesund <larsivar igesund.net> writes:

Andrei Alexandrescu (See Website For Email) wrote:

 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.

I have uploaded a snapshot with prebuilt libraries to 

http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz

The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual,
and add libtango.a to your compile command. The test program (io.d below)
should be compiled using the line

dmd -O -release -inline io.d libtango.a

io.d
-------

import tango.io.Console;

void main() {
    char[] line;
    // true means that newlines are retained
    while (Cin.nextLine(line, true))
        Cout(line);
}

--------

For the sake of reference, I created a file with 1.8 million (equal) lines,
at a total of 133 Megabytes. I ran it through the above program, and your
Perl program. System is a PentiumM-1.86GHz, 1.5GB RAM, Kubuntu 7.04, DMD
1.009.

Average times perl program : 1.65 seconds (real), 1.45 seconds (user)
Average times tango program: 1.08 seconds (real), 0.91 seconds (user)

Note that I also tried without the optimization flags to DMD, which resulted
in times that were about 10% faster than Perl.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Mar 22 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Lars Ivar Igesund wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 
 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.

 
 I have uploaded a snapshot with prebuilt libraries to 
 
 http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz
 
 The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual,
 and add libtango.a to your compile command. The test program (io.d below)
 should be compiled using the line
 
 dmd -O -release -inline io.d libtango.a
 
 io.d
 -------
 
 import tango.io.Console;
 
 void main() {
     char[] line;
     // true means that newlines are retained
     while (Cin.nextLine(line, true))
         Cout(line);
 }
 
 --------

5.0s		tcat

Neat! Now that we got the performance problem out of the way, let's 
discuss stdio compatibility. I suggest you use getline on GNU platforms.

Andrei

Mar 22 2007

Lars Ivar Igesund <larsivar igesund.net> writes:

Andrei Alexandrescu (See Website For Email) wrote:

 Lars Ivar Igesund wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 
 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.

 
 I have uploaded a snapshot with prebuilt libraries to
 
 http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz
 
 The prebuilt libraries are in the lib/ folder. Install libphobos.a as
 usual, and add libtango.a to your compile command. The test program (io.d
 below) should be compiled using the line
 
 dmd -O -release -inline io.d libtango.a
 
 io.d
 -------
 
 import tango.io.Console;
 
 void main() {
     char[] line;
     // true means that newlines are retained
     while (Cin.nextLine(line, true))
         Cout(line);
 }
 
 --------

 
 5.0s          tcat
 
 Neat! Now that we got the performance problem out of the way, let's
 discuss stdio compatibility. I suggest you use getline on GNU platforms.
 
 Andrei

Maybe discuss first why stdio compatibility is needed? Is the equivalent
functionality missing in Tango, and if so, would implementing it in Tango
remove this need for compatibility?

Then consider the hypothetical situation where all of libc functionality
(including posix functionality currently used in Tango, system calls, etc)
is exchanged with an equivalent libd. Somewhat depending on answer above,
would same reasoning apply?

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Mar 23 2007

"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:

Lars Ivar Igesund wrote:
 Neat! Now that we got the performance problem out of the way, let's
 discuss stdio compatibility. I suggest you use getline on GNU platforms.

 Andrei

 
 Maybe discuss first why stdio compatibility is needed? Is the equivalent
 functionality missing in Tango, and if so, would implementing it in Tango
 remove this need for compatibility?

As long as the global "stdin" symbol is a FILE*, this would be highly 
recommendable. And given that phobos does offer stdin as a FILE*, stdio 
compatibility is important for programs that want to use phobos and 
tango simultaneously (e.g., a library using phobos linked with another 
one using tango).


Andrei

Mar 23 2007

Lars Ivar Igesund <larsivar igesund.net> writes:

Andrei Alexandrescu (See Website For Email) wrote:

 Lars Ivar Igesund wrote:
 Neat! Now that we got the performance problem out of the way, let's
 discuss stdio compatibility. I suggest you use getline on GNU platforms.

 Andrei

 
 Maybe discuss first why stdio compatibility is needed? Is the equivalent
 functionality missing in Tango, and if so, would implementing it in Tango
 remove this need for compatibility?

 
 As long as the global "stdin" symbol is a FILE*, this would be highly
 recommendable. And given that phobos does offer stdin as a FILE*, stdio
 compatibility is important for programs that want to use phobos and
 tango simultaneously (e.g., a library using phobos linked with another
 one using tango).

May I then suggest that you create a enhancement/wishlist ticket for this?
Thanks :)

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Mar 23 2007

Davidl <Davidl 126.com> writes:

great job!
I didn't know I/O performance could vary over such a wide range.
and thanks for the great job from tango team.
heh, now d's I/O is as fast as c ?
or tango is even faster than C's I/O?

 Andrei Alexandrescu (See Website For Email) wrote:

 I've ran a couple of simple tests comparing Perl, D's stdlib (the coming
 release), and Tango.

 I have uploaded a snapshot with prebuilt libraries to

 http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz

 The prebuilt libraries are in the lib/ folder. Install libphobos.a as
 usual, and add libtango.a to your compile command. The test program
 (io.d below) should be compiled using the line

 dmd -O -release -inline io.d libtango.a

 io.d
 -------

 import tango.io.Console;

 void main() {
     char[] line;
     // true means that newlines are retained
     while (Cin.nextLine(line, true))
         Cout(line);
 }

 --------

 For the sake of reference, I created a file with 1.8 million (equal)
 lines, at a total of 133 Megabytes. I ran it through the above program,
 and your Perl program. System is a PentiumM-1.86GHz, 1.5GB RAM, Kubuntu
 7.04, DMD 1.009.

 Average times perl program : 1.65 seconds (real), 1.45 seconds (user)
 Average times tango program: 1.08 seconds (real), 0.91 seconds (user)

 Note that I also tried without the optimization flags to DMD, which
 resulted in times that were about 10% faster than Perl.

Mar 22 2007

Sean Kelly <sean f4.ca> writes:

Davidl wrote:
 great job!
 I didn't know I/O performance could vary over such a wide range.
 and thanks for the great job from tango team.
 heh, now d's I/O is as fast as c ?
 or tango is even faster than C's I/O?

Tango is faster, at least for this particular test.


Sean

Mar 22 2007

Dave <Dave_member pathlink.com> writes:

Walter Bright Wrote:

 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Turning off sync is cheating - D's readln does syncing.

 
 I don't know exactly what sync'ing does in C++, but probably it isn't 
 the locking that you are thinking of.

 
 I think it means bringing the iostream I/O buffer in to sync with the 
 stdio I/O buffer, i.e. you can mix printf and iostream output and it 
 will appear in the same order the calls happen in the code.
 

That's exactly what it does... Quite a few times I've had to 'optimize' C++
iostream code using sync_with_stdio().

 D's readln is inherently synced in this manner.

Which of course begs the question -- Could an overload be added so it doesn't
sync (not the default)? Might be worth a test, and if the difference is
significant keep it.

Mar 22 2007

Walter Bright <newshound digitalmars.com> writes:

Dave wrote:
 Walter Bright Wrote:
 Andrei Alexandrescu (See Website For Email) wrote:
 Walter Bright wrote:
 Turning off sync is cheating - D's readln does syncing.

 I don't know exactly what sync'ing does in C++, but probably it isn't 
 the locking that you are thinking of.

 I think it means bringing the iostream I/O buffer in to sync with the 
 stdio I/O buffer, i.e. you can mix printf and iostream output and it 
 will appear in the same order the calls happen in the code.

 
 That's exactly what it does... Quite a few times I've had to 'optimize' C++
 iostream code using sync_with_stdio().
 
 D's readln is inherently synced in this manner.

 
 Which of course begs the question -- Could an overload be added so it doesn't
 sync (not the default)? Might be worth a test, and if the difference is
 significant keep it.

Since the data has to be buffered anyway, might as well use stdio's 
buffer. I don't know why iostream felt the need to reimplement the 
buffers - certainly it isn't for performance <g>.

Mar 22 2007

Roberto Mariottini <rmariottini mail.com> writes:

Hi,
I have got no reply to my questions.
Can somebody answer them?

Ciao

-------- Original Message --------
Subject: Re: stdio performance in tango, stdlib, and perl
Date: Fri, 23 Mar 2007 10:08:24 +0100
From: Roberto Mariottini <rmariottini mail.com>
Organization: Digital Mars
Newsgroups: digitalmars.D
References: <4601A54A.8050307 erdani.org> 
<etsbup$2c5t$1 digitalmars.com> <4601B819.6080001 erdani.org> 
<etse2m$2fa2$1 digitalmars.com> <4601C25F.9050107 erdani.org> 
<ettem8$qgl$1 digitalmars.com> <4602C66E.4020100 erdani.org>

Andrei Alexandrescu (See Website For Email) wrote:
 Roberto Mariottini wrote:

[...]
 Essentially it's about information. The naive loop:

 while (readln(line)) {
   write(line);
 }

 I'm completely against that awful mess of code.

 What exactly would be bad about it?

It's not clearly evident for a non-expert programmer that a new-line is
appended at each line.
Take any programmer from any language of your choice and ask what this
snippet is supposed to do.
This is against immediate comprehension of code.

 is guaranteed 100% to produce an accurate copy of its input. The
 version that chops lines looks like:

 while (readln(line)) {
   writeln(line);
 }

 This may or may not add a newline to the output, possibly creating a
 file larger by one byte.

 Are you sure? Can you elaborate more on this?

 Very simple. If the file ends with a newline, the code reproduces it. If
 not, the code gratuitously appends a newline.

A newline is two bytes here.

 Moreover, with the automated chopping it is basically impossible to
 write a program that exactly reproduces its input because readln
 essentially loses information.

A text file is not a binary file.
A newline at end of file is completely irrelevant.

On the other side, no code should break if the last newline is there or
not. The problem with your code is that the last line comes different
from the others.

 Also, stdio also offers a readln() that creates a new line on every
 call. That is useful if you want fresh lines every read:

 char[] line;
 while ((line = readln()).length > 0) {
   ++dictionary[line];
 }

 This way you'll get two different dictionaries on Windows and on Unix.
 Wrong, very wrong.

 Yes, wrong, very wrong. Except it's not me who's wrong :o).

Ehm, can you elaborate on how good it is to put a '\n' at the end of every
string when working with:

  - databases
  - communication programs
  - interprocess communication
  - distributed computing

 The code _just works_ because an empty line means _precisely_ and
 without the shadow of a doubt that the file has ended. (An I/O error
 throws an exception, and does NOT return an empty line; that is
 another important point.) An API that uses automated chopping should
 not offer such a function because an empty line may mean that an
 empty line was read, or that it's eof time. So the API would force
 people to write convoluted code.

 What is your definition of "convolute"?
 I find your code 'convolute', 'unclear', 'buggy' and 'unportable'.

 You are objectively wrong.

Say 'subjectively'.
Assignments in boolean expressions should be avoided. The average
programmer knows something about this magic, but fears to touch it, and
never completely understands it.

Still, any programmer from any language would think that this code ends
at the first empty line.

Here is one of the many possible non-convoluted versions:

char[] line = readln();
while (line.length > 0) {
   ++dictionary[chomp(line)];
   line = readln();
}

And this is how it should be:

char[] line = readln();
while (line != null) {
   ++dictionary[line];
   line = readln();
}

 The code is portable. Newline translation
 takes care of it. Just try it.

Newline translation is an old problem with C, C++ and now with D.
Nothing can be resolved with newline translation.

Opening a file in binary mode on Unix and treating it like a text file
works only as long as the program is run on Unix.
Newline translation is prone to portability errors, thus non-portable.

In my experience, newline translations pose more portability problems
than it solves.

 In the couple of years I've used Perl I've thanked the Perl folks for
 their readline decision numerous times.

 Perl is something the world should get rid of, quickly.
 Perl is wrong, Perl is evil, Perl is useless.
 You don't need Perl, try to cease using it.

 The fact that this narrow-minded idea comes from Perl is not surprising.

 What can I say? Thanks! I'm enlightened!

You'll be more enlightened if you had to work with big CGI scripts
written in Perl, and eventually had to convert them to JSP to make the
average (available) programmers able to work on them.

Sure, with Perl you can do many things in less than 10 lines.
But keep it under 10 lines, or you are in trouble.

 Ever tried to do cin or fscanf? You can't do any intelligent input
 with them because they skip whitespace and newlines like it's out of
 style.

 I use them, and I find them very comfortable.
 Again your definition of 'intelligent' is particular.
 If you find Perl 'intelligent', this say a lot.

 To each their own :o). Oh, probably you could explain how I can read a
 string containing spaces, followed by ":" and a number with scanf. Takes
 one line in Perl and D's readfln (not yet distributed).

scanf(" :%d", &i);
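
For comparison, note that the format " :%d" skips whitespace and then expects the ':' right away, so it does not itself read a preceding string. One way to capture a label containing spaces is a scanset conversion. The sketch below uses `sscanf` on an in-memory line; the helper name and the 127-character limit are arbitrary choices for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: parses "<text with spaces>:<number>" from `text`.
// Returns true and fills `label` and `number` only if both parts were read.
bool parse_label_and_number(const char* text, std::string& label, int& number) {
    char buf[128];
    int value = 0;
    // " "        skip leading whitespace
    // "%127[^:]" read up to 127 characters that are not ':' (spaces allowed)
    // ":"        match the literal colon
    // "%d"       skip whitespace, then read a decimal integer
    if (std::sscanf(text, " %127[^:]:%d", buf, &value) != 2) return false;
    label = buf;
    number = value;
    return true;
}
```

With the input "hello world: 42" this yields the label "hello world" and the number 42; input without a colon is rejected.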

Ciao

Mar 27 2007

"David B. Held" <dheld codelogicconsulting.com> writes:

Roberto Mariottini wrote:
 Hi,
 I have got no reply to my questions.
 Can somebody answer them?

Your "questions" hardly seem sincere.  Were you not simply posturing for 
your position?  Or do you want to see endless debate on chomp() vs. no 
chomp()?

Dave

 -------- Original Message --------
 Subject: Re: stdio performance in tango, stdlib, and perl
 Date: Fri, 23 Mar 2007 10:08:24 +0100
 From: Roberto Mariottini <rmariottini mail.com>
 Organization: Digital Mars
 Newsgroups: digitalmars.D
 References: <4601A54A.8050307 erdani.org> 
 <etsbup$2c5t$1 digitalmars.com> <4601B819.6080001 erdani.org> 
 <etse2m$2fa2$1 digitalmars.com> <4601C25F.9050107 erdani.org> 
 <ettem8$qgl$1 digitalmars.com> <4602C66E.4020100 erdani.org>

 Andrei Alexandrescu (See Website For Email) wrote:
  > Roberto Mariottini wrote:
 [...]
  >>> Essentially it's about information. The naive loop:
  >>>
  >>> while (readln(line)) {
  >>>   write(line);
  >>> }
  >>
  >> I'm completely against that awful mess of code.
  >
  > What exactly would be bad about it?

 It's not clearly evident for a non-expert programmer that a new-line is
 appended at each line.
 Take any programmer from any language of your choice and ask what this
 snippets is supposed to do.
 This is against immediate comprehension of code.

  >>> is guaranteed 100% to produce an accurate copy of its input. The
  >>> version that chops lines looks like:
  >>>
  >>> while (readln(line)) {
  >>>   writeln(line);
  >>> }
  >>>
  >>> This may or may not add a newline to the output, possibly creating a
  >>> file larger by one byte.
  >>
  >> Are you sure? Can you elaborate more on this?
  >
  > Very simple. If the file ends with a newline, the code reproduces it. If
  > not, the code gratuitously appends a newline.

 A newline is two bytes here.

  >>> Moreover, with the automated chopping it is basically impossible to
  >>> write a program that exactly reproduces its input because readln
  >>> essentially loses information.

 A text file is not a binary file.
 A newline at end of file is completely irrelevant.

 On the other side, no code should break if the last newline is there or
 not. The problem with your code is that the last line comes different
 from the others.

  >>> Also, stdio also offers a readln() that creates a new line on every
  >>> call. That is useful if you want fresh lines every read:
  >>>
  >>> char[] line;
  >>> while ((line = readln()).length > 0) {
  >>>   ++dictionary[line];
  >>> }
  >>
  >> This way you'll get two different dictionaries on Windows and on Unix.
  >> Wrong, very wrong.
  >
  > Yes, wrong, very wrong. Except it's not me who's wrong :o).

 Ehm, can you elaborate how good is to put a '\n' at the end of any
 string when working with:

  - databases
  - communication programs
  - interprocess communication
  - distributed computing

  >>> The code _just works_ because an empty line means _precisely_ and
  >>> without the shadow of a doubt that the file has ended. (An I/O error
  >>> throws an exception, and does NOT return an empty line; that is
  >>> another important point.) An API that uses automated chopping should
  >>> not offer such a function because an empty line may mean that an
  >>> empty line was read, or that it's eof time. So the API would force
  >>> people to write convoluted code.
  >>
  >> What is your definition of "convolute"?
  >> I find your code 'convolute', 'unclear', 'buggy' and 'unportable'.
  >
  > You are objectively wrong.

 Say 'subjectively'.
 Assignments in boolean expressions should be avoided. The average
 programmer knows something about this magic, but fears to touch it, and
 never completely understand it.

 Still, any programmer from any language would think that this code ends
 at the first empty line.

 Here is one of the many possible non-convoluted versions:

 char[] line = readln();
 while (line.length > 0) {
   ++dictionary[chomp(line)];
   line = readln();
 }

 And this is how it should be:

 char[] line = readln();
 while (line != null) {
   ++dictionary[line];
   line = readln();
 }

  > The code is portable. Newline translation
  > takes care of it. Just try it.

 Newline translation is an old problem with C, C++ and now with D.
 Nothing can be resolved with newline translation.

 Opening a file in binary mode on Unix and treating it like a text file
 works only as long as the program is run on Unix.
 Newline translation is prone to portability errors, thus non-portable.

 In my experience, newline translations pose more portability problems
 than it solves.

  >>> In the couple of years I've used Perl I've thanked the Perl folks for
  >>> their readline decision numerous times.
  >>
  >> Per is something the world should get rid of, quickly.
  >> Per is wrong, Perl is evil, Perl is useless.
  >> You don't need Perl, try to cease using it.
  >>
  >> The fact that this narrow-minded idea comes from Perl is not 
 surprising.
  >
  > What can I say? Thanks! I'm enlightened!

 You'll be more enlightened if you had to work with big CGI scripts
 written in Perl, and eventually had to convert them to JSP to make the
 average (available) programmers able to work on them.

 Sure, with Perl you can do many things in less than 10 lines.
 But keep it less than 10 lines, or you are in troubles.

  >>> Ever tried to do cin or fscanf? You can't do any intelligent input
  >>> with them because they skip whitespace and newlines like it's out of
  >>> style.
  >>
  >> I use them, and I find them very comfortable.
  >> Again your definition of 'intelligent' is particular.
  >> If you find Perl 'intelligent', this say a lot.
  >
  > To each their own :o). Oh, probably you could explain how I can read a
  > string containing spaces, followed by ":" and a number with scanf. Takes
  > one line in Perl and D's readfln (not yet distributed).

 scanf(" :%d", &i);

 Ciao

Mar 27 2007

Derek Parnell <derek psych.ward> writes:

On Tue, 27 Mar 2007 16:27:57 +0200, Roberto Mariottini wrote:

 Hi,
 I have got no reply to my questions.
 Can somebody answer them?

 Ciao

 -------- Original Message --------
 Subject: Re: stdio performance in tango, stdlib, and perl
 Date: Fri, 23 Mar 2007 10:08:24 +0100
 From: Roberto Mariottini <rmariottini mail.com>
 Organization: Digital Mars
 Newsgroups: digitalmars.D
 References: <4601A54A.8050307 erdani.org> 
 <etsbup$2c5t$1 digitalmars.com> <4601B819.6080001 erdani.org> 
 <etse2m$2fa2$1 digitalmars.com> <4601C25F.9050107 erdani.org> 
 <ettem8$qgl$1 digitalmars.com> <4602C66E.4020100 erdani.org>

 Andrei Alexandrescu (See Website For Email) wrote:
  > Roberto Mariottini wrote:
 [...]
  >>> Essentially it's about information. The naive loop:
  >>>
  >>> while (readln(line)) {
  >>>   write(line);
  >>> }
  >>
  >> I'm completely against that awful mess of code.
  >
  > What exactly would be bad about it?

 It's not clearly evident for a non-expert programmer that a new-line is
 appended at each line.
 Take any programmer from any language of your choice and ask what this
 snippets is supposed to do.
 This is against immediate comprehension of code.

One of the small issues I have with 'readln' appending a newline
character(s) at the end of a line is that such characters are not actually
a part of the text line; they are delimiters that separate one line from
another. In essence they are the same type of thing as the null byte that
marks the ends of a C-style string. 

If the purpose of returning the newline character(s) by readln() is to
inform the caller that a complete line was actually read in, then I would
have thought that this is 'optional' data that the caller could choose to
know about or not. If I call readln() and a complete line was not read in I
would consider this an exception. And by the way, a text file that does not
terminate with a newline is not an exception in my point of view as this
could be just a situation in which a delimiting newline is not required
(there is nothing to delimit the last from).

  >>> is guaranteed 100% to produce an accurate copy of its input. The
  >>> version that chops lines looks like:
  >>>
  >>> while (readln(line)) {
  >>>   writeln(line);
  >>> }
  >>>
  >>> This may or may not add a newline to the output, possibly creating a
  >>> file larger by one byte.
  >>
  >> Are you sure? Can you elaborate more on this?
  >
  > Very simple. If the file ends with a newline, the code reproduces it. If
  > not, the code gratuitously appends a newline.

 A newline is two bytes here.

Some readln() implementations disregard the actual newline as supplied by
the operating system and just append a single 0x0A byte on all operating
systems. And when it comes to outputting this, it is transformed back into
the appropriate newline sequence for the running OS.

  >>> Moreover, with the automated chopping it is basically impossible to
  >>> write a program that exactly reproduces its input because readln
  >>> essentially loses information.

 A text file is not a binary file.
 A newline at end of file is completely irrelevant.

Exactly. It is merely a delimiter *between* lines.

 On the other side, no code should break if the last newline is there or
 not. The problem with your code is that the last line comes different
 from the others.

The last line does not need a delimiter - so some systems make it optional.

  >>> Also, stdio also offers a readln() that creates a new line on every
  >>> call. That is useful if you want fresh lines every read:
  >>>
  >>> char[] line;
  >>> while ((line = readln()).length > 0) {
  >>>   ++dictionary[line];
  >>> }
  >>
  >> This way you'll get two different dictionaries on Windows and on Unix.
  >> Wrong, very wrong.
  >
  > Yes, wrong, very wrong. Except it's not me who's wrong :o).

 Ehm, can you elaborate how good is to put a '\n' at the end of any
 string when working with:

   - databases
   - communication programs
   - interprocess communication
   - distributed computing

Does not make a lot of sense to me either. Like I said earlier, the first
thing I usually do when reading a line is to remove the damned newline
character(s).

  >>> The code _just works_ because an empty line means _precisely_ and
  >>> without the shadow of a doubt that the file has ended. (An I/O error
  >>> throws an exception, and does NOT return an empty line; that is
  >>> another important point.) An API that uses automated chopping should
  >>> not offer such a function because an empty line may mean that an
  >>> empty line was read, or that it's eof time. So the API would force
  >>> people to write convoluted code.
  >>
  >> What is your definition of "convolute"?
  >> I find your code 'convolute', 'unclear', 'buggy' and 'unportable'.
  >
  > You are objectively wrong.

 Say 'subjectively'.
 Assignments in boolean expressions should be avoided. The average
 programmer knows something about this magic, but fears to touch it, and
 never completely understand it.

 Still, any programmer from any language would think that this code ends
 at the first empty line.

 Here is one of the many possible non-convoluted versions:

 char[] line = readln();
 while (line.length > 0) {
    ++dictionary[chomp(line)];
    line = readln();
 }

 And this is how it should be:

 char[] line = readln();
 while (line != null) {
    ++dictionary[line];
    line = readln();
 }

This depends on distinguishing between an empty line and a null line.

  > The code is portable. Newline translation
  > takes care of it. Just try it.

 Newline translation is an old problem with C, C++ and now with D.
 Nothing can be resolved with newline translation.

 Opening a file in binary mode on Unix and treating it like a text file
 works only as long as the program is run on Unix.
 Newline translation is prone to portability errors, thus non-portable.

 In my experience, newline translations pose more portability problems
 than it solves.

Unless done right by the compiler/language and not having to be done by the
code writer each time. Much like a GC system.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

Mar 27 2007
