digitalmars.D - Making byLine faster: we should be able to delegate this

Andrei Alexandrescu (26/26) Mar 22 2015 I just took a look at making byLine faster. It took less than one evenin...

weaselcat (4/10) Mar 22 2015 there's thousands of open bugs, and no real ranking of high

Sönke Ludwig (5/16) Mar 22 2015 We have votes and the importance field:

Sönke Ludwig (2/19) Mar 22 2015 Oh, and bounties of course: https://www.bountysource.com/teams/d/issues

Andrei Alexandrescu (3/29) Mar 22 2015 * Avoid most calls to GC.sizeOf.
Sad panda (5/14) Mar 22 2015 Lack of developer itch in a comparatively small developer base

Andrei Alexandrescu (4/16) Mar 22 2015 Thanks!

Vladimir Panteleev (4/7) Mar 22 2015 What about e.g.

Andrei Alexandrescu (2/8) Mar 22 2015 No matter, the static buffer is copied into the result. -- Andrei

tcak (6/20) Mar 22 2015 I didn't see the code though, won't using "static" buffer make

Andrei Alexandrescu (2/16) Mar 22 2015 D's statics are thread-local. -- Andrei

Steven Schveighoffer (6/10) Mar 23 2015 That's not expected. assumeSafeAppend should be pretty quick, and

Andrei Alexandrescu (5/14) Mar 23 2015 Yes, the code was that in

Steven Schveighoffer (22/37) Mar 23 2015 My investigation seems to suggest that assumeSafeAppend is not using

Andrei Alexandrescu (7/50) Mar 23 2015 I don't see the logic here. Unless the value is so small that noise

Steven Schveighoffer (9/44) Mar 23 2015 Yes, rethinking, you are right. I was jolted by the 35% thinking it was

Steven Schveighoffer (6/12) Mar 23 2015 https://github.com/D-Programming-Language/druntime/pull/1198

Andrei Alexandrescu (2/15) Mar 23 2015 Can't tell how much I appreciate this work! -- Andrei

John Colvin (8/37) Mar 23 2015 What would be really great would be a performance test suite for

Robert burner Schadek (3/5) Mar 23 2015 I'm working on it
rumbu (14/20) Mar 23 2015 I made the same test in C# using a 30MB plain ASCII text file.

Andrei Alexandrescu (16/34) Mar 23 2015 At this point it gets down to the performance of std.algorithm.count,
Tobias Pankrath (3/16) Mar 23 2015 Does the C# version validate the input? Using std.file.read

rumbu (33/53) Mar 23 2015 Source code is available at the link above. Since the C# version

Andrei Alexandrescu (2/13) Mar 23 2015 This is great investigative and measuring work. Thanks! -- Andrei

bioinfornatics (1/1) Mar 28 2015 What about hugepagesize system on LINUX ?
bioinfornatics (4/4) Mar 28 2015 Java has disruptor to provide the fatest way to ring file.
bioinfornatics (12/12) Mar 31 2015 Little ping I hope an answer about IO in D and disruptor form

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I just took a look at making byLine faster. It took less than one evening:

https://github.com/D-Programming-Language/phobos/pull/3089

I confess I am a bit disappointed with the leadership being unable to 
delegate this task to a trusty lieutenant in the community. There's been 
a bug opened on this for a long time, it gets regularly discussed here 
(with the wrong conclusions ("we must redo D's I/O because FILE* is 
killing it!") about performance bottlenecks drawn from unverified 
assumptions), and the techniques used to get a marked improvement in the 
diff above are trivial fare for any software engineer. The following 
factors each had a significant impact on speed:

* On OSX (which I happened to test with) getdelim() exists but wasn't 
being used. I made the implementation use it.

* There was one call to fwide() per line read. I used simple caching (a 
stream's width cannot be changed once set, making it a perfect candidate 
for caching).

(As an aside there was some unreachable code in ByLineImpl.empty, which 
didn't impact performance but was overdue for removal.)

* For each line read there was a call to malloc() and one to free(). I 
set things up so that the buffer used for reading is reused by simply 
making the buffer static.

* assumeSafeAppend() was unnecessarily used once per line read. Its 
removal led to a whopping 35% improvement on top of everything else. I'm 
not sure what it does, but boy does it take its sweet time. Maybe someone 
should look into it.

Destroy.


Andrei

Mar 22 2015
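
A minimal sketch of the buffer-reuse idea from the post above, seen from the caller's side. This is not the code from the pull request: it only assumes the existing `File.readln(ref char[] buf)` overload in std.stdio, which refills a caller-supplied buffer instead of allocating a fresh one per line. The helper name `countLines` and the scratch file name are made up for illustration.

```d
import std.stdio : File;

/// Counts lines while reusing a single buffer across all reads.
/// (countLines is a hypothetical helper, not part of Phobos.)
size_t countLines(string path)
{
    auto f = File(path, "r");
    char[] buf;   // grown on demand by readln, then reused for later lines
    size_t n;
    while (f.readln(buf) != 0)   // readln returns 0 at end of file
        ++n;
    return n;
}

void main()
{
    import std.file : write, remove;

    enum path = "byline_sketch.txt";   // hypothetical scratch file
    write(path, "one\ntwo\nthree\n");
    scope (exit) remove(path);

    assert(countLines(path) == 3);
}
```

Because the buffer is overwritten on every call, a line that has to outlive the next read must be copied first (for example with `.dup`).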

"weaselcat" <weaselcat gmail.com> writes:

On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu 
wrote:
 I just took a look at making byLine faster. It took less than 
 one evening:

 https://github.com/D-Programming-Language/phobos/pull/3089

 I confess I am a bit disappointed with the leadership being 
 unable to delegate this task to a trusty lieutenant in the 
 community. There's been a bug opened on this for a long time,

There are thousands of open bugs, and no real ranking that separates 
high-priority bugs from minor things.

Mar 22 2015

Sönke Ludwig <sludwig rejectedsoftware.com> writes:

On 22.03.2015 at 08:18, weaselcat wrote:
 On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
 I just took a look at making byLine faster. It took less than one
 evening:

 https://github.com/D-Programming-Language/phobos/pull/3089

 I confess I am a bit disappointed with the leadership being unable to
 delegate this task to a trusty lieutenant in the community. There's
 been a bug opened on this for a long time,

 there's thousands of open bugs, and no real ranking of high priority
 bugs or just minor things.

We have votes and the importance field:
https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Cchangeddate%2Cvotes&list_id=199241&query_format=advanced&votes=1&votes_type=greaterthaneq

However, the byLine issue does not have particularly high priority by 
any of those measures.

Mar 22 2015

Sönke Ludwig <sludwig rejectedsoftware.com> writes:

On 22.03.2015 at 08:43, Sönke Ludwig wrote:
 On 22.03.2015 at 08:18, weaselcat wrote:
 On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
 I just took a look at making byLine faster. It took less than one
 evening:

 https://github.com/D-Programming-Language/phobos/pull/3089

 I confess I am a bit disappointed with the leadership being unable to
 delegate this task to a trusty lieutenant in the community. There's
 been a bug opened on this for a long time,

 there's thousands of open bugs, and no real ranking of high priority
 bugs or just minor things.

 We have votes and the importance field:
 https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&columnlist=product%2Ccomponent%2Cassigned_to%2Cbug_status%2Cresolution%2Cshort_desc%2Cchangeddate%2Cvotes&list_id=199241&query_format=advanced&votes=1&votes_type=greaterthaneq


 However, the byLine issue does not have particularly high priority by
 any of those measures.

Oh, and bounties of course: https://www.bountysource.com/teams/d/issues

Mar 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/22/15 12:03 AM, Andrei Alexandrescu wrote:
 I just took a look at making byLine faster. It took less than one evening:

 https://github.com/D-Programming-Language/phobos/pull/3089

 I confess I am a bit disappointed with the leadership being unable to
 delegate this task to a trusty lieutenant in the community. There's been
 a bug opened on this for a long time, it gets regularly discussed here
 (with the wrong conclusions ("we must redo D's I/O because FILE* is
 killing it!") about performance bottlenecks drawn from unverified
 assumptions), and the techniques used to get a marked improvement in the
 diff above are trivial fare for any software engineer. The following
 factors each had a significant impact on speed:

 * On OSX (which I happened to test with) getdelim() exists but wasn't
 being used. I made the implementation use it.

 * There was one call to fwide() per line read. I used simple caching (a
 stream's width cannot be changed once set, making it a perfect candidate
 for caching).

 (As an aside there was some unreachable code in ByLineImpl.empty, which
 didn't impact performance but was overdue for removal.)

 * For each line read there was a call to malloc() and one to free(). I
 set things up that the buffer used for reading is reused by simply
 making the buffer static.

 * assumeSafeAppend() was unnecessarily used once per line read. Its
 removal led to a whopping 35% on top of everything else. I'm not sure
 what it does, but boy it does takes its sweet time. Maybe someone should
 look into it.

 Destroy.


 Andrei

* Avoid most calls to GC.sizeOf.

Andrei

Mar 22 2015

"Sad panda" <asdf asdf.com> writes:

On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu 
wrote:
 I confess I am a bit disappointed with the leadership being 
 unable to delegate this task to a trusty lieutenant in the 
 community. There's been a bug opened on this for a long time, 
 it gets regularly discussed here (with the wrong conclusions 
 ("we must redo D's I/O because FILE* is killing it!") about 
 performance bottlenecks drawn from unverified assumptions), and 
 the techniques used to get a marked improvement in the diff 
 above are trivial fare for any software engineer. The following 
 factors each had a significant impact on speed:

Lack of developer itch in a comparatively small developer base 
making the complement of no one dealing with it too small. :c

Cheers for taking the time, though! All the love for devs.

Mar 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/22/15 1:26 AM, Sad panda wrote:
 On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
 I confess I am a bit disappointed with the leadership being unable to
 delegate this task to a trusty lieutenant in the community. There's
 been a bug opened on this for a long time, it gets regularly discussed
 here (with the wrong conclusions ("we must redo D's I/O because FILE*
 is killing it!") about performance bottlenecks drawn from unverified
 assumptions), and the techniques used to get a marked improvement in
 the diff above are trivial fare for any software engineer. The
 following factors each had a significant impact on speed:

 Lack of developer itch in a comparatively small developer base making
 the complement of no one dealing with it too small. :c

Heh, nicely put :o).

 Cheers for taking the time, though! All the love for devs.

Thanks!


Andrei

Mar 22 2015

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu 
wrote:
 * For each line read there was a call to malloc() and one to 
 free(). I set things up that the buffer used for reading is 
 reused by simply making the buffer static.

What about e.g.

zip(File("a.txt").byLine, File("b.txt").byLine)

Mar 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/22/15 3:10 AM, Vladimir Panteleev wrote:
 On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
 * For each line read there was a call to malloc() and one to free(). I
 set things up that the buffer used for reading is reused by simply
 making the buffer static.

 What about e.g.

 zip(File("a.txt").byLine, File("b.txt").byLine)

No matter, the static buffer is copied into the result. -- Andrei

Mar 22 2015

"tcak" <tcak gmail.com> writes:

On Sunday, 22 March 2015 at 16:03:11 UTC, Andrei Alexandrescu 
wrote:
 On 3/22/15 3:10 AM, Vladimir Panteleev wrote:
 On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu 
 wrote:
 * For each line read there was a call to malloc() and one to 
 free(). I
 set things up that the buffer used for reading is reused by 
 simply
 making the buffer static.

 What about e.g.

 zip(File("a.txt").byLine, File("b.txt").byLine)

 No matter, the static buffer is copied into the result. -- 
 Andrei

I haven't seen the code, but won't using a "static" buffer make 
the function thread UNSAFE?

I think we should also document thread safety somewhere. Phobos 
doesn't have any such documentation.

Mar 22 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/22/15 10:13 AM, tcak wrote:
 On Sunday, 22 March 2015 at 16:03:11 UTC, Andrei Alexandrescu wrote:
 On 3/22/15 3:10 AM, Vladimir Panteleev wrote:
 On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu wrote:
 * For each line read there was a call to malloc() and one to free(). I
 set things up that the buffer used for reading is reused by simply
 making the buffer static.

 What about e.g.

 zip(File("a.txt").byLine, File("b.txt").byLine)

 No matter, the static buffer is copied into the result. -- Andrei

 I didn't see the code though, won't using "static" buffer make the
 function thread UNSAFE?

D's statics are thread-local. -- Andrei

Mar 22 2015
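
A small sketch of the point about thread-local statics, assuming only core.thread from druntime. The helper names `bump` and `bumpInNewThread` are made up for illustration. Each thread gets its own copy of a function-level `static`, so a new thread starts counting from zero regardless of what the main thread did.

```d
import core.thread : Thread;

/// Increments a function-level static and returns the new value.
int bump()
{
    static int calls;   // thread-local by default in D
    return ++calls;
}

/// Runs bump() once in a fresh thread and returns what that thread saw.
int bumpInNewThread()
{
    int result;
    auto t = new Thread({ result = bump(); });
    t.start();
    t.join();
    return result;
}

void main()
{
    assert(bump() == 1);
    assert(bump() == 2);
    assert(bumpInNewThread() == 1);   // the new thread has its own zeroed copy
    assert(bump() == 3);              // the main thread's counter is untouched
}
```

Sharing one instance across threads has to be requested explicitly, with `shared` or `__gshared`.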

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 3/22/15 3:03 AM, Andrei Alexandrescu wrote:

 * assumeSafeAppend() was unnecessarily used once per line read. Its
 removal led to a whopping 35% on top of everything else. I'm not sure
 what it does, but boy it does takes its sweet time. Maybe someone should
 look into it.

That's not expected. assumeSafeAppend should be pretty quick, and 
DEFINITELY should not be a significant percentage of reading lines. I 
will look into it.

Just to verify, your test application was a simple byLine loop?

-Steve

Mar 23 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/15 7:52 AM, Steven Schveighoffer wrote:
 On 3/22/15 3:03 AM, Andrei Alexandrescu wrote:

 * assumeSafeAppend() was unnecessarily used once per line read. Its
 removal led to a whopping 35% on top of everything else. I'm not sure
 what it does, but boy it does takes its sweet time. Maybe someone should
 look into it.

 That's not expected. assumeSafeAppend should be pretty quick, and
 DEFINITELY should not be a significant percentage of reading lines. I
 will look into it.

Thanks!

 Just to verify, your test application was a simple byline loop?

Yes, the code was that in 



Andrei

Mar 23 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 3/23/15 10:59 AM, Andrei Alexandrescu wrote:
 On 3/23/15 7:52 AM, Steven Schveighoffer wrote:
 On 3/22/15 3:03 AM, Andrei Alexandrescu wrote:

 * assumeSafeAppend() was unnecessarily used once per line read. Its
 removal led to a whopping 35% on top of everything else. I'm not sure
 what it does, but boy it does takes its sweet time. Maybe someone should
 look into it.

 That's not expected. assumeSafeAppend should be pretty quick, and
 DEFINITELY should not be a significant percentage of reading lines. I
 will look into it.

 Thanks!

 Just to verify, your test application was a simple byline loop?

 Yes, the code was that in


My investigation seems to suggest that assumeSafeAppend is not using 
that much time for what it does. The reason for the "35%" is that you 
are talking 35% of a very small value. At that level, and with these 
numbers of calls, combined with the fact that the calls MUST occur 
(these are opaque functions), I think we are talking about a non issue here.

This is what assumeSafeAppend does:

1. Access TypeInfo and convert array to "void[]" array (this could 
probably be adjusted to avoid using the TypeInfo, since assumeSafeAppend 
is a template).
2. Look up block info, which should be a loop through 8 array cache 
elements.
3. Verify the block has the APPENDABLE flag, and write the new "used" 
space into the right place.

I suspect some combination of memory cache failures, or virtual function 
calls on the TypeInfo, or failure to inline some functions is what's 
slowing it down. But let's not forget that the 35% savings was AFTER all 
the original savings. On my system, using a 2 million line file, the 
original took 2.2 seconds, the version with the superfluous 
assumeSafeAppend took .3 seconds, without it takes .15 seconds.

Still should be examined further, but I'm not as concerned as I was before.

-Steve

Mar 23 2015
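
To make the effect of assumeSafeAppend concrete, here is a minimal sketch using only built-in array operations. It shows the observable behavior, not the runtime internals listed above; the helper name `appendsInPlace` is made up for illustration.

```d
/// Returns true if appending to a prefix slice reuses the original block
/// once assumeSafeAppend has been called on that slice.
bool appendsInPlace()
{
    auto buf = new char[](64);
    auto line = buf[0 .. 10];   // a prefix of a larger GC-allocated block

    // The runtime knows 64 bytes are in use, so appending to the prefix
    // would have to reallocate rather than stomp on buf[10].
    assert(line.capacity == 0);

    // Promise the runtime that nothing past the end of line is still needed.
    line.assumeSafeAppend();

    auto before = line.ptr;
    line ~= 'x';                // now extends in place

    return line.ptr is before && buf[10] == 'x';
}

void main()
{
    assert(appendsInPlace());
}
```

The call is only safe when no other live slice depends on the data after the end of the slice, which is why it has to be requested explicitly.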

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/15 2:42 PM, Steven Schveighoffer wrote:
 On 3/23/15 10:59 AM, Andrei Alexandrescu wrote:
 On 3/23/15 7:52 AM, Steven Schveighoffer wrote:
 On 3/22/15 3:03 AM, Andrei Alexandrescu wrote:

 * assumeSafeAppend() was unnecessarily used once per line read. Its
 removal led to a whopping 35% on top of everything else. I'm not sure
 what it does, but boy it does takes its sweet time. Maybe someone
 should
 look into it.

 That's not expected. assumeSafeAppend should be pretty quick, and
 DEFINITELY should not be a significant percentage of reading lines. I
 will look into it.

 Thanks!

 Just to verify, your test application was a simple byline loop?

 Yes, the code was that in


 My investigation seems to suggest that assumeSafeAppend is not using
 that much time for what it does. The reason for the "35%" is that you
 are talking 35% of a very small value.

I don't see the logic here. Unless the value is so small that noise 
margins become significant (it isn't), 35% is large.

 At that level, and with these
 numbers of calls, combined with the fact that the calls MUST occur
 (these are opaque functions), I think we are talking about a non issue
 here.

I disagree with this assessment. In this case it takes us from losing to 
Python to beating it.

 This is what assumeSafeAppend does:

 1. Access TypeInfo and convert array to "void[]" array (this could
 probably be adjusted to avoid using the TypeInfo, since assumeSafeAppend
 is a template).
 2. Look up block info, which should be a loop through 8 array cache
 elements.
 3. Verify the block has the APPENDABLE flag, and write the new "used"
 space into the right place.

 I suspect some combination of memory cache failures, or virtual function
 calls on the TypeInfo, or failure to inline some functions is what's
 slowing it down. But let's not forget that the 35% savings was AFTER all
 the original savings. On my system, using a 2 million line file, the
 original took 2.2 seconds, the version with the superfluous
 assumeSafeAppend took .3 seconds, without it takes .15 seconds.

 Still should be examined further, but I'm not as concerned as I was before.

We should.


Andrei

Mar 23 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 3/23/15 7:33 PM, Andrei Alexandrescu wrote:
 On 3/23/15 2:42 PM, Steven Schveighoffer wrote:
 On 3/23/15 10:59 AM, Andrei Alexandrescu wrote:
 On 3/23/15 7:52 AM, Steven Schveighoffer wrote:
 On 3/22/15 3:03 AM, Andrei Alexandrescu wrote:

 * assumeSafeAppend() was unnecessarily used once per line read. Its
 removal led to a whopping 35% on top of everything else. I'm not sure
 what it does, but boy it does takes its sweet time. Maybe someone
 should
 look into it.

 That's not expected. assumeSafeAppend should be pretty quick, and
 DEFINITELY should not be a significant percentage of reading lines. I
 will look into it.

 Thanks!

 Just to verify, your test application was a simple byline loop?

 Yes, the code was that in


 My investigation seems to suggest that assumeSafeAppend is not using
 that much time for what it does. The reason for the "35%" is that you
 are talking 35% of a very small value.

 I don't see the logic here. Unless the value is so small that noise
 margins become significant (it isn't), 35% is large.

 At that level, and with these
 numbers of calls, combined with the fact that the calls MUST occur
 (these are opaque functions), I think we are talking about a non issue
 here.

 I disagree with this assessment. In this case it takes us from losing to
 winning to Python.

Yes, rethinking, you are right. I was jolted by the 35% thinking it was 
35% of the original problem.

I re-examined and found something interesting -- assumeSafeAppend 
doesn't cache the block, it only uses the cache if it's ALREADY cached.

So a large chunk of that 35% is the runtime looking up that block info 
in the heap. On my machine, this brings the time from .3 down to .2 s.

I also found a bad memory corruption bug you introduced. I'll make some PRs.

-Steve

Mar 23 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 3/23/15 9:17 PM, Steven Schveighoffer wrote:
 I re-examined and found something interesting -- assumeSafeAppend
 doesn't cache the block, it only uses the cache if it's ALREADY cached.

 So a large chunk of that 35% is the runtime looking up that block info
 in the heap. On my machine, this brings the time from .3 down to .2 s.

 I also found a bad memory corruption bug you introduced. I'll make some
 PRs.

https://github.com/D-Programming-Language/druntime/pull/1198
https://github.com/D-Programming-Language/phobos/pull/3098

Note, this doesn't affect performance in this case, as assumeSafeAppend 
isn't used any more.

-Steve

Mar 23 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/15 6:44 PM, Steven Schveighoffer wrote:
 On 3/23/15 9:17 PM, Steven Schveighoffer wrote:
 I re-examined and found something interesting -- assumeSafeAppend
 doesn't cache the block, it only uses the cache if it's ALREADY cached.

 So a large chunk of that 35% is the runtime looking up that block info
 in the heap. On my machine, this brings the time from .3 down to .2 s.

 I also found a bad memory corruption bug you introduced. I'll make some
 PRs.

 https://github.com/D-Programming-Language/druntime/pull/1198
 https://github.com/D-Programming-Language/phobos/pull/3098

 Note, this doesn't affect performance in this case, as assumeSafeAppend
 isn't used any more.

Can't tell how much I appreciate this work! -- Andrei

Mar 23 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Sunday, 22 March 2015 at 07:03:14 UTC, Andrei Alexandrescu 
wrote:
 I just took a look at making byLine faster. It took less than 
 one evening:

 https://github.com/D-Programming-Language/phobos/pull/3089

 I confess I am a bit disappointed with the leadership being 
 unable to delegate this task to a trusty lieutenant in the 
 community. There's been a bug opened on this for a long time, 
 it gets regularly discussed here (with the wrong conclusions 
 ("we must redo D's I/O because FILE* is killing it!") about 
 performance bottlenecks drawn from unverified assumptions), and 
 the techniques used to get a marked improvement in the diff 
 above are trivial fare for any software engineer. The following 
 factors each had a significant impact on speed:

 * On OSX (which I happened to test with) getdelim() exists but 
 wasn't being used. I made the implementation use it.

 * There was one call to fwide() per line read. I used simple 
 caching (a stream's width cannot be changed once set, making it 
 a perfect candidate for caching).

 (As an aside there was some unreachable code in 
 ByLineImpl.empty, which didn't impact performance but was 
 overdue for removal.)

 * For each line read there was a call to malloc() and one to 
 free(). I set things up that the buffer used for reading is 
 reused by simply making the buffer static.

 * assumeSafeAppend() was unnecessarily used once per line read. 
 Its removal led to a whopping 35% on top of everything else. 
 I'm not sure what it does, but boy it does takes its sweet 
 time. Maybe someone should look into it.

 Destroy.


 Andrei

What would be really great would be a performance test suite for 
Phobos. D is reaching a point where "It'll probably be fast 
because we did it right" or "I remember it being fast-ish 3 years 
ago when I wrote a small toy test" isn't going to cut it. Real 
data is needed, with comparisons to other languages where 
possible.

Mar 23 2015

"Robert burner Schadek" <rburners gmail.com> writes:

On Monday, 23 March 2015 at 15:00:07 UTC, John Colvin wrote:
 What would be really great would be a performance test suite 
 for phobos.

I'm working on it

https://github.com/D-Programming-Language/phobos/pull/2995

Mar 23 2015

"rumbu" <rumbu rumbu.ro> writes:

On Monday, 23 March 2015 at 15:00:07 UTC, John Colvin wrote:

 What would be really great would be a performance test suite 
 for phobos. D is reaching a point where "It'll probably be fast 
 because we did it right" or "I remember it being fast-ish 3 
 years ago when i wrote a small toy test" isn't going to cut it. 
 Real data is needed, with comparisons to other languages where 
 possible.


I made the same test in C# using a 30MB plain ASCII text file. 
Compared to the fastest method proposed by Andrei, results are not 
the best:

D:
readText.representation.count!(c => c == '\n') - 428 ms
byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms


C#:
File.ReadAllLines.Length - 216 ms;

Win64, D 2.066.1, Optimizations were turned on in both cases.

The .NET code is clearly not performance-oriented 
(http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26); 
I suspect that the .NET runtime is performing some optimizations 
under the hood.

Mar 23 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/15 10:43 AM, rumbu wrote:
 On Monday, 23 March 2015 at 15:00:07 UTC, John Colvin wrote:

 What would be really great would be a performance test suite for
 phobos. D is reaching a point where "It'll probably be fast because we
 did it right" or "I remember it being fast-ish 3 years ago when i
 wrote a small toy test" isn't going to cut it. Real data is needed,
 with comparisons to other languages where possible.


 I made the same test in C# using a 30MB plain ASCII text file. Compared
 to the fastest method proposed by Andrei, results are not the best:

 D:
 readText.representation.count!(c => c == '\n') - 428 ms
 byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms


 C#:
 File.ReadAllLines.Length - 216 ms;

 Win64, D 2.066.1, Optimizations were turned on in both cases.

 The .net code is clearly not performance oriented
 (http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26),
 I suspect that .net runtime is performing some optimizations under the
 hood.

At this point it gets down to the performance of std.algorithm.count, 
which could and should be improved. This code runs 2.5x faster than 
count and brings it in the zone of wc -l, which is probably near 
the lower bound achievable:

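   import core.stdc.string : memchr;
   import std.file : readText;
   import std.string : representation;

   size_t linect; // number of '\n' bytes seen so far
   auto bytes = args[1].readText.representation;
   for (auto p = bytes.ptr, lim = p + bytes.length;; )
   {
     // Jump straight to the next newline in the remaining bytes.
     auto r = cast(immutable(ubyte)*) memchr(p, '\n', lim - p);
     if (!r) break; // no newline left
     ++linect;
     p = r + 1;     // resume just past the newline
   }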

Would anyone want to put some work into accelerating count?


Andrei

Mar 23 2015

"Tobias Pankrath" <tobias pankrath.net> writes:


 I made the same test in C# using a 30MB plain ASCII text file. 
 Compared to the fastest method proposed by Andrei, results are not 
 the best:

 D:
 readText.representation.count!(c => c == '\n') - 428 ms
 byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms


 C#:
 File.ReadAllLines.Length - 216 ms;

 Win64, D 2.066.1, Optimizations were turned on in both cases.

 The .NET code is clearly not performance-oriented 
 (http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26); 
 I suspect that the .NET runtime is performing some optimizations 
 under the hood.


Does the C# version validate the input? Using std.file.read 
instead of readText.representation halves the runtime on my 
machine.

Mar 23 2015

"rumbu" <rumbu rumbu.ro> writes:

On Monday, 23 March 2015 at 19:25:08 UTC, Tobias Pankrath wrote:

 I made the same test in C# using a 30MB plain ASCII text file. 
 Compared to the fastest method proposed by Andrei, results are not 
 the best:

 D:
 readText.representation.count!(c => c == '\n') - 428 ms
 byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms


 C#:
 File.ReadAllLines.Length - 216 ms;

 Win64, D 2.066.1, Optimizations were turned on in both cases.

 The .NET code is clearly not performance-oriented 
 (http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26); 
 I suspect that the .NET runtime is performing some optimizations 
 under the hood.


 Does the C# version validate the input? Using std.file.read 
 instead of readText.representation halves the runtime on my 
 machine.


Source code is available at the link above. Since the C# version 
works internally with streams and UTF-16 chars, the pseudocode 
looks like this:

---
initialize a LIST with 16 items;
while (!eof)
{
   read 4096 bytes in a buffer;
   decode them to UTF-16 in a wchar[] buffer
   while (moredata in the buffer)
   {
     read from buffer until (\n or \r\n or \r);
     discard end of line;
     if (nomorespace in LIST)
        double its size.
     add the line to LIST.
   }
}
return number of items in the LIST.
---

Since this code is clearly not the best for this task, as I 
suspected, I looked into the JIT-compiled code, and it seems that 
the .NET runtime is smart enough to recognize this pattern and does 
the following:
- file is mapped into memory using CreateFileMapping
- does not perform any decoding, since \r and \n are ASCII
- does not create any list
- searches incrementally for \r, \r\n, \n using CompareStringA 
and LOCALE_INVARIANT and increments at each end of line
- there is no temporary memory allocation since searching is 
performed directly on the mapping handle
- returns the count.

Mar 23 2015
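
The same general approach (map the file, scan the bytes in place, allocate nothing per line) can be sketched in D. This is an illustration under assumptions, not a port of what the .NET runtime does: it uses `MmFile` from std.mmfile and `memchr` from core.stdc.string, counts only `\n` bytes (a lone `\r` is not treated as a line ending), and expects a non-empty file. The helper name `countNewlines` and the scratch file name are made up.

```d
import core.stdc.string : memchr;
import std.mmfile : MmFile;

/// Counts '\n' bytes in a file through a read-only memory mapping.
/// (countNewlines is a hypothetical helper, not part of Phobos.)
size_t countNewlines(string path)
{
    auto mmf = new MmFile(path);   // maps the whole file read-only
    scope (exit) destroy(mmf);     // unmap before returning

    auto bytes = cast(const(ubyte)[]) mmf[];
    size_t n;
    for (auto p = bytes.ptr, lim = p + bytes.length; p < lim; )
    {
        // Search directly in the mapping; nothing is copied or decoded.
        auto r = cast(const(ubyte)*) memchr(p, '\n', lim - p);
        if (r is null) break;
        ++n;
        p = r + 1;
    }
    return n;
}

void main()
{
    import std.file : write, remove;

    enum path = "mmap_sketch.txt";   // hypothetical scratch file
    write(path, "one\ntwo\nthree\n");
    scope (exit) remove(path);

    assert(countNewlines(path) == 3);
}
```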

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/15 2:13 PM, rumbu wrote:
 Since this code is clearly not the best for this task, as I suspected, I
 looked into jitted code and it seems that the .net runtime is smart
 enough to recognize this pattern and is doing the following:
 - file is mapped into memory using CreateFileMapping
 - does not perform any decoding, since \r and \n are ASCII
 - does not create any list
 - searches incrementally for \r, \r\n, \n using CompareStringA and
 LOCALE_INVARIANT and increments at each end of line
 - there is no temporary memory allocation since searching is performed
 directly on the mapping handle
 - returns the count.

This is great investigative and measuring work. Thanks! -- Andrei

Mar 23 2015

"bioinfornatics" <bioinfornatics fedoraproject.org> writes:

What about the huge page size setting on Linux?

Mar 28 2015

"bioinfornatics" <bioinfornatics fedoraproject.org> writes:

Java has Disruptor, which provides the fastest way to read a file.

website: http://lmax-exchange.github.io/disruptor/
technical information: 
http://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf

Mar 28 2015

"bioinfornatics" <bioinfornatics fedoraproject.org> writes:

A little ping: I am hoping for an answer about I/O in D and Disruptor 
from the Java world.

Disruptor seems to provide a smart implementation between I/O and 
its buffer.

What do you think about it?
D could provide a high-level way to process a file efficiently 
(using ranges, forward ranges, etc. would be better).

I think D should come with batteries included for this kind of 
common task,

without the need to know whether you are on an SSD or an HDD, whether 
the page size is 4096, whether huge pages are enabled, ...

It would be really awesome to have an abstraction layer for this.

Mar 31 2015
