digitalmars.D - std.benchmark ready for review. Manager sought after

Andrei Alexandrescu (16/16) Apr 07 2012 Hello,

Vladimir Panteleev (11/17) Apr 07 2012 Nice, some comments:

Andrei Alexandrescu (7/23) Apr 07 2012 To me too :o).

Vladimir Panteleev (17/24) Apr 08 2012 Hmm, maybe there's a misunderstanding, but what I meant was:

Andrei Alexandrescu (3/15) Apr 08 2012 I understand, thanks.

Somedude (3/26) Apr 08 2012 Like it. Would it be a good idea to add a column with an average memory

Andrei Alexandrescu (6/8) Apr 08 2012 Interesting idea. I saw

Marco Leise (5/15) Apr 08 2012 Oh please, I use these functions already and with the similarities among...

Somedude (8/24) Apr 08 2012 I thought about monitoring as well.

Dmitry Olshansky (9/11) Apr 08 2012 In general it's next to impossible and/or entirely OS-specific.

Somedude (2/15) Apr 08 2012 That's what I thought, thank you.

Piotr Szturmaj (7/13) Apr 08 2012 For algorithms that process sequences, it would be nice to have results

Andrei Alexandrescu (4/18) Apr 08 2012 The framework aims at generality and simplicity. I'd rather have it

Denis Shelomovskij (11/11) Apr 08 2012 Very good but minimum isn't a best guess. Personally I (and there will

Andrei Alexandrescu (16/24) Apr 08 2012 I've analyzed this quite a bit at work and the average and median are

Manfred Nowak (10/13) Apr 08 2012 On the contrary:

Andrei Alexandrescu (20/32) Apr 08 2012 That sounds quite tenuous to me. How do you measure it, and what

Manfred Nowak (28/47) Apr 09 2012 This is in doubt, because you yourself wrote "the machine itself has

Andrei Alexandrescu (13/45) Apr 09 2012 Which is great, unless the program wants to measure the cache memory

Francois Chabot (5/8) Apr 09 2012 I disagree. If a regression suddenly causes a function to become

Andrei Alexandrescu (5/11) Apr 09 2012 But cache binding depends on a variety of cache characteristics, i.e.

Manfred Nowak (23/31) Apr 09 2012 I did not mean to run two or more benchmarks in parallel but only

Andrei Alexandrescu (9/30) Apr 14 2012 I doubt that's the case. Once we discuss running several instances in

Denis Shelomovskij (21/44) Apr 09 2012 Yes of course. I mean than an algorithm itself should follow some

Andrei Alexandrescu (10/17) Apr 09 2012 As I explained, the average takes noise and outliers (some very large,

Denis Shelomovskij (8/18) Apr 10 2012 Looks like a misunderstanding because of my bad English: "Recording the

Martin Nowak (4/10) Apr 10 2012 The benchmarked function itself could have a relevant variance, e.g.

Dmitry Olshansky (16/24) Apr 08 2012 +1

Andrei Alexandrescu (4/8) Apr 08 2012 This would be interesting to explore, could you give some more detail

Dmitry Olshansky (9/16) Apr 08 2012 I thought to just throw it in. But if were to do it in under an hour:

Caligo (4/8) Apr 08 2012 I probably missed this somewhere, but what happens to `std.datetime :

Andrei Alexandrescu (4/14) Apr 08 2012 Yes, I forgot to mention that. Part of the proposal is that

Steven Schveighoffer (2/2) Apr 09 2012 Added to trello.
Francois Chabot (12/12) Apr 09 2012 Why is there so much emphasis on printBenchmarks()?

Andrei Alexandrescu (10/15) Apr 09 2012 The functionality is to make benchmark easy to use, meaningful, and easy...
Somedude (5/20) Apr 09 2012 The printBenchmark facility is cool and should be included imho.

Andrei Alexandrescu (5/8) Apr 09 2012 Yes, I had unittest in mind when writing the library. If one needs more

Jens Mueller (33/54) Apr 10 2012 I come to think the same.

Andrei Alexandrescu (16/45) Apr 14 2012 It's in the names. If the name of a benchmark starts with

Jens Mueller (11/64) Apr 15 2012 I know. That wasn't my question. How do I choose between percentage vs.

Andrei Alexandrescu (8/8) Apr 14 2012 There have been quite a few good comments, but no review manager offer.

Jacob Carlborg (6/14) Apr 15 2012 As far as I know, there are already several other projects/modules in

Jonathan M Davis (6/22) Apr 16 2012 There are, but none of them are being reviewed at the moment or being pu...

Jens Mueller (6/8) Apr 15 2012 I will do this.

Lars T. Kyllingstad (4/13) Apr 16 2012 The review process is based on Boost's:

Jens Mueller (5/20) Apr 16 2012 The page is shorter than expected. Last time I checked I found something

Dmitry Olshansky (24/43) Apr 17 2012 It's peer review followed by voting and that's all about it.

Jens Mueller (4/53) Apr 17 2012 Andrei is already preparing. Review will start soon.

Andrei Alexandrescu (6/7) Apr 17 2012 I'd like to gather a bit of confidence that it's okay with people that

SomeDude (7/15) Apr 17 2012 I agree, there is no good reason for the queue to be stopped like
Jonathan M Davis (6/12) Apr 17 2012 I have no problem with it. None of the others is being pushed for review...
H. S. Teoh (6/12) Apr 17 2012 [...]
Jonathan M Davis (4/15) Apr 17 2012 Check it in? You mean start the review on it? I hope that you don't mean...
Marco Leise (5/15) Apr 18 2012 In a supermarket, there are three people in front of you in the queue. T...

SomeDude (2/8) Apr 18 2012 Well, we are not in a supermarket. It was correct to ask.

Famous (47/47) Apr 15 2012 Instead of the originally proposed layout

Famous (1/10) Apr 15 2012

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (236/255) Sep 27 2013 Where have std.benchmark gone?

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Hello,


I finally found the time to complete std.benchmark. I got to a very 
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at 
https://github.com/D-Programming-Language/phobos/pull/529. (There's some 
noise in there caused by my git n00biness). Documentation is at 
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the priority of this 
proposal. Clearly D's user base is highly interested in efficiency, and 
many of the upcoming libraries have efficiency a virtual part of their 
design. So we should get std.benchmark in soon and require that new 
addition come with benchmarks for their essential functionality.

My vision is that in the future Phobos will have a significant 
benchmarks battery, which will help improving Phobos and porting to new 
platforms.


Andrei

Apr 07 2012

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu 
wrote:
 Hello,


 I finally found the time to complete std.benchmark. I got to a 
 very simple API design, starting where I like it: one line of 
 code.

Nice, some comments:

1) Is it possible to do something about the "kilonanoseconds", 
and print the two-letter SI time abbreviation instead 
(μs/ns/ms/s/...)?

2) The meaning of "epoch" in this context is unfamiliar to me:

 Measurement is done in epochs. For each function benchmarked, 
 the smallest time is taken over all epochs.

3) "benchmark_relative_file read" should be replaced with a 
language construct. E.g. a function call like 
relativeBenchmark("file read"), or an enum value like getopt's.

4) There is a TODO in benchmarkSuspend's example.

Apr 07 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/7/12 11:45 PM, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu wrote:
 Hello,


 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.

 Nice, some comments:

 1) Is it possible to do something about the "kilonanoseconds", and print
 the two-letter SI time abbreviation instead (μs/ns/ms/s/...)?

Cool.

 2) The meaning of "epoch" in this context is unfamiliar to me:

 Measurement is done in epochs. For each function benchmarked, the
 smallest time is taken over all epochs.


To me too :o).

 3) "benchmark_relative_file read" should be replaced with a language
 construct. E.g. a function call like relativeBenchmark("file read"), or
 an enum value like getopt's.

No can do. Need a function name-based convention so we can automate 
scheduleForBenchmarking.

 4) There is a TODO in benchmarkSuspend's example.

Will destroy.


Andrei

Apr 07 2012

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Sunday, 8 April 2012 at 05:41:11 UTC, Andrei Alexandrescu 
wrote:
 3) "benchmark_relative_file read" should be replaced with a 
 language
 construct. E.g. a function call like relativeBenchmark("file 
 read"), or
 an enum value like getopt's.

 No can do. Need a function name-based convention so we can 
 automate scheduleForBenchmarking.

Hmm, maybe there's a misunderstanding, but what I meant was:

The "benchmark_relative_" prefix makes sense for function names 
(for scheduleForBenchmarking), but not so much for string 
literals for benchmark names. The string literal 
"benchmark_relative_file read" looks like the words "benchmark 
relative file" are grouped together, with "read" added on. So, my 
suggestion would be to wrap the "benchmark_relative_" prefix - 
when used with benchmark name strings - into a semantical 
function / enum / etc. In my example above, relativeBenchmark 
would be:

string relativeBenchmark(string s) { return "benchmark_relative_" 
~ s; }

I suppose it can be summed up as a tradeoff between complexity 
(you need to explain both the function name usage and the 
relativeBenchmark wrapper usage) vs. code prettiness.

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 2:02 AM, Vladimir Panteleev wrote:
 The "benchmark_relative_" prefix makes sense for function names (for
 scheduleForBenchmarking), but not so much for string literals for
 benchmark names. The string literal "benchmark_relative_file read" looks
 like the words "benchmark relative file" are grouped together, with
 "read" added on. So, my suggestion would be to wrap the
 "benchmark_relative_" prefix - when used with benchmark name strings -
 into a semantical function / enum / etc. In my example above,
 relativeBenchmark would be:

 string relativeBenchmark(string s) { return "benchmark_relative_" ~ s; }

 I suppose it can be summed up as a tradeoff between complexity (you need
 to explain both the function name usage and the relativeBenchmark
 wrapper usage) vs. code prettiness.

I understand, thanks.

Andrei

Apr 08 2012

Somedude <lovelydear mailmetrash.com> writes:

Le 08/04/2012 05:25, Andrei Alexandrescu a �crit :
 Hello,
 
 
 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.
 
 Code is in the form of a pull request at
 https://github.com/D-Programming-Language/phobos/pull/529. (There's some
 noise in there caused by my git n00biness). Documentation is at
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.
 
 If reasonable and at all possible, I'd like to bump the priority of this
 proposal. Clearly D's user base is highly interested in efficiency, and
 many of the upcoming libraries have efficiency a virtual part of their
 design. So we should get std.benchmark in soon and require that new
 addition come with benchmarks for their essential functionality.
 
 My vision is that in the future Phobos will have a significant
 benchmarks battery, which will help improving Phobos and porting to new
 platforms.
 
 
 Andrei

Like it. Would it be a good idea to add a column with an average memory
used ?

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 3:16 AM, Somedude wrote:
 Like it. Would it be a good idea to add a column with an average memory
 used ?

Interesting idea. I saw 
http://stackoverflow.com/questions/1674652/c-c-memory-usage-
pi-in-linux-windows 
and it looks like it's not an easy problem. Should we make this part of 
the initial release?

Andrei

Apr 08 2012

Marco Leise <Marco.Leise gmx.de> writes:

Am Sun, 08 Apr 2012 09:35:14 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 4/8/12 3:16 AM, Somedude wrote:
 Like it. Would it be a good idea to add a column with an average memory
 used ?

 
 Interesting idea. I saw 
 http://stackoverflow.com/questions/1674652/c-c-memory-usage-
pi-in-linux-windows 
 and it looks like it's not an easy problem. Should we make this part of 
 the initial release?
 
 Andrei

Oh please, I use these functions already and with the similarities amongst
operating systems they should be somewhere in Phobos! My thought was more in
the line of std.procmgmt/std.process though. Since they are process management
functions useful for server monitoring as well, not just benchmarking.

-- 
Marco

Apr 08 2012

Somedude <lovelydear mailmetrash.com> writes:

Le 08/04/2012 18:21, Marco Leise a �crit :
 Am Sun, 08 Apr 2012 09:35:14 -0500
 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 
 On 4/8/12 3:16 AM, Somedude wrote:
 Like it. Would it be a good idea to add a column with an average memory
 used ?

 Interesting idea. I saw 
 http://stackoverflow.com/questions/1674652/c-c-memory-usage-
pi-in-linux-windows 
 and it looks like it's not an easy problem. Should we make this part of 
 the initial release?

 Andrei

 
 Oh please, I use these functions already and with the similarities amongst
operating systems they should be somewhere in Phobos! My thought was more in
the line of std.procmgmt/std.process though. Since they are process management
functions useful for server monitoring as well, not just benchmarking.
 

I thought about monitoring as well.
But then again, following your remark, monitoring may in fact be a
completely different matter.
This would need a little bit more thought I guess. Adding monitoring
capabilities to processes would be interesting. But I guess we are
already quite far from benchmarking now.
Unless std.benchmark makes use of these monitoring capabilities.

Apr 08 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 08.04.2012 12:16, Somedude wrote:
[snip]
 Like it. Would it be a good idea to add a column with an average memory
 used ?

In general it's next to impossible and/or entirely OS-specific.
What can be done I think is adding a query function to GC interface that 
returns amount of RAM currently allocated on it's heap.

So adding GC heap usage can be done albeit it's not really a stable 
metric. This way one gets a nice hint on why something is slow ;)

-- 
Dmitry Olshansky

Apr 08 2012

Somedude <lovelydear mailmetrash.com> writes:

Le 08/04/2012 17:48, Dmitry Olshansky a �crit :
 On 08.04.2012 12:16, Somedude wrote:
 [snip]
 Like it. Would it be a good idea to add a column with an average memory
 used ?

 
 In general it's next to impossible and/or entirely OS-specific.
 What can be done I think is adding a query function to GC interface that
 returns amount of RAM currently allocated on it's heap.
 
 So adding GC heap usage can be done albeit it's not really a stable
 metric. This way one gets a nice hint on why something is slow ;)
 

That's what I thought, thank you.

Apr 08 2012

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

Andrei Alexandrescu wrote:
 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.

 Code is in the form of a pull request at
 https://github.com/D-Programming-Language/phobos/pull/529. (There's some
 noise in there caused by my git n00biness). Documentation is at
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

For algorithms that process sequences, it would be nice to have results 
represented in cycles per item, This should give more consistent results 
across different CPU familes and different clock speeds.

Specifically, I think about cycles per byte, see 
http://en.wikipedia.org/wiki/Cycles_per_byte
Example: http://www.cryptopp.com/benchmarks.html

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 5:46 AM, Piotr Szturmaj wrote:
 Andrei Alexandrescu wrote:
 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.

 Code is in the form of a pull request at
 https://github.com/D-Programming-Language/phobos/pull/529. (There's some
 noise in there caused by my git n00biness). Documentation is at
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

 For algorithms that process sequences, it would be nice to have results
 represented in cycles per item, This should give more consistent results
 across different CPU familes and different clock speeds.

 Specifically, I think about cycles per byte, see
 http://en.wikipedia.org/wiki/Cycles_per_byte
 Example: http://www.cryptopp.com/benchmarks.html

The framework aims at generality and simplicity. I'd rather have it 
simple and universal rather than subtle.

Andrei

Apr 08 2012

Denis Shelomovskij <verylonglogin.reg gmail.com> writes:

Very good but minimum isn't a best guess. Personally I (and there will 
be a lot of such maniacs I suppose) will think that this (minimum) time 
can be significantly smaller than average time.

So a parameter (probably with a default value) should be added. 
Something like enum of flags telling what we want to know. At least 
these looks usable: minTime, <some mean time>, maxTime, 
standardDeviation, graph (yes, good old ASCII art).

Yes, graph is needed.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
 Very good but minimum isn't a best guess. Personally I (and there will
 be a lot of such maniacs I suppose) will think that this (minimum) time
 can be significantly smaller than average time.

I've analyzed this quite a bit at work and the average and median are 
not very informative. You need the mode, but in most benchmarks the mode 
is very close to the minimum, so using the minimum is even better.

In speed measurements, all noise is additive (there's no noise that may 
make a benchmark appear to run faster). There are also quite a few 
outliers. Recording the average will include a fair amount of noise.

Clearly there is noise during normal use as well, but incorporating it 
in benchmarks as a matter of course reduces the usefulness of benchmarks 
as a mean to improve performance of the benchmarked code.

 So a parameter (probably with a default value) should be added.
 Something like enum of flags telling what we want to know. At least
 these looks usable: minTime, <some mean time>, maxTime,
 standardDeviation, graph (yes, good old ASCII art).

Standard deviation is also not very useful because it includes all 
outliers (some of which are very far away from the mode).

 Yes, graph is needed.

I am not sure about that. We may provide the raw measurement data for 
programs that want to plot things, but plotting is beyond the charter of 
std.benchmark.


Andrei

Apr 08 2012

Manfred Nowak <svv1999 hotmail.com> writes:

Andrei Alexandrescu wrote:

 Clearly there is noise during normal use as well, but
 incorporating it in benchmarks as a matter of course reduces the
 usefulness of benchmarks 

On the contrary:
1) The "noise during normal use" has to be measured in order to detect 
the sensibility of the benchmarked program to that noise.
2) The noise the benchmarked program produces has to be measured too, 
because the running benchmarked program probably increases the noise 
for all other running programs.   

In addition: the noise produced by a machine under heavy load might 
bring the performance of the benchmarked program down to zero.

-manfred

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 3:03 PM, Manfred Nowak wrote:
 Andrei Alexandrescu wrote:

 Clearly there is noise during normal use as well, but
 incorporating it in benchmarks as a matter of course reduces the
 usefulness of benchmarks

 On the contrary:
 1) The "noise during normal use" has to be measured in order to detect
 the sensibility of the benchmarked program to that noise.

That sounds quite tenuous to me. How do you measure it, and what 
conclusions do you draw other than there's a more or less other stuff 
going on on the machine, and the machine itself has complex interactions?

Far as I can tell a time measurement result is:

T = A + Q + N

where:

A > 0 is actual benchmark time

Q > 0 quantization noise (uniform distribution)

N > 0 various other noises (interrupts, task switching, networking, CPU 
dynamically changing frequency, etc). Many people jump on Gaussian as an 
approximation, but my tests suggest it's hardly so because it has a lot 
of jerky outliers.

How do we estimate A given T?

 2) The noise the benchmarked program produces has to be measured too,
 because the running benchmarked program probably increases the noise
 for all other running programs.

How to measure that? Also, that noise does not need to be measured as 
much as eliminated to the extent possible. This is because the benchmark 
app noise is a poor model of the application-induced noise.

 In addition: the noise produced by a machine under heavy load might
 bring the performance of the benchmarked program down to zero.

Of course. That's why the documentation emphasizes the necessity of 
baselines. A measurement without baselines is irrelevant.


Andrei

Apr 08 2012

Manfred Nowak <svv1999 hotmail.com> writes:

Andrei Alexandrescu wrote:


 all noise is additive (there's no noise that may make a benchmark
 appear to run faster)

This is in doubt, because you yourself wrote "the machine itself has 
complex interactions". This complex interactions might lower the time 
needed for an operation of the benchmarked program.

Examples that come to mind:
a) needed data is already in a (faster) cache because it belongs to a 
memory block, from which some data is needed by some program not 
belonging to the benchmarked set---and that block isnt replaced yet.
b) needed data is stored in a hdd whose I/O scheduler uses the elevator 
algorithm and serves the request by pure chance instantly, because the 
position of the needed data is between two positions accessed by some 
programs not belonging to the benchmarked set.
 
Especially a hdd, if used, will be responsible for a lot of noise you 
define as "quantization noise (uniform distribution)" even if the head 
stays at the same cylinder. Not recognizing this noise would only mean 
that the data is cached and interpreting the only true read from the 
hdd as a jerky outlier sems quite wrong.
 

 1) The "noise during normal use" has to be measured in order to
 detect the sensibility of the benchmarked program to that noise.

 How do you measure it, and what 
 conclusions do you draw other than there's a more or less other
 stuff going on on the machine, and the machine itself has complex
 interactions? 
 
 Far as I can tell a time measurement result is:
 
 T = A + Q + N

For example by running more than one instance of the benchmarked 
program in paralell and use the thereby gathered statistical routines 
to spread T into the additiv components A, Q and N.


 2) The noise the benchmarked program produces has to be measured
 too, because the running benchmarked program probably increases
 the noise for all other running programs.

 
 How to measure that?

Similar to the above note.


 Also, that noise does not need to be measured
 as much as eliminated to the extent possible.

I wouldn't define two programs to be equivalent based on the time until 
completion only. That time might be identical for both programs, but if 
only one of the programs increases the answering time of the machine  
to inacceptability I would choose the other. 

-manfred

Apr 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/9/12 9:25 AM, Manfred Nowak wrote:
 Andrei Alexandrescu wrote:
 all noise is additive (there's no noise that may make a benchmark
 appear to run faster)

 This is in doubt, because you yourself wrote "the machine itself has
 complex interactions". This complex interactions might lower the time
 needed for an operation of the benchmarked program.

 Examples that come to mind:
 a) needed data is already in a (faster) cache because it belongs to a
 memory block, from which some data is needed by some program not
 belonging to the benchmarked set---and that block isnt replaced yet.

Which is great, unless the program wants to measure the cache memory 
itself, in which case it would use special assembler instructions or 
large memset()s. (We do such at Facebook.)

 b) needed data is stored in a hdd whose I/O scheduler uses the elevator
 algorithm and serves the request by pure chance instantly, because the
 position of the needed data is between two positions accessed by some
 programs not belonging to the benchmarked set.

 Especially a hdd, if used, will be responsible for a lot of noise you
 define as "quantization noise (uniform distribution)" even if the head
 stays at the same cylinder. Not recognizing this noise would only mean
 that the data is cached and interpreting the only true read from the
 hdd as a jerky outlier sems quite wrong.

If the goal is to measure the seek time of the HDD, the benchmark itself 
should make sure the HDD cache is cleared. (What I recall they do on 
Linux is unmounting and remounting the drive.) Otherwise, it adds a 
useless component to the timing.

 1) The "noise during normal use" has to be measured in order to
 detect the sensibility of the benchmarked program to that noise.

 How do you measure it, and what
 conclusions do you draw other than there's a more or less other
 stuff going on on the machine, and the machine itself has complex
 interactions?

 Far as I can tell a time measurement result is:

 T = A + Q + N

 For example by running more than one instance of the benchmarked
 program in paralell and use the thereby gathered statistical routines
 to spread T into the additiv components A, Q and N.

I disagree with running two benchmarks in parallel because that exposes 
them to even more noise (scheduling, CPU count, current machine load 
etc). I don't understand the part of the sentence starting with "...use 
the thereby...", I'd be grateful if you elaborated.


Andrei

Apr 09 2012

"Francois Chabot" <francois.chabot.dev gmail.com> writes:

 Which is great, unless the program wants to measure the cache 
 memory itself, in which case it would use special assembler 
 instructions or large memset()s. (We do such at Facebook.)

I disagree. If a regression suddenly causes a function to become
heavily cache-bound, it should show up in benchmarks somehow,
regardless of the previous expected behavior of the function.

--
Francois

Apr 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/9/12 11:29 AM, Francois Chabot wrote:
 Which is great, unless the program wants to measure the cache memory
 itself, in which case it would use special assembler instructions or
 large memset()s. (We do such at Facebook.)

 I disagree. If a regression suddenly causes a function to become
 heavily cache-bound, it should show up in benchmarks somehow,
 regardless of the previous expected behavior of the function.

But cache binding depends on a variety of cache characteristics, i.e. 
the machine. The question is whether we accept a heavy dependence of the 
benchmark on the machine.

Andrei

Apr 09 2012

Manfred Nowak <svv1999 hotmail.com> writes:

Andrei Alexandrescu wrote:

 For example by running more than one instance of the benchmarked
 program in paralell and use the thereby gathered statistical 
 routines to spread T into the additiv components A, Q and N.


 I disagree with running two benchmarks in parallel because that
 exposes them to even more noise (scheduling, CPU count, current
 machine load etc).

I did not mean to run two or more benchmarks in parallel but only 
more instances of the benchmarked program _without_ the environment 
supplied by the benchmakr. Of course may there be more noise. But if 
so, that noise is dependent on the additional running instances and 
to nothing else.

Now let T(n) be the time needed to run n instances in parallel on a 
single processor. According to your definition and your remark above:

�   T(n) = n*A + Q + N + P(n)

where P(n)>=0 and P(1)==0 is the additional noise with which the 
benchmarked program disturbs itself.

Please observe that Q and N eliminate themselves on subtraction:
�  T(2) - T(1) = A + P(2) - P(1)
�  T(3) - T(2) = A + P(3) - P(2)
...

P(n+1)-P(n) measures the sensitivity of the benchmarked program to 
its one noise. As you wrote yourself its value should be close to 
zero until the machine is close to falling flat.


 I don't understand the part of the sentence starting with "...use
 the thereby...", I'd be grateful if you elaborated. 

Ohh..., an unrecognized deletion. I have written:
| use the thereby gathered _data to feed_ statistical routines to
| spread T
as elaborated above.

-manfred

Apr 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

(Resuming a few past discussions about std.benchmark.)

On 4/9/12 1:26 PM, Manfred Nowak wrote:
 Andrei Alexandrescu wrote:
 I disagree with running two benchmarks in parallel because that
 exposes them to even more noise (scheduling, CPU count, current
 machine load etc).

 I did not mean to run two or more benchmarks in parallel but only
 more instances of the benchmarked program _without_ the environment
 supplied by the benchmakr. Of course may there be more noise. But if
 so, that noise is dependent on the additional running instances and
 to nothing else.

I doubt that's the case. Once we discuss running several instances in 
parallel, I predict the vagaries of scheduling become much more prevalent.

 Now let T(n) be the time needed to run n instances in parallel on a
 single processor. According to your definition and your remark above:

 �   T(n) = n*A + Q + N + P(n)

 where P(n)>=0 and P(1)==0 is the additional noise with which the
 benchmarked program disturbs itself.

 Please observe that Q and N eliminate themselves on subtraction:
 �  T(2) - T(1) = A + P(2) - P(1)
 �  T(3) - T(2) = A + P(3) - P(2)
 ...

 P(n+1)-P(n) measures the sensitivity of the benchmarked program to
 its one noise. As you wrote yourself its value should be close to
 zero until the machine is close to falling flat.

It would be great if that applied, but I'm afraid things get 
significantly more complex. For example, the noise will have a 
per-processor component, and the processes will compete for cache among 
themselves.


Andrei

Apr 14 2012

Denis Shelomovskij <verylonglogin.reg gmail.com> writes:

08.04.2012 21:31, Andrei Alexandrescu пишет:
 On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
 Very good but minimum isn't a best guess. Personally I (and there will
 be a lot of such maniacs I suppose) will think that this (minimum) time
 can be significantly smaller than average time.

 I've analyzed this quite a bit at work and the average and median are
 not very informative. You need the mode, but in most benchmarks the mode
 is very close to the minimum, so using the minimum is even better.

 In speed measurements, all noise is additive (there's no noise that may
 make a benchmark appear to run faster). There are also quite a few
 outliers. Recording the average will include a fair amount of noise.

Yes of course. I mean than an algorithm itself should follow some 
restrictions for such measurement. It should run with the same speed 
every time. Many algorithms follows this convention, but not all.

Why will recording the average produce so much noise? As I see, floating 
point arithmetic is now used without a strong reason so it looks like a 
time of this part isn't valuable. Or is it just a temporary solution?

Anyway it should be configurable using a CT parameter, so it will not 
abuse one who doesn't need it.

 Clearly there is noise during normal use as well, but incorporating it
 in benchmarks as a matter of course reduces the usefulness of benchmarks
 as a mean to improve performance of the benchmarked code.

A graph is needed exactly because of that. Without a graph it really 
gives very little.

 So a parameter (probably with a default value) should be added.
 Something like enum of flags telling what we want to know. At least
 these looks usable: minTime, <some mean time>, maxTime,
 standardDeviation, graph (yes, good old ASCII art).

 Standard deviation is also not very useful because it includes all
 outliers (some of which are very far away from the mode).

So a graph is needed.

 Yes, graph is needed.

 I am not sure about that. We may provide the raw measurement data for
 programs that want to plot things, but plotting is beyond the charter of
 std.benchmark.

Sorry, meant a histogram, not a curve. A histogram can be shown in a 
console very well. And it is needed because its the easiest way to show 
benchmarked program behaviour (and noise behaviour). It also requires 
only about 80 integers to store information and shouldn't produce much 
noise.

IMHO, a histogram gives lots of information and will be a good addition.


-- 
Денис В. Шеломовский
Denis V. Shelomovskij

Apr 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
 Why will recording the average produce so much noise?

As I explained, the average takes noise and outliers (some very large, 
e.g. milliseconds in a benchmark that takes microseconds) into account. 
The minimum is shielded from this issue. In the limit, the minimum for 
infinitely many measurements is the sought-after result.

 As I see, floating
 point arithmetic is now used without a strong reason so it looks like a
 time of this part isn't valuable. Or is it just a temporary solution?

I don't understand "time of this part".

 Anyway it should be configurable using a CT parameter, so it will not
 abuse one who doesn't need it.

We'd like the framework to do the right thing, and the average does not 
seem to be it.

 IMHO, a histogram gives lots of information and will be a good addition.

I disagree.


Andrei

Apr 09 2012

Denis Shelomovskij <verylonglogin.reg gmail.com> writes:

09.04.2012 17:26, Andrei Alexandrescu пишет:
 On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
 Why will recording the average produce so much noise?

 As I explained, the average takes noise and outliers (some very large,
 e.g. milliseconds in a benchmark that takes microseconds) into account.
 The minimum is shielded from this issue. In the limit, the minimum for
 infinitely many measurements is the sought-after result.

 As I see, floating
 point arithmetic is now used without a strong reason so it looks like a
 time of this part isn't valuable. Or is it just a temporary solution?

 I don't understand "time of this part".

Looks like a misunderstanding because of my bad English: "Recording the 
average will include a fair amount of noise" means for me that a process 
of "recording the average" produces "a fair amount of noise" itself, not 
that "a result includes noise".


-- 
Денис В. Шеломовский
Denis V. Shelomovskij

Apr 10 2012

"Martin Nowak" <dawg dawgfoto.de> writes:

 I've analyzed this quite a bit at work and the average and median are  
 not very informative. You need the mode, but in most benchmarks the mode  
 is very close to the minimum, so using the minimum is even better.

How is it better?

 In speed measurements, all noise is additive (there's no noise that may  
 make a benchmark appear to run faster). There are also quite a few  
 outliers. Recording the average will include a fair amount of noise.

The benchmarked function itself could have a relevant variance, e.g.
when using randomized algorithms. You already get that from the LRU table
in the array append cache.

Apr 10 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 08.04.2012 20:59, Denis Shelomovskij wrote:
 Very good but minimum isn't a best guess. Personally I (and there will
 be a lot of such maniacs I suppose) will think that this (minimum) time
 can be significantly smaller than average time.

Prime example is networking.

 So a parameter (probably with a default value) should be added.
 Something like enum of flags telling what we want to know. At least
 these looks usable: minTime, <some mean time>, maxTime,
 standardDeviation,

+1

  graph (yes, good old ASCII art).
 Yes, graph is needed.

No not ASCII art. It's too coarse grained and the only good thing it can 
do is look nicely in terminal (or give an overall picture but it's 
limited to say ~25 blocks in a histogram).

What would be nice to have an option to output bare-bones info so that 
one can easily adapt it and feed the result to e.g. gnuplot.
I think csv or it's version with tabs is good enough in this case.

Another cool addition IMHO would be parametric benchmarks, so there is a 
function and a set of parameters (one parameter is fine too) to 
benchmark on. It makes that much more sense with graphs as algorithm 
profile plotted for various inputs (sizes) can be very revealing.

-- 
Dmitry Olshansky

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 12:35 PM, Dmitry Olshansky wrote:
 Another cool addition IMHO would be parametric benchmarks, so there is a
 function and a set of parameters (one parameter is fine too) to
 benchmark on. It makes that much more sense with graphs as algorithm
 profile plotted for various inputs (sizes) can be very revealing.

This would be interesting to explore, could you give some more detail 
please?

Andrei

Apr 08 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 08.04.2012 21:40, Andrei Alexandrescu wrote:
 On 4/8/12 12:35 PM, Dmitry Olshansky wrote:
 Another cool addition IMHO would be parametric benchmarks, so there is a
 function and a set of parameters (one parameter is fine too) to
 benchmark on. It makes that much more sense with graphs as algorithm
 profile plotted for various inputs (sizes) can be very revealing.

 This would be interesting to explore, could you give some more detail
 please?

I thought to just throw it in. But if were to do it in under an hour:

benchmark_func_blah(int problem_size);
auto benchmark_params_func_blah = [ 1, 10, 100, 1_000, 10_000 ...];

or slightly more general - use the range protocol:
auto benchmark_params_func_blah = getRangeOfValues();

I'm not sure the prefix is place to put it but here the rough idea.

-- 
Dmitry Olshansky

Apr 08 2012

Caligo <iteronvexor gmail.com> writes:

On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Hello,

 I finally found the time to complete std.benchmark. I got to a very simple
 API design, starting where I like it: one line of code.

 Andrei


I probably missed this somewhere, but what happens to `std.datetime :
benchmark` ?  Is it going to be deprecated ?

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 1:03 PM, Caligo wrote:
 On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>  wrote:
 Hello,

 I finally found the time to complete std.benchmark. I got to a very simple
 API design, starting where I like it: one line of code.

 Andrei


 I probably missed this somewhere, but what happens to `std.datetime :
 benchmark` ?  Is it going to be deprecated ?

Yes, I forgot to mention that. Part of the proposal is that 
std.datetime.StopWatch and std.datetime.benchmark are be deprecated.

Andrei

Apr 08 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

Added to trello.

-Steve

Apr 09 2012

"Francois Chabot" <francois.chabot.dev gmail.com> writes:

Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this 
library, and yet they are relegated to second-class citizen: "Oh, 
I guess you can use this". Normally, I wouldn't be so picky, but 
this is a standard library. Focus should be on functionality.

Providing formatted output is a nice bonus, but to me, it's just 
a bonus. Any benchmarking part of a large project is bound to 
format the output itself (to log benchmark results against 
revisions in a database or something like that).

Also, benchmark() and runBenchmarks() are kind of confusing at 
first glance. Something along the lines of benchmarkModule() and 
benchmarkAllModules() would be more sensible.

Apr 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/9/12 10:23 AM, Francois Chabot wrote:
 Why is there so much emphasis on printBenchmarks()?

 benchmark() and runBenchmarks() are clearly the core of this library,
 and yet they are relegated to second-class citizen: "Oh, I guess you can
 use this". Normally, I wouldn't be so picky, but this is a standard
 library. Focus should be on functionality.

The functionality is to make benchmark easy to use, meaningful, and easy 
to interpret. I don't want to add a complicated library for 
postprocessing benchmarks because most nobody will use it.

The first function in the documentation is what most people will want to 
bring themselves to using. The functions that provide the data are 
eminently available so I disagree with the "second-class citizen" 
characterization. You want to use them, use them. They don't need to be 
given rockstar billing.


Andrei

Apr 09 2012

Somedude <lovelydear mailmetrash.com> writes:

Le 09/04/2012 17:23, Francois Chabot a écrit :
 Why is there so much emphasis on printBenchmarks()?
 
 benchmark() and runBenchmarks() are clearly the core of this library,
 and yet they are relegated to second-class citizen: "Oh, I guess you can
 use this". Normally, I wouldn't be so picky, but this is a standard
 library. Focus should be on functionality.
 
 Providing formatted output is a nice bonus, but to me, it's just a
 bonus. Any benchmarking part of a large project is bound to format the
 output itself (to log benchmark results against revisions in a database
 or something like that).
 
 Also, benchmark() and runBenchmarks() are kind of confusing at first
 glance. Something along the lines of benchmarkModule() and
 benchmarkAllModules() would be more sensible.

The printBenchmark facility is cool and should be included imho.
It helps benchmarking being as standard as unit testing.
We don't want to have to write again and again the same boilerplate code
for such trivial uses.

Apr 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/9/12 11:44 AM, Somedude wrote:
 It helps benchmarking being as standard as unit testing.
 We don't want to have to write again and again the same boilerplate code
 for such trivial uses.

Yes, I had unittest in mind when writing the library. If one needs more 
than one statement to get an informative benchmark off the ground, we 
failed.

Andrei

Apr 09 2012

Jens Mueller <jens.k.mueller gmx.de> writes:

Andrei Alexandrescu wrote:
 Hello,
 
 
 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.
 
 Code is in the form of a pull request at
 https://github.com/D-Programming-Language/phobos/pull/529. (There's
 some noise in there caused by my git n00biness). Documentation is at
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.
 
 If reasonable and at all possible, I'd like to bump the priority of
 this proposal. Clearly D's user base is highly interested in
 efficiency, and many of the upcoming libraries have efficiency a
 virtual part of their design. So we should get std.benchmark in soon
 and require that new addition come with benchmarks for their
 essential functionality.
 
 My vision is that in the future Phobos will have a significant
 benchmarks battery, which will help improving Phobos and porting to
 new platforms.

I come to think the same.

How come that the times based relative report and the percentage based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output.

When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited. It just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is
a useful addition.
In general, allowing user defined measurements would be great.
E.g. to measure the time spend in user mode.
() => {
         tms t;
         times(&t);
         return t.tms_utime;
      }

Note, that this code does not need to be portable. You can also use
version() else static assert.

Things that come to mind that I'd like to measure.
Time measurements:
  * User CPU time
  * System CPU time
  * Time spent in memory allocations
Count measurements:
  * Memory usage
  * L1/L2/L3 cache misses
  * Number of executed instructions
  * Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?

All in all the user interface has been greatly simplified.

Jens

Apr 10 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/10/12 5:40 AM, Jens Mueller wrote:
 How come that the times based relative report and the percentage based
 relative report are mixed in one result? And how do I choose which one
 I'd like to see in the output.

It's in the names. If the name of a benchmark starts with 
benchmark_relative_, then that benchmark is considered relative to the 
last non-relative benchmark. Using a naming convention allows complete 
automation in benchmarking a module.

I figure it's fine that all results appear together because the absence 
of data in the relative column clarifies which is which.

 When benchmarking you can measure different things at the same time. In
 this regard the current proposal is limited. It just measures wall clock
 time. I believe extending the StopWatch to measure e.g. user CPU time is
 a useful addition.

Generally I fear piling too much on StopWatch because every feature adds 
its own noise. But there's value in collecting the result of times(). 
What would be the Windows equivalent?

 In general, allowing user defined measurements would be great.
 E.g. to measure the time spend in user mode.
 () =>  {
           tms t;
           times(&t);
           return t.tms_utime;
        }

 Note, that this code does not need to be portable. You can also use
 version() else static assert.

 Things that come to mind that I'd like to measure.
 Time measurements:
    * User CPU time
    * System CPU time
    * Time spent in memory allocations
 Count measurements:
    * Memory usage
    * L1/L2/L3 cache misses
    * Number of executed instructions
    * Number of memory allocations

 Of course wall clock time is the ultimate measure when benchmarking.
 But often you need to investigate further (doing more measurements).

 Do you think adding this is worthwhile?

Absolutely. I just fear about expanding the charter of the framework too 
much. Let's see:

* Memory usage is, I think, difficult in Windows.
* Estimating cache misses and executed instructions is significant research
* Number of memory allocations requires instrumenting druntime


Andrei

Apr 14 2012

Jens Mueller <jens.k.mueller gmx.de> writes:

Andrei Alexandrescu wrote:
 On 4/10/12 5:40 AM, Jens Mueller wrote:
How come that the times based relative report and the percentage based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output.

 
 It's in the names. If the name of a benchmark starts with
 benchmark_relative_, then that benchmark is considered relative to
 the last non-relative benchmark. Using a naming convention allows
 complete automation in benchmarking a module.
 
 I figure it's fine that all results appear together because the
 absence of data in the relative column clarifies which is which.

I know. That wasn't my question. How do I choose between percentage vs.
factors for a relative benchmark, e.g. 200% vs. 2?

When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited. It just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is
a useful addition.

 
 Generally I fear piling too much on StopWatch because every feature
 adds its own noise. But there's value in collecting the result of
 times(). What would be the Windows equivalent?

I'm not a Windows user but GetProcessTimes
seems to be the Windows equivalent.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683223%28v=vs.85%29.aspx

In general, allowing user defined measurements would be great.
E.g. to measure the time spend in user mode.
() =>  {
          tms t;
          times(&t);
          return t.tms_utime;
       }

Note, that this code does not need to be portable. You can also use
version() else static assert.

Things that come to mind that I'd like to measure.
Time measurements:
   * User CPU time
   * System CPU time
   * Time spent in memory allocations
Count measurements:
   * Memory usage
   * L1/L2/L3 cache misses
   * Number of executed instructions
   * Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?

 
 Absolutely. I just fear about expanding the charter of the framework
 too much. Let's see:
 
 * Memory usage is, I think, difficult in Windows.
 * Estimating cache misses and executed instructions is significant research
 * Number of memory allocations requires instrumenting druntime

I think that std.benchmark shouldn't do this. But I think we should
figure out whether adding a user-defined measurement function is
possible. Making sure that be we have a design that is flexible enough
to capture different measurements.

Jens

Apr 15 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

There have been quite a few good comments, but no review manager offer. 
Could someone please take this role?

Again, it would be great to get std.benchmark in sooner rather than 
later because it can be used by subsequent submissions (many of which 
allege efficiency as a major advantage) to show they improve over the 
state of the art.

Thanks,

Andrei

Apr 14 2012

Jacob Carlborg <doob me.com> writes:

On 2012-04-15 04:58, Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review manager offer.
 Could someone please take this role?

 Again, it would be great to get std.benchmark in sooner rather than
 later because it can be used by subsequent submissions (many of which
 allege efficiency as a major advantage) to show they improve over the
 state of the art.

 Thanks,

 Andrei

As far as I know, there are already several other projects/modules in 
the pipeline before std.benchmark. If we are to review them in the order 
they are submitted.

-- 
/Jacob Carlborg

Apr 15 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Sunday, April 15, 2012 13:41:28 Jacob Carlborg wrote:
 On 2012-04-15 04:58, Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review manager offer.
 Could someone please take this role?
 
 Again, it would be great to get std.benchmark in sooner rather than
 later because it can be used by subsequent submissions (many of which
 allege efficiency as a major advantage) to show they improve over the
 state of the art.
 
 Thanks,
 
 Andrei

 
 As far as I know, there are already several other projects/modules in
 the pipeline before std.benchmark. If we are to review them in the order
 they are submitted.

There are, but none of them are being reviewed at the moment or being pushed 
for a review. And there _is_ an argument for std.benchmark being high priority 
based on how it could affect future reviews. We should probably start reviewing 
_something_ soon though. We haven't had very good momentum on that of late.

- Jonathan M Davis

Apr 16 2012

Jens Mueller <jens.k.mueller gmx.de> writes:

Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review manager
 offer. Could someone please take this role?

I will do this.
But I will need to get more familiar with the process. And add it to
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
managers.

Jens

Apr 15 2012

"Lars T. Kyllingstad" <public kyllingen.net> writes:

On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
 Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review 
 manager
 offer. Could someone please take this role?

 I will do this.
 But I will need to get more familiar with the process. And add 
 it to
 http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
 managers.


The review process is based on Boost's:

http://www.boost.org/community/reviews.html#Review_Manager

-Lars

Apr 16 2012

Jens Mueller <jens.k.mueller gmx.de> writes:

Lars T. Kyllingstad wrote:
 On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?

I will do this.
But I will need to get more familiar with the process. And add it
to
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
managers.

 
 
 The review process is based on Boost's:
 
 http://www.boost.org/community/reviews.html#Review_Manager

The page is shorter than expected. Last time I checked I found something
way longer. Since the Phobos' review process is based on Boost's where
does deviate?

Jens

Apr 16 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 17.04.2012 1:00, Jens Mueller wrote:
 Lars T. Kyllingstad wrote:
 On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
 Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review manager
 offer. Could someone please take this role?

 I will do this.
 But I will need to get more familiar with the process. And add it
 to
 http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
 managers.


 The review process is based on Boost's:

 http://www.boost.org/community/reviews.html#Review_Manager

 The page is shorter than expected. Last time I checked I found something
 way longer. Since the Phobos' review process is based on Boost's where
 does deviate?

It's peer review followed by voting and that's all about it.
I don't think you should seek any formal standards, documents, 
guidelines, etc.

The manager role is rather simple:

1. Manger picks module from review queue and posts an announcement that 
formal review of it starts today. Post should contain relevant 
information about module and links to source/documentation. Also 
importantly it sets the exact time the review ends and the exact time 
voting ends. (usually 2 weeks review, 1 week for voting)

2. When review ends Manager ether opens a vote thread.
If it's obvious that module needs futher work (on explicit request from 
author) Manager may choose to prolong or postpone review thus putting it 
back into queue.

3. When vote period ends count up the votes and declare the result.

And that's it.
There are no hard rule on which module has the priority in queue.
It is loosely calculated on basis of how long it was ready for review 
and how important the functionality is.

P.S. If you are not sure are up to it just say the word and I'll 
volunteer instead. We need to push this forward as the review queue is 
going to overflow real soon ;)

-- 
Dmitry Olshansky

Apr 17 2012

Jens Mueller <jens.k.mueller gmx.de> writes:

Dmitry Olshansky wrote:
 On 17.04.2012 1:00, Jens Mueller wrote:
Lars T. Kyllingstad wrote:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?

I will do this.
But I will need to get more familiar with the process. And add it
to
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
managers.


The review process is based on Boost's:

http://www.boost.org/community/reviews.html#Review_Manager

The page is shorter than expected. Last time I checked I found something
way longer. Since the Phobos' review process is based on Boost's where
does deviate?

 
 It's peer review followed by voting and that's all about it.
 I don't think you should seek any formal standards, documents,
 guidelines, etc.
 
 The manager role is rather simple:
 
 1. Manger picks module from review queue and posts an announcement
 that formal review of it starts today. Post should contain relevant
 information about module and links to source/documentation. Also
 importantly it sets the exact time the review ends and the exact
 time voting ends. (usually 2 weeks review, 1 week for voting)
 
 2. When review ends Manager ether opens a vote thread.
 If it's obvious that module needs futher work (on explicit request
 from author) Manager may choose to prolong or postpone review thus
 putting it back into queue.
 
 3. When vote period ends count up the votes and declare the result.
 
 And that's it.
 There are no hard rule on which module has the priority in queue.
 It is loosely calculated on basis of how long it was ready for
 review and how important the functionality is.

Many thanks for your explanation.

 P.S. If you are not sure are up to it just say the word and I'll
 volunteer instead. We need to push this forward as the review queue
 is going to overflow real soon ;)

Andrei is already preparing. Review will start soon.

Jens

Apr 17 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/17/12 2:53 PM, Jens Mueller wrote:
 Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that 
std.benchmark skips to the front of the queue instead of waiting in 
line. Please reply to this if agree/disagree.

Thanks,

Andrei

Apr 17 2012

"SomeDude" <lovelydear mailmetrash.com> writes:

On Tuesday, 17 April 2012 at 21:15:31 UTC, Andrei Alexandrescu 
wrote:
 On 4/17/12 2:53 PM, Jens Mueller wrote:
 Andrei is already preparing. Review will start soon.

 I'd like to gather a bit of confidence that it's okay with 
 people that std.benchmark skips to the front of the queue 
 instead of waiting in line. Please reply to this if 
 agree/disagree.

 Thanks,

 Andrei

I agree, there is no good reason for the queue to be stopped like 
it is now, but since noone seems to step up for the other 
modules, let's not waste time and review your module. Hopefully, 
the other modules will be reviewed ASAP. We gain nothing in 
waiting.

Apr 17 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, April 17, 2012 16:15:43 Andrei Alexandrescu wrote:
 On 4/17/12 2:53 PM, Jens Mueller wrote:
 Andrei is already preparing. Review will start soon.

 
 I'd like to gather a bit of confidence that it's okay with people that
 std.benchmark skips to the front of the queue instead of waiting in
 line. Please reply to this if agree/disagree.

I have no problem with it. None of the others is being pushed for review or 
promoted at present. I'd just as soon have the std.benchmark review get under 
way rather than sit around waiting for one of those earlier in the queue to 
get started.

- Jonathan M Davis

Apr 17 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:
 On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

 
 I'd like to gather a bit of confidence that it's okay with people
 that std.benchmark skips to the front of the queue instead of
 waiting in line. Please reply to this if agree/disagree.

[...]

Let's just check it in since it's ready. No point in waiting.


T

-- 
Some days you win; most days you lose.

Apr 17 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, April 17, 2012 18:55:07 H. S. Teoh wrote:
 On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:
 On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

 
 I'd like to gather a bit of confidence that it's okay with people
 that std.benchmark skips to the front of the queue instead of
 waiting in line. Please reply to this if agree/disagree.

 
 [...]
 
 Let's just check it in since it's ready. No point in waiting.

Check it in? You mean start the review on it? I hope that you don't mean 
"check it in" as in commit it. It needs to be reviewed and voted on first.

- Jonathan M Davis

Apr 17 2012

Marco Leise <Marco.Leise gmx.de> writes:

Am Tue, 17 Apr 2012 16:15:43 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 4/17/12 2:53 PM, Jens Mueller wrote:
 Andrei is already preparing. Review will start soon.

 
 I'd like to gather a bit of confidence that it's okay with people that 
 std.benchmark skips to the front of the queue instead of waiting in 
 line. Please reply to this if agree/disagree.
 
 Thanks,
 
 Andrei

In a supermarket, there are three people in front of you in the queue. Then
they all decide they forgot something and disappear in the back. You exchange
some looks with the cashier and decide that you'll be long finished before any
of them come back. Go ahead.

-- 
Marco

Apr 18 2012

"SomeDude" <lovelydear mailmetrash.com> writes:

On Wednesday, 18 April 2012 at 09:44:12 UTC, Marco Leise wrote:
 Am Tue, 17 Apr 2012 16:15:43 -0500
 In a supermarket, there are three people in front of you in the 
 queue. Then they all decide they forgot something and disappear 
 in the back. You exchange some looks with the cashier and 
 decide that you'll be long finished before any of them come 
 back. Go ahead.

Well, we are not in a supermarket. It was correct to ask.

Apr 18 2012

"Famous" <none none.net> writes:

Instead of the originally proposed layout

================================================================
Benchmark                               relative ns/iter  iter/s
================================================================
---[ module_one ]-----------------------------------------------
file write                                        140.2K    7.1K
file read                                 517.8%   27.1K   36.9K
array creation                             1.2Kx  116.0     8.6M
================================================================

I would like to see something more similar to the following:

================================================================
Benchmark                                Time    Performance
                                                 ---------------
name                                     s/iter  iter/s  factor
-----------------------[ module_one ]--- ------- ------- -------
file write                               140.20µ   7.13k
file read                                 27.10µ  36.90k   5.17
array creation                           116.00n   8.62M   1.21k
================================================================

Transition to the new layout can be achieved as follows:

         * Consider layout guidelines given in

            http://mirrors.ctan.org/info/german/tabsatz/tabsatz.pdf

         * Reorder columns considering their semantic and introduce
           groups.

         * Rename 'relative' to 'factor' and avoid usage of 'x' for
           'times'.

         * Divide column 'ns/iter' by a billion (see Vladimir's
           post).

         * Right align module name (abbreviate if it is too long).

         * Avoid '%' but rely on the prefixes specified in the SI:

http://en.wikipedia.org/w/index.php?title=International_System_of_Units&oldid=487057838#Units_and_prefixes

           Then, using a fixed width of 7 characters we have 3
           significant digits, at least:

            123450000         -> 123.45M
             12345000         ->  12.35M
              1234500         ->   1.23M
               123450         -> 123.45k
                12345         ->  12.35k
                 1234.5       ->   1.23k
                  123.45      -> 123.45
                   12.345     ->  12.35
                    1.2345    ->   1.23
                    0.12345   -> 123.45m
                    0.012345  ->  12.35m
                    0.0012345 ->   1.23m

Cheers,
Famous

Apr 15 2012

"Famous" <none none.net> writes:

Maybe the layout is not destroyed, this time:

 ================================================================
 Benchmark                                Time    Performance
                                                  ---------------
 name                                     s/iter  iter/s  factor
 -----------------------[ module_one ]--- ------- ------- -------
 file write                               140.20µ   7.13k
 file read                                 27.10µ  36.90k   5.17
 array creation                           116.00n   8.62M   1.21k
 ================================================================

Apr 15 2012

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu
wrote:
 Hello,


 I finally found the time to complete std.benchmark. I got to a 
 very simple API design, starting where I like it: one line of 
 code.

 Code is in the form of a pull request at 
 https://github.com/D-Programming-Language/phobos/pull/529. 
 (There's some noise in there caused by my git n00biness). 
 Documentation is at 
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

 If reasonable and at all possible, I'd like to bump the 
 priority of this proposal. Clearly D's user base is highly 
 interested in efficiency, and many of the upcoming libraries 
 have efficiency a virtual part of their design. So we should 
 get std.benchmark in soon and require that new addition come 
 with benchmarks for their essential functionality.

 My vision is that in the future Phobos will have a significant 
 benchmarks battery, which will help improving Phobos and 
 porting to new platforms.


 Andrei

Where have std.benchmark gone?

I've written a benchmarking framework in C++ that I would like to
port to D.
If you've already implemented some cool stuff its better to
reuse/merge it with mine.

My C++ code has a bench of external deps (not included in this 
post).
If you're interested
I'll send you the rest. It's inspired by Haskell's powerful
benchmarking framework(s) because it cant automatically figure
out how to randomize various kinds of types including recursive
instantiatinos of builtins, STL pairs, tuples and even containers
using a randomization framework that uses a
STL-container-type_trait (rand.hpp which is not included).

I'm guessing implementing such a framework in D is an order of 
magnitude easier than in C++.

Innovative things about timed.hpp is constant time benchmarking
meaning that the loop will generate a larger and large instance
of the structure until a specified amount of time has passed. I
believe this i a more user-friendly way of benchmarking than most
other frameworks do. And with D's supercool struct/class tupleof
even more automation of instance generation is possible.

Destroy!

Code:

/*! \file timed.hpp
    * \brief Benchmark Functions on Auto-Randomized Arguments.
    * \author Per Nordlöw (per.nordlow gmail.com)
    * \license Boost
    *
    * \see http://en.cppreference.com/w/cpp/thread/async
    * \see
http://stackoverflow.com/questions/10059239/gccs-behaviour-with-stdasyncstdlaunchasync-vs-clangs-behaviour
    * \see
http://stackoverflow.com/questions/8895350/quickcheck-like-template-function-benching-in-c
    * \see
http://functionalcpp.wordpress.com/2013/08/03/generalized-function-evaluation/
    * \see
http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
    * \see https://limcore.wordpress.com/2008/04/13/8/
    * \see
http://www.randombit.net/bitbashing/programming/parallel_function_invocation_using_std_async.html
    * \todo Use const boost::posix_time::time_duration & dur =
boost::posix_time::seconds(1)
    */

#pragma once
#include <utility>
#include <vector>
#include <future>
#include <chrono>
#include <thread>
#include <type_traits>
#include "rand.hpp"
#include "type_traits_x.hpp"

namespace pnw
{

struct timed {
       typedef std::chrono::high_resolution_clock hrc;
       typedef std::chrono::milliseconds millis;
       typedef std::chrono::microseconds micros;
public:
       timed(size_t max_length = 1024*256,
             millis total_duration_max = millis(1000),
             millis call_duration_max = millis(0),
             uint n_tries = 1)
           : m_max_length(max_length),
             m_tot_dur(total_duration_max),
             m_call_dur(call_duration_max),
             m_n_tries(n_tries)
       {}

       /*! Benchmark Function/Lambda/Functor \p f with non-void
return value.
        * \see
http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
        */
       template<class Func>
       auto call(Func&& f)
           -> typename std::enable_if<!
std::is_void<decltype(f())>::value, // if non-void return
                                      std::pair<decltype(f()),
                                                typename
hrc::duration> >::type {
           decltype(f()) result;
           auto tA = hrc::now();
           for (uint i = 0; i < m_n_tries; i++) { result = f(); } 
//
call it \c m_n_tries times
           auto span = hrc::now() - tA;
           return std::make_pair(result, span);
       }

       /*! Benchmark Function/Lambda/Functor \p f with \c void
return value.
        * \see
http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
        */
       template<class Func>
       auto call(Func&& f)
           -> typename
std::enable_if<std::is_void<decltype(f())>::value, // if void
return
                                      typename hrc::duration>::type
{
           auto tA = hrc::now();
           for (uint i = 0; i < m_n_tries; i++) { f(); } // call it
\c m_n_tries times
           return hrc::now() - tA; // reverse order?
       }

#if 0
       /*! Call Function \p f after \p us microseconds.
        * \see
http://stackoverflow.com/questions/10804683/delayed-function-call
        */
       template<class Func, class... Args>
       inline std::future<typename
std::result_of<Func(Args...)>::type>
       delayed_async(Func&& f, Args&&... args, micros us) {
           return std::async(std::launch::async, [f, us, args...] 
()
{
                   std::this_thread::sleep_for(us);
                   return f(args...);            // call
               } );
       }
#endif

       /*! Benchmark the \em Single Argument Function \p f using
Randomized Instances of type \p Cont.
        * \tparam C Container Type.
        * \tparam B benchS (Times) Container.
        * \param[in] f Function.
        * \return vector of pairs of \c C-instance size and
execute-time.
        * \todo Estimate Time Complexity Model: O(c), O(n), ...,
O(n^p), O(log(n)) O(exp(n))
        *       and predict cost of next iteration to halt within
time-limit.
        * \see
http://stackoverflow.com/questions/7245235/syntax-of-c-templates-with-function-type-parameters
        * \see http://en.cppreference.com/w/cpp/chrono/steady_clock
        */
       template<class C, class B = std::vector<std::pair<size_t,
                                                         double> > 

       typename std::enable_if<is_container<C>::value, B>::type
       call_s(void (*f)(C&)) {
           const auto deadline = hrc::now() + m_tot_dur;
           B bm;                    // benchmarks
           C c;                     // test data
           for (size_t n = 1; n <= m_max_length; n *= 2) {
               // efficient way of adding more random values to a
vector
               auto c2 = rand_n<C>(n); c.insert(begin(c), 
begin(c2),
end(c2)); // \see http://stackoverflow.com/a/2551785/683710
               auto tA = hrc::now(); // log start
               for (uint i = 0; i < m_n_tries; i++) { f(c); } //
call it \c m_n_tries times
               auto span = hrc::now() - tA;
               if (m_call_dur.count() != 0 and
                   span >= m_call_dur) { break; } // if function
call took too long
               auto count =
std::chrono::duration_cast<std::chrono::microseconds>(span).count();
               bm.emplace_back(n, static_cast<double>(count)/1e6);
// store
               if (hrc::now() + span >= deadline) { // if no time
for yet another measure taking at least \c span
                   break;
               }
           }
           return bm;
       }

       /*! Benchmark the \em Single Argument Function \p f using
Randomized Instances of type \p Cont.
        * \see Previous overload of \c call_s
        */
       template<class C, class B = std::vector<std::pair<size_t,
double> > >
       typename std::enable_if<is_container<C>::value, B>::type
       call_s(void (*f)(const C&)) {
           return call_s(reinterpret_cast<void (*)(C&)>(f));
       }

       /*! Benchmark Function \p f using Randomized Instances of
type \p Cont.
        * \tparam C Container.
        * \tparam B benchS (Times) Container.
        * \param[in] f Function.
        * \return vector of pairs of \c C-instance size and
execute-time.
        */
       template<class C, class B = std::vector<std::pair<size_t,
                                                         double> > 

       typename std::enable_if<is_container<C>::value, B>::type
       call(void (*f)(typename C::iterator,
                      typename C::iterator))
       {
           const auto deadline = hrc::now() + m_tot_dur;
           B bm;                    // benchmarks
           C c;                     // test data
           for (size_t n = 1; n <= m_max_length; n *= 2) {
               // efficient way of adding more random values to a
vector
               auto c2 = rand_n<C>(n); c.insert(begin(c), 
begin(c2),
end(c2)); // \see http://stackoverflow.com/a/2551785/683710
               auto tA = hrc::now(); // log start
               for (uint i = 0; i < m_n_tries; i++) { f(c.begin(),
c.end()); } // call it \c m_n_tries times
               auto span = hrc::now() - tA;
               if (m_call_dur.count() != 0 and
                   span >= m_call_dur) { break; } // if function
call took too long
               auto count =
std::chrono::duration_cast<std::chrono::microseconds>(span).count();
               bm.emplace_back(n, static_cast<double>(count)/1e6);
// store
               if (hrc::now() + span >= deadline) { // if there's 
no
time for more measures
                   break;
               }
           }
           return bm;
       }

public:
       size_t m_max_length = 1024*256; ///< Maximum container
instance length of \c T.
       millis m_tot_dur = millis(1000); ///< Maximum time to wait
for whole benchmarking to complete.
       millis m_call_dur = millis(0); ///< Maximum time to wait for
a call \p f to complete.
       uint m_n_tries = 1;    ///< Number of tries per size of
randomizeda instance.
       std::launch m_launch_policy = std::launch::async |
std::launch::deferred;    ///< Thread launch policy.
};

}

Sep 27 2013

D Programming

C/C++ Programming

Other

digitalmars.D - std.benchmark ready for review. Manager sought after