www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.benchmark ready for review. Manager sought after

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,


I finally found the time to complete std.benchmark. I got to a very 
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at 
https://github.com/D-Programming-Language/phobos/pull/529. (There's some 
noise in there caused by my git n00biness). Documentation is at 
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the priority of this 
proposal. Clearly D's user base is highly interested in efficiency, and 
many of the upcoming libraries have efficiency a virtual part of their 
design. So we should get std.benchmark in soon and require that new 
addition come with benchmarks for their essential functionality.

My vision is that in the future Phobos will have a significant 
benchmarks battery, which will help improving Phobos and porting to new 
platforms.


Andrei
Apr 07 2012
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu 
wrote:
 Hello,


 I finally found the time to complete std.benchmark. I got to a 
 very simple API design, starting where I like it: one line of 
 code.

Nice, some comments: 1) Is it possible to do something about the "kilonanoseconds", and print the two-letter SI time abbreviation instead (μs/ns/ms/s/...)? 2) The meaning of "epoch" in this context is unfamiliar to me:
 Measurement is done in epochs. For each function benchmarked, 
 the smallest time is taken over all epochs.

3) "benchmark_relative_file read" should be replaced with a language construct. E.g. a function call like relativeBenchmark("file read"), or an enum value like getopt's. 4) There is a TODO in benchmarkSuspend's example.
Apr 07 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/7/12 11:45 PM, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu wrote:
 Hello,


 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.

Nice, some comments: 1) Is it possible to do something about the "kilonanoseconds", and print the two-letter SI time abbreviation instead (μs/ns/ms/s/...)?

Cool.
 2) The meaning of "epoch" in this context is unfamiliar to me:

 Measurement is done in epochs. For each function benchmarked, the
 smallest time is taken over all epochs.


To me too :o).
 3) "benchmark_relative_file read" should be replaced with a language
 construct. E.g. a function call like relativeBenchmark("file read"), or
 an enum value like getopt's.

No can do. Need a function name-based convention so we can automate scheduleForBenchmarking.
 4) There is a TODO in benchmarkSuspend's example.

Will destroy. Andrei
Apr 07 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 2:02 AM, Vladimir Panteleev wrote:
 The "benchmark_relative_" prefix makes sense for function names (for
 scheduleForBenchmarking), but not so much for string literals for
 benchmark names. The string literal "benchmark_relative_file read" looks
 like the words "benchmark relative file" are grouped together, with
 "read" added on. So, my suggestion would be to wrap the
 "benchmark_relative_" prefix - when used with benchmark name strings -
 into a semantical function / enum / etc. In my example above,
 relativeBenchmark would be:

 string relativeBenchmark(string s) { return "benchmark_relative_" ~ s; }

 I suppose it can be summed up as a tradeoff between complexity (you need
 to explain both the function name usage and the relativeBenchmark
 wrapper usage) vs. code prettiness.

I understand, thanks. Andrei
Apr 08 2012
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 8 April 2012 at 05:41:11 UTC, Andrei Alexandrescu 
wrote:
 3) "benchmark_relative_file read" should be replaced with a 
 language
 construct. E.g. a function call like relativeBenchmark("file 
 read"), or
 an enum value like getopt's.

No can do. Need a function name-based convention so we can automate scheduleForBenchmarking.

Hmm, maybe there's a misunderstanding, but what I meant was: The "benchmark_relative_" prefix makes sense for function names (for scheduleForBenchmarking), but not so much for string literals for benchmark names. The string literal "benchmark_relative_file read" looks like the words "benchmark relative file" are grouped together, with "read" added on. So, my suggestion would be to wrap the "benchmark_relative_" prefix - when used with benchmark name strings - into a semantical function / enum / etc. In my example above, relativeBenchmark would be: string relativeBenchmark(string s) { return "benchmark_relative_" ~ s; } I suppose it can be summed up as a tradeoff between complexity (you need to explain both the function name usage and the relativeBenchmark wrapper usage) vs. code prettiness.
Apr 08 2012
prev sibling next sibling parent reply Somedude <lovelydear mailmetrash.com> writes:
Le 08/04/2012 05:25, Andrei Alexandrescu a crit :
 Hello,
 
 
 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.
 
 Code is in the form of a pull request at
 https://github.com/D-Programming-Language/phobos/pull/529. (There's some
 noise in there caused by my git n00biness). Documentation is at
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.
 
 If reasonable and at all possible, I'd like to bump the priority of this
 proposal. Clearly D's user base is highly interested in efficiency, and
 many of the upcoming libraries have efficiency a virtual part of their
 design. So we should get std.benchmark in soon and require that new
 addition come with benchmarks for their essential functionality.
 
 My vision is that in the future Phobos will have a significant
 benchmarks battery, which will help improving Phobos and porting to new
 platforms.
 
 
 Andrei

Like it. Would it be a good idea to add a column with an average memory used ?
Apr 08 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 3:16 AM, Somedude wrote:
 Like it. Would it be a good idea to add a column with an average memory
 used ?

Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage- pi-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei
Apr 08 2012
parent Somedude <lovelydear mailmetrash.com> writes:
Le 08/04/2012 18:21, Marco Leise a crit :
 Am Sun, 08 Apr 2012 09:35:14 -0500
 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 
 On 4/8/12 3:16 AM, Somedude wrote:
 Like it. Would it be a good idea to add a column with an average memory
 used ?

Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage- pi-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei

Oh please, I use these functions already and with the similarities amongst operating systems they should be somewhere in Phobos! My thought was more in the line of std.procmgmt/std.process though. Since they are process management functions useful for server monitoring as well, not just benchmarking.

I thought about monitoring as well. But then again, following your remark, monitoring may in fact be a completely different matter. This would need a little bit more thought I guess. Adding monitoring capabilities to processes would be interesting. But I guess we are already quite far from benchmarking now. Unless std.benchmark makes use of these monitoring capabilities.
Apr 08 2012
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 12:16, Somedude wrote:
[snip]
 Like it. Would it be a good idea to add a column with an average memory
 used ?

In general it's next to impossible and/or entirely OS-specific. What can be done I think is adding a query function to GC interface that returns amount of RAM currently allocated on it's heap. So adding GC heap usage can be done albeit it's not really a stable metric. This way one gets a nice hint on why something is slow ;) -- Dmitry Olshansky
Apr 08 2012
parent Somedude <lovelydear mailmetrash.com> writes:
Le 08/04/2012 17:48, Dmitry Olshansky a crit :
 On 08.04.2012 12:16, Somedude wrote:
 [snip]
 Like it. Would it be a good idea to add a column with an average memory
 used ?

In general it's next to impossible and/or entirely OS-specific. What can be done I think is adding a query function to GC interface that returns amount of RAM currently allocated on it's heap. So adding GC heap usage can be done albeit it's not really a stable metric. This way one gets a nice hint on why something is slow ;)

That's what I thought, thank you.
Apr 08 2012
prev sibling next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Andrei Alexandrescu wrote:
 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.

 Code is in the form of a pull request at
 https://github.com/D-Programming-Language/phobos/pull/529. (There's some
 noise in there caused by my git n00biness). Documentation is at
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

For algorithms that process sequences, it would be nice to have results represented in cycles per item, This should give more consistent results across different CPU familes and different clock speeds. Specifically, I think about cycles per byte, see http://en.wikipedia.org/wiki/Cycles_per_byte Example: http://www.cryptopp.com/benchmarks.html
Apr 08 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 5:46 AM, Piotr Szturmaj wrote:
 Andrei Alexandrescu wrote:
 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.

 Code is in the form of a pull request at
 https://github.com/D-Programming-Language/phobos/pull/529. (There's some
 noise in there caused by my git n00biness). Documentation is at
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

For algorithms that process sequences, it would be nice to have results represented in cycles per item, This should give more consistent results across different CPU familes and different clock speeds. Specifically, I think about cycles per byte, see http://en.wikipedia.org/wiki/Cycles_per_byte Example: http://www.cryptopp.com/benchmarks.html

The framework aims at generality and simplicity. I'd rather have it simple and universal rather than subtle. Andrei
Apr 08 2012
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Sun, 08 Apr 2012 09:35:14 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 4/8/12 3:16 AM, Somedude wrote:
 Like it. Would it be a good idea to add a column with an average memory
 used ?

Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage- pi-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei

Oh please, I use these functions already and with the similarities amongst operating systems they should be somewhere in Phobos! My thought was more in the line of std.procmgmt/std.process though. Since they are process management functions useful for server monitoring as well, not just benchmarking. -- Marco
Apr 08 2012
prev sibling next sibling parent reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
Very good but minimum isn't a best guess. Personally I (and there will 
be a lot of such maniacs I suppose) will think that this (minimum) time 
can be significantly smaller than average time.

So a parameter (probably with a default value) should be added. 
Something like enum of flags telling what we want to know. At least 
these looks usable: minTime, <some mean time>, maxTime, 
standardDeviation, graph (yes, good old ASCII art).

Yes, graph is needed.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Apr 08 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
 Very good but minimum isn't a best guess. Personally I (and there will
 be a lot of such maniacs I suppose) will think that this (minimum) time
 can be significantly smaller than average time.

I've analyzed this quite a bit at work and the average and median are not very informative. You need the mode, but in most benchmarks the mode is very close to the minimum, so using the minimum is even better. In speed measurements, all noise is additive (there's no noise that may make a benchmark appear to run faster). There are also quite a few outliers. Recording the average will include a fair amount of noise. Clearly there is noise during normal use as well, but incorporating it in benchmarks as a matter of course reduces the usefulness of benchmarks as a mean to improve performance of the benchmarked code.
 So a parameter (probably with a default value) should be added.
 Something like enum of flags telling what we want to know. At least
 these looks usable: minTime, <some mean time>, maxTime,
 standardDeviation, graph (yes, good old ASCII art).

Standard deviation is also not very useful because it includes all outliers (some of which are very far away from the mode).
 Yes, graph is needed.

I am not sure about that. We may provide the raw measurement data for programs that want to plot things, but plotting is beyond the charter of std.benchmark. Andrei
Apr 08 2012
next sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

 Clearly there is noise during normal use as well, but
 incorporating it in benchmarks as a matter of course reduces the
 usefulness of benchmarks 

On the contrary: 1) The "noise during normal use" has to be measured in order to detect the sensibility of the benchmarked program to that noise. 2) The noise the benchmarked program produces has to be measured too, because the running benchmarked program probably increases the noise for all other running programs. In addition: the noise produced by a machine under heavy load might bring the performance of the benchmarked program down to zero. -manfred
Apr 08 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 3:03 PM, Manfred Nowak wrote:
 Andrei Alexandrescu wrote:

 Clearly there is noise during normal use as well, but
 incorporating it in benchmarks as a matter of course reduces the
 usefulness of benchmarks

On the contrary: 1) The "noise during normal use" has to be measured in order to detect the sensibility of the benchmarked program to that noise.

That sounds quite tenuous to me. How do you measure it, and what conclusions do you draw other than there's a more or less other stuff going on on the machine, and the machine itself has complex interactions? Far as I can tell a time measurement result is: T = A + Q + N where: A > 0 is actual benchmark time Q > 0 quantization noise (uniform distribution) N > 0 various other noises (interrupts, task switching, networking, CPU dynamically changing frequency, etc). Many people jump on Gaussian as an approximation, but my tests suggest it's hardly so because it has a lot of jerky outliers. How do we estimate A given T?
 2) The noise the benchmarked program produces has to be measured too,
 because the running benchmarked program probably increases the noise
 for all other running programs.

How to measure that? Also, that noise does not need to be measured as much as eliminated to the extent possible. This is because the benchmark app noise is a poor model of the application-induced noise.
 In addition: the noise produced by a machine under heavy load might
 bring the performance of the benchmarked program down to zero.

Of course. That's why the documentation emphasizes the necessity of baselines. A measurement without baselines is irrelevant. Andrei
Apr 08 2012
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:


 all noise is additive (there's no noise that may make a benchmark
 appear to run faster)

This is in doubt, because you yourself wrote "the machine itself has complex interactions". This complex interactions might lower the time needed for an operation of the benchmarked program. Examples that come to mind: a) needed data is already in a (faster) cache because it belongs to a memory block, from which some data is needed by some program not belonging to the benchmarked set---and that block isnt replaced yet. b) needed data is stored in a hdd whose I/O scheduler uses the elevator algorithm and serves the request by pure chance instantly, because the position of the needed data is between two positions accessed by some programs not belonging to the benchmarked set. Especially a hdd, if used, will be responsible for a lot of noise you define as "quantization noise (uniform distribution)" even if the head stays at the same cylinder. Not recognizing this noise would only mean that the data is cached and interpreting the only true read from the hdd as a jerky outlier sems quite wrong.
 1) The "noise during normal use" has to be measured in order to
 detect the sensibility of the benchmarked program to that noise.

conclusions do you draw other than there's a more or less other stuff going on on the machine, and the machine itself has complex interactions? Far as I can tell a time measurement result is: T = A + Q + N

For example by running more than one instance of the benchmarked program in paralell and use the thereby gathered statistical routines to spread T into the additiv components A, Q and N.
 2) The noise the benchmarked program produces has to be measured
 too, because the running benchmarked program probably increases
 the noise for all other running programs.

How to measure that?

Similar to the above note.
 Also, that noise does not need to be measured
 as much as eliminated to the extent possible.

I wouldn't define two programs to be equivalent based on the time until completion only. That time might be identical for both programs, but if only one of the programs increases the answering time of the machine to inacceptability I would choose the other. -manfred
Apr 09 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 9:25 AM, Manfred Nowak wrote:
 Andrei Alexandrescu wrote:
 all noise is additive (there's no noise that may make a benchmark
 appear to run faster)

This is in doubt, because you yourself wrote "the machine itself has complex interactions". This complex interactions might lower the time needed for an operation of the benchmarked program. Examples that come to mind: a) needed data is already in a (faster) cache because it belongs to a memory block, from which some data is needed by some program not belonging to the benchmarked set---and that block isnt replaced yet.

Which is great, unless the program wants to measure the cache memory itself, in which case it would use special assembler instructions or large memset()s. (We do such at Facebook.)
 b) needed data is stored in a hdd whose I/O scheduler uses the elevator
 algorithm and serves the request by pure chance instantly, because the
 position of the needed data is between two positions accessed by some
 programs not belonging to the benchmarked set.

 Especially a hdd, if used, will be responsible for a lot of noise you
 define as "quantization noise (uniform distribution)" even if the head
 stays at the same cylinder. Not recognizing this noise would only mean
 that the data is cached and interpreting the only true read from the
 hdd as a jerky outlier sems quite wrong.

If the goal is to measure the seek time of the HDD, the benchmark itself should make sure the HDD cache is cleared. (What I recall they do on Linux is unmounting and remounting the drive.) Otherwise, it adds a useless component to the timing.
 1) The "noise during normal use" has to be measured in order to
 detect the sensibility of the benchmarked program to that noise.

conclusions do you draw other than there's a more or less other stuff going on on the machine, and the machine itself has complex interactions? Far as I can tell a time measurement result is: T = A + Q + N

For example by running more than one instance of the benchmarked program in paralell and use the thereby gathered statistical routines to spread T into the additiv components A, Q and N.

I disagree with running two benchmarks in parallel because that exposes them to even more noise (scheduling, CPU count, current machine load etc). I don't understand the part of the sentence starting with "...use the thereby...", I'd be grateful if you elaborated. Andrei
Apr 09 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 11:29 AM, Francois Chabot wrote:
 Which is great, unless the program wants to measure the cache memory
 itself, in which case it would use special assembler instructions or
 large memset()s. (We do such at Facebook.)

I disagree. If a regression suddenly causes a function to become heavily cache-bound, it should show up in benchmarks somehow, regardless of the previous expected behavior of the function.

But cache binding depends on a variety of cache characteristics, i.e. the machine. The question is whether we accept a heavy dependence of the benchmark on the machine. Andrei
Apr 09 2012
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

 For example by running more than one instance of the benchmarked
 program in paralell and use the thereby gathered statistical 
 routines to spread T into the additiv components A, Q and N.


 I disagree with running two benchmarks in parallel because that
 exposes them to even more noise (scheduling, CPU count, current
 machine load etc).

I did not mean to run two or more benchmarks in parallel but only more instances of the benchmarked program _without_ the environment supplied by the benchmakr. Of course may there be more noise. But if so, that noise is dependent on the additional running instances and to nothing else. Now let T(n) be the time needed to run n instances in parallel on a single processor. According to your definition and your remark above: T(n) = n*A + Q + N + P(n) where P(n)>=0 and P(1)==0 is the additional noise with which the benchmarked program disturbs itself. Please observe that Q and N eliminate themselves on subtraction: T(2) - T(1) = A + P(2) - P(1) T(3) - T(2) = A + P(3) - P(2) ... P(n+1)-P(n) measures the sensitivity of the benchmarked program to its one noise. As you wrote yourself its value should be close to zero until the machine is close to falling flat.
 I don't understand the part of the sentence starting with "...use
 the thereby...", I'd be grateful if you elaborated. 

Ohh..., an unrecognized deletion. I have written: | use the thereby gathered _data to feed_ statistical routines to | spread T as elaborated above. -manfred
Apr 09 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(Resuming a few past discussions about std.benchmark.)

On 4/9/12 1:26 PM, Manfred Nowak wrote:
 Andrei Alexandrescu wrote:
 I disagree with running two benchmarks in parallel because that
 exposes them to even more noise (scheduling, CPU count, current
 machine load etc).

I did not mean to run two or more benchmarks in parallel but only more instances of the benchmarked program _without_ the environment supplied by the benchmakr. Of course may there be more noise. But if so, that noise is dependent on the additional running instances and to nothing else.

I doubt that's the case. Once we discuss running several instances in parallel, I predict the vagaries of scheduling become much more prevalent.
 Now let T(n) be the time needed to run n instances in parallel on a
 single processor. According to your definition and your remark above:

    T(n) = n*A + Q + N + P(n)

 where P(n)>=0 and P(1)==0 is the additional noise with which the
 benchmarked program disturbs itself.

 Please observe that Q and N eliminate themselves on subtraction:
   T(2) - T(1) = A + P(2) - P(1)
   T(3) - T(2) = A + P(3) - P(2)
 ...

 P(n+1)-P(n) measures the sensitivity of the benchmarked program to
 its one noise. As you wrote yourself its value should be close to
 zero until the machine is close to falling flat.

It would be great if that applied, but I'm afraid things get significantly more complex. For example, the noise will have a per-processor component, and the processes will compete for cache among themselves. Andrei
Apr 14 2012
prev sibling parent reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
08.04.2012 21:31, Andrei Alexandrescu пишет:
 On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
 Very good but minimum isn't a best guess. Personally I (and there will
 be a lot of such maniacs I suppose) will think that this (minimum) time
 can be significantly smaller than average time.

I've analyzed this quite a bit at work and the average and median are not very informative. You need the mode, but in most benchmarks the mode is very close to the minimum, so using the minimum is even better. In speed measurements, all noise is additive (there's no noise that may make a benchmark appear to run faster). There are also quite a few outliers. Recording the average will include a fair amount of noise.

Yes of course. I mean than an algorithm itself should follow some restrictions for such measurement. It should run with the same speed every time. Many algorithms follows this convention, but not all. Why will recording the average produce so much noise? As I see, floating point arithmetic is now used without a strong reason so it looks like a time of this part isn't valuable. Or is it just a temporary solution? Anyway it should be configurable using a CT parameter, so it will not abuse one who doesn't need it.
 Clearly there is noise during normal use as well, but incorporating it
 in benchmarks as a matter of course reduces the usefulness of benchmarks
 as a mean to improve performance of the benchmarked code.

A graph is needed exactly because of that. Without a graph it really gives very little.
 So a parameter (probably with a default value) should be added.
 Something like enum of flags telling what we want to know. At least
 these looks usable: minTime, <some mean time>, maxTime,
 standardDeviation, graph (yes, good old ASCII art).

Standard deviation is also not very useful because it includes all outliers (some of which are very far away from the mode).

So a graph is needed.
 Yes, graph is needed.

I am not sure about that. We may provide the raw measurement data for programs that want to plot things, but plotting is beyond the charter of std.benchmark.

Sorry, meant a histogram, not a curve. A histogram can be shown in a console very well. And it is needed because its the easiest way to show benchmarked program behaviour (and noise behaviour). It also requires only about 80 integers to store information and shouldn't produce much noise. IMHO, a histogram gives lots of information and will be a good addition. -- Денис В. Шеломовский Denis V. Shelomovskij
Apr 09 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
 Why will recording the average produce so much noise?

As I explained, the average takes noise and outliers (some very large, e.g. milliseconds in a benchmark that takes microseconds) into account. The minimum is shielded from this issue. In the limit, the minimum for infinitely many measurements is the sought-after result.
 As I see, floating
 point arithmetic is now used without a strong reason so it looks like a
 time of this part isn't valuable. Or is it just a temporary solution?

I don't understand "time of this part".
 Anyway it should be configurable using a CT parameter, so it will not
 abuse one who doesn't need it.

We'd like the framework to do the right thing, and the average does not seem to be it.
 IMHO, a histogram gives lots of information and will be a good addition.

I disagree. Andrei
Apr 09 2012
parent Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
09.04.2012 17:26, Andrei Alexandrescu пишет:
 On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
 Why will recording the average produce so much noise?

As I explained, the average takes noise and outliers (some very large, e.g. milliseconds in a benchmark that takes microseconds) into account. The minimum is shielded from this issue. In the limit, the minimum for infinitely many measurements is the sought-after result.
 As I see, floating
 point arithmetic is now used without a strong reason so it looks like a
 time of this part isn't valuable. Or is it just a temporary solution?

I don't understand "time of this part".

Looks like a misunderstanding because of my bad English: "Recording the average will include a fair amount of noise" means for me that a process of "recording the average" produces "a fair amount of noise" itself, not that "a result includes noise". -- Денис В. Шеломовский Denis V. Shelomovskij
Apr 10 2012
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 20:59, Denis Shelomovskij wrote:
 Very good but minimum isn't a best guess. Personally I (and there will
 be a lot of such maniacs I suppose) will think that this (minimum) time
 can be significantly smaller than average time.

Prime example is networking.
 So a parameter (probably with a default value) should be added.
 Something like enum of flags telling what we want to know. At least
 these looks usable: minTime, <some mean time>, maxTime,
 standardDeviation,

+1 graph (yes, good old ASCII art).
 Yes, graph is needed.

No not ASCII art. It's too coarse grained and the only good thing it can do is look nicely in terminal (or give an overall picture but it's limited to say ~25 blocks in a histogram). What would be nice to have an option to output bare-bones info so that one can easily adapt it and feed the result to e.g. gnuplot. I think csv or it's version with tabs is good enough in this case. Another cool addition IMHO would be parametric benchmarks, so there is a function and a set of parameters (one parameter is fine too) to benchmark on. It makes that much more sense with graphs as algorithm profile plotted for various inputs (sizes) can be very revealing. -- Dmitry Olshansky
Apr 08 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 12:35 PM, Dmitry Olshansky wrote:
 Another cool addition IMHO would be parametric benchmarks, so there is a
 function and a set of parameters (one parameter is fine too) to
 benchmark on. It makes that much more sense with graphs as algorithm
 profile plotted for various inputs (sizes) can be very revealing.

This would be interesting to explore, could you give some more detail please? Andrei
Apr 08 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 21:40, Andrei Alexandrescu wrote:
 On 4/8/12 12:35 PM, Dmitry Olshansky wrote:
 Another cool addition IMHO would be parametric benchmarks, so there is a
 function and a set of parameters (one parameter is fine too) to
 benchmark on. It makes that much more sense with graphs as algorithm
 profile plotted for various inputs (sizes) can be very revealing.

This would be interesting to explore, could you give some more detail please?

I thought to just throw it in. But if were to do it in under an hour: benchmark_func_blah(int problem_size); auto benchmark_params_func_blah = [ 1, 10, 100, 1_000, 10_000 ...]; or slightly more general - use the range protocol: auto benchmark_params_func_blah = getRangeOfValues(); I'm not sure the prefix is place to put it but here the rough idea. -- Dmitry Olshansky
Apr 08 2012
prev sibling next sibling parent "Francois Chabot" <francois.chabot.dev gmail.com> writes:
 Which is great, unless the program wants to measure the cache 
 memory itself, in which case it would use special assembler 
 instructions or large memset()s. (We do such at Facebook.)

I disagree. If a regression suddenly causes a function to become heavily cache-bound, it should show up in benchmarks somehow, regardless of the previous expected behavior of the function. -- Francois
Apr 09 2012
prev sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
 I've analyzed this quite a bit at work and the average and median are  
 not very informative. You need the mode, but in most benchmarks the mode  
 is very close to the minimum, so using the minimum is even better.

 In speed measurements, all noise is additive (there's no noise that may  
 make a benchmark appear to run faster). There are also quite a few  
 outliers. Recording the average will include a fair amount of noise.

when using randomized algorithms. You already get that from the LRU table in the array append cache.
Apr 10 2012
prev sibling next sibling parent reply Caligo <iteronvexor gmail.com> writes:
On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Hello,

 I finally found the time to complete std.benchmark. I got to a very simple
 API design, starting where I like it: one line of code.

 Andrei

I probably missed this somewhere, but what happens to `std.datetime : benchmark` ? Is it going to be deprecated ?
Apr 08 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 1:03 PM, Caligo wrote:
 On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>  wrote:
 Hello,

 I finally found the time to complete std.benchmark. I got to a very simple
 API design, starting where I like it: one line of code.

 Andrei

I probably missed this somewhere, but what happens to `std.datetime : benchmark` ? Is it going to be deprecated ?

Yes, I forgot to mention that. Part of the proposal is that std.datetime.StopWatch and std.datetime.benchmark are be deprecated. Andrei
Apr 08 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
Added to trello.

-Steve
Apr 09 2012
prev sibling next sibling parent reply "Francois Chabot" <francois.chabot.dev gmail.com> writes:
Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this 
library, and yet they are relegated to second-class citizen: "Oh, 
I guess you can use this". Normally, I wouldn't be so picky, but 
this is a standard library. Focus should be on functionality.

Providing formatted output is a nice bonus, but to me, it's just 
a bonus. Any benchmarking part of a large project is bound to 
format the output itself (to log benchmark results against 
revisions in a database or something like that).

Also, benchmark() and runBenchmarks() are kind of confusing at 
first glance. Something along the lines of benchmarkModule() and 
benchmarkAllModules() would be more sensible.
Apr 09 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 10:23 AM, Francois Chabot wrote:
 Why is there so much emphasis on printBenchmarks()?

 benchmark() and runBenchmarks() are clearly the core of this library,
 and yet they are relegated to second-class citizen: "Oh, I guess you can
 use this". Normally, I wouldn't be so picky, but this is a standard
 library. Focus should be on functionality.

The functionality is to make benchmark easy to use, meaningful, and easy to interpret. I don't want to add a complicated library for postprocessing benchmarks because most nobody will use it. The first function in the documentation is what most people will want to bring themselves to using. The functions that provide the data are eminently available so I disagree with the "second-class citizen" characterization. You want to use them, use them. They don't need to be given rockstar billing. Andrei
Apr 09 2012
prev sibling parent reply Somedude <lovelydear mailmetrash.com> writes:
Le 09/04/2012 17:23, Francois Chabot a écrit :
 Why is there so much emphasis on printBenchmarks()?
 
 benchmark() and runBenchmarks() are clearly the core of this library,
 and yet they are relegated to second-class citizen: "Oh, I guess you can
 use this". Normally, I wouldn't be so picky, but this is a standard
 library. Focus should be on functionality.
 
 Providing formatted output is a nice bonus, but to me, it's just a
 bonus. Any benchmarking part of a large project is bound to format the
 output itself (to log benchmark results against revisions in a database
 or something like that).
 
 Also, benchmark() and runBenchmarks() are kind of confusing at first
 glance. Something along the lines of benchmarkModule() and
 benchmarkAllModules() would be more sensible.

The printBenchmark facility is cool and should be included imho. It helps benchmarking being as standard as unit testing. We don't want to have to write again and again the same boilerplate code for such trivial uses.
Apr 09 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 11:44 AM, Somedude wrote:
 It helps benchmarking being as standard as unit testing.
 We don't want to have to write again and again the same boilerplate code
 for such trivial uses.

Yes, I had unittest in mind when writing the library. If one needs more than one statement to get an informative benchmark off the ground, we failed. Andrei
Apr 09 2012
prev sibling next sibling parent reply Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
 Hello,
 
 
 I finally found the time to complete std.benchmark. I got to a very
 simple API design, starting where I like it: one line of code.
 
 Code is in the form of a pull request at
 https://github.com/D-Programming-Language/phobos/pull/529. (There's
 some noise in there caused by my git n00biness). Documentation is at
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.
 
 If reasonable and at all possible, I'd like to bump the priority of
 this proposal. Clearly D's user base is highly interested in
 efficiency, and many of the upcoming libraries have efficiency a
 virtual part of their design. So we should get std.benchmark in soon
 and require that new addition come with benchmarks for their
 essential functionality.
 
 My vision is that in the future Phobos will have a significant
 benchmarks battery, which will help improving Phobos and porting to
 new platforms.

I come to think the same. How come that the times based relative report and the percentage based relative report are mixed in one result? And how do I choose which one I'd like to see in the output. When benchmarking you can measure different things at the same time. In this regard the current proposal is limited. It just measures wall clock time. I believe extending the StopWatch to measure e.g. user CPU time is a useful addition. In general, allowing user defined measurements would be great. E.g. to measure the time spend in user mode. () => { tms t; times(&t); return t.tms_utime; } Note, that this code does not need to be portable. You can also use version() else static assert. Things that come to mind that I'd like to measure. Time measurements: * User CPU time * System CPU time * Time spent in memory allocations Count measurements: * Memory usage * L1/L2/L3 cache misses * Number of executed instructions * Number of memory allocations Of course wall clock time is the ultimate measure when benchmarking. But often you need to investigate further (doing more measurements). Do you think adding this is worthwhile? All in all the user interface has been greatly simplified. Jens
Apr 10 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/10/12 5:40 AM, Jens Mueller wrote:
 How come that the times based relative report and the percentage based
 relative report are mixed in one result? And how do I choose which one
 I'd like to see in the output.

It's in the names. If the name of a benchmark starts with benchmark_relative_, then that benchmark is considered relative to the last non-relative benchmark. Using a naming convention allows complete automation in benchmarking a module. I figure it's fine that all results appear together because the absence of data in the relative column clarifies which is which.
 When benchmarking you can measure different things at the same time. In
 this regard the current proposal is limited. It just measures wall clock
 time. I believe extending the StopWatch to measure e.g. user CPU time is
 a useful addition.

Generally I fear piling too much on StopWatch because every feature adds its own noise. But there's value in collecting the result of times(). What would be the Windows equivalent?
 In general, allowing user defined measurements would be great.
 E.g. to measure the time spend in user mode.
 () =>  {
           tms t;
           times(&t);
           return t.tms_utime;
        }

 Note, that this code does not need to be portable. You can also use
 version() else static assert.

 Things that come to mind that I'd like to measure.
 Time measurements:
    * User CPU time
    * System CPU time
    * Time spent in memory allocations
 Count measurements:
    * Memory usage
    * L1/L2/L3 cache misses
    * Number of executed instructions
    * Number of memory allocations

 Of course wall clock time is the ultimate measure when benchmarking.
 But often you need to investigate further (doing more measurements).

 Do you think adding this is worthwhile?

Absolutely. I just fear about expanding the charter of the framework too much. Let's see: * Memory usage is, I think, difficult in Windows. * Estimating cache misses and executed instructions is significant research * Number of memory allocations requires instrumenting druntime Andrei
Apr 14 2012
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
There have been quite a few good comments, but no review manager offer. 
Could someone please take this role?

Again, it would be great to get std.benchmark in sooner rather than 
later because it can be used by subsequent submissions (many of which 
allege efficiency as a major advantage) to show they improve over the 
state of the art.

Thanks,

Andrei
Apr 14 2012
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-04-15 04:58, Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review manager offer.
 Could someone please take this role?

 Again, it would be great to get std.benchmark in sooner rather than
 later because it can be used by subsequent submissions (many of which
 allege efficiency as a major advantage) to show they improve over the
 state of the art.

 Thanks,

 Andrei

As far as I know, there are already several other projects/modules in the pipeline before std.benchmark. If we are to review them in the order they are submitted. -- /Jacob Carlborg
Apr 15 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Sunday, April 15, 2012 13:41:28 Jacob Carlborg wrote:
 On 2012-04-15 04:58, Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review manager offer.
 Could someone please take this role?
 
 Again, it would be great to get std.benchmark in sooner rather than
 later because it can be used by subsequent submissions (many of which
 allege efficiency as a major advantage) to show they improve over the
 state of the art.
 
 Thanks,
 
 Andrei

As far as I know, there are already several other projects/modules in the pipeline before std.benchmark. If we are to review them in the order they are submitted.

There are, but none of them are being reviewed at the moment or being pushed for a review. And there _is_ an argument for std.benchmark being high priority based on how it could affect future reviews. We should probably start reviewing _something_ soon though. We haven't had very good momentum on that of late. - Jonathan M Davis
Apr 16 2012
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.net> writes:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
 Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review 
 manager
 offer. Could someone please take this role?

I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.

The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager -Lars
Apr 16 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17.04.2012 1:00, Jens Mueller wrote:
 Lars T. Kyllingstad wrote:
 On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
 Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review manager
 offer. Could someone please take this role?

I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.

The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager

The page is shorter than expected. Last time I checked I found something way longer. Since the Phobos' review process is based on Boost's where does deviate?

It's peer review followed by voting and that's all about it. I don't think you should seek any formal standards, documents, guidelines, etc. The manager role is rather simple: 1. Manger picks module from review queue and posts an announcement that formal review of it starts today. Post should contain relevant information about module and links to source/documentation. Also importantly it sets the exact time the review ends and the exact time voting ends. (usually 2 weeks review, 1 week for voting) 2. When review ends Manager ether opens a vote thread. If it's obvious that module needs futher work (on explicit request from author) Manager may choose to prolong or postpone review thus putting it back into queue. 3. When vote period ends count up the votes and declare the result. And that's it. There are no hard rule on which module has the priority in queue. It is loosely calculated on basis of how long it was ready for review and how important the functionality is. P.S. If you are not sure are up to it just say the word and I'll volunteer instead. We need to push this forward as the review queue is going to overflow real soon ;) -- Dmitry Olshansky
Apr 17 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/17/12 2:53 PM, Jens Mueller wrote:
 Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei
Apr 17 2012
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, April 17, 2012 18:55:07 H. S. Teoh wrote:
 On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:
 On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.

[...] Let's just check it in since it's ready. No point in waiting.

Check it in? You mean start the review on it? I hope that you don't mean "check it in" as in commit it. It needs to be reviewed and voted on first. - Jonathan M Davis
Apr 17 2012
prev sibling next sibling parent Jens Mueller <jens.k.mueller gmx.de> writes:
Lars T. Kyllingstad wrote:
 On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?

I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.

The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager

The page is shorter than expected. Last time I checked I found something way longer. Since the Phobos' review process is based on Boost's where does deviate? Jens
Apr 16 2012
prev sibling next sibling parent Jens Mueller <jens.k.mueller gmx.de> writes:
Dmitry Olshansky wrote:
 On 17.04.2012 1:00, Jens Mueller wrote:
Lars T. Kyllingstad wrote:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?

I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.

The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager

The page is shorter than expected. Last time I checked I found something way longer. Since the Phobos' review process is based on Boost's where does deviate?

It's peer review followed by voting and that's all about it. I don't think you should seek any formal standards, documents, guidelines, etc. The manager role is rather simple: 1. Manger picks module from review queue and posts an announcement that formal review of it starts today. Post should contain relevant information about module and links to source/documentation. Also importantly it sets the exact time the review ends and the exact time voting ends. (usually 2 weeks review, 1 week for voting) 2. When review ends Manager ether opens a vote thread. If it's obvious that module needs futher work (on explicit request from author) Manager may choose to prolong or postpone review thus putting it back into queue. 3. When vote period ends count up the votes and declare the result. And that's it. There are no hard rule on which module has the priority in queue. It is loosely calculated on basis of how long it was ready for review and how important the functionality is.

Many thanks for your explanation.
 P.S. If you are not sure are up to it just say the word and I'll
 volunteer instead. We need to push this forward as the review queue
 is going to overflow real soon ;)

Andrei is already preparing. Review will start soon. Jens
Apr 17 2012
prev sibling next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Tuesday, 17 April 2012 at 21:15:31 UTC, Andrei Alexandrescu 
wrote:
 On 4/17/12 2:53 PM, Jens Mueller wrote:
 Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei

I agree, there is no good reason for the queue to be stopped like it is now, but since noone seems to step up for the other modules, let's not waste time and review your module. Hopefully, the other modules will be reviewed ASAP. We gain nothing in waiting.
Apr 17 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:
 On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.

Let's just check it in since it's ready. No point in waiting. T -- Some days you win; most days you lose.
Apr 17 2012
prev sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Wednesday, 18 April 2012 at 09:44:12 UTC, Marco Leise wrote:
 Am Tue, 17 Apr 2012 16:15:43 -0500
 In a supermarket, there are three people in front of you in the 
 queue. Then they all decide they forgot something and disappear 
 in the back. You exchange some looks with the cashier and 
 decide that you'll be long finished before any of them come 
 back. Go ahead.

Well, we are not in a supermarket. It was correct to ask.
Apr 18 2012
prev sibling next sibling parent "Famous" <none none.net> writes:
Instead of the originally proposed layout

================================================================
Benchmark                               relative ns/iter  iter/s
================================================================
---[ module_one ]-----------------------------------------------
file write                                        140.2K    7.1K
file read                                 517.8%   27.1K   36.9K
array creation                             1.2Kx  116.0     8.6M
================================================================

I would like to see something more similar to the following:

================================================================
Benchmark                                Time    Performance
                                                 ---------------
name                                     s/iter  iter/s  factor
-----------------------[ module_one ]--- ------- ------- -------
file write                               140.20µ   7.13k
file read                                 27.10µ  36.90k   5.17
array creation                           116.00n   8.62M   1.21k
================================================================

Transition to the new layout can be achieved as follows:

         * Consider layout guidelines given in

            http://mirrors.ctan.org/info/german/tabsatz/tabsatz.pdf

         * Reorder columns considering their semantic and introduce
           groups.

         * Rename 'relative' to 'factor' and avoid usage of 'x' for
           'times'.

         * Divide column 'ns/iter' by a billion (see Vladimir's
           post).

         * Right align module name (abbreviate if it is too long).

         * Avoid '%' but rely on the prefixes specified in the SI:

http://en.wikipedia.org/w/index.php?title=International_System_of_Units&oldid=487057838#Units_and_prefixes

           Then, using a fixed width of 7 characters we have 3
           significant digits, at least:

            123450000         -> 123.45M
             12345000         ->  12.35M
              1234500         ->   1.23M
               123450         -> 123.45k
                12345         ->  12.35k
                 1234.5       ->   1.23k
                  123.45      -> 123.45
                   12.345     ->  12.35
                    1.2345    ->   1.23
                    0.12345   -> 123.45m
                    0.012345  ->  12.35m
                    0.0012345 ->   1.23m

Cheers,
Famous
Apr 15 2012
prev sibling next sibling parent "Famous" <none none.net> writes:
Maybe the layout is not destroyed, this time:

 ================================================================
 Benchmark                                Time    Performance
                                                  ---------------
 name                                     s/iter  iter/s  factor
 -----------------------[ module_one ]--- ------- ------- -------
 file write                               140.20µ   7.13k
 file read                                 27.10µ  36.90k   5.17
 array creation                           116.00n   8.62M   1.21k
 ================================================================

Apr 15 2012
prev sibling next sibling parent Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
 On 4/10/12 5:40 AM, Jens Mueller wrote:
How come that the times based relative report and the percentage based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output.

It's in the names. If the name of a benchmark starts with benchmark_relative_, then that benchmark is considered relative to the last non-relative benchmark. Using a naming convention allows complete automation in benchmarking a module. I figure it's fine that all results appear together because the absence of data in the relative column clarifies which is which.

I know. That wasn't my question. How do I choose between percentage vs. factors for a relative benchmark, e.g. 200% vs. 2?
When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited. It just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is
a useful addition.

Generally I fear piling too much on StopWatch because every feature adds its own noise. But there's value in collecting the result of times(). What would be the Windows equivalent?

I'm not a Windows user but GetProcessTimes seems to be the Windows equivalent. http://msdn.microsoft.com/en-us/library/windows/desktop/ms683223%28v=vs.85%29.aspx
In general, allowing user defined measurements would be great.
E.g. to measure the time spend in user mode.
() =>  {
          tms t;
          times(&t);
          return t.tms_utime;
       }

Note, that this code does not need to be portable. You can also use
version() else static assert.

Things that come to mind that I'd like to measure.
Time measurements:
   * User CPU time
   * System CPU time
   * Time spent in memory allocations
Count measurements:
   * Memory usage
   * L1/L2/L3 cache misses
   * Number of executed instructions
   * Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?

Absolutely. I just fear about expanding the charter of the framework too much. Let's see: * Memory usage is, I think, difficult in Windows. * Estimating cache misses and executed instructions is significant research * Number of memory allocations requires instrumenting druntime

I think that std.benchmark shouldn't do this. But I think we should figure out whether adding a user-defined measurement function is possible. Making sure that be we have a design that is flexible enough to capture different measurements. Jens
Apr 15 2012
prev sibling next sibling parent Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
 There have been quite a few good comments, but no review manager
 offer. Could someone please take this role?

I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers. Jens
Apr 15 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, April 17, 2012 16:15:43 Andrei Alexandrescu wrote:
 On 4/17/12 2:53 PM, Jens Mueller wrote:
 Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.

I have no problem with it. None of the others is being pushed for review or promoted at present. I'd just as soon have the std.benchmark review get under way rather than sit around waiting for one of those earlier in the queue to get started. - Jonathan M Davis
Apr 17 2012
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 17 Apr 2012 16:15:43 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 On 4/17/12 2:53 PM, Jens Mueller wrote:
 Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei

In a supermarket, there are three people in front of you in the queue. Then they all decide they forgot something and disappear in the back. You exchange some looks with the cashier and decide that you'll be long finished before any of them come back. Go ahead. -- Marco
Apr 18 2012
prev sibling parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu
wrote:
 Hello,


 I finally found the time to complete std.benchmark. I got to a 
 very simple API design, starting where I like it: one line of 
 code.

 Code is in the form of a pull request at 
 https://github.com/D-Programming-Language/phobos/pull/529. 
 (There's some noise in there caused by my git n00biness). 
 Documentation is at 
 http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

 If reasonable and at all possible, I'd like to bump the 
 priority of this proposal. Clearly D's user base is highly 
 interested in efficiency, and many of the upcoming libraries 
 have efficiency a virtual part of their design. So we should 
 get std.benchmark in soon and require that new addition come 
 with benchmarks for their essential functionality.

 My vision is that in the future Phobos will have a significant 
 benchmarks battery, which will help improving Phobos and 
 porting to new platforms.


 Andrei

Where have std.benchmark gone? I've written a benchmarking framework in C++ that I would like to port to D. If you've already implemented some cool stuff its better to reuse/merge it with mine. My C++ code has a bench of external deps (not included in this post). If you're interested I'll send you the rest. It's inspired by Haskell's powerful benchmarking framework(s) because it cant automatically figure out how to randomize various kinds of types including recursive instantiatinos of builtins, STL pairs, tuples and even containers using a randomization framework that uses a STL-container-type_trait (rand.hpp which is not included). I'm guessing implementing such a framework in D is an order of magnitude easier than in C++. Innovative things about timed.hpp is constant time benchmarking meaning that the loop will generate a larger and large instance of the structure until a specified amount of time has passed. I believe this i a more user-friendly way of benchmarking than most other frameworks do. And with D's supercool struct/class tupleof even more automation of instance generation is possible. Destroy! Code: /*! \file timed.hpp * \brief Benchmark Functions on Auto-Randomized Arguments. * \author Per Nordlöw (per.nordlow gmail.com) * \license Boost * * \see http://en.cppreference.com/w/cpp/thread/async * \see http://stackoverflow.com/questions/10059239/gccs-behaviour-with-stdasyncstdlaunchasync-vs-clangs-behaviour * \see http://stackoverflow.com/questions/8895350/quickcheck-like-template-function-benching-in-c * \see http://functionalcpp.wordpress.com/2013/08/03/generalized-function-evaluation/ * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html * \see https://limcore.wordpress.com/2008/04/13/8/ * \see http://www.randombit.net/bitbashing/programming/parallel_function_invocation_using_std_async.html * \todo Use const boost::posix_time::time_duration & dur = boost::posix_time::seconds(1) */ #pragma once #include <utility> #include <vector> #include <future> #include <chrono> #include <thread> #include <type_traits> #include "rand.hpp" #include "type_traits_x.hpp" namespace pnw { struct timed { typedef std::chrono::high_resolution_clock hrc; typedef std::chrono::milliseconds millis; typedef std::chrono::microseconds micros; public: timed(size_t max_length = 1024*256, millis total_duration_max = millis(1000), millis call_duration_max = millis(0), uint n_tries = 1) : m_max_length(max_length), m_tot_dur(total_duration_max), m_call_dur(call_duration_max), m_n_tries(n_tries) {} /*! Benchmark Function/Lambda/Functor \p f with non-void return value. * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html */ template<class Func> auto call(Func&& f) -> typename std::enable_if<! std::is_void<decltype(f())>::value, // if non-void return std::pair<decltype(f()), typename hrc::duration> >::type { decltype(f()) result; auto tA = hrc::now(); for (uint i = 0; i < m_n_tries; i++) { result = f(); } // call it \c m_n_tries times auto span = hrc::now() - tA; return std::make_pair(result, span); } /*! Benchmark Function/Lambda/Functor \p f with \c void return value. * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html */ template<class Func> auto call(Func&& f) -> typename std::enable_if<std::is_void<decltype(f())>::value, // if void return typename hrc::duration>::type { auto tA = hrc::now(); for (uint i = 0; i < m_n_tries; i++) { f(); } // call it \c m_n_tries times return hrc::now() - tA; // reverse order? } #if 0 /*! Call Function \p f after \p us microseconds. * \see http://stackoverflow.com/questions/10804683/delayed-function-call */ template<class Func, class... Args> inline std::future<typename std::result_of<Func(Args...)>::type> delayed_async(Func&& f, Args&&... args, micros us) { return std::async(std::launch::async, [f, us, args...] () { std::this_thread::sleep_for(us); return f(args...); // call } ); } #endif /*! Benchmark the \em Single Argument Function \p f using Randomized Instances of type \p Cont. * \tparam C Container Type. * \tparam B benchS (Times) Container. * \param[in] f Function. * \return vector of pairs of \c C-instance size and execute-time. * \todo Estimate Time Complexity Model: O(c), O(n), ..., O(n^p), O(log(n)) O(exp(n)) * and predict cost of next iteration to halt within time-limit. * \see http://stackoverflow.com/questions/7245235/syntax-of-c-templates-with-function-type-parameters * \see http://en.cppreference.com/w/cpp/chrono/steady_clock */ template<class C, class B = std::vector<std::pair<size_t, double> >

call_s(void (*f)(C&)) { const auto deadline = hrc::now() + m_tot_dur; B bm; // benchmarks C c; // test data for (size_t n = 1; n <= m_max_length; n *= 2) { // efficient way of adding more random values to a vector auto c2 = rand_n<C>(n); c.insert(begin(c), begin(c2), end(c2)); // \see http://stackoverflow.com/a/2551785/683710 auto tA = hrc::now(); // log start for (uint i = 0; i < m_n_tries; i++) { f(c); } // call it \c m_n_tries times auto span = hrc::now() - tA; if (m_call_dur.count() != 0 and span >= m_call_dur) { break; } // if function call took too long auto count = std::chrono::duration_cast<std::chrono::microseconds>(span).count(); bm.emplace_back(n, static_cast<double>(count)/1e6); // store if (hrc::now() + span >= deadline) { // if no time for yet another measure taking at least \c span break; } } return bm; } /*! Benchmark the \em Single Argument Function \p f using Randomized Instances of type \p Cont. * \see Previous overload of \c call_s */ template<class C, class B = std::vector<std::pair<size_t, double> > > typename std::enable_if<is_container<C>::value, B>::type call_s(void (*f)(const C&)) { return call_s(reinterpret_cast<void (*)(C&)>(f)); } /*! Benchmark Function \p f using Randomized Instances of type \p Cont. * \tparam C Container. * \tparam B benchS (Times) Container. * \param[in] f Function. * \return vector of pairs of \c C-instance size and execute-time. */ template<class C, class B = std::vector<std::pair<size_t, double> >

call(void (*f)(typename C::iterator, typename C::iterator)) { const auto deadline = hrc::now() + m_tot_dur; B bm; // benchmarks C c; // test data for (size_t n = 1; n <= m_max_length; n *= 2) { // efficient way of adding more random values to a vector auto c2 = rand_n<C>(n); c.insert(begin(c), begin(c2), end(c2)); // \see http://stackoverflow.com/a/2551785/683710 auto tA = hrc::now(); // log start for (uint i = 0; i < m_n_tries; i++) { f(c.begin(), c.end()); } // call it \c m_n_tries times auto span = hrc::now() - tA; if (m_call_dur.count() != 0 and span >= m_call_dur) { break; } // if function call took too long auto count = std::chrono::duration_cast<std::chrono::microseconds>(span).count(); bm.emplace_back(n, static_cast<double>(count)/1e6); // store if (hrc::now() + span >= deadline) { // if there's no time for more measures break; } } return bm; } public: size_t m_max_length = 1024*256; ///< Maximum container instance length of \c T. millis m_tot_dur = millis(1000); ///< Maximum time to wait for whole benchmarking to complete. millis m_call_dur = millis(0); ///< Maximum time to wait for a call \p f to complete. uint m_n_tries = 1; ///< Number of tries per size of randomizeda instance. std::launch m_launch_policy = std::launch::async | std::launch::deferred; ///< Thread launch policy. }; }
Sep 27 2013