## digitalmars.D - std.benchmark ready for review. Manager sought after

• Andrei Alexandrescu (16/16) Apr 07 2012 Hello,
• Somedude (3/26) Apr 08 2012 Like it. Would it be a good idea to add a column with an average memory
• Andrei Alexandrescu (6/8) Apr 08 2012 Interesting idea. I saw
• Marco Leise (5/15) Apr 08 2012 Oh please, I use these functions already and with the similarities among...
• Somedude (8/24) Apr 08 2012 I thought about monitoring as well.
• Dmitry Olshansky (9/11) Apr 08 2012 In general it's next to impossible and/or entirely OS-specific.
• Somedude (2/15) Apr 08 2012 That's what I thought, thank you.
• Piotr Szturmaj (7/13) Apr 08 2012 For algorithms that process sequences, it would be nice to have results
• Andrei Alexandrescu (4/18) Apr 08 2012 The framework aims at generality and simplicity. I'd rather have it
• Denis Shelomovskij (11/11) Apr 08 2012 Very good but minimum isn't a best guess. Personally I (and there will
• Andrei Alexandrescu (16/24) Apr 08 2012 I've analyzed this quite a bit at work and the average and median are
• Manfred Nowak (10/13) Apr 08 2012 On the contrary:
• Andrei Alexandrescu (20/32) Apr 08 2012 That sounds quite tenuous to me. How do you measure it, and what
• Manfred Nowak (28/47) Apr 09 2012 This is in doubt, because you yourself wrote "the machine itself has
• Andrei Alexandrescu (13/45) Apr 09 2012 Which is great, unless the program wants to measure the cache memory
• Francois Chabot (5/8) Apr 09 2012 I disagree. If a regression suddenly causes a function to become
• Andrei Alexandrescu (5/11) Apr 09 2012 But cache binding depends on a variety of cache characteristics, i.e.
• Manfred Nowak (23/31) Apr 09 2012 I did not mean to run two or more benchmarks in parallel but only
• Andrei Alexandrescu (9/30) Apr 14 2012 I doubt that's the case. Once we discuss running several instances in
• Denis Shelomovskij (21/44) Apr 09 2012 Yes of course. I mean than an algorithm itself should follow some
• Andrei Alexandrescu (10/17) Apr 09 2012 As I explained, the average takes noise and outliers (some very large,
• Denis Shelomovskij (8/18) Apr 10 2012 Looks like a misunderstanding because of my bad English: "Recording the
• Martin Nowak (4/10) Apr 10 2012 The benchmarked function itself could have a relevant variance, e.g.
• Dmitry Olshansky (16/24) Apr 08 2012 +1
• Andrei Alexandrescu (4/8) Apr 08 2012 This would be interesting to explore, could you give some more detail
• Dmitry Olshansky (9/16) Apr 08 2012 I thought to just throw it in. But if were to do it in under an hour:
• Caligo (4/8) Apr 08 2012 I probably missed this somewhere, but what happens to std.datetime :
• Andrei Alexandrescu (4/14) Apr 08 2012 Yes, I forgot to mention that. Part of the proposal is that
• Steven Schveighoffer (2/2) Apr 09 2012 Added to trello.
• Francois Chabot (12/12) Apr 09 2012 Why is there so much emphasis on printBenchmarks()?
• Andrei Alexandrescu (10/15) Apr 09 2012 The functionality is to make benchmark easy to use, meaningful, and easy...
• Somedude (5/20) Apr 09 2012 The printBenchmark facility is cool and should be included imho.
• Andrei Alexandrescu (5/8) Apr 09 2012 Yes, I had unittest in mind when writing the library. If one needs more
• Jens Mueller (33/54) Apr 10 2012 I come to think the same.
• Andrei Alexandrescu (16/45) Apr 14 2012 It's in the names. If the name of a benchmark starts with
• Jens Mueller (11/64) Apr 15 2012 I know. That wasn't my question. How do I choose between percentage vs.
• Andrei Alexandrescu (8/8) Apr 14 2012 There have been quite a few good comments, but no review manager offer.
• Jacob Carlborg (6/14) Apr 15 2012 As far as I know, there are already several other projects/modules in
• Jonathan M Davis (6/22) Apr 16 2012 There are, but none of them are being reviewed at the moment or being pu...
• Jens Mueller (6/8) Apr 15 2012 I will do this.
• Lars T. Kyllingstad (4/13) Apr 16 2012 The review process is based on Boost's:
• Jens Mueller (5/20) Apr 16 2012 The page is shorter than expected. Last time I checked I found something
• Dmitry Olshansky (24/43) Apr 17 2012 It's peer review followed by voting and that's all about it.
• Jens Mueller (4/53) Apr 17 2012 Andrei is already preparing. Review will start soon.
• Andrei Alexandrescu (6/7) Apr 17 2012 I'd like to gather a bit of confidence that it's okay with people that
• SomeDude (7/15) Apr 17 2012 I agree, there is no good reason for the queue to be stopped like
• Jonathan M Davis (6/12) Apr 17 2012 I have no problem with it. None of the others is being pushed for review...
• H. S. Teoh (6/12) Apr 17 2012 [...]
• Jonathan M Davis (4/15) Apr 17 2012 Check it in? You mean start the review on it? I hope that you don't mean...
• Marco Leise (5/15) Apr 18 2012 In a supermarket, there are three people in front of you in the queue. T...
• SomeDude (2/8) Apr 18 2012 Well, we are not in a supermarket. It was correct to ask.
• Famous (47/47) Apr 15 2012 Instead of the originally proposed layout
• "Nordlöw" (236/255) Sep 27 2013 Where have std.benchmark gone?
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,

I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's some
noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the priority of this
proposal. Clearly D's user base is highly interested in efficiency, and
many of the upcoming libraries have efficiency as a vital part of their
design. So we should get std.benchmark in soon and require that new
additions come with benchmarks for their essential functionality.

My vision is that in the future Phobos will have a significant
battery of benchmarks, which will help improve Phobos and port it to new
platforms.

Andrei
Apr 07 2012
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu
wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a
very simple API design, starting where I like it: one line of
code.
Nice, some comments:

1) Is it possible to do something about the "kilonanoseconds", and print the two-letter SI time abbreviation instead (μs/ns/ms/s/...)?

2) The meaning of "epoch" in this context is unfamiliar to me:
Measurement is done in epochs. For each function benchmarked,
the smallest time is taken over all epochs.
3) "benchmark_relative_file read" should be replaced with a language construct. E.g. a function call like relativeBenchmark("file read"), or an enum value like getopt's.

4) There is a TODO in benchmarkSuspend's example.
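Point 1 is just SI-prefix scaling; a quick sketch of the idea in Python (the helper name and output format are invented here, not part of std.benchmark):

```python
def format_duration(ns):
    """Render a nanosecond count with the largest SI unit that keeps
    the value at or above 1, so there are no "kilonanoseconds"."""
    for unit, factor in [("s", 1_000_000_000), ("ms", 1_000_000),
                         ("us", 1_000), ("ns", 1)]:
        if ns >= factor:
            return f"{ns / factor:.2f} {unit}"
    return f"{ns:.2f} ns"

print(format_duration(1_500))      # 1.50 us
print(format_duration(2_000_000))  # 2.00 ms
```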
Apr 07 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/7/12 11:45 PM, Vladimir Panteleev wrote:
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.
Nice, some comments: 1) Is it possible to do something about the "kilonanoseconds", and print the two-letter SI time abbreviation instead (μs/ns/ms/s/...)?
Cool.
2) The meaning of "epoch" in this context is unfamiliar to me:

Measurement is done in epochs. For each function benchmarked, the
smallest time is taken over all epochs.
To me too :o).
3) "benchmark_relative_file read" should be replaced with a language
construct. E.g. a function call like relativeBenchmark("file read"), or
an enum value like getopt's.
No can do. Need a function name-based convention so we can automate scheduleForBenchmarking.
4) There is a TODO in benchmarkSuspend's example.
Will destroy. Andrei
Apr 07 2012
On Sunday, 8 April 2012 at 05:41:11 UTC, Andrei Alexandrescu
wrote:
3) "benchmark_relative_file read" should be replaced with a
language
construct. E.g. a function call like relativeBenchmark("file
an enum value like getopt's.
No can do. Need a function name-based convention so we can automate scheduleForBenchmarking.
Hmm, maybe there's a misunderstanding, but what I meant was:

The "benchmark_relative_" prefix makes sense for function names (for scheduleForBenchmarking), but not so much for string literals for benchmark names. The string literal "benchmark_relative_file read" looks like the words "benchmark relative file" are grouped together, with "read" added on.

So, my suggestion would be to wrap the "benchmark_relative_" prefix - when used with benchmark name strings - into a semantical function / enum / etc. In my example above, relativeBenchmark would be:

string relativeBenchmark(string s) { return "benchmark_relative_" ~ s; }

I suppose it can be summed up as a tradeoff between complexity (you need to explain both the function name usage and the relativeBenchmark wrapper usage) vs. code prettiness.
Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 2:02 AM, Vladimir Panteleev wrote:
The "benchmark_relative_" prefix makes sense for function names (for
scheduleForBenchmarking), but not so much for string literals for
benchmark names. The string literal "benchmark_relative_file read" looks
like the words "benchmark relative file" are grouped together, with
"benchmark_relative_" prefix - when used with benchmark name strings -
into a semantical function / enum / etc. In my example above,
relativeBenchmark would be:

string relativeBenchmark(string s) { return "benchmark_relative_" ~ s; }

I suppose it can be summed up as a tradeoff between complexity (you need
to explain both the function name usage and the relativeBenchmark
wrapper usage) vs. code prettiness.
I understand, thanks. Andrei
Apr 08 2012
Somedude <lovelydear mailmetrash.com> writes:
On 08/04/2012 05:25, Andrei Alexandrescu wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's some
noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the priority of this
proposal. Clearly D's user base is highly interested in efficiency, and
many of the upcoming libraries have efficiency a virtual part of their
design. So we should get std.benchmark in soon and require that new
addition come with benchmarks for their essential functionality.

My vision is that in the future Phobos will have a significant
benchmarks battery, which will help improving Phobos and porting to new
platforms.

Andrei
Like it. Would it be a good idea to add a column with the average memory used?
Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 3:16 AM, Somedude wrote:
Like it. Would it be a good idea to add a column with an average memory
used ?
Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage- pi-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei
Apr 08 2012
Marco Leise <Marco.Leise gmx.de> writes:
Am Sun, 08 Apr 2012 09:35:14 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

On 4/8/12 3:16 AM, Somedude wrote:
Like it. Would it be a good idea to add a column with an average memory
used ?
Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage- pi-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei
Oh please, I use these functions already, and with the similarities amongst operating systems they should be somewhere in Phobos! My thought was more along the lines of std.procmgmt/std.process though, since they are process management functions useful for server monitoring as well, not just benchmarking.

-- Marco
Apr 08 2012
Somedude <lovelydear mailmetrash.com> writes:
On 08/04/2012 18:21, Marco Leise wrote:
Am Sun, 08 Apr 2012 09:35:14 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

On 4/8/12 3:16 AM, Somedude wrote:
Like it. Would it be a good idea to add a column with an average memory
used ?
Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage- pi-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei
Oh please, I use these functions already and with the similarities amongst operating systems they should be somewhere in Phobos! My thought was more in the line of std.procmgmt/std.process though. Since they are process management functions useful for server monitoring as well, not just benchmarking.
I thought about monitoring as well. But then again, following your remark, monitoring may in fact be a completely different matter. This would need a little bit more thought I guess. Adding monitoring capabilities to processes would be interesting. But I guess we are already quite far from benchmarking now. Unless std.benchmark makes use of these monitoring capabilities.
Apr 08 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 12:16, Somedude wrote:
[snip]
Like it. Would it be a good idea to add a column with an average memory
used ?
In general it's next to impossible and/or entirely OS-specific. What can be done, I think, is adding a query function to the GC interface that returns the amount of RAM currently allocated on its heap. So adding GC heap usage can be done, albeit it's not really a stable metric. This way one gets a nice hint on why something is slow ;)

-- Dmitry Olshansky
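Dmitry's suggestion, querying the language runtime's own allocator rather than the OS, can be sketched in Python, where tracemalloc plays the role the proposed GC query function would play in D (the helper name is invented):

```python
import tracemalloc

def peak_heap_of(fn):
    """Run fn and report the peak heap growth seen by the runtime's
    own allocator, the portable analogue of asking the D GC how many
    bytes it currently has allocated on its heap."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

peak = peak_heap_of(lambda: [0] * 100_000)  # builds a ~800 KB list
```

As Dmitry notes, this is a hint rather than a stable metric: the number depends on collection timing and ignores memory allocated outside the tracked heap.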
Apr 08 2012
Somedude <lovelydear mailmetrash.com> writes:
On 08/04/2012 17:48, Dmitry Olshansky wrote:
On 08.04.2012 12:16, Somedude wrote:
[snip]
Like it. Would it be a good idea to add a column with an average memory
used ?
In general it's next to impossible and/or entirely OS-specific. What can be done I think is adding a query function to GC interface that returns amount of RAM currently allocated on it's heap. So adding GC heap usage can be done albeit it's not really a stable metric. This way one gets a nice hint on why something is slow ;)
That's what I thought, thank you.
Apr 08 2012
Andrei Alexandrescu wrote:
I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's some
noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.
For algorithms that process sequences, it would be nice to have results represented in cycles per item. This should give more consistent results across different CPU families and different clock speeds. Specifically, I think about cycles per byte, see http://en.wikipedia.org/wiki/Cycles_per_byte Example: http://www.cryptopp.com/benchmarks.html
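The conversion Piotr describes is simple arithmetic once the clock rate is known; a Python sketch with made-up numbers (clock_hz would have to come from the user or the OS, since a portable framework can't know it):

```python
def cycles_per_byte(seconds, nbytes, clock_hz):
    """Convert a wall-clock measurement into CPU cycles per byte
    processed: (seconds * cycles/second) / bytes."""
    return seconds * clock_hz / nbytes

# Hypothetical figures: processing 1 MiB in 350 microseconds on a
# 3 GHz core comes out to roughly 1 cycle per byte.
cpb = cycles_per_byte(350e-6, 1 << 20, 3e9)
```

One practical wrinkle: dynamic frequency scaling makes the clock_hz figure itself slippery, which is part of why a general framework may prefer plain time.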
Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 5:46 AM, Piotr Szturmaj wrote:
Andrei Alexandrescu wrote:
I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's some
noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.
For algorithms that process sequences, it would be nice to have results represented in cycles per item. This should give more consistent results across different CPU families and different clock speeds. Specifically, I think about cycles per byte, see http://en.wikipedia.org/wiki/Cycles_per_byte Example: http://www.cryptopp.com/benchmarks.html
The framework aims at generality and simplicity. I'd rather have it simple and universal rather than subtle. Andrei
Apr 08 2012
Very good, but the minimum isn't the best guess. Personally I (and there will
be a lot of such maniacs, I suppose) will think that this (minimum) time
can be significantly smaller than the average time.

So a parameter (probably with a default value) should be added.
Something like enum of flags telling what we want to know. At least
these looks usable: minTime, <some mean time>, maxTime,
standardDeviation, graph (yes, good old ASCII art).

Yes, graph is needed.

--
Денис В. Шеломовский
Denis V. Shelomovskij
Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
Very good but minimum isn't a best guess. Personally I (and there will
be a lot of such maniacs I suppose) will think that this (minimum) time
can be significantly smaller than average time.
I've analyzed this quite a bit at work and the average and median are not very informative. You need the mode, but in most benchmarks the mode is very close to the minimum, so using the minimum is even better.

In speed measurements, all noise is additive (there's no noise that may make a benchmark appear to run faster). There are also quite a few outliers. Recording the average will include a fair amount of noise.

Clearly there is noise during normal use as well, but incorporating it in benchmarks as a matter of course reduces the usefulness of benchmarks as a means to improve performance of the benchmarked code.
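Andrei's additive-noise argument is easy to check with a toy simulation (Python, entirely made-up numbers): give every sample a nonnegative jitter plus rare large outliers, and the minimum stays near the true time while the average drifts upward.

```python
import random

random.seed(42)
TRUE_TIME = 100.0  # hypothetical "actual" run time, in microseconds

samples = []
for _ in range(10_000):
    t = TRUE_TIME + random.expovariate(1 / 2.0)  # additive jitter, mean 2
    if random.random() < 0.01:                   # 1% jerky outliers
        t += random.uniform(500, 5000)
    samples.append(t)

minimum = min(samples)
average = sum(samples) / len(samples)
# The minimum lands within a tiny fraction of TRUE_TIME; the average
# is dragged tens of microseconds upward by jitter and outliers.
assert minimum - TRUE_TIME < 1.0
assert average - TRUE_TIME > 10.0
```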
So a parameter (probably with a default value) should be added.
Something like enum of flags telling what we want to know. At least
these looks usable: minTime, <some mean time>, maxTime,
standardDeviation, graph (yes, good old ASCII art).
Standard deviation is also not very useful because it includes all outliers (some of which are very far away from the mode).
Yes, graph is needed.
I am not sure about that. We may provide the raw measurement data for programs that want to plot things, but plotting is beyond the charter of std.benchmark. Andrei
Apr 08 2012
Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

Clearly there is noise during normal use as well, but
incorporating it in benchmarks as a matter of course reduces the
usefulness of benchmarks
On the contrary:

1) The "noise during normal use" has to be measured in order to detect the sensibility of the benchmarked program to that noise.

2) The noise the benchmarked program produces has to be measured too, because the running benchmarked program probably increases the noise for all other running programs.

In addition: the noise produced by a machine under heavy load might bring the performance of the benchmarked program down to zero.

-manfred
Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 3:03 PM, Manfred Nowak wrote:
Andrei Alexandrescu wrote:

Clearly there is noise during normal use as well, but
incorporating it in benchmarks as a matter of course reduces the
usefulness of benchmarks
On the contrary: 1) The "noise during normal use" has to be measured in order to detect the sensibility of the benchmarked program to that noise.
That sounds quite tenuous to me. How do you measure it, and what conclusions do you draw other than there's more or less other stuff going on on the machine, and the machine itself has complex interactions? Far as I can tell a time measurement result is:

T = A + Q + N

where:

A > 0 is actual benchmark time
Q > 0 quantization noise (uniform distribution)
N > 0 various other noises (interrupts, task switching, networking, CPU dynamically changing frequency, etc).

Many people jump on Gaussian as an approximation, but my tests suggest it's hardly so because it has a lot of jerky outliers. How do we estimate A given T?
2) The noise the benchmarked program produces has to be measured too,
because the running benchmarked program probably increases the noise
for all other running programs.
How to measure that? Also, that noise does not need to be measured as much as eliminated to the extent possible. This is because the benchmark app noise is a poor model of the application-induced noise.
In addition: the noise produced by a machine under heavy load might
bring the performance of the benchmarked program down to zero.
Of course. That's why the documentation emphasizes the necessity of baselines. A measurement without baselines is irrelevant. Andrei
Apr 08 2012
Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

all noise is additive (there's no noise that may make a benchmark
appear to run faster)
This is in doubt, because you yourself wrote "the machine itself has complex interactions". These complex interactions might lower the time needed for an operation of the benchmarked program. Examples that come to mind:

a) needed data is already in a (faster) cache because it belongs to a memory block from which some data is needed by some program not belonging to the benchmarked set---and that block isn't replaced yet.

b) needed data is stored on a hdd whose I/O scheduler uses the elevator algorithm and serves the request by pure chance instantly, because the position of the needed data is between two positions accessed by some programs not belonging to the benchmarked set.

Especially a hdd, if used, will be responsible for a lot of noise you define as "quantization noise (uniform distribution)" even if the head stays at the same cylinder. Not recognizing this noise would only mean that the data is cached, and interpreting the only true read from the hdd as a jerky outlier seems quite wrong.
1) The "noise during normal use" has to be measured in order to
detect the sensibility of the benchmarked program to that noise.
How do you measure it, and what conclusions do you draw other than there's a more or less other stuff going on on the machine, and the machine itself has complex interactions? Far as I can tell a time measurement result is: T = A + Q + N
For example by running more than one instance of the benchmarked program in parallel and use the thereby gathered statistical routines to spread T into the additive components A, Q and N.
2) The noise the benchmarked program produces has to be measured
too, because the running benchmarked program probably increases
the noise for all other running programs.
How to measure that?
Similar to the above note.
Also, that noise does not need to be measured
as much as eliminated to the extent possible.
I wouldn't define two programs to be equivalent based on the time until completion only. That time might be identical for both programs, but if only one of the programs increases the response time of the machine to unacceptable levels, I would choose the other.

-manfred
Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 9:25 AM, Manfred Nowak wrote:
Andrei Alexandrescu wrote:
all noise is additive (there's no noise that may make a benchmark
appear to run faster)
This is in doubt, because you yourself wrote "the machine itself has complex interactions". This complex interactions might lower the time needed for an operation of the benchmarked program. Examples that come to mind: a) needed data is already in a (faster) cache because it belongs to a memory block, from which some data is needed by some program not belonging to the benchmarked set---and that block isnt replaced yet.
Which is great, unless the program wants to measure the cache memory itself, in which case it would use special assembler instructions or large memset()s. (We do such at Facebook.)
b) needed data is stored in a hdd whose I/O scheduler uses the elevator
algorithm and serves the request by pure chance instantly, because the
position of the needed data is between two positions accessed by some
programs not belonging to the benchmarked set.

Especially a hdd, if used, will be responsible for a lot of noise you
define as "quantization noise (uniform distribution)" even if the head
stays at the same cylinder. Not recognizing this noise would only mean
that the data is cached and interpreting the only true read from the
hdd as a jerky outlier sems quite wrong.
If the goal is to measure the seek time of the HDD, the benchmark itself should make sure the HDD cache is cleared. (What I recall they do on Linux is unmounting and remounting the drive.) Otherwise, it adds a useless component to the timing.
1) The "noise during normal use" has to be measured in order to
detect the sensibility of the benchmarked program to that noise.
How do you measure it, and what conclusions do you draw other than there's a more or less other stuff going on on the machine, and the machine itself has complex interactions? Far as I can tell a time measurement result is: T = A + Q + N
For example by running more than one instance of the benchmarked program in parallel and use the thereby gathered statistical routines to spread T into the additive components A, Q and N.
I disagree with running two benchmarks in parallel because that exposes them to even more noise (scheduling, CPU count, current machine load etc). I don't understand the part of the sentence starting with "...use the thereby...", I'd be grateful if you elaborated. Andrei
Apr 09 2012
"Francois Chabot" <francois.chabot.dev gmail.com> writes:
Which is great, unless the program wants to measure the cache
memory itself, in which case it would use special assembler
instructions or large memset()s. (We do such at Facebook.)
I disagree. If a regression suddenly causes a function to become heavily cache-bound, it should show up in benchmarks somehow, regardless of the previous expected behavior of the function. -- Francois
Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 11:29 AM, Francois Chabot wrote:
Which is great, unless the program wants to measure the cache memory
itself, in which case it would use special assembler instructions or
large memset()s. (We do such at Facebook.)
I disagree. If a regression suddenly causes a function to become heavily cache-bound, it should show up in benchmarks somehow, regardless of the previous expected behavior of the function.
But cache binding depends on a variety of cache characteristics, i.e. the machine. The question is whether we accept a heavy dependence of the benchmark on the machine. Andrei
Apr 09 2012
Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

For example by running more than one instance of the benchmarked
program in paralell and use the thereby gathered statistical
routines to spread T into the additiv components A, Q and N.
I disagree with running two benchmarks in parallel because that
exposes them to even more noise (scheduling, CPU count, current
I did not mean to run two or more benchmarks in parallel but only more instances of the benchmarked program _without_ the environment supplied by the benchmark. Of course there may be more noise. But if so, that noise is dependent on the additional running instances and on nothing else.

Now let T(n) be the time needed to run n instances in parallel on a single processor. According to your definition and your remark above:

T(n) = n*A + Q + N + P(n)

where P(n)>=0 and P(1)==0 is the additional noise with which the benchmarked program disturbs itself.

Please observe that Q and N eliminate themselves on subtraction:

T(2) - T(1) = A + P(2) - P(1)
T(3) - T(2) = A + P(3) - P(2)
...

P(n+1)-P(n) measures the sensitivity of the benchmarked program to its own noise. As you wrote yourself, its value should be close to zero until the machine is close to falling flat.
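The cancellation Manfred relies on is worth making concrete; a Python toy model with entirely hypothetical component values:

```python
A, Q, N = 100.0, 3.0, 7.0      # made up: actual time, quantization, other noise
P = {1: 0.0, 2: 4.0, 3: 9.0}   # made up self-interference, with P(1) == 0

def T(n):
    """Total time for n parallel instances under Manfred's model."""
    return n * A + Q + N + P[n]

# Q and N drop out of consecutive differences, leaving A plus the
# self-interference increment P(n+1) - P(n):
assert T(2) - T(1) == A + (P[2] - P[1])  # 104.0
assert T(3) - T(2) == A + (P[3] - P[2])  # 105.0
```

Andrei's follow-up objection is that on real hardware Q and N are not constants across runs (per-processor noise, cache competition), so they do not subtract out this cleanly.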
I don't understand the part of the sentence starting with "...use
the thereby...", I'd be grateful if you elaborated.
Ohh..., an unrecognized deletion. I have written: | use the thereby gathered _data to feed_ statistical routines to | spread T as elaborated above. -manfred
Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(Resuming a few past discussions about std.benchmark.)

On 4/9/12 1:26 PM, Manfred Nowak wrote:
Andrei Alexandrescu wrote:
I disagree with running two benchmarks in parallel because that
exposes them to even more noise (scheduling, CPU count, current
I did not mean to run two or more benchmarks in parallel but only more instances of the benchmarked program _without_ the environment supplied by the benchmark. Of course there may be more noise. But if so, that noise is dependent on the additional running instances and on nothing else.
I doubt that's the case. Once we discuss running several instances in parallel, I predict the vagaries of scheduling become much more prevalent.
Now let T(n) be the time needed to run n instances in parallel on a

T(n) = n*A + Q + N + P(n)

where P(n)>=0 and P(1)==0 is the additional noise with which the
benchmarked program disturbs itself.

Please observe that Q and N eliminate themselves on subtraction:
T(2) - T(1) = A + P(2) - P(1)
T(3) - T(2) = A + P(3) - P(2)
...

P(n+1)-P(n) measures the sensitivity of the benchmarked program to
its one noise. As you wrote yourself its value should be close to
zero until the machine is close to falling flat.
It would be great if that applied, but I'm afraid things get significantly more complex. For example, the noise will have a per-processor component, and the processes will compete for cache among themselves. Andrei
Apr 14 2012
On 08.04.2012 21:31, Andrei Alexandrescu wrote:
On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
Very good but minimum isn't a best guess. Personally I (and there will
be a lot of such maniacs I suppose) will think that this (minimum) time
can be significantly smaller than average time.
I've analyzed this quite a bit at work and the average and median are not very informative. You need the mode, but in most benchmarks the mode is very close to the minimum, so using the minimum is even better. In speed measurements, all noise is additive (there's no noise that may make a benchmark appear to run faster). There are also quite a few outliers. Recording the average will include a fair amount of noise.
Yes, of course. I mean that an algorithm itself should follow some restrictions for such measurement. It should run with the same speed every time. Many algorithms follow this convention, but not all.

Why will recording the average produce so much noise? As I see, floating point arithmetic is now used without a strong reason so it looks like a time of this part isn't valuable. Or is it just a temporary solution? Anyway it should be configurable using a CT parameter, so it will not abuse one who doesn't need it.
Clearly there is noise during normal use as well, but incorporating it
in benchmarks as a matter of course reduces the usefulness of benchmarks
as a mean to improve performance of the benchmarked code.
A graph is needed exactly because of that. Without a graph it really gives very little.
So a parameter (probably with a default value) should be added.
Something like enum of flags telling what we want to know. At least
these looks usable: minTime, <some mean time>, maxTime,
standardDeviation, graph (yes, good old ASCII art).
Standard deviation is also not very useful because it includes all outliers (some of which are very far away from the mode).
So a graph is needed.
Yes, graph is needed.
I am not sure about that. We may provide the raw measurement data for programs that want to plot things, but plotting is beyond the charter of std.benchmark.
Sorry, I meant a histogram, not a curve. A histogram can be shown in a console very well. And it is needed because it's the easiest way to show benchmarked program behaviour (and noise behaviour). It also requires only about 80 integers to store information and shouldn't produce much noise. IMHO, a histogram gives lots of information and will be a good addition.

--
Денис В. Шеломовский
Denis V. Shelomovskij
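The 80-integer console histogram Denis describes is a small function; a Python sketch (names and layout are invented):

```python
def ascii_histogram(samples, bins=10, width=40):
    """Bucket timing samples into `bins` counters and render one
    '#' bar per bucket, scaled so the tallest bar is `width` wide."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1
    counts = [0] * bins
    for s in samples:
        i = min(int((s - lo) / span * bins), bins - 1)
        counts[i] += 1
    peak = max(counts)
    rows = []
    for i, c in enumerate(counts):
        bar = "#" * (c * width // peak)
        rows.append(f"{lo + i * span / bins:8.1f} | {bar} {c}")
    return "\n".join(rows)

print(ascii_histogram([100, 101, 101, 102, 150, 400], bins=6))
```

The mode/minimum cluster shows up as the tall bar in the first bucket, with outliers visible far to the right, which is exactly the signal-versus-noise picture debated above.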
Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
Why will recording the average produce so much noise?
As I explained, the average takes noise and outliers (some very large, e.g. milliseconds in a benchmark that takes microseconds) into account. The minimum is shielded from this issue. In the limit, the minimum for infinitely many measurements is the sought-after result.
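The minimum-of-N idea is easy to sketch; the helper below times against a monotonic clock (bestOf is a hypothetical name; in the proposal, the StopWatch machinery would play this role):

```d
import core.time : Duration, MonoTime;

// Time fun() n times and keep the smallest observation. Since timing
// noise is purely additive, the minimum converges on the true cost
// as n grows, while the average keeps the outliers in.
Duration bestOf(void delegate() fun, size_t n = 100)
{
    Duration best;
    foreach (i; 0 .. n)
    {
        immutable t0 = MonoTime.currTime;
        fun();
        immutable t = MonoTime.currTime - t0;
        if (i == 0 || t < best)
            best = t;
    }
    return best;
}
```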
As I see, floating
point arithmetic is now used without a strong reason so it looks like a
time of this part isn't valuable. Or is it just a temporary solution?
I don't understand "time of this part".
Anyway it should be configurable using a CT parameter, so it will not
abuse one who doesn't need it.
We'd like the framework to do the right thing, and the average does not seem to be it.
IMHO, a histogram gives lots of information and will be a good addition.
I disagree. Andrei
Apr 09 2012
09.04.2012 17:26, Andrei Alexandrescu wrote:
On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
Why will recording the average produce so much noise?
As I explained, the average takes noise and outliers (some very large, e.g. milliseconds in a benchmark that takes microseconds) into account. The minimum is shielded from this issue. In the limit, the minimum for infinitely many measurements is the sought-after result.
As I see, floating
point arithmetic is now used without a strong reason so it looks like a
time of this part isn't valuable. Or is it just a temporary solution?
I don't understand "time of this part".
Looks like a misunderstanding because of my bad English: I read "Recording the average will include a fair amount of noise" as saying that the process of recording the average produces a fair amount of noise itself, not that the result includes noise.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Apr 10 2012
"Martin Nowak" <dawg dawgfoto.de> writes:
I've analyzed this quite a bit at work and the average and median are
not very informative. You need the mode, but in most benchmarks the mode
is very close to the minimum, so using the minimum is even better.
How is it better?
In speed measurements, all noise is additive (there's no noise that may
make a benchmark appear to run faster). There are also quite a few
outliers. Recording the average will include a fair amount of noise.
The benchmarked function itself could have a relevant variance, e.g. when using randomized algorithms. You already get that from the LRU table in the array append cache.
Apr 10 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 20:59, Denis Shelomovskij wrote:
Very good but minimum isn't a best guess. Personally I (and there will
be a lot of such maniacs I suppose) will think that this (minimum) time
can be significantly smaller than average time.
Prime example is networking.
So a parameter (probably with a default value) should be added.
Something like an enum of flags telling what we want to know. At least these look usable: minTime, <some mean time>, maxTime, standardDeviation,
+1 graph (yes, good old ASCII art).
Yes, graph is needed.
No, not ASCII art. It's too coarse-grained, and the only good thing it can do is look nice in a terminal (or give an overall picture, but it's limited to, say, ~25 blocks in a histogram). What would be nice is an option to output bare-bones info so that one can easily adapt it and feed the result to e.g. gnuplot. I think CSV or its tab-separated variant is good enough in this case.

Another cool addition IMHO would be parametric benchmarks, so there is a function and a set of parameters (one parameter is fine too) to benchmark on. It makes that much more sense with graphs as algorithm profile plotted for various inputs (sizes) can be very revealing.

-- 
Dmitry Olshansky
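The bare-bones export Dmitry describes is only a few lines; a sketch (toCsv is a hypothetical helper, not part of the proposal):

```d
import std.array : appender;
import std.format : formattedWrite;

// One "name,ns_per_iter" row per benchmark -- trivially adaptable to
// tab separation, and ready to feed to gnuplot or a spreadsheet.
string toCsv(long[string] nsPerIter)
{
    auto app = appender!string();
    app.put("benchmark,ns_per_iter\n");
    foreach (name, ns; nsPerIter)
        app.formattedWrite("%s,%s\n", name, ns);
    return app.data;
}
```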
Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 12:35 PM, Dmitry Olshansky wrote:
Another cool addition IMHO would be parametric benchmarks, so there is a
function and a set of parameters (one parameter is fine too) to
benchmark on. It makes that much more sense with graphs as algorithm
profile plotted for various inputs (sizes) can be very revealing.
This would be interesting to explore, could you give some more detail please? Andrei
Apr 08 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 21:40, Andrei Alexandrescu wrote:
On 4/8/12 12:35 PM, Dmitry Olshansky wrote:
Another cool addition IMHO would be parametric benchmarks, so there is a
function and a set of parameters (one parameter is fine too) to
benchmark on. It makes that much more sense with graphs as algorithm
profile plotted for various inputs (sizes) can be very revealing.
This would be interesting to explore, could you give some more detail please?
I just thought to throw it in. But if I were to do it in under an hour:

    benchmark_func_blah(int problem_size);
    auto benchmark_params_func_blah = [ 1, 10, 100, 1_000, 10_000 ... ];

or, slightly more general, use the range protocol:

    auto benchmark_params_func_blah = getRangeOfValues();

I'm not sure the prefix is the place to put it, but here's the rough idea.

-- 
Dmitry Olshansky
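Fleshing that rough idea out, here is a sketch of a parametric driver (all names hypothetical; timing via a monotonic clock stands in for whatever std.benchmark would use):

```d
import core.time : MonoTime;

struct Point { int size; long nsecs; }

// Run fun(n) once per problem size and record (size, time) pairs --
// exactly the data one would plot as an algorithm profile.
Point[] benchmarkParametric(alias fun)(const(int)[] sizes)
{
    Point[] results;
    foreach (n; sizes)
    {
        immutable t0 = MonoTime.currTime;
        fun(n);
        results ~= Point(n, (MonoTime.currTime - t0).total!"nsecs");
    }
    return results;
}
```

The resulting (size, nsecs) pairs are exactly what one would feed to gnuplot for various input sizes.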
Apr 08 2012
Caligo <iteronvexor gmail.com> writes:
On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very simple
API design, starting where I like it: one line of code.

Andrei
I probably missed this somewhere, but what happens to std.datetime : benchmark? Is it going to be deprecated?
Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 1:03 PM, Caligo wrote:
On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org>  wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very simple
API design, starting where I like it: one line of code.

Andrei
I probably missed this somewhere, but what happens to std.datetime : benchmark? Is it going to be deprecated?
Yes, I forgot to mention that. Part of the proposal is that std.datetime.StopWatch and std.datetime.benchmark are to be deprecated.

Andrei
Apr 08 2012
"Steven Schveighoffer" <schveiguy yahoo.com> writes:

-Steve
Apr 09 2012
"Francois Chabot" <francois.chabot.dev gmail.com> writes:
Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this
library, and yet they are relegated to second-class citizen: "Oh,
I guess you can use this". Normally, I wouldn't be so picky, but
this is a standard library. Focus should be on functionality.

Providing formatted output is a nice bonus, but to me, it's just
a bonus. Any benchmarking part of a large project is bound to
format the output itself (to log benchmark results against
revisions in a database or something like that).

Also, benchmark() and runBenchmarks() are kind of confusing at
first glance. Something along the lines of benchmarkModule() and
benchmarkAllModules() would be more sensible.
Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 10:23 AM, Francois Chabot wrote:
Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this library,
and yet they are relegated to second-class citizen: "Oh, I guess you can
use this". Normally, I wouldn't be so picky, but this is a standard
library. Focus should be on functionality.
The functionality is to make benchmarking easy to use, meaningful, and easy to interpret. I don't want to add a complicated library for postprocessing benchmarks because almost nobody will use it. The first function in the documentation is the one most people will want to use. The functions that provide the data are eminently available, so I disagree with the "second-class citizen" characterization. You want to use them, use them. They don't need to be given rockstar billing.

Andrei
Apr 09 2012
Somedude <lovelydear mailmetrash.com> writes:
On 09/04/2012 17:23, Francois Chabot wrote:
Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this library,
and yet they are relegated to second-class citizen: "Oh, I guess you can
use this". Normally, I wouldn't be so picky, but this is a standard
library. Focus should be on functionality.

Providing formatted output is a nice bonus, but to me, it's just a
bonus. Any benchmarking part of a large project is bound to format the
output itself (to log benchmark results against revisions in a database
or something like that).

Also, benchmark() and runBenchmarks() are kind of confusing at first
glance. Something along the lines of benchmarkModule() and
benchmarkAllModules() would be more sensible.
The printBenchmarks facility is cool and should be included imho. It helps make benchmarking as standard as unit testing. We don't want to write the same boilerplate code again and again for such trivial uses.
Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 11:44 AM, Somedude wrote:
It helps make benchmarking as standard as unit testing. We don't want to write the same boilerplate code again and again for such trivial uses.
Yes, I had unittest in mind when writing the library. If one needs more than one statement to get an informative benchmark off the ground, we failed. Andrei
Apr 09 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's
some noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the priority of
this proposal. Clearly D's user base is highly interested in
efficiency, and many of the upcoming libraries have efficiency a
virtual part of their design. So we should get std.benchmark in soon
and require that new additions come with benchmarks for their
essential functionality.

My vision is that in the future Phobos will have a significant
benchmarks battery, which will help improving Phobos and porting to
new platforms.
I come to think the same.

How come that the times based relative report and the percentage based relative report are mixed in one result? And how do I choose which one I'd like to see in the output?

When benchmarking you can measure different things at the same time. In this regard the current proposal is limited. It just measures wall clock time. I believe extending the StopWatch to measure e.g. user CPU time is a useful addition. In general, allowing user-defined measurements would be great, e.g. to measure the time spent in user mode:

    () { tms t; times(&t); return t.tms_utime; }

Note that this code does not need to be portable. You can also use version() else static assert.

Things that come to mind that I'd like to measure.

Time measurements:
* User CPU time
* System CPU time
* Time spent in memory allocations

Count measurements:
* Memory usage
* L1/L2/L3 cache misses
* Number of executed instructions
* Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking. But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?

All in all, the user interface has been greatly simplified.

Jens
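Jens's lambda wraps up into a self-contained helper (Posix-only, as he notes; userTicks is a hypothetical name):

```d
version (Posix)
{
    import core.sys.posix.sys.times : times, tms;

    // User-mode CPU ticks consumed so far by the calling process;
    // sample before and after a benchmark and subtract.
    long userTicks()
    {
        tms t;
        times(&t);
        return t.tms_utime;
    }
}
```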
Apr 10 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/10/12 5:40 AM, Jens Mueller wrote:
How come that the times based relative report and the percentage based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output.
It's in the names. If the name of a benchmark starts with benchmark_relative_, then that benchmark is considered relative to the last non-relative benchmark. Using a naming convention allows complete automation in benchmarking a module. I figure it's fine that all results appear together because the absence of data in the relative column clarifies which is which.
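The pairing rule of the naming convention can be shown with a tiny classifier (a hypothetical helper, only to illustrate the convention described above):

```d
import std.algorithm : startsWith;

// Given benchmark function names in declaration order, find the
// baseline a benchmark_relative_* entry is compared against:
// the closest preceding non-relative benchmark.
string baselineOf(const(string)[] names, size_t i)
{
    assert(names[i].startsWith("benchmark_relative_"));
    foreach_reverse (j; 0 .. i)
        if (!names[j].startsWith("benchmark_relative_"))
            return names[j];
    assert(0, "relative benchmark with no preceding baseline");
}
```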
When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited. It just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is
Generally I fear piling too much on StopWatch because every feature adds its own noise. But there's value in collecting the result of times(). What would be the Windows equivalent?
In general, allowing user defined measurements would be great.
E.g. to measure the time spent in user mode.
() {
tms t;
times(&t);
return t.tms_utime;
}

Note that this code does not need to be portable. You can also use
version() else static assert.

Things that come to mind that I'd like to measure.
Time measurements:
* User CPU time
* System CPU time
* Time spent in memory allocations
Count measurements:
* Memory usage
* L1/L2/L3 cache misses
* Number of executed instructions
* Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?
Absolutely. I just fear about expanding the charter of the framework too much. Let's see:

* Memory usage is, I think, difficult in Windows.
* Estimating cache misses and executed instructions is significant research.
* Number of memory allocations requires instrumenting druntime.

Andrei
Apr 14 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
On 4/10/12 5:40 AM, Jens Mueller wrote:
How come that the times based relative report and the percentage based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output.
It's in the names. If the name of a benchmark starts with benchmark_relative_, then that benchmark is considered relative to the last non-relative benchmark. Using a naming convention allows complete automation in benchmarking a module. I figure it's fine that all results appear together because the absence of data in the relative column clarifies which is which.
I know. That wasn't my question. How do I choose between percentage vs. factors for a relative benchmark, e.g. 200% vs. 2?
When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited. It just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is
Generally I fear piling too much on StopWatch because every feature adds its own noise. But there's value in collecting the result of times(). What would be the Windows equivalent?
I'm not a Windows user but GetProcessTimes seems to be the Windows equivalent. http://msdn.microsoft.com/en-us/library/windows/desktop/ms683223%28v=vs.85%29.aspx
In general, allowing user defined measurements would be great.
E.g. to measure the time spent in user mode.
() {
tms t;
times(&t);
return t.tms_utime;
}

Note that this code does not need to be portable. You can also use
version() else static assert.

Things that come to mind that I'd like to measure.
Time measurements:
* User CPU time
* System CPU time
* Time spent in memory allocations
Count measurements:
* Memory usage
* L1/L2/L3 cache misses
* Number of executed instructions
* Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?
Absolutely. I just fear about expanding the charter of the framework too much. Let's see:

* Memory usage is, I think, difficult in Windows.
* Estimating cache misses and executed instructions is significant research.
* Number of memory allocations requires instrumenting druntime.
I think that std.benchmark shouldn't do this. But I think we should figure out whether adding a user-defined measurement function is possible, making sure that we have a design flexible enough to capture different measurements.

Jens
Apr 15 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
There have been quite a few good comments, but no review manager offer.
Could someone please take this role?

Again, it would be great to get std.benchmark in sooner rather than
later because it can be used by subsequent submissions (many of which
allege efficiency as a major advantage) to show they improve over the
state of the art.

Thanks,

Andrei
Apr 14 2012
Jacob Carlborg <doob me.com> writes:
On 2012-04-15 04:58, Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager offer.
Could someone please take this role?

Again, it would be great to get std.benchmark in sooner rather than
later because it can be used by subsequent submissions (many of which
allege efficiency as a major advantage) to show they improve over the
state of the art.

Thanks,

Andrei
As far as I know, there are already several other projects/modules in the pipeline before std.benchmark, if we are to review them in the order they were submitted.

-- 
/Jacob Carlborg
Apr 15 2012
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Sunday, April 15, 2012 13:41:28 Jacob Carlborg wrote:
On 2012-04-15 04:58, Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager offer.
Could someone please take this role?

Again, it would be great to get std.benchmark in sooner rather than
later because it can be used by subsequent submissions (many of which
allege efficiency as a major advantage) to show they improve over the
state of the art.

Thanks,

Andrei
As far as I know, there are already several other projects/modules in the pipeline before std.benchmark, if we are to review them in the order they were submitted.
There are, but none of them are being reviewed at the moment or being pushed for a review. And there _is_ an argument for std.benchmark being high priority based on how it could affect future reviews. We should probably start reviewing _something_ soon though. We haven't had very good momentum on that of late. - Jonathan M Davis
Apr 16 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?
I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers. Jens
Apr 15 2012
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review
manager
offer. Could someone please take this role?
I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.
The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager -Lars
Apr 16 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?
I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.
The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager
The page is shorter than expected. Last time I checked I found something way longer. Since Phobos' review process is based on Boost's, where does it deviate?

Jens
Apr 16 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17.04.2012 1:00, Jens Mueller wrote:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?
I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.
The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager
The page is shorter than expected. Last time I checked I found something way longer. Since Phobos' review process is based on Boost's, where does it deviate?
It's peer review followed by voting, and that's all there is to it. I don't think you should seek any formal standards, documents, guidelines, etc. The manager role is rather simple:

1. The manager picks a module from the review queue and posts an announcement that formal review of it starts today. The post should contain relevant information about the module and links to source/documentation. Importantly, it also sets the exact time the review ends and the exact time voting ends (usually 2 weeks for review, 1 week for voting).

2. When the review ends, the manager either opens a vote thread or, if it's obvious that the module needs further work (on explicit request from the author), may choose to prolong or postpone the review, putting it back into the queue.

3. When the vote period ends, count up the votes and declare the result.

And that's it. There are no hard rules on which module has priority in the queue. It is loosely calculated on the basis of how long it has been ready for review and how important the functionality is.

-- 
Dmitry Olshansky
Apr 17 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Dmitry Olshansky wrote:
On 17.04.2012 1:00, Jens Mueller wrote:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?
I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.
The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager
The page is shorter than expected. Last time I checked I found something way longer. Since Phobos' review process is based on Boost's, where does it deviate?
It's peer review followed by voting, and that's all there is to it. I don't think you should seek any formal standards, documents, guidelines, etc. The manager role is rather simple:

1. The manager picks a module from the review queue and posts an announcement that formal review of it starts today. The post should contain relevant information about the module and links to source/documentation. Importantly, it also sets the exact time the review ends and the exact time voting ends (usually 2 weeks for review, 1 week for voting).

2. When the review ends, the manager either opens a vote thread or, if it's obvious that the module needs further work (on explicit request from the author), may choose to prolong or postpone the review, putting it back into the queue.

3. When the vote period ends, count up the votes and declare the result.

And that's it. There are no hard rules on which module has priority in the queue. It is loosely calculated on the basis of how long it has been ready for review and how important the functionality is.
P.S. If you are not sure are up to it just say the word and I'll
volunteer instead. We need to push this forward as the review queue
is going to overflow real soon ;)
Andrei is already preparing. Review will start soon. Jens
Apr 17 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.
I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei
Apr 17 2012
"SomeDude" <lovelydear mailmetrash.com> writes:
On Tuesday, 17 April 2012 at 21:15:31 UTC, Andrei Alexandrescu
wrote:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.
I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei
I agree, there is no good reason for the queue to be stopped like it is now, but since no one seems to step up for the other modules, let's not waste time and review your module. Hopefully, the other modules will be reviewed ASAP. We gain nothing by waiting.
Apr 17 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, April 17, 2012 16:15:43 Andrei Alexandrescu wrote:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.
I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.
I have no problem with it. None of the others is being pushed for review or promoted at present. I'd just as soon have the std.benchmark review get under way rather than sit around waiting for one of those earlier in the queue to get started. - Jonathan M Davis
Apr 17 2012
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.
I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.
[...] Let's just check it in since it's ready. No point in waiting. T -- Some days you win; most days you lose.
Apr 17 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, April 17, 2012 18:55:07 H. S. Teoh wrote:
On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.
I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.
[...] Let's just check it in since it's ready. No point in waiting.
Check it in? You mean start the review on it? I hope that you don't mean "check it in" as in commit it. It needs to be reviewed and voted on first. - Jonathan M Davis
Apr 17 2012
Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 17 Apr 2012 16:15:43 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.
I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei
In a supermarket, there are three people in front of you in the queue. Then they all decide they forgot something and disappear in the back. You exchange some looks with the cashier and decide that you'll be long finished before any of them come back. Go ahead. -- Marco
Apr 18 2012
"SomeDude" <lovelydear mailmetrash.com> writes:
On Wednesday, 18 April 2012 at 09:44:12 UTC, Marco Leise wrote:
Am Tue, 17 Apr 2012 16:15:43 -0500
In a supermarket, there are three people in front of you in the
queue. Then they all decide they forgot something and disappear
in the back. You exchange some looks with the cashier and
decide that you'll be long finished before any of them come
Well, we are not in a supermarket. It was correct to ask.
Apr 18 2012
"Famous" <none none.net> writes:
Instead of the originally proposed layout

================================================================
Benchmark                               relative ns/iter  iter/s
================================================================
---[ module_one ]-----------------------------------------------
file write                                        140.2K    7.1K
array creation                             1.2Kx  116.0     8.6M
================================================================

I would like to see something more similar to the following:

================================================================
Benchmark                                 Time     Performance
                                          -------  --------------
name                                      s/iter   iter/s  factor
-----------------------[ module_one ]---  -------  ------  ------
file write                                140.20µ   7.13k
array creation                            116.00n   8.62M   1.21k
================================================================

Transition to the new layout can be achieved as follows:

* Consider layout guidelines given in

http://mirrors.ctan.org/info/german/tabsatz/tabsatz.pdf

* Reorder columns considering their semantics and introduce groups.

* Rename 'relative' to 'factor' and avoid usage of 'x' for
'times'.

* Divide column 'ns/iter' by a billion (see Vladimir's
post).

* Right align module name (abbreviate if it is too long).

* Avoid '%' but rely on the prefixes specified in the SI:

http://en.wikipedia.org/w/index.php?title=International_System_of_Units&oldid=487057838#Units_and_prefixes

Then, using a fixed width of 7 characters we have 3
significant digits, at least:

123450000 -> 123.45M
 12345000 ->  12.35M
  1234500 ->   1.23M
   123450 -> 123.45k
    12345 ->  12.35k
   1234.5 ->   1.23k
   123.45 -> 123.45
   12.345 ->  12.35
   1.2345 ->   1.23
  0.12345 -> 123.45m
 0.012345 ->  12.35m
0.0012345 ->   1.23m
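A small formatter can produce exactly this mapping; a sketch in D (siFormat is a hypothetical name):

```d
import std.format : format;
import std.math : abs;

// Scale x into [1, 1000) and attach the matching SI prefix, printing
// two decimals -- at least 3 significant digits in at most 7
// characters, as described above.
string siFormat(double x)
{
    static immutable prefixes = ["n", "µ", "m", "", "k", "M", "G"];
    size_t e = 3; // index of the empty prefix
    while (e + 1 < prefixes.length && abs(x) >= 1000) { x /= 1000; ++e; }
    while (e > 0 && x != 0 && abs(x) < 1) { x *= 1000; --e; }
    return format("%.2f%s", x, prefixes[e]);
}
```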

Cheers,
Famous
Apr 15 2012
"Famous" <none none.net> writes:
Maybe the layout is not destroyed, this time:

================================================================
Benchmark                                 Time     Performance
                                          -------  --------------
name                                      s/iter   iter/s  factor
-----------------------[ module_one ]---  -------  ------  ------
file write                                140.20µ   7.13k
array creation                            116.00n   8.62M   1.21k
================================================================
Apr 15 2012
"Nordlöw" <per.nordlow gmail.com> writes:
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu
wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a
very simple API design, starting where I like it: one line of
code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529.
(There's some noise in there caused by my git n00biness).
Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the
priority of this proposal. Clearly D's user base is highly
interested in efficiency, and many of the upcoming libraries
have efficiency a virtual part of their design. So we should
get std.benchmark in soon and require that new additions come
with benchmarks for their essential functionality.

My vision is that in the future Phobos will have a significant
benchmarks battery, which will help improving Phobos and
porting to new platforms.

Andrei
Where has std.benchmark gone?

I've written a benchmarking framework in C++ that I would like to port to D. If you've already implemented some cool stuff it's better to reuse/merge it with mine. My C++ code has a bunch of external deps (not included in this post). If you're interested I'll send you the rest.

It's inspired by Haskell's powerful benchmarking framework(s) because it can automatically figure out how to randomize various kinds of types, including recursive instantiations of builtins, STL pairs, tuples and even containers, using a randomization framework built on an STL-container type trait (rand.hpp, which is not included). I'm guessing implementing such a framework in D is an order of magnitude easier than in C++.

An innovative thing about timed.hpp is constant-time benchmarking, meaning that the loop will generate larger and larger instances of the structure until a specified amount of time has passed. I believe this is a more user-friendly way of benchmarking than most other frameworks offer. And with D's supercool struct/class tupleof, even more automation of instance generation is possible.

Destroy!

Code:

/*! \file timed.hpp
 * \brief Benchmark Functions on Auto-Randomized Arguments.
 * \author Per Nordlöw (per.nordlow gmail.com)
 * \license Boost
 *
 * \see http://en.cppreference.com/w/cpp/thread/async
 * \see http://stackoverflow.com/questions/10059239/gccs-behaviour-with-stdasyncstdlaunchasync-vs-clangs-behaviour
 * \see http://stackoverflow.com/questions/8895350/quickcheck-like-template-function-benching-in-c
 * \see http://functionalcpp.wordpress.com/2013/08/03/generalized-function-evaluation/
 * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
 * \see https://limcore.wordpress.com/2008/04/13/8/
 * \see http://www.randombit.net/bitbashing/programming/parallel_function_invocation_using_std_async.html
 * \todo Use const boost::posix_time::time_duration & dur = boost::posix_time::seconds(1)
 */

#pragma once

#include <utility>
#include <vector>
#include <future>
#include <chrono>
#include <thread>
#include <type_traits>
#include "rand.hpp"
#include "type_traits_x.hpp"

namespace pnw {

struct timed {
    typedef std::chrono::high_resolution_clock hrc;
    typedef std::chrono::milliseconds millis;
    typedef std::chrono::microseconds micros;
public:
    timed(size_t max_length = 1024*256,
          millis total_duration_max = millis(1000),
          millis call_duration_max = millis(0),
          uint n_tries = 1)
        : m_max_length(max_length),
          m_tot_dur(total_duration_max),
          m_call_dur(call_duration_max),
          m_n_tries(n_tries) {}

    /*! Benchmark Function/Lambda/Functor \p f with non-void return value.
     * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
     */
    template<class Func>
    auto call(Func&& f)
        -> typename std::enable_if<! std::is_void<decltype(f())>::value, // if non-void return
                                   std::pair<decltype(f()), typename hrc::duration> >::type {
        decltype(f()) result;
        auto tA = hrc::now();
        for (uint i = 0; i < m_n_tries; i++) { result = f(); } // call it \c m_n_tries times
        auto span = hrc::now() - tA;
        return std::make_pair(result, span);
    }

    /*! Benchmark Function/Lambda/Functor \p f with \c void return value.
     * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
     */
    template<class Func>
    auto call(Func&& f)
        -> typename std::enable_if<std::is_void<decltype(f())>::value, // if void return
                                   typename hrc::duration>::type {
        auto tA = hrc::now();
        for (uint i = 0; i < m_n_tries; i++) { f(); } // call it \c m_n_tries times
        return hrc::now() - tA; // reverse order?
    }

#if 0
    /*! Call Function \p f after \p us microseconds.
     * \see http://stackoverflow.com/questions/10804683/delayed-function-call
     */
    template<class Func, class... Args>
    inline std::future<typename std::result_of<Func(Args...)>::type>
    delayed_async(Func&& f, Args&&... args, micros us) {
        return std::async(std::launch::async,
                          [f, us, args...] () {
                              std::this_thread::sleep_for(us);
                              return f(args...); // call
                          });
    }
#endif

    /*! Benchmark the \em Single Argument Function \p f using Randomized Instances of type \p Cont.
     * \tparam C Container Type.
     * \tparam B benchS (Times) Container.
     * \param[in] f Function.
     * \return vector of pairs of \c C-instance size and execute-time.
     * \todo Estimate Time Complexity Model: O(c), O(n), ..., O(n^p), O(log(n)), O(exp(n))
     *       and predict cost of next iteration to halt within time-limit.
     * \see http://stackoverflow.com/questions/7245235/syntax-of-c-templates-with-function-type-parameters
     * \see http://en.cppreference.com/w/cpp/chrono/steady_clock
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >

    typename std::enable_if<is_container<C>::value, B>::type
    call_s(void (*f)(C&)) {
        const auto deadline = hrc::now() + m_tot_dur;
        B bm;                    // benchmarks
        C c;                     // test data
        for (size_t n = 1; n <= m_max_length; n *= 2) {
            // efficient way of adding more random values to a vector
            auto c2 = rand_n<C>(n);
            c.insert(begin(c), begin(c2), end(c2)); // \see http://stackoverflow.com/a/2551785/683710
            auto tA = hrc::now(); // log start
            for (uint i = 0; i < m_n_tries; i++) { f(c); } // call it \c m_n_tries times
            auto span = hrc::now() - tA;
            if (m_call_dur.count() != 0 and
                span >= m_call_dur) { break; } // if function call took too long
            auto count = std::chrono::duration_cast<std::chrono::microseconds>(span).count();
            bm.emplace_back(n, static_cast<double>(count)/1e6); // store
            if (hrc::now() + span >= deadline) { // if no time for yet another measure taking at least \c span
                break;
            }
        }
        return bm;
    }

    /*! Benchmark the \em Single Argument Function \p f using Randomized Instances of type \p Cont.
     * \see Previous overload of \c call_s
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >
    typename std::enable_if<is_container<C>::value, B>::type
    call_s(void (*f)(const C&)) {
        return call_s(reinterpret_cast<void (*)(C&)>(f));
    }

    /*! Benchmark Function \p f using Randomized Instances of type \p Cont.
     * \tparam C Container.
     * \tparam B benchS (Times) Container.
     * \param[in] f Function.
     * \return vector of pairs of \c C-instance size and execute-time.
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >
    typename std::enable_if<is_container<C>::value, B>::type
    call(void (*f)(typename C::iterator, typename C::iterator))
    {
        const auto deadline = hrc::now() + m_tot_dur;
        B bm;                    // benchmarks
        C c;                     // test data
        for (size_t n = 1; n <= m_max_length; n *= 2) {
            // efficient way of adding more random values to a vector
            auto c2 = rand_n<C>(n);
            c.insert(begin(c), begin(c2), end(c2)); // \see http://stackoverflow.com/a/2551785/683710
            auto tA = hrc::now(); // log start
            for (uint i = 0; i < m_n_tries; i++) { f(c.begin(), c.end()); } // call it \c m_n_tries times
            auto span = hrc::now() - tA;
            if (m_call_dur.count() != 0 and
                span >= m_call_dur) { break; } // if function call took too long
            auto count = std::chrono::duration_cast<std::chrono::microseconds>(span).count();
            bm.emplace_back(n, static_cast<double>(count)/1e6); // store
            if (hrc::now() + span >= deadline) { // if there's no time for more measures
                break;
            }
        }
        return bm;
    }

public:
    size_t m_max_length = 1024*256;  ///< Maximum container instance length of \c T.
    millis m_tot_dur = millis(1000); ///< Maximum time to wait for whole benchmarking to complete.
    millis m_call_dur = millis(0);   ///< Maximum time to wait for a call to \p f to complete.
    uint m_n_tries = 1;              ///< Number of tries per size of randomized instance.
    std::launch m_launch_policy = std::launch::async |
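By the way, the "constant-time benchmarking" loop described above (double the input size until a given time budget is exhausted) can be sketched without any of timed.hpp's external deps. This is only an illustration: grow_and_time and its parameter names are made up for this sketch and are not part of the posted header, and it uses a plain std::vector<int> with a fixed-seed RNG instead of rand.hpp.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Grow a random input by doubling its size, time `f` on it at each size, and
// stop when there is no room left in the total time budget for another run.
// Returns (size, seconds) pairs, like timed's B = vector<pair<size_t, double>>.
template <class Func>
std::vector<std::pair<std::size_t, double>>
grow_and_time(Func f, std::size_t max_length,
              std::chrono::milliseconds total_budget)
{
    using clock = std::chrono::high_resolution_clock;
    std::mt19937 rng(42);                       // fixed seed: repeatable inputs
    std::uniform_int_distribution<int> dist(0, 1 << 20);
    const auto deadline = clock::now() + total_budget;
    std::vector<std::pair<std::size_t, double>> bm; // benchmarks
    std::vector<int> data;                          // test data, reused and grown
    for (std::size_t n = 1; n <= max_length; n *= 2) {
        while (data.size() < n) data.push_back(dist(rng)); // grow input to size n
        auto t0 = clock::now();
        f(data);                                           // the call being measured
        auto span = clock::now() - t0;
        bm.emplace_back(n, std::chrono::duration<double>(span).count());
        if (clock::now() + span >= deadline) break; // no time for another measure
    }
    return bm;
}
```

Usage would look like `grow_and_time([](std::vector<int>& v) { std::sort(v.begin(), v.end()); }, 1 << 16, std::chrono::milliseconds(200))`, giving one timing per power-of-two input size up to the budget.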