## digitalmars.D - std.benchmark ready for review. Manager sought after

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,

I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's some
noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the priority of this
proposal. Clearly D's user base is highly interested in efficiency, and
many of the upcoming libraries have efficiency as a vital part of their
design. So we should get std.benchmark in soon and require that new
additions come with benchmarks for their essential functionality.

My vision is that in the future Phobos will have a significant battery
of benchmarks, which will help improve Phobos and ease porting to new
platforms.

Andrei

Apr 07 2012
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a
very simple API design, starting where I like it: one line of
code.

1) Is it possible to do something about the "kilonanoseconds",
and print the two-letter SI time abbreviation instead
(μs/ns/ms/s/...)?

2) The meaning of "epoch" in this context is unfamiliar to me:

Measurement is done in epochs. For each function benchmarked,
the smallest time is taken over all epochs.

3) "benchmark_relative_file read" should be replaced with a
language construct. E.g. a function call like
relativeBenchmark("file read"), or an enum value like getopt's.

4) There is a TODO in benchmarkSuspend's example.
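
For what it's worth, the "min over epochs" scheme under discussion can be sketched in a few lines (Python used purely for illustration; the function and parameter names here are hypothetical, not std.benchmark's API):

```python
import time

def min_over_epochs(fn, epochs=10, iters_per_epoch=1000):
    """One plausible reading of 'epoch': a batch of iterations timed
    as a unit; the reported figure is the per-call minimum over all
    batches, which filters out additive noise."""
    best = float("inf")
    for _ in range(epochs):
        start = time.perf_counter()
        for _ in range(iters_per_epoch):
            fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / iters_per_epoch)
    return best  # best-case seconds per call
```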

Apr 07 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/7/12 11:45 PM, Vladimir Panteleev wrote:
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

1) Is it possible to do something about the "kilonanoseconds", and print
the two-letter SI time abbreviation instead (μs/ns/ms/s/...)?

Cool.
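
A unit-picking helper of the kind being requested might look like this (an illustrative sketch in Python, not the actual std.benchmark code):

```python
def format_duration(seconds):
    """Render a duration with the largest SI unit that keeps the
    numeric part >= 1, instead of e.g. 'kilonanoseconds'."""
    for unit, scale in [("s", 1.0), ("ms", 1e-3), ("μs", 1e-6), ("ns", 1e-9)]:
        if seconds >= scale:
            return f"{seconds / scale:.2f} {unit}"
    return f"{seconds / 1e-9:.2f} ns"  # sub-nanosecond: fall back to ns

# format_duration(0.0032) -> "3.20 ms"; format_duration(4.5e-8) -> "45.00 ns"
```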

2) The meaning of "epoch" in this context is unfamiliar to me:

Measurement is done in epochs. For each function benchmarked, the
smallest time is taken over all epochs.

To me too :o).

3) "benchmark_relative_file read" should be replaced with a language
construct. E.g. a function call like relativeBenchmark("file read"), or
an enum value like getopt's.

No can do. Need a function name-based convention so we can automate
scheduleForBenchmarking.

4) There is a TODO in benchmarkSuspend's example.

Will destroy.

Andrei

Apr 07 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 2:02 AM, Vladimir Panteleev wrote:
The "benchmark_relative_" prefix makes sense for function names (for
scheduleForBenchmarking), but not so much for string literals for
benchmark names. The string literal "benchmark_relative_file read" looks
like the words "benchmark relative file" are grouped together. My
suggestion would be to wrap the "benchmark_relative_" prefix - when used
with benchmark name strings - into a semantic function / enum / etc. In
my example above,
relativeBenchmark would be:

string relativeBenchmark(string s) { return "benchmark_relative_" ~ s; }

I suppose it can be summed up as a tradeoff between complexity (you need
to explain both the function name usage and the relativeBenchmark
wrapper usage) vs. code prettiness.

I understand, thanks.

Andrei

Apr 08 2012
On Sunday, 8 April 2012 at 05:41:11 UTC, Andrei Alexandrescu wrote:
3) "benchmark_relative_file read" should be replaced with a
language construct. E.g. a function call like
relativeBenchmark("file read"), or an enum value like getopt's.

No can do. Need a function name-based convention so we can
automate scheduleForBenchmarking.

Hmm, maybe there's a misunderstanding, but what I meant was:

The "benchmark_relative_" prefix makes sense for function names
(for scheduleForBenchmarking), but not so much for string
literals for benchmark names. The string literal
"benchmark_relative_file read" looks like the words "benchmark
relative file" are grouped together. My suggestion would be to
wrap the "benchmark_relative_" prefix - when used with benchmark
name strings - into a semantic function / enum / etc. In my
example above, relativeBenchmark would be:

string relativeBenchmark(string s) { return "benchmark_relative_"
~ s; }

I suppose it can be summed up as a tradeoff between complexity
(you need to explain both the function name usage and the
relativeBenchmark wrapper usage) vs. code prettiness.

Apr 08 2012
Somedude <lovelydear mailmetrash.com> writes:
On 08/04/2012 05:25, Andrei Alexandrescu wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's some
noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the priority of this
proposal. Clearly D's user base is highly interested in efficiency, and
many of the upcoming libraries have efficiency as a vital part of their
design. So we should get std.benchmark in soon and require that new
additions come with benchmarks for their essential functionality.

My vision is that in the future Phobos will have a significant battery
of benchmarks, which will help improve Phobos and ease porting to new
platforms.

Andrei

Like it. Would it be a good idea to add a column with the average memory
used?

Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 3:16 AM, Somedude wrote:
Like it. Would it be a good idea to add a column with the average memory
used?

Interesting idea. I saw
http://stackoverflow.com/questions/1674652/c-c-memory-usage-pi-in-linux-windows
and it looks like it's not an easy problem. Should we make this part of
the initial release?

Andrei

Apr 08 2012
Somedude <lovelydear mailmetrash.com> writes:
On 08/04/2012 18:21, Marco Leise wrote:
On Sun, 08 Apr 2012 09:35:14 -0500,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

On 4/8/12 3:16 AM, Somedude wrote:
Like it. Would it be a good idea to add a column with the average memory
used?

Interesting idea. I saw
http://stackoverflow.com/questions/1674652/c-c-memory-usage-pi-in-linux-windows
and it looks like it's not an easy problem. Should we make this part of
the initial release?

Andrei

Oh please, I use these functions already and with the similarities amongst
operating systems, they should be somewhere in Phobos! My thought was
more along the lines of std.procmgmt/std.process though, since they are
process-management functions useful for server monitoring as well, not
just benchmarking.

I thought about monitoring as well.
But then again, following your remark, monitoring may in fact be a
completely different matter; this would need a bit more thought, I
guess. Adding monitoring capabilities to processes would be interesting,
but we are already quite far from benchmarking now - unless
std.benchmark makes use of these monitoring capabilities.

Apr 08 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 12:16, Somedude wrote:
[snip]
Like it. Would it be a good idea to add a column with the average memory
used?

In general it's next to impossible and/or entirely OS-specific.
What can be done, I think, is adding a query function to the GC
interface that returns the amount of RAM currently allocated on its heap.

So adding GC heap usage can be done, albeit it's not really a stable
metric. This way one gets a nice hint on why something is slow ;)
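
The GC-heap idea has analogues in other managed runtimes; as a sketch of the shape of such a query, here is the equivalent measurement in Python using the standard tracemalloc module (illustration only; the D GC interface discussed here would expose its own call):

```python
import tracemalloc

def heap_during(fn):
    """Run fn and report the peak number of bytes the runtime
    allocated while it ran - a hint, not a stable metric."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```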

--
Dmitry Olshansky

Apr 08 2012
Somedude <lovelydear mailmetrash.com> writes:
On 08/04/2012 17:48, Dmitry Olshansky wrote:
On 08.04.2012 12:16, Somedude wrote:
[snip]
Like it. Would it be a good idea to add a column with the average memory
used?

In general it's next to impossible and/or entirely OS-specific.
What can be done, I think, is adding a query function to the GC
interface that returns the amount of RAM currently allocated on its heap.

So adding GC heap usage can be done, albeit it's not really a stable
metric. This way one gets a nice hint on why something is slow ;)

That's what I thought, thank you.

Apr 08 2012
Andrei Alexandrescu wrote:
I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's some
noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

For algorithms that process sequences, it would be nice to have results
represented in cycles per item. This would give more consistent results
across different CPU families and different clock speeds.

Specifically, I think about cycles per byte, see
http://en.wikipedia.org/wiki/Cycles_per_byte
Example: http://www.cryptopp.com/benchmarks.html
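
The conversion itself is one line; a sketch (Python for illustration; the clock rate has to be supplied, and dynamic frequency scaling makes the figure approximate):

```python
def cycles_per_byte(elapsed_seconds, nbytes, clock_hz):
    """Wall time -> CPU cycles spent per byte processed."""
    return elapsed_seconds * clock_hz / nbytes

# Hashing 1 MiB in 0.5 ms on a nominal 3 GHz core:
# cycles_per_byte(0.0005, 1 << 20, 3e9) -> about 1.43 cycles/byte
```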

Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 5:46 AM, Piotr Szturmaj wrote:
Andrei Alexandrescu wrote:
I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's some
noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

For algorithms that process sequences, it would be nice to have results
represented in cycles per item. This would give more consistent results
across different CPU families and different clock speeds.

Specifically, I think about cycles per byte, see
http://en.wikipedia.org/wiki/Cycles_per_byte
Example: http://www.cryptopp.com/benchmarks.html

The framework aims at generality and simplicity. I'd rather have it
simple and universal than subtle.

Andrei

Apr 08 2012
Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 08 Apr 2012 09:35:14 -0500,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

On 4/8/12 3:16 AM, Somedude wrote:
Like it. Would it be a good idea to add a column with the average memory
used?

Interesting idea. I saw
http://stackoverflow.com/questions/1674652/c-c-memory-usage-pi-in-linux-windows
and it looks like it's not an easy problem. Should we make this part of
the initial release?

Andrei

Oh please, I use these functions already and with the similarities amongst
operating systems, they should be somewhere in Phobos! My thought was
more along the lines of std.procmgmt/std.process though, since they are
process-management functions useful for server monitoring as well, not
just benchmarking.

--
Marco

Apr 08 2012
Very good, but the minimum isn't the best guess. Personally I (and there
will be a lot of such maniacs, I suppose) will think that this (minimum)
time can be significantly smaller than the average time.

So a parameter (probably with a default value) should be added.
Something like an enum of flags telling what we want to know. At least
these look usable: minTime, <some mean time>, maxTime,
standardDeviation, graph (yes, good old ASCII art).

Yes, graph is needed.

--
Денис В. Шеломовский
Denis V. Shelomovskij

Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
Very good, but the minimum isn't the best guess. Personally I (and there
will be a lot of such maniacs, I suppose) will think that this (minimum)
time can be significantly smaller than the average time.

I've analyzed this quite a bit at work and the average and median are
not very informative. You need the mode, but in most benchmarks the mode
is very close to the minimum, so using the minimum is even better.

In speed measurements, all noise is additive (there's no noise that may
make a benchmark appear to run faster). There are also quite a few
outliers. Recording the average will include a fair amount of noise.

Clearly there is noise during normal use as well, but incorporating it
in benchmarks as a matter of course reduces the usefulness of benchmarks
as a means to improve performance of the benchmarked code.
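
The additive-noise argument is easy to check with a toy simulation (Python, hypothetical numbers): every sample is the true time plus nonnegative noise, with rare large outliers, and the minimum tracks the true time far better than the mean does.

```python
import random

def simulate(true_time=100.0, n=10_000, seed=1):
    """Sample T = A + Q + N with A fixed, Q small uniform noise,
    and N occasionally contributing a large outlier."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        noise = rng.uniform(0, 5)          # quantization-like noise
        if rng.random() < 0.01:            # rare jerky outlier
            noise += rng.uniform(100, 1000)
        samples.append(true_time + noise)
    return min(samples), sum(samples) / n

lo, avg = simulate()
# lo sits just above the true 100.0; avg is biased upward by the outliers
```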

So a parameter (probably with a default value) should be added.
Something like an enum of flags telling what we want to know. At least
these look usable: minTime, <some mean time>, maxTime,
standardDeviation, graph (yes, good old ASCII art).

Standard deviation is also not very useful because it includes all
outliers (some of which are very far away from the mode).

Yes, graph is needed.

I am not sure about that. We may provide the raw measurement data for
programs that want to plot things, but plotting is beyond the charter of
std.benchmark.

Andrei

Apr 08 2012
Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

Clearly there is noise during normal use as well, but
incorporating it in benchmarks as a matter of course reduces the
usefulness of benchmarks

On the contrary:
1) The "noise during normal use" has to be measured in order to detect
the sensitivity of the benchmarked program to that noise.
2) The noise the benchmarked program produces has to be measured too,
because the running benchmarked program probably increases the noise
for all other running programs.

In addition: the noise produced by a machine under heavy load might
bring the performance of the benchmarked program down to zero.

-manfred

Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 3:03 PM, Manfred Nowak wrote:
Andrei Alexandrescu wrote:

Clearly there is noise during normal use as well, but
incorporating it in benchmarks as a matter of course reduces the
usefulness of benchmarks

On the contrary:
1) The "noise during normal use" has to be measured in order to detect
the sensitivity of the benchmarked program to that noise.

That sounds quite tenuous to me. How do you measure it, and what
conclusions do you draw other than that there's more or less other stuff
going on on the machine, and that the machine itself has complex
interactions?

Far as I can tell a time measurement result is:

T = A + Q + N

where:

A > 0 is the actual benchmark time

Q > 0 is quantization noise (uniform distribution)

N > 0 is various other noise (interrupts, task switching, networking,
the CPU dynamically changing frequency, etc.). Many people jump on
Gaussian as an approximation, but my tests suggest it's hardly so,
because it has a lot of jerky outliers.

How do we estimate A given T?

2) The noise the benchmarked program produces has to be measured too,
because the running benchmarked program probably increases the noise
for all other running programs.

How to measure that? Also, that noise does not need to be measured as
much as eliminated to the extent possible. This is because the benchmark
app noise is a poor model of the application-induced noise.

In addition: the noise produced by a machine under heavy load might
bring the performance of the benchmarked program down to zero.

Of course. That's why the documentation emphasizes the necessity of
baselines. A measurement without baselines is irrelevant.

Andrei

Apr 08 2012
Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

all noise is additive (there's no noise that may make a benchmark
appear to run faster)

This is in doubt, because you yourself wrote "the machine itself has
complex interactions". These complex interactions might lower the time
needed for an operation of the benchmarked program.

Examples that come to mind:
a) needed data is already in a (faster) cache because it belongs to a
memory block from which some data is needed by some program not
belonging to the benchmarked set---and that block isn't replaced yet.
b) needed data is stored in a hdd whose I/O scheduler uses the elevator
algorithm and serves the request by pure chance instantly, because the
position of the needed data is between two positions accessed by some
programs not belonging to the benchmarked set.

Especially an hdd, if used, will be responsible for a lot of the noise
you define as "quantization noise (uniform distribution)", even if the
head stays at the same cylinder. Not recognizing this noise would only
mean that the data is cached, and interpreting the only true read from
the hdd as a jerky outlier seems quite wrong.

1) The "noise during normal use" has to be measured in order to
detect the sensitivity of the benchmarked program to that noise.

conclusions do you draw other than that there's more or less other
stuff going on on the machine, and the machine itself has complex
interactions?

Far as I can tell a time measurement result is:

T = A + Q + N

For example by running more than one instance of the benchmarked
program in paralell and use the thereby gathered statistical routines

2) The noise the benchmarked program produces has to be measured
too, because the running benchmarked program probably increases
the noise for all other running programs.

How to measure that?

Similar to the above note.

Also, that noise does not need to be measured
as much as eliminated to the extent possible.

I wouldn't define two programs to be equivalent based on the time until
completion only. That time might be identical for both programs, but if
only one of the programs increases the answering time of the machine to
unacceptable levels, I would choose the other.

-manfred

Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 9:25 AM, Manfred Nowak wrote:
Andrei Alexandrescu wrote:
all noise is additive (there's no noise that may make a benchmark
appear to run faster)

This is in doubt, because you yourself wrote "the machine itself has
complex interactions". These complex interactions might lower the time
needed for an operation of the benchmarked program.

Examples that come to mind:
a) needed data is already in a (faster) cache because it belongs to a
memory block from which some data is needed by some program not
belonging to the benchmarked set---and that block isn't replaced yet.

Which is great, unless the program wants to measure the cache memory
itself, in which case it would use special assembler instructions or
large memset()s. (We do such things at Facebook.)

b) needed data is stored in a hdd whose I/O scheduler uses the elevator
algorithm and serves the request by pure chance instantly, because the
position of the needed data is between two positions accessed by some
programs not belonging to the benchmarked set.

Especially an hdd, if used, will be responsible for a lot of the noise
you define as "quantization noise (uniform distribution)", even if the
head stays at the same cylinder. Not recognizing this noise would only
mean that the data is cached, and interpreting the only true read from
the hdd as a jerky outlier seems quite wrong.

If the goal is to measure the seek time of the HDD, the benchmark itself
should make sure the HDD cache is cleared. (What I recall they do on
Linux is unmounting and remounting the drive.) Otherwise, it adds a
useless component to the timing.

1) The "noise during normal use" has to be measured in order to
detect the sensitivity of the benchmarked program to that noise.

conclusions do you draw other than that there's more or less other
stuff going on on the machine, and the machine itself has complex
interactions?

Far as I can tell a time measurement result is:

T = A + Q + N

For example by running more than one instance of the benchmarked
program in paralell and use the thereby gathered statistical routines

I disagree with running two benchmarks in parallel because that exposes
them to even more noise (scheduling, CPU count, current machine load
etc). I don't understand the part of the sentence starting with "...use
the thereby...", I'd be grateful if you elaborated.

Andrei

Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 11:29 AM, Francois Chabot wrote:
Which is great, unless the program wants to measure the cache memory
itself, in which case it would use special assembler instructions or
large memset()s. (We do such things at Facebook.)

I disagree. If a regression suddenly causes a function to become
heavily cache-bound, it should show up in benchmarks somehow,
regardless of the previous expected behavior of the function.

But cache binding depends on a variety of cache characteristics, i.e.
the machine. The question is whether we accept a heavy dependence of the
benchmark on the machine.

Andrei

Apr 09 2012
Manfred Nowak <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

For example by running more than one instance of the benchmarked
program in parallel and use the thereby gathered statistical
routines to spread T into the additive components A, Q and N.

I disagree with running two benchmarks in parallel because that
exposes them to even more noise (scheduling, CPU count, current

I did not mean to run two or more benchmarks in parallel, but only
more instances of the benchmarked program _without_ the environment
supplied by the benchmark. Of course there may be more noise. But if
so, that noise is dependent on the additional running instances and
on nothing else.

Now let T(n) be the time needed to run n instances in parallel on a

T(n) = n*A + Q + N + P(n)

where P(n)>=0 and P(1)==0 is the additional noise with which the
benchmarked program disturbs itself.

Please observe that Q and N cancel out on subtraction:
T(2) - T(1) = A + P(2) - P(1)
T(3) - T(2) = A + P(3) - P(2)
...

P(n+1)-P(n) measures the sensitivity of the benchmarked program to
its own noise. As you wrote yourself, its value should be close to
zero until the machine is close to falling flat.
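
With made-up numbers, the cancellation described above works out like this (a numeric sketch; A, Q, N and P(n) are the symbols from the thread, values hypothetical):

```python
# A: per-instance work, QN: shared noise Q + N, P(n): self-disturbance
A, QN = 10.0, 3.0
P = {1: 0.0, 2: 0.5, 3: 1.2}
T = {n: n * A + QN + P[n] for n in P}   # T(n) = n*A + Q + N + P(n)

diff21 = T[2] - T[1]  # A + P(2) - P(1): Q and N drop out
diff32 = T[3] - T[2]  # A + P(3) - P(2)
```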

I don't understand the part of the sentence starting with "...use
the thereby...", I'd be grateful if you elaborated.

Ohh..., an unrecognized deletion. I had written:
| use the thereby gathered _data to feed_ statistical routines to
as elaborated above.

-manfred

Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(Resuming a few past discussions about std.benchmark.)

On 4/9/12 1:26 PM, Manfred Nowak wrote:
Andrei Alexandrescu wrote:
I disagree with running two benchmarks in parallel because that
exposes them to even more noise (scheduling, CPU count, current

I did not mean to run two or more benchmarks in parallel, but only
more instances of the benchmarked program _without_ the environment
supplied by the benchmark. Of course there may be more noise. But if
so, that noise is dependent on the additional running instances and
on nothing else.

I doubt that's the case. Once we discuss running several instances in
parallel, I predict the vagaries of scheduling become much more prevalent.

Now let T(n) be the time needed to run n instances in parallel on a

T(n) = n*A + Q + N + P(n)

where P(n)>=0 and P(1)==0 is the additional noise with which the
benchmarked program disturbs itself.

Please observe that Q and N cancel out on subtraction:
T(2) - T(1) = A + P(2) - P(1)
T(3) - T(2) = A + P(3) - P(2)
...

P(n+1)-P(n) measures the sensitivity of the benchmarked program to
its own noise. As you wrote yourself, its value should be close to
zero until the machine is close to falling flat.

It would be great if that applied, but I'm afraid things get
significantly more complex. For example, the noise will have a
per-processor component, and the processes will compete for cache among
themselves.

Andrei

Apr 14 2012
On 08.04.2012 21:31, Andrei Alexandrescu wrote:
On 4/8/12 11:59 AM, Denis Shelomovskij wrote:
Very good, but the minimum isn't the best guess. Personally I (and there
will be a lot of such maniacs, I suppose) will think that this (minimum)
time can be significantly smaller than the average time.

I've analyzed this quite a bit at work and the average and median are
not very informative. You need the mode, but in most benchmarks the mode
is very close to the minimum, so using the minimum is even better.

In speed measurements, all noise is additive (there's no noise that may
make a benchmark appear to run faster). There are also quite a few
outliers. Recording the average will include a fair amount of noise.

Yes, of course. I mean that an algorithm itself should follow some
restrictions for such measurement: it should run at the same speed
every time. Many algorithms follow this convention, but not all.

Why will recording the average produce so much noise? As I see it,
floating-point arithmetic is now used without a strong reason, so it
looks like a time of this part isn't valuable. Or is it just a temporary
solution?

Anyway, it should be configurable using a CT parameter, so it will not
burden those who don't need it.

Clearly there is noise during normal use as well, but incorporating it
in benchmarks as a matter of course reduces the usefulness of benchmarks
as a means to improve performance of the benchmarked code.

A graph is needed exactly because of that. Without a graph it really
gives very little.

So a parameter (probably with a default value) should be added.
Something like an enum of flags telling what we want to know. At least
these look usable: minTime, <some mean time>, maxTime,
standardDeviation, graph (yes, good old ASCII art).

Standard deviation is also not very useful because it includes all
outliers (some of which are very far away from the mode).

So a graph is needed.

Yes, graph is needed.

I am not sure about that. We may provide the raw measurement data for
programs that want to plot things, but plotting is beyond the charter of
std.benchmark.

Sorry, I meant a histogram, not a curve. A histogram can be shown in a
console very well. And it is needed because it's the easiest way to show
the benchmarked program's behaviour (and the noise's behaviour). It also
requires only about 80 integers to store the information and shouldn't
produce much noise.

IMHO, a histogram gives lots of information and will be a good addition.
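
A console histogram of roughly the kind described, kept in ~80 integers of state, is indeed cheap to build; a rough sketch (Python for illustration, names hypothetical):

```python
def ascii_histogram(samples, bins=10, width=60):
    """Bucket timing samples into `bins` counters and render one
    '#' bar per bucket - only `bins` integers of state are kept."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0
    counts = [0] * bins
    for s in samples:
        i = min(int((s - lo) / span * bins), bins - 1)
        counts[i] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        # nonzero buckets always get at least one '#'
        bar = "#" * max(c * width // peak, 1 if c else 0)
        lines.append(f"{lo + i * span / bins:10.3f} | {bar} {c}")
    return "\n".join(lines)
```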

--
Денис В. Шеломовский
Denis V. Shelomovskij

Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
Why will recording the average produce so much noise?

As I explained, the average takes noise and outliers (some very large,
e.g. milliseconds in a benchmark that takes microseconds) into account.
The minimum is shielded from this issue. In the limit, the minimum for
infinitely many measurements is the sought-after result.

As I see it, floating-point
arithmetic is now used without a strong reason, so it looks like a
time of this part isn't valuable. Or is it just a temporary solution?

I don't understand "time of this part".

Anyway it should be configurable using a CT parameter, so it will not
abuse one who doesn't need it.

We'd like the framework to do the right thing, and the average does not
seem to be it.

IMHO, a histogram gives lots of information and will be a good addition.

I disagree.

Andrei

Apr 09 2012
On 09.04.2012 17:26, Andrei Alexandrescu wrote:
On 4/9/12 2:06 AM, Denis Shelomovskij wrote:
Why will recording the average produce so much noise?

As I explained, the average takes noise and outliers (some very large,
e.g. milliseconds in a benchmark that takes microseconds) into account.
The minimum is shielded from this issue. In the limit, the minimum for
infinitely many measurements is the sought-after result.

As I see it, floating-point
arithmetic is now used without a strong reason, so it looks like a
time of this part isn't valuable. Or is it just a temporary solution?

I don't understand "time of this part".

Looks like a misunderstanding because of my bad English: "Recording the
average will include a fair amount of noise" means to me that the
process of "recording the average" produces "a fair amount of noise"
itself, not that "the result includes noise".

--
Денис В. Шеломовский
Denis V. Shelomovskij

Apr 10 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 20:59, Denis Shelomovskij wrote:
Very good, but the minimum isn't the best guess. Personally I (and there
will be a lot of such maniacs, I suppose) will think that this (minimum)
time can be significantly smaller than the average time.

A prime example is networking.

So a parameter (probably with a default value) should be added.
Something like an enum of flags telling what we want to know. At least
these look usable: minTime, <some mean time>, maxTime,
standardDeviation,

+1

graph (yes, good old ASCII art).
Yes, graph is needed.

No, not ASCII art. It's too coarse-grained, and the only good thing it
can do is look nice in a terminal (or give an overall picture, but it's
limited to, say, ~25 blocks in a histogram).

What would be nice is an option to output bare-bones info so that one
can easily adapt it and feed the result to e.g. gnuplot.
I think CSV, or its variant with tabs, is good enough in this case.

Another cool addition IMHO would be parametric benchmarks: there is a
function and a set of parameters (one parameter is fine too) to
benchmark on. It makes that much more sense with graphs, as an
algorithm's profile plotted for various inputs (sizes) can be very
revealing.
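
A parametric driver that emits the bare-bones CSV mentioned above could be as small as this (Python sketch; the names and column format are made up, not a proposed std.benchmark interface):

```python
import csv
import io
import time

def parametric_benchmark(fn, sizes, reps=5):
    """Time fn(n) for each problem size (min over reps) and emit
    size/seconds pairs as CSV, ready to feed to gnuplot."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["size", "seconds"])
    for n in sizes:
        best = float("inf")
        for _ in range(reps):
            t0 = time.perf_counter()
            fn(n)
            best = min(best, time.perf_counter() - t0)
        writer.writerow([n, f"{best:.9f}"])
    return out.getvalue()

# e.g. parametric_benchmark(lambda n: sorted(range(n, 0, -1)), [10, 100, 1000])
```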

--
Dmitry Olshansky

Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 12:35 PM, Dmitry Olshansky wrote:
Another cool addition IMHO would be parametric benchmarks: there is a
function and a set of parameters (one parameter is fine too) to
benchmark on. It makes that much more sense with graphs, as an
algorithm's profile plotted for various inputs (sizes) can be very
revealing.

This would be interesting to explore; could you give some more detail?

Andrei

Apr 08 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.04.2012 21:40, Andrei Alexandrescu wrote:
On 4/8/12 12:35 PM, Dmitry Olshansky wrote:
Another cool addition IMHO would be parametric benchmarks: there is a
function and a set of parameters (one parameter is fine too) to
benchmark on. It makes that much more sense with graphs, as an
algorithm's profile plotted for various inputs (sizes) can be very
revealing.

This would be interesting to explore; could you give some more detail?

I thought to just throw it in. But if we were to do it in under an hour:

void benchmark_func_blah(int problem_size);
auto benchmark_params_func_blah = [ 1, 10, 100, 1_000, 10_000 ... ];

or, slightly more general, use the range protocol:
auto benchmark_params_func_blah = getRangeOfValues();

I'm not sure the prefix is the place to put it, but here's the rough idea.

--
Dmitry Olshansky

Apr 08 2012
"Francois Chabot" <francois.chabot.dev gmail.com> writes:
Which is great, unless the program wants to measure the cache
memory itself, in which case it would use special assembler
instructions or large memset()s. (We do such things at Facebook.)

I disagree. If a regression suddenly causes a function to become
heavily cache-bound, it should show up in benchmarks somehow,
regardless of the previous expected behavior of the function.

--
Francois

Apr 09 2012
"Martin Nowak" <dawg dawgfoto.de> writes:
 I've analyzed this quite a bit at work and the average and median are
not very informative. You need the mode, but in most benchmarks the mode
is very close to the minimum, so using the minimum is even better.

In speed measurements, all noise is additive (there's no noise that may
make a benchmark appear to run faster). There are also quite a few
outliers. Recording the average will include a fair amount of noise.

when using randomized algorithms. You already get that from the LRU table
in the array append cache.

Apr 10 2012
Caligo <iteronvexor gmail.com> writes:
On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very simple
API design, starting where I like it: one line of code.

Andrei

I probably missed this somewhere, but what happens to std.datetime :
benchmark? Is it going to be deprecated?

Apr 08 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 1:03 PM, Caligo wrote:
On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org>  wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very simple
API design, starting where I like it: one line of code.

Andrei

I probably missed this somewhere, but what happens to std.datetime :
benchmark ?  Is it going to be deprecated ?

Yes, I forgot to mention that. Part of the proposal is that
std.datetime.StopWatch and std.datetime.benchmark be deprecated.

Andrei

Apr 08 2012
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
Added to trello.

-Steve

Apr 09 2012
"Francois Chabot" <francois.chabot.dev gmail.com> writes:
Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this
library, and yet they are relegated to second-class citizens: "Oh,
I guess you can use this". Normally, I wouldn't be so picky, but
this is a standard library. Focus should be on functionality.

Providing formatted output is a nice bonus, but to me, it's just
a bonus. Any benchmarking part of a large project is bound to
format the output itself (to log benchmark results against
revisions in a database or something like that).

Also, benchmark() and runBenchmarks() are kind of confusing at
first glance. Something along the lines of benchmarkModule() and
benchmarkAllModules() would be more sensible.

Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 10:23 AM, Francois Chabot wrote:
Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this library,
and yet they are relegated to second-class citizen: "Oh, I guess you can
use this". Normally, I wouldn't be so picky, but this is a standard
library. Focus should be on functionality.

The functionality is to make benchmarking easy to use, meaningful, and
easy to interpret. I don't want to add a complicated library for
postprocessing benchmarks because almost nobody will use it.

The first function in the documentation is what most people will want to
bring themselves to using. The functions that provide the data are
eminently available so I disagree with the "second-class citizen"
characterization. You want to use them, use them. They don't need to be
given rockstar billing.

Andrei

Apr 09 2012
Somedude <lovelydear mailmetrash.com> writes:
On 09/04/2012 17:23, Francois Chabot wrote:
Why is there so much emphasis on printBenchmarks()?

benchmark() and runBenchmarks() are clearly the core of this library,
and yet they are relegated to second-class citizen: "Oh, I guess you can
use this". Normally, I wouldn't be so picky, but this is a standard
library. Focus should be on functionality.

Providing formatted output is a nice bonus, but to me, it's just a
bonus. Any benchmarking part of a large project is bound to format the
output itself (to log benchmark results against revisions in a database
or something like that).

Also, benchmark() and runBenchmarks() are kind of confusing at first
glance. Something along the lines of benchmarkModule() and
benchmarkAllModules() would be more sensible.

The printBenchmarks facility is cool and should be included imho.
It helps make benchmarking as standard as unit testing.
We don't want to write the same boilerplate code again and again
for such trivial uses.

Apr 09 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 11:44 AM, Somedude wrote:
It helps make benchmarking as standard as unit testing.
We don't want to write the same boilerplate code again and again
for such trivial uses.

Yes, I had unittest in mind when writing the library. If one needs more
than one statement to get an informative benchmark off the ground, we
failed.

Andrei

Apr 09 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a very
simple API design, starting where I like it: one line of code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529. (There's
some noise in there caused by my git n00biness). Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the priority of
this proposal. Clearly D's user base is highly interested in
efficiency, and many of the upcoming libraries have efficiency a
virtual part of their design. So we should get std.benchmark in soon
and require that new addition come with benchmarks for their
essential functionality.

My vision is that in the future Phobos will have a significant
benchmarks battery, which will help improving Phobos and porting to
new platforms.

I come to think the same.

How come the time-based relative report and the percentage-based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output?

When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited: it just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is
worth considering.
In general, allowing user-defined measurements would be great,
e.g. to measure the time spent in user mode:
() {
    tms t;
    times(&t);
    return t.tms_utime;
}

Note that this code does not need to be portable. You can also use
version() blocks with a static assert fallback.
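For reference, the snippet above fleshed out as a self-contained measurement hook (POSIX-only, so it does indeed need a version/#ifdef guard per platform; shown in C++ since the calls are plain libc anyway):

```cpp
#include <sys/times.h>   // POSIX times(), struct tms
#include <unistd.h>      // sysconf(_SC_CLK_TCK)

// User-defined measurement: CPU time spent in user mode, in clock
// ticks. A benchmark harness would sample this before and after the
// benchmarked call and report the difference.
long user_cpu_ticks()
{
    tms t;
    times(&t);
    return static_cast<long>(t.tms_utime);
}

// Convert a tick delta to seconds using the system clock-tick rate.
double ticks_to_seconds(long ticks)
{
    return static_cast<double>(ticks) / sysconf(_SC_CLK_TCK);
}
```

The same shape works for tms_stime to get system CPU time instead.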

Things that come to mind that I'd like to measure.
Time measurements:
* User CPU time
* System CPU time
* Time spent in memory allocations
Count measurements:
* Memory usage
* L1/L2/L3 cache misses
* Number of executed instructions
* Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?

All in all the user interface has been greatly simplified.

Jens

Apr 10 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/10/12 5:40 AM, Jens Mueller wrote:
How come the time-based relative report and the percentage-based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output?

It's in the names. If the name of a benchmark starts with
benchmark_relative_, then that benchmark is considered relative to the
last non-relative benchmark. Using a naming convention allows complete
automation in benchmarking a module.

I figure it's fine that all results appear together because the absence
of data in the relative column clarifies which is which.
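The bookkeeping behind that naming convention is straightforward. A sketch (C++ for illustration; relative_column is a hypothetical helper, not the module's API): walk the results in order, remember the last non-relative timing as the baseline, and emit baseline/time for every benchmark_relative_ entry.

```cpp
#include <string>
#include <utility>
#include <vector>

// Each entry: (benchmark name, ns per iteration). A name starting
// with "benchmark_relative_" is reported relative to the most recent
// baseline, i.e. the last non-relative benchmark before it.
std::vector<std::pair<std::string, double>>
relative_column(const std::vector<std::pair<std::string, double>>& results)
{
    const std::string prefix = "benchmark_relative_";
    std::vector<std::pair<std::string, double>> out;
    double baseline = 0;
    for (const auto& r : results) {
        if (r.first.compare(0, prefix.size(), prefix) == 0) {
            out.emplace_back(r.first, baseline / r.second); // speed-up factor
        } else {
            baseline = r.second;            // new baseline
            out.emplace_back(r.first, 0);   // 0 = empty relative column
        }
    }
    return out;
}
```

So a baseline taking 1000 ns/iter followed by a benchmark_relative_ entry taking 250 ns/iter yields a relative factor of 4.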

When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited. It just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is

Generally I fear piling too much on StopWatch because every feature adds
its own noise. But there's value in collecting the result of times().
What would be the Windows equivalent?

In general, allowing user defined measurements would be great.
E.g. to measure the time spend in user mode.
() =>  {
tms t;
times(&t);
return t.tms_utime;
}

Note, that this code does not need to be portable. You can also use
version() else static assert.

Things that come to mind that I'd like to measure.
Time measurements:
* User CPU time
* System CPU time
* Time spent in memory allocations
Count measurements:
* Memory usage
* L1/L2/L3 cache misses
* Number of executed instructions
* Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?

Absolutely. I just fear about expanding the charter of the framework too
much. Let's see:

* Memory usage is, I think, difficult in Windows.
* Estimating cache misses and executed instructions is significant research
* Number of memory allocations requires instrumenting druntime

Andrei

Apr 14 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
There have been quite a few good comments, but no review manager offer.
Could someone please take this role?

Again, it would be great to get std.benchmark in sooner rather than
later because it can be used by subsequent submissions (many of which
allege efficiency as a major advantage) to show they improve over the
state of the art.

Thanks,

Andrei

Apr 14 2012
Jacob Carlborg <doob me.com> writes:
On 2012-04-15 04:58, Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager offer.
Could someone please take this role?

Again, it would be great to get std.benchmark in sooner rather than
later because it can be used by subsequent submissions (many of which
allege efficiency as a major advantage) to show they improve over the
state of the art.

Thanks,

Andrei

As far as I know, there are already several other projects/modules in
the pipeline before std.benchmark, if we are to review them in the order
they were submitted.

--
/Jacob Carlborg

Apr 15 2012
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Sunday, April 15, 2012 13:41:28 Jacob Carlborg wrote:
On 2012-04-15 04:58, Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager offer.
Could someone please take this role?

Again, it would be great to get std.benchmark in sooner rather than
later because it can be used by subsequent submissions (many of which
allege efficiency as a major advantage) to show they improve over the
state of the art.

Thanks,

Andrei

As far as I know, there are already several other projects/modules in
the pipeline before std.benchmark, if we are to review them in the order
they were submitted.

There are, but none of them are being reviewed at the moment or being pushed
for a review. And there _is_ an argument for std.benchmark being high priority
based on how it could affect future reviews. We should probably start reviewing
_something_ soon though. We haven't had very good momentum on that of late.

- Jonathan M Davis

Apr 16 2012
"Lars T. Kyllingstad" <public kyllingen.net> writes:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review
manager
offer. Could someone please take this role?

I will do this.
But I will need to get more familiar with the process. And add
it to
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
managers.

The review process is based on Boost's:

http://www.boost.org/community/reviews.html#Review_Manager

-Lars

Apr 16 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17.04.2012 1:00, Jens Mueller wrote:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?

I will do this.
But I will need to get more familiar with the process. And add it
to
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
managers.

The review process is based on Boost's:

http://www.boost.org/community/reviews.html#Review_Manager

The page is shorter than expected. Last time I checked I found something
way longer. Since Phobos' review process is based on Boost's, where
does it deviate?

It's peer review followed by voting, and that's all there is to it.
I don't think you should seek any formal standards, documents,
guidelines, etc.

The manager role is rather simple:

1. The Manager picks a module from the review queue and posts an
announcement that formal review of it starts today. The post should
contain the relevant links; most importantly it sets the exact time the
review ends and the exact time voting ends. (Usually 2 weeks review,
1 week for voting.)

2. When the review ends, the Manager opens a vote thread.
If it's obvious that the module needs further work (on explicit request
from the author), the Manager may choose to prolong or postpone the
review, thus putting it back into the queue.

3. When the vote period ends, count up the votes and declare the result.

And that's it.
There are no hard rules on which module has priority in the queue.
It is loosely decided on the basis of how long it has been ready for
review and how important the functionality is.

P.S. If you are not sure you are up to it, just say the word and I'll
volunteer instead. We need to push this forward as the review queue is
going to overflow real soon ;)

--
Dmitry Olshansky

Apr 17 2012
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that
std.benchmark skips to the front of the queue instead of waiting in line.

Thanks,

Andrei

Apr 17 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, April 17, 2012 18:55:07 H. S. Teoh wrote:
On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people
that std.benchmark skips to the front of the queue instead of

[...]

Let's just check it in since it's ready. No point in waiting.

Check it in? You mean start the review on it? I hope that you don't mean
"check it in" as in commit it. It needs to be reviewed and voted on first.

- Jonathan M Davis

Apr 17 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Lars T. Kyllingstad wrote:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?

I will do this.
But I will need to get more familiar with the process. And add it
to
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
managers.

The review process is based on Boost's:

http://www.boost.org/community/reviews.html#Review_Manager

The page is shorter than expected. Last time I checked I found something
way longer. Since Phobos' review process is based on Boost's, where
does it deviate?

Jens

Apr 16 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Dmitry Olshansky wrote:
On 17.04.2012 1:00, Jens Mueller wrote:
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?

I will do this.
But I will need to get more familiar with the process. And add it
to
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
managers.

The review process is based on Boost's:

http://www.boost.org/community/reviews.html#Review_Manager

The page is shorter than expected. Last time I checked I found something
way longer. Since Phobos' review process is based on Boost's, where
does it deviate?

It's peer review followed by voting, and that's all there is to it.
I don't think you should seek any formal standards, documents,
guidelines, etc.

The manager role is rather simple:

1. The Manager picks a module from the review queue and posts an
announcement that formal review of it starts today. The post should
contain the relevant links; most importantly it sets the exact time
the review ends and the exact time voting ends. (Usually 2 weeks
review, 1 week for voting.)

2. When the review ends, the Manager opens a vote thread.
If it's obvious that the module needs further work (on explicit request
from the author), the Manager may choose to prolong or postpone the
review, thus putting it back into the queue.

3. When the vote period ends, count up the votes and declare the result.

And that's it.
There are no hard rules on which module has priority in the queue.
It is loosely decided on the basis of how long it has been ready for
review and how important the functionality is.

P.S. If you are not sure you are up to it, just say the word and I'll
volunteer instead. We need to push this forward as the review queue
is going to overflow real soon ;)

Andrei is already preparing. Review will start soon.

Jens

Apr 17 2012
"SomeDude" <lovelydear mailmetrash.com> writes:
On Tuesday, 17 April 2012 at 21:15:31 UTC, Andrei Alexandrescu
wrote:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with
people that std.benchmark skips to the front of the queue
instead of waiting in line. Please agree/disagree.

Thanks,

Andrei

I agree, there is no good reason for the queue to be stopped like
it is now, but since no one seems to step up for the other
modules, let's not waste time and review your module. Hopefully,
the other modules will be reviewed ASAP. We gain nothing in
waiting.

Apr 17 2012
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people
that std.benchmark skips to the front of the queue instead of

Let's just check it in since it's ready. No point in waiting.

T

--
Some days you win; most days you lose.

Apr 17 2012
"SomeDude" <lovelydear mailmetrash.com> writes:
On Wednesday, 18 April 2012 at 09:44:12 UTC, Marco Leise wrote:
On Tue, 17 Apr 2012 16:15:43 -0500
In a supermarket, there are three people in front of you in the
queue. Then they all decide they forgot something and disappear
in the back. You exchange some looks with the cashier and
decide that you'll be long finished before any of them come

Well, we are not in a supermarket. It was correct to ask.

Apr 18 2012
"Famous" <none none.net> writes:
Instead of the originally proposed layout

================================================================
Benchmark                               relative ns/iter  iter/s
================================================================
---[ module_one ]-----------------------------------------------
file write                                        140.2K    7.1K
array creation                             1.2Kx  116.0     8.6M
================================================================

I would like to see something more similar to the following:

================================================================
Benchmark                                Time    Performance
                                         ---------------
name                                     s/iter  iter/s  factor
-----------------------[ module_one ]--- ------- ------- -------
file write                               140.20µ   7.13k
array creation                           116.00n   8.62M   1.21k
================================================================

Transition to the new layout can be achieved as follows:

* Consider layout guidelines given in
  http://mirrors.ctan.org/info/german/tabsatz/tabsatz.pdf

* Reorder columns considering their semantics and introduce groups.

* Rename 'relative' to 'factor' and avoid usage of 'x' for 'times'.

* Divide column 'ns/iter' by a billion (see Vladimir's post).

* Right align the module name (abbreviate if it is too long).

* Avoid '%' but rely on the prefixes specified in the SI:
  http://en.wikipedia.org/w/index.php?title=International_System_of_Units&oldid=487057838#Units_and_prefixes

Then, using a fixed width of 7 characters, we have at least 3
significant digits:

123450000 -> 123.45M
 12345000 ->  12.35M
  1234500 ->   1.23M
   123450 -> 123.45k
    12345 ->  12.35k
   1234.5 ->   1.23k
   123.45 -> 123.45
   12.345 ->  12.35
   1.2345 ->   1.23
  0.12345 -> 123.45m
 0.012345 ->  12.35m
0.0012345 ->   1.23m
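Such a formatter can be sketched as follows (si_format is a hypothetical name; the rounding of the last digit may differ slightly from the table above due to floating point):

```cpp
#include <cstdio>
#include <string>

// Scale the value into [1, 1000) and append the matching SI prefix,
// printing two decimals as in the table above. Covers milli through
// giga; extend the prefix table for a wider range.
std::string si_format(double v)
{
    static const char* prefixes[] = {"m", "", "k", "M", "G"};
    int idx = 1; // start at the unit (empty) prefix
    while (v >= 1000.0 && idx < 4) { v /= 1000.0; ++idx; }
    while (v < 1.0 && idx > 0)     { v *= 1000.0; --idx; }
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.2f%s", v, prefixes[idx]);
    return buf;
}
```

Right-aligning the result in a fixed 7-character field then produces the aligned column; e.g. si_format(2000.0) gives "2.00k".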

Cheers,
Famous

Apr 15 2012
"Famous" <none none.net> writes:
Maybe the layout is not destroyed, this time:

================================================================
Benchmark                                Time    Performance
                                         ---------------
name                                     s/iter  iter/s  factor
-----------------------[ module_one ]--- ------- ------- -------
file write                               140.20µ   7.13k
array creation                           116.00n   8.62M   1.21k
================================================================


Apr 15 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
On 4/10/12 5:40 AM, Jens Mueller wrote:
How come the time-based relative report and the percentage-based
relative report are mixed in one result? And how do I choose which one
I'd like to see in the output?

It's in the names. If the name of a benchmark starts with
benchmark_relative_, then that benchmark is considered relative to
the last non-relative benchmark. Using a naming convention allows
complete automation in benchmarking a module.

I figure it's fine that all results appear together because the
absence of data in the relative column clarifies which is which.

I know. That wasn't my question. How do I choose between percentage vs.
factors for a relative benchmark, e.g. 200% vs. 2?

When benchmarking you can measure different things at the same time. In
this regard the current proposal is limited. It just measures wall clock
time. I believe extending the StopWatch to measure e.g. user CPU time is

Generally I fear piling too much on StopWatch because every feature
adds its own noise. But there's value in collecting the result of
times(). What would be the Windows equivalent?

I'm not a Windows user but GetProcessTimes
seems to be the Windows equivalent.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683223%28v=vs.85%29.aspx

In general, allowing user defined measurements would be great.
E.g. to measure the time spend in user mode.
() =>  {
tms t;
times(&t);
return t.tms_utime;
}

Note, that this code does not need to be portable. You can also use
version() else static assert.

Things that come to mind that I'd like to measure.
Time measurements:
* User CPU time
* System CPU time
* Time spent in memory allocations
Count measurements:
* Memory usage
* L1/L2/L3 cache misses
* Number of executed instructions
* Number of memory allocations

Of course wall clock time is the ultimate measure when benchmarking.
But often you need to investigate further (doing more measurements).

Do you think adding this is worthwhile?

Absolutely. I just fear about expanding the charter of the framework
too much. Let's see:

* Memory usage is, I think, difficult in Windows.
* Estimating cache misses and executed instructions is significant research
* Number of memory allocations requires instrumenting druntime

I think that std.benchmark shouldn't do this. But I think we should
figure out whether adding a user-defined measurement function is
possible, making sure that we have a design that is flexible enough
to capture different measurements.

Jens

Apr 15 2012
Jens Mueller <jens.k.mueller gmx.de> writes:
Andrei Alexandrescu wrote:
There have been quite a few good comments, but no review manager
offer. Could someone please take this role?

I will do this.
But I will need to get more familiar with the process. And add it to
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review
managers.

Jens

Apr 15 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, April 17, 2012 16:15:43 Andrei Alexandrescu wrote:
On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that
std.benchmark skips to the front of the queue instead of waiting in

I have no problem with it. None of the others is being pushed for review or
promoted at present. I'd just as soon have the std.benchmark review get under
way rather than sit around waiting for one of those earlier in the queue to
get started.

- Jonathan M Davis

Apr 17 2012
Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 17 Apr 2012 16:15:43 -0500
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

On 4/17/12 2:53 PM, Jens Mueller wrote:
Andrei is already preparing. Review will start soon.

I'd like to gather a bit of confidence that it's okay with people that
std.benchmark skips to the front of the queue instead of waiting in

Thanks,

Andrei

In a supermarket, there are three people in front of you in the queue. Then
they all decide they forgot something and disappear in the back. You exchange
some looks with the cashier and decide that you'll be long finished before any
of them come back. Go ahead.

--
Marco

Apr 18 2012
"Nordlöw" <per.nordlow gmail.com> writes:
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu
wrote:
Hello,

I finally found the time to complete std.benchmark. I got to a
very simple API design, starting where I like it: one line of
code.

Code is in the form of a pull request at
https://github.com/D-Programming-Language/phobos/pull/529.
(There's some noise in there caused by my git n00biness).
Documentation is at
http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.

If reasonable and at all possible, I'd like to bump the
priority of this proposal. Clearly D's user base is highly
interested in efficiency, and many of the upcoming libraries
have efficiency a virtual part of their design. So we should
get std.benchmark in soon and require that new addition come
with benchmarks for their essential functionality.

My vision is that in the future Phobos will have a significant
benchmarks battery, which will help improving Phobos and
porting to new platforms.

Andrei

Where has std.benchmark gone?

I've written a benchmarking framework in C++ that I would like to
port to D.
If you've already implemented some cool stuff it's better to
reuse/merge it with mine.

My C++ code has a bunch of external deps (not included in this
post). If you're interested I'll send you the rest. It's inspired
by Haskell's powerful benchmarking framework(s) in that it can
automatically figure out how to randomize various kinds of types,
including recursive instantiations of builtins, STL pairs, tuples
and even containers, using a randomization framework that uses an
STL-container type trait (rand.hpp, which is not included).

I'm guessing implementing such a framework in D is an order of
magnitude easier than in C++.

An innovative thing about timed.hpp is constant-time benchmarking,
meaning that the loop will generate larger and larger instances
of the structure until a specified amount of time has passed. I
believe this is a more user-friendly way of benchmarking than most
other frameworks offer. And with D's supercool struct/class tupleof,
even more automation of instance generation is possible.

Destroy!

Code:

/*! \file timed.hpp
 * \brief Benchmark Functions on Auto-Randomized Arguments.
 * \author Per Nordlöw (per.nordlow gmail.com)
 *
 * \see http://stackoverflow.com/questions/10059239/gccs-behaviour-with-stdasyncstdlaunchasync-vs-clangs-behaviour
 * \see http://stackoverflow.com/questions/8895350/quickcheck-like-template-function-benching-in-c
 * \see http://functionalcpp.wordpress.com/2013/08/03/generalized-function-evaluation/
 * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
 * \see https://limcore.wordpress.com/2008/04/13/8/
 * \see http://www.randombit.net/bitbashing/programming/parallel_function_invocation_using_std_async.html
 * \todo Use const boost::posix_time::time_duration & dur = boost::posix_time::seconds(1)
 */

#pragma once
#include <utility>
#include <vector>
#include <future>
#include <chrono>
#include <type_traits>
#include "rand.hpp"
#include "type_traits_x.hpp"

namespace pnw
{

struct timed {
    typedef std::chrono::high_resolution_clock hrc;
    typedef std::chrono::milliseconds millis;
    typedef std::chrono::microseconds micros;
public:
    timed(size_t max_length = 1024*256,
          millis total_duration_max = millis(1000),
          millis call_duration_max = millis(0),
          uint n_tries = 1)
        : m_max_length(max_length),
          m_tot_dur(total_duration_max),
          m_call_dur(call_duration_max),
          m_n_tries(n_tries)
    {}

    /*! Benchmark Function/Lambda/Functor \p f with non-void return value.
     * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
     */
    template<class Func>
    auto call(Func&& f)
        -> typename std::enable_if<!std::is_void<decltype(f())>::value, // if non-void return
                                   std::pair<decltype(f()),
                                             typename hrc::duration> >::type {
        decltype(f()) result;
        auto tA = hrc::now();
        for (uint i = 0; i < m_n_tries; i++) { result = f(); } // call it \c m_n_tries times
        auto span = hrc::now() - tA;
        return std::make_pair(result, span);
    }

    /*! Benchmark Function/Lambda/Functor \p f with \c void return value.
     * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
     */
    template<class Func>
    auto call(Func&& f)
        -> typename std::enable_if<std::is_void<decltype(f())>::value, // if void return
                                   typename hrc::duration>::type
    {
        auto tA = hrc::now();
        for (uint i = 0; i < m_n_tries; i++) { f(); } // call it \c m_n_tries times
        return hrc::now() - tA; // reverse order?
    }

#if 0
    /*! Call Function \p f after \p us microseconds.
     * \see http://stackoverflow.com/questions/10804683/delayed-function-call
     */
    template<class Func, class... Args>
    inline std::future<typename std::result_of<Func(Args...)>::type>
    delayed_async(Func&& f, Args&&... args, micros us) {
        return std::async(std::launch::async, [f, us, args...] ()
        {
            return f(args...);            // call
        } );
    }
#endif

    /*! Benchmark the \em Single Argument Function \p f using Randomized Instances of type \p Cont.
     * \tparam C Container Type.
     * \tparam B benchS (Times) Container.
     * \param[in] f Function.
     * \return vector of pairs of \c C-instance size and execute-time.
     * \todo Estimate Time Complexity Model: O(c), O(n), ..., O(n^p), O(log(n)), O(exp(n))
     *       and predict cost of next iteration to halt within time-limit.
     * \see http://stackoverflow.com/questions/7245235/syntax-of-c-templates-with-function-type-parameters
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >
    typename std::enable_if<is_container<C>::value, B>::type
    call_s(void (*f)(C&)) {
        const auto deadline = hrc::now() + m_tot_dur;
        B bm;                    // benchmarks
        C c;                     // test data
        for (size_t n = 1; n <= m_max_length; n *= 2) {
            // efficient way of adding more random values to a vector
            auto c2 = rand_n<C>(n);
            c.insert(begin(c), begin(c2), end(c2)); // \see http://stackoverflow.com/a/2551785/683710
            auto tA = hrc::now(); // log start
            for (uint i = 0; i < m_n_tries; i++) { f(c); } // call it \c m_n_tries times
            auto span = hrc::now() - tA;
            if (m_call_dur.count() != 0 and
                span >= m_call_dur) { break; } // if function call took too long
            auto count = std::chrono::duration_cast<std::chrono::microseconds>(span).count();
            bm.emplace_back(n, static_cast<double>(count)/1e6); // store
            if (hrc::now() + span >= deadline) { // if no time for yet another measure taking at least \c span
                break;
            }
        }
        return bm;
    }

    /*! Benchmark the \em Single Argument Function \p f using Randomized Instances of type \p Cont.
     * \see Previous overload of \c call_s
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >
    typename std::enable_if<is_container<C>::value, B>::type
    call_s(void (*f)(const C&)) {
        return call_s(reinterpret_cast<void (*)(C&)>(f));
    }

    /*! Benchmark Function \p f using Randomized Instances of type \p Cont.
     * \tparam C Container.
     * \tparam B benchS (Times) Container.
     * \param[in] f Function.
     * \return vector of pairs of \c C-instance size and execute-time.
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >
    B call(void (*f)(typename C::iterator,
                     typename C::iterator))
    {
        const auto deadline = hrc::now() + m_tot_dur;
        B bm;                    // benchmarks
        C c;                     // test data
        for (size_t n = 1; n <= m_max_length; n *= 2) {
            // efficient way of adding more random values to a vector
            auto c2 = rand_n<C>(n);
            c.insert(begin(c), begin(c2), end(c2)); // \see http://stackoverflow.com/a/2551785/683710
            auto tA = hrc::now(); // log start
            for (uint i = 0; i < m_n_tries; i++) { f(c.begin(), c.end()); } // call it \c m_n_tries times
            auto span = hrc::now() - tA;
            if (m_call_dur.count() != 0 and
                span >= m_call_dur) { break; } // if function call took too long
            auto count = std::chrono::duration_cast<std::chrono::microseconds>(span).count();
            bm.emplace_back(n, static_cast<double>(count)/1e6); // store
            if (hrc::now() + span >= deadline) { // if there's no time for more measures
                break;
            }
        }
        return bm;
    }

public:
    size_t m_max_length = 1024*256;  ///< Maximum container instance length of \c T.
    millis m_tot_dur = millis(1000); ///< Maximum time to wait for whole benchmarking to complete.
    millis m_call_dur = millis(0);   ///< Maximum time to wait for a call to \p f to complete.
    uint m_n_tries = 1;              ///< Number of tries per size of randomized instance.
    std::launch m_launch_policy = std::launch::async |