digitalmars.D - Time to move std.experimental.checkedint to std.checkedint ?

Walter Bright (1/1) Mar 23 2021 It's been there long enough.

mw (5/6) Mar 23 2021 Can we fix all the problems found in this ticket:

Walter Bright (2/7) Mar 24 2021 Those are all good enhancement ideas.
Jacob Carlborg (9/12) Mar 24 2021 I'm not sure if the first thing can be supported. It would require
tsbockman (65/68) Mar 24 2021 Years ago I submitted a checkedint module of my own for inclusion

Walter Bright (11/15) Mar 26 2021 Integer overflow happening should not result in memory safety errors in ...

tsbockman (28/39) Mar 27 2021 That's an implementation detail. There is no need at either the

Andrei Alexandrescu (8/18) Mar 29 2021 This claim seems speculative. A factor of two for a fundamental class of...

tsbockman (21/26) Mar 29 2021 This is true. But, at the moment I don't have an easy way to

Andrei Alexandrescu (18/25) Mar 29 2021 You actually do. Apply the scientific method.

tsbockman (10/12) Mar 30 2021 I skimmed the paper, and from what I have seen so far it supports

Walter Bright (27/46) Mar 29 2021 With the LEA instruction, which can do adds and some multiplies in one

Paul Backus (8/13) Mar 29 2021 Well...sometimes they do:

Walter Bright (3/17) Mar 29 2021 Without an example, I don't know what you mean.

tsbockman (29/31) Mar 29 2021 It only seems unclear because you have accepted the idea that

H. S. Teoh (77/108) Mar 29 2021 The only thing at fault here is the name "integer". `int` in D is

Walter Bright (5/7) Mar 29 2021 You're right. It's not an integer, it's an int :-)
tsbockman (9/22) Mar 29 2021 You have a wildly exaggerated sense of the runtime performance
Max Samukha (3/11) Mar 29 2021 It seems you are arguing against the way D broke compile time

Walter Bright (2/14) Mar 29 2021 Compile-time isn't a run-time performance issue.

Max Haughton (13/30) Mar 30 2021 On the subject of run-time performance, checkedint can also do

Bruce Carneal (10/21) Mar 30 2021 [...]

Walter Bright (2/4) Mar 30 2021 Yes, that's also why I want it to have more visibility by being in Phobo...

Max Samukha (3/4) Mar 31 2021 Performance is irrelevant to the fact that D frivolously violates

Max Haughton (2/7) Mar 31 2021 Like?
Timon Gehr (3/10) Apr 01 2021 Not just at compile time, but it's less noticeable at runtime because

Walter Bright (13/42) Mar 29 2021 Programmers need to accept that computer math is different in arbitrary ...

tsbockman (4/7) Mar 29 2021 As someone else shared earlier in this thread, Zig already

Walter Bright (7/14) Mar 29 2021 I amend my statement to "immediately make D as uncompetitive as Zig is"

Jacob Carlborg (7/13) Mar 30 2021 The question is then, does that mean that Zig has over 131070

Rumbu (7/21) Mar 30 2021 In Zig, integer type names are not considered keywords, e.g you

tsbockman (4/15) Mar 30 2021 So you're now dismissing Zig as slow because its feature set

Walter Bright (8/11) Mar 30 2021 Because it surprised me? No. Because if someone had figured out a way to...

tsbockman (65/76) Mar 30 2021 Zero runtime cost is not a reasonable standard unless the feature

Walter Bright (25/42) Mar 30 2021 Thank you for running benchmarks.

tsbockman (24/39) Mar 30 2021 Note that I deliberately chose an integer-intensive workload, and

Andrei Alexandrescu (3/10) Mar 30 2021 Idea: build dmd with -ftrapv (which is supported, I think, by gdc and

Andrei Alexandrescu (2/11) Mar 30 2021 That's awfully close to "No true Scotsman".

tsbockman (4/16) Mar 30 2021 Just tossing out names of fallacies isn't really very helpful if

Andrei Alexandrescu (5/21) Mar 30 2021 I thought it's fairly clear - the claim is non-falsifiable: if code is

Andrei Alexandrescu (3/25) Mar 30 2021 s/Code without checks could benefit of other/Code with checks could

tsbockman (76/106) Mar 31 2021 Thank you for explaining anyway.

Andrei Alexandrescu (6/21) Mar 30 2021 Instead of passing the burden of proof back and forth, some evidence

Vladimir Panteleev (5/7) Mar 30 2021 Typing --help in the flags box answers that question :) And the

Andrei Alexandrescu (3/10) Mar 30 2021 Cool, thanks. I was looking for "the fastest code that still has the

Andrei Alexandrescu (3/14) Mar 30 2021 I guess that'd be "-O ReleaseSafe":
Vladimir Panteleev (10/21) Mar 30 2021 Right, sorry.

Andrei Alexandrescu (5/31) Mar 30 2021 Not much to write home about. The jumps scale linearly with the number

tsbockman (5/9) Mar 30 2021 Ideally, in release builds the compiler could loosen up the

Andrei Alexandrescu (3/14) Mar 30 2021 Yah, was hoping I'd find something like that. Was disappointed. That

Jacob Carlborg (9/11) Mar 31 2021 The reason, or one of the reasons, why Zig is/can be faster than

Max Haughton (3/14) Mar 31 2021 Specific Example? GCC and LLVM are both almost rabid when you

Andrei Alexandrescu (4/19) Mar 31 2021 Even if that's the case, "we choose to use by default different flags

Max Haughton (13/37) Mar 31 2021 Intel C++ can be a little naughty with the fast math options,

Walter Bright (7/12) Mar 31 2021 Benchmarks are always going to be unfair, but it's only reasonable to tr...

Jacob Carlborg (9/11) Apr 01 2021 No, that's why I said "can be". But what I meant is that just running

Walter Bright (16/22) Mar 30 2021 The ldc:

Andrei Alexandrescu (2/30) Mar 30 2021 Yah, actually gdc uses lea as well: https://godbolt.org/z/Gb6416EKe
Elronnd (3/5) Mar 30 2021 The lea is the exact same length as the sequence of moves, and

Walter Bright (5/12) Mar 30 2021 It's a win because it uses the address decoder logic which is separate f...

Vladimir Panteleev (4/9) Mar 30 2021 Haven't CPUs used register renaming for a long time now? It's

Andrei Alexandrescu (3/9) Mar 30 2021 Affirmative if you consider the Nehalem modern:

Vladimir Panteleev (8/10) Mar 30 2021 Um, that was released 13 years ago.

Andrei Alexandrescu (6/17) Mar 30 2021 It carried over afaik to all subsequent Intel CPUs:

Elronnd (8/12) Mar 31 2021 Less with the talking, more with the benchmarking!

Walter Bright (3/12) Mar 30 2021 If you use a register that needs to be saved on the stack, it's going to...

Vladimir Panteleev (12/28) Mar 30 2021 Thanks!

Walter Bright (27/47) Mar 31 2021 Slower than ADD, but not slower than multiple ADDs. DMD does not replace...

Vladimir Panteleev (23/35) Mar 31 2021 Thanks for the insight!

Walter Bright (7/26) Mar 31 2021 People will prefer what makes them money :-)

Vladimir Panteleev (11/29) Mar 31 2021 You would think someone would have told that to all the companies

Andrei Alexandrescu (3/19) Mar 31 2021 It is. I know because I collaborated with the provisioning team at Faceb...

Vladimir Panteleev (5/12) Mar 31 2021 I don't understand what you mean by this.

Andrei Alexandrescu (6/21) Mar 31 2021 Using languages has to take important human factors into effect, e.g.

Andrei Alexandrescu (5/16) Mar 31 2021 Factually true. Millions of dollars a year that is.

Vladimir Panteleev (25/28) Mar 30 2021 Right, but as we both know, speed doesn't necessarily scale with

Walter Bright (4/8) Mar 30 2021 The code uses hardcoded loop limits. Yes, the compiler can infer no over...

Vladimir Panteleev (6/15) Mar 30 2021 Well, this is fake artificial code, and looping a fixed number of

Andrei Alexandrescu (13/46) Mar 30 2021 Of course, and I wasn't suggesting the contrary. If speed would simply

tsbockman (13/19) Mar 30 2021 I already posted both some Zig benchmark results of my own, and

Andrei Alexandrescu (9/29) Mar 30 2021 That's surprising so some investigation would be in order. From what I

tsbockman (9/11) Mar 30 2021 -fwrapv isn't supposed to do anything discernible; it just
tsbockman (6/12) Mar 30 2021 Perhaps the additional runtime validation is causing reduced

sighoya (9/12) Mar 29 2021 The point is the overflow check is already done by most cpus
Elronnd (11/14) Mar 30 2021 Dan Luu measures overflow checks as having an overall 1%

Andrei Alexandrescu (5/16) Mar 30 2021 Bit surprised about how you put it. Are you sure you represent what the

Elronnd (6/8) Mar 31 2021 Look at the 'fsan ud' row of the only table. 1% is the

Walter Bright (3/4) Mar 31 2021 1% is a serious improvement. If it wasn't, why would Rust (for example) ...

Elronnd (4/9) Apr 01 2021 That's an appeal to authority. You haven't actually justified

Walter Bright (25/36) Apr 02 2021 That's backwards. You want other people to invest in this technology, yo...

Guillaume Piolat (9/12) Apr 03 2021 Seems to be a bit like bounds checks (less obvious benefits), it

John Colvin (7/20) Apr 03 2021 It’s not like bounds checks because there’s loads of code out

jmh530 (6/11) Mar 27 2021 Are you familiar with how Zig handles overflow [1]? They error on

tsbockman (22/27) Mar 27 2021 Thanks for the link; I hadn't seen Zig's take before. It agrees

Berni44 (6/7) Mar 24 2021 Isn't that true meanwhile for everything in std.experimental? I

Q. Schroll (24/31) Mar 24 2021 I have no idea why std.experimental is a thing to begin with. It

Steven Schveighoffer (20/38) Mar 29 2021 It's there because we wanted a place for new parts of phobos to develop

Guillaume Piolat (15/19) Mar 29 2021 I was intrigued and dug through a bit of forum history:

Walter Bright <newshound2 digitalmars.com> writes:

It's been there long enough.

Mar 23 2021

mw <mingwu gmail.com> writes:

On Tuesday, 23 March 2021 at 21:22:18 UTC, Walter Bright wrote:
 It's been there long enough.


Can we fix all the problems found in this ticket:

https://issues.dlang.org/show_bug.cgi?id=21169

  Issue 21169 - make checkedint as a drop-in replacement of native int/long

Mar 23 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/23/2021 2:26 PM, mw wrote:
 Can we fix all the problems found in this ticket:
 
 https://issues.dlang.org/show_bug.cgi?id=21169
 
   Issue 21169 - make checkedint as a drop-in replacement of native int/long

Those are all good enhancement ideas.

Mar 24 2021

Jacob Carlborg <doob me.com> writes:

On Tuesday, 23 March 2021 at 21:26:43 UTC, mw wrote:

 https://issues.dlang.org/show_bug.cgi?id=21169

  Issue 21169 - make checkedint as a drop-in replacement of 
 native int/long

I'm not sure if the first thing can be supported. It would require 
implicit conversions of custom types, which has always been 
refused in the past.

I don't think the last one, number 7, can work either. checkedint 
supports adding arbitrary hooks that are executed during various 
conditions. I don't see how those could be made atomic.

--
/Jacob Carlborg

Mar 24 2021

tsbockman <thomas.bockman gmail.com> writes:

On Tuesday, 23 March 2021 at 21:26:43 UTC, mw wrote:
 https://issues.dlang.org/show_bug.cgi?id=21169

  Issue 21169 - make checkedint as a drop-in replacement of 
 native int/long

Years ago I submitted a checkedint module of my own for inclusion 
in Phobos (https://code.dlang.org/packages/checkedint), which was 
ultimately rejected by Andrei Alexandrescu because my design 
goals did not align with his well enough, prompting him to write 
what became std.experimental.checkedint himself.

Maximum convenience and similarity to D's native integer types 
were high priorities for me, so I spent a lot of time thinking 
about and experimenting with this problem. My conclusions:

///////////////////////////////////
1) Checked types are different from unchecked types. That's the 
whole point!

I found that trying too hard to make transitioning between 
checked and unchecked types seamless created holes in the 
automated protection against overflow that the checked types are 
supposed to provide. Implicit conversions from checked to 
unchecked integers are dangerous for the same reason that 
implicit conversions from @system to @safe delegates are 
dangerous.

I think the urge to make that transition seamless comes from the 
fact that trying to actually use checkedint (whether mine or 
Andrei's) for defensive programming is extremely tedious and 
annoying, because no one else is doing so. But, this is the wrong 
solution: the real answer is that checked operations should have 
been the default in D from the beginning, with unchecked 
intrinsics available for those rare cases where wrapping overflow 
and other strange behaviors of machine integers are actually 
desired, or where maximum performance is needed.

Unchecked integer operations are mostly just a micro-optimization 
that is pointless outside of very hot code, like inner loops. (It 
is very puzzling that people consider memory safety so important, 
and yet are totally disinterested in integer overflow, which can 
violate memory safety.)

2) While there are many things that can be done to make the 
behavior of two types more similar, it is impossible in D to make 
any custom type an actual drop-in replacement for a different 
type.

This is because D, by design, has only partial support for 
implicit conversions, and because template constraints and 
overload resolution are sensitive to the exact type of the 
arguments.

Thus, whether to treat two different types as equivalent is 
ultimately a choice that each and every API that may interact 
with those types makes for itself, either intentionally or by 
accident. For example:

V f(V)(V value)
     if(std.traits.isIntegral!V)
{
     // Do something here ...
}

The perfectly reasonable template constraint above rejects 
checkedint types. Should it? There is no way to answer this 
question without seeing and understanding the body of the 
function: while uncommon, it is valid and sometimes desirable to 
depend upon wrapped integer overflow. So, the API designers must 
explicitly permit checkedint inputs if they consider that 
desirable.
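
To make that opt-in concrete, here is a minimal sketch of a constraint that explicitly admits checked integers. The trait name `isIntegralOrChecked` and the function `increment` are hypothetical, and the import assumes a compiler release where the module is available as `std.checkedint` (older releases ship it as `std.experimental.checkedint`).

```d
import std.checkedint : Checked;
import std.traits : isIntegral;

// Hypothetical trait: true for built-in integral types and for any
// Checked!(U, Hook) wrapper, which isIntegral alone rejects.
enum isIntegralOrChecked(T) =
    isIntegral!T || is(T == Checked!(U, Hook), U, Hook);

// Hypothetical API that deliberately accepts both kinds of input.
V increment(V)(V value)
    if (isIntegralOrChecked!V)
{
    return value + 1;
}

void main()
{
    static assert(!isIntegral!(Checked!int));          // rejected by the plain constraint
    static assert(isIntegralOrChecked!(Checked!int));  // accepted by the widened one
    assert(increment(41) == 42);
    assert(increment(Checked!int(41)) == 42);
}
```

Whether widening the constraint is appropriate still depends on the function body: a body that relies on wrapping overflow should keep the stricter `isIntegral` check.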

Automating good solutions to these ambiguities is possible in 
many cases, but would require deep, breaking, and controversial 
changes to the D language.
///////////////////////////////////

TL;DR: What you're really asking for is impossible in D2. It would 
require massive breaking changes to the language to implement 
without undermining the guarantees that a checked integer type 
exists to provide.

Mar 24 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/24/2021 1:28 PM, tsbockman wrote:
 Unchecked integer operations are mostly just a micro-optimization that is 
 pointless outside of very hot code, like inner loops. (It is very puzzling that 
 people consider memory safety so important, and yet are totally disinterested in 
 integer overflow, which can violate memory safety.)

Integer overflow happening should not result in memory safety errors in a safe 
language. It can cause other problems, but not that.

The reasons people don't care that much about integer overflow are:

1. they are not the cause of enough problems to be that concerning

2. 2's complement arithmetic fundamentally relies on it

3. it's hard to have signed and unsigned integer types coexist without 
overflows, and not having unsigned types leads to ugly kludges to get them

4. fast integer arithmetic is fundamental to fast code, not a mere 
micro-optimization. Who wants an overflow check on every pointer increment?

5. size_t is unsigned, and ptrdiff_t is signed. Yet they have to work together.

Mar 26 2021

tsbockman <thomas.bockman gmail.com> writes:

On Saturday, 27 March 2021 at 03:25:04 UTC, Walter Bright wrote:
 The reasons people don't care that much about integer overflow 
 are:

 1. they are not the cause of enough problems to be that 
 concerning

 2. 2's complement arithmetic fundamentally relies on it

That's an implementation detail. There is no need at either the 
software or the hardware level to make it the programmer's 
problem by default.

Main memory is addressed as one giant byte array, but we interact 
with it through better abstractions most of the time (the stack 
and the heap).

 3. it's hard to have signed and unsigned integer types coexist 
 without overflows, and not having unsigned types leads to ugly 
 kludges to get them

Correctly mixing signed and unsigned integers is hard for 
programmers to consistently get right, but easy for the computer. 
That's why the default should be for the computer to do it.

 4. fast integer arithmetic is fundamental to fast code,

I did benchmarking during the development of checkedint. With 
good inlining and optimization, even a library solution generally 
slows integer math code down by less than a factor of two. (I 
expect a language solution could do even better.)

This is significant, but nowhere near big enough to move the 
bottleneck in most code away from I/O, memory, floating-point, or 
integer math for which wrapping is semantically correct (like 
hashing or encryption). In those cases where integer math code 
really is the bottleneck, there are often just a few hot spots 
where the automatic checks in some inner loop need to be replaced 
with manual checks outside the loop.
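
As an illustration of replacing per-iteration checks with one manual check outside the loop, consider summing `int` values into a `long`. This is only a sketch: the function names are hypothetical, `adds` is the druntime intrinsic from `core.checkedint`, and the hoisted bound relies on the fact that at most 2^32 values of magnitude at most 2^31 cannot push the total outside the range of `long`.

```d
import core.checkedint : adds;

// Checks every addition. The overflow flag is sticky, so it is
// tested once after the loop.
long sumCheckedEach(const(int)[] values)
{
    bool overflow = false;
    long total = 0;
    foreach (v; values)
        total = adds(total, long(v), overflow);
    if (overflow)
        throw new Exception("integer overflow in sum");
    return total;
}

// One manual check outside the loop: with at most 2^32 elements,
// the running total always fits in a long, so the loop body can
// use plain unchecked addition.
long sumCheckedOnce(const(int)[] values)
{
    if (values.length > (1UL << 32))
        throw new Exception("input too long to sum safely");
    long total = 0;
    foreach (v; values)
        total += v;
    return total;
}

void main()
{
    const int[] data = [int.max, int.max, int.min];
    assert(sumCheckedEach(data) == sumCheckedOnce(data));
    assert(sumCheckedOnce(data) == 2_147_483_646L);
}
```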

 not a mere micro-optimization.

By "micro-optimization" I mean that it does not affect the 
asymptotic performance of algorithms, does not matter much 
outside of hot spots, and is unlikely to change where the hot 
spots are in the average program.

 Who wants an overflow check on every pointer increment?

As with bounds checks, most of the time the compiler should be 
able to prove the checks can be skipped, or move them outside the 
inner loop. The required logic is very similar.

Mar 27 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/27/21 3:42 AM, tsbockman wrote:
 With good inlining and optimization, even a library solution generally 
 slows integer math code down by less than a factor of two. (I expect a 
 language solution could do even better.)
 
 This is significant, but nowhere near big enough to move the bottleneck 
 in most code away from I/O, memory, floating-point, or integer math for 
 which wrapping is semantically correct (like hashing or encryption). In 
 those cases where integer math code really is the bottleneck, there are 
 often just a few hot spots where the automatic checks in some inner loop 
 need to be replaced with manual checks outside the loop.

This claim seems speculative. A factor of two for a fundamental class of 
operations is very large, not just "significant". We're talking about 
e.g. 1 cycle for addition, and it was a big deal when it was introduced 
back in the early 2000s. Checked code is larger, meaning more pressure 
on the scarce I-cache in large programs - and that's not going to be 
visible in microbenchmarks. And "I/O is slow anyway" is exactly what 
drove the development of C++ catastrophically slow iostreams.

Mar 29 2021

tsbockman <thomas.bockman gmail.com> writes:

On Monday, 29 March 2021 at 16:41:12 UTC, Andrei Alexandrescu 
wrote:
 Checked code is larger, meaning more pressure on the scarce
 I-cache in large programs - and that's not going to be visible
 in microbenchmarks.

This is true. But, at the moment I don't have an easy way to 
quantify the size of that effect.

 And "I/O is slow anyway" is exactly what drove the development 
 of C++ catastrophically slow iostreams.

That's really not what I said, though. What I actually said is:

0) The performance of hot code is usually limited by something 
other than semantically non-wrapping integer arithmetic.
1) When non-wrapping integer arithmetic is the bottleneck, the 
compiler should usually be able to optimize away most of the cost 
of checking for overflow.
2) When the compiler cannot optimize away most of the cost, 
programmers can usually do so manually.
3) Programmers could still disable the checks entirely wherever 
they consider the performance gain worth the damage done to 
correctness/reliability.
4) Outside of hot code, the cost isn't significant.

You're picking on (0), but the validity of my claim that checked 
arithmetic by default wouldn't negatively impact performance much 
mainly depends upon the truth of (4) plus either the truth of 
(1), or the willingness and ability of programmers to take 
advantage of (2) and (3).

Mar 29 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/29/21 3:25 PM, tsbockman wrote:
 On Monday, 29 March 2021 at 16:41:12 UTC, Andrei Alexandrescu wrote:
 Checked code is larger, meaning more pressure on the scarce
 I-cache in large programs - and that's not going to be visible
 in microbenchmarks.

 
 This is true. But, at the moment I don't have an easy way to quantify 
 the size of that effect.

You actually do. Apply the scientific method.

This is not a new idea, most definitely has been around for years and 
people have tried a variety of things. So all you need to do is search 
around scholar.google.com for papers on the topic and plain google.com 
for other work on the topic. In a couple of minutes I found:

* https://dl.acm.org/doi/abs/10.1145/2743019 - relatively recent, quotes 
a lot of other work. A good starting point.

* -ftrapv and -fwrapv flags in gcc: 
https://gcc.gnu.org/onlinedocs/gcc-4.0.2/gcc/Code-Gen-Options.html. This 
is not quite what you're looking for (they just crash the program on 
overflow), but it's good to figure how much demand there is and how 
people use those flags.

* How popular is automated/manual overflow check in systems languages? 
Rust is a stickler for safety and it has explicit operations that check: 
https://stackoverflow.com/questions/52646755/checking-for-integer-overflow-in-rust. 
I couldn't find any proposal for C or C++. What does this lack of 
evidence suggest? etc.

Mar 29 2021

tsbockman <thomas.bockman gmail.com> writes:

On Tuesday, 30 March 2021 at 01:09:12 UTC, Andrei Alexandrescu 
wrote:
 * https://dl.acm.org/doi/abs/10.1145/2743019 - relatively 
 recent, quotes a lot of other work. A good starting point.

I skimmed the paper, and from what I have seen so far it supports 
my understanding of the facts in every way. I intend to read it 
more carefully later this week and post a summary here of the 
most relevant bits, for the benefit of anyone who doesn't want to 
pay for it.

Of course, there is a subjective aspect to all of this as well; even 
with numbers in hand reasonable people may disagree as to what 
should be done about them.

Mar 30 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/29/2021 9:41 AM, Andrei Alexandrescu wrote:
 On 3/27/21 3:42 AM, tsbockman wrote:
 With good inlining and optimization, even a library solution generally slows 
 integer math code down by less than a factor of two. (I expect a language 
 solution could do even better.)

 This is significant, but nowhere near big enough to move the bottleneck in 
 most code away from I/O, memory, floating-point, or integer math for which 
 wrapping is semantically correct (like hashing or encryption). In those cases 
 where integer math code really is the bottleneck, there are often just a few 
 hot spots where the automatic checks in some inner loop need to be replaced 
 with manual checks outside the loop.

 
 This claim seems speculative. A factor of two for a fundamental class of 
 operations is very large, not just "significant". We're talking about e.g. 1 
 cycle for addition, and it was a big deal when it was introduced back in the 
 early 2000s. Checked code is larger, meaning more pressure on the scarce I-cache 
 in large programs - and that's not going to be visible in microbenchmarks. And 
 "I/O is slow anyway" is exactly what drove the development of C++ 
 catastrophically slow iostreams.

With the LEA instruction, which can do adds and some multiplies in one 
operation, this calculation often comes at zero cost, as it uses the address 
calculation logic that runs in parallel.

LEA does not set any flags or include any overflow detection logic.

Just removing that optimization will result in significant slowdowns.

Yes, bugs happen because of overflows. The worst consequence of this is memory 
corruption bugs in the form of undersized allocations and subsequent buffer 
overflows (from malloc(numElems * sizeElem)). But D's buffer overflow protection 
features mitigate this.
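
For code that does call malloc directly, the size computation is one of the places where an explicit check is cheap. A minimal sketch, assuming a hypothetical helper named `allocArray`; `mulu` is the unsigned checked multiply from druntime's `core.checkedint`:

```d
import core.checkedint : mulu;
import core.stdc.stdlib : free, malloc;

// Hypothetical helper: returns null instead of an undersized block
// when numElems * sizeElem does not fit in size_t.
void* allocArray(size_t numElems, size_t sizeElem)
{
    bool overflow = false;
    const size_t numBytes = mulu(numElems, sizeElem, overflow);
    if (overflow)
        return null;
    return malloc(numBytes);
}

void main()
{
    // size_t.max * 2 cannot be represented, so no allocation is attempted.
    assert(allocArray(size_t.max, 2) is null);

    void* p = allocArray(16, 4);
    assert(p !is null);
    free(p);
}
```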

D's integral promotion rules (bytes and shorts are promoted to ints before doing 
arithmetic) get rid of the bulk of likely overflows. (It's ironic that the 
integral promotion rules are much maligned and considered a mistake, I don't 
share that opinion, and this is one of the reasons why.)
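
A small sketch of what the promotion rule does for narrow types (the variable names are illustrative only): the sum of two bytes is computed as an int, so it cannot overflow, and narrowing the result back to a byte has to be requested with a cast.

```d
void main()
{
    byte a = 100;
    byte b = 100;

    // Both operands are promoted to int before the addition.
    auto sum = a + b;
    static assert(is(typeof(sum) == int));
    assert(sum == 200);

    // Assigning the int result to a byte without a cast is rejected.
    static assert(!__traits(compiles, { byte narrowed = a + b; }));

    // An explicit cast truncates: 200 - 256 == -56.
    byte truncated = cast(byte)(a + b);
    assert(truncated == -56);
}
```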

In my experience, there are very few places in real code where overflow is a 
possibility. They usually come in the form of unexpected input, such as overly 
large files, or specially crafted malicious input. I've inserted checks in DMD's 
implementation where overflow is a risk.

Placing the burden of checks everywhere is a poor tradeoff.

It isn't even clear what the behavior on overflows should be. Error? Wraparound? 
Saturation? std.experimental.checkedint enables the user to make this decision 
on a case-by-case basis. The language properly defaults to the simplest and 
fastest choice - wraparound.
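
The three behaviors can be compared side by side in a short sketch. It assumes a compiler release where the module is importable as `std.checkedint` (older releases use `std.experimental.checkedint`); `Saturate` and `Throw` are two of the hooks that module provides.

```d
import std.checkedint : checked, Saturate, Throw;

void main()
{
    // Built-in int: wraparound.
    int wrapped = int.max;
    wrapped += 1;
    assert(wrapped == int.min);

    // Saturate hook: the result is clamped at the representable limit.
    auto saturated = checked!Saturate(int.max) + 1;
    assert(saturated == int.max);

    // Throw hook: the overflow is reported as an exception.
    bool threw = false;
    try
    {
        auto unused = checked!Throw(int.max) + 1;
    }
    catch (Throw.CheckFailure e)
    {
        threw = true;
    }
    assert(threw);
}
```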

BTW, Rust does have optional overflow protection, it's turned off for release 
builds. This is pretty good evidence the performance cost of such checks is not 
worth it. It also does not do integral promotion, so Rust code is far more 
vulnerable to overflows.

Mar 29 2021

Paul Backus <snarwin gmail.com> writes:

On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
 D's integral promotion rules (bytes and shorts are promoted to 
 ints before doing arithmetic) get rid of the bulk of likely 
 overflows. (It's ironic that the integral promotion rules are 
 much maligned and considered a mistake, I don't share that 
 opinion, and this is one of the reasons why.)

Well...sometimes they do:

     auto result = int.max + int.max;
     writeln(typeof(result).stringof); // int
     writeln(result); // -2

The main issue with D's integer promotion rules is that they're 
inconsistent. Sometimes truncating the result of an expression 
requires an explicit cast, and sometimes it doesn't.

Mar 29 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/29/2021 2:05 PM, Paul Backus wrote:
 On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
 D's integral promotion rules (bytes and shorts are promoted to ints before 
 doing arithmetic) get rid of the bulk of likely overflows. (It's ironic that 
 the integral promotion rules are much maligned and considered a mistake, I 
 don't share that opinion, and this is one of the reasons why.)

 
 Well...sometimes they do:
 
     auto result = int.max + int.max;
     writeln(typeof(result).stringof); // int
     writeln(result); // -2

I wrote "the bulk of", not "all"


 The main issue with D's integer promotion rules is that they're inconsistent. 
 Sometimes truncating the result of an expression requires an explicit cast, and 
 sometimes it doesn't.

Without an example, I don't know what you mean.

Mar 29 2021

tsbockman <thomas.bockman gmail.com> writes:

On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
 It isn't even clear what the behavior on overflows should be. 
 Error? Wraparound? Saturation?

It only seems unclear because you have accepted the idea that 
computer code "integer" operations may differ from mathematical 
integer operations in arbitrary ways. Otherwise, the algorithm is 
simple:

     if(floor(mathResult) <= codeResult && codeResult <= ceil(mathResult))
         return codeResult;
     else
         signalErrorSomehow();

Standard mathematical integer addition does not wrap around or 
saturate. When someone really wants an operation that wraps 
around or saturates (not just for speed's sake), then that is a 
different operation and should use a different name and/or 
type(s), to avoid sowing confusion and ambiguity throughout the 
codebase for readers and compilers.
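
For 32-bit addition, the rule above can be sketched by computing the mathematical result in a wider type, where the floor/ceil comparison reduces to an equality test. The name `addExact` is hypothetical, and widening to `long` is just one way to obtain the exact result; it is not how a compiler would necessarily implement the check.

```d
// Hypothetical helper: returns a + b only when the machine result
// equals the mathematical result, and signals an error otherwise.
int addExact(int a, int b)
{
    const long mathResult = long(a) + long(b); // exact: cannot overflow a long
    const int codeResult = a + b;              // wraps on overflow
    if (mathResult != codeResult)
        throw new Exception("int addition overflowed");
    return codeResult;
}

void main()
{
    assert(addExact(2, 3) == 5);

    bool threw = false;
    try
    {
        addExact(int.max, 1);
    }
    catch (Exception e)
    {
        threw = true;
    }
    assert(threw);
}
```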

All of the integer behavior that people complain about violates 
this in some way: wrapping overflow, incorrect signed-unsigned 
comparisons, confusing/inconsistent implicit conversion rules, 
undefined behavior of various more obscure operations for certain 
inputs, etc.

Mathematical integers are a more familiar, simpler, easier to 
reason about abstraction. When we use this abstraction, we can 
draw upon our understanding and intuition from our school days, 
use common mathematical laws and formulas with confidence, etc. 
Of course the behavior of the computer cannot fully match this 
infinite abstraction, but it could at least tell us when it is 
unable to do what was asked of it, instead of just silently doing 
something else.

Mar 29 2021

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Mar 29, 2021 at 10:47:49PM +0000, tsbockman via Digitalmars-d wrote:
 On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
 It isn't even clear what the behavior on overflows should be. Error?
 Wraparound? Saturation?

 
 It only seems unclear because you have accepted the idea that computer
 code "integer" operations may differ from mathematical integer
 operations in arbitrary ways.

The only thing at fault here is the name "integer". `int` in D is
defined to be a 32-bit machine word. The very specification of "32-bit"
already implies modulo 2^32. Meaning, this is arithmetic modulo 2^32,
this is NOT a mathematical infinite-capacity integer. Ditto for the
other built-in integral types. When you typed `int` you already signed
up for all of the "unintuitive" behaviour that has been the standard
behaviour of built-in machine words since the 70's and 80's.  They
*approximate* mathematical integers, but they are certainly NOT the same
thing as mathematical integers, and this is *by definition*.

If you want mathematical integers, you should be using std.bigint or
something similar instead.
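
A minimal sketch of the difference, using only the built-in `uint` type and Phobos's `std.bigint` (the variable names are illustrative):

```d
import std.bigint : BigInt;

void main()
{
    // Machine word: arithmetic modulo 2^32, so the increment wraps to 0.
    uint machine = uint.max;
    machine += 1;
    assert(machine == 0);

    // Arbitrary-precision integer: the same increment yields 2^32.
    auto big = BigInt(uint.max);
    big += 1;
    assert(big == 4_294_967_296L);
}
```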


 Otherwise, the algorithm is simple:
 
     if(floor(mathResult) <= codeResult && codeResult <= ceil(mathResult))
         return codeResult;
     else
         signalErrorSomehow();

Implementing such a scheme would introduce so much overhead that it
would render the `int` type essentially useless for systems programming.
Or for any application where performance is important, for that matter.


 Standard mathematical integer addition does not wrap around or
 saturate.  When someone really wants an operation that wraps around or
 saturates (not just for speed's sake), then that is a different
 operation and should use a different name and/or type(s), to avoid
 sowing confusion and ambiguity throughout the codebase for readers and
 compilers.

The meaning of +, -, *, /, % for built-in machine words has been the one
in modulo 2^n arithmetic since the early days when computers were first
invented.  This isn't going to change anytime soon in a systems
language.  It doesn't matter what you call them; if you don't like the
use of the symbols +, -, *, / for anything other than "standard
mathematical integers", make your own language and call them something
else. But they are the foundational hardware-supported operations upon
which more complex abstractions are built; without them, you wouldn't
even be capable of arithmetic in the first place.

It's unrealistic to impose pure mathematical definitions on
limited-precision hardware numbers.  Sooner or later, any programmer
must come to grips with what's actually implemented in hardware, not
what he imagines some ideal utopian hardware would implement.  It's like
people complaining that IEEE floats are "buggy" or otherwise behave in
strange ways.  That's because they're NOT mathematical real numbers.
But they *are* a useful approximation of mathematical real numbers -- if
used correctly.  That requires learning to work with what's implemented
in the hardware rather than imposing mathematical ideals on an
abstraction that requires laborious (i.e., inefficient) translations to
fit the ugly hardware reality.

If you don't like the "oddness" of hardware-implemented types, there's
always the option of using std.bigint, or software like Mathematica or
similar that frees you from needing to worry about the ugly realities of
the hardware. Just don't expect the same kind of performance you will
get by using the hardware types directly.


 All of the integer behavior that people complain about violates this
 in some way: wrapping overflow, incorrect signed-unsigned comparisons,
 confusing/inconsistent implicit conversion rules, undefined behavior
 of various more obscure operations for certain inputs, etc.
 
 Mathematical integers are a more familiar, simpler, easier to reason
 about abstraction. When we use this abstraction, we can draw upon our
 understanding and intuition from our school days, use common
 mathematical laws and formulas with confidence, etc. Of course the
 behavior of the computer cannot fully match this infinite abstraction,
 but it could at least tell us when it is unable to do what was asked
 of it, instead of just silently doing something else.

It's easy to invent idealized abstractions that are easy to reason
about, but which require unnatural contortions to implement efficiently
in hardware.  A programming language like D that claims to be a systems
programming language needs to be able to program the hardware directly,
not to impose some ideal abstractions that do not translate nicely to
hardware and that therefore require a lot of complexity on the part of
the compiler to implement, and on top of that incurs poor runtime
performance.

To quote Knuth:

	People who are more than casually interested in computers should
	have at least some idea of what the underlying hardware is like.
	Otherwise the programs they write will be pretty weird. -- D.
	Knuth

Again, if you expect mathematical integers, use std.bigint. Or MathCAD
or similar. The integral types defined in D are raw hardware types of
fixed bit length -- which by definition operate according to modulo 2^n
arithmetic. The "peculiarities" of the hardware types are inevitable,
and I seriously doubt this is going to change anytime in the foreseeable
future.  By using `int` instead of `BigInt`, the programmer has already
implicitly accepted the "weird" hardware behaviour, and must be prepared
to deal with the consequences.  Just as when you use `float` or `double`
you already signed up for IEEE semantics, like it or not. (I don't, but
I also recognize that it's unrealistic to expect the hardware type to
match up 100% with the mathematical ideal.) If you don't like that, use
one of the real arithmetic libraries out there that let you work with
"true" mathematical reals that aren't subject to the quirks of IEEE
floating-point numbers. Just don't expect anything that will be
competitive performance-wise.

Like I said, the only real flaw here is the choice of the name `int` for
a hardware type that's clearly NOT an unbounded mathematical integer.
It's too late to rename it now, but basically it should be thought of as
`intMod32bit` rather than `integerInTheMathematicalSense`. Once you
mentally translate `int` into "32-bit 2's-complement binary word in a
hardware register", everything else naturally follows.
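
To make the "32-bit 2's-complement binary word" reading concrete, here is a minimal sketch (not part of the original post; the helper name `wrappingAdd` is made up for illustration). It only relies on D's built-in `+` wrapping for `int` and `uint`:

```d
import std.stdio : writeln;

/// Hypothetical helper: adds two ints with the built-in `+`,
/// which wraps modulo 2^32 instead of signalling overflow.
int wrappingAdd(int a, int b) @safe pure nothrow @nogc
{
    return a + b;
}

void main() @safe
{
    // One past the largest int lands on the smallest int.
    assert(wrappingAdd(int.max, 1) == int.min);

    // uint behaves as plain modulo 2^32 arithmetic.
    uint u = 0;
    u -= 1;
    assert(u == uint.max);

    writeln(wrappingAdd(int.max, 1));
}
```

Read this way, none of the asserts is surprising; read as "mathematical integer", all of them are.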


T

-- 
They pretend to pay us, and we pretend to work. -- Russian saying

Mar 29 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/29/2021 5:02 PM, H. S. Teoh wrote:
 Like I said, the only real flaw here is the choice of the name `int` for
 a hardware type that's clearly NOT an unbounded mathematical integer.

You're right. It's not an integer, it's an int :-)

Besides, nobody is going to want to type `intMod32bit` every time they declare a 
variable. Heck, Rust chose `int32`, but that is of value only in the first 30 
seconds of learning Rust, and will be cursed forever after.

Mar 29 2021

tsbockman <thomas.bockman gmail.com> writes:

On Tuesday, 30 March 2021 at 00:02:54 UTC, H. S. Teoh wrote:
 If you want mathematical integers, you should be using 
 std.bigint or something similar instead.


 Otherwise, the algorithm is simple:
 
     if(floor(mathResult) <= codeResult && codeResult <= 
 ceil(mathResult))
         return codeResult;
     else
         signalErrorSomehow();

 Implementing such a scheme would introduce so much overhead 
 that it would render the `int` type essentially useless for 
 systems programming. Or for any application where performance 
 is important, for that matter.

You have a wildly exaggerated sense of the runtime performance 
cost of doing things the way I advocate if you think it is 
anywhere close to bigint.

My proposal (grossly oversimplified) is mostly just to check the 
built-in CPU overflow flags once in a while. I've actually tested 
this, and even with a library solution the overhead is low in 
most realistic scenarios, if the inliner and optimizer are 
effective. A language solution could do even better, I'm sure.
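
As a rough illustration of what "check the overflow flag once in a while" can look like in library form, here is a sketch using druntime's `core.checkedint`. The helper `mulAddChecked` is hypothetical and is not the implementation discussed in this thread; it only shows that the flag is sticky, so several operations can share a single check:

```d
import core.checkedint : adds, muls;

/// Hypothetical helper: computes a * b + c and reports whether any
/// step overflowed. The flag passed to adds/muls is only ever set,
/// never cleared, so one check at the end covers both operations.
int mulAddChecked(int a, int b, int c, out bool overflowed)
    @safe pure nothrow @nogc
{
    bool flag = false;
    const product = muls(a, b, flag);
    const sum = adds(product, c, flag);
    overflowed = flag;
    return sum;
}

void main() @safe
{
    bool ovf;
    assert(mulAddChecked(1000, 1000, 7, ovf) == 1_000_007 && !ovf);

    const r = mulAddChecked(int.max, 2, 0, ovf);
    assert(ovf); // r is not meaningful once the flag is set
}
```

How cheaply this compiles down depends on the compiler and on inlining, which is exactly the part that has to be measured rather than assumed.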

Mar 29 2021

Max Samukha <maxsamukha gmail.com> writes:

On Tuesday, 30 March 2021 at 00:02:54 UTC, H. S. Teoh wrote:

 Just as when you use `float` or `double` you already signed up 
 for IEEE semantics, like it or not. (I don't, but I also 
 recognize that it's unrealistic to expect the hardware type to 
 match up 100% with the mathematical ideal.) If you don't like 
 that, use one of the real arithmetic libraries out there that 
 let you work with "true" mathematical reals that aren't subject 
 to the quirks of IEEE floating-point numbers. Just don't expect 
 anything that will be competitive performance-wise.

It seems you are arguing against the way D broke compile time 
floats and doubles. )

Mar 29 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/29/2021 10:53 PM, Max Samukha wrote:
 On Tuesday, 30 March 2021 at 00:02:54 UTC, H. S. Teoh wrote:
 
 Just as when you use `float` or `double` you already signed up for IEEE 
 semantics, like it or not. (I don't, but I also recognize that it's 
 unrealistic to expect the hardware type to match up 100% with the mathematical 
 ideal.) If you don't like that, use one of the real arithmetic libraries out 
 there that let you work with "true" mathematical reals that aren't subject to 
 the quirks of IEEE floating-point numbers. Just don't expect anything that 
 will be competitive performance-wise.

 
 It seems you are arguing against the way D broke compile time floats and doubles. )

Compile-time isn't a run-time performance issue.

Mar 29 2021

Max Haughton <maxhaton gmail.com> writes:

On Tuesday, 30 March 2021 at 06:43:04 UTC, Walter Bright wrote:
 On 3/29/2021 10:53 PM, Max Samukha wrote:
 On Tuesday, 30 March 2021 at 00:02:54 UTC, H. S. Teoh wrote:
 
 Just as when you use `float` or `double` you already signed 
 up for IEEE semantics, like it or not. (I don't, but I also 
 recognize that it's unrealistic to expect the hardware type 
 to match up 100% with the mathematical ideal.) If you don't 
 like that, use one of the real arithmetic libraries out there 
 that let you work with "true" mathematical reals that aren't 
 subject to the quirks of IEEE floating-point numbers. Just 
 don't expect anything that will be competitive 
 performance-wise.

 
 It seems you are arguing against the way D broke compile time 
 floats and doubles. )

 Compile-time isn't a run-time performance issue.

On the subject of run-time performance, checkedint can also do 
things like Saturation arithmetic, which can be accelerated using 
increasingly common native instructions (e.g. AVX on Intel, AMD, 
and presumably Via also). I have done some tests and found that 
these are not currently used. ARM also has saturating 
instructions but I haven't done any tests.

Due to AVX being a SIMD instruction set there is a tradeoff to 
using them for scalar operations, however for loops the 
proposition seems attractive. The calculus to do this seems 
non-trivial for the backend however.

(AVX instructions are also quite big, so there is the usual I$ 
hit here too.)
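
For readers who have not used the saturating hook, a minimal scalar sketch follows. It assumes a Phobos release where the module is importable as `std.checkedint` (on releases where it still lives under `std.experimental`, adjust the import), and the helper `saturatingAdd` is made up for illustration. It shows the semantics only and says nothing about whether a backend maps it to SIMD instructions:

```d
import std.checkedint : checked, Saturate;

/// Hypothetical helper: adds two ints, clamping to int.min/int.max
/// instead of wrapping when the exact result does not fit.
int saturatingAdd(int a, int b)
{
    return (checked!Saturate(a) + b).get;
}

void main()
{
    auto x = checked!Saturate(int.max);
    ++x;                       // would overflow, so it stays at the bound
    assert(x == int.max);

    x = int.min;
    x -= 42;                   // clamps at the lower bound
    assert(x == int.min);
    assert(x * -2 == int.max); // exact result is 2^32, clamped to int.max

    assert(saturatingAdd(int.max, 1) == int.max);
}
```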

Mar 30 2021

Bruce Carneal <bcarneal gmail.com> writes:

On Tuesday, 30 March 2021 at 08:48:04 UTC, Max Haughton wrote:
 On Tuesday, 30 March 2021 at 06:43:04 UTC, Walter Bright wrote:
 On 3/29/2021 10:53 PM, Max Samukha wrote:
 On Tuesday, 30 March 2021 at 00:02:54 UTC, H. S. Teoh wrote:
 
 




[...]
 On the subject of run-time performance, checkedint can also do 
 things like Saturation arithmetic, which can be accelerated 
 using increasingly common native instructions (e.g. AVX on 
 Intel, AMD, and presumably Via also).

[...]
 (AVX instructions are also quite big, so there is the usual I$ 
 hit here too.)

Some micro-architectures employ an L0/uOp cache, which can 
significantly alter the I$ performance calculus within loops.  To 
confidently identify an I$ performance bottleneck I think you'd 
need to use perf analysis tools. IIRC Max recommended this at 
Beerconf.

Side note: the checkedint code sure looks nice.  It's a very 
readable example of the leverage D affords.

Mar 30 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 6:33 AM, Bruce Carneal wrote:
 Side note: the checkedint code sure looks nice.  It's a very readable example of 
 the leverage D affords.

Yes, that's also why I want it to have more visibility by being in Phobos.

Mar 30 2021

Max Samukha <maxsamukha gmail.com> writes:

On Tuesday, 30 March 2021 at 06:43:04 UTC, Walter Bright wrote:

 Compile-time isn't a run-time performance issue.

Performance is irrelevant to the fact that D frivolously violates 
basic assumptions about float/double at compile-time.

Mar 31 2021

Max Haughton <maxhaton gmail.com> writes:

On Wednesday, 31 March 2021 at 11:18:05 UTC, Max Samukha wrote:
 On Tuesday, 30 March 2021 at 06:43:04 UTC, Walter Bright wrote:

 Compile-time isn't a run-time performance issue.

 Performance is irrelevant to the fact that D frivolously 
 violates basic assumptions about float/double at compile-time.

Like?

Mar 31 2021

Timon Gehr <timon.gehr gmx.ch> writes:

On 31.03.21 13:18, Max Samukha wrote:
 On Tuesday, 30 March 2021 at 06:43:04 UTC, Walter Bright wrote:
 
 Compile-time isn't a run-time performance issue.

 
 Performance is irrelevant to the fact that D frivolously violates basic 
 assumptions about float/double at compile-time.

Not just at compile time, but it's less noticeable at runtime because 
compilers usually choose to do the right thing anyway.

Apr 01 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/29/2021 3:47 PM, tsbockman wrote:
 On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
 It isn't even clear what the behavior on overflows should be. Error? 
 Wraparound? Saturation?

 
 It only seems unclear because you have accepted the idea that computer code 
 "integer" operations may differ from mathematical integer operations in 
 arbitrary ways.

Programmers need to accept that computer math is different in arbitrary ways. 
Not accepting it means a lifetime of frustration, because it cannot be the same.

 Otherwise, the algorithm is simple:
 
     if(floor(mathResult) <= codeResult && codeResult <= ceil(mathResult))
         return codeResult;
     else
         signalErrorSomehow();

Some of the SIMD arithmetic instructions use saturation arithmetic. It is 
definitely a thing, and Intel found it profitable to add hardware support for it.


 Standard mathematical integer addition does not wrap around or saturate. When 
 someone really wants an operation that wraps around or saturates (not just for 
 speed's sake), then that is a different operation and should use a different 
 name and/or type(s), to avoid sowing confusion and ambiguity throughout the 
 codebase for readers and compilers.

That's what std.experimental.checkedint does.


 All of the integer behavior that people complain about violates this in some 
 way: wrapping overflow, incorrect signed-unsigned comparisons, 
 confusing/inconsistent implicit conversion rules,

The integral promotion rules have been standard practice for 40 years. It takes 
two sentences to describe them accurately. Having code that looks like C but 
behaves differently will be *worse*.


 undefined behavior of various 
 more obscure operations for certain inputs, etc.

Offhand, I can't think of any.


 Mathematical integers are a more familiar, simpler, easier to reason about 
 abstraction. When we use this abstraction, we can draw upon our understanding 
 and intuition from our school days, use common mathematical laws and formulas 
 with confidence, etc. Of course the behavior of the computer cannot fully match 
 this infinite abstraction, but it could at least tell us when it is unable to do 
 what was asked of it, instead of just silently doing something else.

These things all come at a cost. The cost is higher than the benefit.

Having D generate overflow checks on all adds and multiplies will immediately 
make D uncompetitive with C, C++, Rust, Zig, Nim, etc.

Mar 29 2021

tsbockman <thomas.bockman gmail.com> writes:

On Tuesday, 30 March 2021 at 00:33:13 UTC, Walter Bright wrote:
 Having D generate overflow checks on all adds and multiplies 
 will immediately make D uncompetitive with C, C++, Rust, Zig, 
 Nim, etc.

As someone else shared earlier in this thread, Zig already 
handles this in pretty much exactly the way I argue for:
     https://ziglang.org/documentation/master/#Integer-Overflow

Mar 29 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/29/2021 6:29 PM, tsbockman wrote:
 On Tuesday, 30 March 2021 at 00:33:13 UTC, Walter Bright wrote:
 Having D generate overflow checks on all adds and multiplies will immediately 
 make D uncompetitive with C, C++, Rust, Zig, Nim, etc.

 
 As someone else shared earlier in this thread, Zig already handles this in 
 pretty much exactly the way I argue for:
     https://ziglang.org/documentation/master/#Integer-Overflow

I amend my statement to "immediately make D as uncompetitive as Zig is"

Note that Zig has a very different idea of integers than D does. It has 
arbitrary bit width integers, up to 65535. This seems odd, as what are you going 
to do with a 6 bit integer? There aren't machine instructions to support it. 
It'd be better off with a ranged integer, say:

    i : int 0..64

Mar 29 2021

Jacob Carlborg <doob me.com> writes:

On Tuesday, 30 March 2021 at 03:31:05 UTC, Walter Bright wrote:

 Note that Zig has a very different idea of integers than D 
 does. It has arbitrary bit width integers, up to 65535. This 
 seems odd, as what are you going to do with a 6 bit integer? 
 There aren't machine instructions to support it. It'd be better 
 off with a ranged integer, say:

    i : int 0..64

The question is then, does that mean that Zig has over 131070 
keywords (65535  for signed and unsigned each)? :D. Or does it 
reserve anything that starts with i/u followed by numbers? Kind 
of like how D reserves identifiers starting with two underscores.

--
/Jacob Carlborg

Mar 30 2021

Rumbu <rumbu rumbu.ro> writes:

On Tuesday, 30 March 2021 at 15:28:04 UTC, Jacob Carlborg wrote:
 On Tuesday, 30 March 2021 at 03:31:05 UTC, Walter Bright wrote:

 Note that Zig has a very different idea of integers than D 
 does. It has arbitrary bit width integers, up to 65535. This 
 seems odd, as what are you going to do with a 6 bit integer? 
 There aren't machine instructions to support it. It'd be 
 better off with a ranged integer, say:

    i : int 0..64

 The question is then, does that mean that Zig has over 131070 
 keywords (65535  for signed and unsigned each)? :D. Or does it 
 reserve anything that starts with i/u followed by numbers? Kind 
 of like how D reserves identifiers starting with two underscores.

 --
 /Jacob Carlborg

In Zig, integer type names are not considered keywords, e.g. you 
can use i7 as a variable name or i666 as a function name.

But you cannot define new types with this pattern, you get an 
error message stating that "Type 'i?' is shadowing primitive type 
'i?'".

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Tuesday, 30 March 2021 at 03:31:05 UTC, Walter Bright wrote:
 On 3/29/2021 6:29 PM, tsbockman wrote:
 On Tuesday, 30 March 2021 at 00:33:13 UTC, Walter Bright wrote:
 Having D generate overflow checks on all adds and multiplies 
 will immediately make D uncompetitive with C, C++, Rust, Zig, 
 Nim, etc.

 
 As someone else shared earlier in this thread, Zig already 
 handles this in pretty much exactly the way I argue for:
     https://ziglang.org/documentation/master/#Integer-Overflow

 I amend my statement to "immediately make D as uncompetitive as 
 Zig is"

So you're now dismissing Zig as slow because its feature set 
surprised you? No real-world data is necessary? No need to 
understand any of Zig's relevant optimizations or options?

Mar 30 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 10:09 AM, tsbockman wrote:
 So you're now dismissing Zig as slow because its feature set surprised you?

Because it surprised me? No. Because if someone had figured out a way to do 
overflow checks for no runtime costs, it would be in every language. I know Rust 
tried pretty hard to do it.


 No real-world data is necessary? No need to understand any of Zig's relevant 
 optimizations or options?

I don't have to test a brick to assume it won't fly. But I could be wrong, 
definitely. If you can prove me wrong in my presumption, I'm listening.

P.S. Yes, I know anything will "fly" if you attach enough horsepower to it. But 
there's a reason airplanes don't look like bricks.

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Tuesday, 30 March 2021 at 17:53:37 UTC, Walter Bright wrote:
 On 3/30/2021 10:09 AM, tsbockman wrote:
 So you're now dismissing Zig as slow because its feature set 
 surprised you?

 Because it surprised me? No. Because if someone had figured out 
 a way to do overflow checks for no runtime costs, it would be 
 in every language. I know Rust tried pretty hard to do it.

Zero runtime cost is not a reasonable standard unless the feature 
is completely worthless and it cannot be turned off.

 No real-world data is necessary? No need to understand any of 
 Zig's relevant optimizations or options?

 I don't have to test a brick to assume it won't fly. But I 
 could be wrong, definitely. If you can prove me wrong in my 
 presumption, I'm listening.

Since I have already been criticized for the use of 
micro-benchmarks, I assume that only data from complete practical 
applications will satisfy.

Unfortunately, the idiomatic C, C++, D, and Rust source code all 
omit the information required to perform such tests. Simply 
flipping compiler switches (the -ftrapv and -fwrapv flags in gcc 
Andrei mentioned earlier) won't work, because most high 
performance code contains some deliberate and correct examples of 
wrapping overflow, signed-unsigned reinterpretation, etc.

Idiomatic Zig code (probably Ada, too) does contain this 
information. But, the selection of "real world" open source Zig 
code available for testing is limited right now, since Zig hasn't 
stabilized the language or the standard library yet.

The best test subject I have found, compiled, and run 
successfully is this:
     https://github.com/Vexu/arocc
It's an incomplete C compiler: "Right now preprocessing and 
parsing is mostly done but anything beyond that is missing." I 
believe compilation is a fairly integer-intensive workload, so 
the results should be meaningful.

To test, I took the C source code of gzip and duplicated its 
contents many times until I got the arocc wall time up to about 1 
second. (The final input file is 37.5 MiB.) arocc outputs a long 
stream of error messages to stderr, whose contents aren't 
important for our purposes.

In order to minimize the time consumed by I/O, I run each test 
several times in a row and ignore the early runs, to ensure that 
the input file is cached in RAM by the OS, and pipe the output of 
arocc (both stdout and stderr) to /dev/null.

Results with -O ReleaseSafe (optimizations on, with checked 
integer arithmetic, bounds checks, null checks, etc.):
     Binary size: 2.0 MiB
     Wall clock time: 1.31s
     System time: 0.71s
     User time: 0.60s
     CPU usage: 99% of a single core

Results with -O ReleaseFast (optimizations on, with safety checks 
off):
     Binary size: 2.3 MiB
     Wall clock time: 1.15s
     System time: 0.68s
     User time: 0.46s
     CPU usage: 99% of a single core

So, in this particular task ReleaseSafe (which checks for a lot 
of other things, not just integer overflow) takes 14% longer than 
ReleaseFast. If you only care about user time, that is 48% longer.

Last time I checked, these numbers are similar to the performance 
difference between optimized builds by DMD and LDC/GDC. They are 
also similar to the performance differences within related 
[...] Ada/C in language comparison benchmarks like:
     https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/cpp.html

Note also that with Zig's approach, paying the modest performance 
penalty for the various safety checks is *completely optional* in 
release builds (just like D's bounds checking). Even for 
applications where that final binary order of magnitude of speed 
is considered essential in production, Zig's approach still leads 
to clearer, easier to debug code.

So, unless DMD (or C itself!) is "a brick" that "won't fly", your 
claim that this is something that a high performance systems 
programming language just cannot do is not grounded in reality.

Mar 30 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 4:01 PM, tsbockman wrote:
 So, in this particular task ReleaseSafe (which checks for a lot of other things, 
 not just integer overflow) takes 14% longer than ReleaseFast. If you only care 
 about user time, that is 48% longer.

Thank you for running benchmarks.

14% is a big deal.

 Last time I checked, these numbers are similar to the performance difference 
 between optimized builds by DMD and LDC/GDC. They are also similar to the 
 [...] Ada/C in language comparison benchmarks like:
 https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/cpp.html
 
 Note also that with Zig's approach, paying the modest performance penalty for 
 the various safety checks is *completely optional* in release builds (just like 
 D's bounds checking). Even for applications where that final binary order of 
 magnitude of speed is considered essential in production, Zig's approach still 
 leads to clearer, easier to debug code.

The problem with turning it off for production code is that the overflows tend 
to be rare and not encountered during testing. When you need it, it is disabled.

Essentially, turning it off for release code is an admission that it is too 
expensive.

Note that D's bounds checking is *not* turned off in release mode. It has a 
separate switch to turn that off, and I recommend only using it to see how much 
performance it'll cost for a particular application.

 So, unless DMD (or C itself!) is "a brick" that "won't fly", your claim that 
 this is something that a high performance systems programming language just 
 cannot do is not grounded in reality.

I didn't say cannot. I said it would make it uncompetitive.

Overflow checking would be nice to have. But it is not worth the cost for D. I 
also claim that D code is much less likely to suffer from overflows because of 
the implicit integer promotion rules. Adding two shorts is never going to 
overflow, for example, and D won't let you naively assign the resulting int back 
to a short.

One could legitimately claim that D *does* have a form of integer overflow 
protection in the form of Value Range Propagation (VRP). Best of all, VRP comes 
for free at zero runtime cost!

P.S. I know you know this, due to your good work on VRP :-) but I mention it for 
the other readers.

P.P.S. So why is this claim not made for C? Because:

     short s, t, u;
     s = t + u;

compiles without complaint in C, but will fail to compile in D. C doesn't have VRP.
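
For the other readers, a small sketch of the promotion and VRP behaviour described above (the helper names `sumOfShorts` and `lowByte` are made up for illustration):

```d
/// Hypothetical helper: both shorts are promoted to int before the add,
/// so the sum cannot overflow for any pair of inputs.
int sumOfShorts(short a, short b) @safe pure nothrow @nogc
{
    static assert(is(typeof(a + b) == int));
    return a + b;
}

/// Hypothetical helper: VRP proves that `i & 0xFF` fits in a ubyte,
/// so no cast is needed for the narrowing conversion.
ubyte lowByte(int i) @safe pure nothrow @nogc
{
    return i & 0xFF;
}

void main() @safe
{
    assert(sumOfShorts(short.max, short.max) == 65_534);
    assert(lowByte(0x1234) == 0x34);

    // Assigning the promoted int sum back to a short is a compile error.
    static assert(!__traits(compiles, { short s, t, u; s = t + u; }));

    // An explicit cast states that truncation is intended.
    short t = 30_000, u = 30_000;
    short s = cast(short)(t + u);
    assert(s == cast(short) 60_000);
}
```

Note that this covers narrowing conversions only; an `int + int` that exceeds `int.max` still wraps silently.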

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Wednesday, 31 March 2021 at 01:43:50 UTC, Walter Bright wrote:
 Thank you for running benchmarks.

 14% is a big deal.

Note that I deliberately chose an integer-intensive workload, and 
artificially sped up the I/O to highlight the performance cost. 
For most real-world applications, the cost is actually *much* 
lower. The paper Andrei linked earlier has a couple of examples:

     Checked Apache httpd is less than 0.1% slower than unchecked.
     Checked OpenSSH file copy is about 7% slower than unchecked.

https://dl.acm.org/doi/abs/10.1145/2743019

 The problem with turning it off for production code is that the 
 overflows tend to be rare and not encountered during testing. 
 When you need it, it is disabled.

Only if you choose to disable it. Just because you think it's not 
worth the cost doesn't mean everyone, or even most people, would 
turn it off.

 Essentially, turning it off for release code is an admission 
 that it is too expensive.

It's an admission that it's too expensive *for some 
applications*, not in general. D's garbage collector is too 
expensive for some applications, but that doesn't mean it should 
be removed from the language, nor even disabled by default.

 Note that D's bounds checking is *not* turned off in release 
 mode. It has a separate switch to turn that off, and I 
 recommend only using it to see how much performance it'll cost 
 for a particular application.

That's exactly how checked arithmetic, bounds checking, etc. 
works in Zig. What do you think the difference is, other than 
your arbitrary assertion that checked arithmetic costs more than 
it's worth?

 I said it would make it uncompetitive.

The mean performance difference between C and C++ in the 
(admittedly casual) comparative benchmarks I cited is 36%. Is C 
uncompetitive with C++? What definition of "uncompetitive" are 
you using?

 Overflow checking would be nice to have. But it is not worth 
 the cost for D. I also claim that D code is much less likely to 
 suffer from overflows...

Yes, D is better than C in this respect (among many others).

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/30/21 11:07 PM, tsbockman wrote:
 On Wednesday, 31 March 2021 at 01:43:50 UTC, Walter Bright wrote:
 Thank you for running benchmarks.

 14% is a big deal.

 
 Note that I deliberately chose an integer-intensive workload, and 
 artificially sped up the I/O to highlight the performance cost.

Idea: build dmd with -ftrapv (which is supported, I think, by gdc and 
ldc) and compare performance. That would be truly interesting.

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/30/21 7:01 PM, tsbockman wrote:
 Simply flipping compiler switches (the -ftrapv and -fwrapv flags in gcc 
 Andrei mentioned earlier) won't work, because most high performance code 
 contains some deliberate and correct examples of wrapping overflow, 
 signed-unsigned reinterpretation, etc.
 
 Idiomatic Zig code (probably Ada, too) does contain this information. 
 But, the selection of "real world" open source Zig code available for 
 testing is limited right now, since Zig hasn't stabilized the language 
 or the standard library yet.

That's awfully close to "No true Scotsman".

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Wednesday, 31 March 2021 at 03:32:40 UTC, Andrei Alexandrescu 
wrote:
 On 3/30/21 7:01 PM, tsbockman wrote:
 Simply flipping compiler switches (the -ftrapv and -fwrapv 
 flags in gcc Andrei mentioned earlier) won't work, because 
 most high performance code contains some deliberate and 
 correct examples of wrapping overflow, signed-unsigned 
 reinterpretation, etc.
 
 Idiomatic Zig code (probably Ada, too) does contain this 
 information. But, the selection of "real world" open source 
 Zig code available for testing is limited right now, since Zig 
 hasn't stabilized the language or the standard library yet.

 That's awfully close to "No true Scotsman".

Just tossing out names of fallacies isn't really very helpful if 
you don't explain why you think it may apply here.

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 12:32 AM, tsbockman wrote:
 On Wednesday, 31 March 2021 at 03:32:40 UTC, Andrei Alexandrescu wrote:
 On 3/30/21 7:01 PM, tsbockman wrote:
 Simply flipping compiler switches (the -ftrapv and -fwrapv flags in 
 gcc Andrei mentioned earlier) won't work, because most high 
 performance code contains some deliberate and correct examples of 
 wrapping overflow, signed-unsigned reinterpretation, etc.

 Idiomatic Zig code (probably Ada, too) does contain this information. 
 But, the selection of "real world" open source Zig code available for 
 testing is limited right now, since Zig hasn't stabilized the 
 language or the standard library yet.

 That's awfully close to "No true Scotsman".

 
 Just tossing out names of fallacies isn't really very helpful if you 
 don't explain why you think it may apply here.

I thought it's fairly clear - the claim is non-falsifiable: if code is 
faster without checks, it is deemed so on account of tricks. Code 
without checks could benefit of other, better tricks, but their absence 
is explained by the small size of the available corpus.

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 12:47 AM, Andrei Alexandrescu wrote:
 On 3/31/21 12:32 AM, tsbockman wrote:
 On Wednesday, 31 March 2021 at 03:32:40 UTC, Andrei Alexandrescu wrote:
 On 3/30/21 7:01 PM, tsbockman wrote:
 Simply flipping compiler switches (the -ftrapv and -fwrapv flags in 
 gcc Andrei mentioned earlier) won't work, because most high 
 performance code contains some deliberate and correct examples of 
 wrapping overflow, signed-unsigned reinterpretation, etc.

 Idiomatic Zig code (probably Ada, too) does contain this 
 information. But, the selection of "real world" open source Zig code 
 available for testing is limited right now, since Zig hasn't 
 stabilized the language or the standard library yet.

 That's awfully close to "No true Scotsman".

 Just tossing out names of fallacies isn't really very helpful if you 
 don't explain why you think it may apply here.

 
 I thought it's fairly clear - the claim is non-falsifiable: if code is 
 faster without checks, it is deemed so on account of tricks. Code 
 without checks could benefit of other, better tricks, but their absence 
 is explained by the small size of the available corpus.


s/Code without checks could benefit of other/Code with checks could 
benefit of other/

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Wednesday, 31 March 2021 at 04:49:01 UTC, Andrei Alexandrescu 
wrote:
 On 3/31/21 12:47 AM, Andrei Alexandrescu wrote:
 On 3/31/21 12:32 AM, tsbockman wrote:
 On Wednesday, 31 March 2021 at 03:32:40 UTC, Andrei 
 Alexandrescu wrote:
 On 3/30/21 7:01 PM, tsbockman wrote:
 Simply flipping compiler switches (the -ftrapv and -fwrapv 
 flags in gcc Andrei mentioned earlier) won't work, because 
 most high performance code contains some deliberate and 
 correct examples of wrapping overflow, signed-unsigned 
 reinterpretation, etc.

 Idiomatic Zig code (probably Ada, too) does contain this 
 information. But, the selection of "real world" open source 
 Zig code available for testing is limited right now, since 
 Zig hasn't stabilized the language or the standard library 
 yet.

 That's awfully close to "No true Scotsman".

 Just tossing out names of fallacies isn't really very helpful 
 if you don't explain why you think it may apply here.

 I thought it's fairly clear


Thank you for explaining anyway.

 - the claim is non-falsifiable: if code is faster without
 checks, it is deemed so on account of tricks.


I've never disputed at any point that unchecked code is, by 
nature, almost always faster than checked code - albeit often not 
by much. I haven't attributed unchecked code's speed advantage to 
"tricks" anywhere.

 Code without checks could benefit of other, better
 tricks, but their absence is explained by the small size of the
 available corpus.

 s/Code without checks could benefit of other/Code with checks 
 could benefit of other/

While I think it is true that "better tricks" can narrow the 
performance gap between checked and unchecked code, that is not 
at all what I was talking about in the paragraphs you 
labeled "No true Scotsman".

Consider a C++ program similar to the following D program:

/////////////////////////////////////
module app;

import std.stdio : writeln, readln;
import std.conv : parse;

N randLCG(N)() @safe
     if(is(N == int) || is(N == uint))
{
     static N state = N(211210973);
     // "Numerical Recipes" linear congruential generator;
     // this can and should wrap:
     return (state = N(1664525) * state + N(1013904223));
}

double testDivisor(N, N divisor)(const(ulong) trials) @safe
     if(is(N == int) || is(N == uint))
{
     N count = 0;
     foreach(n; 0 .. trials)
         // can, but should *not* wrap:
         count += (randLCG!N() % divisor) == N(0);
     return count / real(trials);
}

void main() {
     string input = readln();
     const trials = parse!ulong(input);
     writeln(testDivisor!( int, 3)(trials));
     writeln(testDivisor!(uint, 3)(trials));
}
/////////////////////////////////////

randLCG!int requires -fwrapv and NOT -ftrapv to work as 
intended.
randLCG!uint works correctly no matter what.
testDivisor!( int, 3) requires -ftrapv and NOT -fwrapv to detect 
unintended overflows.
testDivisor!(uint, 3) is always vulnerable to unintended 
overflow, with or without -ftrapv.

So, neither -ftrapv nor -fwrapv causes an idiomatic C++ program 
to detect unintended overflows without false positives in the 
general case. The compiler simply doesn't have enough information 
available to do so, regardless of how much performance we are 
willing to sacrifice. Instead, the source code of a C++ program 
must first be modified by a real human being to make it 
compatible with either -ftrapv or -fwrapv (which are mutually 
exclusive).

The paper you linked earlier mentions this problem: "Finally 
often integer overflows are known to be intentional or the 
programmer has investigated it and determined it to be 
acceptable. To address these use cases while still being useful 
in reporting undesired integer overflows, a whitelist 
functionality was introduced to enable users to specify certain 
files or functions that should not be checked.
...
Second, our methodology for distinguishing intentional from 
unintentional uses of wraparound is manual and subjective. The 
manual effort required meant that we could only study a subset of 
the errors..."
(See sections 5.3 and 6.1 of 
https://dl.acm.org/doi/abs/10.1145/2743019)

Idiomatic Zig code already contains the information which the 
researchers on that paper had to manually insert for all the 
C/C++ code they tested. That is why my tests were limited to Zig, 
because I don't have the time or motivation to go and determine 
whether each and every potential overflow in GCC or Firefox or 
whatever is intentional, just so that I can benchmark them with 
-ftrapv enabled.
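
As a rough illustration of what "containing this information" can look 
like in D itself, here is a minimal sketch. It assumes druntime's 
core.checkedint (adds) for the one operation that should not wrap, and 
leaves the generator as plain uint arithmetic, which D defines as 
wrapping. The name countMultiples is made up for this example, and this 
is not a line-for-line port of the program above.

```d
import core.checkedint : adds;

// Intentional wraparound: unsigned arithmetic in D is defined to wrap,
// so no flag or annotation is needed here.
uint randLCG() @safe nothrow @nogc
{
    static uint state = 211_210_973;
    // "Numerical Recipes" linear congruential generator
    state = 1_664_525u * state + 1_013_904_223u;
    return state;
}

// Unintended overflow: the counter goes through adds(), which sets
// `overflow` instead of silently wrapping.
// countMultiples is a hypothetical name used only in this sketch.
int countMultiples(ulong trials, uint divisor, ref bool overflow) @safe nothrow @nogc
{
    int count = 0;
    foreach (_; 0 .. trials)
        count = adds(count, randLCG() % divisor == 0 ? 1 : 0, overflow);
    return count;
}
```

The intent lives in the source, per operation, so no global compiler 
switch has to guess which overflows are deliberate.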

Mar 31 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/30/21 1:09 PM, tsbockman wrote:
 On Tuesday, 30 March 2021 at 03:31:05 UTC, Walter Bright wrote:
 On 3/29/2021 6:29 PM, tsbockman wrote:
 On Tuesday, 30 March 2021 at 00:33:13 UTC, Walter Bright wrote:
 Having D generate overflow checks on all adds and multiplies will 
 immediately make D uncompetitive with C, C++, Rust, Zig, Nim, etc.

 As someone else shared earlier in this thread, Zig already handles 
 this in pretty much exactly the way I argue for:
     https://ziglang.org/documentation/master/#Integer-Overflow

 I amend my statement to "immediately make D as uncompetitive as Zig is"

 
 So you're now dismissing Zig as slow because its feature set surprised 
 you? No real-world data is necessary? No need to understand any of Zig's 
 relevant optimizations or options?

Instead of passing the burden of proof back and forth, some evidence 
would be welcome. I know nothing about Zig so e.g. I couldn't tell how 
accurate its claims are: https://news.ycombinator.com/item?id=21117669

FWIW I toyed with this but don't know what optimization flags zig takes: 
https://godbolt.org/z/vKds1c8WY

Mar 30 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 03:30:00 UTC, Andrei Alexandrescu 
wrote:
 FWIW I toyed with this but don't know what optimization flags 
 zig takes: https://godbolt.org/z/vKds1c8WY

Typing --help in the flags box answers that question :) And the 
answer is "-O ReleaseFast":
https://godbolt.org/z/1WK6W7TM9

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/30/21 11:40 PM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 03:30:00 UTC, Andrei Alexandrescu wrote:
 FWIW I toyed with this but don't know what optimization flags zig 
 takes: https://godbolt.org/z/vKds1c8WY

 
 Typing --help in the flags box answers that question :) And the answer 
 is "-O ReleaseFast":
 https://godbolt.org/z/1WK6W7TM9

Cool, thanks. I was looking for "the fastest code that still has the 
checks", how to get that?

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 12:01 AM, Andrei Alexandrescu wrote:
 On 3/30/21 11:40 PM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 03:30:00 UTC, Andrei Alexandrescu wrote:
 FWIW I toyed with this but don't know what optimization flags zig 
 takes: https://godbolt.org/z/vKds1c8WY

 Typing --help in the flags box answers that question :) And the answer 
 is "-O ReleaseFast":
 https://godbolt.org/z/1WK6W7TM9

 
 Cool, thanks. I was looking for "the fastest code that still has the 
 checks", how to get that?

I guess that'd be "-O ReleaseSafe":

https://godbolt.org/z/cYcscf1W5

Mar 30 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 04:01:48 UTC, Andrei Alexandrescu 
wrote:
 On 3/30/21 11:40 PM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 03:30:00 UTC, Andrei 
 Alexandrescu wrote:
 FWIW I toyed with this but don't know what optimization flags 
 zig takes: https://godbolt.org/z/vKds1c8WY

 
 Typing --help in the flags box answers that question :) And 
 the answer is "-O ReleaseFast":
 https://godbolt.org/z/1WK6W7TM9

 Cool, thanks. I was looking for "the fastest code that still 
 has the checks", how to get that?

Right, sorry.

--help says:

     ReleaseFast             Optimizations on, safety off
     ReleaseSafe             Optimizations on, safety on

So, maybe that.

The ReleaseSafe code looks pretty good, it generates a "jo" 
instruction: https://godbolt.org/z/cYcscf1W5

Who knows what it actually looks like in CPU microcode, though :)

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 12:04 AM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 04:01:48 UTC, Andrei Alexandrescu wrote:
 On 3/30/21 11:40 PM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 03:30:00 UTC, Andrei Alexandrescu wrote:
 FWIW I toyed with this but don't know what optimization flags zig 
 takes: https://godbolt.org/z/vKds1c8WY

 Typing --help in the flags box answers that question :) And the 
 answer is "-O ReleaseFast":
 https://godbolt.org/z/1WK6W7TM9

 Cool, thanks. I was looking for "the fastest code that still has the 
 checks", how to get that?

 
 Right, sorry.
 
 --help says:
 
      ReleaseFast             Optimizations on, safety off
      ReleaseSafe             Optimizations on, safety on
 
 So, maybe that.
 
 The ReleaseSafe code looks pretty good, it generates a "jo" instruction: 
 https://godbolt.org/z/cYcscf1W5
 
 Who knows what it actually looks like in CPU microcode, though :)

Not much to write home about. The jumps scale linearly with the number 
of primitive operations:

https://godbolt.org/z/r3sj1T4hc

That's not going to be a speed demon.

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Wednesday, 31 March 2021 at 04:08:02 UTC, Andrei Alexandrescu 
wrote:
 Not much to write home about. The jumps scale linearly with the 
 number of primitive operations:

 https://godbolt.org/z/r3sj1T4hc

 That's not going to be a speed demon.

Ideally, in release builds the compiler could loosen up the 
precision of the traps a bit and combine the overflow checks for 
short sequences of side-effect free operations.
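
At the library level, D's core.checkedint already allows something close 
to this: the overflow flag passed to adds/muls is only ever set, never 
cleared, so one test can cover a short sequence of operations. A minimal 
sketch follows; the function names are made up for illustration, and 
whether the optimizer really emits fewer branches would have to be 
checked in the generated code.

```d
import core.checkedint : adds, muls;

// Computes x*x + x + 1. The flag is sticky: each helper can only set it,
// so the caller needs a single test after the whole sequence.
int squarePlus(int x, ref bool overflow) @safe pure nothrow @nogc
{
    const sq = muls(x, x, overflow);
    const sum = adds(sq, x, overflow);
    return adds(sum, 1, overflow);
}

// One check for three arithmetic operations.
int checkedSquarePlus(int x) @safe pure
{
    bool overflow = false;
    const result = squarePlus(x, overflow);
    if (overflow)
        throw new Exception("integer overflow in squarePlus");
    return result;
}
```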

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 12:37 AM, tsbockman wrote:
 On Wednesday, 31 March 2021 at 04:08:02 UTC, Andrei Alexandrescu wrote:
 Not much to write home about. The jumps scale linearly with the number 
 of primitive operations:

 https://godbolt.org/z/r3sj1T4hc

 That's not going to be a speed demon.

 
 Ideally, in release builds the compiler could loosen up the precision of 
 the traps a bit and combine the overflow checks for short sequences of 
 side-effect free operations.

Yah, was hoping I'd find something like that. Was disappointed. That 
makes their umbrella claim "Zig is faster than C" quite specious.

Mar 30 2021

Jacob Carlborg <doob me.com> writes:

On Wednesday, 31 March 2021 at 04:49:52 UTC, Andrei Alexandrescu 
wrote:

 That makes their umbrella claim "Zig is faster than C" quite 
 specious.

The reason, or one of the reasons, why Zig is/can be faster than 
C is that it uses different default optimization levels. For 
example, Zig will by default target your native CPU instead of 
some generic model. This allows it to enable vectorization, SSE/AVX 
and so on.

--
/Jacob Carlborg

Mar 31 2021

Max Haughton <maxhaton gmail.com> writes:

On Wednesday, 31 March 2021 at 09:47:46 UTC, Jacob Carlborg wrote:
 On Wednesday, 31 March 2021 at 04:49:52 UTC, Andrei 
 Alexandrescu wrote:

 That makes their umbrella claim "Zig is faster than C" quite 
 specious.

 The reason, or one of the reasons, why Zig is/can be faster 
 than C is that it uses different default optimization levels. 
 For example, Zig will by default target your native CPU instead 
 of some generic model. This allows it to enable vectorization, 
 SSE/AVX and so on.

 --
 /Jacob Carlborg

Specific Example? GCC and LLVM are both almost rabid when you 
turn the vectorizer on

Mar 31 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 7:46 AM, Max Haughton wrote:
 On Wednesday, 31 March 2021 at 09:47:46 UTC, Jacob Carlborg wrote:
 On Wednesday, 31 March 2021 at 04:49:52 UTC, Andrei Alexandrescu wrote:

 That makes their umbrella claim "Zig is faster than C" quite specious.

 The reason, or one of the reasons, why Zig is/can be faster than C is 
 that it uses different default optimization levels. For example, Zig 
 will by default target your native CPU instead of some generic model. 
 This allows it to enable vectorization, SSE/AVX and so on.

 -- 
 /Jacob Carlborg

 
 Specific Example? GCC and LLVM are both almost rabid when you turn the 
 vectorizer on

Even if that's the case, "we choose to use by default different flags 
that make the code more specialized and therefore faster and less 
portable" can't be a serious basis of a language performance claim.

Mar 31 2021

Max Haughton <maxhaton gmail.com> writes:

On Wednesday, 31 March 2021 at 12:36:42 UTC, Andrei Alexandrescu 
wrote:
 On 3/31/21 7:46 AM, Max Haughton wrote:
 On Wednesday, 31 March 2021 at 09:47:46 UTC, Jacob Carlborg 
 wrote:
 On Wednesday, 31 March 2021 at 04:49:52 UTC, Andrei 
 Alexandrescu wrote:

 That makes their umbrella claim "Zig is faster than C" quite 
 specious.

 The reason, or one of the reasons, why Zig is/can be faster 
 than C is that it uses different default optimization levels. 
 For example, Zig will by default target your native CPU 
 instead of some generic model. This allows it to enable 
 vectorization, SSE/AVX and so on.

 --
 /Jacob Carlborg

 
 Specific Example? GCC and LLVM are both almost rabid when you 
 turn the vectorizer on

 Even if that's the case, "we choose to use by default different 
 flags that make the code more specialized and therefore faster 
 and less portable" can't be a serious basis of a language 
 performance claim.

Intel C++ can be a little naughty with the fast math options, 
last time I checked, for example - gotta get those SPEC numbers!

I wonder if there is a way to leverage D's type system (or even 
extend it to allow) to allow a library solution that can hold 
information which the optimizer can use to elide these checks in 
most cases. It's probably possible already by just passing some 
kind of abstract interpretation like data structure as a template 
parameter, but this is not very ergonomic.

Standardizing some kind of `assume` semantics strikes me as a 
good long term hedge for D, even if doing static analysis and 
formal verification of D code is an unenviable task.
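
One shape such a library solution could take, sketched very roughly: 
carry a conservative value range in the type, do the range arithmetic at 
compile time, and let a static assert stand in for the run-time check. 
Bounded is a made-up name, not an existing Phobos type; only + is 
handled, and construction is not validated in this sketch.

```d
// Hypothetical type: an int whose value is promised to lie in [lo, hi].
struct Bounded(long lo, long hi)
{
    static assert(lo <= hi, "empty range");
    static assert(int.min <= lo && hi <= int.max, "range does not fit in int");

    int value; // not validated against [lo, hi] in this sketch

    // The result range is computed at compile time. If it fits in int, the
    // plain + below cannot overflow; if it does not, instantiating the
    // result type trips the static assert above and the program is rejected.
    auto opBinary(string op, long lo2, long hi2)(Bounded!(lo2, hi2) rhs) const
        if (op == "+")
    {
        return Bounded!(lo + lo2, hi + hi2)(value + rhs.value);
    }
}

unittest
{
    auto a = Bounded!(0, 100)(40);
    auto b = Bounded!(0, 100)(60);
    auto c = a + b; // no run-time overflow check needed
    static assert(is(typeof(c) == Bounded!(0, 200)));
    assert(c.value == 100);
}
```

The ergonomics problem is visible even here: every operation produces a 
new type, so loops and accumulators need an explicit way to narrow the 
range again.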

Mar 31 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/31/2021 7:36 AM, Max Haughton wrote:
 Intel C++ can be a little naughty with the fast math options, last time I 
 checked, for example - gotta get those SPEC numbers!

Benchmarks are always going to be unfair, but it's only reasonable to try and 
set the switches as close as practical so they are trying to accomplish the 
same thing.

 Standardizing some kind of `assume` semantics strikes me as a good long term 
 hedge for D, even if doing static analysis and formal verification of D code 
 is an unenviable task.

Static analysis has limits. For example, I complained to Vladimir that using 
hardcoded loop limits enabled optimizations not available to the recommended 
programming practice of not using hardcoded limits.

Mar 31 2021

Jacob Carlborg <doob me.com> writes:

On 2021-03-31 13:46, Max Haughton wrote:

 Specific Example? GCC and LLVM are both almost rabid when you turn the 
 vectorizer on

No, that's why I said "can be". But what I meant is that just running 
the Zig compiler out of the box might produce better code than Clang 
because it uses different default optimizations. I mean that is a poor 
way of claiming Zig is faster than C because it's easy to add a couple 
of flags to Clang and it will probably be the same speed as Zig. They 
use the same backend anyway.

-- 
/Jacob Carlborg

Apr 01 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 9:08 PM, Andrei Alexandrescu wrote:
 Not much to write home about. The jumps scale linearly with the number of 
 primitive operations:
 
 https://godbolt.org/z/r3sj1T4hc
 
 That's not going to be a speed demon.

The ldc:
         mov     eax, edi
         imul    eax, eax
         add     eax, edi    *
         add     eax, 1      *
	ret

* should be:

	lea    eax,1[eax + edi]

Let's try dmd -O:

__D3lea6squareFiZi:
	mov	EDX,EAX
	imul	EAX,EAX
	lea	EAX,1[EAX][EDX]
	ret

Woo-hoo!
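
For reference, the D source behind both listings is presumably something 
like the following. This is inferred from the mangled name (function 
square in module lea, taking and returning int) and from the arithmetic 
in the assembly, so treat it as an assumption:

```d
// Assumed to live in a file named lea.d; reconstructed from the mangled
// name and the instructions above, not copied from the thread.
int square(int x)
{
    // imul computes x*x; the two adds (or the single lea) add x and 1.
    // Like the assembly, this wraps silently on overflow.
    return x * x + x + 1;
}
```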

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 12:59 AM, Walter Bright wrote:
 On 3/30/2021 9:08 PM, Andrei Alexandrescu wrote:
 Not much to write home about. The jumps scale linearly with the number 
 of primitive operations:

 https://godbolt.org/z/r3sj1T4hc

 That's not going to be a speed demon.

 
 The ldc:
          mov     eax, edi
          imul    eax, eax
          add     eax, edi    *
          add     eax, 1      *
      ret
 
 * should be:
 
      lea    eax,1[eax + edi]
 
 Let's try dmd -O:
 
 __D3lea6squareFiZi:
      mov    EDX,EAX
      imul    EAX,EAX
      lea    EAX,1[EAX][EDX]
      ret
 
 Woo-hoo!

Yah, actually gdc uses lea as well: https://godbolt.org/z/Gb6416EKe

Mar 30 2021

Elronnd <elronnd elronnd.net> writes:

On Wednesday, 31 March 2021 at 04:59:08 UTC, Walter Bright wrote:
 * should be:

 	lea    eax,1[eax + edi]

The lea is the exact same length as the sequence of moves, and 
may be harder to decode.  I fail to see how that's a win.

Mar 30 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 10:06 PM, Elronnd wrote:
 On Wednesday, 31 March 2021 at 04:59:08 UTC, Walter Bright wrote:
 * should be:

     lea    eax,1[eax + edi]

 
 The lea is the exact same length as the sequence of moves, and may be harder 
 to decode.  I fail to see how that's a win.

It's a win because it uses the address decoder logic which is separate from the 
arithmetic logic unit. This enables it to be done in parallel with the ALU.

Although not relevant for this particular example, it also doesn't need another 
register for the intermediate value.

Mar 30 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 05:25:48 UTC, Walter Bright wrote:
 It's a win because it uses the address decoder logic which is 
 separate from the arithmetic logic unit. This enables it to be 
 done in parallel with the ALU.

Is this still true for modern CPUs?

 Although not relevant for this particular example, it also 
 doesn't need another register for the intermediate value.

Haven't CPUs used register renaming for a long time now? It's 
also pretty rare to see x86_64 code that uses all registers.

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 1:30 AM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 05:25:48 UTC, Walter Bright wrote:
 It's a win because it uses the address decoder logic which is separate 
 from the arithmetic logic unit. This enables it to be done in parallel 
 with the ALU.

 
 Is this still true for modern CPUs?

Affirmative if you consider the Nehalem modern: 
https://en.wikipedia.org/wiki/Address_generation_unit

Mar 30 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 05:41:28 UTC, Andrei Alexandrescu 
wrote:
 Affirmative if you consider the Nehalem modern:

Um, that was released 13 years ago.

 https://en.wikipedia.org/wiki/Address_generation_unit

In the picture it still goes through the instruction decoder 
first, which means LEA and ADD/SHR might as well get decoded to 
the same microcode.

That's the thing about this whole ordeal, we don't know anything. 
The only thing we *can* do is benchmark. :)

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 1:46 AM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 05:41:28 UTC, Andrei Alexandrescu wrote:
 Affirmative if you consider the Nehalem modern:

 
 Um, that was released 13 years ago.

It carried over afaik to all subsequent Intel CPUs: 
https://hexus.net/tech/reviews/cpu/147440-intel-core-i9-11900k/. Sunny 
Cove actually adds one extra AGU.

 https://en.wikipedia.org/wiki/Address_generation_unit

 
 In the picture it still goes through the instruction decoder first, 
 which means LEA and ADD/SHR might as well get decoded to the same 
 microcode.

That's not the case. It's separate hardware.

 That's the thing about this whole ordeal, we don't know anything. The 
 only thing we *can* do is benchmark. :)

We can Read The Fine Manual.

Mar 30 2021

Elronnd <elronnd elronnd.net> writes:

On Wednesday, 31 March 2021 at 05:54:43 UTC, Andrei Alexandrescu 
wrote:
 In the picture it still goes through the instruction decoder 
 first, which means LEA and ADD/SHR might as well get decoded 
 to the same microcode.

 That's not the case. It's separate hardware.

Less with the talking, more with the benchmarking!

If what you say is true, then a sequence of add interleaved with 
lea should be faster than the equivalent sequence, but with add 
replacing the lea.  Benchmark code is here 
https://files.catbox.moe/2zzrwe.tar; on my system, the 
performance is identical.

Mar 31 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 10:30 PM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 05:25:48 UTC, Walter Bright wrote:
 It's a win because it uses the address decoder logic which is separate from 
 the arithmetic logic unit. This enables it to be done in parallel with the ALU.

 
 Is this still true for modern CPUs?

See https://www.agner.org/optimize/optimizing_assembly.pdf page 135.


 Although not relevant for this particular example, it also doesn't need 
 another register for the intermediate value.

 Haven't CPUs used register renaming for a long time now? It's also pretty rare 
 to see x86_64 code that uses all registers.

If you use a register that needs to be saved on the stack, it's going to cost.

Mar 30 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 06:34:04 UTC, Walter Bright wrote:
 On 3/30/2021 10:30 PM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 05:25:48 UTC, Walter Bright 
 wrote:
 It's a win because it uses the address decoder logic which is 
 separate from the arithmetic logic unit. This enables it to 
 be done in parallel with the ALU.

 
 Is this still true for modern CPUs?

 See https://www.agner.org/optimize/optimizing_assembly.pdf page 
 135.

Thanks!

It also says that LEA may be slower than ADD on some CPUs.

I wrote a small benchmark using the assembler code from a few 
posts ago. It takes the same time on my AMD CPU, but the ADD is 
indeed slower than the LEA on the old Intel CPU on the server. :) 
Unfortunately I don't have access to a modern Intel CPU to test.

 Although not relevant for this particular example, it also 
 doesn't need another register for the intermediate value.

 Haven't CPUs used register renaming for a long time now? It's 
 also pretty rare to see x86_64 code that uses all registers.

 If you use a register that needs to be saved on the stack, it's 
 going to cost.

Sure, but why would you do that? If I'm reading the ABI spec 
correctly, almost all registers belong to the callee, and don't 
need to be saved/restored, and there's probably little reason to 
call a function in the middle of such a computation and therefore 
save the interim value on the stack.

Mar 30 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 11:54 PM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 06:34:04 UTC, Walter Bright wrote:
 On 3/30/2021 10:30 PM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 05:25:48 UTC, Walter Bright wrote:
 It's a win because it uses the address decoder logic which is separate from 
 the arithmetic logic unit. This enables it to be done in parallel with the ALU.

 Is this still true for modern CPUs?

 See https://www.agner.org/optimize/optimizing_assembly.pdf page 135.

 
 Thanks!
 
 It also says that LEA may be slower than ADD on some CPUs.

Slower than ADD, but not slower than multiple ADDs. DMD does not replace a mere 
ADD with LEA. If you also look at how LEA is used in the various examples of 
optimized code in the pdf, well, he uses it a lot.

 some CPUs

Code gen is generally targeted at generating code that works well on most 
machines.

 If you use a register that needs to be saved on the stack, it's going to cost.

 Sure, but why would you do that?

To map as many locals into registers as possible.

 If I'm reading the ABI spec correctly, almost 
 all registers belong to the callee, and don't need to be saved/restored, and 
 there's probably little reason to call a function in the middle of such a 
 computation and therefore save the interim value on the stack.

All I can say is code gen is never that simple. There are just too many rules 
that conflict. The combinatorial explosion means some heuristics are relied on 
that produce better results most of the time. I suppose a good AI research 
project would be to train an AI to produce better overall patterns.

But, in general,

1. LEA is faster for more than one operation
2. using fewer registers is better
3. getting locals into registers is better
4. generating fewer instructions is better
5. generating shorter instructions is better
6. jumpless code is better

None of these are *always* true. And Intel/AMD change the rules slightly with 
every new processor.

As for overflow checks, I am not going to post benchmarks because everyone 
picks at them. Every benchmark posted here by check proponents shows that 
overflow checks are slower. The Rust team apparently poured a lot of effort 
into overflow checks, and ultimately failed, as in the checks are turned off 
in release code. I don't see much hope in replicating their efforts.

And, once again, I reiterate that D *does* have some overflow checks that are 
done at compile time (i.e. are free) in the form of integral promotions and 
Value Range Propagation, neither of which are part of Zig or Rust.
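
A small sketch of what those compile-time checks look like in practice. 
The function name lowByte is made up for illustration; the static 
asserts only record which conversions the compiler accepts or rejects.

```d
// Value Range Propagation: the compiler knows (i & 0xFF) lies in 0 .. 255,
// so the implicit narrowing to ubyte is accepted without a cast.
ubyte lowByte(int i)
{
    ubyte low = i & 0xFF;
    return low;
}

// Integral promotion: byte + byte is computed as int, and its possible
// range (-256 .. 254) does not fit in byte, so the implicit conversion
// is rejected at compile time instead of wrapping at run time.
static assert(!__traits(compiles, { byte a, b; byte c = a + b; }));

// Same for an int expression whose range is not known to fit.
static assert(!__traits(compiles, { int n; ubyte u = n + 1; }));

// An explicit cast is accepted and documents that truncation is intended.
static assert(__traits(compiles, { int n; ubyte u = cast(ubyte)(n + 1); }));
```

None of this costs anything at run time, but it only covers narrowing 
conversions; an int + int that stays int is not checked.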

Mar 31 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 07:13:07 UTC, Walter Bright wrote:
 All I can say is code gen is never that simple. There are just 
 too many rules that conflict. The combinatorial explosion means 
 some heuristics are relied on that produce better results most 
 of the time. I suppose a good AI research project would be to 
 train an AI to produce better overall patterns.

 But, in general,

 1. LEA is faster for more than one operation
 2. using fewer registers is better
 3. getting locals into registers is better
 4. generating fewer instructions is better
 5. generating shorter instructions is better
 6. jumpless code is better

Thanks for the insight!

My personal perspective is that:

- Silicon will keep getting faster and cheaper with time

- A 7% or a 14% or even a +100% slowdown is relatively 
insignificant considering the overall march of progress - Moore's 
law, but also other factors such as the average size and 
complexity of programs, which will also keep increasing as people 
expect software to do more things, which will drown out such 
"one-time" slowdowns as integer overflow checks

- In the long term, people will invariably prefer programming 
languages which produce correct results (with less code), over 
programming languages whose benefit is only that they're faster.

So, it seems to me that Rust made the choice to only enable 
overflow checks in debug mode in order to be competitive with the 
programming languages of its time. I think Zig's design is the 
more future-proof - there will continue to be circumstances in 
which speed is preferable over correctness, such as video games 
(where an occasional wrong result is tolerable), so having 
distinct ReleaseFast and ReleaseSafe modes makes sense.

BTW, another data point alongside Rust and Zig is of course Python 3, 
in which all integers are BigInts (but with small numbers inlined 
in the value, akin to small string optimizations).

Mar 31 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/31/2021 12:31 AM, Vladimir Panteleev wrote:
 - Silicon will keep getting faster and cheaper with time
 
 - A 7% or a 14% or even a +100% slowdown is relatively insignificant 
 considering the overall march of progress - Moore's law, but also other 
 factors such as the average size and complexity of programs, which will also 
 keep increasing as people expect software to do more things, which will drown 
 out such "one-time" slowdowns as integer overflow checks

If you're running a data center, 1% translates to millions of dollars.


 - In the long term, people will invariably prefer programming languages which 
 produce correct results (with less code), over programming languages whose 
 benefit is only that they're faster.

People will prefer what makes them money :-)

D's focus is on memory safety, which is far more important than integer 
overflow.


 So, it seems to me that Rust made the choice to only enable overflow checks in 
 debug mode in order to be competitive with the programming languages of its 
 time. I think Zig's design is the more future-proof - there will continue to 
 be circumstances in which speed is preferable over correctness, such as video 
 games (where an occasional wrong result is tolerable), so having distinct 
 ReleaseFast and ReleaseSafe modes makes sense.

Zig doesn't do much to prevent memory corruption. Memory safety will be the 
focus of D for the near future.


 BTW, another data point alongside Rust and Zig is of course Python 3, in which 
 all integers are BigInts (but with small numbers inlined in the value, akin to 
 small string optimizations).

Python isn't competitive with systems programming languages.

Mar 31 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 07:52:31 UTC, Walter Bright wrote:
 On 3/31/2021 12:31 AM, Vladimir Panteleev wrote:
 - Silicon will keep getting faster and cheaper with time
 
 - A 7% or a 14% or even a +100% slowdown is relatively 
 insignificant considering the overall march of progress - 
 Moore's law, but also other factors such as the average size 
 and complexity of programs, which will also keep increasing as 
 people expect software to do more things, which will drown out 
 such "one-time" slowdowns as integer overflow checks

 If you're running a data center, 1% translates to millions of 
 dollars.

You would think someone would have told that to all the companies 
running their services written in Ruby, JavaScript, etc.

Unfortunately, that hasn't been the case.

What remains the most valuable is 1) time/money not lost due to 
wrong results / angry customers, and  2) developer time.

 - In the long term, people will invariably prefer programming 
 languages which produce correct results (with less code), over 
 programming languages whose benefit is only that they're 
 faster.

 People will prefer what makes them money :-)

 D's focus is on memory safety, which is far more important than 
 integer overflow.

It most definitely is.

But I think sooner or later we will get to a point where memory 
safety is the norm, and writing code in memory-unsafe languages 
would be like writing raw assembler today. So, the standard for 
correctness will be higher.

Mar 31 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 3:58 AM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 07:52:31 UTC, Walter Bright wrote:
 On 3/31/2021 12:31 AM, Vladimir Panteleev wrote:
 - Silicon will keep getting faster and cheaper with time

 - A 7% or a 14% or even a +100% slowdown is relatively insignificant 
 considering the overall march of progress - Moore's law, but also 
 other factors such as the average size and complexity of programs, 
 which will also keep increasing as people expect software to do more 
 things, which will drown out such "one-time" slowdowns as integer 
 overflow checks

 If you're running a data center, 1% translates to millions of dollars.

 
 You would think someone would have told that to all the companies 
 running their services written in Ruby, JavaScript, etc.

Funny how things work out isn't it :o).

 Unfortunately, that hasn't been the case.

It is. I know because I collaborated with the provisioning team at Facebook.

Mar 31 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 12:38:51 UTC, Andrei Alexandrescu 
wrote:
 You would think someone would have told that to all the 
 companies running their services written in Ruby, JavaScript, 
 etc.

 Funny how things work out isn't it :o).

 Unfortunately, that hasn't been the case.

 It is. I know because I collaborated with the provisioning team 
 at Facebook.

I don't understand what you mean by this.

Do you and Facebook have a plan to forbid the entire world from 
running Ruby, JavaScript etc. en masse on datacenters?

Mar 31 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 8:40 AM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 12:38:51 UTC, Andrei Alexandrescu wrote:
 You would think someone would have told that to all the companies 
 running their services written in Ruby, JavaScript, etc.

 Funny how things work out isn't it :o).

 Unfortunately, that hasn't been the case.

 It is. I know because I collaborated with the provisioning team at 
 Facebook.

 
 I don't understand what you mean by this.
 
 Do you and Facebook have a plan to forbid the entire world from running 
 Ruby, JavaScript etc. en masse on datacenters?

Using languages has to take important human factors into account, e.g. 
Facebook could not realistically switch from PHP/Hack to C++ in the 
front end (though the notion does come up time and again). It is 
factually true that to a large server farm performance percentages 
translate into millions.

Mar 31 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 3:52 AM, Walter Bright wrote:
 On 3/31/2021 12:31 AM, Vladimir Panteleev wrote:
 - Silicon will keep getting faster and cheaper with time

 - A 7% or a 14% or even a +100% slowdown is relatively insignificant 
 considering the overall march of progress - Moore's law, but also 
 other factors such as the average size and complexity of programs, 
 which will also keep increasing as people expect software to do more 
 things, which will drown out such "one-time" slowdowns as integer 
 overflow checks

 
 If you're running a data center, 1% translates to millions of dollars.

Factually true. Millions of dollars a year that is.

It's all about the clientele. There will always be companies that must 
get every bit of performance. Weka.IO must be fastest. If they were 
within 15% of the fastest, they'd be out of business.

Mar 31 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 04:08:02 UTC, Andrei Alexandrescu 
wrote:
 Not much to write home about. The jumps scale linearly with the 
 number of primitive operations:

 https://godbolt.org/z/r3sj1T4hc

Right, but as we both know, speed hasn't necessarily scaled with 
the number of instructions for many decades now.

Curiosity got the better of me and I played with this for a bit.

Here is my program:

https://dump.cy.md/d7b7ae5c2d15c8c0127fd96dd74909a1/main.zig

Two interesting observations:

1. The compiler (whether it's the Zig frontend or the LLVM 
backend) is smart about adding the checks. If it can prove that 
the values will never overflow, then the overflow checks aren't 
emitted. I had to trick it into thinking that they may overflow, 
when in practice they never will.

1b. The compiler is actually so aware of the checks that, in 
one of my attempts to get it to always emit them, it actually 
generated a version of the function with and without the checks, 
and called the unchecked version in the case where it knew that 
it will never overflow! Amazing!

2. After finally getting it to always generate the checks, and 
benchmarking the results, the difference in run time I'm seeing 
between ReleaseFast and ReleaseSafe is a measly 2.7%. The 
disassembly looks all right too: https://godbolt.org/z/3nY7Ee4ff

Personally, 2.7% is a price I'm willing to pay any day, if it 
helps save me from embarrassments like 
https://github.com/CyberShadow/btdu/issues/1 :)

Mar 30 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 10:16 PM, Vladimir Panteleev wrote:
 1. The compiler (whether it's the Zig frontend or the LLVM backend) is smart 
 about adding the checks. If it can prove that the values will never overflow, 
 then the overflow checks aren't emitted. I had to trick it into thinking that 
 they may overflow, when in practice they never will.

The code uses hardcoded loop limits. Yes, the compiler can infer no overflow by 
knowing the limits of the value. In my experience, I rarely loop for a hardcoded 
number of times.

Mar 30 2021

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 31 March 2021 at 05:32:16 UTC, Walter Bright wrote:
 On 3/30/2021 10:16 PM, Vladimir Panteleev wrote:
 1. The compiler (whether it's the Zig frontend or the LLVM 
 backend) is smart about adding the checks. If it can prove 
 that the values will never overflow, then the overflow checks 
 aren't emitted. I had to trick it into thinking that they may 
 overflow, when in practice they never will.

 The code uses hardcoded loop limits. Yes, the compiler can 
 infer no overflow by knowing the limits of the value. In my 
 experience, I rarely loop for a hardcoded number of times.

Well, this is fake artificial code, and looping a fixed number of 
times is just one aspect of its fakeness. If you do loop an 
unpredictable number of times in your real program, then you 
almost certainly do need the overflow check, so emitting it would 
be the right thing to do there :)

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 1:16 AM, Vladimir Panteleev wrote:
 On Wednesday, 31 March 2021 at 04:08:02 UTC, Andrei Alexandrescu wrote:
 Not much to write home about. The jumps scale linearly with the number 
 of primitive operations:

 https://godbolt.org/z/r3sj1T4hc

 
 Right, but as we both know, speed hasn't necessarily scaled with the 
 number of instructions for many decades now.

Of course, and I wasn't suggesting the contrary. If speed simply 
increased as instructions retired decreased, inliners would be much more 
aggressive etc. etc. But such statements need to be carefully qualified, 
which is why I do my best to not make them in isolation. The 
qualification here would be... "except in most cases, when it does". 
Instructions retired is generally a telling proxy.

 Curiosity got the better of me and I played with this for a bit.
 
 Here is my program:
 
 https://dump.cy.md/d7b7ae5c2d15c8c0127fd96dd74909a1/main.zig
 
 Two interesting observations:
 
 1. The compiler (whether it's the Zig frontend or the LLVM backend) is 
 smart about adding the checks. If it can prove that the values will 
 never overflow, then the overflow checks aren't emitted. I had to trick 
 it into thinking that they may overflow, when in practice they never will.
 
 1b. The compiler is actually so aware of the checks that, in one of my 
 attempts to get it to always emit them, it actually generated a version 
 of the function with and without the checks, and called the unchecked 
 version in the case where it knew that it will never overflow! Amazing!
 
 2. After finally getting it to always generate the checks, and 
 benchmarking the results, the difference in run time I'm seeing between 
 ReleaseFast and ReleaseSafe is a measly 2.7%. The disassembly looks all 
 right too: https://godbolt.org/z/3nY7Ee4ff
 
 Personally, 2.7% is a price I'm willing to pay any day, if it helps save 
 me from embarrassments like https://github.com/CyberShadow/btdu/issues/1 :)

That's in line with expectations for small benchmarks. On larger 
applications the impact of bigger code on the instruction cache would be 
more detrimental. (Also the branch predictor is a limited resource so 
more jumps means decreased predictability of others; not sure how that 
compares in magnitude with the impact on instruction cache, which is a 
larger and more common problem.)

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Wednesday, 31 March 2021 at 03:30:00 UTC, Andrei Alexandrescu 
wrote:
 On 3/30/21 1:09 PM, tsbockman wrote:
 So you're now dismissing Zig as slow because its feature set 
 surprised you? No real-world data is necessary? No need to 
 understand any of Zig's relevant optimizations or options?

 Instead of passing the burden of proof back and forth, some 
 evidence would be welcome.

I already posted both some Zig benchmark results of my own, and 
some C/C++ results from the paper you linked earlier. You just 
missed them, I guess:

     
    https://forum.dlang.org/post/ghcnkevthguciupexeyu@forum.dlang.org
    https://forum.dlang.org/post/rnotyrxmczbdvxtalarf@forum.dlang.org

Oversimplified: the extra time required in these tests ranged 
from less than 0.1% up to 14%, depending on the application.

Also, the Zig checked binaries are actually slightly smaller than 
the unchecked binaries for some reason.

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 12:11 AM, tsbockman wrote:
 On Wednesday, 31 March 2021 at 03:30:00 UTC, Andrei Alexandrescu wrote:
 On 3/30/21 1:09 PM, tsbockman wrote:
 So you're now dismissing Zig as slow because its feature set 
 surprised you? No real-world data is necessary? No need to understand 
 any of Zig's relevant optimizations or options?

 Instead of passing the burden of proof back and forth, some evidence 
 would be welcome.

 
 I already posted both some Zig benchmark results of my own, and some 
 C/C++ results from the paper you linked earlier. You just missed them, I 
 guess:
 
 https://forum.dlang.org/post/ghcnkevthguciupexeyu@forum.dlang.org
 https://forum.dlang.org/post/rnotyrxmczbdvxtalarf@forum.dlang.org
 
 Oversimplified: the extra time required in these tests ranged from less 
 than 0.1% up to 14%, depending on the application.

Thanks. This is in line with expectations.

 Also, the Zig checked binaries are actually slightly smaller than the 
 unchecked binaries for some reason.

That's surprising so some investigation would be in order. From what I 
tried on godbolt the generated code is strictly larger if it uses checks.

FWIW I just tested -fwrapv and -ftrapv. The former does nothing discernible:

https://godbolt.org/z/ErMoeKnxK

The latter generates one function call per primitive operation, which is 
sure to not win any contests:

https://godbolt.org/z/ahErY3zKn

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Wednesday, 31 March 2021 at 04:26:28 UTC, Andrei Alexandrescu 
wrote:
 FWIW I just tested -fwrapv and -ftrapv. The former does nothing 
 discernible:

-fwrapv isn't supposed to do anything discernible; it just 
prevents the compiler from taking advantage of otherwise 
undefined behavior:

"Instructs the compiler to assume that signed arithmetic overflow 
of addition, subtraction, and multiplication, wraps using 
two's-complement representation."

https://www.keil.com/support/man/docs/armclang_ref/armclang_ref_sam1465487496421.htm
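
As a minimal sketch of what that wrapping means in practice (not from the thread; the function names are hypothetical): without -fwrapv, signed overflow in C is undefined behavior, so code that wants two's-complement wrapping regardless of flags usually goes through unsigned arithmetic. The conversion back to int32_t is implementation-defined before C23; the sketch assumes the modulo-2^32 behavior that GCC and Clang document, and includes a small self-check for that assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping 32-bit signed addition that does not depend on -fwrapv.
 * Unsigned addition is defined to wrap modulo 2^32. Converting the result
 * back to int32_t is implementation-defined before C23; this sketch assumes
 * the modulo-2^32 conversion that GCC and Clang document. */
static inline int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Self-check for the conversion assumption above; call it once from main()
 * when trying the sketch on a different compiler. */
static inline void wrapping_add_self_check(void)
{
    assert(wrapping_add_i32(INT32_MAX, 1) == INT32_MIN);
    assert(wrapping_add_i32(-1, 1) == 0);
}
```

Neither -fwrapv nor this helper detects anything; both only pin down what the result is when the sum does not fit.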

Mar 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Wednesday, 31 March 2021 at 04:26:28 UTC, Andrei Alexandrescu 
wrote:
 On 3/31/21 12:11 AM, tsbockman wrote:
 Also, the Zig checked binaries are actually slightly smaller 
 than the unchecked binaries for some reason.

 That's surprising so some investigation would be in order. From 
 what I tried on godbolt the generated code is strictly larger 
 if it uses checks.

Perhaps the additional runtime validation is causing reduced 
inlining in some cases? The test program I used has almost 300 
KiB of source code, so it may be hard to reproduce the effect 
with toy programs on godbolt.

Mar 30 2021

sighoya <sighoya gmail.com> writes:

On Saturday, 27 March 2021 at 03:25:04 UTC, Walter Bright wrote:

 4. fast integer arithmetic is fundamental to fast code, not a 
 mere micro-optimization. Who wants an overflow check on every 
 pointer increment?

The point is that the overflow check is already done by most CPUs, 
regardless of whether overflow is handled by the language or not.
Unfortunately, such CPUs don't raise an interrupt, so we have to 
check twice for overflows.
The best option is of course for the language to handle safe arithmetic; 
however, this requires full semantic support in the type system.
What about providing two operators for integer arithmetic 
instead, one safe and one unsafe?
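
A rough sketch of the "two operations" idea, written in C rather than D so it can be tried with GCC or Clang (the function names are hypothetical, and `__builtin_add_overflow` is a compiler extension, not standard C). On x86 these compilers typically lower the builtin to an ordinary add followed by a branch on the overflow flag, which is the sense in which the CPU has already done the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* "Safe" addition: returns true and stores the sum on success,
 * returns false and leaves *out untouched on overflow.
 * Relies on the GCC/Clang extension __builtin_add_overflow. */
static inline bool checked_add_i32(int32_t a, int32_t b, int32_t *out)
{
    assert(out != NULL);
    int32_t sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        return false;
    }
    *out = sum;
    return true;
}

/* "Unsafe" addition: the caller explicitly asks for the wrapped result.
 * The builtin stores the two's-complement truncated sum even on overflow,
 * so the overflow indication is simply discarded here. */
static inline int32_t unchecked_add_i32(int32_t a, int32_t b)
{
    int32_t sum;
    (void)__builtin_add_overflow(a, b, &sum);
    return sum;
}
```

With two named operations, the reader can tell at the call site whether wraparound is intended; a language-level version would presumably spell them as distinct operators.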

Mar 29 2021

Elronnd <elronnd elronnd.net> writes:

On Saturday, 27 March 2021 at 03:25:04 UTC, Walter Bright wrote:
 4. fast integer arithmetic is fundamental to fast code, not a 
 mere micro-optimization. Who wants an overflow check on every 
 pointer increment?

Dan Luu measures overflow checks as having an overall 1% 
performance impact for numeric-heavy C code.  
(https://danluu.com/integer-overflow/).  The code size impact is 
also very small, ~3%.

This isn't 'speculation', it's actual measurement.  'lea' is a 
microoptimization, it doesn't 'significantly' improve 
performance; yes, mul is slow, but lea can be trivially replaced 
by the equivalent sequence of shifts and adds with very little 
penalty.

Why is this being seriously discussed as a performance pitfall?

Mar 30 2021

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/31/21 12:28 AM, Elronnd wrote:
 On Saturday, 27 March 2021 at 03:25:04 UTC, Walter Bright wrote:
 4. fast integer arithmetic is fundamental to fast code, not a mere 
 micro-optimization. Who wants an overflow check on every pointer 
 increment?

 
 Dan Luu measures overflow checks as having an overall 1% performance 
 impact for numeric-heavy C code. 
 (https://danluu.com/integer-overflow/).  The code size impact is also 
 very small, ~3%.
 
 This isn't 'speculation', it's actual measurement.

Bit surprised about how you put it. Are you sure you represent what the 
article says? I skimmed the article just now and... 1% is literally not 
found in the text, and 3% is a "guesstimate" per the author. The actual 
measurements show much larger margins, closer to tsbockman's.

Mar 30 2021

Elronnd <elronnd elronnd.net> writes:

On Wednesday, 31 March 2021 at 05:08:29 UTC, Andrei Alexandrescu 
wrote:
 I skimmed the article just now and... 1% is literally not found 
 in the text, and 3% is a "guesstimate" per the author.

Look at the 'fsan ud' row of the only table.  1% is the 
performance penalty for 'zip', and 0% penalty for 'unzip'.

3% codesize I got from looking at binaries on my own system.  I 
actually forgot that that article talks about codesize at all.

Mar 31 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 3/30/2021 9:28 PM, Elronnd wrote:
 Why is this being seriously discussed as a performance pitfall?

1% is a serious improvement. If it wasn't, why would Rust (for example), who 
likely tried harder than anyone to make it work, still disable it for release 
code?

Mar 31 2021

Elronnd <elronnd elronnd.net> writes:

On Thursday, 1 April 2021 at 00:50:21 UTC, Walter Bright wrote:
 On 3/30/2021 9:28 PM, Elronnd wrote:
 Why is this being seriously discussed as a performance pitfall?

 1% is a serious improvement. If it wasn't, why would Rust (for 
 example) who likely tried harder than anyone to make it work, 
 still disable it for release code?

That's an appeal to authority.  You haven't actually justified 
their choice.  (Nor, for that matter, have you justified that 1% 
is a serious performance improvement.)

Apr 01 2021

Walter Bright <newshound2 digitalmars.com> writes:

On 4/1/2021 10:59 PM, Elronnd wrote:
 On Thursday, 1 April 2021 at 00:50:21 UTC, Walter Bright wrote:
 On 3/30/2021 9:28 PM, Elronnd wrote:
 Why is this being seriously discussed as a performance pitfall?

 1% is a serious improvement. If it wasn't, why would Rust (for example) who 
 likely tried harder than anyone to make it work, still disable it for release 
 code?

 
 That's an appeal to authority.  You haven't actually justified their choice.  
 (Nor, for that matter, have you justified that 1% is a serious performance 
 improvement.)

That's backwards. You want other people to invest in this technology, you need 
to justify it.

I've been in this business for 40 years. I know for a fact that if you're trying 
to sell a high performance language that is inherently slower than the current 
one they're using, you've got a huge problem.

Having written high performance apps myself, I'll take 1%. I work on getting a 
lot smaller gains than that, because they add up.

As for Appeal to Authority, there is more nuance to it than one might think:

"Exception: Be very careful not to confuse "deferring to an authority on the 
issue" with the appeal to authority fallacy. Remember, a fallacy is an error in 
reasoning. Dismissing the council of legitimate experts and authorities turns 
good skepticism into denialism. The appeal to authority is a fallacy in 
argumentation, but deferring to an authority is a reliable heuristic that we all 
use virtually every day on issues of relatively little importance. There is 
always a chance that any authority can be wrong, that’s why the critical thinker 
accepts facts provisionally. It is not at all unreasonable (or an error in 
reasoning) to accept information as provisionally true by credible authorities. 
Of course, the reasonableness is moderated by the claim being made (i.e., how 
extraordinary, how important) and the authority (how credible, how relevant to 
the claim)."

https://www.logicallyfallacious.com/logicalfallacies/Appeal-to-Authority

---

I'm not preventing anyone from adding integer overflow detection to D. Feel free 
to make a prototype and we can all evaluate it.

Apr 02 2021

Guillaume Piolat <first.name spam.org> writes:

On Friday, 2 April 2021 at 20:56:04 UTC, Walter Bright wrote:
 I'm not preventing anyone from adding integer overflow 
 detection to D. Feel free to make a prototype and we can all 
 evaluate it.

Seems to be a bit like bounds checks (with less obvious benefits); it 
could be made the default but disabled in -b release-nobounds.

Even while being opt-out, bounds checks are annoying in D because 
with DUB you typically profile a program built with dub -b 
release-debug and that _includes_ bounds checks! So I routinely 
profile programs that aren't like the actual output.

So, integer overflow checks would - in practice - further hinder 
the ability to profile programs.

Apr 03 2021

John Colvin <john.loughran.colvin gmail.com> writes:

On Saturday, 3 April 2021 at 09:09:33 UTC, Guillaume Piolat wrote:
 On Friday, 2 April 2021 at 20:56:04 UTC, Walter Bright wrote:
 I'm not preventing anyone from adding integer overflow 
 detection to D. Feel free to make a prototype and we can all 
 evaluate it.

 Seems to be a bit like bounds checks (with less obvious benefits); 
 it could be made the default but disabled in -b release-nobounds.

 Even while being opt-out, bounds checks are annoying in D 
 because with DUB you typically profile a program built with dub 
 -b release-debug and that _includes_ bounds checks! So I 
 routinely profile programs that aren't like the actual output.

 So, integer overflow checks would - in practice - further 
 hinder the ability to profile programs.

It’s not like bounds checks because there’s loads of code out 
there that correctly uses overflow. It’s a significant breaking 
change to turn that switch on, not just a “would you like to 
trade some speed for safety” like bounds-checks are.

That’s not to say it shouldn’t be done. I’m just pointing out 
that it’s very different.
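
For a concrete picture of code that "correctly uses overflow", here is a sketch of the 32-bit FNV-1a hash in C (chosen as a well-known example, not taken from the thread; the function name is hypothetical). The multiplication is meant to wrap modulo 2^32, so a blanket overflow trap on this arithmetic would turn correct code into a crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash over a byte buffer.
 * The multiply is intended to wrap modulo 2^32; in C this is well defined
 * because the operands are unsigned. */
static inline uint32_t fnv1a_32(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t hash = 2166136261u;      /* FNV offset basis */
    for (size_t i = 0; i < len; ++i) {
        hash ^= data[i];              /* mix in the next byte */
        hash *= 16777619u;            /* FNV prime; wraps on purpose */
    }
    return hash;
}
```

In a language that trapped on overflow by default, a function like this would need an explicit wrapping operator or type, which is the kind of source change that makes flipping the switch a breaking change rather than a pure speed/safety trade.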

Apr 03 2021

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 24 March 2021 at 20:28:39 UTC, tsbockman wrote:
 [snip]
 TLDR; What you're really asking for is impossible in D2. It 
 would require massive breaking changes to the language to 
 implement without undermining the guarantees that a checked 
 integer type exists to provide.

Are you familiar with how Zig handles overflow [1]? They error on 
overflow by default, but they have additional functions and 
operators to handle when you want to do wraparound.

Nevertheless, I agree that the ship has sailed for D2 on this.

[1] https://ziglang.org/documentation/master/#Integer-Overflow

Mar 27 2021

tsbockman <thomas.bockman gmail.com> writes:

On Saturday, 27 March 2021 at 21:02:39 UTC, jmh530 wrote:
 Are you familiar with how Zig handles overflow [1]? They error 
 on overflow by default, but they have additional functions and 
 operators to handle when you want to do wraparound.

Thanks for the link; I hadn't seen Zig's take before. It agrees 
with my conclusions from developing checkedint: assume the user 
wants normal integer math by default, signal an error somehow 
when it fails, and wrap overflow only when this is explicitly 
requested.

It's not just about reliability vs. performance, it is about 
making the intended semantics of the code clear:

0) Is overflow wrapped on purpose?
1) Did the programmer somehow prove that overflow cannot occur 
for all valid inputs?
2) Was the programmer desperate enough for speed to knowingly 
write incorrect code?
3) Was the programmer simply ignorant or forgetful of this 
problem?
4) Did the programmer willfully ignore overflow because it is 
"not the cause of enough problems to be that concerning"?

Most code written in C/D/etc. leaves the answer to this question 
a mystery for the reader to puzzle out. In contrast, code written 
using a system like Zig's is far less likely to confuse or 
mislead the reader.

 Nevertheless, I agree that the ship has sailed for D2 on this.

Yes.

 [1] https://ziglang.org/documentation/master/#Integer-Overflow

Mar 27 2021

Berni44 <someone somemail.com> writes:

On Tuesday, 23 March 2021 at 21:22:18 UTC, Walter Bright wrote:
 It's been there long enough.

Isn't that true by now for everything in std.experimental? I 
ask, because I've got the feeling, that std.experimental doesn't 
work as expected. For me it looks more or less like an attic, 
where stuff is put and then forgotten. Maybe the way we used for 
sumtype is the better approach...

Mar 24 2021

Q. Schroll <qs.il.paperinik gmail.com> writes:

On Wednesday, 24 March 2021 at 11:20:52 UTC, Berni44 wrote:
 On Tuesday, 23 March 2021 at 21:22:18 UTC, Walter Bright wrote:
 It's been there long enough.

 Isn't that true by now for everything in std.experimental? I 
 ask, because I've got the feeling, that std.experimental 
 doesn't work as expected. For me it looks more or less like an 
 attic, where stuff is put and then forgotten. Maybe the way we 
 used for sumtype is the better approach...

I have no idea why std.experimental is a thing to begin with. It 
sounds like a bad idea and it turned out to be one. Moving stuff 
around in a standard library isn't without some disadvantages: 
The public import stays as an historic artifact or deprecation is 
needed, both things that should be avoided. There are cases where 
it's fine like splitting a module into a package.

A standard library is something expected to be particularly well 
done and stable. Having experimental stuff in it is an oxymoron.

DUB packages that are "featured" is a way better approach. If 
deemed worth it (like sumtype), they can be incorporated into 
Phobos. We may even introduce a "Phobos candidate" tag. 
Additionally, that establishes DUB as a core part of the D 
ecosystem.

Can std.experimental packages be removed without deprecation?

The worst offender is std.experimental.typecons; while I don't 
really understand the purpose of (un-)wrap, I know enough of 
Final to be sure it's the kind of thing that must be a language 
feature or it cannot possibly live up to users' expectations. 
Final cannot work properly as a library solution. (I can 
elaborate if needed.) I tried fixing it until I realized it's 
impossible because its design goal is unsound. I honestly cannot 
imagine anyone who uses it. It is cumbersome and has zero 
advantages.

Mar 24 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 3/24/21 10:57 AM, Q. Schroll wrote:
 On Wednesday, 24 March 2021 at 11:20:52 UTC, Berni44 wrote:
 On Tuesday, 23 March 2021 at 21:22:18 UTC, Walter Bright wrote:
 It's been there long enough.

 Isn't that true by now for everything in std.experimental? I ask, 
 because I've got the feeling, that std.experimental doesn't work as 
 expected. For me it looks more or less like an attic, where stuff is 
 put and then forgotten. Maybe the way we used for sumtype is the 
 better approach...

 
 I have no idea why std.experimental is a thing to begin with. It sounds 
 like a bad idea and it turned out to be one. Moving stuff around in a 
 standard library isn't without some disadvantages: The public import 
 stays as an historic artifact or deprecation is needed, both things that 
 should be avoided. There are cases where it's fine like splitting a 
 module into a package.

It's there because we wanted a place for new parts of phobos to develop 
without becoming set in stone. The reason it's called "std.experimental" 
is to disclose explicitly that it is meant to be experimental, subject 
to breaking changes. Otherwise you get things like javax.

In practice, it turned out not as helpful as originally planned, which 
is why we haven't put anything new in it for a long long time. Take for 
instance std.experimental.allocator. At one point, a fundamental design 
change happened (which is perfectly allowed). But of course, code had 
depended on it, and now was broken. So stdx.allocator was born (see 
https://code.dlang.org/packages/stdx-allocator) to allow depending on 
specific versions of std.experimental.allocator without having to freeze 
yourself at a specific Phobos version.

It's important to note that std.experimental predates code.dlang.org, 
which I think is the better way to develop libraries that might become 
included into phobos (see for instance std.sumtype).

 
 Can std.experimental packages be removed without deprecation?

In a word, yes. It's experimental, anything is possible. I would 
recommend we deprecate-remove everything in it into dub packages, or 
promote them to full-fledged Phobos packages.

-Steve

Mar 29 2021

Guillaume Piolat <first.name spam.org> writes:

On Monday, 29 March 2021 at 14:47:15 UTC, Steven Schveighoffer 
wrote:
 It's important to note that std.experimental predates 
 code.dlang.org, which I think is the better way to develop 
 libraries that might become included into phobos (see for 
 instance std.sumtype).

I was intrigued and dug up a bit of forum history:
- the idea for std.experimental was out there in 2011 (!)
- debates about its merit vs popularity on code.dlang.org 
happened in 2014
- first module accepted was std.experimental.logger, in 2015, 
after an unusually long review time (and after being a DUB 
package for a while)
- followed by std.experimental.allocator in 2015
- std.experimental.checkedint is added in Apr 2017, at the same 
time std.experimental.ndslice is removed from Phobos. Development 
continues on DUB.
- the stdx-allocator DUB package was created in Nov 2017. Today 
it has a 4.2 score on DUB.

Mar 29 2021
