
digitalmars.D - Should you be able to initialize a float with a char?

reply max haughton <maxhaton gmail.com> writes:
For example:

float x = 'a';

Currently compiles. I had no idea that it does but I was 
implementing this pattern in SDC and lo and behold it does (and 
thus sdc has to support it).

Should it? Implicit conversions and implicit-anything around 
floats seem to be very undocumented in the specification too.
May 18 2022
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 18 May 2022 at 22:11:34 UTC, max haughton wrote:
 For example:

 float x = 'a';

 Currently compiles. I had no idea that it does but I was 
 implementing this pattern in SDC and lo and behold it does (and 
 thus sdc has to support it).

 Should it? Implicit conversions and implicit-anything around 
 floats seem to be very undocumented in the specification too.
Under ["integer promotions"][1], the spec says that `char` can implicitly convert to `int`. Under ["usual arithmetic conversions"][2], the spec says (by implication) that all arithmetic types can implicitly convert to `float`. "Arithmetic type" is not explicitly defined by the spec, but [in the C99 standard][3] it means "integer and floating types." It's probably safe to assume the same definition applies to D. So I would say that according to the spec, the answer is "yes, the example should work." Though it is rather surprising. [1]: https://dlang.org/spec/type.html#integer-promotions [2]: https://dlang.org/spec/type.html#usual-arithmetic-conversions
May 18 2022
parent reply forkit <forkit gmail.com> writes:
On Wednesday, 18 May 2022 at 22:24:18 UTC, Paul Backus wrote:
 So I would say that according to the spec, the answer is "yes, 
 the example should work." Though it is rather surprising.
It's actually rather *unsurprising* (given D's compatibility needs with C).

What is surprising, is that there's no compiler option to disable implicit type casts, or to disable them in @safe, or *at the very least*, output a record of such casts for auditing (to help minimise bugs and vulnerabilities).
May 19 2022
parent forkit <forkit gmail.com> writes:
On Friday, 20 May 2022 at 02:09:43 UTC, forkit wrote:

D suffers from the same problem as C.

// ---

module test;
@safe: // completely useless annotation here!

import std;

void main()
{
     int x = 3;
     int y = 4;
     float z = x/y;
     writeln(z); // 0

}

// -----
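(For reference, one way to get the intended 0.75 is to convert an operand before the division, e.g.:)

```d
module test2;
@safe:

import std;

void main()
{
    int x = 3;
    int y = 4;
    float z = float(x) / y; // convert first, then divide
    writeln(z);             // 0.75
}
```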
May 19 2022
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 18, 2022 at 10:11:34PM +0000, max haughton via Digitalmars-d wrote:
 For example:
 
 float x = 'a';
 
 Currently compiles. I had no idea that it does but I was implementing
 this pattern in SDC and lo and behold it does (and thus sdc has to
 support it).
 
 Should it? Implicit conversions and implicit-anything around floats
 seem to be very undocumented in the specification too.
If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above.

But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case.

T

-- 
MSDOS = MicroSoft's Denial Of Service
May 18 2022
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/18/2022 3:31 PM, H. S. Teoh wrote:
 If you were to ask me, I'd say prohibit implicit conversions between
 char and non-char types. Otherwise you end up with nonsense code like
 the above.
 
 But IIRC, the last time this conversation came up, Walter's view was
 that they are all integral types and therefore should be
 interconvertible.  The topic at the time was bool vs int, but the same
 principle holds in this case.
People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/18/22 8:27 PM, Walter Bright wrote:
 On 5/18/2022 3:31 PM, H. S. Teoh wrote:
 If you were to ask me, I'd say prohibit implicit conversions between
 char and non-char types. Otherwise you end up with nonsense code like
 the above.

 But IIRC, the last time this conversation came up, Walter's view was
 that they are all integral types and therefore should be
 interconvertible.  The topic at the time was bool vs int, but the same
 principle holds in this case.
People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
Supporting addition on char types (even with char + int) is still possible without allowing implicit conversions. -Steve
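For instance, a rough sketch of a wrapper that keeps `char + int` arithmetic but has no implicit conversions (hypothetical type, just to illustrate the point):

```d
struct Char
{
    char value;

    // char + int stays expressible...
    Char opBinary(string op : "+")(int rhs) const
    {
        return Char(cast(char)(value + rhs));
    }

    // ...but with no alias this, there is no implicit
    // conversion to int, float, bool, etc.
}

void main()
{
    auto c = Char('a');
    auto d = c + 1;          // ok: 'b'
    assert(d.value == 'b');
    // float f = c;          // would not compile
}
```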
May 18 2022
prev sibling next sibling parent reply max haughton <maxhaton gmail.com> writes:
On Thursday, 19 May 2022 at 00:27:24 UTC, Walter Bright wrote:
 On 5/18/2022 3:31 PM, H. S. Teoh wrote:
 If you were to ask me, I'd say prohibit implicit conversions 
 between
 char and non-char types. Otherwise you end up with nonsense 
 code like
 the above.
 
 But IIRC, the last time this conversation came up, Walter's 
 view was
 that they are all integral types and therefore should be
 interconvertible.  The topic at the time was bool vs int, but 
 the same
 principle holds in this case.
People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
People do indeed manipulate characters as integers (I'd question whether it's routine in a good D program; I'd flag it in code review), but I think there's something to be said for forcing people to go char -> suitable integer -> char. We already have byte/ubyte, largely for descriptive purposes; personally I try to use them for calculation even if the byte's value comes from a char.
May 18 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/18/2022 5:55 PM, max haughton wrote:
 People do indeed (I'd question whether it's routine in a good D program, I'd 
 flag it in code review) manipulate characters as integers, but I think there's 
 something to be said for forcing people to go char -> suitable integer -> char.
Casts are a common source of bugs, not correctness. This is because it is forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.
May 18 2022
next sibling parent reply ab <not_a_real_address nowhere.ab> writes:
On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:
 On 5/18/2022 5:55 PM, max haughton wrote:
 People do indeed (I'd question whether it's routine in a good 
 D program, I'd flag it in code review) manipulate characters 
 as integers, but I think there's something to be said for 
 forcing people to go char -> suitable integer -> char.
Casts are a common source of bugs, not correctness. This is because it is forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.
This can be solved by a cast with explicit source and destination type, e.g.

    auto cast_from(From, To, Real)(Real a)
    {
        static if (is(From == Real))
            return cast(To) a;
        else
            pragma(msg, "Wrong types");
    }

    void main()
    {
        import std.range, std.stdio;
        short a = 1;
        int b = cast_from!(short, int)(a);
        bool c = 1;
        // int d = cast_from!(short, int)(c); // compile time error
        writeln("Test: ", b);
    }
May 19 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 12:46 AM, ab wrote:
 This can be solved by a cast with explicit source and destination type, e.g.
This indeed can work, but when people complain about adding attributes (valid complaints), how are they going to react to having to do this?
May 19 2022
prev sibling parent reply bauss <jj_1337 live.dk> writes:
On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:
 Casts are a common source of bugs, not correctness. This is 
 because it is forced override of the type system. If the types 
 change due to refactoring, the cast may no longer be correct, 
 but the programmer will have no way of knowing.
I'd argue that implicit casts are more so in some cases. This is one of those cases.

And also you shouldn't really do arithmetic operations on chars anyway, at least not with Unicode, and D is supposed to be a Unicode language.

Upper-casing in Unicode is not as simple as an addition, because the rules for doing so are language specific. Changing case in one language isn't always the same as in another language.

Even with ASCII you can't just rely on a mathematical computation, because not all characters can change case, such as symbols.

That's why string/char manipulation should __always__ be a library solution, not a user-code solution. The library should handle all these rules. The user should absolutely not be able to mess this up by accident, unless they really really want to.

Sure a char might be represented by an integer type, but so is every single data type you can ever think of, since they all convert to bytes.

If D is to ever attract more users, then it must not surprise new users.
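For illustration, the library route (std.uni's toUpper, which as far as I know applies the Unicode case mappings rather than ASCII arithmetic):

```d
import std.stdio;
import std.uni : toUpper;

void main()
{
    // The library handles the Unicode rules; symbols are left alone.
    writeln("åben häuser".toUpper);   // ÅBEN HÄUSER
    writeln("hello, world!".toUpper); // HELLO, WORLD!
}
```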
May 19 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 12:57 AM, bauss wrote:
 On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:
 Casts are a common source of bugs, not correctness. This is because it is 
 forced override of the type system. If the types change due to refactoring, 
 the cast may no longer be correct, but the programmer will have no way of 
 knowing.
I'd argue that implicit casts are more so in some cases.
D's rules added some constraints to C's rules to prevent loss of data with implicit casting. I don't see how D's implicit casts are a dangerous source of bugs.
 And also you shouldn't really do arithmetic operations on chars anyway, at
least 
 not with unicode and D is supposed to be a unicode language.
It turns out that for performance reasons, you definitely want to treat UTF-8 as individual code units. Autodecode taught us that the hard way.
 Upper-casing in unicode is not as simple as an addition, because the rules for 
 doing so are language specific.
I'm painfully aware that the Unicode consortium made it impossible to do "correct" Unicode without a megabyte library.
 Even with ASCII you can't just rely on a mathematic computation, because not
all 
 characters can change case, such as symbols.
Yes, you can. I posted the code in another post in this thread. ASCII hasn't changed in my professional lifetime, and I seriously doubt it will change in yours.
 If D is to ever attract more users, then it must not surprise new users.
The only problem we've had with D chars is autodecoding, which ironically does what you propose - treat everything as Unicode code points rather than code units. It's a great idea, but it simply does not work, and it took us years to become convinced of that.
May 19 2022
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, May 18, 2022 at 05:27:24PM -0700, Walter Bright via Digitalmars-d wrote:
 On 5/18/2022 3:31 PM, H. S. Teoh wrote:
 If you were to ask me, I'd say prohibit implicit conversions between
 char and non-char types. Otherwise you end up with nonsense code
 like the above.
 
 But IIRC, the last time this conversation came up, Walter's view was
 that they are all integral types and therefore should be
 interconvertible.  The topic at the time was bool vs int, but the
 same principle holds in this case.
People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
How is that any different from the current situation where arithmetic involving short ints requires casts all over the place? Even things like this require a cast:

    short s = 123;
    //s = -s; // NG
    s = cast(short)-s; // required excess verbiage

It got so out of hand that I wrote nopromote.d, specifically to "poison" expressions involving short ints with a custom struct with overloaded ops that always truncate, just so I don't have to litter my code with casts in just about every expression involving short ints.

In the case of char + int arithmetic, my opinion is that usually people do *not* (or *should* not) do char arithmetic directly -- with Unicode, it makes much less sense than the bad ole days of ASCII. These days, you'd call one of the std.uni functions for proper case mapping instead of a slipshod hack job of adding or subtracting some magic constant (which is wrong in anything except ASCII anyway). In today's day and age, strings are best treated as opaque data that are manipulated by properly-implemented string functions in the standard library. Having a few extra char/int casts in std.uni isn't the end of the world. It shouldn't usually be done in user code anyway. (And having to write lots of casts may motivate people to actually use proper string manipulation functions instead of winging it themselves with wrong implementations involving char arithmetic.)

T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.
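(For reference, a hypothetical sketch of the "poisoned short" idea mentioned above; this is not the actual nopromote.d, just an illustration of the technique:)

```d
struct NoPromote
{
    short value;

    // Negation truncates back to short, so no cast at the use site.
    NoPromote opUnary(string op : "-")() const
    {
        return NoPromote(cast(short) -value);
    }

    NoPromote opBinary(string op)(NoPromote rhs) const
        if (op == "+" || op == "-" || op == "*")
    {
        return NoPromote(cast(short) mixin("value " ~ op ~ " rhs.value"));
    }
}

void main()
{
    auto s = NoPromote(123);
    s = -s;
    assert(s.value == -123);
}
```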
May 18 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/18/2022 5:57 PM, H. S. Teoh wrote:
 How is that any different from the current situation where arithmetic
 involving short ints require casts all over the place?  Even things
 like this require a cast:
 
 	short s = 123;
 	//s = -s; // NG
 	s = cast(short)-s; // required excess verbiage
I generally avoid using shorts. I agree the situation is hardly ideal, but there is no ideal way I've ever seen. The various schemes just shift the deck chairs around.
 It got so out of hand that I wrote nopromote.d, specifically to "poison"
 expressions involving short ints with a custom struct with overloaded
 ops that always truncate, just so I don't have to litter my code with
 casts in just about every expression involving short ints.
The only reason to ever use shorts is to save memory in a frequently allocated data structure. Short local variables do not save memory or time (in fact, they're larger and slower). If you're doing all these casts, perhaps look into using ints instead.
 In the case of char + int arithmetic, my opinion is that usually people
 do *not* (or *should* not) do char arithmetic directly -- with Unicode,
 it makes much less sense than the bad ole days of ASCII. These days,
 you'd call one of the std.uni functions for proper case mapping instead
 of a slipshod hack job of adding or subtracting some magic constant
 (which is wrong in anything except ASCII anyway).  In today's day and
 age, strings are best treated as opaque data that are manipulated by
 properly-implemented string functions in the standard library.  Having a
 few extra char/int casts in std.uni isn't the end of the world.  It
 shouldn't usually be done in user code anyway.  (And having to write
 lots of casts may motivate people to actually use proper string
 manipulation functions instead of winging it themselves with wrong
 implementations involving char arithmetic.)
There's nothing wrong with:

    if ('A' <= c && c <= 'Z')
        c = c | 0x20;

D doesn't have C's problems with optionally signed chars, 10 bit chars, EBCDIC, RADIX50 and other dead technologies.
May 18 2022
next sibling parent reply bauss <jj_1337 live.dk> writes:
On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
 There's nothing wrong with:

     if ('A' <= c && c <= 'Z')
         c = c | 0x20;
There is, this assumes that the character is ascii and not unicode.

What about say 'Å' -> 'å'?

It won't work for that.

So your code is wrong in D because D isn't an ascii language, but a unicode language. As specified by the spec:

    char  '\xFF'       unsigned 8 bit  (UTF-8 code unit)
    wchar '\uFFFF'     unsigned 16 bit (UTF-16 code unit)
    dchar '\U0000FFFF' unsigned 32 bit (UTF-32 code unit)
May 19 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 1:05 AM, bauss wrote:
 On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
 There's nothing wrong with:

     if ('A' <= c && c <= 'Z')
         c = c | 0x20;
There is, this assumes that the character is ascii and not unicode.
It does not assume it, it tests for if it would be valid.
 What about say 'Ã…' -> 'Ã¥'?
 
 It won't work for that.
I know. And for many applications (like dev tools), it is fine.
May 19 2022
parent reply kdevel <kdevel vogtner.de> writes:
On Thursday, 19 May 2022 at 18:35:42 UTC, Walter Bright wrote:
 On 5/19/2022 1:05 AM, bauss wrote:
 On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
 There's nothing wrong with:

     if ('A' <= c && c <= 'Z')
         c = c | 0x20;
There is, this assumes that the character is ascii and not unicode.
It does not assume it, it tests for if it would be valid.
"However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1] [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/
May 19 2022
next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 5/19/22 12:13, kdevel wrote:
 On Thursday, 19 May 2022 at 18:35:42 UTC, Walter Bright wrote:
 On 5/19/2022 1:05 AM, bauss wrote:
 On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
 There's nothing wrong with:

     if ('A' <= c && c <= 'Z')
         c = c | 0x20;
There is, this assumes that the character is ascii and not unicode.
It does not assume it, it tests for if it would be valid.
"However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1] [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/
In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above is valid without making any ASCII assumption. Ali
May 19 2022
next sibling parent reply kdevel <kdevel vogtner.de> writes:
On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote:
[...]
 "However, the assumption that setting bit 5 of the
representation will
 convert uppercase letters to lowercase is not valid for
EBCDIC." [1]
 [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F]
characters?
      https://ogeek.cn/qa/?qa=669486/
In D, char is UTF-8 and ASCII is a subset of UTF-8.
The latter part, that ASCII is a subset of UTF-8, is 1†. I disagree with the wording of the former part, that in D a char "is" UTF-8.
 Walter's code above is valid without making any ASCII 
 assumption.
Walter made the 0 claim "It does not assume it, it tests for if it would be valid [ascii and not unicode]" [2] ‡

Okay. Let's do UTF-8:

```
import std.stdio;
import std.string;
import std.utf;

char char_tolower_bright (char c)
{
    if ('A' <= c && c <= 'Z')
        c = c | 0x20;
    return c;
}

string tolower_bright (string s)
{
    string t;
    foreach (c; s.byCodeUnit)
        t ~= c.char_tolower_bright;
    return t;
}

void process_strings (string s)
{
    writefln!"input            : %s" (s);
    auto t = s.tolower_bright;
    writefln!"bright           : %s" (t);
    auto u = s.toLower;
    writefln!"toLower (std.utf): %s" (u);
}

void main ()
{
    process_strings ("A \u00C4");   // precomposed: U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
    process_strings ("A A\u0308");  // decomposed: 'A' + U+0308 COMBINING DIAERESIS
}
```

Free of charge I compiled and ran this for you:

    $ dmd lcb
    $ ./lcb
    input            : A Ä
    bright           : a Ä
    toLower (std.utf): a ä
    input            : A Ä
    bright           : a ä
    toLower (std.utf): a ä

See the problem?

† Hint for interpretation: booleans "are" integers.

[2] http://forum.dlang.org/post/t662ll$tnm$1@digitalmars.com

‡ There is probably no consensus about what "it" means.
May 19 2022
parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:
 Free of charge I compiled and ran this for you:

    $ dmd lcb
    $ ./lcb
    input            : A Ä
    bright           : a Ä
    toLower (std.utf): a ä
    input            : A Ä
    bright           : a ä
    toLower (std.utf): a ä

 See the problem?
You could have used "Ali Çehreli" as a test case :)
May 19 2022
parent reply kdevel <kdevel vogtner.de> writes:
On Thursday, 19 May 2022 at 22:24:35 UTC, deadalnix wrote:
 On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:
 Free of charge I compiled and ran this for you:

    $ dmd lcb
    $ ./lcb
    input            : A Ä
    bright           : a Ä
    toLower (std.utf): a ä
    input            : A Ä
    bright           : a ä
    toLower (std.utf): a ä

 See the problem?
You could have use "Ali Çehreli" as a test case :)
One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case.

Further Reading:

[3] https://medium.com/@sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407
May 19 2022
parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 19 May 2022 at 22:51:55 UTC, kdevel wrote:
 On Thursday, 19 May 2022 at 22:24:35 UTC, deadalnix wrote:
 On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:
 Free of charge I compiled and ran this for you:

    $ dmd lcb
    $ ./lcb
    input            : A Ä
    bright           : a Ä
    toLower (std.utf): a ä
    input            : A Ä
    bright           : a ä
    toLower (std.utf): a ä

 See the problem?
You could have use "Ali Çehreli" as a test case :)
One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case. Further Reading: [3] https://medium.com/@sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407
This, or simply U+00E7, which will cause a similar problem to the one you raised.
May 19 2022
parent reply kdevel <kdevel vogtner.de> writes:
On Thursday, 19 May 2022 at 23:26:57 UTC, deadalnix wrote:
    input            : A Ä
    bright           : a Ä <-- upper case
    toLower (std.utf): a ä
    input            : A Ä
    bright           : a ä
    toLower (std.utf): a ä <-- lower case

 See the problem?
You could have use "Ali Çehreli" as a test case :)
One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case. Further Reading: [3] https://medium.com/ sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407
This, or simply U+00E7 .
Not "or" but "and". The first input contains UTF-8 of the (normalized) codepoint, which is left unchanged by Walters lowercase function. The second input contains UTF-8 of the same codepoint in canonically decomposed form (NFD).
 Which will cause a similar problem as the one you raised.
The problem is that you cannot decide only from the value of a single UTF-8 code unit (char) whether it stands for an ASCII character in the string.
May 19 2022
next sibling parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 5/19/22 17:50, kdevel wrote:

 The first input contains UTF-8 of the (normalized)
 codepoint, which is left unchanged by Walters lowercase function.
Note that Walter did not write any function. He showed a piece of code that would lowercase ASCII letters inside a UTF-8 encoded Unicode string. The code did not assume the string was ASCII and it did not claim to lowercase all Unicode characters. Ali
May 19 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 5:50 PM, kdevel wrote:
 Not "or" but "and". The first input contains UTF-8 of the (normalized) 
 codepoint, which is left unchanged by Walters lowercase function. The second 
 input contains UTF-8 of the same codepoint in canonically decomposed form
(NFD).
Should stick with normalized forms for good reason. Having two different sequences supposedly compare equal is an abomination. Though none of this supports the notion that arithmetic should not be done on chars. Heck, UTF-8 cannot be decoded without such arithmetic.
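For instance, decoding a two-byte UTF-8 sequence is nothing but integer arithmetic on char values (a minimal sketch; real decoders also handle the 1-, 3- and 4-byte forms and invalid input):

```d
dchar decode2(char c0, char c1)
{
    assert((c0 & 0xE0) == 0xC0 && (c1 & 0xC0) == 0x80);
    return cast(dchar)(((c0 & 0x1F) << 6) | (c1 & 0x3F));
}

unittest
{
    // 'ä' is 0xC3 0xA4 in UTF-8, i.e. code point U+00E4.
    assert(decode2('\xC3', '\xA4') == 0x00E4);
}
```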
May 19 2022
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote:
 In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's 
 code above is valid without making any ASCII assumption.
Sure, but it also doesn't perform any useful operation, other than "uncapitalize English, do nothing for non-Latin languages, and create a mess with any non-English Latin language", which, while it certainly is a valid program, doesn't look like something anyone would actually want to do for reasons other than it's easy to write and good enough.
May 19 2022
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 5/19/22 15:17, deadalnix wrote:
 On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote:
 In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code
 above is valid without making any ASCII assumption.
Sure, it also doesn't perform any useful operation. other than "Uncapitalize English, do nothing for non latin languages
It is an experimental cypher. :) Ali
May 19 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 12:13 PM, kdevel wrote:
 [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters?
      https://ogeek.cn/qa/?qa=669486/
No. But D does.
May 19 2022
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:
 There's nothing wrong with:

     if ('A' <= c && c <= 'Z')
         c = c | 0x20;
Tell me you are American without telling me you are American.
May 19 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 3:04 PM, deadalnix wrote:
 Tell me you are American without telling me you are American.
I didn't know the Aussies and Brits used umlauts.
May 19 2022
next sibling parent max haughton <maxhaton gmail.com> writes:
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:
 On 5/19/2022 3:04 PM, deadalnix wrote:
 Tell me you are American without telling me you are American.
I didn't know the Aussies and Brits used umlauts.
Maybe in some parts of Scotland...
May 19 2022
prev sibling next sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:
 On 5/19/2022 3:04 PM, deadalnix wrote:
 Tell me you are American without telling me you are American.
I didn't know the Aussies and Brits used umlauts.
The Brits Charlotte Brontë, Emily Brontë (and other members of the Brontë family), Noël Coward, Zoë Wanamaker, Zoë Ball, Emeli Sandé, John le Carré and the Australians Renée Geyer and Zoë Badwi and the Americans Beyoncé Knowles, Chloë Grace Moretz, Chloë Sevigny, Renée Fleming, Renée Zellweger, Zoë Baird, Zoë Kravitz, Donté Stallworth, John C. Frémont, Robert M. Gagné, Roxanne Shanté, Janelle Monáe, Jhené Aiko might want to have a word with you ;-)
May 20 2022
parent claptrap <clap trap.com> writes:
On Friday, 20 May 2022 at 09:16:24 UTC, Patrick Schluter wrote:
 On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:
 On 5/19/2022 3:04 PM, deadalnix wrote:
 Tell me you are American without telling me you are American.
I didn't know the Aussies and Brits used umlauts.
The Brits Charlotte Brontë, Emily Brontë (and other members of the Brontë family), Noël Coward, Zoë Wanamaker, Zoë Ball, Emeli Sandé, John le Carré might want to have a word with you ;-)
The Brontë name was originally Brunty; the father changed it to honour Nelson when he won some battle.

Zoë Wanamaker is American, or at least was born in the US.

Emeli Sandé's dad was from Zambia, I think.

John le Carré is a pen name; his real name is David John Moore Cornwell.

So often the source of the umlauts in "British" names is not as straightforward as it may appear.
May 20 2022
prev sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:
 On 5/19/2022 3:04 PM, deadalnix wrote:
 Tell me you are American without telling me you are American.
I didn't know the Aussies and Brits used umlauts.
C'mon, even the NY Times (?, used to) spells it "cöperate", and similarly contracts other repeated vowels to a single umlaut.
May 20 2022
parent Nick Treleaven <nick geany.org> writes:
On Friday, 20 May 2022 at 11:54:47 UTC, Nicholas Wilson wrote:
 On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:
 I didn't know the Aussies and Brits used umlauts.
C'mon, even the NY times (?, used to) spells it "cöperate", and other repeated vowels contractions to single umlaut.
Coöperate. (It's known as a diaeresis in English):

"The diaeresis diacritic indicates that two adjoining letters that would normally form a digraph and be pronounced as one sound, are instead to be read as separate vowels in two syllables. For example, in the spelling 'coöperate', the diaeresis reminds the reader that the word has four syllables co-op-er-ate, not three, '*coop-er-ate'. In British English this usage has been considered obsolete for many years, and in US English, although it persisted for longer, it is now considered archaic as well.[5] Nevertheless, it is still used by the US magazine The New Yorker.[6] In English language texts it is perhaps most familiar in the spellings 'naïve', 'Noël', and 'Chloë'"

Wikipedia
May 20 2022
prev sibling parent user1234 <user1234 12.de> writes:
On Thursday, 19 May 2022 at 00:27:24 UTC, Walter Bright wrote:
 People routinely manipulate chars as integer types, for 
 example, in converting case. Making them not integer types 
 means lots of casting will become necessary, and overall that's 
 a step backwards.
It is indeed, but let's be honest, having builtin char, wchar, and dchar is only useful for overload resolution and string literals.

    ubyte c = 's';       // OK
    ubyte[] a = "s".dup; // NG

Without the string literal problem, there's only the overload resolution one, and for that one the builtin character types could be library types, e.g. structs wrapping ubyte, ushort, uint.
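A rough sketch of what such a library character type could look like (hypothetical `LibChar`, purely for illustration):

```d
struct LibChar
{
    ubyte code;
    this(char c) { code = cast(ubyte) c; }
}

void f(LibChar c) { /* character overload */ }
void f(int i)     { /* integer overload   */ }

void main()
{
    f(LibChar('s'));  // picks the character overload
    f(42);            // picks the integer overload
    // float x = LibChar('a'); // no implicit conversion, does not compile
}
```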
May 19 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/18/22 6:31 PM, H. S. Teoh wrote:
 On Wed, May 18, 2022 at 10:11:34PM +0000, max haughton via Digitalmars-d wrote:
 For example:

 float x = 'a';

 Currently compiles. I had no idea that it does but I was implementing
 this pattern in SDC and lo and behold it does (and thus sdc has to
 support it).

 Should it? Implicit conversions and implicit-anything around floats
 seem to be very undocumented in the specification too.
If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above.
If you were to ask me, I'd say prohibit implicit conversions between char types and any other types, including other char types. converting char to dchar isn't correct. But I have little hope for it, as Walter treats a boolean as an integer. -Steve
May 18 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
 But I have little hope for it, as Walter treats a boolean as an integer.
They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-)

I.e.

    a = (b < c) ? 8 : 3;

becomes:

    a = 3 + (b < c) * 5;   // I know this is not APL syntax

That works in D, too!

Branchless code is a thing, it is used in GPUs, and in security code to make it resistant to timing attacks. You'll also see this in the SIMD instructions, although they set all bits instead of just 1, because & is faster than *.

    a = 3 + (-(b < c) & 5);
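(For the record, a quick self-contained check that the branching and branchless forms agree; the sample values are made up:)

```d
void main()
{
    int b = 1, c = 2, a;

    a = (b < c) ? 8 : 3;
    assert(a == 8);

    a = 3 + (b < c) * 5;     // bool promotes to int (0 or 1)
    assert(a == 8);

    a = 3 + (-(b < c) & 5);  // SIMD-style: negate to get an all-bits mask, then &
    assert(a == 8);
}
```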
May 18 2022
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/19/22 12:35 AM, Walter Bright wrote:
 On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
 But I have little hope for it, as Walter treats a boolean as an integer.
They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -Steve
May 19 2022
next sibling parent max haughton <maxhaton gmail.com> writes:
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer 
wrote:
 On 5/19/22 12:35 AM, Walter Bright wrote:
 On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
 But I have little hope for it, as Walter treats a boolean as 
 an integer.
They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -Steve
That, and you can have the underlying type without exposing it to the programmer. "Bools are integers", as opposed to bools not having a memory representation at all?

Basically any discussion of these peephole optimizations (if this is more than just a nice-to-have) is a bit silly in an age where GCC and LLVM will both reach for this kind of code anyway, because they want to eliminate branches like the plague (even if they couldn't do it in the first place, given that you'd need to tell them what a bool is).
May 19 2022
prev sibling next sibling parent user1234 <user1234 12.de> writes:
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer 
wrote:
 On 5/19/22 12:35 AM, Walter Bright wrote:
 On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
 But I have little hope for it, as Walter treats a boolean as 
 an integer.
They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -Steve
If you use bool as an integer type in LLVM, true + true overflows and you basically get 0, because only 1 bit is read.
May 19 2022
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:
 I hope we are not depending on the type system to the degree where a bool must 
 be an integer in order to have this kind of optimization.
Does that mean you prefer:

    a = 3 + cast(int)(b < c) * 5;

? If so, I don't see what is gained by that.
May 19 2022
next sibling parent John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 19 May 2022 at 18:20:26 UTC, Walter Bright wrote:
 On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:
 I hope we are not depending on the type system to the degree 
 where a bool must be an integer in order to have this kind of 
 optimization.
Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
    a = 3 + int(b < c) * 5;

avoids forcing it with an explicit cast, lower risk of writing a bug (or creating one later in a refactor).
May 19 2022
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/19/22 2:20 PM, Walter Bright wrote:
 On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:
 I hope we are not depending on the type system to the degree where a 
 bool must be an integer in order to have this kind of optimization.
Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
No, I find that nearly unreadable. I prefer the original:

    a = b < c ? 8 : 3;

And let the compiler come up with whatever funky stuff it wants to in order to make it fast.

-Steve
May 19 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 1:01 PM, Steven Schveighoffer wrote:
 And let the compiler come up with whatever funky stuff it wants to in order to 
 make it fast.
You never write things like:

    a += (b < c);

? I do. And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does crypto code.

Hoping the compiler will transform the code into this style, if it is not specified to, is just that, hope :-/

Sometimes this style is not necessarily faster, either, even though the user may desire it for crypto reasons.
May 19 2022
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 19 May 2022 at 21:44:55 UTC, Walter Bright wrote:
 You never write things like:

    a += (b < c);

 ? I do. And, as I remarked before, GPUs favor this style of 
 coding, as does SIMD code, as does cryto code.

 Hoping the compiler will transform the code into this style, if 
 it is not specified to, is just that, hope :-/

 Sometimes this style is not necessarily faster, either, even 
 though the user may desire it for crypto reasons.
That doesn't strike me as very convincing, because the compiler will sometimes do the opposite too, so either way, at least for crypto, you have to look at the disassembly.

In our case, we even instrumented valgrind to cause a CI failure when such a branch occurs, and run it on every patch.
May 19 2022
next sibling parent max haughton <maxhaton gmail.com> writes:
On Thursday, 19 May 2022 at 22:20:06 UTC, deadalnix wrote:
 In our case, we even instrumentalized valgrind to cause a CI 
 failure when such a branch occurs and run it on every patch.
That's fun. I've never enjoyed working with the valgrind code all that much but having it around is very useful.
May 19 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 3:20 PM, deadalnix wrote:
 That doesn't strike me as very convincing, because the compiler will sometime
do 
 the opposite too,
I know this is technically possible, but have you ever seen this?
May 19 2022
parent deadalnix <deadalnix gmail.com> writes:
On Friday, 20 May 2022 at 03:06:37 UTC, Walter Bright wrote:
 On 5/19/2022 3:20 PM, deadalnix wrote:
 That doesn't strike me as very convincing, because the 
 compiler will sometime do the opposite too,
I know this is technically possible, but have you ever seen this?
Yes, when you have a bunch of ternaries based on the same condition for instance.
May 20 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/19/22 5:44 PM, Walter Bright wrote:
 On 5/19/2022 1:01 PM, Steven Schveighoffer wrote:
 And let the compiler come up with whatever funky stuff it wants to in 
 order to make it fast.
You never write things like:    a += (b < c); ? I do.
I have written these kinds of things *sometimes*, but also would be fine writing `b < c ? 1 : 0` if required, or even `int(b < c)`. I'd happily write that in exchange for not having this happen:

```d
enum A : int { a }

Json j = A.a;
writeln(j); // false
```
 And, as I remarked before, GPUs favor this style of coding, as 
 does SIMD code, as does cryto code.
If the optimizer can't see through the ternary expression with 2 constants, then maybe it needs updating.
 Hoping the compiler will transform the code into this style, if it is 
 not specified to, is just that, hope :-/
It's just: use a better compiler. And if it doesn't, so what? Just write it with casts if a) it doesn't do what you want, and b) it's critically important. I just write it the "normal" way and move on. What if the compiler rewrites it back to the ternary expression? Either way, if you are paranoid the compiler isn't doing the right thing, you check the assembly.
 Sometimes this style is not necessarily faster, either, even though the 
 user may desire it for crypto reasons.
Which is exactly why I leave it to the experts. I want to write the clearest code possible, and let the optimizer wizards do their magic. If for some reason, the compiler isn't smart enough to figure out the right thing to do (and it bothers me to the point of investigation), then D provides so many tools to get it to spit out what you want, all the way down to inline assembly. Anyone who thinks they can predict exactly what the compiler will output for any given code is fooling themselves. If you care, check the assembly. -Steve
May 19 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 5:06 PM, Steven Schveighoffer wrote:
 I'd happily write that in 
 exchange for not having this happen:
 
 ```d
 enum A : int { a }
 
 Json j = A.a;
 writeln(j); // false
 ```
I presume Json is a bool. And the bool is written as false. If it's bad that 0 implicitly converts to a bool, then it should also be bad that 0 implicitly converts to char, ubyte, byte, int, float, etc. It implies all implicit conversions should be removed. While that is a reasonable point of view, I used a language that did that (Wirth's Pascal) and found it annoying and unpleasant.
 And, as I remarked before, GPUs favor this style of coding, as does SIMD code, 
 as does cryto code.
If the optimizer can't see through the ternary expression with 2 constants, then maybe it needs updating.
To make the examples understandable, I use trivial cases.
 I want to write the clearest code possible, and let the optimizer wizards do 
 their magic.
I appreciate you want to write clear code. I do, too. The form I wrote is perfectly clear. Maybe it's just me, but I've never had any difficulty with the equivalence of: true, 1, +5V, On, Yes, T, +10V, etc.

I doubt that this gives anyone trouble, either:

    enum Flags
    {
        A = 1,
        B = 2,
        C = 4,
    }

    int flags = A | C;

    if (flags & C) ...

It's clear to me that there is no set of rules that will please everyone and is objectively better than the others. At some point it ceases to be useful to continue to debate it, as no resolution will satisfy everyone.
May 19 2022
next sibling parent claptrap <clap trap.com> writes:
On Friday, 20 May 2022 at 03:27:12 UTC, Walter Bright wrote:
 If it's bad that 0 implicitly converts to a bool, then it 
 should also be bad that 0 implicitly converts to char, ubyte, 
 byte, int, float, etc. It implies all implicit conversions 
 should be removed.
Why should that be so? Why do you take for granted that whatever happens to bool should happen to ubyte, int etc...
May 20 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/19/22 11:27 PM, Walter Bright wrote:
 On 5/19/2022 5:06 PM, Steven Schveighoffer wrote:
 I'd happily write that in exchange for not having this happen:

 ```d
 enum A : int { a }

 Json j = A.a;
 writeln(j); // false
 ```
I presume Json is a bool.
No, it's not. It's a [JSON](https://json.org) container. It can accept any type on opAssign that is a valid Json type (long, bool, string, double, or another Json). It's actually [this one](https://vibed.org/api/vibe.data.json/Json).
 And the bool is written as false.
The Json is written as false, because calling the overloaded opAssign turns into a bool, because bool is an integer, and the enum integer value fits in there. It's the compiler picking the bool overload that is surprising.
 It implies 
 all implicit conversions should be removed.
No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad).

I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect.

However, I'm also OK with bool not converting to int implicitly if that is necessary for the type system to be sane. Using the trivial `b ? 1 : 0` conversion is not bad, and the compiler should recognize this pattern easily. I don't hold out hope for this to convince you though.
 While that is a reasonable 
 point of view, I used a language that did that (Wirth's Pascal) and 
 found it annoying and unpleasant.
I actually am fine with, even *happy* with, implicit conversions that D has, *except* the bool and char implicit conversions *from* integers. I've used Swift where implicit conversions are verboten, and it's non-stop pain.
 It's clear to me that there is no set of rules that will please everyone 
 and is objectively better than the others. At some point it ceases to be 
 useful to continue to debate it, as no resolution will satisfy everyone.
Of course. There's always a tradeoff. It comes down to, when does the language surprise you with a weird thing (like an overloaded function that takes bool accepting an int enum because it can fit)? If the choice is between those surprises and cast-inconvenience, what is worth more? There is no right answer. -Steve
May 20 2022
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer 
wrote:
 It implies all implicit conversions should be removed.
No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad). I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect.
In fact, it doesn't even require implicit conversions to be removed at all. Matching bool in this case really doesn't make sense, and even by the letter of the spec I'm not sure this is right. Indeed, one of the constructors is an exact match, while the other is an implicit conversion match.
May 20 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 20 May 2022 at 16:56:50 UTC, deadalnix wrote:
 On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer 
 wrote:
 It implies all implicit conversions should be removed.
No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad). I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect.
In fact, it doesn't even require implicit conversion to be removed at all. Matching bool in this case really doesn't make sense, and even by the letter of the spec I'm not sure this is right. Indeed, one of the constructor in an exact match, while the other is an implicit conversion match.
In this example, both `int` and `bool` are implicit conversions, because the type of `E.a` is `E`, not `int`. So partial ordering is used to disambiguate, and the compiler (correctly) determines that the `bool` overload is more specialized than the `int` overload, because you can pass a `bool` argument to an `int` parameter but not the other way around.

As soon as you allow the `E` -> `bool` implicit conversion (via VRP), everything else follows.
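A stripped-down illustration of that ordering, with plain overloads standing in for the `Json` constructors (it should print "bool"):

```d
import std.stdio;

enum E : int { a }

void fun(int)  { writeln("int");  }
void fun(bool) { writeln("bool"); }

void main()
{
    // Both overloads require an implicit conversion from E,
    // so partial ordering picks the more specialized one: bool.
    fun(E.a);
}
```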
May 20 2022
parent reply deadalnix <deadalnix gmail.com> writes:
On Friday, 20 May 2022 at 17:15:07 UTC, Paul Backus wrote:
 In this example, both `int` and `bool` are implicit 
 conversions, because the type of `E.a` is `E`, not `int`. So 
 partial ordering is used to disambiguate, and the compiler 
 (correctly) determines that the `bool` overload is more 
 specialized than the `int` overload, because you can pass a 
 `bool` argument to an `int` parameter but not the other way 
 around.

 As soon as you allow the `E` -> `bool` implicit conversion (via 
 VRP), everything else follows.
Fair enough, because of the enum. You probably don't want to convert to bool via VRP. But it also happens with integer literals, so clearly there is a problem.

The way I solved it on my end is to make all the opAssign overloads templates and use specialization, in which case it doesn't go from int to bool.
May 20 2022
parent Paul Backus <snarwin gmail.com> writes:
On Friday, 20 May 2022 at 18:41:39 UTC, deadalnix wrote:
 On Friday, 20 May 2022 at 17:15:07 UTC, Paul Backus wrote:
 In this example, both `int` and `bool` are implicit 
 conversions, because the type of `E.a` is `E`, not `int`. So 
 partial ordering is used to disambiguate, and the compiler 
 (correctly) determines that the `bool` overload is more 
 specialized than the `int` overload, because you can pass a 
 `bool` argument to an `int` parameter but not the other way 
 around.

 As soon as you allow the `E` -> `bool` implicit conversion 
 (via VRP), everything else follows.
Fair enough, because of the enum. You probably don't want to cast do bool via VRP. But it also happens with integer literals, so clearly there is a problem.
It happens with literals only if the literal type is not an exact match for the parameter type:

```d
import std.stdio;

void fun(int) { writeln("int"); }
void fun(bool) { writeln("bool"); }

void main()
{
    fun(int(0));   // int (exact match)
    fun(ubyte(0)); // bool (implicit conversion)
}
```

So, this case is exactly the same as the enum case. Once you allow the implicit conversion to `bool`, everything else follows from the normal language rules.
May 20 2022
prev sibling parent reply forkit <forkit gmail.com> writes:
On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer 
wrote:
 I actually am fine with, even *happy* with, implicit 
 conversions that D has, *except* the bool and char implicit 
 conversions *from* integers. I've used Swift where implicit 
 conversions are verboten, and it's non-stop pain.

 It's clear to me that there is no set of rules that will 
 please everyone and is objectively better than the others. At 
 some point it ceases to be useful to continue to debate it, as 
 no resolution will satisfy everyone.
Of course. There's always a tradeoff. It comes down to, when does the language surprise you with a weird thing (like an overloaded function that takes bool accepting an int enum because it can fit)? If the choice is between those surprises and cast-inconvenience, what is worth more? There is no right answer. -Steve
Actually, there is a right answer. And that's to let the programmer know when this is occurring, if the programmer wants this information (and not requiring the programmer to go look at the assembly!!!).

Or, possibly, for the programmer to disable implicit casts, via some annotation -> @noImplicitCasting!

I do not like D being like C when it comes to the many unexpected things that can occur due to implicit type casting by the compiler. D needs to be (much) better than C.
May 20 2022
parent forkit <forkit gmail.com> writes:
On Friday, 20 May 2022 at 22:10:43 UTC, forkit wrote:

I'd like to see an option, for the compiler to output this 
information, when requested.

It does it already for other things (e.g GC).

- The location in the file where the cast occurred.
- The type being cast from.
- The type being cast to.
- The result of the cast analysis: upcast, downcast, or mismatch.
- Is it an explicit or implicit cast?
   (i.e. the programmer did it, or the compiler did it)
May 20 2022
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On Thursday, 19 May 2022 at 18:20:26 UTC, Walter Bright wrote:
 On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:
 I hope we are not depending on the type system to the degree 
 where a bool must be an integer in order to have this kind of 
 optimization.
Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
No. The `*` imply a promotion of its argument. It is very easy to define bool has promoting to int without making bool an int.
May 19 2022
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, May 19, 2022 at 10:33:14AM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 On 5/19/22 12:35 AM, Walter Bright wrote:
[...]
 The APL language relies on bools being integers so conditional
 operations can be carried out without branching :-)
 
 I.e.
      a = (b < c) ? 8 : 3;
 
 becomes:
 
      a = 3 + (b < c) * 5;   // I know this is not APL syntax
 
 That works in D, too!
I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
[...] IME, gcc and ldc2 are well able to convert the above ?: expression into the latter, without uglifying the code. Why are we promoting (or even allowing) this kind of ugly code just because dmd's optimizer is so lackluster you have to manually spell things out this way? T -- Without outlines, life would be pointless.
May 19 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 1:24 PM, H. S. Teoh wrote:
 IME, gcc and ldc2 are well able to convert the above ?: expression into
 the latter, without uglifying the code.  Why are we promoting (or even
 allowing) this kind of ugly code just because dmd's optimizer is so
 lackluster you have to manually spell things out this way?
See my reply to Steven.

BTW, consider auto-vectorizing compilers. A common characteristic of them is that sometimes a loop looks like it should be vectorized, but the compiler didn't, for reasons that are opaque to users. The compiler then substitutes a slow emulation to give the *appearance* of being vectorized. The only way to tell what is happening is to dump the generated assembler. This is especially troublesome when you're attempting to write vector code that is portable among various SIMD instruction sets. It doesn't scale, at all.

This is based on many conversations about this with Manu Evans, whose career was based on writing vector code. Manu has been very influential in the design of D's vector semantics.

Hence D's approach is different. You can write vector code in D. If it won't compile to the target instruction set, it doesn't replace it with emulation. It signals an error. Thus, the user knows if he writes vector code, he gets vector code. It makes it easy for him to use versioning to adjust the shape of the expressions to line up with the vector capabilities of each target.

To sum up, if you want a particular instruction mix in the output stream, a systems programming language must enable expression of that desired mix. It must not rely on undocumented and inconsistent compiler transformations.
May 19 2022
parent reply max haughton <maxhaton gmail.com> writes:
On Thursday, 19 May 2022 at 21:55:59 UTC, Walter Bright wrote:
 On 5/19/2022 1:24 PM, H. S. Teoh wrote:
 IME, gcc and ldc2 are well able to convert the above ?: 
 expression into
 the latter, without uglifying the code.  Why are we promoting 
 (or even
 allowing) this kind of ugly code just because dmd's optimizer 
 is so
 lackluster you have to manually spell things out this way?
See my reply to Steven. BTW, consider auto-vectorizing compilers. A common characteristic of them is that sometimes a loop looks like it should be vectorized, but the compiler didn't, for reasons that are opaque to users. The compiler then substitutes a slow emulation to give the *appearance* of being vectorized.
Good compilers can actually print a report of why they didn't vectorize things. If they couldn't vectorize, most of the time these days it's because the compiler was right and the programmer has a loop that the compiler can't reasonably assume is free of dependencies.

https://d.godbolt.org/z/djhMhMj31 has reports enabled from gcc and llvm. Intel were the cutting edge for these reports, but now Intel C++ is basically dead. These reports aren't that good for instruction selection issues, granted.

As an addendum, I would actually contend that most optimizers are far too aggressive when performing loop optimizations. See this example: https://d.godbolt.org/z/Y99zs9feh. Unless you give the compiler a nudge in the right direction (i.e. you can make sure you never try to compute a factorial of 100, for example), it will generate reams and reams of code.

Unless you are compiling with profile-guided optimizations, everything the compiler does is blind. This isn't just a question of locality but the very basics of the compiler's optimizations, e.g. register allocation and spill placement.
 The only way to tell what is happening is to dump the generate 
 assembler. This is especially troublesome you're attempting to 
 write vector code that is portable among various SIMD 
 instruction sets. It doesn't scale, at all.
If you're writing SIMD code without dumping the assembler anyway, you're not paying enough attention. If you're going to go to all that effort you're going to be profiling the code, and any good profiler will show you the disassembly alongside. Maybe it doesn't scale in some minute sense, but in practice I don't think it makes that much difference, because you either have to do the work anyway, or it doesn't matter.

This is still ignoring that instruction sets don't mean all that much; it's all about the microarchitecture, which once again will probably require different code. For example, AMD processors present-ish and past have emulated the wider SIMD in terms of more numerous smaller execution units.
 Hence D's approach is different. You can write vector code in 
 D. If it won't compile to the target instruction set, it 
 doesn't replace it with emulation. It signals an error. Thus, 
 the user knows if he writes vector code, he gets vector code. 
 It makes it easy for him to use versioning to adjust the shape 
 of the expressions to line up with the vector capabilities of 
 each target.
LDC doesn't do this, GCC does. I don't think it actually matters, whereas if you're consuming a library from someone who didn't do the SIMD parts properly, it will at the very least compile with LDC.
 To sum up, if you want a particular instruction mix in the 
 output stream, a systems programming language must enable 
 expression of that desired mix. It must not rely on 
 undocumented and inconsistent compiler transformations.
I agree, although D is getting massively out of sync with the interesting instructions even on x86. The fun stuff is not really available unless you use inline asm (or Guillaume's intrinsics library). For the non-x86 world (i.e. the vast majority of all processors sold), ARM has NEON but the future will be SVE2; these are variable-width vector instructions. This isn't impossible to fit into the D_SIMD paradigm, but it will require, for example, types that only have a lower bound on their size. The RISC-V vector ISA is going in a similar direction. If I can actually get my hands on some variable-width hardware I will write D code for it ("because it's there"), but I haven't found anything cheap enough yet.
May 19 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/19/2022 3:51 PM, max haughton wrote:
 Good compilers can actually print a report of why they didn't vectorize
things. 
I guess Manu never used good compilers :-) Manu asked that a report be given in the form of an error message. Since it's what he did all day, I gave that a lot of weight. Also, the point was Manu could then adjust the code with version statements to write loops that worked best on each target, rather than suffer unacceptable degradation from the fallback emulations.
 If you're writing SIMD code without dumping the assembler anyway you're not 
 paying enough attention. If you're going to go to all that effort you're going 
 to be profiling the code, and any good profiler will show you the disassembly 
 alongside. Maybe it doesn't scale in some minute sense but in practice I don't 
 think it makes that much difference because you have to either do the work 
 anyway, or it doesn't matter.
Manu did this all day and I gave a lot of weight to what he said would work best for him. If you're writing vector operations, for a vector instruction set, the compiler should give errors if it cannot do it. Emulation code is not acceptable.

I advocate disassembling, too (remember the -vasm switch?), but disassembling and inspecting manually does not scale at all.
 LDC doesn't do this, GCC does. I don't think it actually matters, whereas if 
 you're consuming a library from someone who didn't do the SIMD parts properly, 
 it will at very least compile with LDC.
At least compiling is not good enough if you're expecting vector speed.
May 19 2022
parent reply max haughton <maxhaton gmail.com> writes:
On Friday, 20 May 2022 at 03:42:06 UTC, Walter Bright wrote:
 Manu asked that a report be given in the form of an error 
 message. Since it's what he did all day, I gave that a lot of 
 weight.

 Also, the point was Manu could then adjust the code with 
 version statements to write loops that worked best on each 
 target, rather than suffer unacceptable degradation from the 
 fallback emulations.
I think you're talking about writing SIMD code, not autovectorization. The report is *not* an error message, neither literally in this case nor spiritually; it's telling you what the compiler was able to infer from your code.

Automatic vectorization is *not* the same as writing code that uses SIMD instructions directly; they're two different beasts. Typically the direct-SIMD algorithm is much faster, at the expense of being orders of magnitude slower to write: the instruction selection algorithms GCC and LLVM use simply aren't good enough to exploit all 15 billion instructions Intel have in their ISA, but they're almost literally hand-beaten to be good at SPEC benchmarks, so many patterns are recognized and optimized just fine.
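As a rough illustration of the two beasts (a sketch, assuming core.simd's float4 on x86-64; neither version is tuned):

import core.simd;

// Autovectorization: you *hope* the optimizer turns this into packed
// multiply-adds, and only the report or the disassembly tells you.
float dotScalar(const float[] a, const float[] b)
{
    float sum = 0;
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}

// Direct SIMD: the width and data layout are chosen by hand; the inputs
// are assumed to be packed into float4 chunks already.
float dotVector(const float4[] a, const float4[] b)
{
    float4 acc = 0.0f;
    foreach (i; 0 .. a.length)
        acc += a[i] * b[i];
    return acc.array[0] + acc.array[1] + acc.array[2] + acc.array[3];
}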
 If you're writing SIMD code without dumping the assembler 
 anyway you're not paying enough attention. If you're going to 
 go to all that effort you're going to be profiling the code, 
 and any good profiler will show you the disassembly alongside. 
 Maybe it doesn't scale in some minute sense but in practice I 
 don't think it makes that much difference because you have to 
 either do the work anyway, or it doesn't matter.
Manu did this all day and I gave a lot of weight to what he said would work best for him. If you're writing vector operations, for a vector instruction set, the compiler should give errors if it cannot do it. Emulation code is not acceptable.
It's not an unreasonable thing to do; I just don't think it's that much of a showstopper either way. If I *really* care about being right per platform I'm probably going to be checking CPUID at runtime anyway. LDC is the compiler people who actually ship performant D code use, and I've never seen anyone complain about this.
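A sketch of that runtime dispatch using druntime's core.cpuid (the avx/sse2 property names are what I remember from core.cpuid; the three kernels are hypothetical stand-ins):

import core.cpuid : avx, sse2;

void scaleAvx(float[] data, float s)    { foreach (ref x; data) x *= s; } // wide body elided
void scaleSse(float[] data, float s)    { foreach (ref x; data) x *= s; } // 4-wide body elided
void scaleScalar(float[] data, float s) { foreach (ref x; data) x *= s; }

alias Kernel = void function(float[] data, float s);

Kernel pickKernel()
{
    if (avx)  return &scaleAvx;    // the machine actually reports AVX
    if (sse2) return &scaleSse;    // baseline x86-64
    return &scaleScalar;           // always-correct fallback
}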
 I advocate disassembling, too, (remember the -vasm switch?) but 
 disassembling and inspecting manually does not scale at all.
You *have* to do it or you are lying to yourself - even if the compiler were perfect, which they often aren't. When I use VTune I see a complete breakdown of the disassembly, source code, pipeline state, memory hierarchy, how much power the CPU used, temperature (cat blocking the computer's conveniently warm exhaust?), etc.

This isn't so much about the actual instructions/intrinsics you end up with; that's just a means to an end. Rather, if you aren't keeping an eye on the performance effects of each line you add and on where the performance is happening, then you aren't being a good engineer: e.g. you can spend too much time working on the SIMD parts of an algorithm and get distracted from the parts that are the new bottleneck (often the memory hierarchy).

Despite this, I do think it's a huge failure of programming as an industry that a site like Compiler Explorer, or a flag like -vasm, actually needs to exist. This should be something much more deeply ingrained into our workflows; programming lags behind more serious forms of engineering when it comes to the correlation between what we think things do and what they actually do.

Aside for anyone reading: see Sites's classic article/note "It's the Memory, Stupid!" https://www.ardent-tool.com/CPU/docs/MPR/101006.pdf - DEC died, but he was right.
 LDC doesn't do this, GCC does. I don't think it actually 
 matters, whereas if you're consuming a library from someone 
 who didn't do the SIMD parts properly, it will at very least 
 compile with LDC.
At least compiling is not good enough if you're expecting vector speed.
You still have "vector speed" in a sense. The emulated SIMD is still good, it's just not optimal. As I was saying previously, there are targets where even though you *have* (say) 256-bit registers, you might actually want to use 128-bit ones in some places, because newer instructions tend to be emulated (in a sense) and so might not actually be worth the port pressure inside the processor.

Basically everything has (a lot of) SIMD units these days, so even this emulated computation will still be pretty fast. You see SIMD instruction sets included in basically anything that costs more than the price of a pint of beer (sneaky DConf plug...), e.g. the Allwinner D1, a cheapo RISC-V core from China, comes with a reasonably standard-compliant vector instruction set implementation. Even microcontrollers.

For anyone interested, the core inside the D1 is open source: https://github.com/T-head-Semi/openc906
May 19 2022
next sibling parent deadalnix <deadalnix gmail.com> writes:
On Friday, 20 May 2022 at 04:34:28 UTC, max haughton wrote:
 Despite this I do think it's still a huge failure of 
 programming as an industry that it's a site like Compiler 
 Explorer, or a flag like -vasm, actually needs to exist. This 
 should be something much more deeply ingrained into our 
 workflows, programming lags behind more serious forms of 
 engineering when it comes to the correlation of what we think 
 things do versus what they actually do.
$ watch gcc -S -O test.c -o - ;)
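The D-flavoured equivalent would be something like this (a sketch, assuming a dmd new enough to have the -vasm switch mentioned above):

$ watch dmd -c -O -vasm test.d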
May 20 2022
prev sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 20 May 2022 at 04:34:28 UTC, max haughton wrote:
 You still have "vector speed" in a sense. The emulated SIMD is 
 still good it's just not optimal, as I was saying previously 
 there are targets where even though you *have* (say) 256 bit 
 registers, you actually might want to use 128 bit ones in some 
 places because newer instructions tend to be emulated (in a 
 sense) so might not actually be worth the port pressure inside 
 the processor.
Yes, it is always better to allow for gradual optimization, going from a generic target to increasingly specific ones as you need it.

Case in point: WASM doesn't support SIMD, but at least one WASM engine recognizes the output of LLVM's builtin SIMD and reconstructs SIMD for the CPU from sequences of regular WASM instructions. So even if the target does not support SIMD, you can get SIMD performance by using "generic" SIMD in the optimizer…

Things are getting much more complicated now for non-real-time; just look at the Intel compiler that compiles to a mix of CPU/SIMD/GPU… That's where batch programming is heading…
May 20 2022
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer 
wrote:
 I hope we are not depending on the type system to the degree 
 where a bool must be an integer in order to have this kind of 
 optimization.

 -Steve
It is routine to use different types in the frontend and the backend. Types in the frontend are there for semantics, correctness, and generally to help the developer. Types in the backend are there to help the optimizer and the code generator.

bool is going to be an integer in the backend, for sure. This doesn't mean it has to be one in the frontend.
May 19 2022
prev sibling parent reply matheus <matheus gmail.com> writes:
On Thursday, 19 May 2022 at 04:35:45 UTC, Walter Bright wrote:
 On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
 But I have little hope for it, as Walter treats a boolean as 
 an integer.
They *are* integers.
I always thought of them as integers; yesterday I was adding some new features to adam_d_ruppe's IRC client and I did:

auto pgdir = (ev.key == KeyboardEvent.Key.PageDown) - (ev.key == KeyboardEvent.Key.PageUp);

so I get -1, 0 or 1, and take the next action according to the input given by the user.

Matheus.
May 19 2022
parent deadalnix <deadalnix gmail.com> writes:
On Thursday, 19 May 2022 at 16:42:58 UTC, matheus wrote:
 On Thursday, 19 May 2022 at 04:35:45 UTC, Walter Bright wrote:
 On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
 But I have little hope for it, as Walter treats a boolean as 
 an integer.
They *are* integers.
I always thought of them as integers; yesterday I was adding some new features to adam_d_ruppe's IRC client and I did: auto pgdir = (ev.key == KeyboardEvent.Key.PageDown) - (ev.key == KeyboardEvent.Key.PageUp); so I get -1, 0 or 1, and take the next action according to the input given by the user. Matheus.
This doesn't imply they are integers, only that they are convertible to integers. You could do the same operation with the key being a short, and pgdir would be an int. It doesn't mean that shorts are ints.
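A quick way to see that conversion in action (a minimal sketch):

import std.stdio;

void main()
{
    short a = 3, b = 4;
    writeln(typeof(a - b).stringof);     // "int": the shorts are promoted, the result isn't short

    bool down = true, up = false;
    writeln(typeof(down - up).stringof); // also "int": the bools convert, they aren't ints
}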
May 19 2022
prev sibling next sibling parent zjh <fqbqrr 163.com> writes:
On Wednesday, 18 May 2022 at 22:11:34 UTC, max haughton wrote:

   SDC  .
We have four `D` compilers?
May 18 2022
prev sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 5/18/22 15:11, max haughton wrote:
 For example:

 float x = 'a';

 Currently compiles.
Going a little off-topic, I recommend Don Clugston's very entertaining DConf 2016 presentation "Using Floating Point Without Losing Your Sanity": http://dconf.org/2016/talks/clugston.html Ali
May 19 2022