digitalmars.D - Should you be able to initialize a float with a char?
- max haughton (7/7) May 18 2022 For example:
- Paul Backus (13/20) May 18 2022 Under ["integer promotions"][1], the spec says that `char` can
- forkit (7/9) May 19 2022 It's actually rather *unsurprising* (given D's compatibility
- forkit (14/14) May 19 2022 On Friday, 20 May 2022 at 02:09:43 UTC, forkit wrote:
- H. S. Teoh (11/21) May 18 2022 If you were to ask me, I'd say prohibit implicit conversions between
- Walter Bright (4/12) May 18 2022 People routinely manipulate chars as integer types, for example, in conv...
- Steven Schveighoffer (4/17) May 18 2022 Supporting addition on char types (even with char + int) is still
- max haughton (8/25) May 18 2022 People do indeed (I'd question whether it's routine in a good D
- Walter Bright (4/7) May 18 2022 Casts are a common source of bugs, not correctness. This is because it i...
- ab (21/30) May 19 2022 This can be solved by a cast with explicit source and destination
- Walter Bright (3/4) May 19 2022 This indeed can work, but when people complain about adding attributes (...
- bauss (22/26) May 19 2022 I'd argue that implicit casts are more so in some cases.
- Walter Bright (13/28) May 19 2022 D's rules added some constraints to C's rules to prevent loss of data wi...
- H. S. Teoh (27/40) May 18 2022 How is that any different from the current situation where arithmetic
- Walter Bright (13/37) May 18 2022 I generally avoid using shorts. I agree the situation is hardly ideal, b...
- bauss (11/14) May 19 2022 There is, this assumes that the character is ascii and not
- Walter Bright (3/15) May 19 2022 I know. And for many applications (like dev tools), it is fine.
- kdevel (7/19) May 19 2022 "However, the assumption that setting bit 5 of the representation
- Ali Çehreli (4/21) May 19 2022 In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above
- kdevel (52/63) May 19 2022 The latter part, that ASCII is a subset of UTF-8, is 1†. I
- deadalnix (2/12) May 19 2022 You could have use "Ali Çehreli" as a test case :)
- kdevel (5/19) May 19 2022 One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327
- deadalnix (3/23) May 19 2022 This, or simply U+00E7 .
- kdevel (8/26) May 19 2022 Not "or" but "and". The first input contains UTF-8 of the
- Ali Çehreli (6/8) May 19 2022 Note that Walter did not write any function. He showed a piece of code
- Walter Bright (5/8) May 19 2022 Should stick with normalized forms for good reason. Having two different...
- deadalnix (7/9) May 19 2022 Sure, it also doesn't perform any useful operation. other than
- Ali Çehreli (3/9) May 19 2022 It is an experimental cypher. :)
- Walter Bright (2/4) May 19 2022 No. But D does.
- deadalnix (2/5) May 19 2022 Tell me you are American without telling me you are American.
- Walter Bright (2/3) May 19 2022 I didn't know the Aussies and Brits used umlauts.
- max haughton (2/5) May 19 2022 Maybe in some parts of scotland...
- Patrick Schluter (10/13) May 20 2022 The Brits Charlotte Brontë, Emily Brontë (and other members of
- claptrap (9/18) May 20 2022 The Brontë was originally Brunty, the father changed it to honour
- Nicholas Wilson (3/6) May 20 2022 C'mon, even the NY times (?, used to) spells it "cöperate", and
- Nick Treleaven (15/20) May 20 2022 Coöperate. (It's known as a diaeresis in English):
- user1234 (9/13) May 19 2022 it is indeed but let's be honest, having builtin char, wchar, and
- Steven Schveighoffer (6/21) May 18 2022 If you were to ask me, I'd say prohibit implicit conversions between
- Walter Bright (14/15) May 18 2022 They *are* integers.
- Steven Schveighoffer (4/20) May 19 2022 I hope we are not depending on the type system to the degree where a
- max haughton (11/33) May 19 2022 That and you can have the underlying type without exposing it to
- user1234 (5/27) May 19 2022 if you use bool as integer types in LLVM
- Walter Bright (4/6) May 19 2022 Does that mean you prefer:
- John Colvin (4/11) May 19 2022 a = 3 + int(b < c) * 5;
- Steven Schveighoffer (6/16) May 19 2022 No, I find that nearly unreadable. I prefer the original:
- Walter Bright (9/11) May 19 2022 You never write things like:
- deadalnix (6/14) May 19 2022 That doesn't strike me as very convincing, because the compiler
- max haughton (3/5) May 19 2022 That's fun. I've never enjoyed working with the valgrind code all
- Walter Bright (2/4) May 19 2022 I know this is technically possible, but have you ever seen this?
- deadalnix (3/8) May 20 2022 Yes, when you have a bunch of ternaries based on the same
- Steven Schveighoffer (27/42) May 19 2022 I have written these kinds of things *sometimes*, but also would be fine...
- Walter Bright (18/34) May 19 2022 I presume Json is a bool. And the bool is written as false. If it's bad ...
- claptrap (3/7) May 20 2022 Why should that be so? Why do you take for granted that whatever
- Steven Schveighoffer (28/48) May 20 2022 No, it's not. It's a [JSON](https://json.org) container. It can accept
- deadalnix (8/15) May 20 2022 In fact, it doesn't even require implicit conversion to be
- Paul Backus (9/26) May 20 2022 In this example, both `int` and `bool` are implicit conversions,
- deadalnix (8/17) May 20 2022 Fair enough, because of the enum. You probably don't want to cast
- Paul Backus (16/31) May 20 2022 It happens with literals only if the literal type is not an exact
- forkit (12/27) May 20 2022 Actually, there is a right answer.
- forkit (10/10) May 20 2022 On Friday, 20 May 2022 at 22:10:43 UTC, forkit wrote:
- deadalnix (3/10) May 19 2022 No. The `*` imply a promotion of its argument. It is very easy to
- H. S. Teoh (10/25) May 19 2022 [...]
- Walter Bright (20/24) May 19 2022 See my reply to Steven.
- max haughton (49/78) May 19 2022 Good compilers can actually print a report of why they didn't
- Walter Bright (13/23) May 19 2022 I guess Manu never used good compilers :-)
- max haughton (61/88) May 19 2022 I think you're talking about writing SIMD code not
- deadalnix (3/10) May 20 2022 $ watch gcc -S -O test.c -o -
- Ola Fosheim Grøstad (11/18) May 20 2022 Yes, it is always better to allow for gradual optimization, going
- deadalnix (8/12) May 19 2022 It is routine to use different types in the front end and
- matheus (8/13) May 19 2022 I always thought them as integers, yesterday I was adding some
- deadalnix (5/19) May 19 2022 This doesn't imply they are integer, but that they are
- zjh (2/3) May 18 2022 We have four `D` compiler?
- Ali Çehreli (5/8) May 19 2022 Going a little off-topic, I recommend Don Clugston's very entertaining
For example: float x = 'a'; Currently compiles. I had no idea that it does but I was implementing this pattern in SDC and lo and behold it does (and thus sdc has to support it). Should it? Implicit conversions and implicit-anything around floats seem to be very undocumented in the specification too.
May 18 2022
On Wednesday, 18 May 2022 at 22:11:34 UTC, max haughton wrote:For example: float x = 'a'; Currently compiles. I had no idea that it does but I was implementing this pattern in SDC and lo and behold it does (and thus sdc has to support it). Should it? Implicit conversions and implicit-anything around floats seem to be very undocumented in the specification too.Under ["integer promotions"][1], the spec says that `char` can implicitly convert to `int`. Under ["usual arithmetic conversions"][2], the spec says (by implication) that all arithmetic types can implicitly convert to `float`. "Arithmetic type" is not explicitly defined by the spec, but [in the C99 standard][3] it means "integer and floating types." It's probably safe to assume the same definition applies to D. So I would say that according to the spec, the answer is "yes, the example should work." Though it is rather surprising. [1]: https://dlang.org/spec/type.html#integer-promotions [2]: https://dlang.org/spec/type.html#usual-arithmetic-conversions
May 18 2022
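The conversion chain the spec describes (char promotes to int, and any arithmetic type converts to float) can be checked directly; a minimal sketch:

```d
void main()
{
    char c = 'a';   // 'a' is code unit 97
    int  i = c;     // integer promotion: char -> int
    float f = c;    // usual arithmetic conversions: char -> float
    assert(i == 97);
    assert(f == 97.0f);
}
```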
On Wednesday, 18 May 2022 at 22:24:18 UTC, Paul Backus wrote:So I would say that according to the spec, the answer is "yes, the example should work." Though it is rather surprising.It's actually rather *unsurprising* (given D's compatibility needs with C). What is surprising is that there's no compiler option to disable implicit type casts, or to disable them in @safe, or *at the very least*, output a record of such casts for auditing (to help minimise bugs and vulnerabilities).
May 19 2022
On Friday, 20 May 2022 at 02:09:43 UTC, forkit wrote:D suffers from the same problem as C.

// ---
module test;
@safe: // completely useless annotation here!
import std;
void main()
{
    int x = 3;
    int y = 4;
    float z = x / y;
    writeln(z); // 0
}
// -----
May 19 2022
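The 0 above is not a conversion bug: x/y is evaluated as int division first, and only the truncated result is converted to float. Making one operand floating-point before dividing gives the expected result; a sketch:

```d
import std.stdio;

void main()
{
    int x = 3;
    int y = 4;
    float z = cast(float) x / y; // the division itself now happens in float
    writeln(z);                  // 0.75
}
```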
On Wed, May 18, 2022 at 10:11:34PM +0000, max haughton via Digitalmars-d wrote:For example: float x = 'a'; Currently compiles. I had no idea that it does but I was implementing this pattern in SDC and lo and behold it does (and thus sdc has to support it). Should it? Implicit conversions and implicit-anything around floats seem to be very undocumented in the specification too.If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case. T -- MSDOS = MicroSoft's Denial Of Service
May 18 2022
On 5/18/2022 3:31 PM, H. S. Teoh wrote:If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case.People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
On 5/18/22 8:27 PM, Walter Bright wrote:On 5/18/2022 3:31 PM, H. S. Teoh wrote:Supporting addition on char types (even with char + int) is still possible without allowing implicit conversions. -SteveIf you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case.People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
On Thursday, 19 May 2022 at 00:27:24 UTC, Walter Bright wrote:On 5/18/2022 3:31 PM, H. S. Teoh wrote:People do indeed (I'd question whether it's routine in a good D program, I'd flag it in code review) manipulate characters as integers, but I think there's something to be said for forcing people to go char -> suitable integer -> char. We have u/byte, largely for descriptive purposes already, personally I try to use them for calculation even if the byte's value is from a char.If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case.People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
On 5/18/2022 5:55 PM, max haughton wrote:People do indeed (I'd question whether it's routine in a good D program, I'd flag it in code review) manipulate characters as integers, but I think there's something to be said for forcing people to go char -> suitable integer -> char.Casts are a common source of bugs, not correctness. This is because it is a forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.
May 18 2022
On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:On 5/18/2022 5:55 PM, max haughton wrote:This can be solved by a cast with explicit source and destination type, e.g.

auto cast_from(From, To, Real)(Real a)
{
    static if (is(From == Real))
        return cast(To) a;
    else
        static assert(0, "Wrong types");
}

void main()
{
    import std.stdio;
    short a = 1;
    int b = cast_from!(short, int)(a);
    bool c = 1;
    // int d = cast_from!(int, int)(a); // compile-time error: a is short, not int
    writeln("Test: ", b);
}

People do indeed (I'd question whether it's routine in a good D program, I'd flag it in code review) manipulate characters as integers, but I think there's something to be said for forcing people to go char -> suitable integer -> char.Casts are a common source of bugs, not correctness. This is because it is a forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.
May 19 2022
On 5/19/2022 12:46 AM, ab wrote:This can be solved by a cast with explicit source and destination type, e.g.This indeed can work, but when people complain about adding attributes (valid complaints), how are they going to react to having to do this?
May 19 2022
On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:Casts are a common source of bugs, not correctness. This is because it is forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.I'd argue that implicit casts are more so in some cases. This is one of those cases. And also you shouldn't really do arithmetic operations on chars anyway, at least not with Unicode, and D is supposed to be a Unicode language. Upper-casing in Unicode is not as simple as an addition, because the rules for doing so are language specific. Changing case in one language isn't always the same as in another language. Even with ASCII you can't just rely on a mathematical computation, because not all characters can change case, such as symbols. That's why string/char manipulation should __always__ be a library solution, not a user-code solution. The library should handle all these rules. The user should absolutely not be able/have to mess this up by accident, unless they really really want to. Sure a char might be represented by an integer type, but so is every single data type you can ever think of since they all convert to bytes. If D is to ever attract more users, then it must not surprise new users.
May 19 2022
On 5/19/2022 12:57 AM, bauss wrote:On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:D's rules added some constraints to C's rules to prevent loss of data with implicit casting. I don't see how D's implicit casts are a dangerous source of bugs.Casts are a common source of bugs, not correctness. This is because it is forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.I'd argue that implicit casts are more so in some cases.And also you shouldn't really do arithmetic operations on chars anyway, at least not with unicode and D is supposed to be a unicode language.It turns out that for performance reasons, you definitely want to treat UTF-8 as individual code units. Autodecode taught us that the hard way.Upper-casing in unicode is not as simple as an addition, because the rules for doing so are language specific.I'm painfully aware that the Unicode consortium made it impossible to do "correct" Unicode without a megabyte library.Even with ASCII you can't just rely on a mathematic computation, because not all characters can change case, such as symbols.Yes, you can. I posted the code in another post in this thread. ASCII hasn't changed in my professional lifetime, and I seriously doubt it will change in yours.If D is to ever attract more users, then it must not surprise new users.The only problem we've had with D chars is autodecoding, which ironically does what you propose - treat everything as Unicode code points rather than code units. It's a great idea, but it simply does not work, and it took us years to become convinced of that.
May 19 2022
On Wed, May 18, 2022 at 05:27:24PM -0700, Walter Bright via Digitalmars-d wrote:On 5/18/2022 3:31 PM, H. S. Teoh wrote:How is that any different from the current situation where arithmetic involving short ints require casts all over the place? Even things like this require a cast: short s = 123; //s = -s; // NG s = cast(short)-s; // required excess verbiage It got so out of hand that I wrote nopromote.d, specifically to "poison" expressions involving short ints with a custom struct with overloaded ops that always truncate, just so I don't have to litter my code with casts in just about every expression involving short ints. In the case of char + int arithmetic, my opinion is that usually people do *not* (or *should* not) do char arithmetic directly -- with Unicode, it makes much less sense than the bad ole days of ASCII. These days, you'd call one of the std.uni functions for proper case mapping instead of a slipshod hack job of adding or subtracting some magic constant (which is wrong in anything except ASCII anyway). In today's day and age, strings are best treated as opaque data that are manipulated by properly-implemented string functions in the standard library. Having a few extra char/int casts in std.uni isn't the end of the world. It shouldn't usually be done in user code anyway. (And having to write lots of casts may motivate people to actually use proper string manipulation functions instead of winging it themselves with wrong implementations involving char arithmetic.) T -- Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. 
The topic at the time was bool vs int, but the same principle holds in this case.People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
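nopromote.d itself is not shown in the thread, but the "poison struct with overloaded ops that always truncate" idea can be sketched like this (the name NoPromote is illustrative, not the actual module's API):

```d
struct NoPromote
{
    short value;

    // Unary ops truncate back to short instead of promoting to int.
    NoPromote opUnary(string op)() const
    {
        return NoPromote(cast(short) mixin(op ~ "value"));
    }

    // Binary ops likewise keep the result as short.
    NoPromote opBinary(string op)(NoPromote rhs) const
    {
        return NoPromote(cast(short) mixin("value " ~ op ~ " rhs.value"));
    }
}

void main()
{
    auto s = NoPromote(123);
    s = -s;                    // no cast needed at the use site
    assert(s.value == -123);
    s = s + NoPromote(3);
    assert(s.value == -120);
}
```

The cast lives in one place inside the wrapper, instead of at every expression in user code.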
On 5/18/2022 5:57 PM, H. S. Teoh wrote:How is that any different from the current situation where arithmetic involving short ints require casts all over the place? Even things like this require a cast: short s = 123; //s = -s; // NG s = cast(short)-s; // required excess verbiageI generally avoid using shorts. I agree the situation is hardly ideal, but there is no ideal way I've ever seen. The various schemes just shift the deck chairs around.It got so out of hand that I wrote nopromote.d, specifically to "poison" expressions involving short ints with a custom struct with overloaded ops that always truncate, just so I don't have to litter my code with casts in just about every expression involving short ints.The only reason to ever use shorts is to save memory in a frequently allocated data structure. Short local variables do not save memory or time (in fact, they're larger and slower). If you're doing all these casts, perhaps look into using ints instead.In the case of char + int arithmetic, my opinion is that usually people do *not* (or *should* not) do char arithmetic directly -- with Unicode, it makes much less sense than the bad ole days of ASCII. These days, you'd call one of the std.uni functions for proper case mapping instead of a slipshod hack job of adding or subtracting some magic constant (which is wrong in anything except ASCII anyway). In today's day and age, strings are best treated as opaque data that are manipulated by properly-implemented string functions in the standard library. Having a few extra char/int casts in std.uni isn't the end of the world. It shouldn't usually be done in user code anyway. 
(And having to write lots of casts may motivate people to actually use proper string manipulation functions instead of winging it themselves with wrong implementations involving char arithmetic.)There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20; D doesn't have C's problems with optionally signed chars, 10 bit chars, EBCDIC, RADIX50 and other dead technologies.
May 18 2022
On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;There is, this assumes that the character is ASCII and not Unicode. What about say 'Å' -> 'å'? It won't work for that. So your code is wrong in D because D isn't an ASCII language, but a Unicode language. As specified by the spec: char '\xFF' unsigned 8 bit (UTF-8 code unit) wchar '\uFFFF' unsigned 16 bit (UTF-16 code unit) dchar '\U0000FFFF' unsigned 32 bit (UTF-32 code unit)
May 19 2022
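For comparison, the Unicode-aware library route does handle the 'Å' case; std.uni provides toLower overloads for both single code points and strings:

```d
void main()
{
    import std.uni : toLower;
    assert(toLower('Å') == 'å');      // per-code-point case mapping
    assert(toLower("A Å") == "a å");  // string overload, not ASCII-only
}
```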
On 5/19/2022 1:05 AM, bauss wrote:On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:It does not assume it, it tests for if it would be valid.There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;There is, this assumes that the character is ascii and not unicode.What about say 'Å' -> 'å'? It won't work for that.I know. And for many applications (like dev tools), it is fine.
May 19 2022
On Thursday, 19 May 2022 at 18:35:42 UTC, Walter Bright wrote:On 5/19/2022 1:05 AM, bauss wrote:"However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1] [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:It does not assume it, it tests for if it would be valid.There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;There is, this assumes that the character is ascii and not unicode.
May 19 2022
On 5/19/22 12:13, kdevel wrote:On Thursday, 19 May 2022 at 18:35:42 UTC, Walter Bright wrote:In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above is valid without making any ASCII assumption. AliOn 5/19/2022 1:05 AM, bauss wrote:"However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1] [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:It does not assume it, it tests for if it would be valid.There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;There is, this assumes that the character is ascii and not unicode.
May 19 2022
On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote: [...]The latter part, that ASCII is a subset of UTF-8, is 1†. I disagree with the wording of the former part, that in D a char "is" UTF-8."However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1][1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/In D, char is UTF-8 and ASCII is a subset of UTF-8.Walter's code above is valid without making any ASCII assumption.Walter made the 0 claim "It does not assume it, it tests for if it would be valid [ascii and not unicode]" [2] ‡ Okay. Let's do UTF-8:

```
import std.stdio;
import std.string;
import std.utf;

char char_tolower_bright (char c)
{
   if ('A' <= c && c <= 'Z')
      c = c | 0x20;
   return c;
}

string tolower_bright (string s)
{
   string t;
   foreach (c; s.byCodeUnit)
      t ~= c.char_tolower_bright;
   return t;
}

void process_strings (string s)
{
   writefln!"input : %s" (s);
   auto t = s.tolower_bright;
   writefln!"bright : %s" (t);
   auto u = s.toLower;
   writefln!"toLower (std.utf): %s" (u);
}

void main ()
{
   process_strings ("A Ä");
   process_strings ("A Ä");
}
```

Free of charge I compiled and ran this for you:

$ dmd lcb
$ ./lcb
input : A Ä
bright : a Ä
toLower (std.utf): a ä
input : A Ä
bright : a ä
toLower (std.utf): a ä

See the problem? † Hint for interpretation: booleans "are" integers. [2] http://forum.dlang.org/post/t662ll$tnm$1 digitalmars.com ‡ There is probably no consensus about what "it" means.
May 19 2022
On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:Free of charge I compiled and ran this for you: $ dmd lcb $ ./lcb input : A Ä bright : a Ä toLower (std.utf): a ä input : A Ä bright : a ä toLower (std.utf): a ä See the problem?You could have used "Ali Çehreli" as a test case :)
May 19 2022
On Thursday, 19 May 2022 at 22:24:35 UTC, deadalnix wrote:On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case. Further Reading: [3] https://medium.com/ sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407Free of charge I compiled and ran this for you: $ dmd lcb $ ./lcb input : A Ä bright : a Ä toLower (std.utf): a ä input : A Ä bright : a ä toLower (std.utf): a ä See the problem?You could have used "Ali Çehreli" as a test case :)
May 19 2022
On Thursday, 19 May 2022 at 22:51:55 UTC, kdevel wrote:On Thursday, 19 May 2022 at 22:24:35 UTC, deadalnix wrote:This, or simply U+00E7. Which will cause a similar problem as the one you raised.On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case. Further Reading: [3] https://medium.com/ sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407Free of charge I compiled and ran this for you: $ dmd lcb $ ./lcb input : A Ä bright : a Ä toLower (std.utf): a ä input : A Ä bright : a ä toLower (std.utf): a ä See the problem?You could have used "Ali Çehreli" as a test case :)
May 19 2022
On Thursday, 19 May 2022 at 23:26:57 UTC, deadalnix wrote:Not "or" but "and". The first input contains UTF-8 of the (normalized) codepoint, which is left unchanged by Walter's lowercase function. The second input contains UTF-8 of the same codepoint in canonically decomposed form (NFD).This, or simply U+00E7.One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case. Further Reading: [3] https://medium.com/ sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407input : A Ä bright : a Ä <-- upper case toLower (std.utf): a ä input : A Ä bright : a ä toLower (std.utf): a ä <-- lower case See the problem?You could have used "Ali Çehreli" as a test case :)Which will cause a similar problem as the one you raised.The problem is that you cannot decide only from the value of a single UTF-8 code unit (char) if it stands for an ASCII character in the string.
May 19 2022
On 5/19/22 17:50, kdevel wrote:The first input contains UTF-8 of the (normalized) codepoint, which is left unchanged by Walter's lowercase function.Note that Walter did not write any function. He showed a piece of code that would lowercase ASCII letters inside a UTF-8 encoded Unicode string. The code did not assume the string was ASCII and it did not claim to lowercase all Unicode characters. Ali
May 19 2022
On 5/19/2022 5:50 PM, kdevel wrote:Not "or" but "and". The first input contains UTF-8 of the (normalized) codepoint, which is left unchanged by Walter's lowercase function. The second input contains UTF-8 of the same codepoint in canonically decomposed form (NFD).Should stick with normalized forms for good reason. Having two different sequences supposedly compare equal is an abomination. Though none of this supports the notion that arithmetic should not be done on chars. Heck, UTF-8 cannot be decoded without such arithmetic.
May 19 2022
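A small illustration of that last point: even decoding a two-byte UTF-8 sequence is nothing but integer arithmetic on chars (a sketch of the bit manipulation involved, not the library decoder):

```d
void main()
{
    string s = "é";  // encoded as the two code units 0xC3 0xA9
    assert(s.length == 2);
    // 110xxxxx 10xxxxxx -> take 5 bits from the lead byte, 6 from the trail
    dchar d = cast(dchar)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
    assert(d == 'é');  // U+00E9
}
```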
On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote:In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above is valid without making any ASCII assumption.Sure, it also doesn't perform any useful operation, other than "uncapitalize English, do nothing for non-Latin languages, and create a mess with any non-English Latin language", which, while it certainly is a valid program, doesn't look like something anyone would actually want to do for other reasons than it's easy to write and good enough.
May 19 2022
On 5/19/22 15:17, deadalnix wrote:On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote:It is an experimental cypher. :) AliIn D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above is valid without making any ASCII assumption.Sure, it also doesn't perform any useful operation. other than "Uncapitalize English, do nothing for non latin languages
May 19 2022
On 5/19/2022 12:13 PM, kdevel wrote:[1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/No. But D does.
May 19 2022
On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;Tell me you are American without telling me you are American.
May 19 2022
On 5/19/2022 3:04 PM, deadalnix wrote:Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 19 2022
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:On 5/19/2022 3:04 PM, deadalnix wrote:Maybe in some parts of Scotland...Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 19 2022
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:On 5/19/2022 3:04 PM, deadalnix wrote:The Brits Charlotte Brontë, Emily Brontë (and other members of the Brontë family), Noël Coward, Zoë Wanamaker, Zoë Ball, Emeli Sandé, John le Carré and the Australians Renée Geyer and Zoë Badwi and the Americans Beyoncé Knowles, Chloë Grace Moretz, Chloë Sevigny, Renée Fleming, Renée Zellweger, Zoë Baird, Zoë Kravitz, Donté Stallworth, John C. Frémont, Robert M. Gagné, Roxanne Shanté, Janelle Monáe, Jhené Aiko might want to have a word with you ;-)Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 20 2022
On Friday, 20 May 2022 at 09:16:24 UTC, Patrick Schluter wrote:On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:The Brontë was originally Brunty, the father changed it to honour Nelson when he won some battle. Zoë Wanamaker is American, or at least was born in the US. Emeli Sandé's dad was from Zambia I think. John Le Carre is a pen name, his real name is David John Moore Cornwell. So often the source of the umlauts in "British" names is not as straightforward as it may appear.On 5/19/2022 3:04 PM, deadalnix wrote:The Brits Charlotte Brontë, Emily Brontë (and other members of the Brontë family), Noël Coward, Zoë Wanamaker, Zoë Ball, Emeli Sandé, John le Carré might want to have a word with you ;-)Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 20 2022
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:On 5/19/2022 3:04 PM, deadalnix wrote:C'mon, even the NY times (?, used to) spells it "cöperate", and other repeated vowels contractions to single umlaut.Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 20 2022
On Friday, 20 May 2022 at 11:54:47 UTC, Nicholas Wilson wrote:On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:Coöperate. (It's known as a diaeresis in English): "The diaeresis diacritic indicates that two adjoining letters that would normally form a digraph and be pronounced as one sound, are instead to be read as separate vowels in two syllables. For example, in the spelling 'coöperate', the diaeresis reminds the reader that the word has four syllables co-op-er-ate, not three, '*coop-er-ate'. In British English this usage has been considered obsolete for many years, and in US English, although it persisted for longer, it is now considered archaic as well.[5] Nevertheless, it is still used by the US magazine The New Yorker.[6] In English language texts it is perhaps most familiar in the spellings 'naïve', 'Noël', and 'Chloë'" WikipediaI didn't know the Aussies and Brits used umlauts.C'mon, even the NY times (?, used to) spells it "cöperate", and other repeated vowels contractions to single umlaut.
May 20 2022
On Thursday, 19 May 2022 at 00:27:24 UTC, Walter Bright wrote:People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.It is indeed, but let's be honest: having builtin char, wchar, and dchar is only useful for overload resolution and string literals. ubyte c = 's'; // OK ubyte[] a = "s".dup; // NG without the string literal problem, there's only the overload resolution one and for this one builtin character types could be library types, e.g. a struct wrapping ubyte, ushort, uint.
May 19 2022
On 5/18/22 6:31 PM, H. S. Teoh wrote:On Wed, May 18, 2022 at 10:11:34PM +0000, max haughton via Digitalmars-d wrote:If you were to ask me, I'd say prohibit implicit conversions between char types and any other types, including other char types. Even converting char to dchar isn't correct. But I have little hope for it, as Walter treats a boolean as an integer. -SteveFor example: float x = 'a'; Currently compiles. I had no idea that it does but I was implementing this pattern in SDC and lo and behold it does (and thus sdc has to support it). Should it? Implicit conversions and implicit-anything around floats seem to be very undocumented in the specification too.If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above.
May 18 2022
On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:But I have little hope for it, as Walter treats a boolean as an integer.They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too! Branchless code is a thing, it is used in GPUs, and in security code to make it resistant to timing attacks. You'll also see this in the SIMD instructions, although they set all bits instead of just 1, because & is faster than *. a = 3 + (-(b < c) & 5);
May 18 2022
On 5/19/22 12:35 AM, Walter Bright wrote:On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -SteveBut I have little hope for it, as Walter treats a boolean as an integer.They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
May 19 2022
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer wrote:On 5/19/22 12:35 AM, Walter Bright wrote:That and you can have the underlying type without exposing it to the programmer. "Bools are integers", as opposed to bools not having a memory representation at all? Basically any discussion of these peephole optimizations (if this is more than just a nice to have) is a bit silly in the age where GCC and LLVM will both reach this kind of code anywhere because they want to eliminate branches like the plague (even if they couldn't do it in the first place given that you'd need to tell it what a bool is)On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -SteveBut I have little hope for it, as Walter treats a boolean as an integer.They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
May 19 2022
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer wrote:On 5/19/22 12:35 AM, Walter Bright wrote:If you use bool as an integer type in LLVM, true + true overflows and you basically get 0 because only 1 bit is read.On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -SteveBut I have little hope for it, as Walter treats a boolean as an integer.They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
May 19 2022
On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
May 19 2022
On Thursday, 19 May 2022 at 18:20:26 UTC, Walter Bright wrote:On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:a = 3 + int(b < c) * 5; avoids forcing it with an explicit cast, with a lower risk of writing a bug (or creating one later in a refactor).I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
May 19 2022
On 5/19/22 2:20 PM, Walter Bright wrote:On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:No, I find that nearly unreadable. I prefer the original: a = b < c ? 8 : 3; And let the compiler come up with whatever funky stuff it wants to in order to make it fast. -SteveI hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
May 19 2022
On 5/19/2022 1:01 PM, Steven Schveighoffer wrote:And let the compiler come up with whatever funky stuff it wants to in order to make it fast.You never write things like: a += (b < c); ? I do. And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does crypto code. Hoping the compiler will transform the code into this style, if it is not specified to, is just that, hope :-/ Sometimes this style is not necessarily faster, either, even though the user may desire it for crypto reasons.
May 19 2022
On Thursday, 19 May 2022 at 21:44:55 UTC, Walter Bright wrote:You never write things like: a += (b < c); ? I do. And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does crypto code. Hoping the compiler will transform the code into this style, if it is not specified to, is just that, hope :-/ Sometimes this style is not necessarily faster, either, even though the user may desire it for crypto reasons.That doesn't strike me as very convincing, because the compiler will sometimes do the opposite too, so either way, at least for crypto, you have to look at the disassembly. In our case, we even instrumented valgrind to cause a CI failure when such a branch occurs and run it on every patch.
May 19 2022
On Thursday, 19 May 2022 at 22:20:06 UTC, deadalnix wrote:In our case, we even instrumentalized valgrind to cause a CI failure when such a branch occurs and run it on every patch.That's fun. I've never enjoyed working with the valgrind code all that much but having it around is very useful.
May 19 2022
On 5/19/2022 3:20 PM, deadalnix wrote:That doesn't strike me as very convincing, because the compiler will sometime do the opposite too,I know this is technically possible, but have you ever seen this?
May 19 2022
On Friday, 20 May 2022 at 03:06:37 UTC, Walter Bright wrote:On 5/19/2022 3:20 PM, deadalnix wrote:Yes, when you have a bunch of ternaries based on the same condition for instance.That doesn't strike me as very convincing, because the compiler will sometime do the opposite too,I know this is technically possible, but have you ever seen this?
May 20 2022
On 5/19/22 5:44 PM, Walter Bright wrote:On 5/19/2022 1:01 PM, Steven Schveighoffer wrote:I have written these kinds of things *sometimes*, but also would be fine writing `b < c ? 1 : 0` if required, or even `int(b < c)`. I'd happily write that in exchange for not having this happen: ```d enum A : int { a } Json j = A.a; writeln(j); // false ```And let the compiler come up with whatever funky stuff it wants to in order to make it fast.You never write things like: a += (b < c); ? I do.And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does crypto code.If the optimizer can't see through the ternary expression with 2 constants, then maybe it needs updating.Hoping the compiler will transform the code into this style, if it is not specified to, is just that, hope :-/It's just: use a better compiler. And if it doesn't, so what? Just write it with casts if a) it doesn't do what you want, and b) it's critically important. I just write it the "normal" way and move on. What if the compiler rewrites it back to the ternary expression? Either way, if you are paranoid the compiler isn't doing the right thing, you check the assembly.Sometimes this style is not necessarily faster, either, even though the user may desire it for crypto reasons.Which is exactly why I leave it to the experts. I want to write the clearest code possible, and let the optimizer wizards do their magic. If for some reason, the compiler isn't smart enough to figure out the right thing to do (and it bothers me to the point of investigation), then D provides so many tools to get it to spit out what you want, all the way down to inline assembly. Anyone who thinks they can predict exactly what the compiler will output for any given code is fooling themselves. If you care, check the assembly. -Steve
May 19 2022
On 5/19/2022 5:06 PM, Steven Schveighoffer wrote:I'd happily write that in exchange for not having this happen: ```d enum A : int { a } Json j = A.a; writeln(j); // false ```I presume Json is a bool. And the bool is written as false. If it's bad that 0 implicitly converts to a bool, then it should also be bad that 0 implicitly converts to char, ubyte, byte, int, float, etc. It implies all implicit conversions should be removed. While that is a reasonable point of view, I used a language that did that (Wirth's Pascal) and found it annoying and unpleasant.To make the examples understandable, I use trivial cases.And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does cryto code.If the optimizer can't see through the ternary expression with 2 constants, then maybe it needs updating.I want to write the clearest code possible, and let the optimizer wizards do their magic.I appreciate you want to write clear code. I do, too. The form I wrote is perfectly clear. Maybe it's just me, but I've never had any difficulty with the equivalence of: true, 1, +5V, On, Yes, T, +10V, etc. I doubt that this gives anyone trouble, either: enum Flags { A = 1, B = 2, C = 4, } int flags = A | C; if (flags & C) ... It's clear to me that there is no set of rules that will please everyone and is objectively better than the others. At some point it ceases to be useful to continue to debate it, as no resolution will satisfy everyone.
May 19 2022
On Friday, 20 May 2022 at 03:27:12 UTC, Walter Bright wrote:If it's bad that 0 implicitly converts to a bool, then it should also be bad that 0 implicitly converts to char, ubyte, byte, int, float, etc. It implies all implicit conversions should be removed.Why should that be so? Why do you take for granted that whatever happens to bool should happen to ubyte, int etc...
May 20 2022
On 5/19/22 11:27 PM, Walter Bright wrote:On 5/19/2022 5:06 PM, Steven Schveighoffer wrote:No, it's not. It's a [JSON](https://json.org) container. It can accept any type on opAssign that is a valid Json type (long, bool, string, double, or another Json). It's actually [this one](https://vibed.org/api/vibe.data.json/Json).I'd happily write that in exchange for not having this happen: ```d enum A : int { a } Json j = A.a; writeln(j); // false ```I presume Json is a bool.And the bool is written as false.The Json is written as false, because calling the overloaded opAssign turns into a bool, because bool is an integer, and the enum integer value fits in there. It's the compiler picking the bool overload that is surprising.It implies all implicit conversions should be removed.No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad). I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect. However, I'm also OK with bool not converting to int implicitly if that is necessary for the type system to be sane. Using the trivial `b ? 1 : 0` conversion is not bad, and the compiler should recognize this pattern easily. I don't hold out hope for this to convince you though.While that is a reasonable point of view, I used a language that did that (Wirth's Pascal) and found it annoying and unpleasant.I actually am fine with, even *happy* with, implicit conversions that D has, *except* the bool and char implicit conversions *from* integers. I've used Swift where implicit conversions are verboten, and it's non-stop pain.It's clear to me that there is no set of rules that will please everyone and is objectively better than the others. At some point it ceases to be useful to continue to debate it, as no resolution will satisfy everyone.Of course. 
There's always a tradeoff. It comes down to, when does the language surprise you with a weird thing (like an overloaded function that takes bool accepting an int enum because it can fit)? If the choice is between those surprises and cast-inconvenience, what is worth more? There is no right answer. -Steve
May 20 2022
On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer wrote:In fact, it doesn't even require implicit conversion to be removed at all. Matching bool in this case really doesn't make sense, and even by the letter of the spec I'm not sure this is right. Indeed, one of the constructors is an exact match, while the other is an implicit conversion match.It implies all implicit conversions should be removed.No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad). I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect.
May 20 2022
On Friday, 20 May 2022 at 16:56:50 UTC, deadalnix wrote:On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer wrote:In this example, both `int` and `bool` are implicit conversions, because the type of `E.a` is `E`, not `int`. So partial ordering is used to disambiguate, and the compiler (correctly) determines that the `bool` overload is more specialized than the `int` overload, because you can pass a `bool` argument to an `int` parameter but not the other way around. As soon as you allow the `E` -> `bool` implicit conversion (via VRP), everything else follows.In fact, it doesn't even require implicit conversion to be removed at all. Matching bool in this case really doesn't make sense, and even by the letter of the spec I'm not sure this is right. Indeed, one of the constructor in an exact match, while the other is an implicit conversion match.It implies all implicit conversions should be removed.No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad). I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect.
May 20 2022
On Friday, 20 May 2022 at 17:15:07 UTC, Paul Backus wrote:In this example, both `int` and `bool` are implicit conversions, because the type of `E.a` is `E`, not `int`. So partial ordering is used to disambiguate, and the compiler (correctly) determines that the `bool` overload is more specialized than the `int` overload, because you can pass a `bool` argument to an `int` parameter but not the other way around. As soon as you allow the `E` -> `bool` implicit conversion (via VRP), everything else follows.Fair enough, because of the enum. You probably don't want to cast to bool via VRP. But it also happens with integer literals, so clearly there is a problem. The way I solved it on my end is to make all the opAssign templates and use specialization, in which case it doesn't go from int to bool.
May 20 2022
On Friday, 20 May 2022 at 18:41:39 UTC, deadalnix wrote:On Friday, 20 May 2022 at 17:15:07 UTC, Paul Backus wrote:It happens with literals only if the literal type is not an exact match for the parameter type: ```d import std.stdio; void fun(int) { writeln("int"); } void fun(bool) { writeln("bool"); } void main() { fun(int(0)); // int (exact match) fun(ubyte(0)); // bool (implicit conversion) } ``` So, this case is exactly the same as the enum case. Once you allow the implicit conversion to `bool`, everything else follows from the normal language rules.In this example, both `int` and `bool` are implicit conversions, because the type of `E.a` is `E`, not `int`. So partial ordering is used to disambiguate, and the compiler (correctly) determines that the `bool` overload is more specialized than the `int` overload, because you can pass a `bool` argument to an `int` parameter but not the other way around. As soon as you allow the `E` -> `bool` implicit conversion (via VRP), everything else follows.Fair enough, because of the enum. You probably don't want to cast do bool via VRP. But it also happens with integer literals, so clearly there is a problem.
May 20 2022
On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer wrote:I actually am fine with, even *happy* with, implicit conversions that D has, *except* the bool and char implicit conversions *from* integers. I've used Swift where implicit conversions are verboten, and it's non-stop pain.Actually, there is a right answer. And that's to let the programmer know when this is occurring, if the programmer wants this information (and not requiring the programmer to go look at the assembly!!!). Or, possibly, for the programmer to disable implicit casts, via some annotation -> noImplicitCasting! I do not like D being like C when it comes to the many unexpected things that can occur due to implicit type casting by the compiler. D needs to be (much) better than C.It's clear to me that there is no set of rules that will please everyone and is objectively better than the others. At some point it ceases to be useful to continue to debate it, as no resolution will satisfy everyone.Of course. There's always a tradeoff. It comes down to, when does the language surprise you with a weird thing (like an overloaded function that takes bool accepting an int enum because it can fit)? If the choice is between those surprises and cast-inconvenience, what is worth more? There is no right answer. -Steve
May 20 2022
On Friday, 20 May 2022 at 22:10:43 UTC, forkit wrote:I'd like to see an option, for the compiler to output this information, when requested. It does it already for other things (e.g GC). - The location in the file where the cast occurred. - The type being cast from. - The type being cast to. - The result of the cast analysis: upcast, downcast, or mismatch. - Is it an explicit or implicit cast? (i.e. the programmer did it, or the compiler did it)
May 20 2022
On Thursday, 19 May 2022 at 18:20:26 UTC, Walter Bright wrote:On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:No. The `*` implies a promotion of its arguments. It is very easy to define bool as promoting to int without making bool an int.I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
May 19 2022
On Thu, May 19, 2022 at 10:33:14AM -0400, Steven Schveighoffer via Digitalmars-d wrote:On 5/19/22 12:35 AM, Walter Bright wrote:[...][...] IME, gcc and ldc2 are well able to convert the above ?: expression into the latter, without uglifying the code. Why are we promoting (or even allowing) this kind of ugly code just because dmd's optimizer is so lackluster you have to manually spell things out this way? T -- Without outlines, life would be pointless.The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
May 19 2022
On 5/19/2022 1:24 PM, H. S. Teoh wrote:IME, gcc and ldc2 are well able to convert the above ?: expression into the latter, without uglifying the code. Why are we promoting (or even allowing) this kind of ugly code just because dmd's optimizer is so lackluster you have to manually spell things out this way?See my reply to Steven. BTW, consider auto-vectorizing compilers. A common characteristic of them is that sometimes a loop looks like it should be vectorized, but the compiler didn't, for reasons that are opaque to users. The compiler then substitutes a slow emulation to give the *appearance* of being vectorized. The only way to tell what is happening is to dump the generated assembler. This is especially troublesome when you're attempting to write vector code that is portable among various SIMD instruction sets. It doesn't scale, at all. This is based on many conversations about this with Manu Evans, whose career was based on writing vector code. Manu has been very influential in the design of D's vector semantics. Hence D's approach is different. You can write vector code in D. If it won't compile to the target instruction set, it doesn't replace it with emulation. It signals an error. Thus, the user knows if he writes vector code, he gets vector code. It makes it easy for him to use versioning to adjust the shape of the expressions to line up with the vector capabilities of each target. To sum up, if you want a particular instruction mix in the output stream, a systems programming language must enable expression of that desired mix. It must not rely on undocumented and inconsistent compiler transformations.
May 19 2022
On Thursday, 19 May 2022 at 21:55:59 UTC, Walter Bright wrote:On 5/19/2022 1:24 PM, H. S. Teoh wrote:Good compilers can actually print a report of why they didn't vectorize things. If they couldn't most of the time these days it's because the compiler was right and the programmer has a loop that the compiler can't reasonably assume is free of dependencies. https://d.godbolt.org/z/djhMhMj31 has reports enabled from gcc and llvm Intel were the cutting edge for these reports but now Intel C++ is basically dead. These reports aren't that good for instruction selection issues, granted. As an addendum, I would actually contend that most optimizers are actually far too aggressive when performing loop optimizations. https://d.godbolt.org/z/Y99zs9feh See this example. Unless you give the compiler a nudge in the right direction (i.e. You can make sure you never try to compute a factorial of 100 for example), it will generate reams and reams of code. Unless you are compiling with profile guided optimizations everything the compiler does is blind. This isn't just a question of locality but the very basics of the compilers optimizations e.g. register allocation and spill placement.IME, gcc and ldc2 are well able to convert the above ?: expression into the latter, without uglifying the code. Why are we promoting (or even allowing) this kind of ugly code just because dmd's optimizer is so lackluster you have to manually spell things out this way?See my reply to Steven. BTW, consider auto-vectorizing compilers. A common characteristic of them is that sometimes a loop looks like it should be vectorized, but the compiler didn't, for reasons that are opaque to users. The compiler then substitutes a slow emulation to give the *appearance* of being vectorized.The only way to tell what is happening is to dump the generate assembler. This is especially troublesome you're attempting to write vector code that is portable among various SIMD instruction sets. 
It doesn't scale, at all.If you're writing SIMD code without dumping the assembler anyway you're not paying enough attention. If you're going to go to all that effort you're going to be profiling the code, and any good profiler will show you the disassembly alongside. Maybe it doesn't scale in some minute sense but in practice I don't think it makes that much difference because you have to either do the work anyway, or it doesn't matter. This is still ignoring that instruction sets don't mean all that much, it's all about the microarchitecture, which once again will probably require different code. For example AMD processors present-ish and past have emulated the wider SIMD in terms of more numerous smaller execution units.Hence D's approach is different. You can write vector code in D. If it won't compile to the target instruction set, it doesn't replace it with emulation. It signals an error. Thus, the user knows if he writes vector code, he gets vector code. It makes it easy for him to use versioning to adjust the shape of the expressions to line up with the vector capabilities of each target.LDC doesn't do this, GCC does. I don't think it actually matters, whereas if you're consuming a library from someone who didn't do the SIMD parts properly, it will at very least compile with LDC.To sum up, if you want a particular instruction mix in the output stream, a systems programming language must enable expression of that desired mix. It must not rely on undocumented and inconsistent compiler transformations.I agree, although D is getting massively out of sync with the interesting instructions even on X86. The fun stuff is not really available unless you use inline asm (or Guillaume's intrinsics library). For the non-x86 world (i.e. the vast majority of all processors sold) ARM has NEON but the future will be SVE2, these are variable width vector instructions.
This isn't impossible to fit into the D_SIMD paradigm but will require for example types that only have a lower bound on their size. The RISC-V vector ISA is going in a similar direction. If I can actually get my hands on some variable-width hardware I will write D code for it ("because it's there"), but I haven't found anything cheap enough yet.
May 19 2022
On 5/19/2022 3:51 PM, max haughton wrote:Good compilers can actually print a report of why they didn't vectorize things.I guess Manu never used good compilers :-) Manu asked that a report be given in the form of an error message. Since it's what he did all day, I gave that a lot of weight. Also, the point was Manu could then adjust the code with version statements to write loops that worked best on each target, rather than suffer unacceptable degradation from the fallback emulations.If you're writing SIMD code without dumping the assembler anyway you're not paying enough attention. If you're going to go to all that effort you're going to be profiling the code, and any good profiler will show you the disassembly alongside. Maybe it doesn't scale in some minute sense but in practice I don't think it makes that much difference because you have to either do the work anyway, or it doesn't matter.Manu did this all day and I gave a lot of weight to what he said would work best for him. If you're writing vector operations, for a vector instruction set, the compiler should give errors if it cannot do it. Emulation code is not acceptable. I advocate disassembling, too, (remember the -vasm switch?) but disassembling and inspecting manually does not scale at all.LDC doesn't do this, GCC does. I don't think it actually matters, whereas if you're consuming a library from someone who didn't do the SIMD parts properly, it will at very least compile with LDC.At least compiling is not good enough if you're expecting vector speed.
May 19 2022
On Friday, 20 May 2022 at 03:42:06 UTC, Walter Bright wrote:Manu asked that a report be given in the form of an error message. Since it's what he did all day, I gave that a lot of weight. Also, the point was Manu could then adjust the code with version statements to write loops that worked best on each target, rather than suffer unacceptable degradation from the fallback emulations.I think you're talking about writing SIMD code, not autovectorization. The report is *not* an error message, neither literally in this case nor spiritually, it's telling you what the compiler was able to infer from your code. Automatic vectorization is *not* writing code that uses SIMD instructions directly, they're two different beasts. Typically the direct-SIMD algorithm is much faster, at the expense of being orders of magnitude slower to write: The instruction selection algorithms GCC and LLVM use simply aren't good enough to exploit all 15 billion instructions Intel have in their ISA, but they're almost literally hand-beaten to be good at SPEC benchmarks so many patterns are recognized and optimized just fine.It's not an unreasonable thing to do; I just don't think it's that much of a showstopper either way. If I *really* care about being right per platform I'm probably going to be checking CPUID at runtime anyway. LDC is the compiler people who actually ship performant D code use and I've never actually seen anyone complain about this.If you're writing SIMD code without dumping the assembler anyway you're not paying enough attention. If you're going to go to all that effort you're going to be profiling the code, and any good profiler will show you the disassembly alongside. Maybe it doesn't scale in some minute sense but in practice I don't think it makes that much difference because you have to either do the work anyway, or it doesn't matter.Manu did this all day and I gave a lot of weight to what he said would work best for him.
If you're writing vector operations, for a vector instruction set, the compiler should give errors if it cannot do it. Emulation code is not acceptable.I advocate disassembling, too (remember the -vasm switch?) but disassembling and inspecting manually does not scale at all.You *have* to do it or you are lying to yourself - even if the compiler were perfect, which it often isn't. When I use VTune I see a complete breakdown of the disassembly, source code, pipeline state, memory hierarchy, how much power the CPU used etc, temperature (Cat blocking the computer's conveniently warm exhaust?) This isn't so much about the actual instructions/intrinsics you end up with, that's just a means to an end, but rather that if you aren't keeping an eye on the performance effects of each line you add and where the performance is happening then you aren't being a good engineer e.g. you can spend too much time working on the SIMD parts of an algorithm and get distracted from the parts that are the new bottleneck (the memory hierarchy). Despite this I do think it's still a huge failure of programming as an industry that it's a site like Compiler Explorer, or a flag like -vasm, actually needs to exist. This should be something much more deeply ingrained into our workflows, programming lags behind more serious forms of engineering when it comes to the correlation of what we think things do versus what they actually do. Aside for anyone reading: See Sites's classic article/note "It's the memory stupid" https://www.ardent-tool.com/CPU/docs/MPR/101006.pdf DEC died but he was right.
Basically everything has (a lot of) SIMD units these days, so even this emulated computation will still be pretty fast. You see SIMD instruction sets included in basically anything for more than the price of a pint of beer (Sneaky DConf Plug...), e.g. the Allwinner D1 is a cheapo RISC-V core from China, comes with a reasonably standard-compliant vector instruction set implementation. Even microcontrollers. For anyone interested the core inside the D-1 is open source https://github.com/T-head-Semi/openc906LDC doesn't do this, GCC does. I don't think it actually matters, whereas if you're consuming a library from someone who didn't do the SIMD parts properly, it will at very least compile with LDC.At least compiling is not good enough if you're expecting vector speed.
May 19 2022
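[Aside for readers: the distinction drawn above, between emulated vector code and code that maps onto real SIMD instructions, can be sketched with D's core.simd vector types. This is a hedged illustration, not anyone's actual workflow: whether the addition below becomes a single SIMD instruction or a slower scalar-emulation sequence depends entirely on the compiler and target.]

```d
// Sketch: core.simd vector types compile to real SIMD instructions when
// the target supports them; otherwise the compiler may fall back to
// slower scalar emulation, which is exactly the degradation discussed above.
import core.simd;
import std.stdio;

float4 addFour(float4 a, float4 b)
{
    // One packed add (e.g. addps on x86-64) when hardware support exists.
    return a + b;
}

void main()
{
    float4 a = [1, 2, 3, 4];
    float4 b = [10, 20, 30, 40];
    writeln(addFour(a, b).array); // [11, 22, 33, 44]
}
```

[Dumping the generated assembly (dmd -vasm, or a profiler's disassembly view) is how you tell which of the two cases you actually got.]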
On Friday, 20 May 2022 at 04:34:28 UTC, max haughton wrote:

> Despite this I do think it's still a huge failure of programming as an industry that a site like Compiler Explorer, or a flag like -vasm, actually needs to exist. This should be something much more deeply ingrained into our workflows; programming lags behind more serious forms of engineering when it comes to the correlation of what we think things do versus what they actually do.

    $ watch gcc -S -O test.c -o -

;)
May 20 2022
On Friday, 20 May 2022 at 04:34:28 UTC, max haughton wrote:

> You still have "vector speed" in a sense. The emulated SIMD is still good, it's just not optimal. As I was saying previously there are targets where even though you *have* (say) 256 bit registers, you actually might want to use 128 bit ones in some places because newer instructions tend to be emulated (in a sense) so might not actually be worth the port pressure inside the processor.

Yes, it is always better to allow for gradual optimization, going from a generic target to increasingly specific ones as you need them.

Case in point: WASM doesn't support SIMD, but the WASM engines (at least one of them) recognize the output of LLVM's builtin SIMD and reconstruct SIMD instructions for the CPU from sequences of regular WASM instructions. So even if the target does not support SIMD, you can get SIMD performance by using "generic" SIMD in the optimizer…

Things are getting much more complicated now for non-real-time; just look at the Intel compiler that compiles to a mix of CPU/SIMD/GPU… That's where batch programming is heading…
May 20 2022
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer wrote:

> I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
>
> -Steve

It is routine to use different types in the front end and the back end. Types in the front end are there for semantics, correctness, and generally to help the developer. Types in the back end are there to help the optimizer and the code generator.

bool is going to be an integer in the back end, for sure. That doesn't mean it has to be one in the front end.
May 19 2022
On Thursday, 19 May 2022 at 04:35:45 UTC, Walter Bright wrote:

> On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
>
>> But I have little hope for it, as Walter treats a boolean as an integer.
>
> They *are* integers.

I always thought of them as integers. Yesterday I was adding some new features to adam_d_ruppe's IRC client and I did:

    auto pgdir = (ev.key == Keyboard.Key.PageDown) - (ev.key == KeyboardEvent.Key.PageUp);

to get -1, 0 or 1, and do the next action according to the input given by the user.

Matheus.
May 19 2022
On Thursday, 19 May 2022 at 16:42:58 UTC, matheus wrote:

> On Thursday, 19 May 2022 at 04:35:45 UTC, Walter Bright wrote:
>
>> On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
>>
>>> But I have little hope for it, as Walter treats a boolean as an integer.
>>
>> They *are* integers.
>
> I always thought of them as integers. Yesterday I was adding some new features to adam_d_ruppe's IRC client and I did:
>
>     auto pgdir = (ev.key == Keyboard.Key.PageDown) - (ev.key == KeyboardEvent.Key.PageUp);
>
> to get -1, 0 or 1, and do the next action according to the input given by the user.
>
> Matheus.

This doesn't imply they are integers, but that they are convertible to integers. You could do the same operation with the keys being shorts, and pgdir would still be an int. That doesn't mean shorts are ints.
May 19 2022
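[Aside for readers: the pgdir trick above works because each `==` comparison yields a bool, and bool implicitly converts to int (true becomes 1, false becomes 0), so subtracting two of them gives -1, 0, or +1. A self-contained sketch, with the enum and key names invented for illustration:]

```d
import std.stdio;

enum Key { PageUp, PageDown, Other }

void main()
{
    Key key = Key.PageDown;

    // Each comparison is a bool; bool implicitly converts to int
    // (true -> 1, false -> 0), so the difference is -1, 0, or +1.
    auto pgdir = (key == Key.PageDown) - (key == Key.PageUp);

    static assert(is(typeof(pgdir) == int)); // the result is int, not bool

    writeln(pgdir); // 1
}
```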
On Wednesday, 18 May 2022 at 22:11:34 UTC, max haughton wrote:

> SDC.

We have four `D` compilers?
May 18 2022
On 5/18/22 15:11, max haughton wrote:

> For example:
>
>     float x = 'a';
>
> Currently compiles.

Going a little off-topic, I recommend Don Clugston's very entertaining DConf 2016 presentation "Using Floating Point Without Losing Your Sanity":

http://dconf.org/2016/talks/clugston.html

Ali
May 19 2022
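[Aside for readers: the behaviour the thread is debating can be observed directly. A minimal sketch, assuming a current D compiler: the char implicitly converts to the numeric value of its code unit, the same promotion chars have always had to int.]

```d
void main()
{
    // The question at the top of the thread: this compiles today.
    float x = 'a'; // 'a' is U+0061, so x becomes 97.0f
    assert(x == 97.0f);

    // The same implicit char-to-integer conversion, which is what
    // makes the char-to-float path possible in the first place.
    int i = 'a';
    assert(i == 97);
}
```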