digitalmars.D - Should you be able to initialize a float with a char?
- max haughton (7/7) May 18 2022 For example:
- Paul Backus (13/20) May 18 2022 Under ["integer promotions"][1], the spec says that `char` can
- forkit (7/9) May 19 2022 It's actually rather *unsurprising* (given D's compatibility
- forkit (14/14) May 19 2022 On Friday, 20 May 2022 at 02:09:43 UTC, forkit wrote:
- H. S. Teoh (11/21) May 18 2022 If you were to ask me, I'd say prohibit implicit conversions between
- Walter Bright (4/12) May 18 2022 People routinely manipulate chars as integer types, for example, in conv...
- Steven Schveighoffer (4/17) May 18 2022 Supporting addition on char types (even with char + int) is still
- max haughton (8/25) May 18 2022 People do indeed (I'd question whether it's routine in a good D
- Walter Bright (4/7) May 18 2022 Casts are a common source of bugs, not correctness. This is because it i...
- ab (21/30) May 19 2022 This can be solved by a cast with explicit source and destination
- Walter Bright (3/4) May 19 2022 This indeed can work, but when people complain about adding attributes (...
- bauss (22/26) May 19 2022 I'd argue that implicit casts are more so in some cases.
- Walter Bright (13/28) May 19 2022 D's rules added some constraints to C's rules to prevent loss of data wi...
- H. S. Teoh (27/40) May 18 2022 How is that any different from the current situation where arithmetic
- Walter Bright (13/37) May 18 2022 I generally avoid using shorts. I agree the situation is hardly ideal, b...
- bauss (11/14) May 19 2022 There is, this assumes that the character is ascii and not
- Walter Bright (3/15) May 19 2022 I know. And for many applications (like dev tools), it is fine.
- kdevel (7/19) May 19 2022 "However, the assumption that setting bit 5 of the representation
- Ali Çehreli (4/21) May 19 2022 In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above
- kdevel (52/63) May 19 2022 The latter part, that ASCII is a subset of UTF-8, is 1†. I
- deadalnix (2/12) May 19 2022 You could have use "Ali Çehreli" as a test case :)
- kdevel (5/19) May 19 2022 One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327
- deadalnix (3/23) May 19 2022 This, or simply U+00E7 .
- kdevel (8/26) May 19 2022 Not "or" but "and". The first input contains UTF-8 of the
- Ali Çehreli (6/8) May 19 2022 Note that Walter did not write any function. He showed a piece of code
- Walter Bright (5/8) May 19 2022 Should stick with normalized forms for good reason. Having two different...
- deadalnix (7/9) May 19 2022 Sure, it also doesn't perform any useful operation. other than
- Ali Çehreli (3/9) May 19 2022 It is an experimental cypher. :)
- Walter Bright (2/4) May 19 2022 No. But D does.
- deadalnix (2/5) May 19 2022 Tell me you are American without telling me you are American.
- Walter Bright (2/3) May 19 2022 I didn't know the Aussies and Brits used umlauts.
- max haughton (2/5) May 19 2022 Maybe in some parts of scotland...
- Patrick Schluter (10/13) May 20 2022 The Brits Charlotte Brontë, Emily Brontë (and other members of
- claptrap (9/18) May 20 2022 The Brontë was originally Brunty, the father changed it to honour
- Nicholas Wilson (3/6) May 20 2022 C'mon, even the NY times (?, used to) spells it "cöperate", and
- Nick Treleaven (15/20) May 20 2022 Coöperate. (It's known as a diaeresis in English):
- user1234 (9/13) May 19 2022 it is indeed but let's be honest, having builtin char, wchar, and
- Steven Schveighoffer (6/21) May 18 2022 If you were to ask me, I'd say prohibit implicit conversions between
- Walter Bright (14/15) May 18 2022 They *are* integers.
- Steven Schveighoffer (4/20) May 19 2022 I hope we are not depending on the type system to the degree where a
- max haughton (11/33) May 19 2022 That and you can have the underlying type without exposing it to
- user1234 (5/27) May 19 2022 if you use bool as integer types in LLVM
- Walter Bright (4/6) May 19 2022 Does that mean you prefer:
- John Colvin (4/11) May 19 2022 a = 3 + int(b < c) * 5;
- Steven Schveighoffer (6/16) May 19 2022 No, I find that nearly unreadable. I prefer the original:
- Walter Bright (9/11) May 19 2022 You never write things like:
- deadalnix (6/14) May 19 2022 That doesn't strike me as very convincing, because the compiler
- max haughton (3/5) May 19 2022 That's fun. I've never enjoyed working with the valgrind code all
- Walter Bright (2/4) May 19 2022 I know this is technically possible, but have you ever seen this?
- deadalnix (3/8) May 20 2022 Yes, when you have a bunch of ternaries based on the same
- Steven Schveighoffer (27/42) May 19 2022 I have written these kinds of things *sometimes*, but also would be fine...
- Walter Bright (18/34) May 19 2022 I presume Json is a bool. And the bool is written as false. If it's bad ...
- claptrap (3/7) May 20 2022 Why should that be so? Why do you take for granted that whatever
- Steven Schveighoffer (28/48) May 20 2022 No, it's not. It's a [JSON](https://json.org) container. It can accept
- deadalnix (8/15) May 20 2022 In fact, it doesn't even require implicit conversion to be
- Paul Backus (9/26) May 20 2022 In this example, both `int` and `bool` are implicit conversions,
- deadalnix (8/17) May 20 2022 Fair enough, because of the enum. You probably don't want to cast
- Paul Backus (16/31) May 20 2022 It happens with literals only if the literal type is not an exact
- forkit (12/27) May 20 2022 Actually, there is a right answer.
- forkit (10/10) May 20 2022 On Friday, 20 May 2022 at 22:10:43 UTC, forkit wrote:
- deadalnix (3/10) May 19 2022 No. The `*` imply a promotion of its argument. It is very easy to
- H. S. Teoh (10/25) May 19 2022 [...]
- Walter Bright (20/24) May 19 2022 See my reply to Steven.
- max haughton (49/78) May 19 2022 Good compilers can actually print a report of why they didn't
- Walter Bright (13/23) May 19 2022 I guess Manu never used good compilers :-)
- max haughton (61/88) May 19 2022 I think you're talking about writing SIMD code not
- deadalnix (3/10) May 20 2022 $ watch gcc -S -O test.c -o -
- Ola Fosheim Grøstad (11/18) May 20 2022 Yes, it is always better to allow for gradual optimization, going
- deadalnix (8/12) May 19 2022 It is routine to use different types in the front end and
- matheus (8/13) May 19 2022 I always thought them as integers, yesterday I was adding some
- deadalnix (5/19) May 19 2022 This doesn't imply they are integer, but that they are
- zjh (2/3) May 18 2022 We have four `D` compiler?
- Ali Çehreli (5/8) May 19 2022 Going a little off-topic, I recommend Don Clugston's very entertaining
For example: float x = 'a'; Currently compiles. I had no idea that it does but I was implementing this pattern in SDC and lo and behold it does (and thus sdc has to support it). Should it? Implicit conversions and implicit-anything around floats seem to be very undocumented in the specification too.
May 18 2022
On Wednesday, 18 May 2022 at 22:11:34 UTC, max haughton wrote:For example: float x = 'a'; Currently compiles. I had no idea that it does but I was implementing this pattern in SDC and lo and behold it does (and thus sdc has to support it). Should it? Implicit conversions and implicit-anything around floats seem to be very undocumented in the specification too.Under ["integer promotions"][1], the spec says that `char` can implicitly convert to `int`. Under ["usual arithmetic conversions"][2], the spec says (by implication) that all arithmetic types can implicitly convert to `float`. "Arithmetic type" is not explicitly defined by the spec, but [in the C99 standard][3] it means "integer and floating types." It's probably safe to assume the same definition applies to D. So I would say that according to the spec, the answer is "yes, the example should work." Though it is rather surprising. [1]: https://dlang.org/spec/type.html#integer-promotions [2]: https://dlang.org/spec/type.html#usual-arithmetic-conversions
May 18 2022
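The conversion chain the spec describes (char promotes to int, and any arithmetic type converts to float) can be checked directly; a minimal sketch:

```d
void main()
{
    char c = 'a';   // 'a' is code unit 97
    int  i = c;     // integer promotion: char -> int
    float f = c;    // usual arithmetic conversions: char -> float
    assert(i == 97);
    assert(f == 97.0f);
}
```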
On Wednesday, 18 May 2022 at 22:24:18 UTC, Paul Backus wrote:So I would say that according to the spec, the answer is "yes, the example should work." Though it is rather surprising.It's actually rather *unsurprising* (given D's compatibility needs with C). What is surprising is that there's no compiler option to disable implicit type casts, or to disable them in @safe, or *at the very least*, output a record of such casts for auditing (to help minimise bugs and vulnerabilities).
May 19 2022
On Friday, 20 May 2022 at 02:09:43 UTC, forkit wrote:D suffers from the same problem as C.

// ---
module test;
@safe: // completely useless annotation here!
import std;
void main()
{
    int x = 3;
    int y = 4;
    float z = x / y;
    writeln(z); // 0
}
// -----
May 19 2022
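The 0 above is not a conversion bug: x/y is evaluated as int division first, and only the truncated result is converted to float. Making one operand floating-point before dividing gives the expected result; a sketch:

```d
import std.stdio;

void main()
{
    int x = 3;
    int y = 4;
    float z = cast(float) x / y; // the division itself now happens in float
    writeln(z);                  // 0.75
}
```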
On Wed, May 18, 2022 at 10:11:34PM +0000, max haughton via Digitalmars-d wrote:For example: float x = 'a'; Currently compiles. I had no idea that it does but I was implementing this pattern in SDC and lo and behold it does (and thus sdc has to support it). Should it? Implicit conversions and implicit-anything around floats seem to be very undocumented in the specification too.If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case. T -- MSDOS = MicroSoft's Denial Of Service
May 18 2022
On 5/18/2022 3:31 PM, H. S. Teoh wrote:If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case.People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
On 5/18/22 8:27 PM, Walter Bright wrote:On 5/18/2022 3:31 PM, H. S. Teoh wrote:Supporting addition on char types (even with char + int) is still possible without allowing implicit conversions. -SteveIf you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case.People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
On Thursday, 19 May 2022 at 00:27:24 UTC, Walter Bright wrote:On 5/18/2022 3:31 PM, H. S. Teoh wrote:People do indeed (I'd question whether it's routine in a good D program, I'd flag it in code review) manipulate characters as integers, but I think there's something to be said for forcing people to go char -> suitable integer -> char. We have u/byte, largely for descriptive purposes already, personally I try to use them for calculation even if the byte's value is from a char.If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. The topic at the time was bool vs int, but the same principle holds in this case.People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
On 5/18/2022 5:55 PM, max haughton wrote:People do indeed (I'd question whether it's routine in a good D program, I'd flag it in code review) manipulate characters as integers, but I think there's something to be said for forcing people to go char -> suitable integer -> char.Casts are a common source of bugs, not correctness. This is because it is a forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.
May 18 2022
On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:On 5/18/2022 5:55 PM, max haughton wrote:This can be solved by a cast with explicit source and destination type, e.g.

auto cast_from(From, To, Real)(Real a)
{
    static if (is(From == Real))
        return cast(To) a;
    else
        static assert(0, "Wrong types");
}

void main()
{
    import std.stdio;
    short a = 1;
    int b = cast_from!(short, int)(a);
    bool c = 1;
    // int d = cast_from!(int, int)(a); // compile-time error: a is short, not int
    writeln("Test: ", b);
}

People do indeed (I'd question whether it's routine in a good D program, I'd flag it in code review) manipulate characters as integers, but I think there's something to be said for forcing people to go char -> suitable integer -> char.Casts are a common source of bugs, not correctness. This is because it is a forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.
May 19 2022
On 5/19/2022 12:46 AM, ab wrote:This can be solved by a cast with explicit source and destination type, e.g.This indeed can work, but when people complain about adding attributes (valid complaints), how are they going to react to having to do this?
May 19 2022
On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:Casts are a common source of bugs, not correctness. This is because it is forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.I'd argue that implicit casts are more so in some cases. This is one of those cases. And also you shouldn't really do arithmetic operations on chars anyway, at least not with Unicode, and D is supposed to be a Unicode language. Upper-casing in Unicode is not as simple as an addition, because the rules for doing so are language specific. Changing case in one language isn't always the same as in another language. Even with ASCII you can't just rely on a mathematical computation, because not all characters can change case, such as symbols. That's why string/char manipulation should __always__ be a library solution, not a user-code solution. The library should handle all these rules. The user should absolutely not be able/have to mess this up by accident, unless they really really want to. Sure a char might be represented by an integer type, but so is every single data type you can ever think of since they all convert to bytes. If D is to ever attract more users, then it must not surprise new users.
May 19 2022
On 5/19/2022 12:57 AM, bauss wrote:On Thursday, 19 May 2022 at 03:46:42 UTC, Walter Bright wrote:D's rules added some constraints to C's rules to prevent loss of data with implicit casting. I don't see how D's implicit casts are a dangerous source of bugs.Casts are a common source of bugs, not correctness. This is because it is forced override of the type system. If the types change due to refactoring, the cast may no longer be correct, but the programmer will have no way of knowing.I'd argue that implicit casts are more so in some cases.And also you shouldn't really do arithmetic operations on chars anyway, at least not with unicode and D is supposed to be a unicode language.It turns out that for performance reasons, you definitely want to treat UTF-8 as individual code units. Autodecode taught us that the hard way.Upper-casing in unicode is not as simple as an addition, because the rules for doing so are language specific.I'm painfully aware that the Unicode consortium made it impossible to do "correct" Unicode without a megabyte library.Even with ASCII you can't just rely on a mathematic computation, because not all characters can change case, such as symbols.Yes, you can. I posted the code in another post in this thread. ASCII hasn't changed in my professional lifetime, and I seriously doubt it will change in yours.If D is to ever attract more users, then it must not surprise new users.The only problem we've had with D chars is autodecoding, which ironically does what you propose - treat everything as Unicode code points rather than code units. It's a great idea, but it simply does not work, and it took us years to become convinced of that.
May 19 2022
On Wed, May 18, 2022 at 05:27:24PM -0700, Walter Bright via Digitalmars-d wrote:On 5/18/2022 3:31 PM, H. S. Teoh wrote:How is that any different from the current situation where arithmetic involving short ints require casts all over the place? Even things like this require a cast: short s = 123; //s = -s; // NG s = cast(short)-s; // required excess verbiage It got so out of hand that I wrote nopromote.d, specifically to "poison" expressions involving short ints with a custom struct with overloaded ops that always truncate, just so I don't have to litter my code with casts in just about every expression involving short ints. In the case of char + int arithmetic, my opinion is that usually people do *not* (or *should* not) do char arithmetic directly -- with Unicode, it makes much less sense than the bad ole days of ASCII. These days, you'd call one of the std.uni functions for proper case mapping instead of a slipshod hack job of adding or subtracting some magic constant (which is wrong in anything except ASCII anyway). In today's day and age, strings are best treated as opaque data that are manipulated by properly-implemented string functions in the standard library. Having a few extra char/int casts in std.uni isn't the end of the world. It shouldn't usually be done in user code anyway. (And having to write lots of casts may motivate people to actually use proper string manipulation functions instead of winging it themselves with wrong implementations involving char arithmetic.) T -- Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above. But IIRC, the last time this conversation came up, Walter's view was that they are all integral types and therefore should be interconvertible. 
The topic at the time was bool vs int, but the same principle holds in this case.People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.
May 18 2022
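nopromote.d itself is not shown in the thread, but the "poison struct with overloaded ops that always truncate" idea can be sketched like this (the name NoPromote is illustrative, not the actual module's API):

```d
struct NoPromote
{
    short value;

    // Unary ops truncate back to short instead of promoting to int.
    NoPromote opUnary(string op)() const
    {
        return NoPromote(cast(short) mixin(op ~ "value"));
    }

    // Binary ops likewise keep the result as short.
    NoPromote opBinary(string op)(NoPromote rhs) const
    {
        return NoPromote(cast(short) mixin("value " ~ op ~ " rhs.value"));
    }
}

void main()
{
    auto s = NoPromote(123);
    s = -s;                    // no cast needed at the use site
    assert(s.value == -123);
    s = s + NoPromote(3);
    assert(s.value == -120);
}
```

The cast lives in one place inside the wrapper, instead of at every expression in user code.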
On 5/18/2022 5:57 PM, H. S. Teoh wrote:How is that any different from the current situation where arithmetic involving short ints require casts all over the place? Even things like this require a cast: short s = 123; //s = -s; // NG s = cast(short)-s; // required excess verbiageI generally avoid using shorts. I agree the situation is hardly ideal, but there is no ideal way I've ever seen. The various schemes just shift the deck chairs around.It got so out of hand that I wrote nopromote.d, specifically to "poison" expressions involving short ints with a custom struct with overloaded ops that always truncate, just so I don't have to litter my code with casts in just about every expression involving short ints.The only reason to ever use shorts is to save memory in a frequently allocated data structure. Short local variables do not save memory or time (in fact, they're larger and slower). If you're doing all these casts, perhaps look into using ints instead.In the case of char + int arithmetic, my opinion is that usually people do *not* (or *should* not) do char arithmetic directly -- with Unicode, it makes much less sense than the bad ole days of ASCII. These days, you'd call one of the std.uni functions for proper case mapping instead of a slipshod hack job of adding or subtracting some magic constant (which is wrong in anything except ASCII anyway). In today's day and age, strings are best treated as opaque data that are manipulated by properly-implemented string functions in the standard library. Having a few extra char/int casts in std.uni isn't the end of the world. It shouldn't usually be done in user code anyway. 
(And having to write lots of casts may motivate people to actually use proper string manipulation functions instead of winging it themselves with wrong implementations involving char arithmetic.)There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20; D doesn't have C's problems with optionally signed chars, 10 bit chars, EBCDIC, RADIX50 and other dead technologies.
May 18 2022
On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;There is, this assumes that the character is ASCII and not Unicode. What about say 'Å' -> 'å'? It won't work for that. So your code is wrong in D because D isn't an ASCII language, but a Unicode language. As specified by the spec: char '\xFF' unsigned 8 bit (UTF-8 code unit) wchar '\uFFFF' unsigned 16 bit (UTF-16 code unit) dchar '\U0000FFFF' unsigned 32 bit (UTF-32 code unit)
May 19 2022
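For comparison, the Unicode-aware library route does handle the 'Å' case; std.uni provides toLower overloads for both single code points and strings:

```d
void main()
{
    import std.uni : toLower;
    assert(toLower('Å') == 'å');      // per-code-point case mapping
    assert(toLower("A Å") == "a å");  // string overload, not ASCII-only
}
```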
On 5/19/2022 1:05 AM, bauss wrote:On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:It does not assume it, it tests for if it would be valid.There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;There is, this assumes that the character is ascii and not unicode.What about say 'Å' -> 'å'? It won't work for that.I know. And for many applications (like dev tools), it is fine.
May 19 2022
On Thursday, 19 May 2022 at 18:35:42 UTC, Walter Bright wrote:On 5/19/2022 1:05 AM, bauss wrote:"However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1] [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:It does not assume it, it tests for if it would be valid.There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;There is, this assumes that the character is ascii and not unicode.
May 19 2022
On 5/19/22 12:13, kdevel wrote:On Thursday, 19 May 2022 at 18:35:42 UTC, Walter Bright wrote:In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above is valid without making any ASCII assumption. AliOn 5/19/2022 1:05 AM, bauss wrote:"However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1] [1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:It does not assume it, it tests for if it would be valid.There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;There is, this assumes that the character is ascii and not unicode.
May 19 2022
On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote: [...]The latter part, that ASCII is a subset of UTF-8, is 1†. I disagree with the wording of the former part, that in D a char "is" UTF-8."However, the assumption that setting bit 5 of the representation will convert uppercase letters to lowercase is not valid for EBCDIC." [1][1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/In D, char is UTF-8 and ASCII is a subset of UTF-8.Walter's code above is valid without making any ASCII assumption.Walter made the 0 claim "It does not assume it, it tests for if it would be valid [ascii and not unicode]" [2] ‡ Okay. Let's do UTF-8:

```
import std.stdio;
import std.string;
import std.utf;

char char_tolower_bright (char c)
{
   if ('A' <= c && c <= 'Z')
      c = c | 0x20;
   return c;
}

string tolower_bright (string s)
{
   string t;
   foreach (c; s.byCodeUnit)
      t ~= c.char_tolower_bright;
   return t;
}

void process_strings (string s)
{
   writefln!"input : %s" (s);
   auto t = s.tolower_bright;
   writefln!"bright : %s" (t);
   auto u = s.toLower;
   writefln!"toLower (std.utf): %s" (u);
}

void main ()
{
   process_strings ("A Ä");
   process_strings ("A Ä");
}
```

Free of charge I compiled and ran this for you:

$ dmd lcb
$ ./lcb
input : A Ä
bright : a Ä
toLower (std.utf): a ä
input : A Ä
bright : a ä
toLower (std.utf): a ä

See the problem? † Hint for interpretation: booleans "are" integers. [2] http://forum.dlang.org/post/t662ll$tnm$1 digitalmars.com ‡ There is probably no consensus about what "it" means.
May 19 2022
On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:Free of charge I compiled and ran this for you: $ dmd lcb $ ./lcb input : A Ä bright : a Ä toLower (std.utf): a ä input : A Ä bright : a ä toLower (std.utf): a ä See the problem?You could have used "Ali Çehreli" as a test case :)
May 19 2022
On Thursday, 19 May 2022 at 22:24:35 UTC, deadalnix wrote:On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case. Further Reading: [3] https://medium.com/ sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407Free of charge I compiled and ran this for you: $ dmd lcb $ ./lcb input : A Ä bright : a Ä toLower (std.utf): a ä input : A Ä bright : a ä toLower (std.utf): a ä See the problem?You could have used "Ali Çehreli" as a test case :)
May 19 2022
On Thursday, 19 May 2022 at 22:51:55 UTC, kdevel wrote:On Thursday, 19 May 2022 at 22:24:35 UTC, deadalnix wrote:This, or simply U+00E7. Which will cause a similar problem as the one you raised.On Thursday, 19 May 2022 at 22:14:31 UTC, kdevel wrote:One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case. Further Reading: [3] https://medium.com/ sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407Free of charge I compiled and ran this for you: $ dmd lcb $ ./lcb input : A Ä bright : a Ä toLower (std.utf): a ä input : A Ä bright : a ä toLower (std.utf): a ä See the problem?You could have used "Ali Çehreli" as a test case :)
May 19 2022
On Thursday, 19 May 2022 at 23:26:57 UTC, deadalnix wrote:Not "or" but "and". The first input contains UTF-8 of the (normalized) codepoint, which is left unchanged by Walter's lowercase function. The second input contains UTF-8 of the same codepoint in canonically decomposed form (NFD).This, or simply U+00E7.One needs to combine U+0043 LATIN CAPITAL LETTER C + U+0327 COMBINING CEDILLA in this case. Further Reading: [3] https://medium.com/ sthadewald/the-utf-8-hell-of-mac-osx-feef5ea42407input : A Ä bright : a Ä <-- upper case toLower (std.utf): a ä input : A Ä bright : a ä toLower (std.utf): a ä <-- lower case See the problem?You could have used "Ali Çehreli" as a test case :)Which will cause a similar problem as the one you raised.The problem is that you cannot decide only from the value of a single UTF-8 code unit (char) if it stands for an ASCII character in the string.
May 19 2022
On 5/19/22 17:50, kdevel wrote:The first input contains UTF-8 of the (normalized) codepoint, which is left unchanged by Walter's lowercase function.Note that Walter did not write any function. He showed a piece of code that would lowercase ASCII letters inside a UTF-8 encoded Unicode string. The code did not assume the string was ASCII and it did not claim to lowercase all Unicode characters. Ali
May 19 2022
On 5/19/2022 5:50 PM, kdevel wrote:Not "or" but "and". The first input contains UTF-8 of the (normalized) codepoint, which is left unchanged by Walter's lowercase function. The second input contains UTF-8 of the same codepoint in canonically decomposed form (NFD).Should stick with normalized forms for good reason. Having two different sequences supposedly compare equal is an abomination. Though none of this supports the notion that arithmetic should not be done on chars. Heck, UTF-8 cannot be decoded without such arithmetic.
May 19 2022
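A small illustration of that last point: even decoding a two-byte UTF-8 sequence is nothing but integer arithmetic on chars (a sketch of the bit manipulation involved, not the library decoder):

```d
void main()
{
    string s = "é";  // encoded as the two code units 0xC3 0xA9
    assert(s.length == 2);
    // 110xxxxx 10xxxxxx -> take 5 bits from the lead byte, 6 from the trail
    dchar d = cast(dchar)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
    assert(d == 'é');  // U+00E9
}
```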
On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote:In D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above is valid without making any ASCII assumption.Sure, it also doesn't perform any useful operation, other than "uncapitalize English, do nothing for non-Latin languages, and create a mess with any non-English Latin language", which, while it certainly is a valid program, doesn't look like something anyone would actually want to do for other reasons than it's easy to write and good enough.
May 19 2022
On 5/19/22 15:17, deadalnix wrote:On Thursday, 19 May 2022 at 20:48:47 UTC, Ali Çehreli wrote:It is an experimental cypher. :) AliIn D, char is UTF-8 and ASCII is a subset of UTF-8. Walter's code above is valid without making any ASCII assumption.Sure, it also doesn't perform any useful operation. other than "Uncapitalize English, do nothing for non latin languages
May 19 2022
On 5/19/2022 12:13 PM, kdevel wrote:[1] Does C and C++ guarantee the ASCII of [a-f] and [A-F] characters? https://ogeek.cn/qa/?qa=669486/No. But D does.
May 19 2022
On Thursday, 19 May 2022 at 04:10:04 UTC, Walter Bright wrote:There's nothing wrong with: if ('A' <= c && c <= 'Z') c = c | 0x20;Tell me you are American without telling me you are American.
May 19 2022
On 5/19/2022 3:04 PM, deadalnix wrote:Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 19 2022
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:On 5/19/2022 3:04 PM, deadalnix wrote:Maybe in some parts of Scotland...Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 19 2022
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:On 5/19/2022 3:04 PM, deadalnix wrote:The Brits Charlotte Brontë, Emily Brontë (and other members of the Brontë family), Noël Coward, Zoë Wanamaker, Zoë Ball, Emeli Sandé, John le Carré and the Australians Renée Geyer and Zoë Badwi and the Americans Beyoncé Knowles, Chloë Grace Moretz, Chloë Sevigny, Renée Fleming, Renée Zellweger, Zoë Baird, Zoë Kravitz, Donté Stallworth, John C. Frémont, Robert M. Gagné, Roxanne Shanté, Janelle Monáe, Jhené Aiko might want to have a word with you ;-)Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 20 2022
On Friday, 20 May 2022 at 09:16:24 UTC, Patrick Schluter wrote:On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:The Brontë was originally Brunty, the father changed it to honour Nelson when he won some battle. Zoë Wanamaker is American, or at least was born in the US. Emeli Sandé's dad was from Zambia I think. John Le Carre is a pen name, his real name is David John Moore Cornwell. So often the source of the umlauts in "British" names is not as straightforward as it may appear.On 5/19/2022 3:04 PM, deadalnix wrote:The Brits Charlotte Brontë, Emily Brontë (and other members of the Brontë family), Noël Coward, Zoë Wanamaker, Zoë Ball, Emeli Sandé, John le Carré might want to have a word with you ;-)Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 20 2022
On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:On 5/19/2022 3:04 PM, deadalnix wrote:C'mon, even the NY times (?, used to) spells it "cöperate", and other repeated vowels contractions to single umlaut.Tell me you are American without telling me you are American.I didn't know the Aussies and Brits used umlauts.
May 20 2022
On Friday, 20 May 2022 at 11:54:47 UTC, Nicholas Wilson wrote:On Friday, 20 May 2022 at 03:45:23 UTC, Walter Bright wrote:Coöperate. (It's known as a diaeresis in English): "The diaeresis diacritic indicates that two adjoining letters that would normally form a digraph and be pronounced as one sound, are instead to be read as separate vowels in two syllables. For example, in the spelling 'coöperate', the diaeresis reminds the reader that the word has four syllables co-op-er-ate, not three, '*coop-er-ate'. In British English this usage has been considered obsolete for many years, and in US English, although it persisted for longer, it is now considered archaic as well.[5] Nevertheless, it is still used by the US magazine The New Yorker.[6] In English language texts it is perhaps most familiar in the spellings 'naïve', 'Noël', and 'Chloë'" WikipediaI didn't know the Aussies and Brits used umlauts.C'mon, even the NY times (?, used to) spells it "cöperate", and other repeated vowels contractions to single umlaut.
May 20 2022
On Thursday, 19 May 2022 at 00:27:24 UTC, Walter Bright wrote:People routinely manipulate chars as integer types, for example, in converting case. Making them not integer types means lots of casting will become necessary, and overall that's a step backwards.It is indeed, but let's be honest: having builtin char, wchar, and dchar is only useful for overload resolution and string literals. ubyte c = 's'; // OK ubyte[] a = "s".dup; // NG without the string literal problem, there's only the overload resolution one and for this one builtin character types could be library types, e.g. a struct wrapping ubyte, ushort, uint.
May 19 2022
On 5/18/22 6:31 PM, H. S. Teoh wrote:On Wed, May 18, 2022 at 10:11:34PM +0000, max haughton via Digitalmars-d wrote:If you were to ask me, I'd say prohibit implicit conversions between char types and any other types, including other char types. Even converting char to dchar isn't correct. But I have little hope for it, as Walter treats a boolean as an integer. -SteveFor example: float x = 'a'; Currently compiles. I had no idea that it does but I was implementing this pattern in SDC and lo and behold it does (and thus sdc has to support it). Should it? Implicit conversions and implicit-anything around floats seem to be very undocumented in the specification too.If you were to ask me, I'd say prohibit implicit conversions between char and non-char types. Otherwise you end up with nonsense code like the above.
May 18 2022
On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:But I have little hope for it, as Walter treats a boolean as an integer.They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too! Branchless code is a thing, it is used in GPUs, and in security code to make it resistant to timing attacks. You'll also see this in the SIMD instructions, although they set all bits instead of just 1, because & is faster than *. a = 3 + (-(b < c) & 5);
May 18 2022
On 5/19/22 12:35 AM, Walter Bright wrote:On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -SteveBut I have little hope for it, as Walter treats a boolean as an integer.They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
May 19 2022
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer wrote:On 5/19/22 12:35 AM, Walter Bright wrote:That and you can have the underlying type without exposing it to the programmer. "Bools are integers", as opposed to bools not having a memory representation at all? Basically any discussion of these peephole optimizations (if this is more than just a nice to have) is a bit silly in the age where GCC and LLVM will both reach this kind of code anywhere because they want to eliminate branches like the plague (even if they couldn't do it in the first place given that you'd need to tell it what a bool is)On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -SteveBut I have little hope for it, as Walter treats a boolean as an integer.They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
May 19 2022
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer wrote:On 5/19/22 12:35 AM, Walter Bright wrote:If you use bool as an integer type in LLVM, true + true overflows and you basically get 0 because only 1 bit is read.On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization. -SteveBut I have little hope for it, as Walter treats a boolean as an integer.They *are* integers. The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!
May 19 2022
On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
May 19 2022
On Thursday, 19 May 2022 at 18:20:26 UTC, Walter Bright wrote:On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:a = 3 + int(b < c) * 5; avoids forcing it with an explicit cast, with a lower risk of writing a bug (or creating one later in a refactor).I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
May 19 2022
On 5/19/22 2:20 PM, Walter Bright wrote:On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:No, I find that nearly unreadable. I prefer the original: a = b < c ? 8 : 3; And let the compiler come up with whatever funky stuff it wants to in order to make it fast. -SteveI hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
May 19 2022
On 5/19/2022 1:01 PM, Steven Schveighoffer wrote:And let the compiler come up with whatever funky stuff it wants to in order to make it fast.You never write things like: a += (b < c); ? I do. And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does crypto code. Hoping the compiler will transform the code into this style, if it is not specified to, is just that, hope :-/ Sometimes this style is not necessarily faster, either, even though the user may desire it for crypto reasons.
May 19 2022
On Thursday, 19 May 2022 at 21:44:55 UTC, Walter Bright wrote:You never write things like: a += (b < c); ? I do. And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does crypto code. Hoping the compiler will transform the code into this style, if it is not specified to, is just that, hope :-/ Sometimes this style is not necessarily faster, either, even though the user may desire it for crypto reasons.That doesn't strike me as very convincing, because the compiler will sometimes do the opposite too, so either way, at least for crypto, you have to look at the disassembly. In our case, we even instrumented valgrind to cause a CI failure when such a branch occurs and run it on every patch.
May 19 2022
On Thursday, 19 May 2022 at 22:20:06 UTC, deadalnix wrote:In our case, we even instrumentalized valgrind to cause a CI failure when such a branch occurs and run it on every patch.That's fun. I've never enjoyed working with the valgrind code all that much but having it around is very useful.
May 19 2022
On 5/19/2022 3:20 PM, deadalnix wrote:That doesn't strike me as very convincing, because the compiler will sometime do the opposite too,I know this is technically possible, but have you ever seen this?
May 19 2022
On Friday, 20 May 2022 at 03:06:37 UTC, Walter Bright wrote:On 5/19/2022 3:20 PM, deadalnix wrote:Yes, when you have a bunch of ternaries based on the same condition for instance.That doesn't strike me as very convincing, because the compiler will sometime do the opposite too,I know this is technically possible, but have you ever seen this?
May 20 2022
On 5/19/22 5:44 PM, Walter Bright wrote:On 5/19/2022 1:01 PM, Steven Schveighoffer wrote:I have written these kinds of things *sometimes*, but also would be fine writing `b < c ? 1 : 0` if required, or even `int(b < c)`. I'd happily write that in exchange for not having this happen: ```d enum A : int { a } Json j = A.a; writeln(j); // false ```And let the compiler come up with whatever funky stuff it wants to in order to make it fast.You never write things like: a += (b < c); ? I do.And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does crypto code.If the optimizer can't see through the ternary expression with 2 constants, then maybe it needs updating.Hoping the compiler will transform the code into this style, if it is not specified to, is just that, hope :-/It's just: use a better compiler. And if it doesn't, so what? Just write it with casts if a) it doesn't do what you want, and b) it's critically important. I just write it the "normal" way and move on. What if the compiler rewrites it back to the ternary expression? Either way, if you are paranoid the compiler isn't doing the right thing, you check the assembly.Sometimes this style is not necessarily faster, either, even though the user may desire it for crypto reasons.Which is exactly why I leave it to the experts. I want to write the clearest code possible, and let the optimizer wizards do their magic. If for some reason, the compiler isn't smart enough to figure out the right thing to do (and it bothers me to the point of investigation), then D provides so many tools to get it to spit out what you want, all the way down to inline assembly. Anyone who thinks they can predict exactly what the compiler will output for any given code is fooling themselves. If you care, check the assembly. -Steve
May 19 2022
On 5/19/2022 5:06 PM, Steven Schveighoffer wrote:I'd happily write that in exchange for not having this happen: ```d enum A : int { a } Json j = A.a; writeln(j); // false ```I presume Json is a bool. And the bool is written as false. If it's bad that 0 implicitly converts to a bool, then it should also be bad that 0 implicitly converts to char, ubyte, byte, int, float, etc. It implies all implicit conversions should be removed. While that is a reasonable point of view, I used a language that did that (Wirth's Pascal) and found it annoying and unpleasant.To make the examples understandable, I use trivial cases.And, as I remarked before, GPUs favor this style of coding, as does SIMD code, as does cryto code.If the optimizer can't see through the ternary expression with 2 constants, then maybe it needs updating.I want to write the clearest code possible, and let the optimizer wizards do their magic.I appreciate you want to write clear code. I do, too. The form I wrote is perfectly clear. Maybe it's just me, but I've never had any difficulty with the equivalence of: true, 1, +5V, On, Yes, T, +10V, etc. I doubt that this gives anyone trouble, either: enum Flags { A = 1, B = 2, C = 4, } int flags = A | C; if (flags & C) ... It's clear to me that there is no set of rules that will please everyone and is objectively better than the others. At some point it ceases to be useful to continue to debate it, as no resolution will satisfy everyone.
May 19 2022
On Friday, 20 May 2022 at 03:27:12 UTC, Walter Bright wrote:If it's bad that 0 implicitly converts to a bool, then it should also be bad that 0 implicitly converts to char, ubyte, byte, int, float, etc. It implies all implicit conversions should be removed.Why should that be so? Why do you take for granted that whatever happens to bool should happen to ubyte, int etc...
May 20 2022
On 5/19/22 11:27 PM, Walter Bright wrote:On 5/19/2022 5:06 PM, Steven Schveighoffer wrote:No, it's not. It's a [JSON](https://json.org) container. It can accept any type on opAssign that is a valid Json type (long, bool, string, double, or another Json). It's actually [this one](https://vibed.org/api/vibe.data.json/Json).I'd happily write that in exchange for not having this happen: ```d enum A : int { a } Json j = A.a; writeln(j); // false ```I presume Json is a bool.And the bool is written as false.The Json is written as false, because calling the overloaded opAssign turns into a bool, because bool is an integer, and the enum integer value fits in there. It's the compiler picking the bool overload that is surprising.It implies all implicit conversions should be removed.No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad). I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect. However, I'm also OK with bool not converting to int implicitly if that is necessary for the type system to be sane. Using the trivial `b ? 1 : 0` conversion is not bad, and the compiler should recognize this pattern easily. I don't hold out hope for this to convince you though.While that is a reasonable point of view, I used a language that did that (Wirth's Pascal) and found it annoying and unpleasant.I actually am fine with, even *happy* with, implicit conversions that D has, *except* the bool and char implicit conversions *from* integers. I've used Swift where implicit conversions are verboten, and it's non-stop pain.It's clear to me that there is no set of rules that will please everyone and is objectively better than the others. At some point it ceases to be useful to continue to debate it, as no resolution will satisfy everyone.Of course. 
There's always a tradeoff. It comes down to, when does the language surprise you with a weird thing (like an overloaded function that takes bool accepting an int enum because it can fit)? If the choice is between those surprises and cast-inconvenience, what is worth more? There is no right answer. -Steve
May 20 2022
On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer wrote:In fact, it doesn't even require implicit conversion to be removed at all. Matching bool in this case really doesn't make sense, and even by the letter of the spec I'm not sure this is right. Indeed, one of the constructors is an exact match, while the other is an implicit conversion match.It implies all implicit conversions should be removed.No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad). I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect.
May 20 2022
On Friday, 20 May 2022 at 16:56:50 UTC, deadalnix wrote:On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer wrote:In this example, both `int` and `bool` are implicit conversions, because the type of `E.a` is `E`, not `int`. So partial ordering is used to disambiguate, and the compiler (correctly) determines that the `bool` overload is more specialized than the `int` overload, because you can pass a `bool` argument to an `int` parameter but not the other way around. As soon as you allow the `E` -> `bool` implicit conversion (via VRP), everything else follows.In fact, it doesn't even require implicit conversion to be removed at all. Matching bool in this case really doesn't make sense, and even by the letter of the spec I'm not sure this is right. Indeed, one of the constructor in an exact match, while the other is an implicit conversion match.It implies all implicit conversions should be removed.No, not at all. bool can implicitly convert to int, char is probably fine also (thinking about the OP of this thread, I'm actually coming around to realize, it's not that bad). I don't like integers converting *to* bool or char (or dchar, etc). That would stop this problem from happening. bool being treated as an integral type is suspect.
May 20 2022
On Friday, 20 May 2022 at 17:15:07 UTC, Paul Backus wrote:In this example, both `int` and `bool` are implicit conversions, because the type of `E.a` is `E`, not `int`. So partial ordering is used to disambiguate, and the compiler (correctly) determines that the `bool` overload is more specialized than the `int` overload, because you can pass a `bool` argument to an `int` parameter but not the other way around. As soon as you allow the `E` -> `bool` implicit conversion (via VRP), everything else follows.Fair enough, because of the enum. You probably don't want to cast to bool via VRP. But it also happens with integer literals, so clearly there is a problem. The way I solved it on my end is to make all the opAssign templates and use specialization, in which case it doesn't go from int to bool.
May 20 2022
On Friday, 20 May 2022 at 18:41:39 UTC, deadalnix wrote:On Friday, 20 May 2022 at 17:15:07 UTC, Paul Backus wrote:It happens with literals only if the literal type is not an exact match for the parameter type: ```d import std.stdio; void fun(int) { writeln("int"); } void fun(bool) { writeln("bool"); } void main() { fun(int(0)); // int (exact match) fun(ubyte(0)); // bool (implicit conversion) } ``` So, this case is exactly the same as the enum case. Once you allow the implicit conversion to `bool`, everything else follows from the normal language rules.In this example, both `int` and `bool` are implicit conversions, because the type of `E.a` is `E`, not `int`. So partial ordering is used to disambiguate, and the compiler (correctly) determines that the `bool` overload is more specialized than the `int` overload, because you can pass a `bool` argument to an `int` parameter but not the other way around. As soon as you allow the `E` -> `bool` implicit conversion (via VRP), everything else follows.Fair enough, because of the enum. You probably don't want to cast do bool via VRP. But it also happens with integer literals, so clearly there is a problem.
May 20 2022
On Friday, 20 May 2022 at 16:02:15 UTC, Steven Schveighoffer wrote:I actually am fine with, even *happy* with, implicit conversions that D has, *except* the bool and char implicit conversions *from* integers. I've used Swift where implicit conversions are verboten, and it's non-stop pain.Actually, there is a right answer. And that's to let the programmer know when this is occurring, if the programmer wants this information (and not requiring the programmer to go look at the assembly!!!). Or, possibly, for the programmer to disable implicit casts, via some annotation -> noImplicitCasting! I do not like D being like C when it comes to the many unexpected things that can occur due to implicit type casting by the compiler. D needs to be (much) better than C.It's clear to me that there is no set of rules that will please everyone and is objectively better than the others. At some point it ceases to be useful to continue to debate it, as no resolution will satisfy everyone.Of course. There's always a tradeoff. It comes down to, when does the language surprise you with a weird thing (like an overloaded function that takes bool accepting an int enum because it can fit)? If the choice is between those surprises and cast-inconvenience, what is worth more? There is no right answer. -Steve
May 20 2022
On Friday, 20 May 2022 at 22:10:43 UTC, forkit wrote:I'd like to see an option, for the compiler to output this information, when requested. It does it already for other things (e.g GC). - The location in the file where the cast occurred. - The type being cast from. - The type being cast to. - The result of the cast analysis: upcast, downcast, or mismatch. - Is it an explicit or implicit cast? (i.e. the programmer did it, or the compiler did it)
May 20 2022
On Thursday, 19 May 2022 at 18:20:26 UTC, Walter Bright wrote:On 5/19/2022 7:33 AM, Steven Schveighoffer wrote:No. The `*` implies a promotion of its arguments. It is very easy to define bool as promoting to int without making bool an int.I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.Does that mean you prefer: a = 3 + cast(int)(b < c) * 5; ? If so, I don't see what is gained by that.
May 19 2022
On Thu, May 19, 2022 at 10:33:14AM -0400, Steven Schveighoffer via Digitalmars-d wrote:On 5/19/22 12:35 AM, Walter Bright wrote:[...][...] IME, gcc and ldc2 are well able to convert the above ?: expression into the latter, without uglifying the code. Why are we promoting (or even allowing) this kind of ugly code just because dmd's optimizer is so lackluster you have to manually spell things out this way? T -- Without outlines, life would be pointless.The APL language relies on bools being integers so conditional operations can be carried out without branching :-) I.e. a = (b < c) ? 8 : 3; becomes: a = 3 + (b < c) * 5; // I know this is not APL syntax That works in D, too!I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
May 19 2022
On 5/19/2022 1:24 PM, H. S. Teoh wrote:IME, gcc and ldc2 are well able to convert the above ?: expression into the latter, without uglifying the code. Why are we promoting (or even allowing) this kind of ugly code just because dmd's optimizer is so lackluster you have to manually spell things out this way?See my reply to Steven. BTW, consider auto-vectorizing compilers. A common characteristic of them is that sometimes a loop looks like it should be vectorized, but the compiler didn't, for reasons that are opaque to users. The compiler then substitutes a slow emulation to give the *appearance* of being vectorized. The only way to tell what is happening is to dump the generated assembler. This is especially troublesome when you're attempting to write vector code that is portable among various SIMD instruction sets. It doesn't scale, at all. This is based on many conversations about this with Manu Evans, whose career was based on writing vector code. Manu has been very influential in the design of D's vector semantics. Hence D's approach is different. You can write vector code in D. If it won't compile to the target instruction set, it doesn't replace it with emulation. It signals an error. Thus, the user knows if he writes vector code, he gets vector code. It makes it easy for him to use versioning to adjust the shape of the expressions to line up with the vector capabilities of each target. To sum up, if you want a particular instruction mix in the output stream, a systems programming language must enable expression of that desired mix. It must not rely on undocumented and inconsistent compiler transformations.
May 19 2022
On Thursday, 19 May 2022 at 21:55:59 UTC, Walter Bright wrote:On 5/19/2022 1:24 PM, H. S. Teoh wrote:Good compilers can actually print a report of why they didn't vectorize things. If they couldn't most of the time these days it's because the compiler was right and the programmer has a loop that the compiler can't reasonably assume is free of dependencies. https://d.godbolt.org/z/djhMhMj31 has reports enabled from gcc and llvm Intel were the cutting edge for these reports but now Intel C++ is basically dead. These reports aren't that good for instruction selection issues, granted. As an addendum, I would actually contend that most optimizers are actually far too aggressive when performing loop optimizations. https://d.godbolt.org/z/Y99zs9feh See this example. Unless you give the compiler a nudge in the right direction (i.e. You can make sure you never try to compute a factorial of 100 for example), it will generate reams and reams of code. Unless you are compiling with profile guided optimizations everything the compiler does is blind. This isn't just a question of locality but the very basics of the compilers optimizations e.g. register allocation and spill placement.IME, gcc and ldc2 are well able to convert the above ?: expression into the latter, without uglifying the code. Why are we promoting (or even allowing) this kind of ugly code just because dmd's optimizer is so lackluster you have to manually spell things out this way?See my reply to Steven. BTW, consider auto-vectorizing compilers. A common characteristic of them is that sometimes a loop looks like it should be vectorized, but the compiler didn't, for reasons that are opaque to users. The compiler then substitutes a slow emulation to give the *appearance* of being vectorized.The only way to tell what is happening is to dump the generate assembler. This is especially troublesome you're attempting to write vector code that is portable among various SIMD instruction sets. 
It doesn't scale, at all.If you're writing SIMD code without dumping the assembler anyway you're not paying enough attention. If you're going to go to all that effort you're going to be profiling the code, and any good profiler will show you the disassembly alongside. Maybe it doesn't scale in some minute sense but in practice I don't think it makes that much difference because you have to either do the work anyway, or it doesn't matter. This is still ignoring that instruction sets don't mean all that much, it's all about the microarchitecture, which once again will probably require different code. For example AMD processors present-ish and past have emulated the wider SIMD in terms of more numerous smaller execution units.Hence D's approach is different. You can write vector code in D. If it won't compile to the target instruction set, it doesn't replace it with emulation. It signals an error. Thus, the user knows if he writes vector code, he gets vector code. It makes it easy for him to use versioning to adjust the shape of the expressions to line up with the vector capabilities of each target.LDC doesn't do this, GCC does. I don't think it actually matters, whereas if you're consuming a library from someone who didn't do the SIMD parts properly, it will at very least compile with LDC.To sum up, if you want a particular instruction mix in the output stream, a systems programming language must enable expression of that desired mix. It must not rely on undocumented and inconsistent compiler transformations.I agree, although D is getting massively out of sync with the interesting instructions even on X86. The fun stuff is not really available unless you use inline asm (or Guillaume's intrinsics library). For the non-x86 world (i.e. the vast majority of all processors sold) ARM has NEON but the future will be SVE2, these are variable width vector instructions.
This isn't impossible to fit into the D_SIMD paradigm but will require for example types that only have a lower bound on their size. The RISC-V vector ISA is going in a similar direction. If I can actually get my hands on some variable-width hardware I will write D code for it ("because it's there"), but I haven't found anything cheap enough yet.
May 19 2022
On 5/19/2022 3:51 PM, max haughton wrote:Good compilers can actually print a report of why they didn't vectorize things.I guess Manu never used good compilers :-) Manu asked that a report be given in the form of an error message. Since it's what he did all day, I gave that a lot of weight. Also, the point was Manu could then adjust the code with version statements to write loops that worked best on each target, rather than suffer unacceptable degradation from the fallback emulations.If you're writing SIMD code without dumping the assembler anyway you're not paying enough attention. If you're going to go to all that effort you're going to be profiling the code, and any good profiler will show you the disassembly alongside. Maybe it doesn't scale in some minute sense but in practice I don't think it makes that much difference because you have to either do the work anyway, or it doesn't matter.Manu did this all day and I gave a lot of weight to what he said would work best for him. If you're writing vector operations, for a vector instruction set, the compiler should give errors if it cannot do it. Emulation code is not acceptable. I advocate disassembling, too, (remember the -vasm switch?) but disassembling and inspecting manually does not scale at all.LDC doesn't do this, GCC does. I don't think it actually matters, whereas if you're consuming a library from someone who didn't do the SIMD parts properly, it will at very least compile with LDC.At least compiling is not good enough if you're expecting vector speed.
May 19 2022
On Friday, 20 May 2022 at 03:42:06 UTC, Walter Bright wrote:Manu asked that a report be given in the form of an error message. Since it's what he did all day, I gave that a lot of weight. Also, the point was Manu could then adjust the code with version statements to write loops that worked best on each target, rather than suffer unacceptable degradation from the fallback emulations.I think you're talking about writing SIMD code, not autovectorization. The report is *not* an error message, neither literally in this case nor spiritually, it's telling you what the compiler was able to infer from your code. Automatic vectorization is *not* writing code that uses SIMD instructions directly, they're two different beasts. Typically the direct-SIMD algorithm is much faster, at the expense of being orders of magnitude slower to write: The instruction selection algorithms GCC and LLVM use simply aren't good enough to exploit all 15 billion instructions Intel have in their ISA, but they're almost literally hand-beaten to be good at SPEC benchmarks so many patterns are recognized and optimized just fine.It's not an unreasonable thing to do; I just don't think it's that much of a showstopper either way. If I *really* care about being right per platform I'm probably going to be checking CPUID at runtime anyway. LDC is the compiler people who actually ship performant D code use and I've never actually seen anyone complain about this.If you're writing SIMD code without dumping the assembler anyway you're not paying enough attention. If you're going to go to all that effort you're going to be profiling the code, and any good profiler will show you the disassembly alongside. Maybe it doesn't scale in some minute sense but in practice I don't think it makes that much difference because you have to either do the work anyway, or it doesn't matter.Manu did this all day and I gave a lot of weight to what he said would work best for him.
If you're writing vector operations, for a vector instruction set, the compiler should give errors if it cannot do it. Emulation code is not acceptable.I advocate disassembling, too (remember the -vasm switch?) but disassembling and inspecting manually does not scale at all.You *have* to do it or you are lying to yourself - even if the compiler were perfect, which it often isn't. When I use VTune I see a complete breakdown of the disassembly, source code, pipeline state, memory hierarchy, how much power the CPU used etc, temperature (Cat blocking the computer's conveniently warm exhaust?) This isn't so much about the actual instructions/intrinsics you end up with, that's just a means to an end, but rather that if you aren't keeping an eye on the performance effects of each line you add and where the performance is happening then you aren't being a good engineer e.g. you can spend too much time working on the SIMD parts of an algorithm and get distracted from the parts that are the new bottleneck (the memory hierarchy). Despite this I do think it's still a huge failure of programming as an industry that it's a site like Compiler Explorer, or a flag like -vasm, actually needs to exist. This should be something much more deeply ingrained into our workflows, programming lags behind more serious forms of engineering when it comes to the correlation of what we think things do versus what they actually do. Aside for anyone reading: See Sites's classic article/note "It's the memory stupid" https://www.ardent-tool.com/CPU/docs/MPR/101006.pdf DEC died but he was right.
Basically everything has (a lot of) SIMD units these days, so even this emulated computation will still be pretty fast. You see SIMD instruction sets included in basically anything for more than the price of a pint of beer (Sneaky DConf Plug...), e.g. the Allwinner D1 is a cheapo RISC-V core from China, comes with a reasonably standard-compliant vector instruction set implementation. Even microcontrollers. For anyone interested the core inside the D-1 is open source https://github.com/T-head-Semi/openc906LDC doesn't do this, GCC does. I don't think it actually matters, whereas if you're consuming a library from someone who didn't do the SIMD parts properly, it will at very least compile with LDC.At least compiling is not good enough if you're expecting vector speed.
May 19 2022
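[Aside for readers: the distinction drawn above, between emulated vector code and code that maps onto real SIMD instructions, can be sketched with D's core.simd vector types. This is a hedged illustration, not anyone's actual workflow: whether the addition below becomes a single SIMD instruction or a slower scalar-emulation sequence depends entirely on the compiler and target.]

```d
// Sketch: core.simd vector types compile to real SIMD instructions when
// the target supports them; otherwise the compiler may fall back to
// slower scalar emulation, which is exactly the degradation discussed above.
import core.simd;
import std.stdio;

float4 addFour(float4 a, float4 b)
{
    // One packed add (e.g. addps on x86-64) when hardware support exists.
    return a + b;
}

void main()
{
    float4 a = [1, 2, 3, 4];
    float4 b = [10, 20, 30, 40];
    writeln(addFour(a, b).array); // [11, 22, 33, 44]
}
```

[Dumping the generated assembly (dmd -vasm, or a profiler's disassembly view) is how you tell which of the two cases you actually got.]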
On Friday, 20 May 2022 at 04:34:28 UTC, max haughton wrote:

> Despite this I do think it's still a huge failure of programming as an industry that a site like Compiler Explorer, or a flag like -vasm, actually needs to exist. This should be something much more deeply ingrained into our workflows; programming lags behind more serious forms of engineering when it comes to the correlation of what we think things do versus what they actually do.

    $ watch gcc -S -O test.c -o -

;)
May 20 2022
On Friday, 20 May 2022 at 04:34:28 UTC, max haughton wrote:

> You still have "vector speed" in a sense. The emulated SIMD is still good, it's just not optimal. As I was saying previously there are targets where even though you *have* (say) 256 bit registers, you actually might want to use 128 bit ones in some places because newer instructions tend to be emulated (in a sense) so might not actually be worth the port pressure inside the processor.

Yes, it is always better to allow for gradual optimization, going from a generic target to increasingly specific ones as you need them.

Case in point: WASM doesn't support SIMD, but the WASM engines (at least one of them) recognize the output of LLVM's builtin SIMD and reconstruct SIMD instructions for the CPU from sequences of regular WASM instructions. So even if the target does not support SIMD, you can get SIMD performance by using "generic" SIMD in the optimizer…

Things are getting much more complicated now for non-real-time; just look at the Intel compiler that compiles to a mix of CPU/SIMD/GPU… That's where batch programming is heading…
May 20 2022
On Thursday, 19 May 2022 at 14:33:14 UTC, Steven Schveighoffer wrote:

> I hope we are not depending on the type system to the degree where a bool must be an integer in order to have this kind of optimization.
>
> -Steve

It is routine to use different types in the front end and the back end. Types in the front end are there for semantics, correctness, and generally to help the developer. Types in the back end are there to help the optimizer and the code generator.

bool is going to be an integer in the back end, for sure. That doesn't mean it has to be one in the front end.
May 19 2022
On Thursday, 19 May 2022 at 04:35:45 UTC, Walter Bright wrote:

> On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
>
>> But I have little hope for it, as Walter treats a boolean as an integer.
>
> They *are* integers.

I always thought of them as integers. Yesterday I was adding some new features to adam_d_ruppe's IRC client and I did:

    auto pgdir = (ev.key == Keyboard.Key.PageDown) - (ev.key == KeyboardEvent.Key.PageUp);

to get -1, 0 or 1, and do the next action according to the input given by the user.

Matheus.
May 19 2022
On Thursday, 19 May 2022 at 16:42:58 UTC, matheus wrote:

> On Thursday, 19 May 2022 at 04:35:45 UTC, Walter Bright wrote:
>
>> On 5/18/2022 5:47 PM, Steven Schveighoffer wrote:
>>
>>> But I have little hope for it, as Walter treats a boolean as an integer.
>>
>> They *are* integers.
>
> I always thought of them as integers. Yesterday I was adding some new features to adam_d_ruppe's IRC client and I did:
>
>     auto pgdir = (ev.key == Keyboard.Key.PageDown) - (ev.key == KeyboardEvent.Key.PageUp);
>
> to get -1, 0 or 1, and do the next action according to the input given by the user.
>
> Matheus.

This doesn't imply they are integers, but that they are convertible to integers. You could do the same operation with the keys being shorts, and pgdir would still be an int. That doesn't mean shorts are ints.
May 19 2022
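[Aside for readers: the pgdir trick above works because each `==` comparison yields a bool, and bool implicitly converts to int (true becomes 1, false becomes 0), so subtracting two of them gives -1, 0, or +1. A self-contained sketch, with the enum and key names invented for illustration:]

```d
import std.stdio;

enum Key { PageUp, PageDown, Other }

void main()
{
    Key key = Key.PageDown;

    // Each comparison is a bool; bool implicitly converts to int
    // (true -> 1, false -> 0), so the difference is -1, 0, or +1.
    auto pgdir = (key == Key.PageDown) - (key == Key.PageUp);

    static assert(is(typeof(pgdir) == int)); // the result is int, not bool

    writeln(pgdir); // 1
}
```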
On Wednesday, 18 May 2022 at 22:11:34 UTC, max haughton wrote:

> SDC.

We have four `D` compilers?
May 18 2022
On 5/18/22 15:11, max haughton wrote:

> For example:
>
>     float x = 'a';
>
> Currently compiles.

Going a little off-topic, I recommend Don Clugston's very entertaining DConf 2016 presentation "Using Floating Point Without Losing Your Sanity":

http://dconf.org/2016/talks/clugston.html

Ali
May 19 2022
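[Aside for readers: the behaviour the thread is debating can be observed directly. A minimal sketch, assuming a current D compiler: the char implicitly converts to the numeric value of its code unit, the same promotion chars have always had to int.]

```d
void main()
{
    // The question at the top of the thread: this compiles today.
    float x = 'a'; // 'a' is U+0061, so x becomes 97.0f
    assert(x == 97.0f);

    // The same implicit char-to-integer conversion, which is what
    // makes the char-to-float path possible in the first place.
    int i = 'a';
    assert(i == 97);
}
```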