digitalmars.D.learn - Odd behaviour of std.range

reply frame <frame86 live.com> writes:
What am I missing here? Is this some UTF conversion issue?

```d
import std.range; // provides take and front

string a;
char[] b;

pragma(msg, typeof(a.take(1).front)); // dchar
pragma(msg, typeof(b.take(1).front)); // dchar
```
Feb 22 2022
next sibling parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
 What am I missing here? Is this some UTF conversion issue?
`front` is a Phobos function. Phobos treats char arrays differently from all other arrays. It was a naive design flaw that nobody has the courage to fix. Either just don't use Phobos on strings (the language itself treats them sanely; you can foreach, etc.), use the `.representation` member on them before putting them into any range, or ask why you're doing range operations on a string in the first place and see if the behavior actually kinda makes sense for you.
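A minimal sketch of the two workarounds mentioned above, assuming a sample string containing one non-ASCII character (the sample value and variable names are illustrative only): plain `foreach` visits code units, while `.representation` hands the range functions a `ubyte` array that they do not decode.

```d
import std.range : front, take, walkLength;
import std.string : representation;
import std.traits : Unqual;

void main()
{
    string a = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units

    // Plain foreach over a string visits code units; nothing is decoded.
    size_t units;
    foreach (c; a)
    {
        static assert(is(Unqual!(typeof(c)) == char));
        ++units;
    }
    assert(units == 6);

    // Range functions applied to the string itself decode to code points.
    assert(a.walkLength == 5);

    // .representation exposes the same data as ubyte, which ranges leave alone.
    auto raw = a.representation;
    static assert(is(Unqual!(typeof(raw.take(1).front)) == ubyte));
    assert(raw.walkLength == 6);
}
```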
Feb 22 2022
parent frame <frame86 live.com> writes:
On Tuesday, 22 February 2022 at 12:53:03 UTC, Adam D Ruppe wrote:
 On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
 What am I missing here? Is this some UTF conversion issue?
`front` is a Phobos function. Phobos treats char arrays differently from all other arrays.
Ah, ok. It directly attaches `front` to the string, regardless of the function. That is the problem.
 It was a naive design flaw that nobody has the courage to fix.
 ... or ask why you're doing range operations on a string in the 
 first place and see if the behavior actually kinda makes sense 
 for you.
Because I needed a function similar to `tail` that takes care of the length, and even though it's trivial to implement myself, I thought it better to use a function that is already there.
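For illustration, a length-aware helper that works on code units through plain slicing might look like the sketch below. The name `firstUnits` is hypothetical and not part of Phobos, and because it counts code units it can cut a multi-byte sequence in half.

```d
import std.algorithm.comparison : min;

// Hypothetical helper: the first n code units, clamped to the array length.
// Plain slicing never decodes, so the element type stays char.
inout(char)[] firstUnits(inout(char)[] s, size_t n)
{
    return s[0 .. min(n, s.length)];
}

void main()
{
    assert(firstUnits("hello", 2) == "he");
    assert(firstUnits("hi", 10) == "hi"); // no range violation when n is too large

    char[] buf = "abc".dup;
    static assert(is(typeof(firstUnits(buf, 1)) == char[]));
    static assert(is(typeof(firstUnits("abc", 1)) == string));
}
```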
Feb 22 2022
prev sibling next sibling parent Paul Backus <snarwin gmail.com> writes:
On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
 What am I missing here? Is this some UTF conversion issue?

 ```d
 string a;
 char[] b;

 pragma(msg, typeof(a.take(1).front)); // dchar
 pragma(msg, typeof(b.take(1).front)); // dchar
 ```
This is a feature of the D standard library known as "auto decoding":
 as a convenience, when iterating over a string using the range 
 functions, each element of strings and wstrings is converted 
 into a UTF-32 code-point as each item. This practice, known as 
 auto decoding, means that

 `static assert(is(typeof(utf8.front) == dchar));`
Source: https://tour.dlang.org/tour/en/gems/unicode
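A short sketch of what this means in practice, assuming an arbitrary sample string: the range element type of a narrow string is `dchar` although the stored type is `char`, other arrays are unaffected, and `std.utf.byCodeUnit` is one way to opt out of decoding.

```d
import std.range : ElementEncodingType, ElementType, front;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Seen as a range, a narrow string yields dchar...
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!(char[]) == dchar));
// ...even though the array stores char.
static assert(is(ElementEncodingType!(char[]) == char));
// Arrays of other element types are not decoded.
static assert(is(ElementType!(int[]) == int));
static assert(is(ElementType!(ubyte[]) == ubyte));

void main()
{
    string s = "\u00E9t\u00E9"; // three code points, five UTF-8 code units

    static assert(is(typeof(s.front) == dchar));
    assert(s.front == '\u00E9'); // the decoded code point

    // byCodeUnit wraps the string in a range of undecoded code units.
    auto units = s.byCodeUnit;
    static assert(is(Unqual!(typeof(units.front)) == char));
    assert(units.front == 0xC3); // first byte of the two-byte sequence
    assert(units.length == 5);
}
```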
Feb 22 2022
prev sibling parent reply bauss <jj_1337 live.dk> writes:
On Tuesday, 22 February 2022 at 12:48:21 UTC, frame wrote:
 What am I missing here? Is this some UTF conversion issue?

 ```d
 string a;
 char[] b;

 pragma(msg, typeof(a.take(1).front)); // dchar
 pragma(msg, typeof(b.take(1).front)); // dchar
 ```
Welcome to the world of auto decoding, D's million dollar mistake.
Feb 22 2022
parent reply frame <frame86 live.com> writes:
On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:

 Welcome to the world of auto decoding, D's million dollar 
 mistake.
Well, I think it's ok for strings but it shouldn't do it for simple arrays where it's intentional that I want to process the character and not a UTF-8 codepoint. Thank you all.
Feb 22 2022
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Feb 22, 2022 at 05:25:18PM +0000, frame via Digitalmars-d-learn wrote:
 On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
 
 Welcome to the world of auto decoding, D's million dollar mistake.
Well, I think it's ok for strings but it shouldn't do it for simple arrays
[...] In D, a string *is* an array. `string` is just an alias for `immutable(char)[]`.

T

-- Gone Chopin. Bach in a minuet.
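The alias can be checked at compile time. The following sketch (variable names are illustrative) also shows that the range element type does not depend on the mutability of the `char` elements:

```d
import std.range : ElementType;

// The built-in string types are plain aliases for immutable arrays.
static assert(is(string == immutable(char)[]));
static assert(is(wstring == immutable(wchar)[]));
static assert(is(dstring == immutable(dchar)[]));

// Qualifiers on char do not change the range element type.
static assert(is(ElementType!(const(char)[]) == ElementType!string));

void main()
{
    string s = "abc";
    immutable(char)[] t = s; // same type, so no conversion takes place
    assert(t is s);

    // Array operations (length, indexing) still work on code units.
    assert(s.length == 3 && s[0] == 'a');
    static assert(is(typeof(s[0]) == immutable(char)));
}
```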
Feb 22 2022
parent frame <frame86 live.com> writes:
On Tuesday, 22 February 2022 at 17:33:18 UTC, H. S. Teoh wrote:
 On Tue, Feb 22, 2022 at 05:25:18PM +0000, frame via 
 Digitalmars-d-learn wrote:
 On Tuesday, 22 February 2022 at 13:25:16 UTC, bauss wrote:
 
 Welcome to the world of auto decoding, D's million dollar 
 mistake.
Well, I think it's ok for strings but it shouldn't do it for simple arrays
[...] In D, a string *is* an array. `string` is just an alias for `immutable(char)[]`.
I know, but it's also a type that says "this data belongs together, characters will not change, it's finalized" and it makes sense that it can contain combined bytes for a code point. `char[]` is just an array to work with. It should be seen as a collection of single characters. If you want auto decoding, use a string instead.
Feb 22 2022
prev sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 2/22/22 09:25, frame wrote:

 Well, I think it's ok for strings but it shouldn't do it for simple
 arrays
string is a simple array as well, just with immutable(char) as elements. It is just an alias: `alias string = immutable(char)[];`
 where it's intentional that I want to process the character and
 not a UTF-8 codepoint.
I understand how auto decoding can be bad but I doubt you need to process a char. char is a UTF-8 code unit, likely one of multiple bytes that represent a Unicode character; an information encoding byte, not the information. That code unit includes encoding bits that tell the decoder whether it is the first code unit of a sequence or a continuation code unit. Not many programmers will ever need to write code to decode UTF-8.

Ali
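To make the encoding bits concrete, here is a small sketch that inspects the two code units of U+00E9 (the character is chosen only as an example). In UTF-8, a leading byte of a two-byte sequence has the form `110xxxxx` and every continuation byte has the form `10xxxxxx`.

```d
import std.string : representation;

void main()
{
    string s = "\u00E9"; // one code point, encoded as two UTF-8 code units
    auto bytes = s.representation;

    assert(bytes.length == 2);
    assert(bytes[0] == 0xC3 && bytes[1] == 0xA9);

    // Leading byte of a two-byte sequence: top bits are 110.
    assert((bytes[0] & 0b1110_0000) == 0b1100_0000);
    // Continuation byte: top bits are 10.
    assert((bytes[1] & 0b1100_0000) == 0b1000_0000);

    // Neither code unit on its own is the character; only the decoded pair is.
    assert(s[0] != '\u00E9');
}
```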
Feb 22 2022