digitalmars.D.learn - until strange behavior

"Temtaime" <temtaime gmail.com> writes:

Why does

char[3] arr = "abc";
arr[].until('b').front

have type dchar?

Jun 02 2013

"Jack Applegame" <japplegame gmail.com> writes:

On Sunday, 2 June 2013 at 13:20:32 UTC, Temtaime wrote:
 Why

 char arr[3] = "abc";
 arr[].until('b').front has type of dchar ???

Something is wrong with the ElementType template.

char[4] arr = [1, 2, 3, 0];
writeln(ElementType!(typeof(arr[])).stringof); // writes dchar

Jun 02 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Jack Applegame:

 It is something wrong with ElementType template.

 char arr[4] = [1,2,3,0];
 writeln(ElementType!(typeof(arr[])).stringof); // writes dchar

Try also ForeachType:


I agree it's often a pain in the ass, but technically it's not a 
bug, it's working as designed.
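
[Editor's note: the snippet that followed "Try also ForeachType:" did not survive in this archive. The lines below are not the original code, only a minimal sketch of the comparison the post seems to point at, assuming a Phobos version with string auto-decoding as discussed in this thread.]

```d
import std.range : ElementType;
import std.traits : ForeachType;

void main()
{
    // ForeachType is the type a plain `foreach (c; arr)` infers for c:
    // the code unit type of the array.
    static assert(is(ForeachType!(char[]) == char));

    // ElementType is the type of `front` when the array is used as a range:
    // for char[] that is a decoded code point.
    static assert(is(ElementType!(char[]) == dchar));

    // For non-character arrays the two agree.
    static assert(is(ForeachType!(int[]) == int));
    static assert(is(ElementType!(int[]) == int));
}
```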

Bye,
bearophile

Jun 02 2013

"Jack Applegame" <japplegame gmail.com> writes:

On Sunday, 2 June 2013 at 13:30:16 UTC, bearophile wrote:

 I agree it's often a pain in the ass, but technically it's not 
 a bug, it's working as designed.

The root is the isNarrowString template.

Do you believe that "isNarrowString!(char[]) == true" is technically 
not a bug?

Jun 02 2013

Ali Çehreli <acehreli yahoo.com> writes:

On 06/02/2013 06:43 AM, Jack Applegame wrote:

 On Sunday, 2 June 2013 at 13:30:16 UTC, bearophile wrote:

 I agree it's often a pain in the ass, but technically it's not a bug,
 it's working as designed.

 Root is isNarrowString template.

 Do you believe that "isNarrowString!(char[]) == true" technically is not
 a bug?

char and wchar arrays are narrow strings. Regardless, their element type 
is dchar. This causes confusion but makes sense because they are 
conceptually ranges of Unicode characters.
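
[Editor's note: a small sketch of the two facts stated above, assuming a Phobos version with auto-decoding. It only checks the traits at compile time.]

```d
import std.range : ElementType;
import std.traits : isNarrowString;

void main()
{
    // char[] and wchar[] are "narrow": one code point may span
    // several array elements (code units).
    static assert(isNarrowString!(char[]));
    static assert(isNarrowString!(wchar[]));

    // dchar[] is not narrow: one element is one code point.
    static assert(!isNarrowString!(dchar[]));

    // As ranges, all three have the same element type.
    static assert(is(ElementType!(char[]) == dchar));
    static assert(is(ElementType!(wchar[]) == dchar));
    static assert(is(ElementType!(dchar[]) == dchar));
}
```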

There were long and heated discussions when this behavior was first 
proposed.

Ali

Jun 02 2013

"ixid" <nuaccount gmail.com> writes:

 There were long and heated discussions when this behavior was 
 first proposed.

Do you have a link to any of the discussions? This is one of the 
few things that irritates me in D in that I feel like I am 
fighting to control the type unnecessarily.

Jun 02 2013

Ali Çehreli <acehreli yahoo.com> writes:

On 06/02/2013 11:23 AM, ixid wrote:
 There were long and heated discussions when this behavior was first
 proposed.

 Do you have a link to any of the discussions? This is one of the few
 things that irritates me in D in that I feel like I am fighting to
 control the type unnecessarily.

I think the following link is the first time this idea was discussed:

   http://forum.dlang.org/thread/hkd9nl$h08$1@digitalmars.com

Ali

Jun 02 2013

"David Nadlinger" <see klickverbot.at> writes:

On Sunday, 2 June 2013 at 13:26:02 UTC, Jack Applegame wrote:
 It is something wrong with ElementType template.

ElementType works as intended.

As to whether that is a good idea… – just search the NG archives 
for "string range element type" or something like that.

David

Jun 02 2013

"David Nadlinger" <see klickverbot.at> writes:

On Sunday, 2 June 2013 at 16:21:46 UTC, David Nadlinger wrote:
 On Sunday, 2 June 2013 at 13:26:02 UTC, Jack Applegame wrote:
 ElementType works as intended.

 As to whether that is a good idea… – just search the NG 
 archives for "string range element type" or something like that.

(See also: ElementEncodingType)
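
[Editor's note: a hedged sketch of how ElementEncodingType differs from ElementType, assuming both are imported from std.range. For strings it reports the code unit type; for other ranges it should match ElementType.]

```d
import std.range : ElementEncodingType, ElementType;

void main()
{
    // For strings, ElementEncodingType is the stored code unit type...
    static assert(is(ElementEncodingType!(char[]) == char));
    static assert(is(ElementEncodingType!(wchar[]) == wchar));

    // ...while ElementType is the decoded code point type.
    static assert(is(ElementType!(char[]) == dchar));

    // For non-string ranges the two are the same.
    static assert(is(ElementEncodingType!(int[]) == int));
    static assert(is(ElementType!(int[]) == int));
}
```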

Jun 02 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, June 02, 2013 15:20:31 Temtaime wrote:
 Why
 
 char arr[3] = "abc";
 arr[].until('b').front has type of dchar ???

http://stackoverflow.com/questions/12288465

- Jonathan M Davis

Jun 02 2013

"Jack Applegame" <japplegame gmail.com> writes:

On Sunday, 2 June 2013 at 20:50:31 UTC, Jonathan M Davis wrote:

 http://stackoverflow.com/questions/12288465

Let's say we have a string of chars that contains UTF-8 text.
Does front(str[]) automatically convert the first Unicode character 
to UTF-32 and return it?
I made a test case and the answer is: "Yes, it does!"
Maybe this makes sense, but such an implicit conversion confuses 
everyone I asked.
Therefore, a string is not an ordinary array (in the Phobos context), 
but a special array with special processing rules.

I'm moving from C++ and often ask myself: "Why does D have so many 
hidden, confusing things?"
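
[Editor's note: the test case mentioned above was not posted. The following is only a sketch of what such a test might look like, assuming auto-decoding Phobos; the sample string is an arbitrary choice.]

```d
import std.range : front, popFront;

void main()
{
    // U+00E9 ('e' with acute accent) takes two UTF-8 code units: 0xC3 0xA9.
    string s = "\u00E9!";
    assert(s.length == 3);

    // Indexing is a language-level operation and yields a raw code unit.
    assert(s[0] == 0xC3);

    // front is a library-level range primitive and yields a decoded code point.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == 0xE9);

    // popFront skips the whole code point, not just one code unit.
    s.popFront();
    assert(s == "!");
}
```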

Jun 02 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, June 03, 2013 01:04:28 Jack Applegame wrote:
 On Sunday, 2 June 2013 at 20:50:31 UTC, Jonathan M Davis wrote:
 http://stackoverflow.com/questions/12288465

 
 Lets have string of chars, and it contains UTF-8 string.
 Does front(str[]) automatically convert first unicode character
 to UTF-32 and returns it?
 I made a test case and answer is: "Yes, it does!"
 May be this make sense. But such implicit conversion confuses
 everyone whom I asked.
 Therefore, string is not ordinary array (in Phobos context), but
 special array with special processing rules.
 
 I'm moving from C++ and often ask myself: "why D has so much
 hidden confusing things?"

The language treats strings as arrays of code units. The standard library 
treats them as ranges of code points. Yes, this can be confusing, but we need 
both. In order to operate on strings efficiently, they need to be made up of 
code units, but correctness requires code points. This means that the 
complexity is to a great extent an intrinsic part of dealing with strings 
properly. In C++, people usually just screw it up and treat char as if it were  
a character when in fact it's not. It's a piece of one.

Whether we went about handling the complexity of code units vs code points in 
the best manner is debatable, but it can't be made simple if you want both 
efficiency and correctness. A better approach might have been to have a string 
type which operated on code points and held the code units internally so that 
everything operated on code points by default, but the library stuff was added 
later, and Walter Bright tends to think that everyone should understand 
Unicode well, so the decisions he makes with regards to that aren't always the 
best (since most people don't understand Unicode well and don't want to care).

What we have actually works quite well, but it does require that you come to 
at least a basic understanding of the difference between code units and code 
points.
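
[Editor's note: a minimal sketch of the code unit versus code point distinction described above, assuming auto-decoding Phobos. The word used is just an example.]

```d
import std.range : walkLength;

void main()
{
    // "naive" with a diaeresis on the i: 5 code points, 6 UTF-8 code units.
    string s = "na\u00EFve";

    // The language sees an array of code units.
    assert(s.length == 6);

    // The range primitives see code points, so walking the range counts 5.
    assert(s.walkLength == 5);

    // foreach follows the declared loop variable type:
    // char iterates code units, dchar decodes to code points.
    size_t units, points;
    foreach (char c; s) ++units;
    foreach (dchar c; s) ++points;
    assert(units == 6);
    assert(points == 5);
}
```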

- Jonathan M Davis

Jun 02 2013

"Jack Applegame" <japplegame gmail.com> writes:

Jonathan, thanks for the detailed response.

I think in D we should not use strings for storing non-text 
data. For such things we must use byte[] or ubyte[], and then ranges 
will work as expected. Is that correct?

Jun 02 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, June 03, 2013 01:29:35 Jack Applegame wrote:
 Jonathan, thanks for the detailed response.
 
 I think in D we should not use strings for storing "non text"
 data. For such things we must use byte[] or ubyte[]. And ranges
 will work as expected. Is it correct?

Exactly. If you want bytes, use ubyte[] or byte[] (probably ubyte[]). C++ 
lacks such a proper type (though C99 has uint8_t). char is specifically a UTF-8 
code unit and should be treated as such.

Also, if you have text that you _know_ is ASCII, then it's more efficient to 
cast the string to immutable(ubyte)[] and operate on it that way (so that it 
doesn't do any decoding). That's not currently handled by the string-specific 
functions (though the general array and range-based ones will handle it just 
fine), but I expect that that will change.
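
[Editor's note: a sketch of the ASCII case applied to the original question. It uses std.string.representation in place of an explicit cast, which is an editorial substitution; the post itself only mentions casting to immutable(ubyte)[].]

```d
import std.algorithm : equal, until;
import std.range : ElementType;
import std.string : representation;
import std.traits : Unqual;

void main()
{
    string text = "abc"; // known to be ASCII

    // On the string itself, until yields decoded dchar elements.
    auto decoded = text.until('b');
    static assert(is(ElementType!(typeof(decoded)) == dchar));

    // View the same memory as immutable(ubyte)[]; no decoding takes place.
    immutable(ubyte)[] bytes = text.representation;
    auto raw = bytes.until(cast(ubyte) 'b');
    static assert(is(Unqual!(ElementType!(typeof(raw))) == ubyte));

    // Both stop before 'b', so each yields just the first element.
    assert(decoded.equal("a"));
    assert(raw.equal([cast(ubyte) 'a']));
}
```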

- Jonathan M Davis

Jun 02 2013

"Jack Applegame" <japplegame gmail.com> writes:

Good. Now it is much clearer.

Jun 02 2013
