digitalmars.D - TDPL reaches Thermopylae level

Andrei Alexandrescu (2/2) Oct 25 2009 303 pages and counting!

Walter Bright (2/3) Oct 25 2009 Come and get them!
Jeremie Pelletier (2/5) Oct 26 2009 Soon the PI level, or at least 10 times PI!

Bill Baxter (3/9) Oct 26 2009 A hundred even. ;-)

Andrei Alexandrescu (8/17) Oct 26 2009 Coming along. I'm writing about strings and Unicode right now. I was

Bill Baxter (7/28) Oct 26 2009 So a common way to convert wchar to char might then become ""~myWcharStr...

Andrei Alexandrescu (10/35) Oct 26 2009 Well, I guess. In particular, to me it's not clear what type we should

Chris Nicholson-Sauls (5/38) Oct 27 2009 My intuition would be to expect the same as adding an int to a byte: you...

Denis Koroskin (9/47) Oct 27 2009 ubyte i = 42;

Bill Baxter (9/67) Oct 27 2009 As Andrei said (and maybe you missed) "With ~=, it's much easier...".

Andrei Alexandrescu (14/25) Oct 27 2009 Yah, I agree. The problem is, there's a big difference too: all

Michel Fortin (8/12) Oct 27 2009 Seems the most intuitive option to me. Also, it makes "a ~= b"

Bill Baxter (6/14) Oct 27 2009 And that kind of suggests to me that even a = b should work.

Andrei Alexandrescu (8/23) Oct 27 2009 I agree. This one, however, will be very difficult to slide by Walter's
Pelle Månsson (5/23) Oct 27 2009 int a;

Bill Baxter (9/38) Oct 27 2009 I'm not sure what point you're trying to make, but wstring <-> string...

Pelle Månsson (3/35) Oct 27 2009 They are?

Bill Baxter (17/62) Oct 27 2009 They are all just different representations of Unicode.

Pelle Månsson (2/59) Oct 27 2009 Thank you, that cleared things up for me :)
Leandro Lucarella (12/31) Oct 27 2009 And here is a nice article about Unicode and encodings:

Andrei Alexandrescu (4/27) Oct 27 2009 Damn guys, with these good explanations, nobody's going to use the one

Leandro Lucarella (9/35) Oct 27 2009 :)
Leandro Lucarella (11/38) Oct 29 2009 BTW, seeing the explanation about Unicode in your book, one wonders why

Justin Johansson (9/49) Oct 27 2009 Though I'm sure Shannon would say that the number of bits of intrinsic i...

Chris Nicholson-Sauls (13/63) Oct 29 2009 Granted LTR is common enough to be expectable and acceptable. To be per...

Justin Johansson (3/17) Oct 29 2009 Your overall reply well put. On last point: agree; cheap hacks should b...
Nick Sabalausky (10/15) Oct 29 2009 Given that just about anything outside of D (at least as far as I've see...

Lars T. Kyllingstad (4/21) Oct 30 2009 I think this says it all:

Andrei Alexandrescu (8/34) Oct 30 2009 Yep, there was a frenzy when UCS-2 came about: everybody thought two

Justin Johansson (16/52) Oct 30 2009 "I personally think UTF-8 is a better overall design though."

Andrei Alexandrescu (11/70) Oct 30 2009 Thanks for the pointers. One of the reasons for which I like the design

Jeremie Pelletier (10/32) Oct 26 2009 I don't know if that's a good idea, it's better when string encoding is

Andrei Alexandrescu (10/45) Oct 26 2009 The beauty of it is that reallocation with ~ occurs anyway, and with ~=

Jeremie Pelletier (5/58) Oct 26 2009 Good points, I didn't think of the separation between characters and

Bill Baxter (12/77) Oct 26 2009 Yeh, me too. Saving an allocation is good. And I agree that having

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

303 pages and counting!

Andrei

Oct 25 2009

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 303 pages and counting!

Come and get them!

Oct 25 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Andrei Alexandrescu wrote:
 303 pages and counting!
 
 Andrei

Soon the PI level, or at least 10 times PI!

Oct 26 2009

Bill Baxter <wbaxter gmail.com> writes:

On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

A hundred even. ;-)

--bb

Oct 26 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 
 A hundred even. ;-)

Coming along. I'm writing about strings and Unicode right now. I was 
wondering what people think about allowing concatenation (with ~ and ~=) 
of strings of different character widths. The support library could do 
all of the transcoding.

(I understand that concatenating an array of wchar or char with a dchar 
is already in bugzilla.)


Andrei

Oct 26 2009

Bill Baxter <wbaxter gmail.com> writes:

On Mon, Oct 26, 2009 at 11:51 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier <jeremiep gmail.com>
 wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was
 wondering what people think about allowing concatenation (with ~ and ~=) of
 strings of different character widths. The support library could do all of
 the transcoding.

 (I understand that concatenating an array of wchar or char with a dchar is
 already in bugzilla.)

So a common way to convert wchar to char might then become ""~myWcharString?

That seems kind of odd.  Just using something like
to!(char[])(myWcharString) seems less goofy to me.

But that subjective reaction is all I have against it.
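
For reference, a minimal sketch of the explicit conversion mentioned above, assuming a current Phobos where std.conv.to transcodes between string widths. The helper name toUtf8 and the sample text are made up for illustration.

```d
import std.conv : to;

// Hypothetical helper (not part of Phobos): transcode UTF-16 to UTF-8.
string toUtf8(const(wchar)[] ws)
{
    return ws.to!string;
}

void main()
{
    wstring myWcharString = "na\u00EFve"w;   // "naive" with U+00EF, stored as UTF-16

    // The spelling used in the post above; it yields a mutable char[].
    char[] narrow = to!(char[])(myWcharString);
    string s = toUtf8(myWcharString);

    assert(narrow == s);
    assert(myWcharString.length == 5);   // five UTF-16 code units
    assert(s.length == 6);               // U+00EF takes two UTF-8 code units
}
```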

--bb

Oct 26 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 11:51 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier <jeremiep gmail.com>
 wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was
 wondering what people think about allowing concatenation (with ~ and ~=) of
 strings of different character widths. The support library could do all of
 the transcoding.

 (I understand that concatenating an array of wchar or char with a dchar is
 already in bugzilla.)

 
 So a common way to convert wchar to char might then become ""~myWcharString?
 
 That seems kind of odd.

Well, I guess. In particular, to me it's not clear what type we should 
assign to a concatenation between a string and a wstring. With ~=, it's 
much easier...

  Just using something like
 to!(char[])(myWcharString) seems less goofy to me.

Problem is, an append + one transcoding requires two allocations. We 
could always define routines in std.string or std.utf:

append(s, ws); // s ~= ws

but really it's quite unambiguous what ~= should do. A nod from the 
language is a nice touch.
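
A rough sketch of what such a library routine could look like, next to the two-step version it is meant to avoid. The name append mirrors the call written above but is hypothetical, not an existing std.string or std.utf function; the sketch assumes a compiler that already encodes a dchar appended to a string.

```d
import std.conv : to;

// Hypothetical library routine: s ~= ws with transcoding, no temporary string.
void append(ref string s, const(wchar)[] ws)
{
    foreach (dchar c; ws)   // foreach decodes UTF-16 code units into code points
        s ~= c;             // appending a dchar encodes it as UTF-8
}

void main()
{
    string s = "hello, ";
    wstring ws = "w\u00F6rld"w;

    // Two steps: transcode into a temporary, then concatenate.
    string twoStep = s ~ ws.to!string;

    // One pass: transcode straight into s.
    append(s, ws);

    assert(s == twoStep);
    assert(s == "hello, w\u00F6rld");
}
```

Whether the one-pass version really saves an allocation depends on how much spare capacity s already has, so treat it as an illustration of the idea rather than a benchmark.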


Andrei

Oct 26 2009

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 11:51 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier <jeremiep gmail.com>
 wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was
 wondering what people think about allowing concatenation (with ~ and 
 ~=) of
 strings of different character widths. The support library could do 
 all of
 the transcoding.

 (I understand that concatenating an array of wchar or char with a 
 dchar is
 already in bugzilla.)

 So a common way to convert wchar to char might then become 
 ""~myWcharString?

 That seems kind of odd.

 
 Well, I guess. In particular, to me it's not clear what type we should 
 assign to a concatenation between a string and a wstring. With ~=, it's 
 much easier...
 

My intuition would be to expect the same as adding an int to a byte: you get an int.
Concatenating a string and a wstring should yield a wstring; ie, encode to the wider of
the two types.

-- Chris Nicholson-Sauls

Oct 27 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 27 Oct 2009 10:04:33 +0300, Chris Nicholson-Sauls  
<ibisbasenji gmail.com> wrote:

 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 11:51 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier  
 <jeremiep gmail.com>
 wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was
 wondering what people think about allowing concatenation (with ~ and  
 ~=) of
 strings of different character widths. The support library could do  
 all of
 the transcoding.

 (I understand that concatenating an array of wchar or char with a  
 dchar is
 already in bugzilla.)

 So a common way to convert wchar to char might then become  
 ""~myWcharString?

 That seems kind of odd.

  Well, I guess. In particular, to me it's not clear what type we should  
 assign to a concatenation between a string and a wstring. With ~=, it's  
 much easier...

 My intuition would be to expect the same as adding an int to a byte: you  
 get an int. Concatenating a string and a wstring should yield a wstring;  
 ie, encode to the wider of the two types.

 -- Chris Nicholson-Sauls

ubyte i = 42;
int j = 1;

i += j; // still ubyte

same here:

string a = "hello";
wstring b = "world"w;

a ~= b; // still string
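
The integer half of this analogy can be checked directly. The sketch below only covers the numeric case (the string case is the proposal under discussion), and the function name addInPlace is invented for the example.

```d
// Hypothetical helper: op-assign keeps the type of the left-hand side.
ubyte addInPlace(ubyte i, int j)
{
    // The binary expression is promoted to int ...
    static assert(is(typeof(i + j) == int));
    // ... but += stores the result back into the ubyte.
    i += j;
    return i;
}

void main()
{
    ubyte i = 42;
    int j = 1;

    auto r = addInPlace(i, j);
    static assert(is(typeof(r) == ubyte));
    assert(r == 43);
}
```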

Oct 27 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Oct 27, 2009 at 4:37 AM, Denis Koroskin <2korden gmail.com> wrote:
 On Tue, 27 Oct 2009 10:04:33 +0300, Chris Nicholson-Sauls
 <ibisbasenji gmail.com> wrote:

 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 11:51 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier
 <jeremiep gmail.com>
 wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was
 wondering what people think about allowing concatenation (with ~ and
 ~=) of
 strings of different character widths. The support library could do all
 of
 the transcoding.

 (I understand that concatenating an array of wchar or char with a dchar
 is
 already in bugzilla.)

 So a common way to convert wchar to char might then become
 ""~myWcharString?

 That seems kind of odd.

 Well, I guess. In particular, to me it's not clear what type we should
 assign to a concatenation between a string and a wstring. With ~=, it's much
 easier...

 My intuition would be to expect the same as adding an int to a byte: you
 get an int. Concatenating a string and a wstring should yield a wstring;
 ie,
 encode to the wider of the two types.

 -- Chris Nicholson-Sauls

 ubyte i = 42;
 int j = 1;

 i += j; // still ubyte

 same here:

 string a = "hello";
 wstring b = "world"w;

 a ~= b; // still string

As Andrei said (and maybe you missed) "With ~=, it's much easier...".
The only question is about what "a ~ b" should do.

--bb

Oct 27 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Chris Nicholson-Sauls wrote:
 Andrei Alexandrescu wrote:

[snip]
 Well, I guess. In particular, to me it's not clear what type we should 
 assign to a concatenation between a string and a wstring. With ~=, 
 it's much easier...

 
 My intuition would be to expect the same as adding an int to a byte: you 
 get an int. Concatenating a string and a wstring should yield a wstring; 
 ie, encode to the wider of the two types.
 
 -- Chris Nicholson-Sauls

Yah, I agree. The problem is, there's a big difference too: all 
encodings are able to represent the same information, unlike numeric 
widths where there's a clear inclusion relationship. It could even be 
argued that in pure theory UTF-16 is the least general of the three (I 
dislike UTF-16 from an engineering standpoint; unlike UTF-8 which I 
think is brilliant, I find UTF-16 is forced and uninspired - the typical 
outcome of a committee.)

My current thought is to ascribe lhs ~ rhs the same type as lhs (thereby 
making ~ consistent with ~= by making lhs ~= rhs same as lhs = lhs ~ 
rhs) in case lhs is a string type. If lhs is a character type, the 
result type is obviously the same as rhs.
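
To make the proposed typing rule concrete, here is a library-level sketch for the string-with-string case. It is only a model of the rule, not how the compiler would implement it: the function name concat is hypothetical, and the transcoding is delegated to std.conv.to.

```d
import std.conv : to;
import std.traits : isSomeString;

// Hypothetical model of the rule: the result has the type of the lhs.
L concat(L, R)(L lhs, R rhs)
if (isSomeString!L && isSomeString!R)
{
    return lhs ~ rhs.to!L;   // transcode the rhs to the lhs encoding, then join
}

void main()
{
    string a = "hello ";
    wstring b = "world"w;

    auto ab = concat(a, b);
    auto ba = concat(b, a);

    static assert(is(typeof(ab) == string));    // lhs was a string
    static assert(is(typeof(ba) == wstring));   // lhs was a wstring
    assert(ab == "hello world");
    assert(ba == "worldhello "w);

    // Under this rule, an empty lhs acts as a conversion.
    assert(concat("", b) == "world");
}
```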


Andrei

Oct 27 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-10-27 09:07:06 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 My current thought is to ascribe lhs ~ rhs the same type as lhs 
 (thereby making ~ consistent with ~= by making lhs ~= rhs same as lhs = 
 lhs ~ rhs) in case lhs is a string type. If lhs is a character type, 
 the result type is obviously the same as rhs.

Seems the most intuitive option to me. Also, it makes "a ~= b" 
equivalent to "a = a ~ b" which is always nice.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 27 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Oct 27, 2009 at 6:56 AM, Michel Fortin
<michel.fortin michelf.com> wrote:
 On 2009-10-27 09:07:06 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 My current thought is to ascribe lhs ~ rhs the same type as lhs (thereby
 making ~ consistent with ~= by making lhs ~= rhs same as lhs = lhs ~ rhs) in
 case lhs is a string type. If lhs is a character type, the result type is
 obviously the same as rhs.

 Seems the most intuitive option to me. Also, it makes "a ~= b" equivalent to
 "a = a ~ b" which is always nice.

And that kind of suggests to me that even  a = b  should work.
It has many of the same characteristics as ~=.  It's pretty
unambiguous what you'd expect to happen if not an error.


--bb

Oct 27 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 6:56 AM, Michel Fortin
 <michel.fortin michelf.com> wrote:
 On 2009-10-27 09:07:06 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 My current thought is to ascribe lhs ~ rhs the same type as lhs (thereby
 making ~ consistent with ~= by making lhs ~= rhs same as lhs = lhs ~ rhs) in
 case lhs is a string type. If lhs is a character type, the result type is
 obviously the same as rhs.

 Seems the most intuitive option to me. Also, it makes "a ~= b" equivalent to
 "a = a ~ b" which is always nice.

 
 And that kind of suggests to me that even  a = b  should work.
 It has many of the same characteristics as ~=.  It's pretty
 unambiguous what you'd expect to happen if not an error.

I agree. This one, however, will be very difficult to slide by Walter's 
watchful eye. He doesn't like hidden allocations, and a width adjustment 
does involve one.

Andrei

P.S. I got green light from my editor's marketing folks. Will release 
The Thermopylae Excerpt of TDPL today for free off my website. Stay 
tuned. It's a rough draft but I hope you will enjoy it.

Oct 27 2009

Pelle Månsson <pelle.mansson gmail.com> writes:

Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 6:56 AM, Michel Fortin
 <michel.fortin michelf.com> wrote:
 On 2009-10-27 09:07:06 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 My current thought is to ascribe lhs ~ rhs the same type as lhs (thereby
 making ~ consistent with ~= by making lhs ~= rhs same as lhs = lhs ~ rhs) in
 case lhs is a string type. If lhs is a character type, the result type is
 obviously the same as rhs.

 Seems the most intuitive option to me. Also, it makes "a ~= b" equivalent to
 "a = a ~ b" which is always nice.

 
 And that kind of suggests to me that even  a = b  should work.
 It has many of the same characteristics as ~=.  It's pretty
 unambiguous what you'd expect to happen if not an error.
 
 
 --bb

int a;
float b = 2.1;
a = b;
also unambiguous?

Oct 27 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Oct 27, 2009 at 12:48 PM, Pelle Månsson <pelle.mansson gmail.com>
 wrote:
 Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 6:56 AM, Michel Fortin
 <michel.fortin michelf.com> wrote:
 On 2009-10-27 09:07:06 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 My current thought is to ascribe lhs ~ rhs the same type as lhs (thereby
 making ~ consistent with ~= by making lhs ~= rhs same as lhs = lhs ~
 rhs) in
 case lhs is a string type. If lhs is a character type, the result type
 is
 obviously the same as rhs.

 Seems the most intuitive option to me. Also, it makes "a ~= b" equivalent
 to
 "a = a ~ b" which is always nice.

 And that kind of suggests to me that even  a = b  should work.
 It has many of the same characteristics as ~=.  It's pretty
 unambiguous what you'd expect to happen if not an error.


 --bb

 int a;
 float b = 2.1;
 a = b;
 also unambiguous?

I'm not sure what point you're trying to make, but wstring <-> string
<-> dstring are all lossless conversions.  That isn't the case with
int and float.
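
The difference can be sketched as follows, assuming the text is valid Unicode (std.conv.to would throw on malformed input). The helper name roundTrip and the sample string are invented for the example.

```d
import std.conv : to;

// Hypothetical helper: UTF-8 -> UTF-16 -> UTF-32 -> UTF-8.
string roundTrip(string s)
{
    wstring w = s.to!wstring;
    dstring d = w.to!dstring;
    return d.to!string;
}

void main()
{
    // 'a', U+20AC EURO SIGN, U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP)
    string text = "a\u20AC\U0001D11E";
    assert(roundTrip(text) == text);   // nothing is lost on the way

    float b = 2.1;
    int a = cast(int) b;               // needs an explicit cast; the fraction is dropped
    assert(a == 2);
    assert(cast(float) a != b);        // converting back does not recover 2.1
}
```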

--bb

Oct 27 2009

Pelle Månsson <pelle.mansson gmail.com> writes:

Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 12:48 PM, Pelle Månsson <pelle.mansson gmail.com>
 wrote:
 Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 6:56 AM, Michel Fortin
 <michel.fortin michelf.com> wrote:
 On 2009-10-27 09:07:06 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 My current thought is to ascribe lhs ~ rhs the same type as lhs (thereby
 making ~ consistent with ~= by making lhs ~= rhs same as lhs = lhs ~
 rhs) in
 case lhs is a string type. If lhs is a character type, the result type
 is
 obviously the same as rhs.

 Seems the most intuitive option to me. Also, it makes "a ~= b" equivalent
 to
 "a = a ~ b" which is always nice.

 And that kind of suggests to me that even  a = b  should work.
 It has many of the same characteristics as ~=.  It's pretty
 unambiguous what you'd expect to happen if not an error.


 --bb

 int a;
 float b = 2.1;
 a = b;
 also unambiguous?

 
 I'm not sure what point you're trying to make, but wstring <-> string
 <-> dstring are all lossless conversions.  That isn't the case with
 int and float.
 
 --bb

They are?

...Then what is the point of wstring, dstring?

Oct 27 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Oct 27, 2009 at 1:06 PM, Pelle Månsson <pelle.mansson gmail.com>
wrote:
 Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 12:48 PM, Pelle Månsson <pelle.mansson gmail.com>
 wrote:
 Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 6:56 AM, Michel Fortin
 <michel.fortin michelf.com> wrote:
 On 2009-10-27 09:07:06 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 My current thought is to ascribe lhs ~ rhs the same type as lhs
 (thereby
 making ~ consistent with ~= by making lhs ~= rhs same as lhs = lhs ~
 rhs) in
 case lhs is a string type. If lhs is a character type, the result type
 is
 obviously the same as rhs.

 Seems the most intuitive option to me. Also, it makes "a ~= b"
 equivalent
 to
 "a = a ~ b" which is always nice.

 And that kind of suggests to me that even  a = b  should work.
 It has many of the same characteristics as ~=.  It's pretty
 unambiguous what you'd expect to happen if not an error.


 --bb

 int a;
 float b = 2.1;
 a = b;
 also unambiguous?

 I'm not sure what point you're trying to make, but wstring <-> string
 <-> dstring are all lossless conversions.  That isn't the case with
 int and float.

 --bb

 They are?

 ...Then what is the point of wstring, dstring?

They are all just different representations of Unicode.

string, which is unicode in UTF-8, is good because it's the least
wasteful for mostly ASCII text.  And has a nice ASCII backwards
compatibility story.

dstring, which is unicode in UTF-32, is good because you have one
element = one character.  So it's good for doing substring and other
text manipulations.

wstring, which is UTF-16, is good because it lets you call Windows
Unicode functions.
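
A small sketch of how the three representations differ in size for the same text. The helper name codeUnits and the sample string are made up; note that "one element" in a dstring means one code point, which is not always one user-perceived character when combining marks are involved.

```d
import std.conv : to;

// Hypothetical helper: code unit counts in UTF-8, UTF-16 and UTF-32.
size_t[3] codeUnits(string s)
{
    return [s.length, s.to!wstring.length, s.to!dstring.length];
}

void main()
{
    // Pure ASCII: every encoding needs one code unit per character.
    auto ascii = codeUnits("hello");
    assert(ascii[0] == 5 && ascii[1] == 5 && ascii[2] == 5);

    // 'a', U+20AC EURO SIGN, U+1D11E MUSICAL SYMBOL G CLEF
    auto mixed = codeUnits("a\u20AC\U0001D11E");
    assert(mixed[0] == 8);   // UTF-8:  1 + 3 + 4 code units
    assert(mixed[1] == 4);   // UTF-16: 1 + 1 + 2 (surrogate pair)
    assert(mixed[2] == 3);   // UTF-32: one code unit per code point
}
```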

Here's Daniel Keep's nice explanation:
http://docs.google.com/View?docid=dtqh79k_1rbxfmb

--bb

Oct 27 2009

Pelle Månsson <pelle.mansson gmail.com> writes:

Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 1:06 PM, Pelle Månsson <pelle.mansson gmail.com> wrote:
 Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 12:48 PM, Pelle Månsson <pelle.mansson gmail.com>
 wrote:
 Bill Baxter wrote:
 On Tue, Oct 27, 2009 at 6:56 AM, Michel Fortin
 <michel.fortin michelf.com> wrote:
 On 2009-10-27 09:07:06 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 My current thought is to ascribe lhs ~ rhs the same type as lhs
 (thereby
 making ~ consistent with ~= by making lhs ~= rhs same as lhs = lhs ~
 rhs) in
 case lhs is a string type. If lhs is a character type, the result type
 is
 obviously the same as rhs.

 Seems the most intuitive option to me. Also, it makes "a ~= b"
 equivalent
 to
 "a = a ~ b" which is always nice.

 And that kind of suggests to me that even  a = b  should work.
 It has many of the same characteristics as ~=.  It's pretty
 unambiguous what you'd expect to happen if not an error.


 --bb

 int a;
 float b = 2.1;
 a = b;
 also unambiguous?

 I'm not sure what point you're trying to make, but wstring <-> string
 <-> dstring are all lossless conversions.  That isn't the case with
 int and float.

 --bb

 They are?

 ...Then what is the point of wstring, dstring?

 
 They are all just different representations of Unicode.
 
 string, which is unicode in UTF-8, is good because it's the least
 wasteful for mostly ASCII text.  And has a nice ASCII backwards
 compatibility story.
 
 dstring, which is unicode in UTF-32, is good because you have one
 element = one character.  So it's good for doing substring and other
 text manipulations.
 
 wstring, which is UTF-16, is good because it lets you call Windows
 Unicode functions.
 
 Here's Daniel Keep's nice explanation:
 http://docs.google.com/View?docid=dtqh79k_1rbxfmb
 
 --bb

Thank you, that cleared things up for me :)

Oct 27 2009

Leandro Lucarella <llucax gmail.com> writes:

Bill Baxter, on October 27 at 13:12 you wrote to me:
 They are?

 ...Then what is the point of wstring, dstring?

 
 They are all just different representations of Unicode.
 
 string, which is unicode in UTF-8, is good because it's the least
 wasteful for mostly ASCII text.  And has a nice ASCII backwards
 compatibility story.
 
 dstring, which is unicode in UTF-32, is good because you have one
 element = one character.  So it's good for doing substring and other
 text manipulations.
 
 wstring, which is UTF-16, is good because it lets you call Windows
 Unicode functions.
 
 Here's Daniel Keep's nice explanation:
 http://docs.google.com/View?docid=dtqh79k_1rbxfmb

And here is a nice article about Unicode and encodings:
http://www.joelonsoftware.com/articles/Unicode.html

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
I have committed sins, I have done evil, I have been a victim of envy,
selfishness, ambition, lies and frivolity, but I have always been
an Argentine father who wants his son to succeed in life.
	-- Ricardo Vaporeso

Oct 27 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Leandro Lucarella wrote:
 Bill Baxter, on October 27 at 13:12 you wrote to me:
 They are?

 ...Then what is the point of wstring, dstring?

 They are all just different representations of Unicode.

 string, which is unicode in UTF-8, is good because it's the least
 wasteful for mostly ASCII text.  And has a nice ASCII backwards
 compatibility story.

 dstring, which is unicode in UTF-32, is good because you have one
 element = one character.  So it's good for doing substring and other
 text manipulations.

 wstring, which is UTF-16, is good because it lets you call Windows
 Unicode functions.

 Here's Daniel Keep's nice explanation:
 http://docs.google.com/View?docid=dtqh79k_1rbxfmb

 
 And here is a nice article about Unicode and encodings:
 http://www.joelonsoftware.com/articles/Unicode.html
 

Damn guys, with these good explanations, nobody's going to use the one 
in TDPL!

Andrei

Oct 27 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, on October 27 at 19:32 you wrote to me:
 Leandro Lucarella wrote:
Bill Baxter, on October 27 at 13:12 you wrote to me:
They are?

...Then what is the point of wstring, dstring?

They are all just different representations of Unicode.

string, which is unicode in UTF-8, is good because it's the least
wasteful for mostly ASCII text.  And has a nice ASCII backwards
compatibility story.

dstring, which is unicode in UTF-32, is good because you have one
element = one character.  So it's good for doing substring and other
text manipulations.

wstring, which is UTF-16, is good because it lets you call Windows
Unicode functions.

Here's Daniel Keep's nice explanation:
http://docs.google.com/View?docid=dtqh79k_1rbxfmb

And here is a nice article about Unicode and encodings:
http://www.joelonsoftware.com/articles/Unicode.html

 
 Damn guys, with these good explanations, nobody's going to use the
 one in TDPL!

:)

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
We live in a very contemporary age, Don Inodoro...
	-- Mendieta

Oct 27 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, on October 27 at 19:32 you wrote to me:
 Leandro Lucarella wrote:
Bill Baxter, on October 27 at 13:12 you wrote to me:
They are?

...Then what is the point of wstring, dstring?

They are all just different representations of Unicode.

string, which is unicode in UTF-8, is good because it's the least
wasteful for mostly ASCII text.  And has a nice ASCII backwards
compatibility story.

dstring, which is unicode in UTF-32, is good because you have one
element = one character.  So it's good for doing substring and other
text manipulations.

wstring, which is UTF-16, is good because it lets you call Windows
Unicode functions.

Here's Daniel Keep's nice explanation:
http://docs.google.com/View?docid=dtqh79k_1rbxfmb

And here is a nice article about Unicode and encodings:
http://www.joelonsoftware.com/articles/Unicode.html

 
 Damn guys, with these good explanations, nobody's going to use the
 one in TDPL!

BTW, seeing the explanation about Unicode in your book, one wonders why
UTF-8, UTF-16 and UTF-32 character types are not simply called utf8, utf16
and utf32...

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Not even heaven wants me anymore, not even death visits me anymore
Not even the sun warms me anymore, not even the wind caresses me anymore

Oct 29 2009

Justin Johansson <no spam.com> writes:

Chris Nicholson-Sauls Wrote:

 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 11:51 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier <jeremiep gmail.com>
 wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was
 wondering what people think about allowing concatenation (with ~ and 
 ~=) of
 strings of different character widths. The support library could do 
 all of
 the transcoding.

 (I understand that concatenating an array of wchar or char with a 
 dchar is
 already in bugzilla.)

 So a common way to convert wchar to char might then become 
 ""~myWcharString?

 That seems kind of odd.

 
 Well, I guess. In particular, to me it's not clear what type we should 
 assign to a concatenation between a string and a wstring. With ~=, it's 
 much easier...
 

 
 My intuition would be to expect the same as adding an int to a byte: you get an int.
 Concatenating a string and a wstring should yield a wstring; ie, encode to the wider of
 the two types.
 
 -- Chris Nicholson-Sauls

Though I'm sure Shannon would say that the number of bits of intrinsic information
contained in the same sequence of Unicode codepoints is exactly the same whether
it be encoded as a string or a wstring.  Accordingly my intuition is that some rule
based upon left-to-right associativity would be more apt.  You could then concatenate
a wstring (on the rhs) to an empty string (on the lhs) to convert the wstring to a string
or vice versa.

Cheers
Justin Johansson

Oct 27 2009

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Justin Johansson wrote:
 Chris Nicholson-Sauls Wrote:
 
 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 11:51 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier <jeremiep gmail.com>
 wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was
 wondering what people think about allowing concatenation (with ~ and 
 ~=) of
 strings of different character widths. The support library could do 
 all of
 the transcoding.

 (I understand that concatenating an array of wchar or char with a 
 dchar is
 already in bugzilla.)

 So a common way to convert wchar to char might then become 
 ""~myWcharString?

 That seems kind of odd.

 Well, I guess. In particular, to me it's not clear what type we should 
 assign to a concatenation between a string and a wstring. With ~=, it's 
 much easier...

 My intuition would be to expect the same as adding an int to a byte: you get an int.
 Concatenating a string and a wstring should yield a wstring; ie, encode to the wider of
 the two types.

 -- Chris Nicholson-Sauls

 
 Though I'm sure Shannon would say that the number of bits of intrinsic information
 contained in the same sequence of Unicode codepoints is exactly the same whether
 it be encoded as a string or a wstring.  Accordingly my intuition is that some rule
 based upon left-to-right associativity would be more apt.  You could then concatenate
 a wstring (on the rhs) to an empty string (on the lhs) to convert the wstring to a string
 or vice versa.
 
 Cheers
 Justin Johansson
 

Granted LTR is common enough to be expectable and acceptable.  To be perfectly honest, I
don't believe I have *ever* even used wchar/wstring.  Char/string gosh yes; dchar/dstring
quite a bit as well, where I need the simplicity; but I've yet to feel much need for the
"weirdo" middle child of UTF.

I would argue that string ~ wstring returning string is fine, but would suggest it be a
warning for those like myself who might have first guessed it would "upscale to fit".
Just so long as the foreach(dchar;string) trick is still around, char/string can cover an
awful lot of ground.
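
For readers who have not met the trick: a foreach over a string with a dchar loop variable decodes the UTF-8 on the fly. A minimal sketch, with a made-up helper name and sample text:

```d
// Hypothetical helper: counts code points by letting foreach decode the UTF-8.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)   // each iteration yields one decoded code point
        ++n;
    return n;
}

void main()
{
    // 'a', U+20AC EURO SIGN, U+1D11E MUSICAL SYMBOL G CLEF
    string s = "a\u20AC\U0001D11E";
    assert(s.length == 8);             // .length counts UTF-8 code units
    assert(countCodePoints(s) == 3);   // the loop sees three code points
}
```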

All that said, though, I don't think I would ever use ""~wstring as a means of conversion.
It just feels like "there wasn't any other way to do this, so here's a cheap hack" --
which just isn't the case.

-- Chris Nicholson-Sauls

Oct 29 2009

Justin Johansson <no spam.com> writes:

Chris Nicholson-Sauls Wrote:

 
 Granted LTR is common enough to be expectable and acceptable.  To be perfectly honest, I
 don't believe I have *ever* even used wchar/wstring.  Char/string gosh yes; dchar/dstring
 quite a bit as well, where I need the simplicity; but I've yet to feel much need for the
 "weirdo" middle child of UTF.

 I would argue that string ~ wstring returning string is fine, but would suggest it be a
 warning for those like myself who might have first guessed it would "upscale to fit".
 Just so long as the foreach(dchar;string) trick is still around, char/string can cover an
 awful lot of ground.

 All that said, though, I don't think I would ever use ""~wstring as a means of conversion.
 It just feels like "there wasn't any other way to do this, so here's a cheap hack" --
 which just isn't the case.

Your overall reply is well put.  On the last point: agreed; cheap hacks should
be avoided.

cheers, Justin

Oct 29 2009

"Nick Sabalausky" <a a.a> writes:

"Chris Nicholson-Sauls" <ibisbasenji gmail.com> wrote in message 
news:hcctuf$140a$1 digitalmars.com...
 Granted LTR is common enough to be expectable and acceptable.  To be 
 perfectly honest, I don't believe I have *ever* even used wchar/wstring. 
 Char/string gosh yes; dchar/dstring quite a bit as well, where I need the 
 simplicity; but I've yet to feel much need for the "weirdo" middle child 
 of UTF.

Given that just about anything outside of D (at least as far as I've seen) 
that attempts to use unicode does so with UTF-16 (or just uses UCS-2 and 
pretends that's UTF-16...), wchar and wstring are great for dealing with 
that. For instance, my Goldie engine for GOLD currently uses wchar in a 
number of places because GOLD's .cfg format stores text in...well, 
presumably UTF-16 (I haven't tested to see if it's really UCS-2). But yea, 
as long as you're not dealing with anything that's already in UTF-16 or that 
expects it, then it does seem to be somewhat questionable.

Oct 29 2009

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

Nick Sabalausky wrote:
 "Chris Nicholson-Sauls" <ibisbasenji gmail.com> wrote in message 
 news:hcctuf$140a$1 digitalmars.com...
 Granted LTR is common enough to be expectable and acceptable.  To be 
 perfectly honest, I don't believe I have *ever* even used wchar/wstring. 
 Char/string gosh yes; dchar/dstring quite a bit as well, where I need the 
 simplicity; but I've yet to feel much need for the "weirdo" middle child 
 of UTF.

 
 Given that just about anything outside of D (at least as far as I've seen) 
 that attempts to use unicode does so with UTF-16 (or just uses UCS-2 and 
 pretends that's UTF-16...), wchar and wstring are great for dealing with 
 that. For instance, my Goldie engine for GOLD currently uses wchar in a 
 number of places because GOLD's .cfg format stores text in...well, 
 presumably UTF-16 (I haven't tested to see if it's really UCS-2). But yea, 
 as long as you're not dealing with anything that's already in UTF-16 or that 
 expects it, then it does seem to be somewhat questionable. 

I think this says it all:

http://en.wikipedia.org/wiki/Utf-16#Use_in_major_operating_systems_and_environments

-Lars :)

Oct 30 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Lars T. Kyllingstad wrote:
 Nick Sabalausky wrote:
 "Chris Nicholson-Sauls" <ibisbasenji gmail.com> wrote in message 
 news:hcctuf$140a$1 digitalmars.com...
 Granted LTR is common enough to be expectable and acceptable.  To be 
 perfectly honest, I don't believe I have *ever* even used 
 wchar/wstring. Char/string gosh yes; dchar/dstring quite a bit as 
 well, where I need the simplicity; but I've yet to feel much need for 
 the "weirdo" middle child of UTF.

 Given that just about anything outside of D (at least as far as I've 
 seen) that attempts to use unicode does so with UTF-16 (or just uses 
 UCS-2 and pretends that's UTF-16...), wchar and wstring are great for 
 dealing with that. For instance, my Goldie engine for GOLD currently 
 uses wchar in a number of places because GOLD's .cfg format stores 
 text in...well, presumably UTF-16 (I haven't tested to see if it's 
 really UCS-2). But yea, as long as you're not dealing with anything 
 that's already in UTF-16 or that expects it, then it does seem to be 
 somewhat questionable. 

 
 I think this says it all:
 
 http://en.wikipedia.org/wiki/Utf-16#Use_in_major_operating_systems_and_environments
 
 
 -Lars :)

Yep, there was a frenzy when UCS-2 came about: everybody thought two 
bytes will be enough for everyone. So UCS-2 was widely adopted - who 
wouldn't love to have constant character width? Then, the UTF-16 
surrogate business came about, and the only logical step they could take 
was to migrate to UTF-16, which was upward compatible to UCS-2. I 
personally think UTF-8 is a better overall design though.
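
To make the surrogate business concrete, here is a small sketch (an illustration, not code from the thread) using std.utf.codeLength. The helper name needsSurrogatePair is invented for this example.

```d
import std.utf : codeLength;

/// True if the code point lies outside the Basic Multilingual Plane and
/// therefore needs two UTF-16 code units (hypothetical helper name).
bool needsSurrogatePair(dchar c)
{
    return codeLength!wchar(c) == 2;
}

unittest
{
    // U+00E9 fits in a single 16-bit unit, as it would have in UCS-2.
    assert(!needsSurrogatePair('\u00E9'));
    // U+1D11E (musical G clef) does not: UTF-16 spends a surrogate pair on it.
    assert(needsSurrogatePair('\U0001D11E'));
    assert("\U0001D11E"w.length == 2);
    // The same code point is four code units in UTF-8 and one in UTF-32.
    assert(codeLength!char('\U0001D11E') == 4);
    assert("\U0001D11E"d.length == 1);
}
```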

Andrei

Oct 30 2009

Justin Johansson <no spam.com> writes:

Andrei Alexandrescu Wrote:

 Lars T. Kyllingstad wrote:
 Nick Sabalausky wrote:
 "Chris Nicholson-Sauls" <ibisbasenji gmail.com> wrote in message 
 news:hcctuf$140a$1 digitalmars.com...
 Granted LTR is common enough to be expectable and acceptable.  To be 
 perfectly honest, I don't believe I have *ever* even used 
 wchar/wstring. Char/string gosh yes; dchar/dstring quite a bit as 
 well, where I need the simplicity; but I've yet to feel much need for 
 the "weirdo" middle child of UTF.

 Given that just about anything outside of D (at least as far as I've 
 seen) that attempts to use unicode does so with UTF-16 (or just uses 
 UCS-2 and pretends that's UTF-16...), wchar and wstring are great for 
 dealing with that. For instance, my Goldie engine for GOLD currently 
 uses wchar in a number of places because GOLD's .cfg format stores 
 text in...well, presumably UTF-16 (I haven't tested to see if it's 
 really UCS-2). But yea, as long as you're not dealing with anything 
 that's already in UTF-16 or that expects it, then it does seem to be 
 somewhat questionable. 

 
 I think this says it all:
 
 http://en.wikipedia.org/wiki/Utf-16#Use_in_major_operating_systems_and_environments
 
 
 -Lars :)

 
 Yep, there was a frenzy when UCS-2 came about: everybody thought two 
 bytes will be enough for everyone. So UCS-2 was widely adopted - who 
 wouldn't love to have constant character width? Then, the UTF-16 
 surrogate business came about, and the only logical step they could take 
 was to migrate to UTF-16, which was upward compatible to UCS-2. I 
 personally think UTF-8 is a better overall design though.
 
 Andrei

"I personally think UTF-8 is a better overall design though."


recommending UTF-16 for Processing.

http://unicode.org/notes/tn12/

The major claim in the TN is that Unicode is optimized for UTF-16.  The rest of
the argument looks like a VHS (everyone is using it i.e. UTF-16) versus Beta
argument.

So who's right?  My personal view is that whilst they are the *Unicode
Consortium*,
I have great difficulty in accepting UTF-16 as the one-and-holy encoding.

FWIW, there was a subthread during a discussion about the ordained features of 
programming languages on LtU a while back.

http://lambda-the-ultimate.org/node/3166#comment-46233
What Are The Resolved Debates in General Purpose Language Design?

It's a long discussion, so it's easier to search for UTF or Unicode on the page
if you're interested.

cheers
Justin Johansson

Oct 30 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Justin Johansson wrote:
 Andrei Alexandrescu Wrote:
 
 Lars T. Kyllingstad wrote:
 Nick Sabalausky wrote:
 "Chris Nicholson-Sauls" <ibisbasenji gmail.com> wrote in message 
 news:hcctuf$140a$1 digitalmars.com...
 Granted LTR is common enough to be expectable and acceptable.  To be 
 perfectly honest, I don't believe I have *ever* even used 
 wchar/wstring. Char/string gosh yes; dchar/dstring quite a bit as 
 well, where I need the simplicity; but I've yet to feel much need for 
 the "weirdo" middle child of UTF.

 Given that just about anything outside of D (at least as far as I've 
 seen) that attempts to use unicode does so with UTF-16 (or just uses 
 UCS-2 and pretends that's UTF-16...), wchar and wstring are great for 
 dealing with that. For instance, my Goldie engine for GOLD currently 
 uses wchar in a number of places because GOLD's .cfg format stores 
 text in...well, presumably UTF-16 (I haven't tested to see if it's 
 really UCS-2). But yea, as long as you're not dealing with anything 
 that's already in UTF-16 or that expects it, then it does seem to be 
 somewhat questionable. 

 I think this says it all:

 http://en.wikipedia.org/wiki/Utf-16#Use_in_major_operating_systems_and_environments


 -Lars :)

 Yep, there was a frenzy when UCS-2 came about: everybody thought two 
 bytes will be enough for everyone. So UCS-2 was widely adopted - who 
 wouldn't love to have constant character width? Then, the UTF-16 
 surrogate business came about, and the only logical step they could take 
 was to migrate to UTF-16, which was upward compatible to UCS-2. I 
 personally think UTF-8 is a better overall design though.

 Andrei

 
 "I personally think UTF-8 is a better overall design though."
 

 recommending UTF-16 for Processing.
 
 http://unicode.org/notes/tn12/
 
 The major claim in the TN is that Unicode is optimized for UTF-16.  The rest of
 the argument looks like a VHS (everyone is using it i.e. UTF-16) versus Beta
argument.
 
 So who's right?  My personal view is that whilst they are the *Unicode
Consortium*,
 I have great difficulty in accepting UTF-16 as the one-and-holy encoding.
 
 FWIW, there was a subthread during a discussion about the ordained features of 
 programming languages on LtU a while back.
 
 http://lambda-the-ultimate.org/node/3166#comment-46233
 What Are The Resolved Debates in General Purpose Language Design?
 
 It's a long discussion, so it's easier to search for UTF or Unicode on the page
 if you're interested.
 
 cheers
 Justin Johansson

Thanks for the pointers. One of the reasons for which I like the design 
of UTF-8 is its generality: it's a variable-length code for any number 
of up to 31 bits. In contrast, UTF-16 relies on specific dead zones 
inside the assigned space. But the authors of the unicode.org article do 
make a few good points, such as there not being any invalid UTF-16 
symbol. But then that can actually be seen as a strength of UTF-8: 
binary files that happen to be valid UTF-8 are statistically so scarce 
that UTF-8 offers a very solid method of checking whether a file is 
UTF-8 or something else.
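
A minimal sketch of that check, assuming the file contents are already in memory as bytes. It relies on std.utf.validate, which throws a UTFException on malformed input; the function name looksLikeUtf8 is invented for this example.

```d
import std.utf : validate, UTFException;

/// Returns true if the bytes form well-formed UTF-8 (hypothetical helper).
bool looksLikeUtf8(const(ubyte)[] bytes)
{
    try
    {
        // Reinterpret the bytes as UTF-8 code units and decode them all.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    ubyte[] text = [0x68, 0xC3, 0xA9];      // "h" followed by U+00E9
    ubyte[] binary = [0xFF, 0xFE, 0x00];    // 0xFF never occurs in UTF-8
    ubyte[] truncated = [0xC3];             // lead byte with no continuation
    assert(looksLikeUtf8(text));
    assert(!looksLikeUtf8(binary));
    assert(!looksLikeUtf8(truncated));
}
```

Pure ASCII input also passes, so a positive result means "consistent with UTF-8" rather than proof of the author's intent.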


Andrei

Oct 30 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier 
 <jeremiep gmail.com> wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 
 Coming along. I'm writing about strings and Unicode right now. I was 
 wondering what people think about allowing concatenation (with ~ and ~=) 
 of strings of different character widths. The support library could do 
 all of the transcoding.
 
 (I understand that concatenating an array of wchar or char with a dchar 
 is already in bugzilla.)
 
 
 Andrei

I don't know if that's a good idea; it's better when string encoding is 
explicit so you know where your reallocations are.

I.e., if I know some routine will have to convert a UTF-16 parameter to UTF-8 
to append it to a string, then I'll try to either make it output UTF-16 
or input UTF-8. If it's implicit, it's much harder to find and optimize 
these cases.

to!string() is easy enough to use anyways.

But it could be good to add a range type that does this with multiple 
opAppend/opAppendAssign overloads.

Oct 26 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier 
 <jeremiep gmail.com> wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was 
 wondering what people think about allowing concatenation (with ~ and 
 ~=) of strings of different character widths. The support library 
 could do all of the transcoding.

 (I understand that concatenating an array of wchar or char with a 
 dchar is already in bugzilla.)


 Andrei

 
 I don't know if that's a good idea; it's better when string encoding is 
 explicit so you know where your reallocations are.

The beauty of it is that reallocation with ~ occurs anyway, and with ~= 
is anyway imminent, regardless of the character width you're reallocating.

Allowing concatenation of strings of different widths is a nice way of 
acknowledging at the language level that all character widths are 
encodings of abstract characters.

 I.e., if I know some routine will have to convert a UTF-16 parameter to UTF-8 
 to append it to a string, then I'll try to either make it output UTF-16 
 or input UTF-8. If it's implicit, it's much harder to find and optimize 
 these cases.
 
 to!string() is easy enough to use anyways.
 
 But it could be good to add a range type that does this with multiple 
 opAppend/opAppendAssign overloads.

One problem with

s ~= to!string(someDstring);

is that it does two allocations instead of one.
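
One way to avoid the temporary string with today's library, sketched under the assumption that std.array.appender is acceptable here: encode each code point straight into the destination. The name appendTranscoded is hypothetical.

```d
import std.array : appender;

/// Appends a UTF-32 string to a UTF-8 string, encoding one code point at a
/// time so that no intermediate converted string is built.
string appendTranscoded(string s, dstring d)
{
    auto app = appender(s);
    foreach (dchar c; d)
        app.put(c); // the appender encodes the code point as UTF-8
    return app.data;
}

unittest
{
    string s = appendTranscoded("caf", "\u00E9!"d);
    assert(s == "caf\u00E9!");
    assert(s.length == 6); // 3 + 2 (U+00E9) + 1
}
```

The appender may still reallocate as it grows, so this only removes the intermediate string; it does not guarantee a single allocation.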


Andrei

Oct 26 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Andrei Alexandrescu wrote:
 Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier 
 <jeremiep gmail.com> wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was 
 wondering what people think about allowing concatenation (with ~ and 
 ~=) of strings of different character widths. The support library 
 could do all of the transcoding.

 (I understand that concatenating an array of wchar or char with a 
 dchar is already in bugzilla.)


 Andrei

 I don't know if that's a good idea; it's better when string encoding is 
 explicit so you know where your reallocations are.

 
 The beauty of it is that reallocation with ~ occurs anyway, and with ~= 
 is anyway imminent, regardless of the character width you're reallocating.
 
 Allowing concatenation of strings of different widths is a nice way of 
 acknowledging at the language level that all character widths are 
 encodings of abstract characters.
 
 I.e., if I know some routine will have to convert a UTF-16 parameter to 
 UTF-8 to append it to a string, then I'll try to either make it output 
 UTF-16 or input UTF-8. If it's implicit, it's much harder to find and 
 optimize these cases.

 to!string() is easy enough to use anyways.

 But it could be good to add a range type that does this with multiple 
 opAppend/opAppendAssign overloads.

 
 One problem with
 
 s ~= to!string(someDstring);
 
 is that it does two allocations instead of one.
 
 
 Andrei

Good points, I didn't think of the separation between characters and 
encodings or the extra allocation from to.

You have my vote for this feature then!

Jeremie

Oct 26 2009

Bill Baxter <wbaxter gmail.com> writes:

On Mon, Oct 26, 2009 at 4:05 PM, Jeremie Pelletier <jeremiep gmail.com> wrote:
 Andrei Alexandrescu wrote:
 Jeremie Pelletier wrote:
 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Mon, Oct 26, 2009 at 8:47 AM, Jeremie Pelletier <jeremiep gmail.com>
 wrote:
 Andrei Alexandrescu wrote:
 303 pages and counting!

 Andrei

 Soon the PI level, or at least 10 times PI!

 A hundred even. ;-)

 Coming along. I'm writing about strings and Unicode right now. I was
 wondering what people think about allowing concatenation (with ~ and ~=) of
 strings of different character widths. The support library could do all of
 the transcoding.

 (I understand that concatenating an array of wchar or char with a dchar
 is already in bugzilla.)


 Andrei

 I don't know if that's a good idea; it's better when string encoding is
 explicit so you know where your reallocations are.

 The beauty of it is that reallocation with ~ occurs anyway, and with ~= is
 anyway imminent, regardless of the character width you're reallocating.

 Allowing concatenation of strings of different widths is a nice way of
 acknowledging at the language level that all character widths are encodings
 of abstract characters.

 I.e., if I know some routine will have to convert a UTF-16 parameter to UTF-8
 to append it to a string, then I'll try to either make it output UTF-16 or
 input UTF-8. If it's implicit, it's much harder to find and optimize these
 cases.

 to!string() is easy enough to use anyways.

 But it could be good to add a range type that does this with multiple
 opAppend/opAppendAssign overloads.

 One problem with

 s ~= to!string(someDstring);

 is that it does two allocations instead of one.


 Andrei

 Good points, I didn't think of the separation between characters and
 encodings or the extra allocation from to.

 You have my vote for this feature then!

 Jeremie

Yeh, me too.  Saving an allocation is good.  And I agree that having
~= do a conversion is much more useful than just getting an error.
It's one of those things you might try just hoping it will work, and
it's always nice when something like that does just what you hope it
will.

I guess the only other thing I could worry about is that in generic
array code it might cause someone headaches that for some T[], T[] ~=
S[] is legal and the length of the result is not the sum of the
lengths of the inputs.  But I can't think of any real situation where
that would cause trouble.
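
For what it's worth, the length effect is easy to see with an explicit conversion standing in for the proposed implicit one. This is only a sketch; joinNarrow is a made-up helper name.

```d
import std.conv : to;

/// Concatenates a UTF-8 string with a transcoded UTF-16 string
/// (hypothetical stand-in for the proposed string ~ wstring).
string joinNarrow(string s, wstring w)
{
    return s ~ to!string(w);
}

unittest
{
    string s = "na";            // 2 code units
    wstring w = "\u00EFve"w;    // 3 UTF-16 code units
    string joined = joinNarrow(s, w);
    assert(s.length + w.length == 5);
    assert(joined.length == 6); // U+00EF grows to two UTF-8 code units
}
```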

--bb

Oct 26 2009
