digitalmars.D.announce - string types: const(char)[] and cstring

Walter Bright (23/23) May 25 2007 Under the new const/invariant/final regime, what are strings going to be...

Daniel Keep (19/50) May 25 2007 Thanks for the update; I'm happy to have const strings, and use char[]

Walter Bright (8/12) May 25 2007 const(char)[] => array of const characters

Myron Alexander (2/15) May 25 2007 Looking mighty fine.

Walter Bright (13/29) May 25 2007 I like it a lot better than the C++ "here a const, there a const,

Howard Berkey (2/33) May 25 2007

Myron Alexander (7/9) May 25 2007 When I first read Walter's post, I also thought null-terminated strings....

Myron Alexander (3/18) May 25 2007 Here's a possibility:

Bill Baxter (30/61) May 25 2007 So basically most functions that take a char[] now would be changed to

Walter Bright (5/8) May 26 2007 If you want to reassign another value, yes. I suggest:
Anders F Björklund (12/15) May 26 2007 I think it would be a problem at the top of the namespace,
Leandro Lucarella (13/14) May 26 2007 What about "text"?

Reiner Pope (14/25) May 25 2007 The thing I don't get about this syntax is what happens when you take

Walter Bright (3/7) May 26 2007 The difference is when they are reference types, such as arrays of const...
Daniel Keep (14/44) May 26 2007 This is what I'm wondering; I thought const and invariant only applied

Walter Bright (6/9) May 26 2007 If you know C++, then const(char)* is the same as:

Anders F Björklund (12/15) May 26 2007 I think cstring is a horrible name. "string" is much better, and in use.

Chris Miller (15/30) May 26 2007 I agree, except I don't care much for "str". I'd prefer it named string....

Marcin Kuszczak (40/44) May 26 2007 Yup. That's my opinion also...

renoX (9/54) May 27 2007 I agree with you, I don't think that the string should be a char[]

Regan Heath (3/11) May 27 2007 I think the class you describe would be useful, but only for certain typ...

renoX (7/28) May 27 2007 Sure, but this makes the code less portable (or less efficient when it's...

Regan Heath (8/32) May 28 2007 No, sadly they aren't. Most existing applications these days deal with ...

Derek Parnell (40/62) May 26 2007 We seem to have different experience. Most of the code I write deals wit...

Marcin Kuszczak (11/19) May 26 2007 The same here. I don't have much experience with Java and really don't k...

Johan Granberg (11/27) May 26 2007 In my experience they are not really useful at all (const as in constan...
Bill Baxter (4/18) May 26 2007 Ditto here. When I've used java I found it more annoying that strings

gareis (14/32) May 26 2007 I found it more bothersome by far that Integer, Float, etc were immutabl...

Walter Bright (4/6) May 26 2007 You're welcome. Different languages offer different pieces, only D will

Anders F Björklund (4/11) May 27 2007 When using Java (and Objective-C), I've found it very useful that

Walter Bright (3/5) May 27 2007 Being able to treat strings as value types is where the big

Kirk McDonald (22/37) May 26 2007 It might also be educational to look at Python, which also has immutable...

Walter Bright (8/17) May 26 2007 You'll still be able to concatenate and slice invariant strings. You can...

Reiner Pope (14/22) May 26 2007 Will there be something in the type system which enables you to safely

Walter Bright (8/23) May 26 2007 Safely? No. You will be able to explicitly cast to invariant, however,

Chris Nicholson-Sauls (7/37) May 26 2007 That's an interesting syntax, casting to a trait/attribute with the rest...

Walter Bright (2/8) May 26 2007 Both.

Reiner Pope (18/30) May 26 2007 I must have misunderstood what scope specifies. I had thought that, to

Walter Bright (2/4) May 26 2007 Sadly, it currently isn't enforced.

Derek Parnell (39/62) May 26 2007 While that is interesting, it has not much to do with what I was saying.

Walter Bright (43/97) May 27 2007 I'm going to argue that your experience is unusual. I do a lot of string...

Derek Parnell (83/127) May 27 2007 On Sun, 27 May 2007 01:09:40 -0700, Walter Bright wrote:

Walter Bright (31/85) May 27 2007 Where you're going wrong is that there are two parts to a dynamic array

Derek Parnell (47/110) May 27 2007 I know that you know that I know this about arrays already (did I really

Charles D Hixson (18/155) Jun 10 2007 FWIW, I feel the documentation is going to need LOTS of

Regan Heath (17/22) May 26 2007 I like it all, except the alias. I would prefer 'string'. 'cstring' im...

Sam Phillips (4/11) May 26 2007

janderson (8/19) May 26 2007 [snip]
Derek Parnell (12/23) May 27 2007 const(char)[] // A mutable array of immutable characters?

Walter Bright (2/9) May 27 2007 They'll all fail.

Derek Parnell (8/17) May 27 2007 Good, but when? At run time or compile time?

Walter Bright (3/18) May 27 2007 compile time.

Derek Parnell (11/24) May 27 2007 But didn't you say that "invariant char[]" means that "invariant" applie...

Walter Bright (3/4) Jun 02 2007 There isn't one. Such a construct is appealing in the abstract, but I

Tom S (12/17) Jun 02 2007 Are we only talking strings here or general arrays? Because if general

Walter Bright (4/19) Jun 02 2007 We can all come up with an example, the more interesting case is is it a...

Tom S (6/8) Jun 02 2007 Well, it's based on a true story.
Derek Parnell (12/35) Jun 02 2007 Define 'compelling'.
Sean Kelly (7/30) Jun 03 2007 Most array algorithms would apply. But I'm still not sure I see the

Jarrett Billingsley (8/13) Jun 03 2007 If that array is pointing into some meaningful area of memory (like in t...

Sean Kelly (5/20) Jun 03 2007 Well yeah. I don't personally think this is a problem because it

Walter Bright (1/1) Jun 04 2007 Started new thread with reply: resizeable arrays: T[new]

Bruno Medeiros (9/14) Jun 03 2007 What, there isn't one? Isn't that what final does? Like this:

Walter Bright (3/16) Jun 04 2007 Final only works at the outermost level. There is no way to have a

noSpam (3/13) May 27 2007 I think it's better to return reversed/sorted copy. This will make such

Myron Alexander (3/17) May 27 2007 This makes sense. For immutable arrays, the definition should drop "in

Oskar Linde (9/27) May 27 2007 Which would be very confusing. This is instead a perfect opportunity to

Myron Alexander (4/14) May 28 2007 I see your point and agree.
Bill Baxter (5/35) May 28 2007 No opinion there. What about the special code-point-at-a-time foreach

Aarti_pl (7/44) May 29 2007 IMHO that should not be in language. That's why I am opting for string

Regan Heath (4/48) May 29 2007 I agree. I tend to think there are certain things which some apps don't...

Marcin Kuszczak (14/18) May 29 2007 I am not sure what do you mean with this sentence...

Regan Heath (7/18) May 29 2007 I'm lost, what is "dstring"?

Aziz K. (8/11) May 29 2007 I think your method doesn't take compound characters into account.

Regan Heath (5/19) May 29 2007 Can you code that test up (using the \U character literal syntax so that...

Frits van Bommel (7/27) May 29 2007 Unicode defines multiple valid encodings for lots of accented

Regan Heath (10/39) May 29 2007 This is the key issue. I was under the (perhaps mistaken) impression it...

Frits van Bommel (57/78) May 30 2007 ---

Regan Heath (3/3) May 31 2007 Thanks for this. It appears you're right :)

Marcin Kuszczak (29/55) May 29 2007 on,

Regan Heath (9/46) May 29 2007 Ahh, thanks, that clears up the confusion I had. Yes, a string class/st...

Reiner Pope (9/40) May 27 2007 Perhaps I should just wait for the implementation, but I'm interested in...

Frits van Bommel (14/24) May 28 2007 Likely the first will be an error as written, requiring a

Reiner Pope (6/34) May 29 2007 Funny, that's just what I thought of (including the name unique). When I...

Frits van Bommel (20/33) May 29 2007 I'm pretty sure this has been suggested in these newsgroups in the past,...

Walter Bright (8/14) Jun 02 2007 We really tried to figure out a way to make "unique" work. It just

Walter Bright <newshound1 digitalmars.com> writes:

Under the new const/invariant/final regime, what are strings going to be?
Experience with other languages suggests that strings should be
immutable. To express an array of const chars, one would write:

	const(char)[]

but while that's clear, it doesn't just flow off the keyboard. Strings 
are so common this needs an alias, so:

	alias const(char)[] cstring;

Why cstring? Because 'string' appears as both a module name and a common 
variable name. cstring also implies wstring for wchar strings, and 
dstring for dchars.

String literals, on the other hand, will be invariant (which means they 
can be stuffed into read-only memory). So,
	typeof("abc")
will be:
	invariant(char)[3]

Invariants can be implicitly cast to const.

In my playing around with source code, using cstrings seems to work out
rather nicely.

So, why not alias cstring to invariant(char)[] ? That way strings really 
would be immutable. The reason is that mutables cannot be implicitly 
cast to invariant, meaning that there'd be a lot of casts in the code. 
Casts are a sledgehammer, and a coding style that requires too many 
casts is a bad coding style.
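
The conversion rules described above can be tried out with a minimal sketch. It assumes a current D compiler, where the `invariant` type constructor from this post is spelled `immutable`; `countSpaces` is a hypothetical helper used only for illustration.

```d
// Sketch only: `invariant` in the post corresponds to `immutable` here.
alias cstring = const(char)[];   // alias name taken from the post

// Accepts mutable, const and immutable character arrays alike.
size_t countSpaces(cstring s)
{
    size_t n = 0;
    foreach (c; s)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    char[] buf = "a b c".dup;        // mutable copy of a literal
    immutable(char)[] lit = "x y";   // literal data is immutable

    assert(countSpaces(buf) == 2);   // mutable -> const is implicit
    assert(countSpaces(lit) == 1);   // immutable -> const is implicit

    // Mutable -> immutable is not implicit, which is the source of
    // the casts the post wants to avoid.
    static assert(!is(char[] : immutable(char)[]));
    static assert( is(char[] : cstring));
    static assert( is(immutable(char)[] : cstring));
}
```

A parameter typed as const(char)[] therefore serves callers holding either kind of data, which is the argument made above for aliasing to const rather than invariant.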

May 25 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be
 ? Experience with other languages suggest that strings should be
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings
 are so common this needs an alias, so:
 
     alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common
 variable name. cstring also implies wstring for wchar strings, and
 dstring for dchars.
 
 String literals, on the other hand, will be invariant (which means they
 can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]
 
 Invariants can be implicitly cast to const.
 
 In my playing around with source code, using cstring's seems to work out
 rather nicely.
 
 So, why not alias cstring to invariant(char)[] ? That way strings really
 would be immutable. The reason is that mutables cannot be implicitly
 cast to invariant, meaning that there'd be a lot of casts in the code.
 Casts are a sledgehammer, and a coding style that requires too many
 casts is a bad coding style.

Thanks for the update; I'm happy to have const strings, and use char[]
manually when I want to mutate something.

One question though: are the parens necessary?  I was under the
impression that const and invariant applied to reference types, so it
would be const char[] or const(char[]), since char by itself is just a
value type.

...this is going to turn into one of those mega threads where we all run
around in circles trying to work out which one is which, isn't it?

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

May 25 2007

Walter Bright <newshound1 digitalmars.com> writes:

Daniel Keep wrote:
 One question though: are the parens necessary?  I was under the
 impression that const and invariant applied to reference types, so it
 would be const char[] or const(char[]), since char by itself is just a
 value type.

const(char)[] => array of const characters
const char[] => const array of const characters
const(char[]) => const array of const characters

Think of const as if it were a template:

	Const!(T)

which returns a const version of its argument.

const without any parens means it applies to the whole type.
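
The three spellings can be compared with a small sketch, assuming a current D compiler (the semantics shown are those of released D, which may differ in detail from the 2007 design).

```d
void main()
{
    const(char)[] a = "abc";   // mutable slice of const characters
    a = "def";                 // fine: the slice itself can be rebound
    assert(a == "def");

    const char[] b = "abc";    // const applies to the whole type
    assert(b == "abc");

    // Without parentheses, const covers the array and its elements.
    static assert( is(typeof(b) == const(char[])));
    static assert(!is(typeof(a) == const(char[])));

    // Writing through an element is rejected at compile time.
    static assert(!__traits(compiles, { const(char)[] s = "abc"; s[0] = 'x'; }));

    // Rebinding is rejected only for the fully const form.
    static assert( __traits(compiles, { const(char)[] s = "abc"; s = "x"; }));
    static assert(!__traits(compiles, { const char[] s = "abc"; s = "x"; }));
}
```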

May 25 2007

Myron Alexander <someone somewhere.com> writes:

Walter Bright wrote:
 Daniel Keep wrote:
 
 const(char)[] => array of const characters
 const char[] => const array of const characters
 const(char[]) => const array of const characters
 
 Think of const as if it were a template:
 
     Const!(T)
 
 which returns a const version of its argument.
 
 const without any parens means it applies to the whole type.

Looking mighty fine.

May 25 2007

Walter Bright <newshound1 digitalmars.com> writes:

Myron Alexander wrote:
 Walter Bright wrote:
 Daniel Keep wrote:

 const(char)[] => array of const characters
 const char[] => const array of const characters
 const(char[]) => const array of const characters

 Think of const as if it were a template:

     Const!(T)

 which returns a const version of its argument.

 const without any parens means it applies to the whole type.

 
 Looking mighty fine.

I like it a lot better than the C++ "here a const, there a const, 
everywhere a const const" like:

	const char * const * const p;

etc. instead of:

	const(char**) p;

Const in D is transitive, so const(char**) is equivalent to:

	const(const(const(char)*)*)

And no, it is not possible to have a pointer to a const pointer to
mutable. It is neither possible to declare syntactically nor
semantically allowed. You can force the issue with casts (which allow
you to do whatever you *need* to do), but the result will be undefined
behavior.
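
Transitivity can be checked at compile time. The following is a sketch assuming a current D compiler; the variable names are illustrative only.

```d
// const on the outer type reaches everything behind it.
static assert(is(const(char**) == const(const(const(char)*)*)));

// A mutable pointer to a const pointer still ends in const char.
static assert(is(const(char*)* == const(const(char)*)*));

void main()
{
    char c = 'a';
    char* p = &c;
    const(char**) pp = &p;     // mutable data viewed through const

    static assert(is(typeof(*pp)  == const(char*)));
    static assert(is(typeof(**pp) == const(char)));

    // No write is possible through any level of pp.
    static assert(!__traits(compiles, **pp = 'b'));

    assert(**pp == 'a');
    c = 'b';                   // the original mutable name still works
    assert(**pp == 'b');
}
```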

May 25 2007

Howard Berkey <howard well.com> writes:

Nice idea.  I am only concerned that people will see "cstring" and think
"null-terminated "C" string".  Not that that should be a deciding factor by any
means of course.

Walter Bright Wrote:

 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
 	const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
 	alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.
 
 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
 	typeof("abc")
 will be:
 	invariant(char)[3]
 
 Invariants can be implicitly cast to const.
 
 In my playing around with source code, using cstring's seems to work out 
 rather nicely.
 
 So, why not alias cstring to invariant(char)[] ? That way strings really 
 would be immutable. The reason is that mutables cannot be implicitly 
 cast to invariant, meaning that there'd be a lot of casts in the code. 
 Casts are a sledgehammer, and a coding style that requires too many 
 casts is a bad coding style.

May 25 2007

Myron Alexander <someone somewhere.com> writes:

Howard Berkey wrote:
 Nice idea.  I am only concerned that people will see "cstring" and think
"null-terminated "C" string".  Not that that should be a deciding factor by any
means of course.
 


When I first read Walter's post, I also thought null-terminated strings. 


I even had it as an alias for toString (converting C string to char[]) 
as a means to get around the name conflict with Object but I shortened 
it to "str".

I cannot think of another name but "cstring" will cause confusion and 
defeats the "obvious" rule.

May 25 2007

Myron Alexander <someone somewhere.com> writes:

Myron Alexander wrote:
 Howard Berkey wrote:
 Nice idea.  I am only concerned that people will see "cstring" and 
 think "null-terminated "C" string".  Not that that should be a 
 deciding factor by any means of course.

 
 
 When I first read Walter's post, I also thought null-terminated strings.
 
 I even had it as an alias for toString (converting C string to char[]) 
 as a means to get around the name conflict with Object but I shortened 
 it to "str".
 
 I cannot think of another name but "cstring" will cause confusion and 
 defeats the "obvious" rule.

Here's a possibility:

Instead of cstring, wstring, dstring - charstr, widestr, dblstr.

May 25 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
     alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.
 
 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]
 
 Invariants can be implicitly cast to const.
 
 In my playing around with source code, using cstring's seems to work out 
 rather nicely.
 
 So, why not alias cstring to invariant(char)[] ? That way strings really 
 would be immutable. The reason is that mutables cannot be implicitly 
 cast to invariant, meaning that there'd be a lot of casts in the code. 
 Casts are a sledgehammer, and a coding style that requires too many 
 casts is a bad coding style.


So basically most functions that take a char[] now would be changed to 
take a cstring in your thinking?

Is it also correct to say that cstring would be used in the places where 
one would use const char* or const std::string&  in C++?

If so that sounds ok to me.  But about the naming ... I have to agree 
that my first thought was "C compatible null terminated string" too, 
like std::string's  .c_str() method in C++.  I can probably live with 
that but I don't like the inconsistency with c/w/d.

Plain 'string' really does make the most sense.
plain   'w'     'd'
======= =====   =====
char    wchar   dchar
string  wstring dstring

It wouldn't be quite as bad if you uniformly apply the 'c' to all of 
them (using 'c' as a flag for constness):
plain   'w'      'd'
======= =====    =====
char    wchar    dchar
cstring wcstring dcstring
or
cstring cwstring cdstring

Some people already alias char[] to string.  As far as I've heard they 
haven't run into conflicts with the module name, or with people naming 
variables 'string'.

Question: if you have an alias like
alias char[] string;

'const string' automatically applies const to both the char and the [], 
right?  Is that something to be worried about?

--bb

May 25 2007

Walter Bright <newshound1 digitalmars.com> writes:

Bill Baxter wrote:
 'const string' automatically applies const to both the char and the [], 
 right?

Right.

 Is that something to be worried about?

If you want to reassign another value, yes. I suggest:

	const(char)[]

instead.
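
A short sketch of the point being made, assuming a current D compiler. The alias name `mstring` is hypothetical and stands in for the `alias char[] string;` from the question.

```d
alias mstring = char[];        // hypothetical alias for a mutable string

void main()
{
    const mstring a = "abc";   // const reaches the array and its elements
    static assert(is(typeof(a) == const(char[])));

    // The fully const variable cannot be given another value.
    static assert(!__traits(compiles, { const mstring s = "abc"; s = "def"; }));

    const(char)[] b = "abc";   // the form suggested in the reply
    b = "def";                 // rebinding works
    assert(a == "abc");
    assert(b == "def");
}
```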

May 26 2007

Anders F Björklund <afb algonet.se> writes:

Bill Baxter wrote:

 Some people already alias char[] to string.  As far as I've heard they 
 haven't run into conflicts with the module name, or with people naming 
 variables 'string'.

I think it would be a problem at the top of the namespace,
but it's OK if you use (for instance) "wx.common.string":

module wx.common;
alias char[] string;

Then you can do declarations like:
string string = "string";

At least that's how it has been working for the last couple
of years, and for Christopher E. Miller's dstring.d as well:

module dstring;
struct string { ... }

--anders

May 26 2007

Leandro Lucarella <llucax gmail.com> writes:

Bill Baxter, on May 26 at 14:59 you wrote to me:
 Plain 'string' really does make the most sense.

What about "text"?

Please see "The 'string' types" here[1] for an explanation.

[1] http://xlr.sourceforge.net/concept/diverge.html

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
 .------------------------------------------------------------------------,
  \  GPG: 5F5A8D05 // F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05 /
   '--------------------------------------------------------------------'
On the street I ran into a very proper gentleman who usually drives around in a
Falcon; he was running with two suitcases in his hands and said: "I'm off to Miami,
do you have any message or ..." and I told him: "No, no, no..."
	-- Extra Tato (1983, Alfonsín's victory)

May 26 2007

Reiner Pope <some address.com> writes:

Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 

...
 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]

The thing I don't get about this syntax is what happens when you take 
off the [].

1.   invariant(char) c = 'b'; // c is 'b' now, and will never change.
2.   final(char) d = 'b';     // but calling it final means the same...
3.   const(char) e = 'b';     // ummm... what?

It seems like const(char) is a constant char -- one that can't change. 
Does that make final obsolete?

Also, I can't see any difference between const(char) and 
invariant(char), since neither can ever be rebound. In that case, if I 
assume that they are identical types, how can an array of const(char) be 
different from an array of invariant(char)?

-- Reiner

May 25 2007

Walter Bright <newshound1 digitalmars.com> writes:

Reiner Pope wrote:
 Also, I can't see any difference between const(char) and 
 invariant(char), since neither can ever be rebound. In that case, if I 
 assume that they are identical types, how can an array of const(char) be 
 different from an array of invariant(char)?

The difference is when they are reference types, such as arrays of const 
char, or arrays of invariant chars.
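
The difference for reference types can be shown with a minimal sketch, assuming a current D compiler where `invariant` is spelled `immutable`.

```d
void main()
{
    char[] buf = "abc".dup;

    const(char)[] view = buf;              // read-only view of mutable data
    immutable(char)[] frozen = buf.idup;   // independent immutable copy

    buf[0] = 'X';                          // mutate through the original name

    assert(view == "Xbc");                 // the const view sees the change
    assert(frozen == "abc");               // the immutable data never changes

    // Neither reference permits writing through it.
    static assert(!__traits(compiles, view[0] = 'Y'));
    static assert(!__traits(compiles, frozen[0] = 'Y'));
}
```

Const promises only that this reference will not be used to modify the data; invariant promises that nobody will.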

May 26 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Reiner Pope wrote:
 Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to
 be ? Experience with other languages suggest that strings should be
 immutable. To express an array of const chars, one would write:

     const(char)[]

 ....
 String literals, on the other hand, will be invariant (which means
 they can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]

 
 The thing I don't get about this syntax is what happens when you take
 off the [].
 
 1.   invariant(char) c = 'b'; // c is 'b' now, and will never change.
 2.   final(char) d = 'b';     // but calling it final means the same...
 3.   const(char) e = 'b';     // ummm... what?
 
 It seems like const(char) is a constant char -- one that can't change.
 Does that make final obsolete?
 
 Also, I can't see any difference between const(char) and
 invariant(char), since neither can ever be rebound. In that case, if I
 assume that they are identical types, how can an array of const(char) be
 different from an array of invariant(char)?
 
 -- Reiner

This is what I'm wondering; I thought const and invariant only applied
to reference types (which is why we have final as storage const), in
which case, const(char)[] doesn't make any sense...

	-- Daniel


May 26 2007

Walter Bright <newshound1 digitalmars.com> writes:

Daniel Keep wrote:
 This is what I'm wondering; I thought const and invariant only applied
 to reference types (which is why we have final as storage const), in
 which case, const(char)[] doesn't make any sense...

If you know C++, then const(char)* is the same as:
	const char* p;		// C++
and const(char*) is the same as:
	const char * const p;	// C++


(using * because C++ doesn't have dynamic arrays)
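
The same distinction on the D side, as a sketch assuming a current D compiler; `lengthZ` is a hypothetical helper written for this illustration.

```d
// Counts characters up to the terminating zero.
size_t lengthZ(const(char)* p)
{
    size_t n = 0;
    while (*p != '\0')
    {
        ++p;    // allowed: only the pointee is const
        ++n;
    }
    return n;
}

void main()
{
    // D string literals are followed by a zero byte, so .ptr is usable here.
    assert(lengthZ("hello".ptr) == 5);

    // const(char)* can be advanced, const(char*) cannot.
    static assert( __traits(compiles, { const(char)* q; ++q; }));
    static assert(!__traits(compiles, { const(char*) q; ++q; }));

    // Neither allows writing to the character pointed at.
    static assert(!__traits(compiles, { const(char)* q; *q = 'x'; }));
}
```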

May 26 2007

Anders F Björklund <afb algonet.se> writes:

Walter Bright wrote:

 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.

I think cstring is a horrible name. "string" is much better, and in use.
(else wouldn't those be wcstring and dcstring or cwstring and cdstring?)

That it is made up of constant characters, and that those aren't really
characters but instead UTF-8 code units is something that can be hidden.

alias const(char)[] string;

But "cstring" both sounds awkward, and also leads the mind to C strings.
Even if those (char*) would probably be "stringz" in the usual D lingo.

If any name conflict with previously existing "string" must be avoided,
then "str" is probably a better name... (character->char, integer->int)

As was discussed earlier.

--anders

May 26 2007

"Chris Miller" <chris dprogramming.com> writes:

On Sat, 26 May 2007 04:35:34 -0400, Anders F Björklund <afb algonet.se>  
wrote:

 Walter Bright wrote:

 Why cstring? Because 'string' appears as both a module name and a  
 common variable name. cstring also implies wstring for wchar strings,  
 and dstring for dchars.

 I think cstring is a horrible name. "string" is much better, and in use.
 (else wouldn't those be wcstring and dcstring or cwstring and cdstring?)

 That it is made up of constant characters, and that those aren't really
 characters but instead UTF-8 code units is something that can be hidden.

 alias const(char)[] string;

 But "cstring" both sounds awkward, and also leads the mind to C strings.
 Even if those (char*) would probably be "stringz" in the usual D lingo.

 If any name conflict with previously existing "string" must be avoided,
 then "str" is probably a better name... (character->char, integer->int)

 As was discussed earlier.

 --anders

I agree, except I don't care much for "str". I'd prefer it named string.  
If it's an alias in object.d and not a keyword, it shouldn't be too bad.

Actually, while we're at a change for strings, why not bring in something  
similar to my dstring module, where slicing and indexing never result in  
an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the  
code may not be ideal, but it's the concept I'm referring to.

While on strings, I'll mention another problem I have with D's string  
handling. "invalid utf8 sequence" (or, if you prefer, "4invalid utf8  
sequence"). Other Unicode implementations I've used do not throw such an  
exception, but interpret the bad parts as replacement characters (U+FFFD).  
I believe I've heard that the Unicode standard also recommends being
forgiving in this respect.
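
The slicing problem can be reproduced with a short sketch, assuming a current D compiler and the Phobos module std.utf. It shows the throwing behavior only, not the replacement-character alternative.

```d
import std.exception : assertThrown;
import std.utf : UTFException, stride, validate;

void main()
{
    string s = "n\u00E9e";          // 'n', U+00E9, 'e': four UTF-8 code units
    assert(s.length == 4);
    assert(stride(s, 1) == 2);      // U+00E9 occupies two code units

    string whole = s[0 .. 3];       // slice ends on a code point boundary
    validate(whole);                // passes silently

    string broken = s[0 .. 2];      // slice ends in the middle of U+00E9
    assertThrown!UTFException(validate(broken));
}
```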

- Chris

May 26 2007

Marcin Kuszczak <aarti interia.pl> writes:

Chris Miller wrote:

 Actually, while we're at a change for strings, why not bring in something
 similar to my dstring module, where slicing and indexing never result in
 an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the
 code may not be ideal, but it's the concept I'm referring to.

Yup. That's my opinion also...

For me the advantages of such a string are quite obvious:
1. Easy slicing and indexing of UTF-8 sequences (without corrupting the
sequence, as mentioned above)
2. Common denominator for char[], wchar[] and dchar[]
3. For classes which don't need speed it simplifies the API (only one version
of each function instead of 3)
4. With some additional support from the language (cast operators to different
types and opImplicitCast) it can be fully interchangeable with every method
taking char[], wchar[], dchar[].

Having another 3 names for strings is not very appealing to me. We would
have 9 official versions of string available in D:
char[], wchar[], dchar[], string, cwstring, cdstring, tango String!(char),
tango String!(wchar), tango String!(dchar)

To write a nice, fully functional library you have to write 3 versions of
every function which takes different string types (I know, templates make
it a little bit easier). I will probably not be wrong when I say that
in reality people just write one version for char[], because it is
convenient (see: SWT ported from Java). As a result, wchar and dchar are
treated as second-class citizens in D. Additionally, when people design
their programs for char[], they mostly don't think about issues with slicing
a char[] UTF-8 sequence (warning! assumption!), so the default way of writing
programs is *NOT SAFE*. When you write code and don't care about bare-metal
speed it is just tedious to do this additional work...

Having one string type which hides the differences between char[], wchar[] and
dchar[] would solve the problem nicely. Adding constness would also be easy.
And you use only one reserved keyword - string - for everything.
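
For comparison, the template route mentioned earlier in this post can be sketched as follows, assuming a current D compiler and std.traits from Phobos. `countCodePoints` is a hypothetical function, not an existing library call.

```d
import std.traits : isSomeString;

// One generic function instead of three hand-written overloads.
size_t countCodePoints(S)(S s)
if (isSomeString!S)
{
    size_t n = 0;
    foreach (dchar c; s)   // foreach decodes code units into code points
        ++n;
    return n;
}

void main()
{
    assert(countCodePoints("n\u00E9")  == 2);   // UTF-8:  3 code units
    assert(countCodePoints("n\u00E9"w) == 2);   // UTF-16: 2 code units
    assert(countCodePoints("n\u00E9"d) == 2);   // UTF-32: 2 code units

    char[] m = "abc".dup;                       // mutable arrays work too
    assert(countCodePoints(m) == 3);
}
```

This covers the three encodings with one body, but the caller still sees three distinct types, which is the drawback the post is pointing at.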

I would be happy to hear some other opinions from people on the NG. Maybe I am
wrong with the above arguments, so probably someone can give
counterarguments... I think it is a very important issue, as it seems that
most developers in the world are non-native English speakers...

PS. See also thread on DWT NG.

-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------

May 26 2007

renoX <renosky free.fr> writes:

Marcin Kuszczak wrote:
 Chris Miller wrote:
 
 Actually, while we're at a change for strings, why not bring in something
 similar to my dstring module, where slicing and indexing never result in
 an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the
 code may not be ideal, but it's the concept I'm referring to.

 
 Yup. That's my opinion also...
 
 For me advantages of such a string are quite obvious:
 1. Easy slicing and indexing of utf8 sequences (without corrupting this
 sequence - as mention above)
 2. Common denominator for char[], wchar[] and dchar[]
 3. For classes which doesn't need speed it simplifies API (only one version
 of functions instead of 3)
 4. With some additional support from language (cast operators to different
 types and opImplicitCast) it can be fully interchangeable with every method
 taking char[], wchar[], dchar[].
 
 Having another 3 names for string is not very appealing for me. We would
 have 9 official versions of string available in D:
 char[], wchar[], dchar[], string, cwstring, cdstring, tango String!(char),
 tango String!(wchar), tango String!(dchar)
 
 To write nice, fully functional library you have to write 3 versions of
 every function which takes different string types (I know, templates makes
 it a little bit easier). Probably I will not be wrong when I say that
 reality is that people just write one version for char[], because it is
 convenient (see: SWT ported from Java). It causes that wchar and dchar are
 treated as second class citizens in D. Additionally when people design
 their program for char[], they mostly don't think about issues with slicing
 of char[] utf8 sequence (warning! assumption!), so default way of writing
 programs is *NOT SAFE*. When you write code and don't care about bare metal
 speed it is just tedious to do this additional work... 
 
 Having one string, which hides differences between char[], wchar[] and
 dchar[] would solve problem nicely. Adding constness would also be easy.
 And you use only one reserved keyword - string - for everything.
 
 I would be happy to hear some other opinions from people on NG. Maybe I am
 wrong with above arguments, so probably someone can give
 counterarguments... I think it is very important issue as it seems that
 most developers over the world are non-native-english-speakers...
 
 PS. See also thread on DWT NG.

I agree with you. I don't think that string should be a char[]
alias, whether it's const or not, but a class with a char[], dchar[] or wchar[]
representation under the hood and safe slicing by default.
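
A very small sketch of what "safe slicing by default" could look like, assuming a current D compiler and std.utf from Phobos. `SafeString` is a hypothetical type with a single UTF-8 representation; it slices on code point boundaries only and does not address combining sequences.

```d
import std.utf : toUTFindex;

// Hypothetical wrapper, not an existing library type.
struct SafeString
{
    string data;   // UTF-8 storage

    // Slice by code point positions rather than code unit offsets.
    string slice(size_t from, size_t to)
    {
        return data[toUTFindex(data, from) .. toUTFindex(data, to)];
    }
}

void main()
{
    auto s = SafeString("n\u00E9x");
    assert(s.data.length == 4);          // four UTF-8 code units
    assert(s.slice(1, 2) == "\u00E9");   // one whole code point
    assert(s.slice(0, 3) == "n\u00E9x"); // the entire string
}
```

Each slice costs a scan from the start of the string, which is the kind of CPU trade-off a real design would have to weigh.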

The difficulty is providing enough flexibility to manage the internal
representation correctly: there should be a way to say, for example,
"use UTF-8 even though there are multibyte characters" (a size
optimization with some CPU cost).

renoX

May 27 2007

Regan Heath <regan netmail.co.nz> writes:

renoX Wrote:
 I agree with you, I don't think that the string should be a char[] 
 alias, whether it's const or not but a class with char[],dchar[],wchar[]
 under the hood representation and safe slicing by default.
 
 The difficulty is providing enough flexibility for managing correctly 
 the internal representation: there should be a possibility to say use 
 UTF8 even though there are multibyte characters for example (a size 
 optimization with some CPU cost).

I think the class you describe would be useful, but only for certain types of
application.  Many applications (those that deal with ASCII or only one of
UTF8, 16 or 32 for example) won't need the sorts of things this class provides
and can get away with just using 'const(char[])' AKA 'string'.  Basically I
think there is ample room for both 'string' as an alias and 'String' as a
class to exist at the same time.

Regan

May 27 2007

renoX <renosky free.fr> writes:

Regan Heath wrote:
 renoX Wrote:
 I agree with you, I don't think that the string should be a char[]
  alias, whether it's const or not but a class with
 char[],dchar[],wchar[] under the hood representation and safe
 slicing by default.
 
 The difficulty is providing enough flexibility for managing
 correctly the internal representation: there should be a
 possibility to say use UTF8 even though there are multibyte
 characters for example (a size optimization with some CPU cost).

 
 I think the class you describe would be useful, but only for certain
 types of application.  Many applications (those that deal with ASCII

Hopefully a rare thing now.

 or only one of UTF8, 16 or 32 for example)

Sure, but this makes the code less portable (or less efficient when it's
not on its "original" OS): Windows uses UTF-16, Linux UTF-8.

 won't need the sorts of
 things this class provides and can get away with just using
 'const(char[])' AKA 'string'.  Basically I think there is ample
 room for both 'string' as an alias and 'String' as a class to exist
 at the same time.

Room of course, but IMHO one should almost always use the class (except 
in wrappers of native calls) instead of the alias.

renoX

 
 Regan

May 27 2007

Regan Heath <regan netmail.co.nz> writes:

renoX Wrote:
 Regan Heath wrote:
 renoX Wrote:
 I agree with you, I don't think that the string should be a char[]
  alias, whether it's const or not but a class with
 char[],dchar[],wchar[] under the hood representation and safe
 slicing by default.
 
 The difficulty is providing enough flexibility for managing
 correctly the internal representation: there should be a
 possibility to say use UTF8 even though there are multibyte
 characters for example (a size optimization with some CPU cost).

 
 I think the class you describe would be useful, but only for certain
 types of application.  Many applications (those that deal with ASCII

 
 Hopefully a rare thing now.

No, sadly they aren't.  Most existing applications these days deal with ASCII
or one of the strange code pages (which you'd handle in D with ubyte and
appropriate conversion to one of UTF8, 16 or 32 internally).

Granted, in the case of the code page apps you might want a String class which
can be produced by a <codepage>toString() free function which leverages iconv
(which is just what I suggested).

However, you may only want to deal with them as UTF-8 internally, and therefore
not need the functionality provided by the class, opting instead to use 'string'
directly.

Sure, in the future I expect/hope people will move to UTF8, 16, and 32 but I
suspect code pages will be haunting us for many years to come.

 won't need the sorts of
 things this class provides and can get away with just using
 'const(char[])' AKA 'string'.  Basically I think there is ample
 room for both 'string' as an alias and 'String' as a class to exist
 at the same time.

 
 Room of course, but IMHO one should almost always use the class (except 
 in wrappers of native calls) instead of the alias.

I think that's an invalid assertion, specifically your use of the word
'always'.  There are 'almost certainly' (see, my term leaves room for me to be
wrong) many cases where the alias would be preferred, most likely for
performance reasons, especially if the added functionality isn't required.

In other words, all I'm saying is: sometimes you want it, sometimes you don't.
Both can exist, both can be used and both should be interchangeable (without too
much trouble).

Regan

May 28 2007

Derek Parnell <derek psych.ward> writes:

On Fri, 25 May 2007 19:47:24 -0700, Walter Bright wrote:

 Under the new const/invariant/final regime, what are strings going to be?
 Experience with other languages suggests that strings should be
 immutable.

We seem to have different experience. Most of the code I write deals with
changing strings - in other words, manipulating strings is very very common
in the sorts of programs I write.

 To express an array of const chars, one would write:
 
 	const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
 	alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.

No it doesn't.

I have rarely seen 'string' used as a variable. In phobos it is used in
boxer.d and regexp.d only. I use it as an alias for 'char[]'. I see 'str'
used fairly often but not so much 'string'.

'cstring' is pronounced C-String which instantly brings to mind the
'string' implementation used by C language. Not something I imagine you
wish to imply.


 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
 	typeof("abc")
 will be:
 	invariant(char)[3]
 
 Invariants can be implicitly cast to const.

So 'const(char)[] x' means that I can change x.ptr and x.length but I
cannot change anything that x.ptr points to, right?

     void func(const(char)[] x)
     {
      x = "def"; // ok
      x.length = 0; // ok
      x[0] = 'd'; // fails
     }

And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
I cannot change anything that x.ptr points to, right?

     void func(invariant(char)[] x)
     {
      x = "def"; // fails
      x.length = 0; // fails
      x[0] = 'd'; // fails
     }

So what syntax is to be used so that x.ptr and x.length cannot be changed
but the characters referred to by 'x' can be changed?

     void func(char const([]) x) ???
     {
      x = "def"; // fails
      x.length = 0; // fails
      x[0] = 'd'; // ok
     }

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

May 26 2007

Marcin Kuszczak <aarti interia.pl> writes:

Derek Parnell wrote:

 Under the new const/invariant/final regime, what are strings going to be?
 Experience with other languages suggests that strings should be
 immutable.

 
 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very
 common in the sorts of programs I write.
 

The same here. I don't have much experience with Java and really don't know
why const strings are so useful...

Maybe someone could elaborate a little bit more?

-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------

May 26 2007

Johan Granberg <lijat.meREM OVEgmail.com> writes:

Marcin Kuszczak wrote:

 Derek Parnell wrote:
 
 Under the new const/invariant/final regime, what are strings going to be?
 Experience with other languages suggests that strings should be
 immutable.

 
 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very
 common in the sorts of programs I write.
 

 
 The same here. I don't have much experience with Java and really don't
 know why const strings are so useful...
 
 Maybe someone could elaborate a little bit more?
 

In my experience they are not really useful at all (const as in constant
that is). Sometimes it does not matter and sometimes it is inconvenient or
a performance problem. (it is mostly append that is needed in my
experience)

If function parameters were const by default (as in the new behavior of in) I
see no use for immutability here. In Java I think it is used to prevent
aliased String objects from changing value, something that could create
unexpected bugs if used by programmers not understanding aliasing.

ps. although I'm no fan of Java I have used it for most university
assignments for the past two years, so hopefully I'm not totally wrong ;)

May 26 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Marcin Kuszczak wrote:
 Derek Parnell wrote:
 
 Under the new const/invariant/final regime, what are strings going to be?
 Experience with other languages suggests that strings should be
 immutable.

 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very
 common in the sorts of programs I write.

 
 The same here. I don't have much experience with Java and really don't know
 why const strings are so useful...
 
 Maybe someone could elaborate a little bit more?

Ditto here.  When I've used Java I found it more annoying that strings
were immutable than anything else.

--bb

May 26 2007

gareis <dhasenan gmail.com> writes:

== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Marcin Kuszczak wrote:
 Derek Parnell wrote:

 Under the new const/invariant/final regime, what are strings going to be?
 Experience with other languages suggests that strings should be
 immutable.

 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very
 common in the sorts of programs I write.

 The same here. I don't have much experience with Java and really don't know
 why const strings are so useful...

 Maybe someone could elaborate a little bit more?

 Ditto here.  When I've used Java I found it more annoying that strings
 were immutable than anything else.
 --bb

I found it more bothersome by far that Integer, Float, etc were immutable. Even
after going through all the trouble of getting classes for all these, you
couldn't use them for out or inout parameters to functions.

Scratch that -- what was really annoying was that you couldn't ever *specify*
how you wanted your parameter. Even in C, you can pass an address (but then,
anything's possible in C). But in Java, you can only call by reference with a
class or an array, so you end up doing things like:
void foo(int[] inout_parameter) { inout_parameter[0] += 5; }

And the only way to get scope const final sort of deal on a class is to copy and
then submit the copy as a final parameter -- it's the reference, not the data,
that's final.

In short, thank you, Walter, for allowing us to pass anything by reference, and
by allowing the data referenced to be made read-only.

May 26 2007

Walter Bright <newshound1 digitalmars.com> writes:

gareis wrote:
 In short, thank you, Walter, for allowing us to pass anything by reference,
 and by allowing the data referenced to be made read-only.

You're welcome. Different languages offer different pieces, only D will 
offer the whole customizable shebang. The idea is for programs to be 
more self-documenting, and so make automated analysis more feasible.

May 26 2007

Anders F Björklund <afb algonet.se> writes:

Bill Baxter wrote:

 The same here. I don't have much experience with Java and really don't 
 know
 why const strings are so useful...
 Maybe someone could elaborate a little bit more?

 
 Ditto here.  When I've used Java I found it more annoying that strings
 were immutable than anything else.

When using Java (and Objective-C), I've found it very useful that 
strings (and others) are immutable since they are then thread-safe.

--anders

May 27 2007

Walter Bright <newshound1 digitalmars.com> writes:

Anders F Björklund wrote:
 When using Java (and Objective-C), I've found it very useful that 
 strings (and others) are immutable since they are then thread-safe.

Being able to treat strings as value types is where the big 
simplification (in user code) comes, and invariant strings should do that.

May 27 2007

Kirk McDonald <kirklin.mcdonald gmail.com> writes:

Marcin Kuszczak wrote:
 Derek Parnell wrote:
 
 Under the new const/invariant/final regime, what are strings going to be?
 Experience with other languages suggests that strings should be
 immutable.

 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very
 common in the sorts of programs I write.

 
 The same here. I don't have much experience with Java and really don't know
 why const strings are so useful...
 
 Maybe someone could elaborate a little bit more?
 

It might also be educational to look at Python, which also has immutable 
strings.

The first, and probably most important reason why strings are immutable 
in Python is so they can be used as hash keys. (Mutating an object being 
used as a hash key is bad, bad, bad.)
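
The same hazard exists for D's built-in associative arrays. The following is a minimal sketch in present-day D, where `string` is an alias for `immutable(char)[]` (the keyword spelled `invariant` in this thread was later renamed `immutable`). The function name countWords is an assumption made for the example.

```d
/// Counts occurrences, storing an immutable copy of each word as the key.
/// Hypothetical helper for illustration.
int[string] countWords(const(char[])[] words)
{
    int[string] counts;
    foreach (w; words)
        counts[w.idup]++;     // .idup makes an immutable copy for the key
    return counts;
}

void main()
{
    char[] buf = "index".dup;
    auto counts = countWords([buf, buf]);

    buf[0] = 'I';                      // later mutation of the source buffer
    assert(counts["index"] == 2);      // the stored key is unaffected
    assert("Index" !in counts);

    // A mutable char[] is rejected as a key when inserting.
    static assert(!__traits(compiles, counts[buf] = 1));
}
```

Because the key type is immutable, no other reference can change a key after it has been hashed.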

Other reasons are addressed here:
http://effbot.org/pyfaq/why-are-python-strings-immutable.htm

However, Python is a very different kind of language from D. Using 
strings as hash keys is extraordinarily important in Python, as the use 
of any identifier is in essence a hash lookup.

Providing immutable strings in D is very useful (so the compiler can 
enforce copy-on-write semantics, for instance), and I don't think anyone 
would dispute that. The issue seems to be whether the "default" string 
alias should be immutable. I would say, since D seems to subscribe to 
copy-on-write semantics, that it should be. And of course, if you need 
mutable strings, you will always be able to declare a char[].

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org

May 26 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very common
 in the sorts of programs I write.

You'll still be able to concatenate and slice invariant strings. You can 
also cast a char[] to an invariant, when you're done building it.
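
A minimal sketch of that workflow in present-day D, where `invariant` is spelled `immutable` and `string` is `immutable(char)[]`. The function name shout is an assumption made for the example; std.exception.assumeUnique is the Phobos wrapper around the cast, and as with the cast, the programmer has to guarantee that no other reference exists.

```d
import std.exception : assumeUnique;

/// Builds an upper-cased copy in a private mutable buffer, then freezes it.
/// Hypothetical helper; handles ASCII letters only.
string shout(string word)
{
    char[] buf = word.dup;                      // private mutable copy
    foreach (ref c; buf)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - 'a' + 'A');
    return assumeUnique(buf);                   // no other reference to buf exists
}

void main()
{
    string a = "const";
    string b = a ~ "ant";       // concatenation allocates a new string
    string c = b[0 .. 5];       // slicing shares the characters without copying
    assert(b == "constant" && c == "const");
    assert(c.ptr == b.ptr);

    static assert(!__traits(compiles, b[0] = 'C'));  // elements are read-only
    assert(shout("abc") == "ABC");
}
```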

 So 'const(char)[] x' means that I can change x.ptr and x.length but I
 cannot change anything that x.ptr points to, right?

Right.

 And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
 I cannot change anything that x.ptr points to, right?

Wrong. The difference between const and invariant is that invariant is 
truly, absolutely, immutable. Const is only immutable through the 
reference - another reference to the same data can change it.

 So what syntax is to be used so that x.ptr and x.length cannot be changed
 but the characters referred to by 'x' can be changed?

final char[] x;

May 26 2007

Reiner Pope <some address.com> writes:

Walter Bright wrote:
 Derek Parnell wrote:
 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very 
 common
 in the sorts of programs I write.

 
 You'll still be able to concatenate and slice invariant strings. You can 
 also cast a char[] to an invariant, when you're done building it.

Will there be something in the type system which enables you to safely 
say, "This is the only reference to this data, so it's ok for me to make 
this invariant" ? Does 'scope' happen to have anything to do with that?

invariant(char)[] createJunk()
{
     /* scope? */ char[] val = "aaaaa".dup;
     size_t index = rand() % 5;
     val[index] = rand();

     return cast(invariant(char)[]) val;
}

I mean, do I really need to cast it to invariant there? It's easy to see 
that there's only one copy of val's data in existence.

   -- Reiner

May 26 2007

Walter Bright <newshound1 digitalmars.com> writes:

Reiner Pope wrote:
 Will there be something in the type system which enables you to safely 
 say, "This is the only reference to this data, so it's ok for me to make 
 this invariant" ?

Safely? No. You will be able to explicitly cast to invariant, however, 
the programmer will have to ensure it is safe to do so.

 Does 'scope' happen to have anything to do with that?

No. Scope just ensures that the reference does not 'escape' the scope 
it's in.

 invariant(char)[] createJunk()
 {
     /* scope? */ char[] val = "aaaaa".dup;
     size_t index = rand() % 5;
     val[index] = rand();
 
     return cast(invariant(char)[]) val;
 }
 
 I mean, do I really need to cast it to invariant there? It's easy to see 
 that there's only one copy of val's data in existence.

Easy for you to see, not so easy for the compiler to. And besides:

	return cast(invariant)val;

will do the trick more conveniently.

May 26 2007

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Walter Bright wrote:
 Reiner Pope wrote:
 Will there be something in the type system which enables you to safely 
 say, "This is the only reference to this data, so it's ok for me to 
 make this invariant" ?

 
 Safely? No. You will be able to explicitly cast to invariant, however, 
 the programmer will have to ensure it is safe to do so.
 
 Does 'scope' happen to have anything to do with that?

 
 No. Scope just ensures that the reference does not 'escape' the scope 
 it's in.
 
 invariant(char)[] createJunk()
 {
     /* scope? */ char[] val = "aaaaa".dup;
     size_t index = rand() % 5;
     val[index] = rand();

     return cast(invariant(char)[]) val;
 }

 I mean, do I really need to cast it to invariant there? It's easy to 
 see that there's only one copy of val's data in existence.

 
 Easy for you to see, not so easy for the compiler to. And besides:
 
     return cast(invariant)val;
 
 will do the trick more conveniently.

That's an interesting syntax, casting to a trait/attribute with the rest of the
type inferred.  I presume cast(const) works as well.  (Maybe cast(scope)?  Then
again, what's the use...)  Given cast(*) where * is invariant/const, is
cast(*)T[] the same as cast(*(T)[]) or cast(*(T[]))?  That is, does the trait
apply to the element type, or the array?

-- Chris Nicholson-Sauls

May 26 2007

Walter Bright <newshound1 digitalmars.com> writes:

Chris Nicholson-Sauls wrote:
 That's an interesting syntax, casting to a trait/attribute with the rest 
 of the type inferred.  I presume cast(const) works as well.  (Maybe 
 cast(scope)?  Then again, what's the use...)  Given cast(*) where * is 
 invariant/const, is cast(*)T[] the same as cast(*(T)[]) or 
 cast(*(T[]))?  That is, does the trait apply to the element type, or the 
 array?

Both.

May 26 2007

Reiner Pope <some address.com> writes:

Walter Bright wrote:
 Reiner Pope wrote:
 Will there be something in the type system which enables you to safely 
 say, "This is the only reference to this data, so it's ok for me to 
 make this invariant" ?

 
 Safely? No. You will be able to explicitly cast to invariant, however, 
 the programmer will have to ensure it is safe to do so.
 
 Does 'scope' happen to have anything to do with that?

 
 No. Scope just ensures that the reference does not 'escape' the scope 
 it's in.

I must have misunderstood what scope specifies. I had thought that, to 
avoid being escaped, scope specified that your variable may not be 
aliased by another (non-scope) name. In that case, I thought, can't you 
say: "well, when I leave this function, I'm the only one holding a 
reference to this data, so it would be safe to call it invariant (or 
anything else I choose)." I thought a compiler could have a special case 
saying, "at the end of scope, you can safely turn any scope variables 
into whatever you want".

However, I was surprised to find out that the following code compiled 
fine, although it returns a dead object:

Foo foo()
{
     scope Foo f = new Foo();
     Foo g = f;
     return g;
}


   -- Reiner

May 26 2007

Walter Bright <newshound1 digitalmars.com> writes:

Reiner Pope wrote:
 However, I was surprised to find out that the following code compiled 
 fine, although it returns a dead object:

Sadly, it currently isn't enforced.

May 26 2007

Derek Parnell <derek psych.ward> writes:

On Sat, 26 May 2007 22:27:18 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very common
 in the sorts of programs I write.

 
 You'll still be able to concatenate and slice invariant strings. You can 
 also cast a char[] to an invariant, when you're done building it.

While that is interesting, it has not much to do with what I was saying.

You said "strings should be immutable" and I'm saying that seems odd because
my experience is that most strings are meant to be changed. 

So now I'm thinking that we are talking about different things when we use
the word "string". I'm guessing you are really referring to compile-time
generated string data (e.g. literals) rather than run-time generated string
data.


 So 'const(char)[] x' means that I can change x.ptr and x.length but I
 cannot change anything that x.ptr points to, right?

 
 Right.
 
 And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
 I cannot change anything that x.ptr points to, right?

 
 Wrong. The difference between const and invariant is that invariant is 
 truly, absolutely, immutable. 

Huh??? Isn't that what I just said? Now I'm even more confused about these
terms. They are just not intuitive, are they?

 Const is only immutable through the 
 reference - another reference to the same data can change it.

Ok ... so this below won't fail ...

  void func(const char[] parm)
  {
      char [] q;
      q = parm;
      q[0] = 'a';
  }

or is the "q = parm" not really permitted.

 So what syntax is to be used so that x.ptr and x.length cannot be changed
 but the characters referred to by 'x' can be changed?

 
 final char[] x;


Given the syntax of the form "  void func(<X> char[] parm) ", is the table
below true ...

*-------------------------------------*
| <X>         | parm.ptr  | parm[0]   |
|-------------+-----------+-----------|
| const       | mutable   | immutable |
| final       | immutable | mutable   |
| invariant   | immutable | immutable |
| (none)      | mutable   | mutable   |
*-------------------------------------*


I'm sorry I'm a bit slow on this ... but what is the difference between
"invariant" and "const final" ? Is it that "invariant" is sort of a global
effect but "const final" is only in effect for the specific reference it
occurs on.

I'm not looking forward to reading the docs on this. I hope you get a lot
of people to edit the docs to make it understandable for everyone.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

May 26 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 You said "strings should be immutable" and I'm saying that seems odd because
 my experience is that most strings are meant to be changed. 

I'm going to argue that your experience is unusual. I do a lot of string 
manipulation (after all, that's what a compiler does) and the strings, 
once constructed, are essentially always immutable. In conversations 
with many others, my experience is commonplace.

But still, in D, nothing prevents you from using mutable strings.

 So now I'm thinking that we are talking about different things when we use
 the word "string". I'm guessing you are really referring to compile-time
 generated string data (e.g. literals) rather than run-time generated string
 data.

I'm referring to the arrays of characters, generated or literals.


 So 'const(char)[] x' means that I can change x.ptr and x.length but I
 cannot change anything that x.ptr points to, right?

 Right.

 And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
 I cannot change anything that x.ptr points to, right?

 Wrong. The difference between const and invariant is that invariant is 
 truly, absolutely, immutable. 

 
 Huh??? Isn't that what I just said?

No. You said for const you could change x.ptr and x.length, but for 
invariant you could not. For both const and invariant, you can change 
x.ptr and x.length.


 Now I'm even more confused about these
 terms. They are just not intuitive, are they?

The problem is I have failed to explain them. Invariant data can go into 
read-only memory. Const data can be changed by another reference to the 
same data (just like in C++). In other words, const is a read-only 
*view* of the data, whereas invariant data is read-only for all views of it.
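
The distinction can be sketched for arrays in present-day D, where `invariant` is spelled `immutable`. The function name viewAfterOwnerWrite is an assumption made for the example.

```d
/// Returns what a const view shows after the owner writes to the shared data.
/// Hypothetical helper for illustration.
string viewAfterOwnerWrite()
{
    char[] owner = "abc".dup;
    const(char)[] view = owner;          // read-only view of mutable data
    static assert(!__traits(compiles, view[0] = 'x'));  // no writes via the view
    owner[0] = 'x';                      // but the owner can still write
    return view.idup;                    // the view observes the change
}

void main()
{
    assert(viewAfterOwnerWrite() == "xbc");

    char[] scratch = "tmp".dup;
    immutable(char)[] fixed = "abc";
    static assert(!__traits(compiles, fixed[0] = 'x'));   // elements never change
    static assert(!__traits(compiles, fixed = scratch));  // mutable data is rejected
    fixed = "def";                       // the slice itself can be rebound
    fixed.length = 2;                    // and shortened
    assert(fixed == "de");

    const(char)[] view = fixed;          // immutable converts implicitly to const
    assert(view == "de");
}
```

In both cases the qualifier applies to the characters, not to the slice variable, which is why .ptr and .length stay assignable.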


 Const is only immutable through the 
 reference - another reference to the same data can change it.

 
 Ok ... so this below won't fail ...
 
   void func(const char[] parm)
   {
       char [] q;
       q = parm;

error, q is not const.
       q[0] = 'a';
   }
 
 or is the "q = parm" not really permitted.

Right.

 
 So what syntax is to be used so that x.ptr and x.length cannot be changed
 but the characters referred to by 'x' can be changed?

 final char[] x;

 
 
 Given the syntax of the form "  void func(<X> char[] parm) ", is the table
 below true ...
 
 *-------------------------------------*
 | <X>         | parm.ptr  | parm[0]   |
 |-------------+-----------+-----------|
 | const       | mutable   | immutable |
 | final       | immutable | mutable   |
 | invariant   | immutable | immutable |
 | (none)      | mutable   | mutable   |
 *-------------------------------------*

You've got invariant wrong, it's mutable|immutable.


 I'm sorry I'm a bit slow on this ... but what is the difference between
 "invariant" and "const final" ? Is it that "invariant" is sort of a global
 effect but "const final" is only in effect for the specific reference it
 occurs on.

First differences: final is a *storage class*. const and invariant are 
*type constructors*.

final only refers to the actual value that a symbol has, and it means 
that, once a value is assigned to a symbol, that value can never change. 
If the value is a pointer or reference, what it points to *can* be changed.

int x = 3;
final int* p = &x;
p = null; // error, p is final
*p = 1; // ok

const(int)* q = null;
q = &x;  // ok, q is not const, and now *q is 1
*q = 2;  // error, *q is const
*p = 5;  // ok, but now *q is 5, too!
x = 6;   // ok, but now *q is 6

invariant(int)* s = null;
s = &x;  // error, cannot implicitly convert int* to invariant(int)*
int y = 4;
s = cast(invariant(int)*)&y; // ok, trust programmer that y is immutable
*s = 3;  // error, *s is immutable
y = 5;   // undefined behavior, as y is never supposed to change,
          // and compiler assumes *s is still 4

Note that int* can be implicitly converted to const(int)*, and 
invariant(int)* can be implicitly converted to const(int)*.
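
The practical consequence for function signatures can be sketched in present-day D (with `invariant` spelled `immutable`): a parameter typed as const accepts both mutable and immutable arguments. The function name countSpaces is an assumption made for the example.

```d
/// Accepts mutable, const and immutable character arrays alike.
/// Hypothetical helper for illustration.
size_t countSpaces(const(char)[] text)
{
    size_t n;
    foreach (char ch; text)
        if (ch == ' ')
            ++n;
    return n;
}

void main()
{
    char[] m = "a b c".dup;     // mutable
    string i = "d e";           // immutable(char)[]
    assert(countSpaces(m) == 2);
    assert(countSpaces(i) == 1);

    // Both convert implicitly to const, but const does not convert back.
    static assert(is(char[] : const(char)[]));
    static assert(is(string : const(char)[]));
    static assert(!is(const(char)[] : char[]));
    static assert(!is(const(char)[] : string));
}
```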

 I'm not looking forward to reading the docs on this. I hope you get a lot
 of people to edit the docs to make it understandable for everyone.

The thing is actually rather simple, but I am having trouble finding the 
right words to express it. Certainly, the mishmash of C++ const has 
badly muddied the waters about what const means.

May 27 2007

Derek Parnell <derek psych.ward> writes:

On Sun, 27 May 2007 01:09:40 -0700, Walter Bright wrote:

Thanks for taking the time out to help me understand the proposed D
changes. I really appreciate it.

I think that I'm going to have to wait until you have an implementation to
try it on; to see how it fits with my terminology and needs.

 Derek Parnell wrote:
 You said "strings should be immutable" and I'm saying that seems odd because
 my experience is that most strings are meant to be changed. 

 
 I'm going to argue that your experience is unusual. I do a lot of string 
 manipulation (after all, that's what a compiler does) and the strings, 
 once constructed, are essentially always immutable. In conversations 
 with many others, my experience is commonplace.

Ok, we'll leave it at that then. However, the phrase "once constructed" is the
key one, I suspect. It's like saying, once I've finished changing things I
don't want them to change anymore - no argument there. So the idea would be
to work with mutable strings until they are finished being constructed and
then cast them to immutable for the rest of the run time. I'm thinking here
of things like changing case, macro expansion, standardizing file names,
constructing message text, etc ...

 But still, in D, nothing prevents you from using mutable strings.

That's why I can see that I'll be continuing to use 'alias char[] string',
unless you make 'string' the immutable beastie of course <g>

 So 'const(char)[] x' means that I can change x.ptr and x.length but I
 cannot change anything that x.ptr points to, right?

 Right.

 And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
 I cannot change anything that x.ptr points to, right?

 Wrong. The difference between const and invariant is that invariant is 
 truly, absolutely, immutable. 

 
 Huh??? Isn't that what I just said?

 
 No. You said for const you could change x.ptr and x.length, but for 
 invariant you could not. For both const and invariant, you can change 
 x.ptr and x.length.

See, this is what is weird ... I can have an invariant string which can be
changed, thus making it not really invariant in the English language sense.
I'm still thinking that "invariant" means "does not change ever". 

But it seems that I'm wrong ...

 invariant char[] x; 
 x = "abc".dup;  // The string 'x' now contains "abc";
 x = "def".dup;  // The string (which is not supposed to change
                 // i.e invariant) has been changed to "def".

Now this is counter-intuitive (read: *WEIRD*), no?

 Now I'm even more confused about these
 terms. They are just not intuitive, are they?

 
 The problem is I have failed to explain them. Invariant data can go into 
 read-only memory. Const data can be changed by another reference to the 
 same data (just like in C++). In other words, const is a read-only 
 *view* of the data, whereas invariant data is read-only for all views of it.

Okay, I've got that now ... but how to remember that two terms that mean
the same in English actually mean different things in D <G>

I think I read that someone suggested that 'const' be a contraction of
'constrained' rather than 'constant' - that might help. And that
'invariant' is longer than 'const' so its effect is 'bigger'.

  invariant char[] x; // The data pointed to by 'x' cannot be changed
                      // by anything anytime during the execution
                      // of the program.
                      // (So how do I populate it then? Hmmmm ...)

  const char[] y;    // The data pointed to by 'y' cannot be changed
                     // by anything anytime during the execution
                     // of the program when using the 'y' variable,
                     // however using another variable that also
                     // refers to y's data, or some of it, is ok.

For example ...

  void func (const char[] a, char[] b)
  {
        a[0] = 'a'; // fails
        b[0] = 'a'; // succeeds
  }

  char[] y = "def".dup;
  func( y, y);
  
 I'm sorry I'm a bit slow on this ... but what is the difference between
 "invariant" and "const final" ? Is it that "invariant" is sort of a global
 effect but "const final" is only in effect for the specific reference it
 occurs on.

 
 First differences: final is a *storage class*. const and invariant are 
 *type constructors*.

Thanks. So 'final' means that it can be changed (from its initial default
value) once and only once.


  final int r;
  r = randomer(); // succeeds
  foo(); // fails 

  int randomer() { 
      // Get a random integer between -100 and 100.
      return cast(int)(std.random.rand() % 201) - 100; 
  }
  void foo() { 
    r = randomer(); // success depends on whether or not 'r' 
                    // has already been set.
  }



  final int r;

  foo(); // succeeds
  r = randomer(); // fails

  int randomer() { 
      // Get a random integer between -100 and 100.
      return cast(int)(std.random.rand() % 201) - 100; 
  }
  void foo() { 
    r = randomer(); // success depends on whether or not 'r' 
                    // has already been set.
  }

Is this a run-time check or a compile-time one? If run-time, would it be
possible to somehow 'unfinal' a variable using some implementation-dependent
trickery?

 I'm not looking forward to reading the docs on this. I hope you get a lot
 of people to edit the docs to make it understandable for everyone.

 
 The thing is actually rather simple, but I am having trouble finding the 
 right words to express it. 

And thus my comment re editors.

 Certainly, the mishmash of C++ const has 
 badly muddied the waters about what const means.

I have no real knowledge of C++ or its const, and I'm still weirded out by
it all <G>

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

May 27 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 See, this is what is weird ... I can have an invariant string which can be
 changed, thus making it not really invariant in the English language sense.
 I'm still thinking that "invariant" means "does not change ever". 

Where you're going wrong is that there are two parts to a dynamic array 
- the contents of the array, and the ptr/length values of the array.

	invariant(char)[]

immutalizes (look ma! I coined a new word!) only the contents of the array.

	invariant(char[])

immutalizes the contents and the ptr/length values.


 But it seems that I'm wrong ...
 
  invariant char[] x; 
  x = "abc".dup;  // The string 'x' now contains "abc";
  x = "def".dup;  // The string (which is not supposed to change
                  // i.e invariant) has been changed to "def".
 
 Now this is counter-intuitive (read: *WEIRD*), no?

The first issue is that you've confused:
	invariant char[] x;
with:
	invariant(char)[] x;

Remember, there are TWO parts to an array, and the invariantness can be 
controlled for either independently, or both. This is no different from 
C++, where there are two parts to a char*: the char part and the pointer part.

 Okay, I've got that now ... but how to remember that two terms that mean
 the same in English actually mean different things in D <G>

English is imprecise and ambiguous, that's why we have mathematical 
languages, and programming languages.

   invariant char[] x; // The data pointed to by 'x' cannot be changed
                       // by anything anytime during the execution
                       // of the program.
                       // (So how do I populate it then? Hmmmm ...)

You can't populate an invariant(char)[] array (which is what you meant, 
not invariant char[]). The way to get one is to cast an existing array 
to invariant.

   const char[] y;    // The data pointed to by 'y' cannot be changed
                      // by anything anytime during the execution
                      // of the program when using the 'y' variable,
                      // however using another variable that also
                      // refers to y's data, or some of it, is ok.

Yes, but here again, const(char)[].

 For example ...
 
   void func (const char[] a, char[] b)
   {
         a[0] = 'a'; // fails
         b[0] = 'a'; // succeeds
   }
 
   char[] y = "def".dup;
   func( y, y);

Yup, that's the aliasing issue with const.


 Thanks. So 'final' means that it can be changed (from its initial default
 value) once and only once.

No. 'final' means it is set only at initialization.


   final int r;
   r = randomer(); // succeeds

Nope, this fails. Try:
	final int r = randomer();
   foo(); // fails 
 
   int randomer() { 
       // Get a random integer between -100 and 100.
       return cast(int)(std.random.rand() % 201) - 100; 
   }
   void foo() { 
     r = randomer(); // success depends on whether or not 'r' 
                     // has already been set.

No, this assignment always fails.
   }
 

 Is this a run-time check or a compile time one?

Compile time.

 If run-time, would it be
 possible to somehow 'unfinal' a variable using some implementation
 dependent trickery.

Yes, but the result is undefined behavior. Just like if you went around 
the typing system and converted an int into a pointer, and tried to 
access data with it. You can do it, but you're on your own with that.


 I have no real knowledge of C++ or its const, and I'm still weirded out by
 it all <G>

I'm beginning to realize that unless one understands how types are 
represented at run time, one will never understand const.

May 27 2007
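
The two-part picture Walter describes (array contents versus the ptr/length pair) can be sketched in code. This is only an illustration under an assumption: it uses present-day D, where the `invariant` qualifier discussed in this thread is spelled `immutable`, so the spelling differs from the 2007 proposal. The function name `rebindDemo` is made up for the example.

```d
import std.stdio;

// Returns what `a` refers to after being rebound twice.
string rebindDemo()
{
    // immutable(char)[]: the characters are fixed, the ptr/length pair is not.
    immutable(char)[] a = "abc";
    a = "def";        // allowed: the slice now refers to different data
    a = a[1 .. $];    // allowed: reslicing only changes ptr/length
    // a[0] = 'x';    // rejected at compile time: contents are immutable
    return a;
}

void main()
{
    // immutable(char[]): contents and ptr/length are both fixed.
    immutable(char[]) b = "abc";
    // b = "def";     // rejected at compile time: cannot rebind
    // b[0] = 'x';    // rejected at compile time: cannot modify contents

    writeln(rebindDemo()); // ef
    writeln(b);            // abc
}
```

The commented-out lines are the ones the compiler refuses, which matches the answer given above that these checks happen at compile time.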

Derek Parnell <derek psych.ward> writes:

On Sun, 27 May 2007 12:06:06 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 See, this is what is weird ... I can have an invariant string which can be
 changed, thus making it not really invariant in the English language sense.
 I'm still thinking that "invariant" means "does not change ever". 

 
 Where you're going wrong is that there are two parts to a dynamic array 
 - the contents of the array, and the ptr/length values of the array.

 	invariant(char)[]
 
 immutalizes (look ma! I coined a new word!) only the contents of the array.
 
 	invariant(char[])
 
 immutalizes the contents and the ptr/length values.

I know that you know that I know this about arrays already (did I really
just say that!?) so I assume you are talking to the greater audience that
we have here.

So to immutalize (see it must be a real word as someone else is using
it<g>) just the ptr/length parts I'd use ...

  invariant char([])   ?????
  char invariant[]     ????

and 

  invariant char[]

is the same as 

  invariant (char[])

right?


 But it seems that I'm wrong ...
 
  invariant char[] x; 
  x = "abc".dup;  // The string 'x' now contains "abc";
  x = "def".dup;  // The string (which is not supposed to change
                  // i.e invariant) has been changed to "def".
 
 Now this is counter-intuitive (read: *WEIRD*), no?


In my thinking the term 'string' refers to the whole ptr/length/content
group. So when one says that a string is immutable I'm thinking they are
saying that every aspect of the string does not change. This is where I
suspect that we are having terminology problems.
 
 The first issue is that you've confused:
 	invariant char[] x;
 with:
 	invariant(char)[] x;

Yep - guilty as charged, your honour. Actually it is not so much confusion
rather just a poor typing regime, as I really did understand the difference
but I typed in the wrong thing. But let's continue ...

 Remember, there are TWO parts to an array, and the invariantness can be 
 controlled for either independently, or both. This isn't different from 
 in C++ there are two parts to a char*, the char part, and the pointer part.

What is the syntax for controlling *just* the reference part of an array?

 Okay, I've got that now ... but how to remember that two terms that mean
 the same in English actually mean different things in D <G>

 
 English is imprecise and ambiguous, that's why we have mathematical 
 languages, and programming languages.

Has anyone got a dictionary in which "constant" and "invariant" are not
synonyms? Sure I agree that "English is imprecise and ambiguous" when taken
as a whole but not every word is such. So when one uses English words in a
programming language the natural thing is to assume that the programming
language meaning has a high degree of correlation with the English meaning.


   invariant char[] x; // The data pointed to by 'x' cannot be changed
                       // by anything anytime during the execution
                       // of the program.
                       // (So how do I populate it then? Hmmmm ...)

 
 You can't populate an invariant(char)[] array (which is what you meant, 
 not invariant char[]). The way to get one is to cast an existing array 
 to invariant.

  char[] name;
  name = GetUserName();

  invariant (char)[] newb = cast(invariant)name;

  void foo() { name[0] = toUpperCase(name[0]); }  // Is this valid?

  foo(); // What about this?

   const char[] y;    // The data pointed to by 'y' cannot be changed
                      // by anything anytime during the execution
                      // of the program when using the 'y' variable,
                      // however using another variable that also
                      // refers to y's data, or some of it, is ok.

 
 Yes, but here again, const(char)[].

Yeah yeah yeah ... I can see how an alias is going to be a boon.


 Thanks. So 'final' means that it can be changed (from its initial default
 value) once and only once.

 
 No. 'final' means it is set only at initialization.

And initialization means "on the same statement that declares the
variable"? 

In English, initialization means whenever some thing is initialized rather
than one specific type of initialization.



   final int r;


Ok, so the above "initializes" the symbol to zero, being the default value
of an int, and it cannot be changed to anything else now.

   r = randomer(); // succeeds

 Nope, this fails. Try:
 	final int r = randomer();

Got it.

 I have no real knowledge of C++ or its const, and I'm still weirded out by
 it all <G>

 
 I'm beginning to realize that unless one understands how types are 
 represented at run time, one will never understand const.

Nah, it's probably just me that's being thick, ... and I *do* understand
the run-time implementation of the D constructs.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

May 27 2007
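
The `func(y, y)` aliasing case and the question of how to obtain invariant data from a mutable array can be tried out as follows. A sketch only, assuming present-day D (`immutable` in place of `invariant`); `firstAfterWrite` and `aliasDemo` are hypothetical names. Rather than casting, it copies with `.idup`, because after a cast the compiler cannot check that no mutable reference to the same data remains.

```d
import std.stdio;

// Reads through a const view after writing through a mutable one.
char firstAfterWrite(const(char)[] view, char[] writable)
{
    // view[0] = 'a';   // rejected at compile time: `view` is read-only
    writable[0] = 'a';  // allowed, and visible through `view` when they alias
    return view[0];
}

// Passes the same array as both arguments, as in func(y, y).
char aliasDemo()
{
    char[] y = "def".dup;
    return firstAfterWrite(y, y);
}

void main()
{
    writeln(aliasDemo()); // a

    // Copying gives immutable data that no mutable reference can reach.
    char[] name = "derek".dup;
    immutable(char)[] frozen = name.idup;
    name[0] = 'D';        // changes only the mutable original
    writeln(name);        // Derek
    writeln(frozen);      // derek
}
```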

Charles D Hixson <charleshixsn earthlink.net> writes:

Derek Parnell wrote:
 On Sun, 27 May 2007 12:06:06 -0700, Walter Bright wrote:
 
 Derek Parnell wrote:
 See, this is what is weird ... I can have an invariant string which can be
 changed, thus making it not really invariant in the English language sense.
 I'm still thinking that "invariant" means "does not change ever". 

 Where you're going wrong is that there are two parts to a dynamic array 
 - the contents of the array, and the ptr/length values of the array.

 	invariant(char)[]

 immutalizes (look ma! I coined a new word!) only the contents of the array.

 	invariant(char[])

 immutalizes the contents and the ptr/length values.

 
 I know that you know that I know this about arrays already (did I really
 just say that!?) so I assume you are talking to the greater audience that
 we have here.
 
 So to immutalize (see it must be a real word as someone else is using
 it<g>) just the ptr/length parts I'd use ...
 
   invariant char([])   ?????
   char invariant[]     ????
 
 and 
 
   invariant char[]
 
 is the same as 
 
   invariant (char[])
 
 right?
 
 
 But it seems that I'm wrong ...

  invariant char[] x; 
  x = "abc".dup;  // The string 'x' now contains "abc";
  x = "def".dup;  // The string (which is not supposed to change
                  // i.e invariant) has been changed to "def".

 Now this is counter-intuitive (read: *WEIRD*), no?


 
 In my thinking the term 'string' refers to the whole ptr/length/content
 group. So when one says that a string is immutable I'm thinking they are
 saying that every aspect of the string does not change. This is where I
 suspect that we are having terminology problems.
  
 The first issue is that you've confused:
 	invariant char[] x;
 with:
 	invariant(char)[] x;

 
 Yep - guilty as charged, your honour. Actually it is not so much confusion
 rather just a poor typing regime, as I really did understand the difference
 but I typed in the wrong thing. But let's continue ...
 
 Remember, there are TWO parts to an array, and the invariantness can be 
 controlled for either independently, or both. This isn't different from 
 in C++ there are two parts to a char*, the char part, and the pointer part.

 
 What is the syntax for controlling *just* the reference part of an array?
 
 Okay, I've got that now ... but how to remember that two terms that mean
 the same in English actually mean different things in D <G>

 English is imprecise and ambiguous, that's why we have mathematical 
 languages, and programming languages.

 
 Has anyone got a dictionary in which "constant" and "invariant" are not
 synonyms? Sure I agree that "English is imprecise and ambiguous" when taken
 as a whole but not every word is such. So when one uses English words in a
 programming language the natural thing is to assume that the programming
 language meaning has a high degree of correlation with the English meaning.
 
 
   invariant char[] x; // The data pointed to by 'x' cannot be changed
                       // by anything anytime during the execution
                       // of the program.
                       // (So how do I populate it then? Hmmmm ...)

 You can't populate an invariant(char)[] array (which is what you meant, 
 not invariant char[]). The way to get one is to cast an existing array 
 to invariant.

 
   char[] name;
   name = GetUserName();
 
   invariant (char)[] newb = cast(invariant)name;
 
   void foo() { name[0] = toUpperCase(name[0]); }  // Is this valid?
 
   foo(); // What about this?
 
   const char[] y;    // The data pointed to by 'y' cannot be changed
                      // by anything anytime during the execution
                      // of the program when using the 'y' variable,
                      // however using another variable that also
                      // refers to y's data, or some of it, is ok.

 Yes, but here again, const(char)[].

 
 Yeah yeah yeah ... I can see how an alias is going to be a boon.
 
 
 Thanks. So 'final' means that it can be changed (from its initial default
 value) once and only once.

 No. 'final' means it is set only at initialization.

 
 And initialization means "on the same statement that declares the
 variable"? 
 
 In English, initialization means whenever some thing is initialized rather
 than one specific type of initialization.
 
 

   final int r;


 
 Ok, so the above "initializes" the symbol to zero, being the default value
 of an int, and it cannot be changed to anything else now.
 
   r = randomer(); // succeeds

 Nope, this fails. Try:
 	final int r = randomer();

 
 Got it.
 
 I have no real knowledge of C++ or its const, and I'm still weirded out by
 it all <G>

 I'm beginning to realize that unless one understands how types are 
 represented at run time, one will never understand const.

 
 Nah, it's probably just me that's being thick, ... and I *do* understand
 the run-time implementation of the D constructs.
 

FWIW, I feel the documentation is going to need LOTS of 
examples.  The text is sufficient to point folk in the right 
general direction, but the examples will be necessary to 
highlight the minimal distinctions.

And as my C++ is really quite minimal, and predates templates 
being generally available, I don't think I'm being confused by 
how C++ uses it.

OTOH, I frequently need to do things like:
char[] stuff	=	"alperferous";
stuff	=	stuff[0..5] ~ "if" ~ stuff[5..length];
(silly example, but it's short!)
Given what I've read so far I suppose this means that I just 
keep avoiding const & invariant, but I do think of this as 
string manipulation, and thus "Strings are constant by default" 
sets warning bells ringing.  (Probably inappropriately, 
admittedly.  But perhaps this should be said differently in 
the documentation.)

Jun 10 2007
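
The kind of manipulation in the post above still works when only the characters are immutable, because concatenation builds a new array and the variable is merely rebound to it. A sketch, assuming present-day D, where `string` is an alias for `immutable(char)[]` and `$` replaces `length` inside a slice expression; `insertIf` is a made-up name.

```d
import std.stdio;

// Splices "if" into the text at index 5 by building a new string.
string insertIf(string stuff)
{
    // stuff[0] = 'A';   // rejected at compile time: characters are immutable
    stuff = stuff[0 .. 5] ~ "if" ~ stuff[5 .. $]; // allowed: rebinds `stuff`
    return stuff;
}

void main()
{
    writeln(insertIf("alperferous")); // alperiferous
}
```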

Regan Heath <regan netmail.co.nz> writes:

Walter Bright Wrote:
 	alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.

I like it all, except the alias.  I would prefer 'string'.  'cstring' implies
C's string to me, for example I often alias std.string.toStringz to cstr or
CSTR.  I think wstring and dstring are ok, basically I'd like:

char, string
wchar, wstring
dchar, dstring

Is it really a problem that std.string is a module name?

I don't reckon it's a very common variable name, for example:

1. I wouldn't go to the trouble of typing 'string' for a throw away variable
when I could use 'p', 's', or 'str'.  

2. Likewise for a more long lived variable I would use something more
descriptive i.e. nameString, ageString, promptString, boundaryString, ...

Slightly OT:  I think once we have const etc and 'string' working as desired
then for many applications there will be no need for an additional String class.

Note that I said 'many applications' above;  I think that those applications
that make heavy use of many different text encodings and/or languages may still
want a 'String' (or 'Text') class.

This class would provide the extra functionality that isn't inherent in
string, wstring, dstring, things like:

1. leveraging iconv (or similar) to handle various encodings.

2. choosing the best internal format string, wstring, dstring for the text
based on the language used.

3. slicing on character boundaries regardless of internal format.

.. and probably other things I haven't thought of here.

Regan

May 26 2007
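
For reference, the char/string, wchar/wstring, dchar/dstring naming asked for above is what present-day D uses, though with `immutable` rather than `const` element types. The check below is a sketch that only restates those aliases as they exist today; it says nothing about what had been decided at this point in the thread.

```d
// string, wstring and dstring as defined in present-day D.
static assert(is(string  == immutable(char)[]));
static assert(is(wstring == immutable(wchar)[]));
static assert(is(dstring == immutable(dchar)[]));

void main()
{
    string  s = "abc";   // UTF-8 code units
    wstring w = "abc"w;  // UTF-16 code units
    dstring d = "abc"d;  // UTF-32 code units
    assert(s.length == 3 && w.length == 3 && d.length == 3);
}
```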

Sam Phillips <dont-spam-sambeau mac.com> writes:

Regan Heath Wrote:
 I like it all, except the alias.  I would prefer 'string'.  'cstring' implies
C's string to me, for example I often alias std.string.toStringz to cstr or
CSTR.  I think wstring and dstring are ok, basically I'd like:
 
 char, string
 wchar, wstring
 dchar, dstring
 
 Is it really a problem that std.string is a module name?

</lurk>
Concur
<lurk>

May 26 2007

janderson <askme me.com> writes:

Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
     alias const(char)[] cstring;
 

[snip]

If you decide on an alias it would be a good idea to add it to phobos 
for DMD 1, except without the const syntax of course.  That way people 
can start using it now and have fewer problems upgrading to DMD 2. 
Although, on the other hand, I guess it may be slightly confusing for 
1.0 coders when it doesn't function as a const string.

-Joel

May 26 2007

Derek Parnell <derek psych.ward> writes:

On Fri, 25 May 2007 19:47:24 -0700, Walter Bright wrote:

 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
 	const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
 	alias const(char)[] cstring;
 

 const(char)[]  // A mutable array of immutable characters?
 const(char[])  // An immutable array of mutable characters?
 const(const(char)[]) // An immutable array of immutable characters?
 char[]         // A mutable array of mutable characters?

What will happen with the .reverse and .sort array properties when used
with const, invariant, and final qualifiers?

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

May 27 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?
 
 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

They'll all fail.

May 27 2007

Derek Parnell <derek psych.ward> writes:

On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:

 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?



Any comment on the above?


 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

 
 They'll all fail.

Good, but when? At run time or compile time?

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

May 27 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:
 
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?


 
 
 Any comment on the above?

Looks right to me.

 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

 They'll all fail.

 
 Good, but when? At run time or compile time?

compile time.

May 27 2007

Derek Parnell <derek psych.ward> writes:

On Sun, 27 May 2007 16:32:57 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:
 
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?


 
 
 Any comment on the above?

 
 Looks right to me.

But didn't you say that "invariant char[]" means that "invariant" applies
to both the array reference and the contents? In other words it's the same
as "invariant (char[])" but above I said that this means that the array is
immutable but the contents are not. 

What is the syntax for an immutable array of mutable characters?

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

May 27 2007

Walter Bright <newshound1 digitalmars.com> writes:

Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I 
haven't run across a legitimate use for it yet.

Jun 02 2007

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

 
 There isn't one. Such a construct is appealing in the abstract, but I 
 haven't run across a legitimate use for it yet.

Are we only talking strings here or general arrays? Because if general 
arrays are concerned, I can come up with an example.

An immutable array of mutable data for... e.g. render to texture in a 
software renderer (or creating data for a hw texture, or whatnot) So you 
basically pass a texture buffer to a function. You don't want it to 
realloc the buffer, just to modify its contents...

What am I missing here? ;)


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Jun 02 2007

Walter Bright <newshound1 digitalmars.com> writes:

Tom S wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

 There isn't one. Such a construct is appealing in the abstract, but I 
 haven't run across a legitimate use for it yet.

 
 Are we only talking strings here or general arrays? Because if general 
 arrays are concerned, I can come up with an example.

In general.


 An immutable array of mutable data for... e.g. render to texture in a 
 software renderer (or creating data for a hw texture, or whatnot) So you 
 basically pass a texture buffer to a function. You don't want it to 
 realloc the buffer, just to modify its contents...
 
 What am I missing here? ;)

We can all come up with an example; the more interesting question is whether 
it is a compelling example. I'm not seeing that.

Jun 02 2007

Tom S <h3r3tic remove.mat.uni.torun.pl> writes:

Walter Bright wrote:
 We can all come up with an example, the more interesting case is is it a 
 compelling example? I'm not seeing that.

Well, it's based on a true story.


-- 
Tomasz Stachowiak
http://h3.team0xf.com/
h3/h3r3tic on #D freenode

Jun 02 2007

Derek Parnell <derek psych.ward> writes:

On Sat, 02 Jun 2007 18:25:46 -0700, Walter Bright wrote:

 Tom S wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

 There isn't one. Such a construct is appealing in the abstract, but I 
 haven't run across a legitimate use for it yet.

 
 Are we only talking strings here or general arrays? Because if general 
 arrays are concerned, I can come up with an example.

 
 In general.
 
 
 An immutable array of mutable data for... e.g. render to texture in a 
 software renderer (or creating data for a hw texture, or whatnot) So you 
 basically pass a texture buffer to a function. You don't want it to 
 realloc the buffer, just to modify its contents...
 
 What am I missing here? ;)

 
 We can all come up with an example, the more interesting case is is it a 
 compelling example? I'm not seeing that.

Define 'compelling'. 

The only workaround I can see is a bit restrictive ...

  final TextureBuffer t = CreateTextureBuffer();

  RenderToBuffer( t );
  DoLighting(t);
  ...


-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

Jun 02 2007

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

 There isn't one. Such a construct is appealing in the abstract, but I 
 haven't run across a legitimate use for it yet.

 Are we only talking strings here or general arrays? Because if general 
 arrays are concerned, I can come up with an example.

 
 In general.
 
 
 An immutable array of mutable data for... e.g. render to texture in a 
 software renderer (or creating data for a hw texture, or whatnot) So 
 you basically pass a texture buffer to a function. You don't want it 
 to realloc the buffer, just to modify its contents...

 What am I missing here? ;)

 
 We can all come up with an example, the more interesting case is is it a 
 compelling example? I'm not seeing that.

Most array algorithms would apply.  But I'm still not sure I see the 
point of having an immutable reference, because it's just passed by 
value anyway.  Who cares if the size of the array is modified within a 
function where it's not passed by reference?  The change is just local 
to the function anyway.


Sean

Jun 03 2007
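
Sean's point about slices being passed by value can be checked directly. A minimal sketch in present-day D; `fill` and `callerLengthAfterFill` are hypothetical names, and the `ubyte` buffer merely stands in for the texture buffer from the example.

```d
import std.stdio;

// Overwrites every element, then rebinds the local slice.
void fill(ubyte[] buf, ubyte value)
{
    buf[] = value;      // writes go to the caller's data
    buf = buf[0 .. 0];  // changes only this function's copy of ptr/length
}

// Returns the caller-side length after calling fill.
size_t callerLengthAfterFill()
{
    auto texture = new ubyte[](4);
    fill(texture, 7);
    return texture.length;
}

void main()
{
    auto texture = new ubyte[](4);
    fill(texture, 7);
    writeln(texture);                 // [7, 7, 7, 7]
    writeln(callerLengthAfterFill()); // 4
}
```

The caller's length is untouched, but the sketch does not cover the case where a callee grows its local slice and then keeps writing, possibly into a reallocated copy instead of the original buffer.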

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message 
news:f3uoj5$1b1i$1 digitalmars.com...

 Most array algorithms would apply.  But I'm still not sure I see the point 
 of having an immutable reference, because it's just passed by value 
 anyway.  Who cares if the size of the array is modified within a function 
 where it's not passed by reference?  The change is just local to the 
 function anyway.

If that array is pointing into some meaningful area of memory (like in the 
example, a texture buffer), resizing the array could (probably would) move 
the array around, which I guess isn't illegal but then the function 
operating on the array wouldn't be accessing the correct place.  Prevent 
them from changing the length, it prevents them from accessing anywhere but 
there.

Jun 03 2007

Sean Kelly <sean f4.ca> writes:

Jarrett Billingsley wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:f3uoj5$1b1i$1 digitalmars.com...
 
 Most array algorithms would apply.  But I'm still not sure I see the point 
 of having an immutable reference, because it's just passed by value 
 anyway.  Who cares if the size of the array is modified within a function 
 where it's not passed by reference?  The change is just local to the 
 function anyway.

 
 If that array is pointing into some meaningful area of memory (like in the 
 example, a texture buffer), resizing the array could (probably would) move 
 the array around, which I guess isn't illegal but then the function 
 operating on the array wouldn't be accessing the correct place.  Prevent 
 them from changing the length, it prevents them from accessing anywhere but 
 there. 

Well yeah.  I don't personally think this is a problem because it 
doesn't affect the callee in any way, but I can see how others might 
disagree.  Doesn't 'final' do this now though?


Sean

Jun 03 2007

Walter Bright <newshound1 digitalmars.com> writes:

Started new thread with reply: resizeable arrays: T[new]

Jun 04 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

 
 There isn't one. Such a construct is appealing in the abstract, but I 
 haven't run across a legitimate use for it yet.

What, there isn't one? Isn't that what final does? Like this:
   final char[] charar = new char[](20);

   charar[1] = 'x'; // Allowed
   charar = new char[](20); // Not allowed
   charar.length = 10; // Not allowed


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 03 2007

Walter Bright <newshound1 digitalmars.com> writes:

Bruno Medeiros wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

 There isn't one. Such a construct is appealing in the abstract, but I 
 haven't run across a legitimate use for it yet.

 
 What, there isn't one? Isn't that what final does? Like this:
   final char[] charar = new char[](20);
 
   charar[1] = 'x'; // Allowed
   charar = new char[](20); // Not allowed
   charar.length = 10; // Not allowed

Final only works at the outermost level. There is no way to have a 
mutable pointer to a const pointer to mutable data.

Jun 04 2007

noSpam <""pelekhay\" (noSpam)gmail.com"> writes:

Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

 
 They'll all fail.

I think it's better to return a reversed/sorted copy. This will make such 
a change more backward compatible.

May 27 2007

Myron Alexander <someone somewhere.com> writes:

noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

 They'll all fail.

 
 I think it's better to return reversed/sorted copy. This will make such 
 change more backward compatible.

This makes sense. For immutable arrays, the definition should drop "in 
place" and just return a copy.

May 27 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Myron Alexander skrev:
 noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

 They'll all fail.

 I think it's better to return reversed/sorted copy. This will make 
 such change more backward compatible.

 
 This makes sense. For immutable arrays, the definition should drop "in 
 place" and just return a copy.

Which would be very confusing. This is instead a perfect opportunity to 
take the *much* better path of finally deprecating the .sort and 
.reverse "properties". Equally good or better library implementations 
are possible (and exist). For example, .sort can't take an ordering 
predicate. Also, the special casing of reversing char[] and wchar[] 
arrays, preserving the encoded Unicode code points, is definitely (imho) 
too specialized to belong in the language (runtime) as opposed to a library.

/ Oskar

May 27 2007
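
The library route Oskar describes can be sketched in present-day D, which postdates this thread. This is only an illustration, assuming Phobos' std.algorithm, where sort and reverse are free functions and sort accepts an ordering predicate. The helper name sortedDescending is made up for the example. The explicit .dup makes the "return a copy" behaviour visible at the call site instead of hiding it behind a property.

```d
import std.algorithm.mutation : reverse;
import std.algorithm.sorting : sort;

// Hypothetical helper: returns a sorted copy and leaves the input untouched.
int[] sortedDescending(const(int)[] input)
{
    int[] copy = input.dup;          // explicit mutable copy
    sort!((a, b) => a > b)(copy);    // ordering predicate: descending
    return copy;
}

void main()
{
    const(int)[] data = [3, 1, 2];
    int[] result = sortedDescending(data);
    assert(result == [3, 2, 1]);
    reverse(result);                 // in place, on the mutable copy only
    assert(result == [1, 2, 3]);
    assert(data == [3, 1, 2]);       // the original is unchanged
}
```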

Myron Alexander <someone somewhere.com> writes:

Oskar Linde wrote:
 Which would be very confusing. This is instead a perfect opportunity to 
  take the *much* better path of finally deprecating the .sort and 
 .reverse "properties". Equally good or better library implementations 
 are possible (and exist). For example, .sort can't take an ordering 
 predicate. Also, the special casing of reversing char[] and wchar[] 
 arrays, preserving the encoded unicode code points is definitely (imho) 
 too specialized to belong in the language (runtime) as opposed to a 
 library.
 
 / Oskar

I see your point and agree.

Regards,

Myron.

May 28 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Oskar Linde wrote:
 Myron Alexander skrev:
 noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when 
 used
 with const, invariant, and final qualifiers?

 They'll all fail.

 I think it's better to return reversed/sorted copy. This will make 
 such change more backward compatible.

 This makes sense. For immutable arrays, the definition should drop "in 
 place" and just return a copy.

 
 Which would be very confusing. This is instead a perfect opportunity to 
  take the *much* better path of finally deprecating the .sort and 
 .reverse "properties". Equally good or better library implementations 
 are possible (and exist). For example, .sort can't take an ordering 
 predicate. 

+1 (and thanks for your predicate-accepting sort routine, Oskar!)

 Also, the special casing of reversing char[] and wchar[] 
 arrays, preserving the encoded unicode code points is definitely (imho) 
 too specialized to belong in the language (runtime) as opposed to a 
 library.

No opinion there.  What about the special code-point-at-a-time foreach 
for char[]?  Do you dislike that too?

--bb

May 28 2007
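
For readers unfamiliar with the foreach behaviour Bill refers to, here is a minimal sketch in present-day D (an assumption, since the thread predates D2). The loop variable's type selects the decoding: char iterates UTF-8 code units, dchar iterates decoded code points. The two helper names are hypothetical.

```d
// Hypothetical helper: counts UTF-8 code units (no decoding).
size_t countCodeUnits(const(char)[] s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: counts code points; foreach decodes UTF-8 on the fly.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // U+00E9 (precomposed e with acute) takes two code units in UTF-8.
    assert(countCodeUnits("h\u00e9llo") == 6);
    assert(countCodePoints("h\u00e9llo") == 5);
}
```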

Aarti_pl <aarti interia.pl> writes:

Bill Baxter pisze:
 Oskar Linde wrote:
 Myron Alexander skrev:
 noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when 
 used
 with const, invariant, and final qualifiers?

 They'll all fail.

 I think it's better to return reversed/sorted copy. This will make 
 such change more backward compatible.

 This makes sense. For immutable arrays, the definition should drop 
 "in place" and just return a copy.

 Which would be very confusing. This is instead a perfect opportunity 
 to  take the *much* better path of finally deprecating the .sort and 
 .reverse "properties". Equally good or better library implementations 
 are possible (and exist). For example, .sort can't take an ordering 
 predicate. 

 
 +1 (and thanks for your predicate-accepting sort routine, Oskar!)

+1

 
 Also, the special casing of reversing char[] and wchar[] arrays, 
 preserving the encoded unicode code points is definitely (imho) too 
 specialized to belong in the language (runtime) as opposed to a library.

 
 No opinion there.  What about the special code-point-at-a-time foreach 
 for char[]?  Do you dislike that too?
 

IMHO that should not be in the language. That's why I am opting for a string 
*library* class/struct which could take care of such cases.

BR
Marcin Kuszczak
(Aarti_pl)

May 29 2007

Regan Heath <regan netmail.co.nz> writes:

Aarti_pl Wrote:
 Bill Baxter pisze:
 Oskar Linde wrote:
 Myron Alexander skrev:
 noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when 
 used
 with const, invariant, and final qualifiers?

 They'll all fail.

 I think it's better to return reversed/sorted copy. This will make 
 such change more backward compatible.

 This makes sense. For immutable arrays, the definition should drop 
 "in place" and just return a copy.

 Which would be very confusing. This is instead a perfect opportunity 
 to  take the *much* better path of finally deprecating the .sort and 
 .reverse "properties". Equally good or better library implementations 
 are possible (and exist). For example, .sort can't take an ordering 
 predicate. 

 
 +1 (and thanks for your predicate-accepting sort routine, Oskar!)

 
 +1
 
 
 Also, the special casing of reversing char[] and wchar[] arrays, 
 preserving the encoded unicode code points is definitely (imho) too 
 specialized to belong in the language (runtime) as opposed to a library.

 
 No opinion there.  What about the special code-point-at-a-time foreach 
 for char[]?  Do you dislike that too?
 

 
 IMHO that should not be in language. That's why I am opting for string 
 *library* class/struct which could take care about such cases.

I agree.  I tend to think there are certain things which some apps don't need,
in which case they can use the 'string' alias.  Other apps need to do this sort
of thing and want a 'String' class to handle it.  I think there is room for
both in the phobos/tango libraries.

The default language/library support can reverse utf8 and 16 but it's not
ideal, eg.  convert to utf32, reverse, convert back. ;)

Regan

May 29 2007

Marcin Kuszczak <aarti interia.pl> writes:

Regan Heath wrote:

 The default language/library support can reverse utf8 and 16 but it's not
 ideal, eg.  convert to utf32, reverse, convert back. ;)
 
 Regan

I am not sure what you mean by this sentence...

The dstring implementation doesn't do things according to your description, so
that's definitely not the case here...


-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------

May 29 2007

Regan Heath <regan netmail.co.nz> writes:

Marcin Kuszczak Wrote:
 Regan Heath wrote:
 
 The default language/library support can reverse utf8 and 16 but it's not
 ideal, eg.  convert to utf32, reverse, convert back. ;)
 
 Regan

 
 I am not sure what do you mean with this sentence... 
 
 dstring implementation doesn't do things according to your description, so
 it's definitely not a case here...

I'm lost, what is "dstring"?

All I meant was that using std.utf you can say:

char[] text = "<characters which take more than 1 char to represent>";

text = toUTF8(toUTF32(text).reverse);

and the result will be a correctly reversed UTF8 string.  Or am I missing
something?

Regan Heath

May 29 2007

"Aziz K." <aziz.kerim gmail.com> writes:

On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz>  
wrote:
 and the result will be a correctly reversed UTF8 string.  Or am I  
 missing something?

 Regan Heath

I think your method doesn't take compound characters into account.

For example:
// The accented é can be represented by a single code point. But let's
// assume it's a compound character (Ce`a).
writefln( toUTF8(toUTF32("Céa").reverse) ); // would reverse to a`eC
// This would print áeC

May 29 2007
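
Aziz's objection can be checked with a minimal sketch in present-day D (an assumption: std.uni.byGrapheme did not exist when this thread was written). The first helper mirrors Regan's toUTF32/reverse/toUTF8 approach; the second reverses by grapheme cluster so that a combining accent stays attached to its base letter. Both helper names are hypothetical.

```d
import std.algorithm.mutation : reverse;
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;
import std.utf : toUTF32, toUTF8;

// Hypothetical helper: reverses code point by code point.
string reverseCodePoints(string s)
{
    dchar[] d = toUTF32(s).dup;   // one dchar per code point
    reverse(d);
    return toUTF8(d);
}

// Hypothetical helper: reverses grapheme cluster by grapheme cluster.
string reverseGraphemes(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

void main()
{
    string s = "Ce\u0301a";   // C, e, combining acute accent, a

    // The accent ends up after the 'a' and combines with it.
    assert(reverseCodePoints(s) == "a\u0301eC");

    // The accent stays with the 'e'.
    assert(reverseGraphemes(s) == "ae\u0301C");
}
```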

Regan Heath <regan netmail.co.nz> writes:

Aziz K. Wrote:
 On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz>  
 wrote:
 and the result will be a correctly reversed UTF8 string.  Or am I  
 missing something?

 Regan Heath

 
 I think your method doesn't take compound characters into account.
 
 For example:
 // The accented é can be represented by a single code-point. But let's  
 assume it's a compound character (Ce`a).

Is it a compound character in UTF32?

 writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
 // This would print áeC

Can you code that test up (using the \U character literal syntax so that the
web interface doesn't mangle it)? I'd like to play with it.

My statement was based on the assumption that converting UTF8 to UTF32 would
result in all the compound characters being converted/represented by a single
UTF32 codepoint each and would therefore be reversible.

Regan

May 29 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Regan Heath wrote:
 Aziz K. Wrote:
 On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz>  
 wrote:
 and the result will be a correctly reversed UTF8 string.  Or am I  
 missing something?

 Regan Heath

 I think your method doesn't take compound characters into account.

 For example:
 // The accented é can be represented by a single code-point. But let's  
 assume it's a compound character (Ce`a).

 
 Is it a compound character in UTF32?

Unicode defines multiple valid encodings for lots of accented 
characters; typically a single codepoint as well as separate codepoints 
for the accent and the "naked" character that combine when put together.

 writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
 // This would print áeC

 
 Can you code that test up (using the \U character literal syntax so that the
web interface doesn't mangle it) I'd like to play with it.
 
 My statement was based on the assumption that converting UTF8 to UTF32 would
result in all the compound characters being converted/represented by a single
UTF32 codepoint each and would therefore be reversible.

I don't think std.utf.toUTF* combine or split accented characters, I'm 
pretty sure it just does codepoint representation conversions (keeping 
the number of codepoints constant).

May 29 2007

Regan Heath <regan netmail.co.nz> writes:

Frits van Bommel Wrote:
 Regan Heath wrote:
 Aziz K. Wrote:
 On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz>  
 wrote:
 and the result will be a correctly reversed UTF8 string.  Or am I  
 missing something?

 Regan Heath

 I think your method doesn't take compound characters into account.

 For example:
 // The accented é can be represented by a single code-point. But let's  
 assume it's a compound character (Ce`a).

 
 Is it a compound character in UTF32?

 
 Unicode defines multiple valid encodings for lots of accented 
 characters; typically a single codepoint as well as separate codepoints 
 for the accent and the "naked" character that combine when put together.

I realise that.  But, the important question is what does toUTF32 do with
compound UTF8 characters (or UTF16 for that matter)?  

 writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
 // This would print áeC

 
 Can you code that test up (using the \U character literal syntax so that the
web interface doesn't mangle it) I'd like to play with it.
 
 My statement was based on the assumption that converting UTF8 to UTF32 would
result in all the compound characters being converted/represented by a single
UTF32 codepoint each and would therefore be reversible.

 
 I don't think std.utf.toUTF* combine or split accented characters, I'm 
 pretty sure it just does codepoint representation conversions (keeping 
 the number of codepoints constant).

This is the key issue.  I was under the (perhaps mistaken) impression it
converted them to the single codepoint version (as that was easier), which is
what I based this idea on.  Really a simple test should tell us, can you whip
one up to prove it one way or the other?  

I would, but I don't really use unicode at all and I don't know any compound
characters offhand.  I know, I know, I could google it but I also get the
impression you know a bit more about this and would be able to devise a better
test case, or two.

Ahh.. another thought.  I think I may have based my assumption on the foreach
behaviour, eg.

char[] text = "<compound stuff>";
foreach(dchar d; text) { .. }

this _has_ to give the single codepoint versions, right?

I suspect foreach uses the same code as in std.utf, but I may be wrong.

Regan Heath

May 29 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Regan Heath wrote:
 Frits van Bommel Wrote:
 Regan Heath wrote:
 Aziz K. Wrote:
 writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
 // This would print áeC

 Can you code that test up (using the \U character literal syntax so that the
web interface doesn't mangle it) I'd like to play with it.

 My statement was based on the assumption that converting UTF8 to UTF32 would
result in all the compound characters being converted/represented by a single
UTF32 codepoint each and would therefore be reversible.

 I don't think std.utf.toUTF* combine or split accented characters, I'm 
 pretty sure it just does codepoint representation conversions (keeping 
 the number of codepoints constant).

 
 This is the key issue.  I was under the (perhaps mistaken) impression it
converted them to the single codepoint version (as that was easier), which is
what I based this idea on.  Really a simple test should tell us, can you whip
one up to prove it one way or the other?  

---
import std.stdio;
import std.utf;

void main(char[][] args) {
     // Codepoint 0301 is "Combining acute accent".
     // Codepoint 00e9 is "Latin small letter e with acute"
     char[] str = "e\u0301 \u00e9";

     // This doesn't show the combined character on my console.
     // Perhaps my terminal doesn't properly support combining characters.
     // (My encoding is utf-8, so that shouldn't be the problem)
     // The precomposed character (00e9) is displayed properly.
     // When piped to a .html file and wrapped with
     // <html><body>...</body></html> firefox properly displays both.
     writefln(str);
     foreach (dchar c; str) {
         writef("%04x ", c);
     }
     writefln();

     // This produces the exact same output as above code:
     dchar[] dstr = toUTF32(str);
     writefln(dstr);
     foreach (dchar c; dstr) {
         writef("%04x ", c);
     }
     writefln();
}
---

 I would, but I don't really use unicode at all and I don't know any compound
characters offhand.  I know, I know, I could google it but I also get the
impression you know a bit more about this and would be able to devise a better
test case, or two.

I normally have little use for it as well. A few Dutch (my native 
tongue) words need accents, but I'll be damned if I know the codes. Let 
alone those of any combining characters. My usual way of typing those is 
either using the symbol map or just typing it without accents, 
right-click, select spell-check suggestion with accents :).
However, for above test I just looked up the codes in the code charts on 
the unicode website (unicode.org/charts for the precomposed character 
and the "symbols and punctuation" link at the top for the combining 
accent). It's pretty easy to find, actually.

 Ahh.. another thought.  I think I may have based my assumption on the foreach
behaviour, eg.
 
 char[] text = "<compound stuff>";
 foreach(dchar d; text) { .. }
 
 this _has_ to give the single codepoint versions, right?

As demonstrated above, it doesn't. The runtime support for the 
converting foreach statements just imports std.utf and uses decode and 
toUTF*[1] (as well as some manual conversion to surrogates in the 
functions dealing with wchar). None of those do anything other than 
decoding and encoding single codepoints.


[1]: The apparently undocumented (buf, dchar) overloads, which don't 
allocate.

 I suspect foreach uses the same code as in std.utf, but I may be wrong.

About this, you're not :P.


I suspect the reason std.utf doesn't do decomposition and/or combining 
is that it would require a lookup table, and possibly quite a big one at 
that. Though generating it shouldn't be a problem; it could be trivially 
extracted from the machine-readable data on the unicode website. Just 
take http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, the sixth 
column is the decomposition of the character in the first column. (It 
may also contain the mapping type between <angle brackets>)
Note that for full decomposition this mapping needs to be applied 
recursively[2], i.e. the characters in the 6th column need to be 
decomposed as well (if possible).

[2]: See the reminder in 
http://www.unicode.org/Public/UNIDATA/UCD.html#Character_Decomposition_Mappings

May 30 2007
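
The composition and decomposition step Frits describes is, in present-day D, available as std.uni.normalize (an assumption relative to this thread, which predates it). The sketch below reuses his test string and shows that toUTF32 keeps the number of code points constant, while normalizing to NFC or NFD changes it. The helper name codePoints is hypothetical.

```d
import std.uni : NFC, NFD, normalize;
import std.utf : toUTF32;

// Hypothetical helper: number of code points in a UTF-8 string.
size_t codePoints(string s)
{
    return toUTF32(s).length;
}

void main()
{
    // e + combining acute accent, a space, then precomposed U+00E9.
    string s = "e\u0301 \u00e9";

    // Plain transcoding neither composes nor decomposes.
    assert(codePoints(s) == 4);

    // NFC composes e + U+0301 into U+00E9.
    assert(codePoints(normalize!NFC(s)) == 3);

    // NFD decomposes U+00E9 into e + U+0301.
    assert(codePoints(normalize!NFD(s)) == 5);
}
```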

Regan Heath <regan netmail.co.nz> writes:

Thanks for this.  It appears you're right :)

I can't get my console to show them either, which is annoying.  I'm on Windows,
I set the font to lucida console and typed "chcp 65001" which makes the
precomposed character appear correctly but not the combining character.

Regan Heath

May 31 2007

Marcin Kuszczak <aarti interia.pl> writes:

Regan Heath wrote:

 Marcin Kuszczak Wrote:
 Regan Heath wrote:
 
 The default language/library support can reverse utf8 and 16 but it's
 not ideal, eg.  convert to utf32, reverse, convert back. ;)
 
 Regan

 
 I am not sure what do you mean with this sentence...
 
 dstring implementation doesn't do things according to your description,
 so it's definitely not a case here...

 
 I'm lost, what is "dstring"?
 
 All I meant was that using std.utf you can say:
 
 char[] text = "<characters which take more than 1 char to represent>";
 
 text = toUTF8(toUTF32(text).reverse);
 
 and the result will be a correctly reversed UTF8 string.  Or am I missing
 something?
 
 Regan Heath

dstring is an implementation of a string struct by Chris Miller which takes care
of slicing utf8 sequences and is compatible with char[], wchar[] and
dchar[]. I mentioned it because I think it's better when foreach knows
nothing about slicing utf8 sequences (the opposite of the way it is implemented
currently). That should be the responsibility of a string class (like e.g. dstring)
with a proper opApply method. Because my previous e-mail was in the context of
dstring, I didn't understand what you meant... 'reverse' and 'sort'
could also be implemented in such a class in a way which copes properly
with utf8 sequences...

http://www.digitalmars.com/d/archives/digitalmars/D/announce/New_string_implementation_dstring_1.0_4886.html
http://www.dprogramming.com/dstring.php


-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------

May 29 2007

Regan Heath <regan netmail.co.nz> writes:

Marcin Kuszczak Wrote:
 Regan Heath wrote:
 
 Marcin Kuszczak Wrote:
 Regan Heath wrote:
 
 The default language/library support can reverse utf8 and 16 but it's
 not ideal, eg.  convert to utf32, reverse, convert back. ;)
 
 Regan

 
 I am not sure what do you mean with this sentence...
 
 dstring implementation doesn't do things according to your description,
 so it's definitely not a case here...

 
 I'm lost, what is "dstring"?
 
 All I meant was that using std.utf you can say:
 
 char[] text = "<characters which take more than 1 char to represent>";
 
 text = toUTF8(toUTF32(text).reverse);
 
 and the result will be a correctly reversed UTF8 string.  Or am I missing
 something?
 
 Regan Heath

 
 dstring is implementation of string struct by Chris Miller which takes care
 about slicing utf8 sequences and is compatible with char[], wchar[] and
 dchar[]. I mentioned it because I think that it's better when foreach know
 nothing about slicing utf8 sequence (opposite to way it is implemented
 currently). It should be responsibility of string class (like e.g. dstring)
 with proper opApply method. Because my previous e-mail was in context of
 dstring, I haven't understood what did you mean... 'reverse' and 'sort'
 could be also implemented in such class in a way which will cope properly
 with utf8 sequences...

Ahh, thanks, that clears up the confusion I had.  Yes, a string class/struct
could definitely handle the codepoint issue.  It would also be able to handle
it better than the method I suggested, which is a brute force method based on
an assumption which may prove to be false (I suspect toUTF32 converts UTF8
and 16 to non-compound UTF32 in all cases, but I could be wrong).

But to respond to your original point (which I didn't address earlier, sorry) I
have no problem with the foreach behaviour:

char[] text = "<compound characters>";
foreach(dchar c; text) { .. }

because, I suspect, the code which handles this is in std.utf (toUTF32)
already.  You seem to want to move the behaviour to a string class, but why
can't it exist in both places?

I guess the problem you might have with it is that it effectively says to
someone implementing a D compiler:  You need to handle conversions from/to
UTF8, 16 and 32 and (assuming I am correct about toUTF32) you need to convert
UTF8 and 16 to non-compound UTF32.

Which might make it harder for someone to implement a D compiler.  I don't know.

Regan Heath

May 29 2007

Reiner Pope <some address.com> writes:

Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
     alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.
 
 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]
 
 Invariants can be implicitly cast to const.
 
 In my playing around with source code, using cstring's seems to work out 
 rather nicely.
 
 So, why not alias cstring to invariant(char)[] ? That way strings really 
 would be immutable. The reason is that mutables cannot be implicitly 
 cast to invariant, meaning that there'd be a lot of casts in the code. 
 Casts are a sledgehammer, and a coding style that requires too many 
 casts is a bad coding style.

Perhaps I should just wait for the implementation, but I'm interested in 
knowing what your solution to .dup is. Given

    auto foo = "hello".dup;

what is the type of foo?

How do you support both of

    invariant char[] foo = "hello".dup;
    char[] bar = "hello".dup;

   -- Reiner

May 27 2007
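
The conversion rules in the quoted proposal can be sketched in present-day D, with the caveat that the language moved on after this thread: invariant is now spelled immutable, and string is an alias for immutable(char)[] rather than const(char)[]. The helper name countSpaces is hypothetical. The point shown is that a const(char)[] parameter accepts both mutable and immutable strings without casts, while mutable data does not convert implicitly to immutable.

```d
// Hypothetical helper: const(char)[] promises not to modify the characters.
size_t countSpaces(const(char)[] s)
{
    size_t n;
    foreach (char c; s)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    char[] buf = "a b c".dup;   // mutable characters
    string lit = "x y";         // immutable characters

    // Both convert implicitly to const(char)[].
    assert(countSpaces(buf) == 2);
    assert(countSpaces(lit) == 1);

    // Mutable does not convert implicitly to immutable.
    static assert(!__traits(compiles, { string s = buf; }));

    // In present-day D a string literal has type immutable(char)[].
    static assert(is(typeof("abc") == string));
}
```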

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Reiner Pope wrote:
 Perhaps I should just wait for the implementation, but I'm interested in 
 knowing what your solution to .dup is. Given
 
    auto foo = "hello".dup;
 
 what is the type of foo?

Most likely a plain (mutable) char[].

 How do you support both of
 
    invariant char[] foo = "hello".dup;
    char[] bar = "hello".dup;

Likely the first will be an error as written, requiring a 
cast(invariant) to be inserted.
Of course, since it doesn't make much sense to .dup in the example above 
("hello" is already invariant, and copying an invariant array but not 
modifying the copy isn't typically useful) that shouldn't be much of a 
problem in this case.

For other cases though, I could see how a "unique" (or similar) type 
constructor that would allow implicit conversion to both mutable and 
invariant (and const) types could be useful.
For instance, if the strings in your example were replaced by mutable 
arrays, a "unique char[]" return value of .dup could then be assigned to 
mutable/const/invariant references without needing casts.

May 28 2007

Reiner Pope <some address.com> writes:

Frits van Bommel wrote:
 Reiner Pope wrote:
 Perhaps I should just wait for the implementation, but I'm interested 
 in knowing what your solution to .dup is. Given

    auto foo = "hello".dup;

 what is the type of foo?

 
 Most likely a plain (mutable) char[].
 
 How do you support both of

    invariant char[] foo = "hello".dup;
    char[] bar = "hello".dup;

 
 Likely the first will be an error as written, requiring a 
 cast(invariant) to be inserted.
 Of course, since it doesn't make much sense to .dup in the example above 
 ("hello" is already invariant, and copying an invariant array but not 
 modifying the copy isn't typically useful) that shouldn't be much of a 
 problem in this case.
 
 For other cases though, I could see how a "unique" (or similar) type 
 constructor that would allow implicit conversion to both mutable and 
 invariant (and const) types could be useful.
 For instance, if the strings in your example were replaced by mutable 
 arrays, a "unique char[]" return value of .dup could then be assigned to 
 mutable/const/invariant references without needing casts.

Funny, that's just what I thought of (including the name unique). When I 
  first thought about it, I thought that such a construct would be very 
useful and very powerful, but I can't actually think of any use cases 
except with .dup and other constructor-type functions. (Although 
supporting them should alone be enough motivation).

May 29 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Reiner Pope wrote:
 Frits van Bommel wrote:
 For other cases though, I could see how a "unique" (or similar) type 
 constructor that would allow implicit conversion to both mutable and 
 invariant (and const) types could be useful.
 For instance, if the strings in your example were replaced by mutable 
 arrays, a "unique char[]" return value of .dup could then be assigned 
 to mutable/const/invariant references without needing casts.

 Funny, that's just what I thought of (including the name unique).

I'm pretty sure this has been suggested in these newsgroups in the past, 
including using "unique" as the keyword.

 When I 
  first thought about it, I thought that such a construct would be very 
 useful and very powerful, but I can't actually think of any use cases 
 except with .dup and other constructor-type functions. (Although 
 supporting them should alone be enough motivation).

Some use cases I can think of:
* Obviously, builtin array property .dup, as you mentioned.
* std.utf.toUTF* (except the non-converting ones such as char[] -> char[])
* The result of certain operator overloads (arithmetic in a bignum 
class, opCat in a string class, the result of the builtin ~ operator for 
arrays)
* Lots of stuff in std.string: join, split, maketrans, all the toString 
overloads, format, succ, abbrev. (AFAIK all of these are guaranteed to 
return a unique array)
* toString overloads for classes that return the result of any of the 
above[1] (especially builtin ~ and std.string.format are often useful in 
toString, in my experience).

As you can see, there are plenty of cases where newly allocated objects 
or arrays are returned.


[1]: This one would require the ability to add "unique" in an overridden 
method, since it's a bad idea to require it of all classes. This could 
be considered to fall under the category of covariant return values.

May 29 2007

Walter Bright <newshound1 digitalmars.com> writes:

Frits van Bommel wrote:
 For other cases though, I could see how a "unique" (or similar) type 
 constructor that would allow implicit conversion to both mutable and 
 invariant (and const) types could be useful.
 For instance, if the strings in your example were replaced by mutable 
 arrays, a "unique char[]" return value of .dup could then be assigned to 
 mutable/const/invariant references without needing casts.

We really tried to figure out a way to make "unique" work. It just 
doesn't offer anything useful over a cast(invariant).

The way to create an invariant out of data is to use cast(invariant). As 
with all casts, one has to trust the programmer to use it appropriately. 
After it is cast, the type system will handle enforcement.

You'll be able to cast away invariant, too, but you're on your own if 
you do so.

Jun 02 2007
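
For reference, here is how the .dup question looks in present-day D; this is a sketch under the assumption of D2 and Phobos, neither of which matches the compiler discussed in this thread. .dup yields a mutable copy, .idup yields an immutable one, and std.exception.assumeUnique wraps the cast Walter describes for a buffer the programmer knows has no other references. The helper name makeGreeting is hypothetical.

```d
import std.exception : assumeUnique;

// Hypothetical helper: builds a string in a private mutable buffer,
// then hands it out as immutable.
string makeGreeting(string name)
{
    char[] buf = "Hello, ".dup;   // mutable copy of the literal
    buf ~= name;
    buf ~= '!';
    // The programmer vouches that nothing else refers to buf.
    return assumeUnique(buf);
}

void main()
{
    static assert(is(typeof("hello".dup) == char[]));
    static assert(is(typeof("hello".idup) == string));

    assert(makeGreeting("D") == "Hello, D!");
}
```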
