digitalmars.D.announce - string types: const(char)[] and cstring

Walter Bright <newshound1 digitalmars.com> writes:
Under the new const/invariant/final regime, what are strings going to 
be? Experience with other languages suggests that strings should be 
immutable. To express an array of const chars, one would write:

	const(char)[]

but while that's clear, it doesn't just flow off the keyboard. Strings 
are so common this needs an alias, so:

	alias const(char)[] cstring;

Why cstring? Because 'string' appears as both a module name and a common 
variable name. cstring also implies wstring for wchar strings, and 
dstring for dchars.

String literals, on the other hand, will be invariant (which means they 
can be stuffed into read-only memory). So,
	typeof("abc")
will be:
	invariant(char)[3]

Invariants can be implicitly cast to const.

In my playing around with source code, using cstrings seems to work out 
rather nicely.

So, why not alias cstring to invariant(char)[] ? That way strings really 
would be immutable. The reason is that mutables cannot be implicitly 
cast to invariant, meaning that there'd be a lot of casts in the code. 
Casts are a sledgehammer, and a coding style that requires too many 
casts is a bad coding style.
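
A minimal sketch of the conversion rules described in this post, assuming a current D compiler. Two things differ from the 2007 proposal and are assumptions of the sketch: `invariant` is now spelled `immutable`, and a string literal has the slice type immutable(char)[] rather than invariant(char)[3]. The helper name countSpaces is hypothetical.

```d
import std.stdio : writeln;

// The alias proposed in the post, in today's alias syntax.
// The name itself was still under discussion in the thread.
alias cstring = const(char)[];

// Hypothetical helper: accepts any char slice and promises not to modify it.
size_t countSpaces(cstring s)
{
    size_t n = 0;
    foreach (char c; s)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    immutable(char)[] literal = "a b c"; // string literals are immutable
    char[] buffer = "x y".dup;           // mutable copy of a literal

    writeln(countSpaces(literal)); // immutable converts implicitly to const
    writeln(countSpaces(buffer));  // mutable converts implicitly to const

    // Mutable data does not convert implicitly to immutable, which is the
    // reason the post gives for not aliasing cstring to the immutable form.
    static assert(!is(char[] : immutable(char)[]));
    static assert(is(immutable(char)[] : const(char)[]));
    static assert(is(char[] : const(char)[]));
}
```

Taking the const form as the parameter type lets one function serve both mutable buffers and literals without a cast at any call site.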
May 25 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be
 ? Experience with other languages suggest that strings should be
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings
 are so common this needs an alias, so:
 
     alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common
 variable name. cstring also implies wstring for wchar strings, and
 dstring for dchars.
 
 String literals, on the other hand, will be invariant (which means they
 can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]
 
 Invariants can be implicitly cast to const.
 
 In my playing around with source code, using cstring's seems to work out
 rather nicely.
 
 So, why not alias cstring to invariant(char)[] ? That way strings really
 would be immutable. The reason is that mutables cannot be implicitly
 cast to invariant, meaning that there'd be a lot of casts in the code.
 Casts are a sledgehammer, and a coding style that requires too many
 casts is a bad coding style.

Thanks for the update; I'm happy to have const strings, and use char[]
manually when I want to mutate something.

One question though: are the parens necessary?  I was under the
impression that const and invariant applied to reference types, so it
would be const char[] or const(char[]), since char by itself is just a
value type.

...this is going to turn into one of those mega threads where we all run
around in circles trying to work out which one is which, isn't it?

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/
May 25 2007
Walter Bright <newshound1 digitalmars.com> writes:
Daniel Keep wrote:
 One question though: are the parens necessary?  I was under the
 impression that const and invariant applied to reference types, so it
 would be const char[] or const(char[]), since char by itself is just a
 value type.

	const(char)[]  => array of const characters
	const char[]   => const array of const characters
	const(char[])  => const array of const characters

Think of const as if it were a template:

	Const!(T)

which returns a const version of its argument.

const without any parens means it applies to the whole type.
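
The three spellings can be checked with a short sketch, assuming a current D compiler (the rules below are today's behaviour; the helper dropFirst is a hypothetical name):

```d
// Hypothetical helper: rebinding a const(char)[] parameter is allowed,
// because only the characters are const, not the slice itself.
const(char)[] dropFirst(const(char)[] s)
{
    s = s[1 .. $];
    return s;
}

void main()
{
    const(char)[] a = "abc";
    a = "def";                                      // ok: rebind the slice
    static assert(!__traits(compiles, a[0] = 'x')); // characters are const

    const(char[]) b = "abc";
    static assert(!__traits(compiles, b = "def"));  // slice is const too
    static assert(!__traits(compiles, b[0] = 'x'));

    // const without parens applies to the whole type.
    static assert(is(const char[] == const(char[])));

    assert(dropFirst(a) == "ef");
}
```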
May 25 2007
Myron Alexander <someone somewhere.com> writes:
Walter Bright wrote:
 Daniel Keep wrote:
 
 const(char)[] => array of const characters
 const char[] => const array of const characters
 const(char[]) => const array of const characters
 
 Think of const as if it were a template:
 
     Const!(T)
 
 which returns a const version of its argument.
 
 const without any parens means it applies to the whole type.

Looking mighty fine.
May 25 2007
Walter Bright <newshound1 digitalmars.com> writes:
Myron Alexander wrote:
 Walter Bright wrote:
 Daniel Keep wrote:

 const(char)[] => array of const characters
 const char[] => const array of const characters
 const(char[]) => const array of const characters

 Think of const as if it were a template:

     Const!(T)

 which returns a const version of its argument.

 const without any parens means it applies to the whole type.

Looking mighty fine.

I like it a lot better than the C++ "here a const, there a const, 
everywhere a const const" like:

	const char * const * const p;

etc. instead of:

	const(char**) p;

Const in D is transitive, so const(char**) is equivalent to:

	const(const(const(char)*)*)

And no, it is not possible to have a pointer to const pointer to 
mutable. It is both not possible syntactically to declare it, nor is it 
semantically allowed. You can force the issue with casts (which allow 
you to do whatever you *need* to do), but the result will be undefined 
behavior.
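
Transitivity can be verified at compile time. This is a sketch assuming a current D compiler; readThrough is a hypothetical helper added only so there is something to call.

```d
// Hypothetical helper: reads the character at the end of the pointer chain.
char readThrough(const(char**) pp)
{
    return **pp;
}

void main()
{
    // The short form and the fully spelled-out form are the same type.
    static assert(is(const(char**) == const(const(const(char)*)*)));

    char c = 'a';
    char* p = &c;
    const(char**) pp = &p;   // mutable converts implicitly to const

    // Every level reached through pp is const.
    static assert(is(typeof(*pp) == const(char*)));
    static assert(is(typeof(**pp) == const(char)));
    static assert(!__traits(compiles, **pp = 'b'));

    assert(readThrough(pp) == 'a');
}
```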
May 25 2007
Howard Berkey <howard well.com> writes:
Nice idea.  I am only concerned that people will see "cstring" and think
"null-terminated "C" string".  Not that that should be a deciding factor by any
means of course.

Walter Bright Wrote:

 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
 	const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
 	alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.
 
 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
 	typeof("abc")
 will be:
 	invariant(char)[3]
 
 Invariants can be implicitly cast to const.
 
 In my playing around with source code, using cstring's seems to work out 
 rather nicely.
 
 So, why not alias cstring to invariant(char)[] ? That way strings really 
 would be immutable. The reason is that mutables cannot be implicitly 
 cast to invariant, meaning that there'd be a lot of casts in the code. 
 Casts are a sledgehammer, and a coding style that requires too many 
 casts is a bad coding style.

May 25 2007
Myron Alexander <someone somewhere.com> writes:
Howard Berkey wrote:
 Nice idea.  I am only concerned that people will see "cstring" and think
"null-terminated "C" string".  Not that that should be a deciding factor by any
means of course.
 

When I first read Walter's post, I also thought null-terminated strings. I even had it as an alias for toString (converting C string to char[]) as a means to get around the name conflict with Object but I shortened it to "str". I cannot think of another name but "cstring" will cause confusion and defeats the "obvious" rule.
May 25 2007
Myron Alexander <someone somewhere.com> writes:
Myron Alexander wrote:
 Howard Berkey wrote:
 Nice idea.  I am only concerned that people will see "cstring" and 
 think "null-terminated "C" string".  Not that that should be a 
 deciding factor by any means of course.

Here's a possibility: Instead of cstring, wstring, dstring - charstr, widestr, dblstr.
May 25 2007
Bill Baxter <dnewsgroup billbaxter.com> writes:
Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
     alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.
 
 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]
 
 Invariants can be implicitly cast to const.
 
 In my playing around with source code, using cstring's seems to work out 
 rather nicely.
 
 So, why not alias cstring to invariant(char)[] ? That way strings really 
 would be immutable. The reason is that mutables cannot be implicitly 
 cast to invariant, meaning that there'd be a lot of casts in the code. 
 Casts are a sledgehammer, and a coding style that requires too many 
 casts is a bad coding style.

So basically most functions that take a char[] now would be changed to 
take a cstring in your thinking?

Is it also correct to say that cstring would be used in the places where 
one would use const char* or const std::string& in C++?

If so that sounds ok to me.

But about the naming ... I have to agree that my first thought was "C 
compatible null terminated string" too, like std::string's .c_str() 
method in C++.  I can probably live with that but I don't like the 
inconsistency with c/w/d.  Plain 'string' really does make the most sense.

	plain    'w'      'd'
	=======  =======  =======
	char     wchar    dchar
	string   wstring  dstring

It wouldn't be quite as bad if you uniformly apply the 'c' to all of 
them (using 'c' as a flag for constness):

	plain    'w'       'd'
	=======  ========  ========
	char     wchar     dchar
	cstring  wcstring  dcstring
	or
	cstring  cwstring  cdstring

Some people already alias char[] to string.  As far as I've heard they 
haven't run into conflicts with the module name, or with people naming 
variables 'string'.

Question: if you have an alias like

	alias char[] string;

'const string' automatically applies const to both the char and the [], 
right?  Is that something to be worried about?

--bb
May 25 2007
Walter Bright <newshound1 digitalmars.com> writes:
Bill Baxter wrote:
 'const string' automatically applies const to both the char and the [], 
 right?

Right.
 Is that something to be worried about?

If you want to reassign another value, yes. I suggest: const(char)[] instead.
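
The alias case can be sketched as follows, assuming a current D compiler. The alias is named mstring here (a hypothetical name) because today's D already defines `string` as immutable(char)[]; pick is likewise a hypothetical helper.

```d
// Hypothetical stand-in for the `alias char[] string;` in the question.
alias mstring = char[];

// Hypothetical helper: a const(char)[] local can be reassigned freely.
const(char)[] pick(bool first)
{
    const(char)[] s = "first";
    if (!first)
        s = "second"; // ok: only the characters are const
    return s;
}

void main()
{
    // const applied to the alias covers both the char and the [].
    static assert(is(const mstring == const(char[])));

    const mstring fixed = "abc";
    static assert(!__traits(compiles, fixed = "def")); // cannot reassign

    assert(pick(true) == "first");
    assert(pick(false) == "second");
}
```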
May 26 2007
Anders F Björklund <afb algonet.se> writes:
Bill Baxter wrote:

 Some people already alias char[] to string.  As far as I've heard they 
 haven't run into conflicts with the module name, or with people naming 
 variables 'string'.

I think it would be a problem at the top of the namespace,
but it's OK if you use (for instance) "wx.common.string":

	module wx.common;
	alias char[] string;

Then you can do declarations like:

	string string = "string";

At least that's how it has been working for the last couple of years,
and for Christopher E. Miller's dstring.d as well:

	module dstring;
	struct string { ... }

--anders
May 26 2007
Leandro Lucarella <llucax gmail.com> writes:
Bill Baxter, on May 26 at 14:59 you wrote to me:
 Plain 'string' really does make the most sense.

What about "text"? Please see "The 'string' types" here[1] for an
explanation.

[1] http://xlr.sourceforge.net/concept/diverge.html

-- 
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
GPG: 5F5A8D05 // F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05
On the street I ran into a very proper gentleman who usually drives a
Falcon; he was running with two suitcases in hand and said: "I'm off to
Miami, do you have any message or ..." and I told him: "No, no, no..."
	-- Extra Tato (1983, Alfonsín's victory)
May 26 2007
Reiner Pope <some address.com> writes:
Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 

 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]

The thing I don't get about this syntax is what happens when you take 
off the [].

	1. invariant(char) c = 'b'; // c is 'b' now, and will never change.
	2. final(char) d = 'b';     // but calling it final means the same...
	3. const(char) e = 'b';     // ummm... what?

It seems like const(char) is a constant char -- one that can't change. 
Does that make final obsolete?

Also, I can't see any difference between const(char) and 
invariant(char), since neither can ever be rebound. In that case, if I 
assume that they are identical types, how can an array of const(char) be 
different from an array of invariant(char)?

   -- Reiner
May 25 2007
Walter Bright <newshound1 digitalmars.com> writes:
Reiner Pope wrote:
 Also, I can't see any difference between const(char) and 
 invariant(char), since neither can ever be rebound. In that case, if I 
 assume that they are identical types, how can an array of const(char) be 
 different from an array of invariant(char)?

The difference is when they are reference types, such as arrays of const char, or arrays of invariant chars.
May 26 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
Reiner Pope wrote:
 Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to
 be ? Experience with other languages suggest that strings should be
 immutable. To express an array of const chars, one would write:

     const(char)[]

 String literals, on the other hand, will be invariant (which means
 they can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]

This is what I'm wondering; I thought const and invariant only applied
to reference types (which is why we have final as storage const), in
which case, const(char)[] doesn't make any sense...

	-- Daniel
May 26 2007
Walter Bright <newshound1 digitalmars.com> writes:
Daniel Keep wrote:
 This is what I'm wondering; I thought const and invariant only applied
 to reference types (which is why we have final as storage const), in
 which case, const(char)[] doesn't make any sense...

If you know C++, then const(char)* is the same as:

	const char* p;   // C++

and const(char*) is the same as:

	const char * const p;   // C++

(using * because C++ doesn't have dynamic arrays)
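
On the D side, the same two pointer types behave as in this sketch, assuming a current D compiler (secondChar is a hypothetical helper):

```d
// Hypothetical helper: the pointer may be advanced, the target only read.
char secondChar(const(char)* p)
{
    ++p;         // ok: the pointer itself is mutable
    return *p;
}

void main()
{
    const(char)* p = "abc".ptr;
    p = "def".ptr;                                 // ok: rebind the pointer
    static assert(!__traits(compiles, *p = 'x'));  // target is const

    const(char*) q = "abc".ptr;
    static assert(!__traits(compiles, q = p));     // pointer is const
    static assert(!__traits(compiles, *q = 'x'));  // and so is the target

    assert(*p == 'd');
    assert(secondChar(q) == 'b');
}
```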
May 26 2007
Anders F Björklund <afb algonet.se> writes:
Walter Bright wrote:

 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.

I think cstring is a horrible name. "string" is much better, and in use.
(else wouldn't those be wcstring and dcstring or cwstring and cdstring?)

That it is made up of constant characters, and that those aren't really
characters but instead UTF-8 code units is something that can be hidden.

	alias const(char)[] string;

But "cstring" both sounds awkward, and also leads the mind to C strings.
Even if those (char*) would probably be "stringz" in the usual D lingo.

If any name conflict with previously existing "string" must be avoided,
then "str" is probably a better name... (character->char, integer->int)

As was discussed earlier.

--anders
May 26 2007
"Chris Miller" <chris dprogramming.com> writes:
On Sat, 26 May 2007 04:35:34 -0400, Anders F Björklund <afb algonet.se>  
wrote:

 Walter Bright wrote:

 Why cstring? Because 'string' appears as both a module name and a  
 common variable name. cstring also implies wstring for wchar strings,  
 and dstring for dchars.

I agree, except I don't care much for "str". I'd prefer it named string.
If it's an alias in object.d and not a keyword, it shouldn't be too bad.

Actually, while we're at a change for strings, why not bring in something
similar to my dstring module, where slicing and indexing never result in
an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the
code may not be ideal, but it's the concept I'm referring to.

While on strings, I'll mention another problem I have with D's string
handling. "invalid utf8 sequence" (or, if you prefer, "4invalid utf8
sequence"). Other Unicode implementations I've used do not throw such an
exception, but interpret the bad parts as replacement characters (U+FFFD).
I believe I've also heard that the Unicode standard also recommends being
forgiving in this aspect.

- Chris
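
The slicing problem mentioned above can be reproduced with a short sketch. It assumes a current D compiler and today's Phobos std.utf (validate, stride, UTFException); it does not show the dstring module's own API, and firstCodePoints is a hypothetical helper.

```d
import std.exception : assertThrown;
import std.utf : stride, validate, UTFException;

// Hypothetical helper: returns the prefix of s holding its first n code
// points, stepping by whole UTF-8 sequences instead of single code units.
const(char)[] firstCodePoints(const(char)[] s, size_t n)
{
    size_t i = 0;
    while (n > 0 && i < s.length)
    {
        i += stride(s, i); // length in code units of the sequence at i
        --n;
    }
    return s[0 .. i];
}

void main()
{
    string s = "héllo";      // 'é' occupies two UTF-8 code units

    auto broken = s[0 .. 2]; // plain slicing cuts 'é' in half
    assertThrown!UTFException(validate(broken));

    auto safe = firstCodePoints(s, 2);
    validate(safe);          // does not throw
    assert(safe == "hé");
}
```

Stepping with stride keeps every slice boundary on a code point boundary, which is the property the post asks for; the cost is a linear scan instead of constant-time indexing.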
May 26 2007
Marcin Kuszczak <aarti interia.pl> writes:
Chris Miller wrote:

 Actually, while we're at a change for strings, why not bring in something
 similar to my dstring module, where slicing and indexing never result in
 an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the
 code may not be ideal, but it's the concept I'm referring to.

Yup. That's my opinion also...

For me the advantages of such a string are quite obvious:
1. Easy slicing and indexing of utf8 sequences (without corrupting the
sequence - as mentioned above)
2. Common denominator for char[], wchar[] and dchar[]
3. For classes which don't need speed it simplifies the API (only one
version of functions instead of 3)
4. With some additional support from the language (cast operators to
different types and opImplicitCast) it can be fully interchangeable with
every method taking char[], wchar[], dchar[].

Having another 3 names for string is not very appealing to me. We would
have 9 official versions of string available in D:
char[], wchar[], dchar[], string, cwstring, cdstring, tango String!(char),
tango String!(wchar), tango String!(dchar)

To write a nice, fully functional library you have to write 3 versions of
every function which takes different string types (I know, templates make
it a little bit easier). Probably I will not be wrong when I say that the
reality is that people just write one version for char[], because it is
convenient (see: SWT ported from Java). As a result wchar and dchar are
treated as second class citizens in D. Additionally, when people design
their program for char[], they mostly don't think about issues with
slicing of char[] utf8 sequences (warning! assumption!), so the default way
of writing programs is *NOT SAFE*. When you write code and don't care about
bare metal speed it is just tedious to do this additional work...

Having one string, which hides the differences between char[], wchar[] and
dchar[], would solve the problem nicely. Adding constness would also be
easy. And you use only one reserved keyword - string - for everything.

I would be happy to hear some other opinions from people on the NG. Maybe I
am wrong with the above arguments, so probably someone can give
counterarguments... I think it is a very important issue as it seems that
most developers over the world are non-native-english-speakers...

PS. See also thread on DWT NG.

-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------
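
The remark that templates make the three-versions problem "a little bit easier" can be illustrated with a sketch, assuming a current D compiler and std.traits.isSomeChar from today's Phobos. The function countCodePoints is hypothetical and is not part of any library mentioned in the thread.

```d
import std.traits : isSomeChar;

// Hypothetical function: one template covers char[], wchar[] and dchar[]
// in any constness, instead of three hand-written overloads.
size_t countCodePoints(C)(const(C)[] s)
    if (isSomeChar!C)
{
    size_t n = 0;
    foreach (dchar c; s) // foreach with dchar decodes UTF-8 and UTF-16
        ++n;
    return n;
}

void main()
{
    assert(countCodePoints("héllo") == 5);       // UTF-8, 6 code units
    assert(countCodePoints("héllo"w) == 5);      // UTF-16
    assert(countCodePoints("héllo"d) == 5);      // UTF-32
    assert(countCodePoints("héllo".dup) == 5);   // mutable char[]
}
```

This removes the duplicated source, though each encoding still gets its own instantiation, so it does not by itself provide the single string type argued for above.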
May 26 2007
renoX <renosky free.fr> writes:
Marcin Kuszczak wrote:
 Chris Miller wrote:
 
 Actually, while we're at a change for strings, why not bring in something
 similar to my dstring module, where slicing and indexing never result in
 an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the
 code may not be ideal, but it's the concept I'm referring to.

I agree with you, I don't think that the string should be a char[] 
alias, whether it's const or not, but a class with a char[], dchar[] or 
wchar[] representation under the hood and safe slicing by default.

The difficulty is providing enough flexibility for managing correctly 
the internal representation: there should be a possibility to say use 
UTF8 even though there are multibyte characters for example (a size 
optimization with some CPU cost).

renoX
May 27 2007
Regan Heath <regan netmail.co.nz> writes:
renoX Wrote:
 I agree with you, I don't think that the string should be a char[] 
 alias, wether it's const or not but a class with char[],dchar[],wchar[] 
 under the hood representation and safe slicing by default.
 
 The difficulty is providing enough flexibility for managing correctly 
 the internal representation: there should be a possibility to say use 
 UTF8 even though there are multibyte characters for example (a size 
 optimization with some CPU cost).

I think the class you describe would be useful, but only for certain types of application.  Many applications (those that deal with ASCII or only one of UTF8, 16 or 32 for example) won't need the sorts of things this class provides and can get away with just using 'const(char[])' AKA 'string'.  Basically I think there is ample room for both 'string' as an alias and 'String' as a class to exist at the same time.

Regan
May 27 2007
renoX <renosky free.fr> writes:
Regan Heath wrote:
 renoX Wrote:
 I agree with you, I don't think that the string should be a char[]
  alias, wether it's const or not but a class with
 char[],dchar[],wchar[] under the hood representation and safe
 slicing by default.
 
 The difficulty is providing enough flexibility for managing
 correctly the internal representation: there should be a
 possibility to say use UTF8 even though there are multibyte
 characters for example (a size optimization with some CPU cost).

 I think the class you describe would be useful, but only for certain
 types of application.  Many applications (those that deal with ASCII

Hopefully a rare thing now.

 or only one of UTF8, 16 or 32 for example)

Sure, but this makes the code less portable (or less efficient when it's 
not on its "original" OS): Windows uses UTF16, Linux UTF8.
 wont need the sorts of
 things this class provides and can get away with just using
 'const(char[])' AKA 'string'.  Basically I think there is a ample
 room for both 'string' as an alias and 'String' as a class to exist
 at the same time.

Room of course, but IMHO one should almost always use the class (except 
in wrappers of native calls) instead of the alias.

renoX
 
 Regan

May 27 2007
Regan Heath <regan netmail.co.nz> writes:
renoX Wrote:
 Regan Heath wrote:
 renoX Wrote:
 I agree with you, I don't think that the string should be a char[]
  alias, wether it's const or not but a class with
 char[],dchar[],wchar[] under the hood representation and safe
 slicing by default.
 
 The difficulty is providing enough flexibility for managing
 correctly the internal representation: there should be a
 possibility to say use UTF8 even though there are multibyte
 characters for example (a size optimization with some CPU cost).

 I think the class you describe would be useful, but only for certain
 types of application.  Many applications (those that deal with ASCII

 Hopefully a rare thing now.

No, sadly they aren't.  Most existing applications these days deal with ASCII or one of the strange code pages (which you'd handle in D with ubyte and appropriate conversion to one of UTF8, 16 or 32 internally).

Granted, in the case of the code page apps you might want a String class which can be produced by a <codepage>toString() free function which leverages iconv (which is just what I suggested).  However, you may only want to deal with them as UTF-8 internally and therefore not need the functionality provided by the class, opting instead to use 'string' directly.

Sure, in the future I expect/hope people will move to UTF8, 16, and 32 but I suspect code pages will be haunting us for many years to come.
 wont need the sorts of
 things this class provides and can get away with just using
 'const(char[])' AKA 'string'.  Basically I think there is a ample
 room for both 'string' as an alias and 'String' as a class to exist
 at the same time.

 Room of course, but IMHO one should almost always use the class (except
 in wrappers of native calls) instead of the alias.

I think that's an invalid assertion, specifically your use of the word 'always'.  There are 'almost certainly' (see, my term leaves room for me to be wrong) many cases where the alias would be preferred, most likely for performance reasons, especially if the added functionality isn't required.

In other words, all I'm saying is: sometimes you want it, sometimes you don't.  Both can exist, both can be used and both should be interchangeable (without too much trouble).

Regan
May 28 2007
Derek Parnell <derek psych.ward> writes:
On Fri, 25 May 2007 19:47:24 -0700, Walter Bright wrote:

 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. 

We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.
 To express an array of const chars, one would write:
 
 	const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
 	alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.

No it doesn't. I have rarely seen 'string' used as a variable. In phobos it is used in boxer.d and regexp.d only. I use it as an alias for 'char[]'. I see 'str' used fairly often but not so much 'string'. 'cstring' is pronounced C-String which instantly brings to mind the 'string' implementation used by C language. Not something I imagine you wish to imply.
 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
 	typeof("abc")
 will be:
 	invariant(char)[3]
 
 Invariants can be implicitly cast to const.

So 'const(char)[] x' means that I can change x.ptr and x.length but I 
cannot change anything that x.ptr points to, right?

	void func(const(char)[] x)
	{
	    x = "def";    // ok
	    x.length = 0; // ok
	    x[0] = 'd';   // fails
	}

And 'invariant(char)[] x' means that I cannot change x.ptr or x.length and 
I cannot change anything that x.ptr points to, right?

	void func(invariant(char)[] x)
	{
	    x = "def";    // fails
	    x.length = 0; // fails
	    x[0] = 'd';   // ok
	}

So what syntax is to be used so that x.ptr and x.length cannot be changed 
but the characters referred to by 'x' can be changed?

	void func(char const([]) x) ???
	{
	    x = "def";    // fails
	    x.length = 0; // fails
	    x[0] = 'd'    // ok
	}

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
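
For comparison, here is a sketch of how a current D compiler treats these signatures. It assumes today's language, where `invariant` is spelled `immutable`, and may not match the 2007 design under discussion; the function names are hypothetical. Because const is transitive, there is no spelling for the third case (fixed slice, mutable characters).

```d
// const(char)[]: the slice is mutable, the characters are not.
const(char)[] viaConst(const(char)[] x)
{
    x = "def";    // ok
    x.length = 2; // ok: only this local slice changes
    static assert(!__traits(compiles, x[0] = 'd'));
    return x;
}

// immutable(char)[]: same rules for the slice; characters never change.
immutable(char)[] viaImmutable(immutable(char)[] x)
{
    x = "def";    // ok
    x.length = 2; // ok
    static assert(!__traits(compiles, x[0] = 'd'));
    return x;
}

// const(char[]): neither the slice nor the characters can be changed.
size_t viaFullConst(const(char[]) x)
{
    static assert(!__traits(compiles, x = "def"));
    static assert(!__traits(compiles, x.length = 0));
    static assert(!__traits(compiles, x[0] = 'd'));
    return x.length;
}

void main()
{
    assert(viaConst("abc") == "de");
    assert(viaImmutable("abc") == "de");
    assert(viaFullConst("abc") == 3);
}
```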
May 26 2007
Marcin Kuszczak <aarti interia.pl> writes:
Derek Parnell wrote:

 Under the new const/invariant/final regime, what are strings going to be
 ? Experience with other languages suggest that strings should be
 immutable.

 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very
 common in the sorts of programs I write.

The same here. I don't have much experience with Java and really don't know
why const strings are so useful... Maybe someone could elaborate a little
bit more?

-- 
Regards
Marcin Kuszczak (Aarti_pl)
May 26 2007
Johan Granberg <lijat.meREM OVEgmail.com> writes:
Marcin Kuszczak wrote:

 Derek Parnell wrote:
 
 Under the new const/invariant/final regime, what are strings going to be
 ? Experience with other languages suggest that strings should be
 immutable.

We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.

The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?

In my experience they are not really useful at all (const as in constant, that is). Sometimes it does not matter and sometimes it is inconvenient or a performance problem (in my experience it is mostly append that is needed).

If function parameters were const by default (as in the new behavior of in) I see no use for immutability here.

In Java I think it is used to prevent aliased String objects from changing value, something that could create unexpected bugs if used by programmers who do not understand aliasing.

PS: although I'm no fan of Java I have used it for most university assignments for the past two years, so hopefully I'm not totally wrong ;)
May 26 2007
prev sibling next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Marcin Kuszczak wrote:
 Derek Parnell wrote:
 
 Under the new const/invariant/final regime, what are strings going to be
 ? Experience with other languages suggest that strings should be
 immutable.

We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.

The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?

Ditto here. When I've used Java I found it more annoying that strings were immutable than anything else.

--bb
May 26 2007
next sibling parent reply gareis <dhasenan gmail.com> writes:
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 Marcin Kuszczak wrote:
 Derek Parnell wrote:

 Under the new const/invariant/final regime, what are strings going to be
 ? Experience with other languages suggest that strings should be
 immutable.

We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.

The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?

Ditto here. When I've used Java I found it more annoying that strings were immutable than anything else. --bb

I found it more bothersome by far that Integer, Float, etc. were immutable. Even after going through all the trouble of getting classes for all these, you couldn't use them for out or inout parameters to functions.

Scratch that -- what was really annoying was that you couldn't ever *specify* how you wanted your parameter. Even in C, you can pass an address (but then, anything's possible in C). But in Java, you can only call by reference with a class or an array, so you end up doing things like:

    void foo(int[] inout_parameter) {   // one-element array standing in for an inout int
        inout_parameter[0] += 5;
    }

And the only way to get a scope const final sort of deal on a class is to copy and then submit the copy as a final parameter -- it's the reference, not the data, that's final.

In short, thank you, Walter, for allowing us to pass anything by reference, and for allowing the data referenced to be made read-only.
May 26 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
gareis wrote:
 In short, thank you, Walter, for allowing us to pass anything by reference,
and by
 allowing the data referenced to be made read-only.

You're welcome. Different languages offer different pieces, only D will offer the whole customizable shebang. The idea is for programs to be more self-documenting, and so make automated analysis more feasible.
May 26 2007
prev sibling parent reply Anders F Björklund <afb algonet.se> writes:
Bill Baxter wrote:

 The same here. I don't have much experience with Java and really don't 
 know why const strings are so useful...
 Maybe someone could elaborate a little bit more?

Ditto here. When I've used Java I found it more annoying that strings were immutable than anything else.

When using Java (and Objective-C), I've found it very useful that strings (and others) are immutable since they are then thread-safe.

--anders
May 27 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
Anders F Björklund wrote:
 When using Java (and Objective-C), I've found it very useful that 
 strings (and others) are immutable since they are then thread-safe.

Being able to treat strings as value types is where the big simplification (in user code) comes, and invariant strings should do that.
May 27 2007
prev sibling parent Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Marcin Kuszczak wrote:
 Derek Parnell wrote:
 
 Under the new const/invariant/final regime, what are strings going to be
 ? Experience with other languages suggest that strings should be
 immutable.

We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.

The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?

It might also be educational to look at Python, which also has immutable strings. The first, and probably most important, reason why strings are immutable in Python is so they can be used as hash keys. (Mutating an object being used as a hash key is bad, bad, bad.) Other reasons are addressed here:

http://effbot.org/pyfaq/why-are-python-strings-immutable.htm

However, Python is a very different kind of language from D. Using strings as hash keys is extraordinarily important in Python, as the use of any identifier is in essence a hash lookup.

Providing immutable strings in D is very useful (so the compiler can enforce copy-on-write semantics, for instance), and I don't think anyone would dispute that. The issue seems to be whether the "default" string alias should be immutable. I would say, since D seems to subscribe to copy-on-write semantics, that it should be. And of course, if you need mutable strings, you will always be able to declare a char[].

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org
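The hash-key and copy-on-write points can be sketched in a few lines. This is only an illustration written against present-day D, where `string` is `immutable(char)[]`; that spelling is an assumption relative to this thread, which is still debating `cstring`. The helper name `shout` is hypothetical.

```d
import std.stdio : writeln;

// Hypothetical helper following the copy-on-write convention:
// it never touches its argument, it modifies a private copy instead.
string shout(string s)
{
    char[] copy = s.dup;                 // mutable private copy
    foreach (ref c; copy)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - 'a' + 'A');
    return copy.idup;                    // hand back an immutable copy
}

void main()
{
    int[string] counts;                  // associative array keyed by immutable strings
    string key = "abc";
    counts[key] = 1;

    string loud = shout(key);
    assert(key == "abc");                // the key cannot have changed behind the table's back
    assert(loud == "ABC");
    assert(("abc" in counts) !is null);  // lookup still finds the entry
    writeln(loud);
}
```

Because the key's characters cannot be written through any reference, the table never ends up holding a key whose hash no longer matches its contents.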
May 26 2007
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very common
 in the sorts of programs I write.

You'll still be able to concatenate and slice invariant strings. You can also cast a char[] to an invariant, when you're done building it.
 So 'const(char)[] x' means that I can change x.ptr and x.length but I
 cannot change anything that x.ptr points to, right?

Right.
 And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
 I cannot change anything that x.ptr points to, right?

Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable. Const is only immutable through the reference - another reference to the same data can change it.
 So what syntax is to be used so that x.ptr and x.length cannot be changed
 but the characters referred to by 'x' can be changed?

final char[] x;
May 26 2007
next sibling parent reply Reiner Pope <some address.com> writes:
Walter Bright wrote:
 Derek Parnell wrote:
 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very 
 common
 in the sorts of programs I write.

You'll still be able to concatenate and slice invariant strings. You can also cast a char[] to an invariant, when you're done building it.

Will there be something in the type system which enables you to safely say, "This is the only reference to this data, so it's ok for me to make this invariant" ?

Does 'scope' happen to have anything to do with that?

    invariant(char)[] createJunk()
    {
        /* scope? */ char[] val = "aaaaa".dup;
        size_t index = rand() % 5;
        val[index] = rand();

        return cast(invariant(char)[]) val;
    }

I mean, do I really need to cast it to invariant there? It's easy to see that there's only one copy of val's data in existence.

   -- Reiner
May 26 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Reiner Pope wrote:
 Will there be something in the type system which enables you to safely 
 say, "This is the only reference to this data, so it's ok for me to make 
 this invariant" ?

Safely? No. You will be able to explicitly cast to invariant, however, the programmer will have to ensure it is safe to do so.
 Does 'scope' happen to have anything to do with that?

No. Scope just ensures that the reference does not 'escape' the scope it's in.
 invariant(char)[] createJunk()
 {
     /* scope? */ char[] val = "aaaaa".dup;
     size_t index = rand() % 5;
     val[index] = rand();
 
     return cast(invariant(char)[]) val;
 }
 
 I mean, do I really need to cast it to invariant there? It's easy to see 
 that there's only one copy of val's data in existence.

Easy for you to see, not so easy for the compiler to. And besides:

	return cast(invariant)val;

will do the trick more conveniently.
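The "build it mutable, then freeze it" pattern discussed here might look as follows in present-day D. The assumptions: `invariant` is spelled `immutable` there, and `std.exception.assumeUnique` is used in place of a bare cast. The function name `makeLabel` is hypothetical. As in the reply above, nothing verifies uniqueness; the programmer still vouches for it.

```d
import std.exception : assumeUnique;

// Hypothetical builder: fill a mutable buffer, then hand it out as immutable.
string makeLabel(size_t n)
{
    char[] buf = new char[n];
    foreach (i, ref c; buf)
        c = cast(char)('a' + i % 26);
    // assumeUnique performs the cast and sets `buf` to null, so this function
    // keeps no mutable alias. That the data was unique is still our promise.
    return assumeUnique(buf);
}

void main()
{
    string s = makeLabel(5);
    assert(s == "abcde");
    static assert(!__traits(compiles, s[0] = 'x')); // contents are now read-only
}
```

When an extra allocation is acceptable, copying with `.idup` avoids the unchecked promise altogether.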
May 26 2007
next sibling parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Walter Bright wrote:
 Reiner Pope wrote:
 Will there be something in the type system which enables you to safely 
 say, "This is the only reference to this data, so it's ok for me to 
 make this invariant" ?

Safely? No. You will be able to explicitly cast to invariant, however, the programmer will have to ensure it is safe to do so.
 Does 'scope' happen to have anything to do with that?

No. Scope just ensures that the reference does not 'escape' the scope it's in.
 invariant(char)[] createJunk()
 {
     /* scope? */ char[] val = "aaaaa".dup;
     size_t index = rand() % 5;
     val[index] = rand();

     return cast(invariant(char)[]) val;
 }

 I mean, do I really need to cast it to invariant there? It's easy to 
 see that there's only one copy of val's data in existence.

Easy for you to see, not so easy for the compiler to. And besides:

	return cast(invariant)val;

will do the trick more conveniently.

That's an interesting syntax, casting to a trait/attribute with the rest of the type inferred. I presume cast(const) works as well. (Maybe cast(scope)? Then again, what's the use...)

Given cast(*) where * is invariant/const, is cast(*)T[] the same as cast(*(T)[]) or cast(*(T[]))? That is, does the trait apply to the element type, or the array?

-- Chris Nicholson-Sauls
May 26 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
Chris Nicholson-Sauls wrote:
 That's an interesting syntax, casting to a trait/attribute with the rest 
 of the type inferred.  I presume cast(const) works as well.  (Maybe 
 cast(scope)?  Then again, what's the use...)  Given cast(*) where * is 
 invariant/const, is cast(*)T[] the same as cast(*(T)[]) or 
 cast(*(T[]))?  That is, does the trait apply to the element type, or the 
 array?

Both.
May 26 2007
prev sibling parent reply Reiner Pope <some address.com> writes:
Walter Bright wrote:
 Reiner Pope wrote:
 Will there be something in the type system which enables you to safely 
 say, "This is the only reference to this data, so it's ok for me to 
 make this invariant" ?

Safely? No. You will be able to explicitly cast to invariant, however, the programmer will have to ensure it is safe to do so.
 Does 'scope' happen to have anything to do with that?

No. Scope just ensures that the reference does not 'escape' the scope it's in.

I must have misunderstood what scope specifies. I had thought that, to avoid being escaped, scope specified that your variable may not be aliased by another (non-scope) name. In that case, I thought, can't you say: "well, when I leave this function, I'm the only one holding a reference to this data, so it would be safe to call it invariant (or anything else I choose)." I thought a compiler could have a special case saying, "at the end of scope, you can safely turn any scope variables into whatever you want".

However, I was surprised to find out that the following code compiled fine, although it returns a dead object:

    Foo foo()
    {
        scope Foo f = new Foo();
        Foo g = f;
        return g;
    }

   -- Reiner
May 26 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
Reiner Pope wrote:
 However, I was surprised to find out that the following code compiled 
 fine, although it returns a dead object:

Sadly, it currently isn't enforced.
May 26 2007
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sat, 26 May 2007 22:27:18 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 We seem to have different experience. Most of the code I write deals with
 changing strings - in other words, manipulating strings is very very common
 in the sorts of programs I write.

You'll still be able to concatenate and slice invariant strings. You can also cast a char[] to an invariant, when you're done building it.

While that is interesting, it has not much to do with what I was saying. You said "strings should be immutable" and I am saying that seems odd because my experience is that most strings are meant to be changed.

So now I'm thinking that we are talking about different things when we use the word "string". I'm guessing you are really referring to compile-time generated string data (e.g. literals) rather than run-time generated string data.
 So 'const(char)[] x' means that I can change x.ptr and x.length but I
 cannot change anything that x.ptr points to, right?

Right.
 And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
 I cannot change anything that x.ptr points to, right?

Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable.

Huh??? Isn't that what I just said? Now I'm even more confused about these terms. They are just not intuitive, are they?
 Const is only immutable through the 
 reference - another reference to the same data can change it.

Ok ... so this below won't fail ...

   void func(const char[] parm)
   {
       char [] q;
       q = parm;
       q[0] = 'a';
   }

or is the "q = parm" not really permitted.
 So what syntax is to be used so that x.ptr and x.length cannot be changed
 but the characters referred to by 'x' can be changed?

final char[] x;

Given the syntax on the form " void func(<X> char[] parm) ", is the table below true ...

  *-------------------------------------*
  | <X>         | parm.ptr  | parm[0]   |
  |-------------+-----------+-----------|
  | const       | mutable   | immutable |
  | final       | immutable | mutable   |
  | invariant   | immutable | immutable |
  |             | mutable   | mutable   |
  *-------------------------------------*

I'm sorry I'm a bit slow on this ... but what is the difference between "invariant" and "const final" ? Is it that "invariant" is sort of a global effect but "const final" is only in effect for the specific reference it occurs on.

I'm not looking forward to reading the docs on this. I hope you get a lot of people to edit the docs to make it understandable for everyone.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 26 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 You said "strings should be immutable" and I am saying that seems odd because
 my experience is that most strings are meant to be changed. 

I'm going to argue that your experience is unusual. I do a lot of string manipulation (after all, that's what a compiler does) and the strings, once constructed, are essentially always immutable. In conversations with many others, my experience is commonplace. But still, in D, nothing prevents you from using mutable strings.
 So now I'm thinking that we are talking about different things when we use
 the word "string". I'm guessing you are really referring to compile-time
 generated string data (e.g. literals) rather than run-time generated string
 data.

I'm referring to the arrays of characters, generated or literals.
 So 'const(char)[] x' means that I can change x.ptr and x.length but I
 cannot change anything that x.ptr points to, right?

 And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
 I cannot change anything that x.ptr points to, right?

Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable.

Huh??? Isn't that what I just said?

No. You said for const you could change x.ptr and x.length, but for invariant you could not. For both const and invariant, you can change x.ptr and x.length.
 Now I'm even more confused about these
 terms. They are just not intuitive, are they?

The problem is I have failed to explain them. Invariant data can go into read-only memory. Const data can be changed by another reference to the same data (just like in C++). In other words, const is a read-only *view* of the data, whereas invariant data is read-only for all views of it.
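The view-versus-data distinction can be checked mechanically. The sketch below is written against present-day D, with `immutable` assumed as the later spelling of `invariant`; the function name `firstAfterWrite` is hypothetical. It uses `__traits(compiles, ...)` to confirm what the compiler rejects.

```d
// A const parameter is a read-only *view*: the caller may still hold a
// mutable reference to the very same characters.
char firstAfterWrite(const(char)[] view, char[] writable)
{
    static assert(!__traits(compiles, view[0] = 'x')); // no writes through the view
    writable[0] = 'X';                                  // but the other reference can write
    return view[0];                                     // the view observes the change
}

void main()
{
    char[] data = "abc".dup;
    assert(firstAfterWrite(data, data) == 'X');

    immutable(char)[] fixed = "abc";
    const(char)[] alsoView = fixed;      // immutable converts implicitly to const
    assert(alsoView == "abc");

    // No mutable reference to immutable data can be obtained implicitly,
    // and mutable data does not implicitly become immutable.
    static assert(!__traits(compiles, { char[] m = fixed; }));
    static assert(!__traits(compiles, { immutable(char)[] i = data; }));
}
```

Both mutable and immutable arrays convert to the const view, which is why a const parameter accepts either kind of string.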
 Const is only immutable through the 
 reference - another reference to the same data can change it.

 Ok ... so this below won't fail ...

   void func(const char[] parm)
   {
       char [] q;
       q = parm;
       q[0] = 'a';
   }
 
 or is the "q = parm" not really permitted.

Right.
 
 So what syntax is to be used so that x.ptr and x.length cannot be changed
 but the characters referred to by 'x' can be changed?


 Given the syntax on the form " void func(<X> char[] parm) ", is the table
 below true ...

  *-------------------------------------*
  | <X>         | parm.ptr  | parm[0]   |
  |-------------+-----------+-----------|
  | const       | mutable   | immutable |
  | final       | immutable | mutable   |
  | invariant   | immutable | immutable |
  |             | mutable   | mutable   |
  *-------------------------------------*

You've got invariant wrong, it's mutable|immutable.
 I'm sorry I'm a bit slow on this ... but what is the difference between
 "invariant" and "const final" ? Is it that "invariant" is sort of a global
 effect but "const final" is only in effect for the specific reference it
 occurs on.

First differences: final is a *storage class*. const and invariant are *type constructors*.

final only refers to the actual value that a symbol has, and it means that, once a value is assigned to a symbol, that value can never change. If the value is a pointer or reference, what it points to *can* be changed.

	int x = 3;
	final int* p = &x;
	p = null;    // error, p is final
	*p = 1;      // ok

	const(int)* q = null;
	q = &x;      // ok, q is not const, and now *q is 1
	*q = 2;      // error, *q is const
	*p = 5;      // ok, but now *q is 5, too!
	x = 6;       // ok, but now *q is 6

	invariant(int)* s = null;
	s = &x;      // error, cannot implicitly convert int* to invariant(int)*
	int y = 4;
	s = cast(invariant(int)*)&y; // ok, trust programmer that y is immutable
	*s = 3;      // error, *s is immutable
	y = 5;       // undefined behavior, as y is never supposed to change,
	             // and compiler assumes *s is still 4

Note that int* can be implicitly converted to const(int)*, and invariant(int)* can be implicitly converted to const(int)*.
 I'm not looking forward to reading the docs on this. I hope you get a lot
 of people to edit the docs to make it understandable for everyone.

The thing is actually rather simple, but I am having trouble finding the right words to express it. Certainly, the mishmash of C++ const has badly muddied the waters about what const means.
May 27 2007
parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 27 May 2007 01:09:40 -0700, Walter Bright wrote:

Thanks for taking the time out to help me understand the proposed D
changes. I really appreciate it.

I think that I'm going to have to wait until you have an implementation to
try it on; to see how it fits with my terminology and needs.

 Derek Parnell wrote:
 You said "strings should be immutable" and I am saying that seems odd because
 my experience is that most strings are meant to be changed. 

I'm going to argue that your experience is unusual. I do a lot of string manipulation (after all, that's what a compiler does) and the strings, once constructed, are essentially always immutable. In conversations with many others, my experience is commonplace.

Ok we'll leave it at that then. However the phrase "once constructed" is the key one I suspect. It's like saying, once I've finished changing things I don't want them to change anymore - no argument there. So the idea would be to work with mutable strings until they are finished being constructed and then cast them to immutable for the rest of the run time. I'm thinking here of things like changing case, macro expansion, standardizing file names, constructing message text, etc ...
 But still, in D, nothing prevents you from using mutable strings.

That's why I can see that I'll be continuing to use 'alias char[] string', unless you make 'string' the immutable beastie of course <g>
 So 'const(char)[] x' means that I can change x.ptr and x.length but I
 cannot change anything that x.ptr points to, right?

 And  'invariant(char)[] x' means that I cannot change x.ptr or x.length and
 I cannot change anything that x.ptr points to, right?

Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable.

Huh??? Isn't that what I just said?

No. You said for const you could change x.ptr and x.length, but for invariant you could not. For both const and invariant, you can change x.ptr and x.length.

See, this is what is weird ... I can have an invariant string which can be changed, thus making it not really invariant in the English language sense. I'm still thinking that "invariant" means "does not change ever". But it seems that I'm wrong ...

 invariant char[] x;
 x = "abc".dup;  // The string 'x' now contains "abc";
 x = "def".dup;  // The string (which is not supposed to change
                 // i.e invariant) has been changed to "def".

Now this is counter-intuitive (read: *WEIRD*), no?
 Now I'm even more confused about these
 terms. They are just not intuitive, are they?

The problem is I have failed to explain them. Invariant data can go into read-only memory. Const data can be changed by another reference to the same data (just like in C++). In other words, const is a read-only *view* of the data, whereas invariant data is read-only for all views of it.

Okay, I've got that now ... but how to remember that two terms that mean the same in English actually mean different things in D <G>

I think I read that someone suggested that 'const' be a contraction of 'constrained' rather than 'constant' - that might help. And that 'invariant' is longer than 'const' so its effect is 'bigger'.

  invariant char[] x; // The data pointed to by 'x' cannot be changed
                      // by anything anytime during the execution
                      // of the program.
                      // (So how do I populate it then? Hmmmm ...)

  const char[] y;    // The data pointed to by 'y' cannot be changed
                     // by anything anytime during the execution
                     // of the program when using the 'y' variable,
                     // however using another variable that also
                     // refers to y's data, or some of it, is ok.

For example ...

  void func (const char[] a, char[] b)
  {
        a[0] = 'a'; // fails
        b[0] = 'a'; // succeeds
  }

  char[] y = "def".dup;
  func( y, y);
 I'm sorry I'm a bit slow on this ... but what is the difference between
 "invariant" and "const final" ? Is it that "invariant" is sort of a global
 effect but "const final" is only in effect for the specific reference it
 occurs on.

First differences: final is a *storage class*. const and invariant are *type constructors*.

Thanks. So 'final' means that it can be changed (from its initial default value) once and only once.

/* --- Scenario #1 --- */
  final int r;
  r = randomer(); // succeeds
  foo(); // fails

  int randomer() {
      // Get a random integer between -100 and 100.
      return cast(int)(std.random.rand() % 201) - 100;
  }
  void foo() {
    r = randomer(); // success depends on whether or not 'r'
                    // has already been set.
  }

/* --- Scenario #2 --- */
  final int r;
  foo(); // succeeds
  r = randomer(); // fails

  int randomer() {
      // Get a random integer between -100 and 100.
      return cast(int)(std.random.rand() % 201) - 100;
  }
  void foo() {
    r = randomer(); // success depends on whether or not 'r'
                    // has already been set.
  }

Is this a run-time check or a compile time one? If run-time, would it be possible to somehow 'unfinal' a variable using some implementation-dependent trickery.
 I'm not looking forward to reading the docs on this. I hope you get a lot
 of people to edit the docs to make it understandable for everyone.

The thing is actually rather simple, but I am having trouble finding the right words to express it.

And thus my comment re editors.
 Certainly, the mishmash of C++ const has 
 badly muddied the waters about what const means.

I have no real knowledge of C++ or its const, and I'm still weirded out by it all <G>

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 27 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 See, this is what is weird ... I can have an invariant string which can be
 changed, thus making it not really invariant in the English language sense.
 I'm still thinking that "invariant" means "does not change ever". 

Where you're going wrong is that there are two parts to a dynamic array - the contents of the array, and the ptr/length values of the array. invariant(char)[] immutalizes (look ma! I coined a new word!) only the contents of the array. invariant(char[]) immutalizes the contents and the ptr/length values.
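The two-part picture can be tried out directly. This is a sketch in present-day D, which is an assumption relative to this thread: `immutable` is the later spelling of `invariant`, and the parenthesised forms behave as described above. The helper name `dropFirst` is hypothetical.

```d
// Hypothetical helper: rebinds the slice, never touches the characters.
immutable(char)[] dropFirst(immutable(char)[] s)
{
    s = s[1 .. $];   // ok: only the local ptr/length pair changes
    return s;
}

void main()
{
    immutable(char)[] a = "abc";   // contents immutable, ptr/length mutable
    a = "def";                     // ok: rebinds ptr/length
    a = dropFirst(a);
    assert(a == "ef");
    static assert(!__traits(compiles, a[0] = 'x'));   // contents cannot change

    immutable(char[]) b = "abc";   // contents *and* ptr/length immutable
    static assert(!__traits(compiles, b = "def"));    // cannot rebind
    static assert(!__traits(compiles, b[0] = 'x'));   // cannot write contents
    assert(b.length == 3);
}
```

So a string of the first kind can "change" in the sense that the variable ends up referring to different characters, while no character that was ever referred to is modified.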
 But it seems that I'm wrong ...
 
  invariant char[] x; 
  x = "abc".dup;  // The string 'x' now contains "abc";
  x = "def".dup;  // The string (which is not supposed to change
                  // i.e invariant) has been changed to "def".
 
 Now this is counter-intuitive (read: *WEIRD*), no?

The first issue is that you've confused:

	invariant char[] x;

with:

	invariant(char)[] x;

Remember, there are TWO parts to an array, and the invariantness can be controlled for either independently, or both. This isn't different from in C++ there are two parts to a char*, the char part, and the pointer part.
 Okay, I've got that now ... but how to remember that two terms that mean
 the same in English actually mean different things in D <G>

English is imprecise and ambiguous, that's why we have mathematical languages, and programming languages.
   invariant char[] x; // The data pointed to by 'x' cannot be changed
                       // by anything anytime during the execution
                       // of the program.
                       // (So how do I populate it then? Hmmmm ...)

You can't populate an invariant(char)[] array (which is what you meant, not invariant char[]). The way to get one is to cast an existing array to invariant.
   const char[] y;    // The data pointed to by 'y' cannot be changed
                      // by anything anytime during the execution
                      // of the program when using the 'y' variable,
                      // however using another variable that also
                      // refers to y's data, or some of it, is ok.

Yes, but here again, const(char)[].
 For example ...
 
   void func (const char[] a, char[] b)
   {
         a[0] = 'a'; // fails
         b[0] = 'a'; // succeeds
   }
 
   char[] y = "def".dup;
   func( y, y);

Yup, that's the aliasing issue with const.
 Thanks. So 'final' means that it can be changed (from its initial default
 value) once and only once.

No. 'final' means it is set only at initialization.
 /* --- Scenario #1 --- */
   final int r;
   r = randomer(); // succeeds

final int r = randomer();
   foo(); // fails 
 
   int randomer() { 
       // Get a random integer between -100 and 100.
       return cast(int)(std.random.rand() % 201) - 100; 
   }
   void foo() { 
     r = randomer(); // success depends on whether or not 'r' 
                     // has already been set.

   }
 

 Is this a run-time check or a compile time one?

Compile time.
 If run-time, would it be
 possible to somehow 'unfinal' a variable using some implementation
 dependant trickery.

Yes, but the result is undefined behavior. Just like if you went around the typing system and converted an int into a pointer, and tried to access data with it. You can do it, but you're on your own with that.
 I have no real knowledge of C++ or its const, and I'm still weirded out by
 it all <G>

I'm beginning to realize that unless one understands how types are represented at run time, one will never understand const.
May 27 2007
parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 27 May 2007 12:06:06 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 See, this is what is weird ... I can have an invariant string which can be
 changed, thus making it not really invariant in the English language sense.
 I'm still thinking that "invariant" means "does not change ever". 

Where you're going wrong is that there are two parts to a dynamic array - the contents of the array, and the ptr/length values of the array. invariant(char)[] immutalizes (look ma! I coined a new word!) only the contents of the array. invariant(char[]) immutalizes the contents and the ptr/length values.

I know that you know that I know this about arrays already (did I really just say that!?) so I assume you are talking to the greater audience that we have here.

So to immutalize (see it must be a real word as someone else is using it<g>) just the ptr/length parts I'd use ...

   invariant char([]) ?????
   char invariant[]   ????

and

   invariant char[] is the same as invariant (char[])

right?
 But it seems that I'm wrong ...
 
  invariant char[] x; 
  x = "abc".dup;  // The string 'x' now contains "abc";
  x = "def".dup;  // The string (which is not supposed to change
                  // i.e invariant) has been changed to "def".
 
 Now this is counter-intuitive (read: *WEIRD*), no?


In my thinking the term 'string' refers to the whole ptr/length/content group. So when one says that a string is immutable I'm thinking they are saying that every aspect of the string does not change. This is where I suspect that we are having terminology problems.
 The first issue is that you've confused:
 	invariant char[] x;
 with:
 	invariant(char)[] x;

Yep - guilty as charged, your honour. Actually it is not so much confusion rather just a poor typing regime, as I really did understand the difference but I typed in the wrong thing. But let's continue ...
 Remember, there are TWO parts to an array, and the invariantness can be 
 controlled for either independently, or both. This isn't different from 
 in C++ there are two parts to a char*, the char part, and the pointer part.

What is the syntax for controlling *just* the reference part of an array?
 Okay, I've got that now ... but how to remember that two terms that mean
 the same in English actually mean different things in D <G>

English is imprecise and ambiguous, that's why we have mathematical languages, and programming languages.

Has anyone got a dictionary in which "constant" and "invariant" are not synonyms? Sure I agree that "English is imprecise and ambiguous" when taken as a whole but not every word is such. So when one uses English words in a programming language the natural thing is to assume that the programming language meaning has a high degree of correlation with the English meaning.
   invariant char[] x; // The data pointed to by 'x' cannot be changed
                       // by anything anytime during the execution
                       // of the program.
                       // (So how do I populate it then? Hmmmm ...)

You can't populate an invariant(char)[] array (which is what you meant, not invariant char[]). The way to get one is to cast an existing array to invariant.

  char[] name;
  name = GetUserName();
  invariant (char)[] newb = cast(invariant)name;
  void foo() { name[0] = toUpperCase(name[0]); } // Is this valid?
  foo(); // What about this?
   const char[] y;    // The data pointed to by 'y' cannot be changed
                      // by anything anytime during the execution
                      // of the program when using the 'y' variable,
                      // however using another variable that also
                      // refers to y's data, or some of it, is ok.

Yes, but here again, const(char)[].

Yeah yeah yeah ... I can see how an alias is going to be a boon.
 Thanks. So 'final' means that it can be changed (from its initial default
 value) once and only once.

No. 'final' means it is set only at initialization.

And initialization means "on the same statement that declares the variable"? In English, initialization means whenever some thing is initialized rather than one specific type of initialization.
 /* --- Scenario #1 --- */
   final int r;


Ok, so the above "initializes" the symbol to zero, being the default value of an int, and it cannot be changed to anything else now.
   r = randomer(); // succeeds

final int r = randomer();

Got it.
 I have no real knowledge of C++ or its const, and I'm still weirded out by
 it all <G>

I'm beginning to realize that unless one understands how types are represented at run time, one will never understand const.

Nah, it's probably just me that's being thick, ... and I *do* understand the run-time implementation of the D constructs. -- Derek Parnell Melbourne, Australia "Justice for David Hicks!" skype: derek.j.parnell
May 27 2007
parent Charles D Hixson <charleshixsn earthlink.net> writes:
Derek Parnell wrote:
 On Sun, 27 May 2007 12:06:06 -0700, Walter Bright wrote:
 
 Derek Parnell wrote:
 See, this is what is weird ... I can have an invariant string which can be
 changed, thus making it not really invariant in the English language sense.
 I'm still thinking that "invariant" means "does not change ever". 

- the contents of the array, and the ptr/length values of the array. invariant(char)[] immutalizes (look ma! I coined a new word!) only the contents of the array. invariant(char[]) immutalizes the contents and the ptr/length values.

I know that you know that I know this about arrays already (did I really just say that!?) so I assume you are talking to the greater audience that we have here. So to immutalize (see it must be a real word as someone else is using it<g>) just the ptr/length parts I'd use ...

invariant char([]) ?????
char invariant[] ????

and invariant char[] is the same as invariant (char[]) right?
 But it seems that I'm wrong ...

  invariant char[] x; 
  x = "abc".dup;  // The string 'x' now contains "abc";
  x = "def".dup;  // The string (which is not supposed to change
                  // i.e invariant) has been changed to "def".

 Now this is counter-intuitive (read: *WEIRD*), no?


In my thinking the term 'string' refers to the whole ptr/length/content group. So when one says that a string is immutable I'm thinking they are saying that every aspect of the string does not change. This is where I suspect that we are having terminology problems.
 The first issue is that you've confused:
 	invariant char[] x;
 with:
 	invariant(char)[] x;

Yep - guilty as charged, your honour. Actually it is not so much confusion rather just a poor typing regime, as I really did understand the difference but I typed in the wrong thing. But let's continue ...
 Remember, there are TWO parts to an array, and the invariantness can be 
 controlled for either independently, or both. This isn't different from 
 in C++ there are two parts to a char*, the char part, and the pointer part.

What is the syntax for controlling *just* the reference part of an array?
 Okay, I've got that now ... but how to remember that two terms that mean
 the same in English actually mean different things in D <G>

English is imprecise and ambiguous, that's why we have mathematical languages, and programming languages.

Has anyone got a dictionary in which "constant" and "invariant" are not synonyms? Sure I agree that "English is imprecise and ambiguous" when taken as a whole but not every word is such. So when one uses English words in a programming language the natural thing is to assume that the programming language meaning has a high degree of correlation with the English meaning.
   invariant char[] x; // The data pointed to by 'x' cannot be changed
                       // by anything anytime during the execution
                       // of the program.
                       // (So how do I populate it then? Hmmmm ...)

You can't populate an invariant(char)[] array (which is what you meant, not invariant char[]). The way to get one is to cast an existing array to invariant.

char[] name;
name = GetUserName();
invariant (char)[] newb = cast(invariant)name;
void foo() { name[0] = toUpperCase(name[0]); } // Is this valid?
foo(); // What about this?
   const char[] y;    // The data pointed to by 'y' cannot be changed
                      // by anything anytime during the execution
                      // of the program when using the 'y' variable,
                      // however using another variable that also
                      // refers to y's data, or some of it, is ok.


Yeah yeah yeah ... I can see how an alias is going to be a boon.
 Thanks. So 'final' means that it can be changed (from its initial default
 value) once and only once.


And initialization means "on the same statement that declares the variable"? In English, initialization means whenever some thing is initialized rather than one specific type of initialization.
 /* --- Scenario #1 --- */
   final int r;


Ok, so the above "initializes" the symbol to zero, being the default value of an int, and it cannot be changed to anything else now.
   r = randomer(); // succeeds

final int r = randomer();

Got it.
 I have no real knowledge of C++ or its const, and I'm still weirded out by
 it all <G>

I'm beginning to realize that unless one understands how types are represented at run time, one will never understand const.

Nah, it's probably just me that's being thick, ... and I *do* understand the run-time implementation of the D constructs.

examples. The text is sufficient to point folk in the right general direction, but the examples will be necessary to highlight the minimal distinctions. And as my C++ is really quite minimal, and predates templates being generally available, I don't think I'm being confused by how C++ uses it.

OTOH, I frequently need to do things like:

char[] stuff = "alperferous";
stuff = stuff[0..5] ~ "if" ~ stuff[5..length];

(silly example, but it's short!) Given what I've read so far I suppose this means that I just keep avoiding const & invariant, but I do think of this as string manipulation, and thus "Strings are constant by default" sets warning bells ringing. (Probably inappropriately, admittedly. But perhaps this should be said differently in the documentation.)
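As a hedged illustration of why this kind of manipulation need not conflict with immutable strings, here is the same splice written for present-day D (an assumption: `string` is an alias for `immutable(char)[]`, and `$` replaces `length` inside the index). The helper name `splice` is made up for the example. Slicing and concatenation build a new array, so no existing character is ever written to.

```d
/// Hypothetical helper: returns a new string with `insertion` placed at index `at`.
string splice(string s, size_t at, string insertion)
{
    // Slices only read from `s`; `~` allocates the result.
    return s[0 .. at] ~ insertion ~ s[at .. $];
}

void main()
{
    string stuff = "alperferous";
    stuff = splice(stuff, 5, "if");   // rebinding the variable is allowed
    assert(stuff == "alperifferous");

    // Writing to an individual character is what the type forbids.
    static assert(!__traits(compiles, stuff[0] = 'A'));
}
```

Only in-place edits of existing characters would call for a mutable `char[]` copy obtained with `.dup`.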
Jun 10 2007
prev sibling next sibling parent janderson <askme me.com> writes:
Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
     alias const(char)[] cstring;
 

If you decide on an alias it would be a good idea to add it to phobos for DMD 1, except without the const syntax of course. That way people can start using it now and have fewer problems upgrading to DMD 2. Although, on the other hand, it may be slightly confusing for 1.0 coders I guess when it doesn't function as a const string.

-Joel
May 26 2007
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 25 May 2007 19:47:24 -0700, Walter Bright wrote:

 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
 	const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
 	alias const(char)[] cstring;
 

const(char)[]        // A mutable array of immutable characters?
const(char[])        // An immutable array of mutable characters?
const(const(char)[]) // An immutable array of immutable characters?
char[]               // A mutable array of mutable characters?

What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?

-- Derek Parnell Melbourne, Australia "Justice for David Hicks!" skype: derek.j.parnell
May 27 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?
 
 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

They'll all fail.
May 27 2007
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:

 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?


Any comment on the above?
 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

They'll all fail.

Good, but when? At run time or compile time? -- Derek Parnell Melbourne, Australia "Justice for David Hicks!" skype: derek.j.parnell
May 27 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:
 
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?


Any comment on the above?

Looks right to me.
 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?


Good, but when? At run time or compile time?

compile time.
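A small sketch of what "fails at compile time" can look like, assuming present-day D, where the built-in `.sort` and `.reverse` properties no longer exist and the `std.algorithm` functions `sort` and `reverse` take their place. It uses `int` elements to keep the example free of string-decoding details; `sortedCopy` is a hypothetical helper.

```d
import std.algorithm.mutation : reverse;
import std.algorithm.sorting : sort;

/// Hypothetical helper: leaves the input untouched and sorts a mutable copy.
int[] sortedCopy(const(int)[] data)
{
    int[] copy = data.dup;   // .dup yields a mutable array
    sort(copy);
    return copy;
}

void main()
{
    int[] mutableData = [3, 1, 2];
    const(int)[] readOnlyView = mutableData;  // mutable converts to const implicitly
    immutable(int)[] frozen = [3, 1, 2];

    // In-place algorithms need writable elements, so these calls are
    // rejected by the compiler rather than failing while the program runs.
    static assert(!__traits(compiles, sort(readOnlyView)));
    static assert(!__traits(compiles, reverse(frozen)));

    assert(sortedCopy(frozen) == [1, 2, 3]);
    assert(frozen == [3, 1, 2]);              // original is unchanged
}
```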
May 27 2007
parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 27 May 2007 16:32:57 -0700, Walter Bright wrote:

 Derek Parnell wrote:
 On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:
 
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?


Any comment on the above?

Looks right to me.

But didn't you say that "invariant char[]" means that "invariant" applies to both the array reference and the contents? In other words it's the same as "invariant (char[])" but above I said that this means that the array is immutable but the contents are not. What is the syntax for an immutable array of mutable characters?

-- Derek Parnell Melbourne, Australia "Justice for David Hicks!" skype: derek.j.parnell
May 27 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.
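For readers following along with a current compiler, this sketch checks the two spellings from the list above. It assumes present-day D, where type qualifiers are transitive; the 2007 design being discussed here was still in flux, so treat it as an illustration rather than a record of that design.

```d
void main()
{
    // Rebindable slice of read-only characters.
    const(char)[] a = "abc";
    a = "def";                                        // allowed: the slice itself is mutable
    static assert(!__traits(compiles, a[0] = 'x'));   // rejected: elements are const

    // Fully const: neither the slice nor its elements may change.
    const(char[]) b = "abc";
    static assert(!__traits(compiles, b = "def"));
    static assert(!__traits(compiles, b[0] = 'x'));

    // With transitive const, qualifying the array also qualifies its
    // elements, so these two spellings name the same type.
    static assert(is(const(char[]) == const(const(char)[])));

    assert(a == "def" && b == "abc");
}
```

Under that assumption, qualifying the slice never leaves the elements writable, which matches the answer that no such syntax exists.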
Jun 02 2007
next sibling parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.

Are we only talking strings here or general arrays? Because if general arrays are concerned, I can come up with an example. An immutable array of mutable data for... e.g. render to texture in a software renderer (or creating data for a hw texture, or whatnot) So you basically pass a texture buffer to a function. You don't want it to realloc the buffer, just to modify its contents... What am I missing here? ;) -- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Jun 02 2007
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Tom S wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.

Are we only talking strings here or general arrays? Because if general arrays are concerned, I can come up with an example.

In general.
 An immutable array of mutable data for... e.g. render to texture in a 
 software renderer (or creating data for a hw texture, or whatnot) So you 
 basically pass a texture buffer to a function. You don't want it to 
 realloc the buffer, just to modify its contents...
 
 What am I missing here? ;)

We can all come up with an example, the more interesting case is is it a compelling example? I'm not seeing that.
Jun 02 2007
next sibling parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Walter Bright wrote:
 We can all come up with an example, the more interesting case is is it a 
 compelling example? I'm not seeing that.

Well, it's based on a true story. -- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Jun 02 2007
prev sibling next sibling parent Derek Parnell <derek psych.ward> writes:
On Sat, 02 Jun 2007 18:25:46 -0700, Walter Bright wrote:

 Tom S wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.

Are we only talking strings here or general arrays? Because if general arrays are concerned, I can come up with an example.

In general.
 An immutable array of mutable data for... e.g. render to texture in a 
 software renderer (or creating data for a hw texture, or whatnot) So you 
 basically pass a texture buffer to a function. You don't want it to 
 realloc the buffer, just to modify its contents...
 
 What am I missing here? ;)

We can all come up with an example, the more interesting case is is it a compelling example? I'm not seeing that.

Define 'compelling'. The only workaround I can see is a bit restrictive ...

final TextureBuffer t = CreateTextureBuffer();
RenderToBuffer( t );
DoLighting(t);
...

-- Derek Parnell Melbourne, Australia "Justice for David Hicks!" skype: derek.j.parnell
Jun 02 2007
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 Tom S wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.

Are we only talking strings here or general arrays? Because if general arrays are concerned, I can come up with an example.

In general.
 An immutable array of mutable data for... e.g. render to texture in a 
 software renderer (or creating data for a hw texture, or whatnot) So 
 you basically pass a texture buffer to a function. You don't want it 
 to realloc the buffer, just to modify its contents...

 What am I missing here? ;)

We can all come up with an example, the more interesting case is is it a compelling example? I'm not seeing that.

Most array algorithms would apply. But I'm still not sure I see the point of having an immutable reference, because it's just passed by value anyway. Who cares if the size of the array is modified within a function where it's not passed by reference? The change is just local to the function anyway. Sean
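The point about pass-by-value can be sketched as follows, assuming present-day D slice semantics. The function name `render` and the buffer contents are invented for the example. The slice (pointer and length) is copied into the function, so a length change stays local, while element writes land in the caller's memory.

```d
/// Hypothetical stand-in for a function that fills a texture buffer.
void render(int[] buffer)
{
    if (buffer.length == 0)
        return;
    buffer[0] = 42;      // writes through to the memory the caller sees
    buffer.length = 1;   // changes only this function's copy of the slice
}

void main()
{
    int[] texture = new int[](4);
    render(texture);

    assert(texture[0] == 42);     // element write is visible to the caller
    assert(texture.length == 4);  // the caller's length is untouched
}
```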
Jun 03 2007
parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:f3uoj5$1b1i$1 digitalmars.com...

 Most array algorithms would apply.  But I'm still not sure I see the point 
 of having an immutable reference, because it's just passed by value 
 anyway.  Who cares if the size of the array is modified within a function 
 where it's not passed by reference?  The change is just local to the 
 function anyway.

If that array is pointing into some meaningful area of memory (like in the example, a texture buffer), resizing the array could (probably would) move the array around, which I guess isn't illegal but then the function operating on the array wouldn't be accessing the correct place. Prevent them from changing the length, it prevents them from accessing anywhere but there.
Jun 03 2007
parent reply Sean Kelly <sean f4.ca> writes:
Jarrett Billingsley wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:f3uoj5$1b1i$1 digitalmars.com...
 
 Most array algorithms would apply.  But I'm still not sure I see the point 
 of having an immutable reference, because it's just passed by value 
 anyway.  Who cares if the size of the array is modified within a function 
 where it's not passed by reference?  The change is just local to the 
 function anyway.

If that array is pointing into some meaningful area of memory (like in the example, a texture buffer), resizing the array could (probably would) move the array around, which I guess isn't illegal but then the function operating on the array wouldn't be accessing the correct place. Prevent them from changing the length, it prevents them from accessing anywhere but there.

Well yeah. I don't personally think this is a problem because it doesn't affect the callee in any way, but I can see how others might disagree. Doesn't 'final' do this now though? Sean
Jun 03 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
Started new thread with reply: resizeable arrays: T[new]
Jun 04 2007
prev sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.

What, there isn't one? Isn't that what final does? Like this:

final char[] charar = new char[](20);
charar[1] = 'x'; // Allowed
charar = new char[](20); // Not allowed
charar.length = 10; // Not allowed

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 03 2007
parent Walter Bright <newshound1 digitalmars.com> writes:
Bruno Medeiros wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
 What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.

What, there isn't one? Isn't that what final does? Like this:

final char[] charar = new char[](20);
charar[1] = 'x'; // Allowed
charar = new char[](20); // Not allowed
charar.length = 10; // Not allowed

Final only works at the outermost level. There is no way to have a mutable pointer to a const pointer to mutable data.
Jun 04 2007
prev sibling parent reply noSpam <""pelekhay\" (noSpam)gmail.com"> writes:
Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

They'll all fail.

I think it's better to return a reversed/sorted copy. This will make such a change more backward compatible.
May 27 2007
parent reply Myron Alexander <someone somewhere.com> writes:
noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

They'll all fail.

I think it's better to return a reversed/sorted copy. This will make such a change more backward compatible.

This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.
May 27 2007
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Myron Alexander skrev:
 noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when used
 with const, invariant, and final qualifiers?

They'll all fail.

I think it's better to return a reversed/sorted copy. This will make such a change more backward compatible.

This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.

Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally deprecating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exist). For example, .sort can't take an ordering predicate. Also, the special casing of reversing char[] and wchar[] arrays, preserving the encoded unicode code points is definitely (imho) too specialized to belong in the language (runtime) as opposed to a library.

/ Oskar
May 27 2007
next sibling parent Myron Alexander <someone somewhere.com> writes:
Oskar Linde wrote:
 Which would be very confusing. This is instead a perfect opportunity to 
 take the *much* better path of finally deprecating the .sort and 
 .reverse "properties". Equally good or better library implementations 
 are possible (and exist). For example, .sort can't take an ordering 
 predicate. Also, the special casing of reversing char[] and wchar[] 
 arrays, preserving the encoded unicode code points is definitely (imho) 
 too specialized to belong in the language (runtime) as opposed to a 
 library.
 
 / Oskar

I see your point and agree. Regards, Myron.
May 28 2007
prev sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Oskar Linde wrote:
 Myron Alexander skrev:
 noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when 
 used
 with const, invariant, and final qualifiers?

They'll all fail.

I think it's better to return a reversed/sorted copy. This will make such a change more backward compatible.

This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.

Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally deprecating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exist). For example, .sort can't take an ordering predicate.

+1 (and thanks for your predicate-accepting sort routine, Oskar!)
 Also, the special casing of reversing char[] and wchar[] 
 arrays, preserving the encoded unicode code points is definitely (imho) 
 too specialized to belong in the language (runtime) as opposed to a 
 library.

No opinion there. What about the special code-point-at-a-time foreach for char[]? Do you dislike that too? --bb
May 28 2007
parent reply Aarti_pl <aarti interia.pl> writes:
Bill Baxter pisze:
 Oskar Linde wrote:
 Myron Alexander skrev:
 noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when 
 used
 with const, invariant, and final qualifiers?

They'll all fail.

I think it's better to return a reversed/sorted copy. This will make such a change more backward compatible.

This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.

Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally deprecating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exist). For example, .sort can't take an ordering predicate.

+1 (and thanks for your predicate-accepting sort routine, Oskar!)

+1
 
 Also, the special casing of reversing char[] and wchar[] arrays, 
 preserving the encoded unicode code points is definitely (imho) too 
 specialized to belong in the language (runtime) as opposed to a library.

No opinion there. What about the special code-point-at-a-time foreach for char[]? Do you dislike that too?

IMHO that should not be in the language. That's why I am opting for a string *library* class/struct which could take care of such cases.

BR
Marcin Kuszczak (Aarti_pl)
May 29 2007
parent reply Regan Heath <regan netmail.co.nz> writes:
Aarti_pl Wrote:
 Bill Baxter pisze:
 Oskar Linde wrote:
 Myron Alexander skrev:
 noSpam wrote:
 Walter Bright wrote:
 Derek Parnell wrote:
  const(char)[]  // A mutable array of immutable characters?
  const(char[])  // An immutable array of mutable characters?
  const(const(char)[]) // An immutable array of immutable characters?
  char[]         // A mutable array of mutable characters?

 What will happen with the .reverse and .sort array properties when 
 used
 with const, invariant, and final qualifiers?

They'll all fail.

I think it's better to return a reversed/sorted copy. This will make such a change more backward compatible.

This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.

Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally deprecating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exist). For example, .sort can't take an ordering predicate.

+1 (and thanks for your predicate-accepting sort routine, Oskar!)

+1
 
 Also, the special casing of reversing char[] and wchar[] arrays, 
 preserving the encoded unicode code points is definitely (imho) too 
 specialized to belong in the language (runtime) as opposed to a library.

No opinion there. What about the special code-point-at-a-time foreach for char[]? Do you dislike that too?

IMHO that should not be in the language. That's why I am opting for a string *library* class/struct which could take care of such cases.

I agree. I tend to think there are certain things which some apps don't need, in which case they can use the 'string' alias. Other apps need to do this sort of thing and want a 'String' class to handle it. I think there is room for both in the phobos/tango libraries. The default language/library support can reverse utf8 and 16 but it's not ideal, eg. convert to utf32, reverse, convert back. ;) Regan
May 29 2007
parent reply Marcin Kuszczak <aarti interia.pl> writes:
Regan Heath wrote:

 The default language/library support can reverse utf8 and 16 but it's not
 ideal, eg.  convert to utf32, reverse, convert back. ;)
 
 Regan
I am not sure what you mean with this sentence... dstring implementation doesn't do things according to your description, so it's definitely not the case here...

-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------
May 29 2007
parent reply Regan Heath <regan netmail.co.nz> writes:
Marcin Kuszczak Wrote:
 Regan Heath wrote:
 
 The default language/library support can reverse utf8 and 16 but it's not
 ideal, eg.  convert to utf32, reverse, convert back. ;)
 
 Regan

I am not sure what you mean with this sentence... dstring implementation doesn't do things according to your description, so it's definitely not the case here...

I'm lost, what is "dstring"? All I meant was that using std.utf you can say:

char[] text = "<characters which take more than 1 char to represent>";
text = toUTF8(toUTF32(text).reverse);

and the result will be a correctly reversed UTF8 string. Or am I missing something?

Regan Heath
May 29 2007
next sibling parent reply "Aziz K." <aziz.kerim gmail.com> writes:
On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz>  
wrote:
 and the result will be a correctly reversed UTF8 string.  Or am I  
 missing something?

 Regan Heath

I think your method doesn't take compound characters into account. For example:

// The accented é can be represented by a single code-point. But let's assume it's a compound character (Ce`a).
writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
// This would print áeC
May 29 2007
parent reply Regan Heath <regan netmail.co.nz> writes:
Aziz K. Wrote:
 On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz>  
 wrote:
 and the result will be a correctly reversed UTF8 string.  Or am I  
 missing something?

 Regan Heath

I think your method doesn't take compound characters into account. For example: // The accented é can be represented by a single code-point. But let's assume it's a compound character (Ce`a).

Is it a compound character in UTF32?
 writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
 // This would print áeC

Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it. My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.
May 29 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Regan Heath wrote:
 Aziz K. Wrote:
 On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz>  
 wrote:
 and the result will be a correctly reversed UTF8 string.  Or am I  
 missing something?

 Regan Heath

For example: // The accented é can be represented by a single code-point. But let's assume it's a compound character (Ce`a).

Is it a compound character in UTF32?

Unicode defines multiple valid encodings for lots of accented characters; typically a single codepoint as well as separate codepoints for the accent and the "naked" character that combine when put together.
 writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
 // This would print áeC

Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it. My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.

I don't think std.utf.toUTF* combine or split accented characters, I'm pretty sure it just does codepoint representation conversions (keeping the number of codepoints constant).
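The test being asked for could look roughly like this. It is a sketch against present-day Phobos, which postdates this thread: `std.uni.byGrapheme` is assumed to be available, and `std.algorithm.mutation.reverse` replaces the old `.reverse` property. The helper names are invented. The input spells the accented e as `e` followed by U+0301 (combining acute accent).

```d
import std.algorithm.mutation : reverse;
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;
import std.utf : toUTF32, toUTF8;

/// Reverses code point by code point, as in the toUTF32 approach above.
string reverseCodePoints(string s)
{
    dchar[] cps = toUTF32(s).dup;   // one element per code point
    reverse(cps);
    return toUTF8(cps);
}

/// Reverses grapheme by grapheme, keeping each base letter with its accent.
string reverseGraphemes(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

void main()
{
    string decomposed = "Ce\u0301a";   // C, e, combining acute, a

    // The conversion keeps all four code points; nothing is composed.
    assert(toUTF32(decomposed).length == 4);

    // Code-point reversal moves the accent onto the 'a'.
    assert(reverseCodePoints(decomposed) == "a\u0301eC");

    // Grapheme-aware reversal keeps the accent on the 'e'.
    assert(reverseGraphemes(decomposed) == "ae\u0301C");
}
```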
May 29 2007
parent reply Regan Heath <regan netmail.co.nz> writes:
Frits van Bommel Wrote:
 Regan Heath wrote:
 Aziz K. Wrote:
 On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz>  
 wrote:
 and the result will be a correctly reversed UTF8 string.  Or am I  
 missing something?

 Regan Heath

For example: // The accented é can be represented by a single code-point. But let's assume it's a compound character (Ce`a).

Is it a compound character in UTF32?

Unicode defines multiple valid encodings for lots of accented characters; typically a single codepoint as well as separate codepoints for the accent and the "naked" character that combine when put together.

I realise that. But, the important question is what does toUTF32 do with compound UTF8 characters (or UTF16 for that matter)?
 writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
 // This would print áeC

Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it. My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.

I don't think std.utf.toUTF* combine or split accented characters, I'm pretty sure it just does codepoint representation conversions (keeping the number of codepoints constant).

This is the key issue. I was under the (perhaps mistaken) impression it converted them to the single codepoint version (as that was easier), which is what I based this idea on. Really a simple test should tell us, can you whip one up to prove it one way or the other?

I would, but I don't really use unicode at all and I don't know any compound characters offhand. I know, I know, I could google it but I also get the impression you know a bit more about this and would be able to devise a better test case, or two.

Ahh.. another thought. I think I may have based my assumption on the foreach behaviour, eg.

char[] text = "<compound stuff>";
foreach(dchar d; text) { .. }

this _has_ to give the single codepoint versions, right? I suspect foreach uses the same code as in std.utf, but I may be wrong.

Regan Heath
May 29 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Regan Heath wrote:
 Frits van Bommel Wrote:
 Regan Heath wrote:
 Aziz K. Wrote:
 writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
 // This would print áeC

My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.

pretty sure it just does codepoint representation conversions (keeping the number of codepoints constant).

This is the key issue. I was under the (perhaps mistaken) impression it converted them to the single codepoint version (as that was easier), which is what I based this idea on. Really a simple test should tell us, can you whip one up to prove it one way or the other?

---
import std.stdio;
import std.utf;

void main(char[][] args)
{
    // Codepoint 0301 is "Combining acute accent".
    // Codepoint 00e9 is "Latin small letter e with acute"
    char[] str = "e\u0301 \u00e9";

    // This doesn't show the combined character on my console.
    // Perhaps my terminal doesn't properly support combining characters.
    // (My encoding is utf-8, so that shouldn't be the problem)
    // The precomposed character (00e9) is displayed properly.
    // When piped to a .html file and wrapped with
    // <html><body>...</body></html> firefox properly displays both.
    writefln(str);
    foreach (dchar c; str) {
        writef("%04x ", c);
    }
    writefln();

    // This produces the exact same output as above code:
    dchar[] dstr = toUTF32(str);
    writefln(dstr);
    foreach (dchar c; dstr) {
        writef("%04x ", c);
    }
    writefln();
}
---
 I would, but I don't really use unicode at all and I don't know any compound
characters offhand.  I know, I know, I could google it but I also get the
impression you know a bit more about this and would be able to devise a better
test case, or two.

I normally have little use for it as well. A few Dutch (my native tongue) words need accents, but I'll be damned if I know the codes. Let alone those of any combining characters. My usual way of typing those is either using the symbol map or just typing it without accents, right-click, select spell-check suggestion with accents :). However, for the above test I just looked up the codes in the code charts on the unicode website (unicode.org/charts for the precomposed character and the "symbols and punctuation" link at the top for the combining accent). It's pretty easy to find, actually.
 Ahh.. another thought.  I think I may have based my assumption on the foreach
behaviour, eg.
 
 char[] text = "<compund stuff>";
 foreach(dchar d; text) { .. }
 
 this _has_ to give the single codepoint versions, right?

As demonstrated above, it doesn't. The runtime support for the converting foreach statements just imports std.utf and uses decode and toUTF*[1] (as well as some manual conversion to surrogates in the functions dealing with wchar). None of those do anything other than decoding and encoding single codepoints.

[1]: The apparently undocumented (buf, dchar) overloads, which don't allocate.
 I suspect foreach uses the same code as in std.utf, but I may be wrong.

About this, you're not :P.

I suspect the reason std.utf doesn't do decomposition and/or combining is that it would require a lookup table, and possibly quite a big one at that. Though generating it shouldn't be a problem; it could be trivially extracted from the machine-readable data on the unicode website. Just take http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, the sixth column is the decomposition of the character in the first column. (It may also contain the mapping type between <angle brackets>)

Note that for full decomposition this mapping needs to be applied recursively[2], i.e. the characters in the 6th column need to be decomposed as well (if possible).

[2]: See the reminder in http://www.unicode.org/Public/UNIDATA/UCD.html#Character_Decomposition_Mappings
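
The recursive lookup described above can be sketched roughly as follows. This is a minimal illustration written for a present-day D compiler, not the 2007 one, and not anything std.utf does. The names `decompositionOf` and `fullyDecompose` are hypothetical, and the three table entries are hand-copied from the decomposition column of UnicodeData.txt; a real table would be generated from the whole file. Full canonical decomposition would also have to reorder combining marks, which this sketch skips.

```d
import std.stdio : writef, writeln;

/// Hypothetical three-entry excerpt of the decomposition column of
/// UnicodeData.txt. Returns an empty string when there is no mapping.
dstring decompositionOf(dchar c)
{
    switch (c)
    {
        case '\u00DC': return "U\u0308"d;      // U+00DC -> U + combining diaeresis
        case '\u00E9': return "e\u0301"d;      // U+00E9 -> e + combining acute
        case '\u01D7': return "\u00DC\u0301"d; // U+01D7 -> U+00DC + combining acute
        default:       return ""d;
    }
}

/// Applies the mapping recursively until nothing decomposes any further.
dstring fullyDecompose(const(dchar)[] s)
{
    dstring result;
    foreach (dchar c; s)
    {
        dstring mapped = decompositionOf(c);
        if (mapped.length == 0)
            result ~= c;                      // no mapping: keep the codepoint
        else
            result ~= fullyDecompose(mapped); // the mapping may decompose again
    }
    return result;
}

void main()
{
    // U+01D7 needs two rounds: first to U+00DC U+0301, then U+00DC to U U+0308.
    assert(fullyDecompose("\u01D7"d) == "U\u0308\u0301"d);

    // Print the codepoints in the same style as the test program above.
    foreach (dchar c; fullyDecompose("\u01D7"d))
        writef("%04x ", cast(uint) c);
    writeln();
}
```

A single pass over the table would stop at U+00DC U+0301, which is why the mapping has to be applied to its own output.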
May 30 2007
parent Regan Heath <regan netmail.co.nz> writes:
Thanks for this.  It appears you're right :)

I can't get my console to show them either, which is annoying.  I'm on Windows,
I set the font to Lucida Console and typed "chcp 65001", which makes the
precomposed character appear correctly but not the combining character.

Regan Heath
May 31 2007
prev sibling parent reply Marcin Kuszczak <aarti interia.pl> writes:
Regan Heath wrote:

 Marcin Kuszczak Wrote:
 Regan Heath wrote:
 
 The default language/library support can reverse utf8 and 16 but it's
 not ideal, eg.  convert to utf32, reverse, convert back. ;)
 
 Regan

I am not sure what do you mean with this sentence... dstring implementation doesn't do things according to your description, so it's definitely not a case here...

I'm lost, what is "dstring"?

All I meant was that using std.utf you can say:

char[] text = "<characters which take more than 1 char to represent>";
text = toUTF8(toUTF32(text).reverse);

and the result will be a correctly reversed UTF8 string. Or am I missing something?

 Regan Heath

dstring is an implementation of a string struct by Chris Miller which takes care of slicing utf8 sequences and is compatible with char[], wchar[] and dchar[]. I mentioned it because I think it's better when foreach knows nothing about slicing utf8 sequences (as opposed to the way it is currently implemented). That should be the responsibility of a string class (like e.g. dstring) with a proper opApply method. Because my previous e-mail was in the context of dstring, I didn't understand what you meant... 'reverse' and 'sort' could also be implemented in such a class in a way that copes properly with utf8 sequences...

http://www.digitalmars.com/d/archives/digitalmars/D/announce/New_string_implementation_dstring_1.0_4886.html
http://www.dprogramming.com/dstring.php

-- 
Regards
Marcin Kuszczak
(Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------
May 29 2007
parent Regan Heath <regan netmail.co.nz> writes:
Marcin Kuszczak Wrote:
 Regan Heath wrote:
 
 Marcin Kuszczak Wrote:
 Regan Heath wrote:
 
 The default language/library support can reverse utf8 and 16 but it's
 not ideal, eg.  convert to utf32, reverse, convert back. ;)
 
 Regan

I am not sure what do you mean with this sentence... dstring implementation doesn't do things according to your description, so it's definitely not a case here...

I'm lost, what is "dstring"?

All I meant was that using std.utf you can say:

char[] text = "<characters which take more than 1 char to represent>";
text = toUTF8(toUTF32(text).reverse);

and the result will be a correctly reversed UTF8 string. Or am I missing something?

 Regan Heath

dstring is an implementation of a string struct by Chris Miller which takes care of slicing utf8 sequences and is compatible with char[], wchar[] and dchar[]. I mentioned it because I think it's better when foreach knows nothing about slicing utf8 sequences (as opposed to the way it is currently implemented). That should be the responsibility of a string class (like e.g. dstring) with a proper opApply method. Because my previous e-mail was in the context of dstring, I didn't understand what you meant... 'reverse' and 'sort' could also be implemented in such a class in a way that copes properly with utf8 sequences...

Ahh, thanks, that clears up the confusion I had.

Yes, a string class/struct could definitely handle the codepoint issue. It would also be able to handle it better than the method I suggested, which is a brute force method based on an assumption which may prove to be false (I suspect toUTF32 converts UTF8 and 16 to non-compound UTF32 in all cases, but I could be wrong).

But to respond to your original point (which I didn't address earlier, sorry), I have no problem with the foreach behaviour:

char[] text = "<compound characters>";
foreach(dchar c; text) { .. }

because, I suspect, the code which handles this is in std.utf (toUTF32) already. You seem to want to move the behaviour to a string class, but why can't it exist in both places?

I guess the problem you might have with it is that it effectively says to someone implementing a D compiler: You need to handle conversions from/to UTF8, 16 and 32 and (assuming I am correct about toUTF32) you need to convert UTF8 and 16 to non-compound UTF32. Which might make it harder for someone to implement a D compiler. I don't know.

Regan Heath
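
For comparison, here is a hedged sketch of the two reversal strategies discussed in this subthread, written against present-day Phobos rather than the 2007 library. It assumes `std.uni.byGrapheme` and `std.uni.byCodePoint`, which did not exist when this thread was written; the function names `reverseCodePoints` and `reverseGraphemes` are made up for the illustration. The input is the "Céa" example from earlier in the thread, spelled with a combining accent.

```d
import std.algorithm.mutation : reverse;
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;
import std.utf : toUTF32, toUTF8;

/// The approach suggested in the thread: convert to UTF-32, reverse the
/// individual codepoints, convert back to UTF-8.
string reverseCodePoints(string s)
{
    dchar[] d = toUTF32(s).dup; // mutable copy, one dchar per codepoint
    reverse(d);                 // in-place reversal of the codepoints
    return toUTF8(d);
}

/// Reverses whole grapheme clusters, so a combining accent stays attached
/// to its base letter.
string reverseGraphemes(string s)
{
    return s.byGrapheme.array.retro.byCodePoint.text;
}

void main()
{
    string s = "Ce\u0301a"; // C, e, combining acute accent, a

    // Codepoint reversal moves the accent onto the 'a'.
    assert(reverseCodePoints(s) == "a\u0301eC");

    // Grapheme reversal keeps the accent on the 'e'.
    assert(reverseGraphemes(s) == "ae\u0301C");
}
```

The codepoint version is only safe when every character is precomposed, which is the assumption questioned above.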
May 29 2007
prev sibling parent reply Reiner Pope <some address.com> writes:
Walter Bright wrote:
 Under the new const/invariant/final regime, what are strings going to be 
 ? Experience with other languages suggest that strings should be 
 immutable. To express an array of const chars, one would write:
 
     const(char)[]
 
 but while that's clear, it doesn't just flow off the keyboard. Strings 
 are so common this needs an alias, so:
 
     alias const(char)[] cstring;
 
 Why cstring? Because 'string' appears as both a module name and a common 
 variable name. cstring also implies wstring for wchar strings, and 
 dstring for dchars.
 
 String literals, on the other hand, will be invariant (which means they 
 can be stuffed into read-only memory). So,
     typeof("abc")
 will be:
     invariant(char)[3]
 
 Invariants can be implicitly cast to const.
 
 In my playing around with source code, using cstring's seems to work out 
 rather nicely.
 
 So, why not alias cstring to invariant(char)[] ? That way strings really 
 would be immutable. The reason is that mutables cannot be implicitly 
 cast to invariant, meaning that there'd be a lot of casts in the code. 
 Casts are a sledgehammer, and a coding style that requires too many 
 casts is a bad coding style.

Perhaps I should just wait for the implementation, but I'm interested in knowing what your solution to .dup is. Given

    auto foo = "hello".dup;

what is the type of foo? How do you support both of

    invariant char[] foo = "hello".dup;
    char[] bar = "hello".dup;

  -- Reiner
May 27 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Reiner Pope wrote:
 Perhaps I should just wait for the implementation, but I'm interested in 
 knowing what your solution to .dup is. Given
 
    auto foo = "hello".dup;
 
 what is the type of foo?

Most likely a plain (mutable) char[].
 How do you support both of
 
    invariant char[] foo = "hello".dup;
    char[] bar = "hello".dup;

Likely the first will be an error as written, requiring a cast(invariant) to be inserted. Of course, since it doesn't make much sense to .dup in the example above ("hello" is already invariant, and copying an invariant array but not modifying the copy isn't typically useful) that shouldn't be much of a problem in this case. For other cases though, I could see how a "unique" (or similar) type constructor that would allow implicit conversion to both mutable and invariant (and const) types could be useful. For instance, if the strings in your example were replaced by mutable arrays, a "unique char[]" return value of .dup could then be assigned to mutable/const/invariant references without needing casts.
May 28 2007
next sibling parent reply Reiner Pope <some address.com> writes:
Frits van Bommel wrote:
 Reiner Pope wrote:
 Perhaps I should just wait for the implementation, but I'm interested 
 in knowing what your solution to .dup is. Given

    auto foo = "hello".dup;

 what is the type of foo?

Most likely a plain (mutable) char[].
 How do you support both of

    invariant char[] foo = "hello".dup;
    char[] bar = "hello".dup;

Likely the first will be an error as written, requiring a cast(invariant) to be inserted. Of course, since it doesn't make much sense to .dup in the example above ("hello" is already invariant, and copying an invariant array but not modifying the copy isn't typically useful) that shouldn't be much of a problem in this case. For other cases though, I could see how a "unique" (or similar) type constructor that would allow implicit conversion to both mutable and invariant (and const) types could be useful. For instance, if the strings in your example were replaced by mutable arrays, a "unique char[]" return value of .dup could then be assigned to mutable/const/invariant references without needing casts.

When I first thought about it, I thought that such a construct would be very useful and very powerful, but I can't actually think of any use cases except with .dup and other constructor-type functions. (Although supporting them should alone be enough motivation).
May 29 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Reiner Pope wrote:
 Frits van Bommel wrote:
 For other cases though, I could see how a "unique" (or similar) type 
 constructor that would allow implicit conversion to both mutable and 
 invariant (and const) types could be useful.
 For instance, if the strings in your example were replaced by mutable 
 arrays, a "unique char[]" return value of .dup could then be assigned 
 to mutable/const/invariant references without needing casts.


I'm pretty sure this has been suggested in these newsgroups in the past, including using "unique" as the keyword.
 When I 
  first thought about it, I thought that such a construct would be very 
 useful and very powerful, but I can't actually think of any use cases 
 except with .dup and other constructor-type functions. (Although 
 supporting them should alone be enough motivation).

Some use cases I can think of:
* Obviously, builtin array property .dup, as you mentioned.
* std.utf.toUTF* (except the non-converting ones such as char[] -> char[])
* The result of certain operator overloads (arithmetic in a bignum class, opCat in a string class, the result of the builtin ~ operator for arrays)
* Lots of stuff in std.string: join, split, maketrans, all the toString overloads, format, succ, abbrev. (AFAIK all of these are guaranteed to return a unique array)
* toString overloads for classes that return the result of any of the above[1] (especially builtin ~ and std.string.format are often useful in toString, in my experience).

As you can see, there are plenty of cases where newly allocated objects or arrays are returned.

[1]: This one would require the ability to add "unique" in an overridden method, since it's a bad idea to require it of all classes. This could be considered to fall under the category of covariant return values.
May 29 2007
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Frits van Bommel wrote:
 For other cases though, I could see how a "unique" (or similar) type 
 constructor that would allow implicit conversion to both mutable and 
 invariant (and const) types could be useful.
 For instance, if the strings in your example were replaced by mutable 
 arrays, a "unique char[]" return value of .dup could then be assigned to 
 mutable/const/invariant references without needing casts.

We really tried to figure out a way to make "unique" work. It just doesn't offer anything useful over a cast(invariant). The way to create an invariant out of data is to use cast(invariant). As with all casts, one has to trust the programmer to use it appropriately. After it is cast, the type system will handle enforcement. You'll be able to cast away invariant, too, but you're on your own if you do so.
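
As a rough illustration of the "build mutable, then cast once" pattern described above, here is a sketch for a present-day D compiler, where `invariant` was later renamed `immutable`. It uses `std.exception.assumeUnique`, a library wrapper around that cast, and the `.idup` array property; both postdate this thread. The function name `greet` is hypothetical. As with the cast itself, the programmer is trusted that no other mutable reference to the buffer survives.

```d
import std.exception : assumeUnique;

// In current D, .dup on a string literal yields a plain mutable array.
static assert(is(typeof("hello".dup) == char[]));

/// Builds a string in a mutable buffer, then hands it out as immutable.
string greet(string name)
{
    char[] buf = "hello, ".dup; // mutable working copy
    buf ~= name;
    // assumeUnique performs the cast to immutable and nulls out `buf`,
    // so the mutable reference cannot be used by accident afterwards.
    return assumeUnique(buf);
}

void main()
{
    string s = greet("world");
    assert(s == "hello, world");

    // .idup makes an immutable copy directly, with no cast at the call site.
    immutable(char)[] copy = "hello".idup;
    assert(copy == "hello");
}
```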
Jun 02 2007