digitalmars.D.announce - string types: const(char)[] and cstring
- Walter Bright (23/23) May 25 2007 Under the new const/invariant/final regime, what are strings going to be...
- Daniel Keep (19/50) May 25 2007 Thanks for the update; I'm happy to have const strings, and use char[]
- Walter Bright (8/12) May 25 2007 const(char)[] => array of const characters
- Myron Alexander (2/15) May 25 2007 Looking mighty fine.
- Walter Bright (13/29) May 25 2007 I like it a lot better than the C++ "here a const, there a const,
- Howard Berkey (2/33) May 25 2007
- Myron Alexander (7/9) May 25 2007 When I first read Walter's post, I also thought null-terminated strings....
- Myron Alexander (3/18) May 25 2007 Here's a possibility:
- Bill Baxter (30/61) May 25 2007 So basically most functions that take a char[] now would be changed to
- Walter Bright (5/8) May 26 2007 If you want to reassign another value, yes. I suggest:
- Anders F Björklund (12/15) May 26 2007 I think it would be a problem at the top of the namespace,
- Leandro Lucarella (13/14) May 26 2007 What about "text"?
- Reiner Pope (14/25) May 25 2007 The thing I don't get about this syntax is what happens when you take
- Walter Bright (3/7) May 26 2007 The difference is when they are reference types, such as arrays of const...
- Daniel Keep (14/44) May 26 2007 This is what I'm wondering; I thought const and invariant only applied
- Walter Bright (6/9) May 26 2007 If you know C++, then const(char)* is the same as:
- Anders F Björklund (12/15) May 26 2007 I think cstring is a horrible name. "string" is much better, and in use.
- Chris Miller (15/30) May 26 2007 I agree, except I don't care much for "str". I'd prefer it named string....
- Marcin Kuszczak (40/44) May 26 2007 Yup. That's my opinion also...
- renoX (9/54) May 27 2007 I agree with you, I don't think that the string should be a char[]
- Regan Heath (3/11) May 27 2007 I think the class you describe would be useful, but only for certain typ...
- renoX (7/28) May 27 2007 Sure, but this makes the code less portable (or less efficient when it's...
- Regan Heath (8/32) May 28 2007 No, sadly they aren't. Most existing applications these days deal with ...
- Derek Parnell (40/62) May 26 2007 We seem to have different experience. Most of the code I write deals wit...
- Marcin Kuszczak (11/19) May 26 2007 The same here. I don't have much experience with Java and really don't k...
- Johan Granberg (11/27) May 26 2007 In my experience they are not really usefull at all (const as in constan...
- Bill Baxter (4/18) May 26 2007 Ditto here. When I've used java I found it more annoying that strings
- gareis (14/32) May 26 2007 I found it more bothersome by far that Integer, Float, etc were immutabl...
- Walter Bright (4/6) May 26 2007 You're welcome. Different languages offer different pieces, only D will
- Anders F Björklund (4/11) May 27 2007 When using Java (and Objective-C), I've found it very useful that
- Walter Bright (3/5) May 27 2007 Being able to treat strings as value types is where the big
- Kirk McDonald (22/37) May 26 2007 It might also be educational to look at Python, which also has immutable...
- Walter Bright (8/17) May 26 2007 You'll still be able to concatenate and slice invariant strings. You can...
- Reiner Pope (14/22) May 26 2007 Will there be something in the type system which enables you to safely
- Walter Bright (8/23) May 26 2007 Safely? No. You will be able to explicitly cast to invariant, however,
- Chris Nicholson-Sauls (7/37) May 26 2007 That's an interesting syntax, casting to a trait/attribute with the rest...
- Walter Bright (2/8) May 26 2007 Both.
- Reiner Pope (18/30) May 26 2007 I must have misunderstood what scope specifies. I had thought that, to
- Walter Bright (2/4) May 26 2007 Sadly, it currently isn't enforced.
- Derek Parnell (39/62) May 26 2007 While that is interesting, it has not much to do with what I was saying.
- Walter Bright (43/97) May 27 2007 I'm going to argue that your experience is unusual. I do a lot of string...
- Derek Parnell (83/127) May 27 2007 On Sun, 27 May 2007 01:09:40 -0700, Walter Bright wrote:
- Walter Bright (31/85) May 27 2007 Where you're going wrong is that there are two parts to a dynamic array
- Derek Parnell (47/110) May 27 2007 I know that you know that I know this about arrays already (did I really
- Charles D Hixson (18/155) Jun 10 2007 FWIW, I feel the documentation is going to need LOTS of
- Regan Heath (17/22) May 26 2007 I like it all, except the alias. I would prefer 'string'. 'cstring' im...
- Sam Phillips (4/11) May 26 2007
- janderson (8/19) May 26 2007 [snip]
- Derek Parnell (12/23) May 27 2007 const(char)[] // A mutable array of immutable characters?
- Walter Bright (2/9) May 27 2007 They'll all fail.
- Derek Parnell (8/17) May 27 2007 Good, but when? At run time or compile time?
- Walter Bright (3/18) May 27 2007 compile time.
- Derek Parnell (11/24) May 27 2007 But didn't you say that "invariant char[]" means that "invariant" applie...
- Walter Bright (3/4) Jun 02 2007 There isn't one. Such a construct is appealing in the abstract, but I
- Tom S (12/17) Jun 02 2007 Are we only talking strings here or general arrays? Because if general
- Walter Bright (4/19) Jun 02 2007 We can all come up with an example, the more interesting case is is it a...
- Tom S (6/8) Jun 02 2007 Well, it's based on a true story.
- Derek Parnell (12/35) Jun 02 2007 Define 'compelling'.
- Sean Kelly (7/30) Jun 03 2007 Most array algorithms would apply. But I'm still not sure I see the
- Jarrett Billingsley (8/13) Jun 03 2007 If that array is pointing into some meaningful area of memory (like in t...
- Sean Kelly (5/20) Jun 03 2007 Well yeah. I don't personally think this is a problem because it
- Walter Bright (1/1) Jun 04 2007 Started new thread with reply: resizeable arrays: T[new]
- Bruno Medeiros (9/14) Jun 03 2007 What, there isn't one? Isn't that what final does? Like this:
- Walter Bright (3/16) Jun 04 2007 Final only works at the outermost level. There is no way to have a
- noSpam (3/13) May 27 2007 I think it's better to return reversed/sorted copy. This will make such
- Myron Alexander (3/17) May 27 2007 This makes sense. For immutable arrays, the definition should drop "in
- Oskar Linde (9/27) May 27 2007 Which would be very confusing. This is instead a perfect opportunity to
- Myron Alexander (4/14) May 28 2007 I see your point and agree.
- Bill Baxter (5/35) May 28 2007 No opinion there. What about the special code-point-at-a-time foreach
- Aarti_pl (7/44) May 29 2007 IMHO that should not be in language. That's why I am opting for string
- Regan Heath (4/48) May 29 2007 I agree. I tend to think there are certain things which some apps don't...
- Marcin Kuszczak (14/18) May 29 2007 I am not sure what do you mean with this sentence...=20
- Regan Heath (7/18) May 29 2007 I'm lost, what is "dstring"?
- Aziz K. (8/11) May 29 2007 I think your method doesn't take compound characters into account.
- Regan Heath (5/19) May 29 2007 Can you code that test up (using the \U character literal syntax so that...
- Frits van Bommel (7/27) May 29 2007 Unicode defines multiple valid encodings for lots of accented
- Regan Heath (10/39) May 29 2007 This is the key issue. I was under the (perhaps mistaken) impression it...
- Frits van Bommel (57/78) May 30 2007 ---
- Regan Heath (3/3) May 31 2007 Thanks for this. It appears you're right :)
- Marcin Kuszczak (29/55) May 29 2007 on,
- Regan Heath (9/46) May 29 2007 Ahh, thanks, that clears up the confusion I had. Yes, a string class/st...
- Reiner Pope (9/40) May 27 2007 Perhaps I should just wait for the implementation, but I'm interested in...
- Frits van Bommel (14/24) May 28 2007 Likely the first will be an error as written, requiring a
- Reiner Pope (6/34) May 29 2007 Funny, that's just what I thought of (including the name unique). When I...
- Frits van Bommel (20/33) May 29 2007 I'm pretty sure this has been suggested in these newsgroups in the past,...
- Walter Bright (8/14) Jun 02 2007 We really tried to figure out a way to make "unique" work. It just
Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable. To express an array of const chars, one would write:

    const(char)[]

but while that's clear, it doesn't just flow off the keyboard. Strings are so common this needs an alias, so:

    alias const(char)[] cstring;

Why cstring? Because 'string' appears as both a module name and a common variable name. cstring also implies wstring for wchar strings, and dstring for dchars.

String literals, on the other hand, will be invariant (which means they can be stuffed into read-only memory). So, typeof("abc") will be:

    invariant(char)[3]

Invariants can be implicitly cast to const.

In my playing around with source code, using cstrings seems to work out rather nicely.

So, why not alias cstring to invariant(char)[]? That way strings really would be immutable. The reason is that mutables cannot be implicitly cast to invariant, meaning that there'd be a lot of casts in the code. Casts are a sledgehammer, and a coding style that requires too many casts is a bad coding style.
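The conversion rules described in this post can be tried out with a minimal sketch. It is written in present-day D, where the `invariant` keyword of this proposal is spelled `immutable`, so details may differ from the 2007 design. The alias name `cstring` comes from the post; `countSpaces` is a hypothetical helper invented for the illustration.

```d
// `cstring` is the alias proposed in the post; `countSpaces` is a
// hypothetical helper invented for this sketch.
alias cstring = const(char)[];

// Accepts mutable, const and immutable character arrays alike,
// because all of them convert implicitly to const(char)[].
size_t countSpaces(cstring s)
{
    size_t n = 0;
    foreach (c; s)
        if (c == ' ')
            ++n;
    return n;
}

// Mutable and immutable arrays both convert implicitly to const...
static assert(is(char[] : cstring));
static assert(is(immutable(char)[] : cstring));
// ...but mutable does not convert implicitly to immutable.
static assert(!is(char[] : immutable(char)[]));

void main()
{
    char[] buf = "a b".dup;            // mutable copy of a literal
    immutable(char)[] lit = "a b c";   // immutable character data
    assert(countSpaces(buf) == 1);
    assert(countSpaces(lit) == 2);
}
```

A function that takes `const(char)[]` needs no casts at either call site, which is the argument the post makes for aliasing the const form rather than the invariant one.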
May 25 2007
Walter Bright wrote:
> Under the new const/invariant/final regime, what are strings going to be? [...]

Thanks for the update; I'm happy to have const strings, and use char[] manually when I want to mutate something.

One question though: are the parens necessary? I was under the impression that const and invariant applied to reference types, so it would be const char[] or const(char[]), since char by itself is just a value type.

...this is going to turn into one of those mega threads where we all run around in circles trying to work out which one is which, isn't it?

-- Daniel
May 25 2007
Daniel Keep wrote:
> One question though: are the parens necessary? I was under the impression that const and invariant applied to reference types, so it would be const char[] or const(char[]), since char by itself is just a value type.

    const(char)[]  => array of const characters
    const char[]   => const array of const characters
    const(char[])  => const array of const characters

Think of const as if it were a template:

    Const!(T)

which returns a const version of its argument. const without any parens means it applies to the whole type.
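To make the three spellings concrete, here is a small sketch in present-day D; the behaviour of the 2007 proposal may have differed in detail. `dropFirst` is a hypothetical helper, not something from the thread.

```d
// Hypothetical helper: drops the first element. The slice is rebound,
// but the characters it refers to are never written.
const(char)[] dropFirst(const(char)[] s)
{
    if (s.length > 0)
        s = s[1 .. $];   // allowed: only the elements are const
    return s;
}

// The second and third spellings name the same type.
static assert(is(const char[] == const(char[])));
// The first spelling is a different type: the array itself stays mutable.
static assert(!is(const(char)[] == const(char[])));

void main()
{
    char[] buf = "abc".dup;

    const(char)[] a = buf;
    a = a[1 .. $];                                      // rebinding is fine
    static assert(!__traits(compiles, a[0] = 'x'));     // elements are read-only

    const(char[]) b = buf;
    static assert(!__traits(compiles, b = b[1 .. $]));  // cannot rebind
    static assert(!__traits(compiles, b[0] = 'x'));     // cannot write elements

    assert(dropFirst(buf) == "bc");
}
```

The `__traits(compiles, ...)` checks only record which assignments the compiler rejects; nothing is written through either const view.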
May 25 2007
Walter Bright wrote:
> const(char)[] => array of const characters
> const char[] => const array of const characters
> const(char[]) => const array of const characters
>
> Think of const as if it were a template: Const!(T) which returns a const version of its argument. const without any parens means it applies to the whole type.

Looking mighty fine.
May 25 2007
Myron Alexander wrote:
> Looking mighty fine.

I like it a lot better than the C++ "here a const, there a const, everywhere a const const" like:

    const char * const * const p;

etc. instead of:

    const(char**) p;

Const in D is transitive, so const(char**) is equivalent to:

    const(const(const(char)*)*)

And no, it is not possible to have a pointer to const pointer to mutable. It is not possible syntactically to declare it, nor is it semantically allowed. You can force the issue with casts (which allow you to do whatever you *need* to do), but the result will be undefined behavior.
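The transitivity claim can be checked mechanically. The following is a minimal sketch in present-day D; the alias names `Compact` and `Spelled` are made up for the illustration.

```d
// The compact spelling and the fully spelled-out form are one type.
alias Compact = const(char**);
alias Spelled = const(const(const(char)*)*);
static assert(is(Compact == Spelled));

void main()
{
    char c = 'a';
    char* inner = &c;
    char** outer = &inner;

    const(char**) p = outer;   // mutable converts implicitly to const

    // const is transitive: nothing reachable through p is writable.
    static assert(!__traits(compiles, p = null));
    static assert(!__traits(compiles, *p = null));
    static assert(!__traits(compiles, **p = 'x'));

    assert(**p == 'a');        // reading is still fine
}
```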
May 25 2007
Nice idea. I am only concerned that people will see "cstring" and think "null-terminated 'C' string". Not that that should be a deciding factor by any means of course.

Walter Bright wrote:
> Under the new const/invariant/final regime, what are strings going to be? [...]
May 25 2007
Howard Berkey wrote:
> Nice idea. I am only concerned that people will see "cstring" and think "null-terminated 'C' string". Not that that should be a deciding factor by any means of course.

When I first read Walter's post, I also thought null-terminated strings. I even had it as an alias for toString (converting C string to char[]) as a means to get around the name conflict with Object but I shortened it to "str".

I cannot think of another name but "cstring" will cause confusion and defeats the "obvious" rule.
May 25 2007
Myron Alexander wrote:
> Howard Berkey wrote:
>> Nice idea. I am only concerned that people will see "cstring" and think "null-terminated 'C' string". [...]
> [...] I cannot think of another name but "cstring" will cause confusion and defeats the "obvious" rule.

Here's a possibility: instead of cstring, wstring, dstring, use charstr, widestr, dblstr.
May 25 2007
Walter Bright wrote:
> Under the new const/invariant/final regime, what are strings going to be? [...]

So basically most functions that take a char[] now would be changed to take a cstring in your thinking? Is it also correct to say that cstring would be used in the places where one would use const char* or const std::string& in C++? If so that sounds ok to me.

But about the naming ... I have to agree that my first thought was "C compatible null terminated string" too, like std::string's .c_str() method in C++. I can probably live with that but I don't like the inconsistency with c/w/d. Plain 'string' really does make the most sense.

    plain     'w'       'd'
    =======   =======   =======
    char      wchar     dchar
    string    wstring   dstring

It wouldn't be quite as bad if you uniformly apply the 'c' to all of them (using 'c' as a flag for constness):

    plain     'w'        'd'
    =======   ========   ========
    char      wchar      dchar
    cstring   wcstring   dcstring
    or
    cstring   cwstring   cdstring

Some people already alias char[] to string. As far as I've heard they haven't run into conflicts with the module name, or with people naming variables 'string'.

Question: if you have an alias like

    alias char[] string;

'const string' automatically applies const to both the char and the [], right? Is that something to be worried about?

--bb
May 25 2007
Bill Baxter wrote:
> 'const string' automatically applies const to both the char and the [], right?

Right.

> Is that something to be worried about?

If you want to reassign another value, yes. I suggest:

    const(char)[]

instead.
May 26 2007
Bill Baxter wrote:
> Some people already alias char[] to string. As far as I've heard they haven't run into conflicts with the module name, or with people naming variables 'string'.

I think it would be a problem at the top of the namespace, but it's OK if you use (for instance) "wx.common.string":

    module wx.common;
    alias char[] string;

Then you can do declarations like:

    string string = "string";

At least that's how it has been working for the last couple of years, and for Christopher E. Miller's dstring.d as well:

    module dstring;
    struct string { ... }

--anders
May 26 2007
Bill Baxter, on May 26 at 14:59, you wrote to me:
> Plain 'string' really does make the most sense.

What about "text"? Please see "The 'string' types" here[1] for an explanation.

[1] http://xlr.sourceforge.net/concept/diverge.html

--
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
GPG: 5F5A8D05 // F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05
On the street I ran into a very proper gentleman who usually drives around in a Falcon; he was running with two suitcases in his hands and said: "I'm going to Miami, do you have any message or ..." and I told him: "No, no, no..."
-- Extra Tato (1983, Alfonsín's victory)
May 26 2007
Walter Bright wrote:
> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggest that strings should be immutable. To express an array of const chars, one would write: const(char)[]
> ...
> String literals, on the other hand, will be invariant (which means they can be stuffed into read-only memory). So, typeof("abc") will be: invariant(char)[3]

The thing I don't get about this syntax is what happens when you take off the [].

    1. invariant(char) c = 'b'; // c is 'b' now, and will never change.
    2. final(char) d = 'b';     // but calling it final means the same...
    3. const(char) e = 'b';     // ummm... what?

It seems like const(char) is a constant char -- one that can't change. Does that make final obsolete?

Also, I can't see any difference between const(char) and invariant(char), since neither can ever be rebound. In that case, if I assume that they are identical types, how can an array of const(char) be different from an array of invariant(char)?

-- Reiner
May 25 2007
Reiner Pope wrote:
> Also, I can't see any difference between const(char) and invariant(char), since neither can ever be rebound. In that case, if I assume that they are identical types, how can an array of const(char) be different from an array of invariant(char)?

The difference is when they are reference types, such as arrays of const char, or arrays of invariant chars.
May 26 2007
Reiner Pope wrote:
> The thing I don't get about this syntax is what happens when you take off the []. [...]
> Also, I can't see any difference between const(char) and invariant(char), since neither can ever be rebound. In that case, if I assume that they are identical types, how can an array of const(char) be different from an array of invariant(char)?

This is what I'm wondering; I thought const and invariant only applied to reference types (which is why we have final as storage const), in which case, const(char)[] doesn't make any sense...

-- Daniel
May 26 2007
Daniel Keep wrote:
> This is what I'm wondering; I thought const and invariant only applied to reference types (which is why we have final as storage const), in which case, const(char)[] doesn't make any sense...

If you know C++, then const(char)* is the same as:

    const char* p; // C++

and const(char*) is the same as:

    const char * const p; // C++

(using * because C++ doesn't have dynamic arrays)
May 26 2007
Walter Bright wrote:
> Why cstring? Because 'string' appears as both a module name and a common variable name. cstring also implies wstring for wchar strings, and dstring for dchars.

I think cstring is a horrible name. "string" is much better, and in use. (else wouldn't those be wcstring and dcstring or cwstring and cdstring?)

That it is made up of constant characters, and that those aren't really characters but instead UTF-8 code units is something that can be hidden.

    alias const(char)[] string;

But "cstring" both sounds awkward, and also leads the mind to C strings. Even if those (char*) would probably be "stringz" in the usual D lingo.

If any name conflict with previously existing "string" must be avoided, then "str" is probably a better name... (character->char, integer->int) As was discussed earlier.

--anders
May 26 2007
On Sat, 26 May 2007 04:35:34 -0400, Anders F Björklund <afb algonet.se> wrote:
> I think cstring is a horrible name. "string" is much better, and in use. [...]
> If any name conflict with previously existing "string" must be avoided, then "str" is probably a better name... (character->char, integer->int) As was discussed earlier.

I agree, except I don't care much for "str". I'd prefer it named string. If it's an alias in object.d and not a keyword, it shouldn't be too bad.

Actually, while we're at a change for strings, why not bring in something similar to my dstring module, where slicing and indexing never result in an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the code may not be ideal, but it's the concept I'm referring to.

While on strings, I'll mention another problem I have with D's string handling: the "invalid utf8 sequence" exception (or, if you prefer, "4invalid utf8 sequence"). Other Unicode implementations I've used do not throw such an exception, but interpret the bad parts as replacement characters (U+FFFD). I believe I've also heard that the Unicode standard recommends being forgiving in this aspect.

- Chris
May 26 2007
Chris Miller wrote:
> Actually, while we're at a change for strings, why not bring in something similar to my dstring module, where slicing and indexing never result in an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the code may not be ideal, but it's the concept I'm referring to.

Yup. That's my opinion also...

For me the advantages of such a string are quite obvious:
1. Easy slicing and indexing of utf8 sequences (without corrupting the sequence, as mentioned above)
2. Common denominator for char[], wchar[] and dchar[]
3. For classes which don't need speed it simplifies the API (only one version of each function instead of 3)
4. With some additional support from the language (cast operators to different types and opImplicitCast) it can be fully interchangeable with every method taking char[], wchar[], dchar[].

Having another 3 names for string is not very appealing to me. We would have 9 official versions of string available in D: char[], wchar[], dchar[], string, cwstring, cdstring, tango String!(char), tango String!(wchar), tango String!(dchar)

To write a nice, fully functional library you have to write 3 versions of every function which takes different string types (I know, templates make it a little bit easier). I am probably not wrong when I say that in reality people just write one version for char[], because it is convenient (see: SWT ported from Java). As a result wchar and dchar are treated as second class citizens in D.

Additionally, when people design their program for char[], they mostly don't think about issues with slicing a char[] utf8 sequence (warning! assumption!), so the default way of writing programs is *NOT SAFE*. When you write code and don't care about bare metal speed it is just tedious to do this additional work...

Having one string, which hides the differences between char[], wchar[] and dchar[], would solve the problem nicely. Adding constness would also be easy. And you use only one reserved keyword - string - for everything.

I would be happy to hear some other opinions from people on the NG. Maybe I am wrong with the above arguments, so probably someone can give counterarguments... I think it is a very important issue, as it seems that most developers around the world are non-native English speakers...

PS. See also the thread on the DWT NG.

--
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------
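The slicing hazard raised in this post can be shown with a short sketch in present-day D. `firstCodePoints` is a hypothetical helper written for this illustration; it is not part of Phobos, Tango or the dstring module. `std.utf.validate` and `std.exception.assertThrown` are Phobos functions.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper: returns the first `n` code points of `s`
// without ever cutting a multi-byte UTF-8 sequence in half.
const(char)[] firstCodePoints(const(char)[] s, size_t n)
{
    size_t seen = 0;
    // Iterating with a dchar loop variable decodes UTF-8; `i` is the
    // byte offset at which each code point starts.
    foreach (i, dchar c; s)
    {
        if (seen == n)
            return s[0 .. i];
        ++seen;
    }
    return s; // fewer than n code points: return everything
}

void main()
{
    const(char)[] s = "h\u00E9llo";   // U+00E9 occupies two UTF-8 code units

    // Slicing by code unit can split U+00E9 and leave invalid UTF-8 behind.
    assertThrown!UTFException(validate(s[0 .. 2]));

    // Slicing on code point boundaries keeps the sequence intact.
    assert(firstCodePoints(s, 2) == "h\u00E9");
    assert(firstCodePoints(s, 10) == s);
}
```

This only respects code point boundaries. It does not handle combining sequences, which a full string type would also have to consider.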
May 26 2007
Marcin Kuszczak wrote:
> Chris Miller wrote:
>> Actually, while we're at a change for strings, why not bring in something similar to my dstring module, where slicing and indexing never result in an invalid UTF sequence? [...]
> Yup. That's my opinion also... [...]
> Having one string, which hides differences between char[], wchar[] and dchar[] would solve problem nicely. Adding constness would also be easy. And you use only one reserved keyword - string - for everything. [...]

I agree with you. I don't think that string should be a char[] alias, whether it's const or not, but a class with a char[], dchar[] or wchar[] representation under the hood and safe slicing by default.

The difficulty is providing enough flexibility for managing the internal representation correctly: there should be a possibility to say "use UTF8 even though there are multibyte characters", for example (a size optimization with some CPU cost).

renoX
May 27 2007
renoX wrote:
> I agree with you, I don't think that the string should be a char[] alias, wether it's const or not but a class with char[],dchar[],wchar[] under the hood representation and safe slicing by default. The difficulty is providing enough flexibility for managing correctly the internal representation: there should be a possibility to say use UTF8 even though there are multibyte characters for example (a size optimization with some CPU cost).

I think the class you describe would be useful, but only for certain types of application. Many applications (those that deal with ASCII or only one of UTF8, 16 or 32 for example) won't need the sorts of things this class provides and can get away with just using 'const(char[])' AKA 'string'.

Basically I think there is ample room for both 'string' as an alias and 'String' as a class to exist at the same time.

Regan
May 27 2007
Regan Heath wrote:
> renoX wrote:
>> I agree with you, I don't think that the string should be a char[] alias [...]
> I think the class you describe would be useful, but only for certain types of application. Many applications (those that deal with ASCII

Hopefully a rare thing now.

> or only one of UTF8, 16 or 32 for example)

Sure, but this makes the code less portable (or less efficient when it's not on its "original" OS): Windows uses UTF16, Linux UTF8.

> wont need the sorts of things this class provides and can get away with just using 'const(char[])' AKA 'string'. Basically I think there is a ample room for both 'string' as an alias and 'String' as a class to exist at the same time.
> Regan

Room of course, but IMHO one should almost always use the class (except in wrappers of native calls) instead of the alias.

renoX
May 27 2007
renoX Wrote:
> Regan Heath wrote:
>> renoX Wrote:
>>> I agree with you, I don't think that the string should be a char[] alias, whether it's const or not, but a class with char[], dchar[], wchar[] under-the-hood representation and safe slicing by default. The difficulty is providing enough flexibility for managing the internal representation correctly: there should be a possibility to say "use UTF8 even though there are multibyte characters", for example (a size optimization with some CPU cost).
>>
>> I think the class you describe would be useful, but only for certain types of application. Many applications (those that deal with ASCII
>
> Hopefully a rare thing now.

No, sadly they aren't. Most existing applications these days deal with ASCII or one of the strange code pages (which you'd handle in D with ubyte and appropriate conversion to one of UTF8, 16 or 32 internally).

Granted, in the case of the code page apps you might want a String class which can be produced by a <codepage>toString() free function which leverages iconv (which is just what I suggested). However, you may only want to deal with them as UTF-8 internally and therefore not need the functionality provided by the class, opting instead to use 'string' directly.

Sure, in the future I expect/hope people will move to UTF8, 16, and 32, but I suspect code pages will be haunting us for many years to come.

>> won't need the sorts of things this class provides and can get away with just using 'const(char[])' AKA 'string'. Basically I think there is ample room for both 'string' as an alias and 'String' as a class to exist at the same time.
>
> Room of course, but IMHO one should almost always use the class (except in wrappers of native calls) instead of the alias.

I think that's an invalid assertion, specifically your use of the word 'always'. There are 'almost certainly' (see, my term leaves room for me to be wrong) many cases where the alias would be preferred, most likely for performance reasons, especially if the added functionality isn't required.

In other words, all I'm saying is: sometimes you want it, sometimes you don't. Both can exist, both can be used and both should be interchangeable (without too much trouble).

Regan
May 28 2007
On Fri, 25 May 2007 19:47:24 -0700, Walter Bright wrote:
> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable.

We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.

> To express an array of const chars, one would write:
>
>     const(char)[]
>
> but while that's clear, it doesn't just flow off the keyboard. Strings are so common this needs an alias, so:
>
>     alias const(char)[] cstring;
>
> Why cstring? Because 'string' appears as both a module name and a common variable name. cstring also implies wstring for wchar strings, and dstring for dchars.

No it doesn't. I have rarely seen 'string' used as a variable. In phobos it is used in boxer.d and regexp.d only. I use it as an alias for 'char[]'. I see 'str' used fairly often but not so much 'string'.

'cstring' is pronounced C-String, which instantly brings to mind the 'string' implementation used by the C language. Not something I imagine you wish to imply.

> String literals, on the other hand, will be invariant (which means they can be stuffed into read-only memory). So, typeof("abc") will be:
>
>     invariant(char)[3]
>
> Invariants can be implicitly cast to const.

So 'const(char)[] x' means that I can change x.ptr and x.length but I cannot change anything that x.ptr points to, right?

    void func(const(char)[] x)
    {
        x = "def";     // ok
        x.length = 0;  // ok
        x[0] = 'd';    // fails
    }

And 'invariant(char)[] x' means that I cannot change x.ptr or x.length and I cannot change anything that x.ptr points to, right?

    void func(invariant(char)[] x)
    {
        x = "def";     // fails
        x.length = 0;  // fails
        x[0] = 'd';    // fails
    }

So what syntax is to be used so that x.ptr and x.length cannot be changed but the characters referred to by 'x' can be changed?

    void func(char const([]) x) ???
    {
        x = "def";     // fails
        x.length = 0;  // fails
        x[0] = 'd';    // ok
    }

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 26 2007
Derek Parnell wrote:
>> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable.
>
> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.

The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?

-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------
May 26 2007
Marcin Kuszczak wrote:
> Derek Parnell wrote:
>>> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable.
>>
>> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.
>
> The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?

In my experience they are not really useful at all (const as in constant, that is). Sometimes it does not matter and sometimes it is inconvenient or a performance problem (it is mostly append that is needed, in my experience). If function parameters were const by default (as in the new behavior of 'in'), I see no use for immutability here.

In Java I think it is used to prevent aliased String objects from changing value, something that could create unexpected bugs if used by programmers who do not understand aliasing.

PS. Although I'm no fan of Java, I have used it for most university assignments for the past two years, so hopefully I'm not totally wrong ;)
May 26 2007
Marcin Kuszczak wrote:
> Derek Parnell wrote:
>>> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable.
>>
>> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.
>
> The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?

Ditto here. When I've used Java I found it more annoying that strings were immutable than anything else.

--bb
May 26 2007
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
> Marcin Kuszczak wrote:
>> Derek Parnell wrote:
>>>> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable.
>>>
>>> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.
>>
>> The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?
>
> Ditto here. When I've used Java I found it more annoying that strings were immutable than anything else.
> --bb

I found it more bothersome by far that Integer, Float, etc. were immutable. Even after going through all the trouble of getting classes for all these, you couldn't use them for out or inout parameters to functions.

Scratch that -- what was really annoying was that you couldn't ever *specify* how you wanted your parameter passed. Even in C, you can pass an address (but then, anything's possible in C). But in Java, you can only call by reference with a class or an array, so you end up doing things like:

    void foo(int[] inout_parameter) {
        // a one-element array stands in for an inout int
        inout_parameter[0] += 5;
    }

And the only way to get a scope const final sort of deal on a class is to copy and then submit the copy as a final parameter -- it's the reference, not the data, that's final.

In short, thank you, Walter, for allowing us to pass anything by reference, and for allowing the data referenced to be made read-only.
May 26 2007
gareis wrote:
> In short, thank you, Walter, for allowing us to pass anything by reference, and for allowing the data referenced to be made read-only.

You're welcome. Different languages offer different pieces, only D will offer the whole customizable shebang. The idea is for programs to be more self-documenting, and so make automated analysis more feasible.
May 26 2007
Bill Baxter wrote:
>> The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?
>
> Ditto here. When I've used Java I found it more annoying that strings were immutable than anything else.

When using Java (and Objective-C), I've found it very useful that strings (and others) are immutable since they are then thread-safe.

--anders
May 27 2007
Anders F Björklund wrote:
> When using Java (and Objective-C), I've found it very useful that strings (and others) are immutable since they are then thread-safe.

Being able to treat strings as value types is where the big simplification (in user code) comes, and invariant strings should do that.
May 27 2007
Marcin Kuszczak wrote:
> Derek Parnell wrote:
>>> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable.
>>
>> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.
>
> The same here. I don't have much experience with Java and really don't know why const strings are so useful... Maybe someone could elaborate a little bit more?

It might also be educational to look at Python, which also has immutable strings. The first, and probably most important, reason why strings are immutable in Python is so they can be used as hash keys. (Mutating an object being used as a hash key is bad, bad, bad.) Other reasons are addressed here:

http://effbot.org/pyfaq/why-are-python-strings-immutable.htm

However, Python is a very different kind of language from D. Using strings as hash keys is extraordinarily important in Python, as the use of any identifier is in essence a hash lookup.

Providing immutable strings in D is very useful (so the compiler can enforce copy-on-write semantics, for instance), and I don't think anyone would dispute that. The issue seems to be whether the "default" string alias should be immutable. I would say, since D seems to subscribe to copy-on-write semantics, that it should be. And of course, if you need mutable strings, you will always be able to declare a char[].

-- 
Kirk McDonald
http://kirkmcdonald.blogspot.com
Pyd: Connecting D and Python
http://pyd.dsource.org
May 26 2007
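The hash-key argument above can be sketched in D itself. This is only an illustration, and it assumes a current D compiler, where `string` is an alias for `immutable(char)[]` (the keyword is spelled `immutable` today rather than `invariant`). The function name `keySurvives` is made up for the example.

```d
// Sketch: an immutable string used as an associative-array key cannot be
// changed behind the table's back; "modifying" it builds a new string.
bool keySurvives()
{
    int[string] counts;      // associative array keyed by string
    string key = "alpha";
    counts[key] = 1;

    key ~= "bet";            // rebinds key to a new array "alphabet";
                             // the characters stored as the key are untouched

    // The original key is still found, and the new value is not a key.
    return ("alpha" in counts) !is null && ("alphabet" !in counts);
}

void main()
{
    assert(keySurvives());
}
```

With a mutable `char[]` key the same safety would depend on nobody writing to the array after insertion, which is the copy-on-write discipline the post refers to.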
Derek Parnell wrote:
> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.

You'll still be able to concatenate and slice invariant strings. You can also cast a char[] to an invariant, when you're done building it.

> So 'const(char)[] x' means that I can change x.ptr and x.length but I cannot change anything that x.ptr points to, right?

Right.

> And 'invariant(char)[] x' means that I cannot change x.ptr or x.length and I cannot change anything that x.ptr points to, right?

Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable. Const is only immutable through the reference - another reference to the same data can change it.

> So what syntax is to be used so that x.ptr and x.length cannot be changed but the characters referred to by 'x' can be changed?

    final char[] x;
May 26 2007
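The distinction drawn in the reply above (const as a read-only view, invariant as data that never changes) can be tried out with a short sketch. It assumes a current D compiler, where `invariant` is written `immutable`; the function name `viewAfterWrite` is hypothetical.

```d
// Sketch: const restricts one reference, not the data behind it.
string viewAfterWrite()
{
    char[] buf = "abc".dup;       // mutable data
    const(char)[] view = buf;     // read-only view of the same data
    // view[0] = 'x';             // would not compile: view is const
    buf[0] = 'x';                 // fine: buf is a mutable reference
    return view.idup;             // copy of what the const view now sees
}

void main()
{
    // The const view observed the write made through buf.
    assert(viewAfterWrite() == "xbc");

    // Immutable contents can still be concatenated and sliced, because
    // those operations rebind the slice instead of writing to the data.
    immutable(char)[] s = "abc";
    s = s ~ "def";
    s = s[1 .. 4];
    assert(s == "bcd");

    // Immutable data converts implicitly to a const view.
    const(char)[] c = s;
    assert(c.length == 3);
}
```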
Walter Bright wrote:
> Derek Parnell wrote:
>> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.
>
> You'll still be able to concatenate and slice invariant strings. You can also cast a char[] to an invariant, when you're done building it.

Will there be something in the type system which enables you to safely say, "This is the only reference to this data, so it's ok for me to make this invariant"? Does 'scope' happen to have anything to do with that?

    invariant(char)[] createJunk()
    {
        /* scope? */ char[] val = "aaaaa".dup;
        size_t index = rand() % 5;
        val[index] = rand();
        return cast(invariant(char)[]) val;
    }

I mean, do I really need to cast it to invariant there? It's easy to see that there's only one copy of val's data in existence.

-- Reiner
May 26 2007
Reiner Pope wrote:
> Will there be something in the type system which enables you to safely say, "This is the only reference to this data, so it's ok for me to make this invariant"?

Safely? No. You will be able to explicitly cast to invariant, however, the programmer will have to ensure it is safe to do so.

> Does 'scope' happen to have anything to do with that?

No. Scope just ensures that the reference does not 'escape' the scope it's in.

>     invariant(char)[] createJunk()
>     {
>         /* scope? */ char[] val = "aaaaa".dup;
>         size_t index = rand() % 5;
>         val[index] = rand();
>         return cast(invariant(char)[]) val;
>     }
>
> I mean, do I really need to cast it to invariant there? It's easy to see that there's only one copy of val's data in existence.

Easy for you to see, not so easy for the compiler to. And besides:

    return cast(invariant)val;

will do the trick more conveniently.
May 26 2007
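The "build it mutable, then freeze it" pattern discussed above looks roughly like this in current D. The sketch assumes today's spelling `immutable` and uses `std.exception.assumeUnique` from Phobos; `makeLabel` is a made-up helper, and as in the post, the compiler takes the programmer's word that no other mutable reference survives.

```d
import std.exception : assumeUnique;

// Hypothetical helper: builds a label such as "<--->" in a mutable buffer
// and hands it back as an immutable string. Assumes n >= 2.
string makeLabel(size_t n)
{
    char[] buf = new char[n];
    buf[] = '-';                  // fill the whole buffer
    buf[0] = '<';
    buf[n - 1] = '>';
    // Unchecked promise that buf is the only reference to this data.
    return assumeUnique(buf);
}

void main()
{
    assert(makeLabel(5) == "<--->");

    char[] tmp = "abc".dup;
    tmp[0] = 'A';
    string frozen = cast(immutable) tmp;  // the short cast form; also unchecked
    assert(frozen == "Abc");

    string copy = tmp.idup;               // checked alternative: makes a copy
    assert(copy == "Abc");
}
```

The cast forms avoid a copy at the cost of safety; `.idup` is the option to reach for when uniqueness cannot be guaranteed.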
Walter Bright wrote:
> Reiner Pope wrote:
>> Will there be something in the type system which enables you to safely say, "This is the only reference to this data, so it's ok for me to make this invariant"?
>
> Safely? No. You will be able to explicitly cast to invariant, however, the programmer will have to ensure it is safe to do so.
>
>> Does 'scope' happen to have anything to do with that?
>
> No. Scope just ensures that the reference does not 'escape' the scope it's in.
>
>>     invariant(char)[] createJunk()
>>     {
>>         /* scope? */ char[] val = "aaaaa".dup;
>>         size_t index = rand() % 5;
>>         val[index] = rand();
>>         return cast(invariant(char)[]) val;
>>     }
>>
>> I mean, do I really need to cast it to invariant there? It's easy to see that there's only one copy of val's data in existence.
>
> Easy for you to see, not so easy for the compiler to. And besides:
>
>     return cast(invariant)val;
>
> will do the trick more conveniently.

That's an interesting syntax, casting to a trait/attribute with the rest of the type inferred. I presume cast(const) works as well. (Maybe cast(scope)? Then again, what's the use...)

Given cast(*) where * is invariant/const, is cast(*)T[] the same as cast(*(T)[]) or cast(*(T[]))? That is, does the trait apply to the element type, or the array?

-- Chris Nicholson-Sauls
May 26 2007
Chris Nicholson-Sauls wrote:
> That's an interesting syntax, casting to a trait/attribute with the rest of the type inferred. I presume cast(const) works as well. (Maybe cast(scope)? Then again, what's the use...)
>
> Given cast(*) where * is invariant/const, is cast(*)T[] the same as cast(*(T)[]) or cast(*(T[]))? That is, does the trait apply to the element type, or the array?

Both.
May 26 2007
Walter Bright wrote:
> Reiner Pope wrote:
>> Will there be something in the type system which enables you to safely say, "This is the only reference to this data, so it's ok for me to make this invariant"?
>
> Safely? No. You will be able to explicitly cast to invariant, however, the programmer will have to ensure it is safe to do so.
>
>> Does 'scope' happen to have anything to do with that?
>
> No. Scope just ensures that the reference does not 'escape' the scope it's in.

I must have misunderstood what scope specifies. I had thought that, to avoid being escaped, scope specified that your variable may not be aliased by another (non-scope) name. In that case, I thought, can't you say: "well, when I leave this function, I'm the only one holding a reference to this data, so it would be safe to call it invariant (or anything else I choose)." I thought a compiler could have a special case saying, "at the end of scope, you can safely turn any scope variables into whatever you want".

However, I was surprised to find out that the following code compiled fine, although it returns a dead object:

    Foo foo()
    {
        scope Foo f = new Foo();
        Foo g = f;
        return g;
    }

-- Reiner
May 26 2007
Reiner Pope wrote:
> However, I was surprised to find out that the following code compiled fine, although it returns a dead object:

Sadly, it currently isn't enforced.
May 26 2007
On Sat, 26 May 2007 22:27:18 -0700, Walter Bright wrote:
> Derek Parnell wrote:
>> We seem to have different experience. Most of the code I write deals with changing strings - in other words, manipulating strings is very very common in the sorts of programs I write.
>
> You'll still be able to concatenate and slice invariant strings. You can also cast a char[] to an invariant, when you're done building it.

While that is interesting, it has not much to do with what I was saying. You said "strings should be immutable" and I'm saying that seems odd because my experience is that most strings are meant to be changed.

So now I'm thinking that we are talking about different things when we use the word "string". I'm guessing you are really referring to compile-time generated string data (e.g. literals) rather than run-time generated string data.

>> So 'const(char)[] x' means that I can change x.ptr and x.length but I cannot change anything that x.ptr points to, right?
>
> Right.
>
>> And 'invariant(char)[] x' means that I cannot change x.ptr or x.length and I cannot change anything that x.ptr points to, right?
>
> Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable.

Huh??? Isn't that what I just said? Now I'm even more confused about these terms. They are just not intuitive, are they?

> Const is only immutable through the reference - another reference to the same data can change it.

Ok ... so this below won't fail ...

    void func(const char[] parm)
    {
        char[] q;
        q = parm;
        q[0] = 'a';
    }

or is the "q = parm" not really permitted.

>> So what syntax is to be used so that x.ptr and x.length cannot be changed but the characters referred to by 'x' can be changed?
>
>     final char[] x;

Given the syntax on the form " void func(<X> char[] parm) ", is the table below true ...

    *-------------------------------------*
    | <X>         | parm.ptr  | parm[0]   |
    |-------------+-----------+-----------|
    | const       | mutable   | immutable |
    | final       | immutable | mutable   |
    | invariant   | immutable | immutable |
    |             | mutable   | mutable   |
    *-------------------------------------*

I'm sorry I'm a bit slow on this ... but what is the difference between "invariant" and "const final"? Is it that "invariant" is sort of a global effect but "const final" is only in effect for the specific reference it occurs on?

I'm not looking forward to reading the docs on this. I hope you get a lot of people to edit the docs to make it understandable for everyone.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 26 2007
Derek Parnell wrote:
> You said "strings should be immutable" and I'm saying that seems odd because my experience is that most strings are meant to be changed.

I'm going to argue that your experience is unusual. I do a lot of string manipulation (after all, that's what a compiler does) and the strings, once constructed, are essentially always immutable. In conversations with many others, my experience is commonplace.

But still, in D, nothing prevents you from using mutable strings.

> So now I'm thinking that we are talking about different things when we use the word "string". I'm guessing you are really referring to compile-time generated string data (e.g. literals) rather than run-time generated string data.

I'm referring to the arrays of characters, generated or literals.

>>> So 'const(char)[] x' means that I can change x.ptr and x.length but I cannot change anything that x.ptr points to, right?
>>
>> Right.
>>
>>> And 'invariant(char)[] x' means that I cannot change x.ptr or x.length and I cannot change anything that x.ptr points to, right?
>>
>> Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable.
>
> Huh??? Isn't that what I just said?

No. You said for const you could change x.ptr and x.length, but for invariant you could not. For both const and invariant, you can change x.ptr and x.length.

> Now I'm even more confused about these terms. They are just not intuitive, are they?

The problem is I have failed to explain them. Invariant data can go into read-only memory. Const data can be changed by another reference to the same data (just like in C++). In other words, const is a read-only *view* of the data, whereas invariant data is read-only for all views of it.

>> Const is only immutable through the reference - another reference to the same data can change it.
>
> Ok ... so this below won't fail ...
>
>     void func(const char[] parm)
>     {
>         char[] q;
>         q = parm;

error, q is not const.

>         q[0] = 'a';
>     }
>
> or is the "q = parm" not really permitted.

Right.

>>> So what syntax is to be used so that x.ptr and x.length cannot be changed but the characters referred to by 'x' can be changed?
>>
>>     final char[] x;
>
> Given the syntax on the form " void func(<X> char[] parm) ", is the table below true ...
>
>     *-------------------------------------*
>     | <X>         | parm.ptr  | parm[0]   |
>     |-------------+-----------+-----------|
>     | const       | mutable   | immutable |
>     | final       | immutable | mutable   |
>     | invariant   | immutable | immutable |
>     |             | mutable   | mutable   |
>     *-------------------------------------*

You've got invariant wrong, it's mutable|immutable.

> I'm sorry I'm a bit slow on this ... but what is the difference between "invariant" and "const final"? Is it that "invariant" is sort of a global effect but "const final" is only in effect for the specific reference it occurs on?

First differences: final is a *storage class*. const and invariant are *type constructors*. final only refers to the actual value that a symbol has, and it means that, once a value is assigned to a symbol, that value can never change. If the value is a pointer or reference, what it points to *can* be changed.

    int x = 3;
    final int* p = &x;
    p = null;    // error, p is final
    *p = 1;      // ok

    const(int)* q = null;
    q = &x;      // ok, q is not const, and now *q is 1
    *q = 2;      // error, *q is const
    *p = 5;      // ok, but now *q is 5, too!
    x = 6;       // ok, but now *q is 6

    invariant(int)* s = null;
    s = &x;      // error, cannot implicitly convert int* to invariant(int)*
    int y = 4;
    s = cast(invariant(int)*)&y;  // ok, trust programmer that y is immutable
    *s = 3;      // error, *s is immutable
    y = 5;       // undefined behavior, as y is never supposed to change,
                 // and compiler assumes *s is still 4

Note that int* can be implicitly converted to const(int)*, and invariant(int)* can be implicitly converted to const(int)*.

> I'm not looking forward to reading the docs on this. I hope you get a lot of people to edit the docs to make it understandable for everyone.

The thing is actually rather simple, but I am having trouble finding the right words to express it. Certainly, the mishmash of C++ const has badly muddied the waters about what const means.
May 27 2007
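The const and invariant halves of the pointer example above can be compiled with a current D compiler after two adjustments, both of which are assumptions about today's language rather than anything stated in the thread: `invariant` is now spelled `immutable`, and, as far as I know, the `final` storage class for variables did not survive into current D, so it is left out. The function names are made up for the sketch.

```d
// Sketch: a const pointer is a read-only view that still tracks changes
// made through other references.
int readThroughConst()
{
    int x = 3;
    const(int)* q = &x;     // int* converts implicitly to const(int)*
    // *q = 2;              // would not compile: *q is const
    x = 6;                  // allowed: x itself is mutable
    return *q;              // the const view sees the new value
}

// Sketch: immutable data never changes, and converts to a const view.
int readThroughImmutable()
{
    immutable int y = 4;
    immutable(int)* s = &y;
    // int z; s = &z;       // would not compile: int* does not convert
                            // implicitly to immutable(int)*
    const(int)* r = s;      // immutable(int)* converts to const(int)*
    return *r;
}

void main()
{
    assert(readThroughConst() == 6);
    assert(readThroughImmutable() == 4);
}
```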
On Sun, 27 May 2007 01:09:40 -0700, Walter Bright wrote:

Thanks for taking the time out to help me understand the proposed D changes. I really appreciate it. I think that I'm going to have to wait until you have an implementation to try it on, to see how it fits with my terminology and needs.

> Derek Parnell wrote:
>> You said "strings should be immutable" and I'm saying that seems odd because my experience is that most strings are meant to be changed.
>
> I'm going to argue that your experience is unusual. I do a lot of string manipulation (after all, that's what a compiler does) and the strings, once constructed, are essentially always immutable. In conversations with many others, my experience is commonplace.

Ok, we'll leave it at that then. However the phrase "once constructed" is the key one I suspect. It's like saying, once I've finished changing things I don't want them to change anymore - no argument there. So the idea would be to work with mutable strings until they are finished being constructed and then cast them to immutable for the rest of the run time. I'm thinking here of things like changing case, macro expansion, standardizing file names, constructing message text, etc ...

> But still, in D, nothing prevents you from using mutable strings.

That's why I can see that I'll be continuing to use 'alias char[] string', unless you make 'string' the immutable beastie of course <g>

>>>> So 'const(char)[] x' means that I can change x.ptr and x.length but I cannot change anything that x.ptr points to, right?
>>>
>>> Right.
>>>
>>>> And 'invariant(char)[] x' means that I cannot change x.ptr or x.length and I cannot change anything that x.ptr points to, right?
>>>
>>> Wrong. The difference between const and invariant is that invariant is truly, absolutely, immutable.
>>
>> Huh??? Isn't that what I just said?
>
> No. You said for const you could change x.ptr and x.length, but for invariant you could not. For both const and invariant, you can change x.ptr and x.length.

See, this is what is weird ... I can have an invariant string which can be changed, thus making it not really invariant in the English language sense. I'm still thinking that "invariant" means "does not change ever".

But it seems that I'm wrong ...

    invariant char[] x;
    x = "abc".dup;  // The string 'x' now contains "abc";
    x = "def".dup;  // The string (which is not supposed to change
                    // i.e invariant) has been changed to "def".

Now this is counter-intuitive (read: *WEIRD*), no?

>> Now I'm even more confused about these terms. They are just not intuitive, are they?
>
> The problem is I have failed to explain them. Invariant data can go into read-only memory. Const data can be changed by another reference to the same data (just like in C++). In other words, const is a read-only *view* of the data, whereas invariant data is read-only for all views of it.

Okay, I've got that now ... but how to remember that two terms that mean the same in English actually mean different things in D <G>

I think I read that someone suggested that 'const' be a contraction of 'constrained' rather than 'constant' - that might help. And that 'invariant' is longer than 'const' so its effect is 'bigger'.

    invariant char[] x; // The data pointed to by 'x' cannot be changed
                        // by anything anytime during the execution
                        // of the program.
                        // (So how do I populate it then? Hmmmm ...)

    const char[] y;     // The data pointed to by 'y' cannot be changed
                        // by anything anytime during the execution
                        // of the program when using the 'y' variable,
                        // however using another variable that also
                        // refers to y's data, or some of it, is ok.

For example ...

    void func (const char[] a, char[] b)
    {
        a[0] = 'a'; // fails
        b[0] = 'a'; // succeeds
    }

    char[] y = "def".dup;
    func( y, y);

>> I'm sorry I'm a bit slow on this ... but what is the difference between "invariant" and "const final"? Is it that "invariant" is sort of a global effect but "const final" is only in effect for the specific reference it occurs on?
>
> First differences: final is a *storage class*. const and invariant are *type constructors*.

Thanks. So 'final' means that it can be changed (from its initial default value) once and only once.

    final int r;
    r = randomer(); // succeeds
    foo();          // fails

    int randomer()
    {
        // Get a random integer between -100 and 100.
        return cast(int)(std.random.rand() % 201) - 100;
    }

    void foo()
    {
        r = randomer(); // success depends on whether or not 'r'
                        // has already been set.
    }

and with the order swapped:

    final int r;
    foo();          // succeeds
    r = randomer(); // fails

    int randomer()
    {
        // Get a random integer between -100 and 100.
        return cast(int)(std.random.rand() % 201) - 100;
    }

    void foo()
    {
        r = randomer(); // success depends on whether or not 'r'
                        // has already been set.
    }

Is this a run-time check or a compile time one? If run-time, would it be possible to somehow 'unfinal' a variable using some implementation dependent trickery?

>> I'm not looking forward to reading the docs on this. I hope you get a lot of people to edit the docs to make it understandable for everyone.
>
> The thing is actually rather simple, but I am having trouble finding the right words to express it.

And thus my comment re editors.

> Certainly, the mishmash of C++ const has badly muddied the waters about what const means.

I have no real knowledge of C++ or its const, and I'm still weirded out by it all <G>

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 27 2007
Derek Parnell wrote:
> See, this is what is weird ... I can have an invariant string which can be changed, thus making it not really invariant in the English language sense. I'm still thinking that "invariant" means "does not change ever".

Where you're going wrong is that there are two parts to a dynamic array - the contents of the array, and the ptr/length values of the array. invariant(char)[] immutalizes (look ma! I coined a new word!) only the contents of the array. invariant(char[]) immutalizes the contents and the ptr/length values.

> But it seems that I'm wrong ...
>
>     invariant char[] x;
>     x = "abc".dup;  // The string 'x' now contains "abc";
>     x = "def".dup;  // The string (which is not supposed to change
>                     // i.e invariant) has been changed to "def".
>
> Now this is counter-intuitive (read: *WEIRD*), no?

The first issue is that you've confused:

    invariant char[] x;

with:

    invariant(char)[] x;

Remember, there are TWO parts to an array, and the invariantness can be controlled for either independently, or both. This isn't different from in C++ there are two parts to a char*, the char part, and the pointer part.

> Okay, I've got that now ... but how to remember that two terms that mean the same in English actually mean different things in D <G>

English is imprecise and ambiguous, that's why we have mathematical languages, and programming languages.

>     invariant char[] x; // The data pointed to by 'x' cannot be changed
>                         // by anything anytime during the execution
>                         // of the program.
>                         // (So how do I populate it then? Hmmmm ...)

You can't populate an invariant(char)[] array (which is what you meant, not invariant char[]). The way to get one is to cast an existing array to invariant.

>     const char[] y;     // The data pointed to by 'y' cannot be changed
>                         // by anything anytime during the execution
>                         // of the program when using the 'y' variable,
>                         // however using another variable that also
>                         // refers to y's data, or some of it, is ok.

Yes, but here again, const(char)[].

> For example ...
>
>     void func (const char[] a, char[] b)
>     {
>         a[0] = 'a'; // fails
>         b[0] = 'a'; // succeeds
>     }
>
>     char[] y = "def".dup;
>     func( y, y);

Yup, that's the aliasing issue with const.

> Thanks. So 'final' means that it can be changed (from its initial default value) once and only once.

No. 'final' means it is set only at initialization.

>     final int r;
>     r = randomer(); // succeeds

Nope, this fails. Try:

    final int r = randomer();

>     foo();          // fails
>
>     int randomer()
>     {
>         // Get a random integer between -100 and 100.
>         return cast(int)(std.random.rand() % 201) - 100;
>     }
>
>     void foo()
>     {
>         r = randomer(); // success depends on whether or not 'r'
>                         // has already been set.

No, this assignment always fails.

>     }
>
> Is this a run-time check or a compile time one?

Compile time.

> If run-time, would it be possible to somehow 'unfinal' a variable using some implementation dependent trickery?

Yes, but the result is undefined behavior. Just like if you went around the typing system and converted an int into a pointer, and tried to access data with it. You can do it, but you're on your own with that.

> I have no real knowledge of C++ or its const, and I'm still weirded out by it all <G>

I'm beginning to realize that unless one understands how types are represented at run time, one will never understand const.
May 27 2007
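The distinction between the two parts of an array can be sketched in present-day D. This is only an illustration under one loud assumption: current compilers spell the `invariant` qualifier discussed in this thread as `immutable`, so the sketch uses that keyword. The helper `headMutable` is a made-up name for this example.

```d
// Assumption: modern D, where the 2007 `invariant` keyword is written `immutable`.

// Hypothetical helper: the characters are immutable, the ptr/length pair is not.
immutable(char)[] headMutable(immutable(char)[] s)
{
    // s[0] = 'X';   // would not compile: the contents are immutable
    s = s[1 .. $];   // fine: only the local ptr/length pair changes
    return s;
}

void main()
{
    // immutable(char)[]: contents fixed, variable can be rebound.
    static assert( __traits(compiles, { immutable(char)[] c = "abc"; c = "def"; }));
    static assert(!__traits(compiles, { immutable(char)[] c = "abc"; c[0] = 'x'; }));

    // immutable(char[]): contents and ptr/length both fixed.
    static assert(!__traits(compiles, { immutable(char[]) c = "abc"; c = "def"; }));

    assert(headMutable("abc") == "bc");
}
```

The `static assert` lines make the compiler itself confirm which assignments are accepted, which matches the "compile time" answer given in the post.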
On Sun, 27 May 2007 12:06:06 -0700, Walter Bright wrote:
> Derek Parnell wrote:
>> See, this is what is weird ... I can have an invariant string which can be changed, thus making it not really invariant in the English language sense. I'm still thinking that "invariant" means "does not change ever".
> Where you're going wrong is that there are two parts to a dynamic array - the contents of the array, and the ptr/length values of the array. invariant(char)[] immutalizes (look ma! I coined a new word!) only the contents of the array. invariant(char[]) immutalizes the contents and the ptr/length values.

I know that you know that I know this about arrays already (did I really just say that!?) so I assume you are talking to the greater audience that we have here.

So to immutalize (see it must be a real word as someone else is using it<g>) just the ptr/length parts I'd use ...

    invariant char([]) ?????
    char invariant[] ????

and invariant char[] is the same as invariant (char[]) right?

>> But it seems that I'm wrong ...
>>
>>     invariant char[] x;
>>     x = "abc".dup; // The string 'x' now contains "abc";
>>     x = "def".dup; // The string (which is not supposed to change
>>                    // i.e invariant) has been changed to "def".
>>
>> Now this is counter-intuitive (read: *WEIRD*), no?

In my thinking the term 'string' refers to the whole ptr/length/content group. So when one says that a string is immutable I'm thinking they are saying that every aspect of the string does not change. This is where I suspect that we are having terminology problems.

> The first issue is that you've confused:
>
>     invariant char[] x;
>
> with:
>
>     invariant(char)[] x;

Yep - guilty as charged, your honour. Actually it is not so much confusion rather just a poor typing regime, as I really did understand the difference but I typed in the wrong thing. But let's continue ...

> Remember, there are TWO parts to an array, and the invariantness can be controlled for either independently, or both. This isn't different from in C++ there are two parts to a char*, the char part, and the pointer part.

What is the syntax for controlling *just* the reference part of an array?

>> Okay, I've got that now ... but how to remember that two terms that mean the same in English actually mean different things in D <G>
> English is imprecise and ambiguous, that's why we have mathematical languages, and programming languages.

Has anyone got a dictionary in which "constant" and "invariant" are not synonyms? Sure I agree that "English is imprecise and ambiguous" when taken as a whole but not every word is such. So when one uses English words in a programming language the natural thing is to assume that the programming language meaning has a high degree of correlation with the English meaning.

>>     invariant char[] x; // The data pointed to by 'x' cannot be changed
>>                         // by anything anytime during the execution
>>                         // of the program.
>>                         // (So how do I populate it then? Hmmmm ...)
> You can't populate an invariant(char)[] array (which is what you meant, not invariant char[]). The way to get one is to cast an existing array to invariant.

    char[] name;
    name = GetUserName();
    invariant (char)[] newb = cast(invariant)name;
    void foo() { name[0] = toUpperCase(name[0]); } // Is this valid?
    foo(); // What about this?

>>     const char[] y; // The data pointed to by 'y' cannot be changed
>>                     // by anything anytime during the execution
>>                     // of the program when using the 'y' variable,
>>                     // however using another variable that also
>>                     // refers to y's data, or some of it, is ok.
> Yes, but here again, const(char)[].

Yeah yeah yeah ... I can see how an alias is going to be a boon.

>> Thanks. So 'final' means that it can be changed (from its initial default value) once and only once.
> No. 'final' means it is set only at initialization.

And initialization means "on the same statement that declares the variable"? In English, initialization means whenever some thing is initialized rather than one specific type of initialization.

>>     final int r;

Ok, so the above "initializes" the symbol to zero, being the default value of an int, and it cannot be changed to anything else now.

>>     r = randomer(); // succeeds
> Nope, this fails. Try:
>
>     final int r = randomer();

Got it.

>> I have no real knowledge of C++ or its const, and I'm still weirded out by it all <G>
> I'm beginning to realize that unless one understands how types are represented at run time, one will never understand const.

Nah, it's probably just me that's being thick, ... and I *do* understand the run-time implementation of the D constructs.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 27 2007
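The question about casting `name` and then changing it through the original variable can be illustrated with a safer alternative. The sketch below assumes present-day D (`immutable` instead of `invariant`, `string` as an alias for `immutable(char)[]`) and uses a made-up helper, `snapshot`. As far as I understand the current language rules, casting mutable data to immutable and then writing through the old reference is undefined behaviour, so the sketch copies with `.idup` instead of casting.

```d
// Assumption: modern D. `snapshot` is a hypothetical helper for this example.
string snapshot(char[] name)
{
    // .idup allocates an independent immutable copy, so later writes
    // through `name` cannot be observed through the returned string.
    return name.idup;
}

void main()
{
    char[] name = "derek".dup;
    string frozen = snapshot(name);

    name[0] = 'D';              // mutate the original buffer

    assert(name == "Derek");    // the mutable array changed
    assert(frozen == "derek");  // the immutable copy did not
}
```

The copy costs an allocation, which is the price for not having to reason about who else still holds a mutable reference.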
Derek Parnell wrote:
> Nah, it's probably just me that's being thick, ... and I *do* understand the run-time implementation of the D constructs.

FWIW, I feel the documentation is going to need LOTS of examples. The text is sufficient to point folk in the right general direction, but the examples will be necessary to highlight the minimal distinctions.

And as my C++ is really quite minimal, and predates templates being generally available, I don't think I'm being confused by how C++ uses it.

OTOH, I frequently need to do things like:

    char[] stuff = "alperferous";
    stuff = stuff[0..5] ~ "if" ~ stuff[5..length];

(silly example, but it's short!)

Given what I've read so far I suppose this means that I just keep avoiding const & invariant, but I do think of this as string manipulation, and thus "Strings are constant by default" sets warning bells ringing. (Probably inappropriately, admittedly. But perhaps this should be said differently in the documentation.)
Jun 10 2007
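The slicing-and-concatenation example above does not need mutable characters. A minimal sketch, assuming present-day D where `string` is `immutable(char)[]` and `$` replaces `length` inside the index expression; `insertIf` is a name invented for this illustration:

```d
// Hypothetical helper: builds a new string, never writes into the old one.
string insertIf(string stuff)
{
    // Slices only read; ~ allocates a fresh array for the result.
    return stuff[0 .. 5] ~ "if" ~ stuff[5 .. $];
}

void main()
{
    string stuff = "alperferous";
    stuff = insertIf(stuff);       // rebinding the variable is allowed
    assert(stuff == "alperifferous");

    // Only in-place element writes are rejected by the compiler.
    static assert(!__traits(compiles, { string s = "abc"; s[0] = 'A'; }));
}
```

So "constant by default" here restricts writing into existing characters, not building new strings from old ones.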
Walter Bright Wrote:alias const(char)[] cstring; Why cstring? Because 'string' appears as both a module name and a common variable name. cstring also implies wstring for wchar strings, and dstring for dchars.I like it all, except the alias. I would prefer 'string'. 'cstring' implies C's string to me, for example I often alias std.string.toStringz to cstr or CSTR. I think wstring and dstring are ok, basically I'd like: char, string wchar, wstring dchar, dstring Is it really a problem that std.string is a module name? I don't reckon it's a very common variable name, for example: 1. I wouldn't go to the trouble of typing 'string' for a throw away variable when I could use 'p', 's', or 'str'. 2. Likewise for a more long lived variable I would use something more descriptive i.e. nameString, ageString, promptString, boundaryString, ... Slightly OT: I think once we have const etc and 'string' working as desired then for many applications there will be no need for a additional String class. Note that I said 'many applications' above; I think that those applications that make heavy use of many different text encodings and/or languages may still want a 'String' (or 'Text') class. This class would provide the extra functionality that aren't inherent in string, wstring, dstring, things like: 1. leveraging iconv (or similar) to handle various encodings. 2. choosing the best internal format string, wstring, dstring for the text based on the language used. 3. slicing on character boundaries regardless of internal format. .. and probably other things I haven't thought of here. Regan
May 26 2007
Regan Heath Wrote:I like it all, except the alias. I would prefer 'string'. 'cstring' implies C's string to me, for example I often alias std.string.toStringz to cstr or CSTR. I think wstring and dstring are ok, basically I'd like: char, string wchar, wstring dchar, dstring Is it really a problem that std.string is a module name?</lurk> Concur <lurk>
May 26 2007
Walter Bright wrote:Under the new const/invariant/final regime, what are strings going to be ? Experience with other languages suggest that strings should be immutable. To express an array of const chars, one would write: const(char)[] but while that's clear, it doesn't just flow off the keyboard. Strings are so common this needs an alias, so: alias const(char)[] cstring;[snip] If you decide on an alias it would be a good idea to add it to phobos for DMD 1, except without the const syntax of course. That way people can start using it now and have less problems upgrading to DMD 2. Although, on the other hand, it may be slightly confusing on 1.0 coders I guess when it doesn't function as a const string. -Joel
May 26 2007
On Fri, 25 May 2007 19:47:24 -0700, Walter Bright wrote:
> Under the new const/invariant/final regime, what are strings going to be ? Experience with other languages suggest that strings should be immutable. To express an array of const chars, one would write:
>
>     const(char)[]
>
> but while that's clear, it doesn't just flow off the keyboard. Strings are so common this needs an alias, so:
>
>     alias const(char)[] cstring;

    const(char)[]        // A mutable array of immutable characters?
    const(char[])        // An immutable array of mutable characters?
    const(const(char)[]) // An immutable array of immutable characters?
    char[]               // A mutable array of mutable characters?

What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 27 2007
Derek Parnell wrote:const(char)[] // A mutable array of immutable characters? const(char[]) // An immutable array of mutable characters? const(const(char)[]) // An immutable array of immutable characters? char[] // A mutable array of mutable characters? What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?They'll all fail.
May 27 2007
On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:
> Derek Parnell wrote:
>>     const(char)[]        // A mutable array of immutable characters?
>>     const(char[])        // An immutable array of mutable characters?
>>     const(const(char)[]) // An immutable array of immutable characters?
>>     char[]               // A mutable array of mutable characters?

Any comment on the above?

>> What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?
> They'll all fail.

Good, but when? At run time or compile time?

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 27 2007
Derek Parnell wrote:
> On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:
>> Derek Parnell wrote:
>>>     const(char)[]        // A mutable array of immutable characters?
>>>     const(char[])        // An immutable array of mutable characters?
>>>     const(const(char)[]) // An immutable array of immutable characters?
>>>     char[]               // A mutable array of mutable characters?
> Any comment on the above?

Looks right to me.

>>> What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?
>> They'll all fail.
> Good, but when? At run time or compile time?

compile time.
May 27 2007
On Sun, 27 May 2007 16:32:57 -0700, Walter Bright wrote:
> Derek Parnell wrote:
>> On Sun, 27 May 2007 12:14:29 -0700, Walter Bright wrote:
>>> Derek Parnell wrote:
>>>>     const(char)[]        // A mutable array of immutable characters?
>>>>     const(char[])        // An immutable array of mutable characters?
>>>>     const(const(char)[]) // An immutable array of immutable characters?
>>>>     char[]               // A mutable array of mutable characters?
>> Any comment on the above?
> Looks right to me.

But didn't you say that "invariant char[]" means that "invariant" applies to both the array reference and the contents? In other words its the same as "invariant (char[])" but above I said that this means that the array is immutable but the contents are not.

What is the syntax for an immutable array of mutable characters?

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
May 27 2007
Derek Parnell wrote:
> What is the syntax for an immutable array of mutable characters?

There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.
Jun 02 2007
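The answer "there isn't one" follows from the qualifier applying to everything inside the parentheses. A small sketch, assuming present-day D (where `const` kept the meaning discussed here); `firstChar` is a hypothetical helper added only so the example has something to call:

```d
// Hypothetical helper: reading through a fully const array is fine.
char firstChar(const(char[]) s)
{
    return s[0];
}

void main()
{
    // const(char[]) and const(const(char)[]) are the same type:
    // qualifying the array also qualifies its elements.
    static assert(is(const(char[]) == const(const(char)[])));

    const(char[]) fixedAll = "abc".dup;
    static assert(is(typeof(fixedAll[0]) == const(char)));

    // Neither the elements nor the ptr/length pair can be written.
    static assert(!__traits(compiles, { const(char[]) c = "abc".dup; c[0] = 'x'; }));
    static assert(!__traits(compiles, { const(char[]) c = "abc".dup; c = "xyz".dup; }));

    assert(firstChar(fixedAll) == 'a');
}
```

This also shows why the second row of the table quoted above does not describe an "immutable array of mutable characters": once the array is qualified, its characters are too.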
Walter Bright wrote:Derek Parnell wrote:Are we only talking strings here or general arrays? Because if general arrays are concerned, I can come up with an example. An immutable array of mutable data for... e.g. render to texture in a software renderer (or creating data for a hw texture, or whatnot) So you basically pass a texture buffer to a function. You don't want it to realloc the buffer, just to modify its contents... What am I missing here? ;) -- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenodeWhat is the syntax for an immutable array of mutable characters?There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.
Jun 02 2007
Tom S wrote:
> Walter Bright wrote:
>> Derek Parnell wrote:
>>> What is the syntax for an immutable array of mutable characters?
>> There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.
> Are we only talking strings here or general arrays? Because if general arrays are concerned, I can come up with an example.

In general.

> An immutable array of mutable data for... e.g. render to texture in a software renderer (or creating data for a hw texture, or whatnot) So you basically pass a texture buffer to a function. You don't want it to realloc the buffer, just to modify its contents... What am I missing here? ;)

We can all come up with an example, the more interesting case is is it a compelling example? I'm not seeing that.
Jun 02 2007
Walter Bright wrote:We can all come up with an example, the more interesting case is is it a compelling example? I'm not seeing that.Well, it's based on a true story. -- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Jun 02 2007
On Sat, 02 Jun 2007 18:25:46 -0700, Walter Bright wrote:
> Tom S wrote:
>> Walter Bright wrote:
>>> Derek Parnell wrote:
>>>> What is the syntax for an immutable array of mutable characters?
>>> There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.
>> Are we only talking strings here or general arrays? Because if general arrays are concerned, I can come up with an example.
> In general.
>> An immutable array of mutable data for... e.g. render to texture in a software renderer (or creating data for a hw texture, or whatnot) So you basically pass a texture buffer to a function. You don't want it to realloc the buffer, just to modify its contents... What am I missing here? ;)
> We can all come up with an example, the more interesting case is is it a compelling example? I'm not seeing that.

Define 'compelling'. The only workaround I can see is a bit restrictive ...

    final TextureBuffer t = CreateTextureBuffer();
    RenderToBuffer( t );
    DoLighting(t);
    ...

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
Jun 02 2007
Walter Bright wrote:Tom S wrote:Most array algorithms would apply. But I'm still not sure I see the point of having an immutable reference, because it's just passed by value anyway. Who cares if the size of the array is modified within a function where it's not passed by reference? The change is just local to the function anyway. SeanWalter Bright wrote:In general.Derek Parnell wrote:Are we only talking strings here or general arrays? Because if general arrays are concerned, I can come up with an example.What is the syntax for an immutable array of mutable characters?There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.An immutable array of mutable data for... e.g. render to texture in a software renderer (or creating data for a hw texture, or whatnot) So you basically pass a texture buffer to a function. You don't want it to realloc the buffer, just to modify its contents... What am I missing here? ;)We can all come up with an example, the more interesting case is is it a compelling example? I'm not seeing that.
Jun 03 2007
"Sean Kelly" <sean f4.ca> wrote in message news:f3uoj5$1b1i$1 digitalmars.com...Most array algorithms would apply. But I'm still not sure I see the point of having an immutable reference, because it's just passed by value anyway. Who cares if the size of the array is modified within a function where it's not passed by reference? The change is just local to the function anyway.If that array is pointing into some meaningful area of memory (like in the example, a texture buffer), resizing the array could (probably would) move the array around, which I guess isn't illegal but then the function operating on the array wouldn't be accessing the correct place. Prevent them from changing the length, it prevents them from accessing anywhere but there.
Jun 03 2007
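The point about a slice being passed by value can be checked directly. The following is a sketch in present-day D with an invented function name, `growLocally`; it shows that writes through the slice reach the caller's buffer, while an append inside the function only changes the function's own ptr/length pair.

```d
// Hypothetical helper: fills the buffer, then appends to its local slice.
size_t growLocally(int[] buf)
{
    buf[] = 7;      // element writes land in the caller's memory
    buf ~= 99;      // appending changes only the local ptr/length pair
                    // (and may or may not reallocate)
    return buf.length;
}

void main()
{
    auto texture = new int[](4);
    auto localLength = growLocally(texture);

    assert(localLength == 5);          // the callee saw a longer slice
    assert(texture.length == 4);       // the caller's slice is unchanged
    assert(texture == [7, 7, 7, 7]);   // but the contents were modified
}
```

Anything the callee writes after the append may go to either the old or a new block, which is the hazard described in the post above; the sketch deliberately avoids writing after the append so its result does not depend on that.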
Jarrett Billingsley wrote:"Sean Kelly" <sean f4.ca> wrote in message news:f3uoj5$1b1i$1 digitalmars.com...Well yeah. I don't personally think this is a problem because it doesn't affect the callee in any way, but I can see how others might disagree. Doesn't 'final' do this now though? SeanMost array algorithms would apply. But I'm still not sure I see the point of having an immutable reference, because it's just passed by value anyway. Who cares if the size of the array is modified within a function where it's not passed by reference? The change is just local to the function anyway.If that array is pointing into some meaningful area of memory (like in the example, a texture buffer), resizing the array could (probably would) move the array around, which I guess isn't illegal but then the function operating on the array wouldn't be accessing the correct place. Prevent them from changing the length, it prevents them from accessing anywhere but there.
Jun 03 2007
Started new thread with reply: resizeable arrays: T[new]
Jun 04 2007
Walter Bright wrote:Derek Parnell wrote:What, there isn't one? Isn't that what final does? Like this: final char[] charar = new char[](20); charar[1] = 'x'; // Allowed charar = new char[](20); // Not allowed charar.length = 10; // Not allowed -- Bruno Medeiros - MSc in CS/E student http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#DWhat is the syntax for an immutable array of mutable characters?There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.
Jun 03 2007
Bruno Medeiros wrote:
> Walter Bright wrote:
>> Derek Parnell wrote:
>>> What is the syntax for an immutable array of mutable characters?
>> There isn't one. Such a construct is appealing in the abstract, but I haven't run across a legitimate use for it yet.
> What, there isn't one? Isn't that what final does? Like this:
>
>     final char[] charar = new char[](20);
>     charar[1] = 'x'; // Allowed
>     charar = new char[](20); // Not allowed
>     charar.length = 10; // Not allowed

Final only works at the outermost level. There is no way to have a mutable pointer to a const pointer to mutable data.
Jun 04 2007
Walter Bright wrote:
> Derek Parnell wrote:
>>     const(char)[]        // A mutable array of immutable characters?
>>     const(char[])        // An immutable array of mutable characters?
>>     const(const(char)[]) // An immutable array of immutable characters?
>>     char[]               // A mutable array of mutable characters?
>>
>> What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?
> They'll all fail.

I think it's better to return a reversed/sorted copy. This will make such a change more backward compatible.
May 27 2007
noSpam wrote:Walter Bright wrote:This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.Derek Parnell wrote:I think it's better to return reversed/sorted copy. This will make such change more backward compatibile.const(char)[] // A mutable array of immutable characters? const(char[]) // An immutable array of mutable characters? const(const(char)[]) // An immutable array of immutable characters? char[] // A mutable array of mutable characters? What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?They'll all fail.
May 27 2007
Myron Alexander wrote:
> noSpam wrote:
>> Walter Bright wrote:
>>> Derek Parnell wrote:
>>>>     const(char)[]        // A mutable array of immutable characters?
>>>>     const(char[])        // An immutable array of mutable characters?
>>>>     const(const(char)[]) // An immutable array of immutable characters?
>>>>     char[]               // A mutable array of mutable characters?
>>>>
>>>> What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?
>>> They'll all fail.
>> I think it's better to return a reversed/sorted copy. This will make such a change more backward compatible.
> This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.

Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally deprecating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exist). For example, .sort can't take an ordering predicate.

Also, the special casing of reversing char[] and wchar[] arrays, preserving the encoded unicode code points is definitely (imho) too specialized to belong in the language (runtime) as opposed to a library.

/ Oskar
May 27 2007
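A library sort that takes an ordering predicate, combined with the "return a copy" idea for read-only input, might look like the sketch below. It assumes present-day Phobos (`std.algorithm.sorting.sort`), not the 2007 libraries, and `sortedDescending` is a name made up for this example.

```d
import std.algorithm.sorting : sort;

// Hypothetical helper: never touches `input`, returns a sorted mutable copy.
int[] sortedDescending(const(int)[] input)
{
    int[] copy = input.dup;         // mutable copy of the read-only data
    sort!((a, b) => a > b)(copy);   // library sort with an ordering predicate
    return copy;
}

void main()
{
    immutable int[] data = [3, 1, 2];

    assert(sortedDescending(data) == [3, 2, 1]);
    assert(data == [3, 1, 2]);      // the original is untouched

    // Sorting the immutable array in place is rejected at compile time.
    static assert(!__traits(compiles, sort(data)));
}
```

Keeping the copy explicit in a library function avoids the surprise of a built-in property that sorts in place for some arrays and copies for others.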
Oskar Linde wrote:Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally depreciating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exists). For example, .sort can't take an ordering predicate. Also, the special casing of reversing char[] and wchar[] arrays, preserving the encoded unicode code points is definitely (imho) too specialized to belong in the language (runtime) as opposed to a library. / OskarI see your point and agree. Regards, Myron.
May 28 2007
Oskar Linde wrote:
> Myron Alexander wrote:
>> noSpam wrote:
>>> Walter Bright wrote:
>>>> Derek Parnell wrote:
>>>>>     const(char)[]        // A mutable array of immutable characters?
>>>>>     const(char[])        // An immutable array of mutable characters?
>>>>>     const(const(char)[]) // An immutable array of immutable characters?
>>>>>     char[]               // A mutable array of mutable characters?
>>>>>
>>>>> What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?
>>>> They'll all fail.
>>> I think it's better to return reversed/sorted copy. This will make such change more backward compatibile.
>> This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.
> Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally depreciating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exists). For example, .sort can't take an ordering predicate.

+1 (and thanks for your predicate-accepting sort routine, Oskar!)

> Also, the special casing of reversing char[] and wchar[] arrays, preserving the encoded unicode code points is definitely (imho) too specialized to belong in the language (runtime) as opposed to a library.

No opinion there. What about the special code-point-at-a-time foreach for char[]? Do you dislike that too?

--bb
May 28 2007
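The code-point-at-a-time `foreach` mentioned above can be shown next to plain code-unit iteration. This is a sketch in present-day D; the two counting functions are invented for the illustration, and the test string uses the precomposed é (U+00E9) written as an escape so that no encoding is lost in transit.

```d
// Hypothetical helpers contrasting the two iteration modes over UTF-8.
size_t countCodeUnits(const(char)[] s)
{
    size_t n;
    foreach (char c; s)   // one iteration per UTF-8 byte
        ++n;
    return n;
}

size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar c; s)  // the language decodes UTF-8 on the fly
        ++n;
    return n;
}

void main()
{
    auto s = "C\u00E9a";  // 'C', precomposed e-acute, 'a'
    assert(countCodeUnits(s) == 4);   // the accented letter takes two bytes
    assert(countCodePoints(s) == 3);
}
```

The only difference between the two loops is the declared element type, which is what makes this feature a language-level special case rather than a library call.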
Bill Baxter pisze:Oskar Linde wrote:+1Myron Alexander skrev:+1 (and thanks for your predicate-accepting sort routine, Oskar!)noSpam wrote:Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally depreciating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exists). For example, .sort can't take an ordering predicate.Walter Bright wrote:This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.Derek Parnell wrote:I think it's better to return reversed/sorted copy. This will make such change more backward compatibile.const(char)[] // A mutable array of immutable characters? const(char[]) // An immutable array of mutable characters? const(const(char)[]) // An immutable array of immutable characters? char[] // A mutable array of mutable characters? What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?They'll all fail.IMHO that should not be in language. That's why I am opting for string *library* class/struct which could take care about such cases. BR Marcin Kuszczak (Aarti_pl)Also, the special casing of reversing char[] and wchar[] arrays, preserving the encoded unicode code points is definitely (imho) too specialized to belong in the language (runtime) as opposed to a library.No opinion there. What about the special code-point-at-a-time foreach for char[]? Do you dislike that too?
May 29 2007
Aarti_pl Wrote:Bill Baxter pisze:I agree. I tend to think there are certain things which some apps don't need, in which case they can use the 'string' alias. Other apps need to do this sort of thing and want a 'String' class to handle it. I think there is room for both in the phobos/tango libraries. The default language/library support can reverse utf8 and 16 but it's not ideal, eg. convert to utf32, reverse, convert back. ;) ReganOskar Linde wrote:+1Myron Alexander skrev:+1 (and thanks for your predicate-accepting sort routine, Oskar!)noSpam wrote:Which would be very confusing. This is instead a perfect opportunity to take the *much* better path of finally depreciating the .sort and .reverse "properties". Equally good or better library implementations are possible (and exists). For example, .sort can't take an ordering predicate.Walter Bright wrote:This makes sense. For immutable arrays, the definition should drop "in place" and just return a copy.Derek Parnell wrote:I think it's better to return reversed/sorted copy. This will make such change more backward compatibile.const(char)[] // A mutable array of immutable characters? const(char[]) // An immutable array of mutable characters? const(const(char)[]) // An immutable array of immutable characters? char[] // A mutable array of mutable characters? What will happen with the .reverse and .sort array properties when used with const, invariant, and final qualifiers?They'll all fail.IMHO that should not be in language. That's why I am opting for string *library* class/struct which could take care about such cases.Also, the special casing of reversing char[] and wchar[] arrays, preserving the encoded unicode code points is definitely (imho) too specialized to belong in the language (runtime) as opposed to a library.No opinion there. What about the special code-point-at-a-time foreach for char[]? Do you dislike that too?
May 29 2007
Regan Heath wrote:
> The default language/library support can reverse utf8 and 16 but it's not ideal, eg. convert to utf32, reverse, convert back. ;)
>
> Regan

I am not sure what do you mean with this sentence...

dstring implementation doesn't do things according to your description, so it's definitely not a case here...

-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------
May 29 2007
Marcin Kuszczak Wrote:
> Regan Heath wrote:
>> The default language/library support can reverse utf8 and 16 but it's not ideal, eg. convert to utf32, reverse, convert back. ;)
>>
>> Regan
> I am not sure what do you mean with this sentence... dstring implementation doesn't do things according to your description, so it's definitely not a case here...

I'm lost, what is "dstring"?

All I meant was that using std.utf you can say:

    char[] text = "<characters which take more than 1 char to represent>";
    text = toUTF8(toUTF32(text).reverse);

and the result will be a correctly reversed UTF8 string. Or am I missing something?

Regan Heath
May 29 2007
On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz> wrote:
> and the result will be a correctly reversed UTF8 string. Or am I missing something?
>
> Regan Heath

I think your method doesn't take compound characters into account. For example:

    // The accented é can be represented by a single code-point.
    // But let's assume it's a compound character (Ce`a).
    writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
    // This would print áeC
May 29 2007
Aziz K. Wrote:
> On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz> wrote:
>> and the result will be a correctly reversed UTF8 string. Or am I missing something?
>> Regan Heath
> I think your method doesn't take compound characters into account. For example:
> // The accented é can be represented by a single code-point. But let's assume it's a compound character (Ce`a).

Is it a compound character in UTF32?

> writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
> // This would print áeC

Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it. My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.

Regan
May 29 2007
Regan Heath wrote:
> Aziz K. Wrote:
>> On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz> wrote:
>>> and the result will be a correctly reversed UTF8 string. Or am I missing something?
>>> Regan Heath
>> I think your method doesn't take compound characters into account. For example:
>> // The accented é can be represented by a single code-point. But let's assume it's a compound character (Ce`a).
> Is it a compound character in UTF32?

Unicode defines multiple valid encodings for lots of accented characters; typically a single codepoint as well as separate codepoints for the accent and the "naked" character that combine when put together.

>> writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
>> // This would print áeC
> Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it. My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.

I don't think std.utf.toUTF* combine or split accented characters, I'm pretty sure it just does codepoint representation conversions (keeping the number of codepoints constant).
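One way to keep an accent attached to its base letter is to reverse by grapheme cluster rather than by code point. The sketch below assumes present-day D2 Phobos (`std.uni.byGrapheme` did not exist when this thread was written) and follows the pattern shown in the `std.uni` documentation. `reverseGraphemes` is a hypothetical name used for illustration only.

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.stdio : writeln;
import std.uni : byCodePoint, byGrapheme;

/// Reverses a string one grapheme cluster at a time, so a base letter
/// and the combining marks that follow it stay together.
string reverseGraphemes(string s)
{
    return s.byGrapheme      // split into grapheme clusters
            .array           // materialise so the range can be walked backwards
            .retro           // iterate the clusters in reverse order
            .byCodePoint     // flatten back into code points
            .text;           // encode as a UTF-8 string
}

void main()
{
    // "C", then "e" followed by U+0301 (combining acute accent), then "a"
    string word = "Ce\u0301a";
    assert(reverseGraphemes(word) == "ae\u0301C");
    writeln(reverseGraphemes(word));
}
```

With a plain code-point reversal the same input would come out as "a", U+0301, "e", "C", which moves the accent onto the "a" as described in the post above.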
May 29 2007
Frits van Bommel Wrote:
> Regan Heath wrote:
>> Aziz K. Wrote:
>>> On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan netmail.co.nz> wrote:
>>>> and the result will be a correctly reversed UTF8 string. Or am I missing something?
>>>> Regan Heath
>>> I think your method doesn't take compound characters into account. For example:
>>> // The accented é can be represented by a single code-point. But let's assume it's a compound character (Ce`a).
>> Is it a compound character in UTF32?
> Unicode defines multiple valid encodings for lots of accented characters; typically a single codepoint as well as separate codepoints for the accent and the "naked" character that combine when put together.

I realise that. But, the important question is what does toUTF32 do with compound UTF8 characters (or UTF16 for that matter)?

>>> writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
>>> // This would print áeC
>> Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it. My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.
> I don't think std.utf.toUTF* combine or split accented characters, I'm pretty sure it just does codepoint representation conversions (keeping the number of codepoints constant).

This is the key issue. I was under the (perhaps mistaken) impression it converted them to the single codepoint version (as that was easier), which is what I based this idea on. Really a simple test should tell us, can you whip one up to prove it one way or the other?

I would, but I don't really use unicode at all and I don't know any compound characters offhand. I know, I know, I could google it but I also get the impression you know a bit more about this and would be able to devise a better test case, or two.

Ahh.. another thought. I think I may have based my assumption on the foreach behaviour, eg.

char[] text = "<compound stuff>";
foreach(dchar d; text) { .. }

this _has_ to give the single codepoint versions, right? I suspect foreach uses the same code as in std.utf, but I may be wrong.

Regan Heath
May 29 2007
Regan Heath wrote:
> Frits van Bommel Wrote:
>> Regan Heath wrote:
>>> Aziz K. Wrote:
>>>> writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
>>>> // This would print áeC
>>> Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it. My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.
>> I don't think std.utf.toUTF* combine or split accented characters, I'm pretty sure it just does codepoint representation conversions (keeping the number of codepoints constant).
> This is the key issue. I was under the (perhaps mistaken) impression it converted them to the single codepoint version (as that was easier), which is what I based this idea on. Really a simple test should tell us, can you whip one up to prove it one way or the other?

---
import std.stdio;
import std.utf;

void main(char[][] args)
{
    // Codepoint 0301 is "Combining acute accent".
    // Codepoint 00e9 is "Latin small letter e with acute"
    char[] str = "e\u0301 \u00e9";

    // This doesn't show the combined character on my console.
    // Perhaps my terminal doesn't properly support combining characters.
    // (My encoding is utf-8, so that shouldn't be the problem)
    // The precomposed character (00e9) is displayed properly.
    // When piped to a .html file and wrapped with
    // <html><body>...</body></html> firefox properly displays both.
    writefln(str);
    foreach (dchar c; str)
    {
        writef("%04x ", c);
    }
    writefln();

    // This produces the exact same output as above code:
    dchar[] dstr = toUTF32(str);
    writefln(dstr);
    foreach (dchar c; dstr)
    {
        writef("%04x ", c);
    }
    writefln();
}
---

> I would, but I don't really use unicode at all and I don't know any compound characters offhand. I know, I know, I could google it but I also get the impression you know a bit more about this and would be able to devise a better test case, or two.

I normally have little use for it as well. A few Dutch (my native tongue) words need accents, but I'll be damned if I know the codes. Let alone those of any combining characters. My usual way of typing those is either using the symbol map or just typing it without accents, right-click, select spell-check suggestion with accents :).

However, for above test I just looked up the codes in the code charts on the unicode website (unicode.org/charts for the precomposed character and the "symbols and punctuation" link at the top for the combining accent). It's pretty easy to find, actually.

> Ahh.. another thought. I think I may have based my assumption on the foreach behaviour, eg.
> char[] text = "<compound stuff>";
> foreach(dchar d; text) { .. }
> this _has_ to give the single codepoint versions, right?

As demonstrated above, it doesn't. The runtime support for the converting foreach statements just imports std.utf and uses decode and toUTF*[1] (as well as some manual conversion to surrogates in the functions dealing with wchar). None of those do anything other than decoding and encoding single codepoints.

[1]: The apparently undocumented (buf, dchar) overloads, which don't allocate.

> I suspect foreach uses the same code as in std.utf, but I may be wrong.

About this, you're not :P.

I suspect the reason std.utf doesn't do decomposition and/or combining is that it would require a lookup table, and possibly quite a big one at that. Though generating it shouldn't be a problem; it could be trivially extracted from the machine-readable data on the unicode website. Just take http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, the sixth column is the decomposition of the character in the first column. (It may also contain the mapping type between <angle brackets>)

Note that for full decomposition this mapping needs to be applied recursively[2], i.e. the characters in the 6th column need to be decomposed as well (if possible).

[2]: See the reminder in http://www.unicode.org/Public/UNIDATA/UCD.html#Character_Decomposition_Mappings
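As a small illustration of the composition and decomposition mappings described above: present-day D2 Phobos ships such tables in `std.uni` and exposes them through `normalize`. This is an assumption relative to the thread, since the 2007 library had nothing comparable. `compose` and `decompose` are hypothetical wrapper names introduced only for this sketch.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : NFC, NFD, normalize;

/// Canonical composition (NFC): combines base letters with following marks
/// where a precomposed code point exists.
string compose(string s)   { return normalize!NFC(s); }

/// Canonical decomposition (NFD): splits precomposed code points apart.
string decompose(string s) { return normalize!NFD(s); }

void main()
{
    string precomposed = "\u00e9";   // U+00E9, a single code point
    string decomposed  = "e\u0301";  // 'e' followed by U+0301, two code points

    // The two spellings are different code point sequences...
    assert(precomposed != decomposed);
    assert(precomposed.walkLength == 1);
    assert(decomposed.walkLength == 2);

    // ...but normalization maps one onto the other.
    assert(decompose(precomposed) == decomposed);
    assert(compose(decomposed) == precomposed);
    writeln("ok");
}
```

Composing before a code-point reversal helps only for sequences that have a precomposed form. Reversing by grapheme cluster is the more general approach.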
May 30 2007
Thanks for this. It appears you're right :) I can't get my console to show them either, which is annoying. I'm on Windows; I set the font to Lucida Console and typed "chcp 65001", which makes the precomposed character appear correctly but not the combining character.

Regan Heath
May 31 2007
Regan Heath wrote:
> Marcin Kuszczak Wrote:
>> Regan Heath wrote:
>>> The default language/library support can reverse utf8 and 16 but it's not ideal, eg. convert to utf32, reverse, convert back. ;)
>>> Regan
>> I am not sure what you mean by this sentence... The dstring implementation doesn't do things according to your description, so it's definitely not the case here...
> I'm lost, what is "dstring"?
> All I meant was that using std.utf you can say:
> char[] text = "<characters which take more than 1 char to represent>";
> text = toUTF8(toUTF32(text).reverse);
> and the result will be a correctly reversed UTF8 string. Or am I missing something?
> Regan Heath

dstring is an implementation of a string struct by Chris Miller which takes care of slicing utf8 sequences and is compatible with char[], wchar[] and dchar[]. I mentioned it because I think it's better when foreach knows nothing about slicing utf8 sequences (as opposed to the way it is implemented currently). That should be the responsibility of a string class (like e.g. dstring) with a proper opApply method. Because my previous e-mail was in the context of dstring, I didn't understand what you meant... 'reverse' and 'sort' could also be implemented in such a class in a way which copes properly with utf8 sequences...

http://www.digitalmars.com/d/archives/digitalmars/D/announce/New_string_implementation_dstring_1.0_4886.html
http://www.dprogramming.com/dstring.php

--
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------
May 29 2007
Marcin Kuszczak Wrote:
> Regan Heath wrote:
>> Marcin Kuszczak Wrote:
>>> Regan Heath wrote:
>>>> The default language/library support can reverse utf8 and 16 but it's not ideal, eg. convert to utf32, reverse, convert back. ;)
>>>> Regan
>>> I am not sure what you mean by this sentence... The dstring implementation doesn't do things according to your description, so it's definitely not the case here...
>> I'm lost, what is "dstring"?
>> All I meant was that using std.utf you can say:
>> char[] text = "<characters which take more than 1 char to represent>";
>> text = toUTF8(toUTF32(text).reverse);
>> and the result will be a correctly reversed UTF8 string. Or am I missing something?
>> Regan Heath
> dstring is an implementation of a string struct by Chris Miller which takes care of slicing utf8 sequences and is compatible with char[], wchar[] and dchar[]. I mentioned it because I think it's better when foreach knows nothing about slicing utf8 sequences (as opposed to the way it is implemented currently). That should be the responsibility of a string class (like e.g. dstring) with a proper opApply method. Because my previous e-mail was in the context of dstring, I didn't understand what you meant... 'reverse' and 'sort' could also be implemented in such a class in a way which copes properly with utf8 sequences...

Ahh, thanks, that clears up the confusion I had.

Yes, a string class/struct could definitely handle the codepoint issue. It would also be able to handle it better than the method I suggested, which is a brute force method based on an assumption which may prove to be false (I suspect toUTF32 converts UTF8 and 16 to non-compound UTF32 in all cases. But I could be wrong)

But to respond to your original point (which I didn't address earlier, sorry) I have no problem with the foreach behaviour:

char[] text = "<compound characters>";
foreach(dchar c; text) { .. }

because, I suspect, the code which handles this is in std.utf (toUTF32) already. You seem to want to move the behaviour to a string class, but why can't it exist in both places?

I guess the problem you might have with it is that it effectively says to someone implementing a D compiler: You need to handle conversions from/to UTF8, 16 and 32 and (assuming I am correct about toUTF32) you need to convert UTF8 and 16 to non-compound UTF32. Which might make it harder for someone to implement a D compiler. I don't know.

Regan Heath
May 29 2007
Walter Bright wrote:
> Under the new const/invariant/final regime, what are strings going to be? Experience with other languages suggests that strings should be immutable. To express an array of const chars, one would write:
>
> const(char)[]
>
> but while that's clear, it doesn't just flow off the keyboard. Strings are so common this needs an alias, so:
>
> alias const(char)[] cstring;
>
> Why cstring? Because 'string' appears as both a module name and a common variable name. cstring also implies wstring for wchar strings, and dstring for dchars.
>
> String literals, on the other hand, will be invariant (which means they can be stuffed into read-only memory). So, typeof("abc") will be:
>
> invariant(char)[3]
>
> Invariants can be implicitly cast to const.
>
> In my playing around with source code, using cstring's seems to work out rather nicely.
>
> So, why not alias cstring to invariant(char)[] ? That way strings really would be immutable. The reason is that mutables cannot be implicitly cast to invariant, meaning that there'd be a lot of casts in the code. Casts are a sledgehammer, and a coding style that requires too many casts is a bad coding style.

Perhaps I should just wait for the implementation, but I'm interested in knowing what your solution to .dup is. Given

auto foo = "hello".dup;

what is the type of foo?

How do you support both of

invariant char[] foo = "hello".dup;
char[] bar = "hello".dup;

-- Reiner
May 27 2007
Reiner Pope wrote:
> Perhaps I should just wait for the implementation, but I'm interested in knowing what your solution to .dup is. Given
> auto foo = "hello".dup;
> what is the type of foo?

Most likely a plain (mutable) char[].

> How do you support both of
> invariant char[] foo = "hello".dup;
> char[] bar = "hello".dup;

Likely the first will be an error as written, requiring a cast(invariant) to be inserted. Of course, since it doesn't make much sense to .dup in the example above ("hello" is already invariant, and copying an invariant array but not modifying the copy isn't typically useful) that shouldn't be much of a problem in this case.

For other cases though, I could see how a "unique" (or similar) type constructor that would allow implicit conversion to both mutable and invariant (and const) types could be useful. For instance, if the strings in your example were replaced by mutable arrays, a "unique char[]" return value of .dup could then be assigned to mutable/const/invariant references without needing casts.
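For comparison, here is a short sketch of how these questions play out in D2 as it later shipped. This is an assumption relative to the 2007 design under discussion: the `invariant` keyword became `immutable`, `string` is an alias for `immutable(char)[]`, and arrays have both `.dup` (mutable copy) and `.idup` (immutable copy). `countSpaces` is a hypothetical function used only to show a `const(char)[]` parameter.

```d
import std.stdio : writeln;

/// Accepts mutable, const and immutable character arrays alike,
/// because all three convert implicitly to const(char)[].
size_t countSpaces(const(char)[] s)
{
    size_t n = 0;
    foreach (char c; s)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    auto foo = "hello".dup;                    // mutable copy
    static assert(is(typeof(foo) == char[]));

    auto bar = "hello".idup;                   // immutable copy
    static assert(is(typeof(bar) == string));  // string is immutable(char)[]

    foo[0] = 'H';                              // fine: foo is mutable
    // bar[0] = 'H';                           // would not compile: bar is immutable

    // Both a literal and a mutable buffer can be passed without a cast.
    writeln(countSpaces("a b c"), " ", countSpaces(foo));
}
```

Splitting the copy operation into two properties sidesteps the question of what a single `.dup` should return, at the cost of the caller having to pick one.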
May 28 2007
Frits van Bommel wrote:
> Reiner Pope wrote:
>> Perhaps I should just wait for the implementation, but I'm interested in knowing what your solution to .dup is. Given
>> auto foo = "hello".dup;
>> what is the type of foo?
> Most likely a plain (mutable) char[].
>> How do you support both of
>> invariant char[] foo = "hello".dup;
>> char[] bar = "hello".dup;
> Likely the first will be an error as written, requiring a cast(invariant) to be inserted. Of course, since it doesn't make much sense to .dup in the example above ("hello" is already invariant, and copying an invariant array but not modifying the copy isn't typically useful) that shouldn't be much of a problem in this case.
> For other cases though, I could see how a "unique" (or similar) type constructor that would allow implicit conversion to both mutable and invariant (and const) types could be useful. For instance, if the strings in your example were replaced by mutable arrays, a "unique char[]" return value of .dup could then be assigned to mutable/const/invariant references without needing casts.

Funny, that's just what I thought of (including the name unique).

When I first thought about it, I thought that such a construct would be very useful and very powerful, but I can't actually think of any use cases except with .dup and other constructor-type functions. (Although supporting them should alone be enough motivation).
May 29 2007
Reiner Pope wrote:
> Frits van Bommel wrote:
>> For other cases though, I could see how a "unique" (or similar) type constructor that would allow implicit conversion to both mutable and invariant (and const) types could be useful. For instance, if the strings in your example were replaced by mutable arrays, a "unique char[]" return value of .dup could then be assigned to mutable/const/invariant references without needing casts.
> Funny, that's just what I thought of (including the name unique).

I'm pretty sure this has been suggested in these newsgroups in the past, including using "unique" as the keyword.

> When I first thought about it, I thought that such a construct would be very useful and very powerful, but I can't actually think of any use cases except with .dup and other constructor-type functions. (Although supporting them should alone be enough motivation).

Some use cases I can think of:
* Obviously, builtin array property .dup, as you mentioned.
* std.utf.toUTF* (except the non-converting ones such as char[] -> char[])
* The result of certain operator overloads (arithmetic in a bignum class, opCat in a string class, the result of the builtin ~ operator for arrays)
* Lots of stuff in std.string: join, split, maketrans, all the toString overloads, format, succ, abbrev. (AFAIK all of these are guaranteed to return a unique array)
* toString overloads for classes that return the result of any of the above[1] (especially builtin ~ and std.string.format are often useful in toString, in my experience).

As you can see, there are plenty of cases where newly allocated objects or arrays are returned.

[1]: This one would require the ability to add "unique" in an overridden method, since it's a bad idea to require it of all classes. This could be considered to fall under the category of covariant return values.
May 29 2007
Frits van Bommel wrote:
> For other cases though, I could see how a "unique" (or similar) type constructor that would allow implicit conversion to both mutable and invariant (and const) types could be useful. For instance, if the strings in your example were replaced by mutable arrays, a "unique char[]" return value of .dup could then be assigned to mutable/const/invariant references without needing casts.

We really tried to figure out a way to make "unique" work. It just doesn't offer anything useful over a cast(invariant).

The way to create an invariant out of data is to use cast(invariant). As with all casts, one has to trust the programmer to use it appropriately. After it is cast, the type system will handle enforcement. You'll be able to cast away invariant, too, but you're on your own if you do so.
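A minimal sketch of the "build mutably, then cast once" pattern described above, using present-day D2, where the keyword is `immutable` rather than `invariant` and Phobos wraps the cast in `std.exception.assumeUnique`. Both points are assumptions relative to the 2007 design. `makeGreeting` is a hypothetical function name.

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

/// Builds a string in a private mutable buffer, then hands it out as immutable.
string makeGreeting(const(char)[] name)
{
    char[] buf = "hello, ".dup;  // mutable working copy
    buf ~= name;                 // mutate freely while nobody else can see it

    // The programmer, not the compiler, vouches that no other mutable
    // reference to buf exists. assumeUnique performs the cast to
    // immutable and also sets buf to null.
    return assumeUnique(buf);
}

void main()
{
    string s = makeGreeting("world");
    assert(s == "hello, world");
    writeln(s);
}
```

As with the cast described in the post, correctness rests on the caller's promise. If another mutable reference to the buffer survived, the type system would not notice.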
Jun 02 2007