
digitalmars.D - Higher level built-in strings

reply bearophile <bearophileHUGS lycos.com> writes:
This odd post comes from reading the nice section on strings in chapter 4 of
TDPL. In the last few years I have seen changes in how D strings are conceived
and managed, changes that make them less and less like arrays (random-access
sequences of mutable code units) and more and more like what they are at a high
level (immutable bidirectional sequences of code points).

So a final jump is to make the string types something different from normal
arrays. This frees them to behave better as high-level strings. Other people
have probably had similar ideas, so if you think this post is boring or
useless, please ignore it.

So strings can have some differences compared to arrays:

1) length returns the number of code points; it's immutable and cached inside
the string, so it's an O(1) operation (it returns what std.utf.count() returns
today).

2) Assigning a string to another one is allowed:
string s1 = s2;
But changing the length manually is not allowed:
s1.length += 2; // error
This makes them closer to immutable, more similar to Python strings.

3) Another immutable string-specific attribute can be added that returns the
number of code units in the string, for example .codeunits or .nunits or
something similar.

4) s1[i] is not O(1); it's generally slower, and returns the i-th code point
(there are ways to speed this operation up to O(log n) with some internal
indexes). (Code points in dstrings can still be accessed in O(1).)

5) foreach(c; s1) yields code points (dchar) regardless of whether s1 is a
string, wstring, or dstring.
But you can use foreach(char c; s1) if s1 is a string and you are sure s1 uses
only 7-bit chars. In such cases you can also use an immutable(ubyte)[].
Python 3 does something similar: it has a str type that's always UTF-16 (or
UTF-32) and a bytes type that is similar to a ubyte[]. I think D can do
something similar. std.string functions can be made to work on ubyte[] and
immutable(ubyte)[] too.
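Points (1) and (5) can already be approximated in today's D, which makes the
proposal concrete: foreach decodes when you ask for dchar, and std.utf.count
gives the code-point count that point (1) would cache. A small sketch (the
counts assume UTF-8 source, where 'é' takes two code units):

```d
import std.utf : count;

// Count code points by letting foreach decode each element to dchar.
size_t codePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

// Count raw code units by iterating plain chars (no decoding).
size_t codeUnits(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

void main()
{
    string s = "héllo"; // 'é' encodes as two UTF-8 code units
    assert(codePoints(s) == 5);
    assert(codeUnits(s) == 6);         // same as s.length today
    assert(count(s) == codePoints(s)); // what point (1) would cache
}
```

Under the proposal, s.length would return what count(s) returns here, with the
value stored instead of recomputed.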


Some more things, I am not sure about them:

6) Strings can contain their hash value. This field is initialized with an
empty value (like -1) and computed lazily on demand (as done in Python). This
can make them faster when they are often put inside associative arrays and
sets, avoiding recomputing their hash value again and again. So strings are not
fully immutable, because this value gets initialized. But it's a pure value,
determined only by the immutable contents of the string, so I don't think this
can cause big problems in multi-threaded programs: if two threads update it,
they find the same result.

7) Validation (std.utf.validate(), or a cast) can be made necessary to create a
string. This means the type string/dstring/wstring implies it's validated :-)

8) If strings are immutable, then appending can always create a new string. So
the extra information at the end of the memory block (used for appending to
dynamic arrays) is not necessary.

9)
- Today memory, even RAM, is cheap, but moving data from RAM to the CPU caches
is not so fast, so a program that works mostly in cache is faster.
- UTF-8 and UTF-16 make strings bidirectional ranges.
- UTF encodings are a form of data compression, but not a very efficient one.
So a smarter compression scheme can compress strings more in memory, and the
decrease in cache misses can compensate for the increased CPU work to
decompress them. (Keeping strings compressed can turn them from bidirectional
ranges into forward ranges, but this is not so bad.)
There is a compressor whose decompression speed is only about 3 times slower
than memcpy():
http://www.oberhumer.com/opensource/lzo/
LZO can be used transparently to compress strings in RAM when they become long
enough.
Hash computation and equality tests among compressed strings are faster.
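Point (6) could look roughly like the following wrapper. This is a hypothetical
sketch, not how D's built-in strings work; the FNV-1a hash is just a stand-in
for whatever pure hash function the runtime would actually use:

```d
// Hypothetical wrapper sketching point (6): the hash of the immutable
// contents is computed once on demand, then reused.
struct CachedString
{
    immutable(char)[] data;
    private size_t cachedHash;
    private bool hashed = false;

    size_t toHash()
    {
        if (!hashed)
        {
            // FNV-1a over the code units; any pure hash works here.
            size_t h = 2166136261u;
            foreach (char c; data)
            {
                h ^= c;
                h *= 16777619u;
            }
            cachedHash = h;
            hashed = true;
        }
        return cachedHash; // later calls just return the cache
    }
}

void main()
{
    auto s = CachedString("hello");
    assert(s.toHash() == s.toHash()); // cached after the first call
    assert(CachedString("a").toHash() != CachedString("b").toHash());
}
```

Because the cached value is a pure function of the immutable contents, a race
between two threads writing it stores the same result, as the post argues.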


Turning strings into something different from arrays looks like a loss, but in
practice they already are not arrays: thinking about them as normal arrays is
not so useful and it's UTF-unsafe, and using [] to read the code units doesn't
seem so useful either. Code that manages strings/wstrings as normal arrays is
not correct in general.

Bye,
bearophile
Jul 18 2010
next sibling parent reply %u <ae au.com> writes:
 5) foreach(c; s1) yields a code point dchars regardless if s1 is string,
 dstring, wstring. But you can use foreach(char c; s1) if s1 is a string and
 you are sure s1 uses only 7 bit chars. Python3 does something similar, it
 has a str type that's always UTF16 (or UTF32) and a bytes type that is
 similar to a ubyte[]. I think D can do something similar. std.string
 functions can be made to work on ubyte[] and immutable(ubyte)[] too.

Actually I think D should deprecate char and wchar in user code, except inside extern declarations for compatibility with C and Windows functions. When you are working with individual characters you almost always want either a dchar or a byte.
Jul 19 2010
parent bearophile <bearophileHUGS lycos.com> writes:
%u:
 When you are working with individual characters you almost always want either a
 dchar or a byte.

A dchar and a ubyte are better. This is almost what Python 3 does, and I think it can be good. Maybe other people will give their opinions about my original post.

Bye,
bearophile
Jul 19 2010
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 This odd post comes from reading the nice part about strings of chapter 4 of
 TDPL. In the last few years I have seen changes in how D strings are meant
 and managed, changes that make them less and less like arrays (random-access
 sequences of mutable code units) and more and more what they are at high
 level (immutable bidirectional sequences of code points).

Strings in D are deliberately meant to be arrays, not special things. Other languages make them special because they have insufficiently powerful arrays.

As for indexing by code point, I also believe this is a mistake. It is proposed often, but overlooks:

1. most string operations, such as copying and searching, even regular expressions, work just fine using regular indices.

2. doing the operations in (1) using code points and having to continually decode the strings would result in disastrously slow code.

3. the user can always layer a code point interface over the strings, but going the other way is not so practical.
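Point 3 above is indeed only a few lines with today's std.utf: decoding layers
a code-point view over the plain array. A sketch:

```d
import std.utf : decode;

// Layer a code-point view over the plain array, as point 3 suggests:
// collect the dchars of a UTF-8 string by decoding in place.
dchar[] decoded(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances i past one code point
    return result;
}

void main()
{
    auto cps = decoded("héllo"); // 6 code units, 5 code points
    assert(cps.length == 5);
    assert(cps[1] == 'é');
}
```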
Jul 19 2010
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound2 digitalmars.com)'s article
 bearophile wrote:
 This odd post comes from reading the nice part about strings of chapter 4 of
 TDPL. In the last few years I have seen changes in how D strings are meant
 and managed, changes that make them less and less like arrays (random-access
 sequences of mutable code units) and more and more what they are at high
 level (immutable bidirectional sequences of code points).

 Strings in D are deliberately meant to be arrays, not special things. Other
 languages make them special because they have insufficiently powerful arrays.

 As for indexing by code point, I also believe this is a mistake. It is
 proposed often, but overlooks:

 1. most string operations, such as copying and searching, even regular
 expressions, work just fine using regular indices.

 2. doing the operations in (1) using code points and having to continually
 decode the strings would result in disastrously slow code.

 3. the user can always layer a code point interface over the strings, but
 going the other way is not so practical.

4. Sometimes one can make valid assumptions about the contents of a string. For example, in an internal utility app that will never be internationalized you may get away with assuming a character is an ASCII byte. If you know your input will be in the Basic Multilingual Plane (for example if working with pre-sanitized input), you can use wstrings and always assume a character is 2 bytes.

5. For dchar strings, a code unit equals a code point. Should the interface for dchar strings be completely different than that for char and wchar strings?
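Point 4 in code: for input known to stay in the BMP, each wchar is a whole code
point, so plain [] indexing is already code-point indexing. A sketch (the BMP
assumption is the caller's responsibility; allBmp is a hypothetical helper for
checking it):

```d
// True when every element of the slice is a whole BMP code point
// (no surrogate halves), so [] indexing is code-point indexing.
bool allBmp(wstring w)
{
    foreach (wchar c; w)
        if (c >= 0xD800 && c <= 0xDFFF) // surrogate code unit
            return false;
    return true;
}

void main()
{
    wstring w = "héllo"w; // all characters are in the BMP
    assert(allBmp(w));
    assert(w.length == 5); // one UTF-16 code unit per character
    assert(w[1] == 'é');   // [] lands directly on a code point
}
```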
Jul 19 2010
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 19 Jul 2010 16:04:21 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 bearophile wrote:
 This odd post comes from reading the nice part about strings of chapter  
 4 of
 TDPL. In the last few years I have seen changes in how D strings are  
 meant
 and managed, changes that make them less and less like arrays  
 (random-access
 sequences of mutable code units) and more and more what they are at high
 level (immutable bidirectional sequences of code points).

 Strings in D are deliberately meant to be arrays, not special things.
 Other languages make them special because they have insufficiently
 powerful arrays.

Andrei is changing that. Already, isRandomAccessRange!(string) == false. I kind of don't like this direction, even though it's clever. What you end up with is Phobos refusing to believe that a string or char[] is an array, but the compiler saying it is.

What I'd prefer is something where the compiler types string literals as string, a type defined by Phobos which contains as its first member an immutable(char)[] (where the compiler puts the literal). Then we can properly limit the other operations.
 As for indexing by code point, I also believe this is a mistake. It is  
 proposed often, but overlooks:

 1. most string operations, such as copying and searching, even regular  
 expressions, work just fine using regular indices.

 2. doing the operations in (1) using code points and having to  
 continually decode the strings would result in disastrously slow code.

 3. the user can always layer a code point interface over the strings,  
 but going the other way is not so practical.

I agree here. Anything that uses indexing to perform a linear operation is bound for the scrap heap. But what about this:

foreach(c; str)

which types c as char (or immutable char), not dchar. These are the subtle problems that we have with the dichotomy of Phobos refusing to believe a string is an array, but the compiler believing it is. I think the default inference for this should be dchar, and Phobos can make that true as long as it controls the string type.

There are other points to consider:

1) a string *could be* indexed by character and return the code point being pointed to.

2) even slicing could be valid as long as the slice operator jumps back to the start of the dchar being encoded. This might make for very tricky code, but then again, such is the cost of trying to slice something like a UTF-8 string :)

But having the compiler force the string type to be an array, when it clearly isn't, doesn't help. Give the runtime the choice, like it's done for AAs, and I think we may have something that is workable, and doesn't suck performance-wise.

-Steve
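Point (2) above, slicing that jumps back to the start of the encoded dchar, can
be sketched with a small helper that snaps an index to a code-point boundary (a
hypothetical helper, not an actual Phobos function):

```d
// Snap an index back to the start of the code point it falls inside,
// so a slice taken there stays UTF-safe.
size_t snapToStart(string s, size_t i)
{
    // UTF-8 continuation bytes look like 10xxxxxx; back up past them.
    while (i > 0 && i < s.length && (s[i] & 0xC0) == 0x80)
        --i;
    return i;
}

void main()
{
    string s = "héllo";             // 'é' occupies indices 1 and 2
    assert(snapToStart(s, 2) == 1); // index 2 is mid-'é': snap back
    assert(snapToStart(s, 3) == 3); // 'l' already starts a code point
}
```

A code-point-aware opSlice could call something like this on both bounds
before slicing the underlying array.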
Jul 19 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:
 On Mon, 19 Jul 2010 16:04:21 -0400, Walter Bright 
 <newshound2 digitalmars.com> wrote:
 Strings in D are deliberately meant to be arrays, not special things. 
 Other languages make them special because they have insufficiently 
 powerful arrays.

 Andrei is changing that. Already, isRandomAccessRange!(string) == false. I
 kind of don't like this direction, even though it's clever.

That decision may be a mistake.
 I agree here.  Anything that uses indexing to perform a linear operation 
 is bound for the scrap heap.  But what about this:
 
 foreach(c; str)
 
 which types c as char (or immutable char), not dchar.

Probably too late to change that one.
Jul 19 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 07/19/2010 11:31 PM, Walter Bright wrote:
 Steven Schveighoffer wrote:
 On Mon, 19 Jul 2010 16:04:21 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:
 Strings in D are deliberately meant to be arrays, not special things.
 Other languages make them special because they have insufficiently
 powerful arrays.

 Andrei is changing that. Already, isRandomAccessRange!(string) == false. I
 kind of don't like this direction, even though it's clever.

That decision may be a mistake.

I think otherwise. In fact I confess I am extremely excited. The current state of affairs describes built-in strings very accurately: they are formally bidirectional ranges, yet they offer random access to code units that you can freely use if you so wish. It's modeling reality very accurately.

As far as I know, all algorithms in std.algorithm that work well without decoding are special-cased for strings to work fast and yield correct results.

Andrei
Jul 19 2010
parent Walter Bright <newshound2 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 I think otherwise. In fact I confess I am extremely excited. The current 
 state of affairs described built-in strings very accurately: they are 
 formally bidirectional ranges, yet they offer random access for code 
 units that you can freely use if you so wish. It's modeling reality very 
 accurately.
 
 As far as I know, all algorithms in std.algorithm that work well without 
 decoding are special-cased for strings to work fast and yield correct 
 results.

That's good to hear.
Jul 20 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

 As it turns out, indexing by byte is *far* more common than indexing by
 code point; in fact, I've never ever needed to index by code point.

OK.
 Probably too late to change that one.

There is very little D2 code around, so small changes like this one are still possible. As an alternative see also this enhancement request, partially coming from a post of mine in D.learn:
http://d.puremagic.com/issues/show_bug.cgi?id=4483
(If you refuse this fallback idea too, then it's better to close this bug report. Keeping too much dead wood in Bugzilla will eventually cause small troubles.)

In my original post my points 2, 6, 8 and 9 are still valid. To be nice they need strings to be a little different from standard arrays.

Thanks for the comments,
bearophile
Jul 20 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 Probably too late to change that one.

 There is very little D2 code around, so small changes like this one are
 still possible.

It's a D1 feature, and has been there since nearly the beginning.
Jul 20 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Steven Schveighoffer wrote:
 On Tue, 20 Jul 2010 14:29:43 -0400, Walter Bright 
 <newshound2 digitalmars.com> wrote:
 It's a D1 feature, and has been there since nearly the beginning.

Since when did we care about D1 compatibility?

We care about incompatibilities that silently break code.
 const, inout, array appending, final, typeof(string), TLS globals just 
 to name a few...
 
 If you expect D1 code to compile fine and run on D2, you are deluding 
 yourself.

No argument there, but we do try to avoid silent breakage.
 The worst that happens is that code starts using dchar instead of char, 
 and either a compiler error occurs and it's fixed simply by doing:
 
 foreach(char c; str)
 
 or it compiles fine because the type is never explicitly stated, and 
 what's the big deal there?  The code just becomes more utf compatible 
 for free :)

I don't think it's necessarily true that the user really wanted the decoded character rather than the byte, or even that wanting the decoded character is more likely to be desired.
Jul 20 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Steven Schveighoffer wrote:
 On Tue, 20 Jul 2010 14:29:43 -0400, Walter Bright 
 <newshound2 digitalmars.com> wrote:
 It's a D1 feature, and has been there since nearly the beginning.

Since when did we care about D1 compatibility?

We care about incompatibilities that silently break code.
 const, inout, array appending, final, typeof(string), TLS globals just 
 to name a few...

 If you expect D1 code to compile fine and run on D2, you are deluding 
 yourself.

No argument there, but we do try to avoid silent breakage.
 The worst that happens is that code starts using dchar instead of 
 char, and either a compiler error occurs and it's fixed simply by doing:

 foreach(char c; str)

 or it compiles fine because the type is never explicitly stated, and 
 what's the big deal there?  The code just becomes more utf compatible 
 for free :)

I don't think it's necessarily true that the user really wanted the decoded character rather than the byte, or even that wanting the decoded character is more likely to be desired.

Unfortunately it's inconsistent: foreach for ranges operates in terms of front, empty, and popFront - just not for strings. I avoid foreach in std.algorithm and in generic code.

For my money I'd be okay if foreach (c; str) wouldn't even compile - the user would be asked to specify the type. But if an implicit type is allowed, I'm afraid I believe that dchar would be the best choice.

Andrei
Jul 20 2010
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-07-20 00:31:34 -0400, Walter Bright <newshound2 digitalmars.com> said:

 Steven Schveighoffer wrote:
 I agree here.  Anything that uses indexing to perform a linear 
 operation is bound for the scrap heap.  But what about this:
 
 foreach(c; str)
 
 which types c as char (or immutable char), not dchar.

Probably too late to change that one.

Sad. That's one of the first things I tried when I first learned D, and the result did surprise me: I expected foreach to iterate on characters (code points), not code units. Then I saw I could add 'dchar' to get that behaviour and found that to be not too bad.

The big problem here is that ranges and foreach behave differently. A range that doesn't work with foreach isn't a good range. That's even worse when that range is at the core of the language, because it'll look bad on both ranges and the language.

As it stands now, when doing generic programming we'd have to write foreach like this so it works the same with foreach as it does with the range APIs, just in case the range is a string:

	foreach (ElementType!(typeof(range)) c; range) {}

Something needs to change so the above always works the same as not specifying the type! Either foreach should be adapted or ranges should let go of the idea of iterating on code points for the default string type.

As for the "too late to change" stance, I'm not sure. It'll certainly be too late to change in a year, but right now D2 is still pretty new. What makes you say it's too late?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
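The mismatch described above, made concrete (this reflects the Phobos
behaviour being discussed: ElementType!string is dchar, but an untyped foreach
yields code units):

```d
import std.range : ElementType;

// Iterations an untyped foreach performs: code units for a string.
size_t defaultIterations(string s)
{
    size_t n = 0;
    foreach (c; s)
        ++n;
    return n;
}

// Iterations when the element type comes from the range API: code points.
size_t rangeTypedIterations(string s)
{
    size_t n = 0;
    foreach (ElementType!string c; s)
        ++n;
    return n;
}

void main()
{
    static assert(is(ElementType!string == dchar)); // the range API's view

    string s = "é"; // one code point, two UTF-8 code units
    assert(defaultIterations(s) == 2);
    assert(rangeTypedIterations(s) == 1);
}
```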
Jul 20 2010
next sibling parent DCoder <anon ym.ous> writes:
== Quote from Michel Fortin (michel.fortin michelf.com)'s article
 On 2010-07-20 00:31:34 -0400, Walter Bright

 Steven Schveighoffer wrote:
 I agree here.  Anything that uses indexing to perform a linear
 operation is bound for the scrap heap.  But what about this:

 foreach(c; str)

 which types c as char (or immutable char), not dchar.

Probably too late to change that one.


 Sad. That's one of the first things I tried when I first learned D and
 the result did surprise me. I expected foreach to iterate on characters
 (code points), not code units. Then I saw I could add 'dchar' to get
 that behaviour and found that to be not too bad.

 The big problem here is that ranges and foreach behave differently. A
 range that doesn't work with foreach isn't a good range. That's even
 worse when that range is at the core of the language because it'll look
 bad on both ranges and the language.

 As it stands now, when doing generic programming we'd have to write
 foreach like this so it works the same with foreach as it does with the
 range APIs, just in case the range is a string:
 	foreach (ElementType!(typeof(range)) c; range) {}
 Something needs to change so the above always works the same as not
 specifying the type! Either foreach should be adapted or ranges should
 let go of the idea of iterating on code points for the default string
 type.

I'm wondering how bad it would be to introduce a schar (short char, 1 byte) type and then let char simply map to a "default" char type: dchar, wchar, or whatever we tell the compiler. By default, char would map to dchar.

alias char dchar;

Coupling this with an implicit cast of schar/char (from imported C code) to dchar, I think this might even help fix many such situations.
Jul 20 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Tuesday, July 20, 2010 05:30:51 DCoder wrote:
 I'm wondering how bad would it be introduce a schar (short char, 1
 byte) type and then let char simply map to a "default" char type:
 dchar, wchar, or whatever we tell the compiler. By default, char
 would map to dchar.
 
 alias char dchar;
 
 Coupling this with implicit cast of schar/char (from imported C
 code) to dchar, I think this might even help fix many such
 situations.

That doesn't really gain us anything. It would likely just make it so that char would mean dchar and be used by default. And honestly, using dchar is not really the solution. If you want to do that, you can. However, using dchar in the general case is not necessarily a good idea, since it wastes so much space.

On top of all that, TDPL was _very_ clear on the differences between char, wchar, and dchar, and TDPL is supposed to stay accurate. So, any changes which would contradict TDPL need a _very_ good reason for being made, or they won't be.

Overall, strings in D work great. The only issue really is making it so that you properly deal with the cases where you need to treat them as code points vs. when you can treat them as code units. Any programmer who wants to entirely avoid the problem can just use dchar and dstring. For the rest, you need to understand how string and wstring work and just handle them appropriately. There may be a few places where things should be smoothed out, but overall, I really do think that they work well as they are.

- Jonathan M Davis
Jul 20 2010
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Michel Fortin wrote:
 As for the "too late to change" stance, I'm not sure. It'll certainly be 
 too late to change in a year, but right now D2 is still pretty new. What 
 makes you say it's too late?

As I said to bearophile, it's a D1 feature.
Jul 20 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 1. most string operations, such as copying and searching, even regular 
 expressions, work just fine using regular indices.
 
 2. doing the operations in (1) using code points and having to continually 
 decode the strings would result in disastrously slow code.

In my original post I had forgotten another difference from arrays:

5b) a method like ".unit()" that allows indexing of code units. So "foo".unit(1) is always O(1). Lower-level code can use this method as [] is used for arrays.

Copying is done on the bytes themselves, with a memcpy; no decoding is necessary. If point (9) (automatic LZO compression) is used, then copying can be 2-3 times faster for long strings (because there is less data and you don't need to uncompress it to copy). (If such compression is added, then strings may need a third accessor method, for the true bytes.)
 3. the user can always layer a code point interface over the strings, but going
 the other way is not so practical.

This is true. But it makes string usage unnecessarily low-level and hard. A better design in a smart system language such as D is to give strings a default high-level "interface" that sees strings as what they are at a high level, and to add a second, lower-level interface for when you need faster low-level fiddling (so they have [] that returns code points and unit() that returns code units).

Bye,
bearophile
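The dual interface proposed here could be sketched as a wrapper type. This is
hypothetical, not an actual or proposed Phobos type: opIndex is the slower
code-point accessor, unit() the O(1) code-unit one:

```d
import std.utf : decode;

// Hypothetical high-level string: [] yields code points, unit() code units.
struct HLString
{
    immutable(char)[] data;

    // i-th code point, found by linear decoding (O(n) without an index).
    dchar opIndex(size_t i)
    {
        size_t pos = 0;
        dchar c;
        foreach (_; 0 .. i + 1)
            c = decode(data, pos); // decode advances pos each time
        return c;
    }

    // i-th code unit: plain O(1) array access, for lower-level code.
    char unit(size_t i) { return data[i]; }
}

void main()
{
    auto s = HLString("héllo");
    assert(s[1] == 'é');       // second code point
    assert(s.unit(2) == 0xA9); // second byte of the encoded 'é'
}
```

The O(log n) indexing mentioned in point (4) would replace the linear scan in
opIndex with a lookup in a side table of code-point offsets.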
Jul 19 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 1. most string operations, such as copying and searching, even regular 
 expressions, work just fine using regular indices.
 
 2. doing the operations in (1) using code points and having to continually
  decode the strings would result in disastrously slow code.

 In my original post I have forgotten another difference over arrays: 5b) a
 method like ".unit()" that allows to index code units. So "foo".unit(1) is
 always O(1). Lower level code can use this method as [] is used for arrays.

This is backwards. The [i] should behave as expected for arrays. As it turns out, indexing by byte is *far* more common than indexing by code point; in fact, I've never ever needed to index by code point. (Though it is sometimes necessary to step through by code point, that's different from indexing by code point.)
 3. the user can always layer a code point interface over the strings, but
 going the other way is not so practical.

This is true. But it makes the string usage unnecessarily low-level and hard...

I don't believe that manipulating strings in D is hard, even if you do have to work with multibyte characters. You do have to be aware they are multibyte, but I think that just comes with being a programmer.

 A better design in a smart system language as D is to give strings a
 default high level "interface" that sees strings as what they are at high
 level, and add a second lower level interface when you need faster
 lower-level fiddling (so they have [] that returns code points and unit()
 that returns code units).

I have some moderate experience with using UTF. First there's the D javascript engine, which is fully utf'd. The D string design fits in with it perfectly. Then there are chunks of C++ ASCII-only code I've translated to D, which then worked with UTF-8 without further modification.

Based on that, I believe the D string design hits the sweet spot between efficiency and utility.
Jul 19 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 07/19/2010 11:29 PM, Walter Bright wrote:
 bearophile wrote:
 Walter Bright:
 1. most string operations, such as copying and searching, even
 regular expressions, work just fine using regular indices.

 2. doing the operations in (1) using code points and having to
 continually
 decode the strings would result in disastrously slow code.

 In my original post I have forgotten another difference over arrays: 5b) a
 method like ".unit()" that allows to index code units. So "foo".unit(1) is
 always O(1). Lower level code can use this method as [] is used for arrays.

 This is backwards. The [i] should behave as expected for arrays. As it
 turns out, indexing by byte is *far* more common than indexing by code
 point; in fact, I've never ever needed to index by code point. (Though it
 is sometimes necessary to step through by code point, that's different
 from indexing by code point.)

Exactly. And that's what the bidirectional range interface is doing for strings.
 3. the user can always layer a code point interface over the strings,
 but
 going the other way is not so practical.

This is true. But it makes the string usage unnecessarily low-level and hard...

 I don't believe that manipulating strings in D is hard, even if you do
 have to work with multibyte characters. You do have to be aware they are
 multibyte, but I think that just comes with being a programmer.

 A better design in a smart system language as D is to give strings a
 default high level "interface" that sees strings as what they are at high
 level, and add a second lower level interface when you need faster
 lower-level fiddling (so they have [] that returns code points and unit()
 that returns code units).

 I have some moderate experience with using utf. First there's the D
 javascript engine, which is fully utf'd. The D string design fits in with
 it perfectly. Then there are chunks of C++ ascii-only code I've translated
 to D, and it then worked with utf-8 without further modification. Based on
 that, I believe the D string design hits the sweet spot between efficiency
 and utility.

I agree. In fact there is no language I know of that can compete with D at UTF string handling.

Andrei
Jul 19 2010
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Walter Bright Wrote:

 bearophile wrote:
 Walter Bright:
 1. most string operations, such as copying and searching, even regular 
 expressions, work just fine using regular indices.
 
 2. doing the operations in (1) using code points and having to continually
  decode the strings would result in disastrously slow code.

 In my original post I have forgotten another difference over arrays: 5b) a
 method like ".unit()" that allows to index code units. So "foo".unit(1) is
 always O(1). Lower level code can use this method as [] is used for arrays.

 This is backwards. The [i] should behave as expected for arrays. As it
 turns out, indexing by byte is *far* more common than indexing by code
 point; in fact, I've never ever needed to index by code point. (Though it
 is sometimes necessary to step through by code point, that's different
 from indexing by code point.)

I've had the same experience. The proposed changes would make string useless to me, even for Unicode work. I'd end up using ubyte[] instead.
Jul 20 2010
parent Walter Bright <newshound2 digitalmars.com> writes:
Sean Kelly wrote:
 I've had the same experience.  The proposed changes would make string useless
 to me, even for Unicode work.  I'd end up using ubyte[] instead.

At this point, the experience with D strings is that D gets it right. To change it would require someone who has spent a *lot* of hours programming in utf making a very compelling argument.
Jul 20 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 20 Jul 2010 14:29:43 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 bearophile wrote:
 Probably too late to change that one.

 There is very little D2 code around, so little changes as this one are
 possible still.

It's a D1 feature, and has been there since nearly the beginning.

Since when did we care about D1 compatibility? const, inout, array appending, final, typeof(string), TLS globals, just to name a few...

If you expect D1 code to compile fine and run on D2, you are deluding yourself.

The worst that happens is that code starts using dchar instead of char, and either a compiler error occurs and it's fixed simply by doing:

foreach(char c; str)

or it compiles fine because the type is never explicitly stated, and what's the big deal there? The code just becomes more utf compatible for free :)

-Steve
Jul 20 2010