digitalmars.D.announce - New string implementation: dstring 1.0

reply "Chris Miller" <chris dprogramming.com> writes:
Check out the FAQ at http://www.dprogramming.com/dstring.php and give it a  
spin.
Documentation is online at  
http://www.dprogramming.com/docs/dstring/dstring.html

Let me know what you think!
Oct 11 2006
next sibling parent reply Chad J <""gamerChad\" spamIsBad gmail.com"> writes:
Chris Miller wrote:
 Check out the FAQ at http://www.dprogramming.com/dstring.php and give it 
 a  spin.
 Documentation is online at  
 http://www.dprogramming.com/docs/dstring/dstring.html
 
 Let me know what you think!

Looks cool! I'm too strapped for time to try it out right now, but I will when I get the chance. I do have a couple of questions and a comment based on the documentation:

 - Is toString() the same as toUTF8()? If so, I'd like the documentation to say that they are the same.
 - Are there plans to extend this to act as a string manipulation library as well as a string type, adding functions from Phobos like toUpper(), toLower(), capitalize(), split(), etc.? That would be cool.
 - It seems like it would be handy to have functions for converting to C-style null-terminated strings, something like char* toUTF8C() and wchar* toUTF16C().

This looks like a very useful string type. I just hope that if people like it, it becomes part of the standard library or some such, so that newbies aren't caught out by indexing/slicing in the current string-as-array approach. Anyhow, thanks for doing that.
Oct 11 2006
parent "Chris Miller" <chris dprogramming.com> writes:
On Wed, 11 Oct 2006 17:27:52 -0400, Chad J wrote:

 Chris Miller wrote:
 Check out the FAQ at http://www.dprogramming.com/dstring.php and give  
 it a  spin.
 Documentation is online at   
 http://www.dprogramming.com/docs/dstring/dstring.html
  Let me know what you think!

 Looks cool! I'm too strapped for time to try it out right now, but I will when I get the chance. I do have a couple of questions and a comment based on the documentation:
 - Is toString() the same as toUTF8()? If so, I'd like the documentation to say that they are the same.

Yes, they are the same. Both are documented as returning UTF-8, so I didn't think it was necessary to say so. toString exists mainly for consistency with other D types, and so that the string works directly with writef.
 - Are there plans to extend this to act as a string manipulating library  
 as well as a string type, adding stuff from phobos like toUpper(),  
 toLower(), capitalize(), split(), etc?  That would be cool.

If there is enough interest, yes.
 - Seems like it would be handy to have functions for converting to C  
 style null terminated strings.  Something like char* toUTF8C() and  
 wchar* toUTF16C().

I wasn't sure about this because it's not guaranteed that C uses Unicode. But I guess toUTF8z(), toUTF16z() and toUTF32z() would be fine; the names only mean zero-terminated, not necessarily compatible with C.
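
For readers wondering what a zero-terminated conversion looks like in practice, here is a minimal sketch. It does not use dstring at all: it assumes a current D compiler and relies on Phobos's std.string.toStringz, and the helper name cLength is made up for illustration. The toUTF8z() family discussed above is only a proposal at this point in the thread.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Hypothetical helper: the length a C function would see for `s`,
/// counted in bytes up to the terminating zero.
size_t cLength(string s)
{
    // toStringz yields a pointer to zero-terminated UTF-8 data.
    const(char)* p = toStringz(s);
    return strlen(p);
}

unittest
{
    assert(cLength("abc") == 3);
    // 'ö' (U+00F6) occupies two bytes in UTF-8, so C sees four bytes.
    assert(cLength("h\u00F6g") == 4);
}
```

The pointer is zero-terminated UTF-8; whether the C side interprets those bytes as UTF-8 is a separate question, which is the concern raised above.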
 This looks like a very useful string type.  I just hope that if people  
 like it, it becomes part of the standard library or some such so that  
 newbs aren't caught by indexing/slicing in the current string-as-array  
 approach.  Anyhow, thanks for doing that.

Thanks
Oct 12 2006
prev sibling next sibling parent Kristian <kjkilpi gmail.com> writes:
On Wed, 11 Oct 2006 20:40:18 +0300, Chris Miller <chris dprogramming.com>  
wrote:

 Check out the FAQ at http://www.dprogramming.com/dstring.php and give it  
 a spin.
 Documentation is online at  
 http://www.dprogramming.com/docs/dstring/dstring.html

 Let me know what you think!

Good work indeed. Now slicing can be done... Thanks. :)

I am wondering if D itself should support something like this. Even if 'string' were merely an alias for 'char[]', it would make code clearer for everybody (IMHO): it would say that this is a string, not an array of characters. (That alone would make the alias meaningful, not to mention that you cannot slice char[] safely.)

I think one should be aware of dstring's worst-case memory consumption. For example, suppose a huge file is read into a dstring and a single character forces the whole string to use dchars instead of chars. The string would then take four times the space that char[] would. Of course, with char[] one couldn't slice it (in O(1) time). :) (And it would probably be necessary to write special routines for special cases anyway.)
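
The worst case described above can be put into numbers with a small sketch. It assumes a current D compiler with Phobos; the helper names bytesAsUtf8 and bytesAsUtf32 are made up for illustration, and the code measures plain arrays rather than dstring's own internal storage.

```d
import std.array : replicate;
import std.utf : toUTF32;

/// Bytes needed to hold `s` as UTF-8 code units (char).
size_t bytesAsUtf8(string s)
{
    return s.length * char.sizeof;
}

/// Bytes needed to hold the same text as UTF-32 code units (dchar).
size_t bytesAsUtf32(string s)
{
    return toUTF32(s).length * dchar.sizeof;
}

unittest
{
    // 1000 ASCII characters plus one character outside the 16-bit range.
    string text = replicate("a", 1000) ~ "\U0001F600";
    assert(bytesAsUtf8(text) == 1004);  // 1000 * 1 byte + 4 bytes
    assert(bytesAsUtf32(text) == 4004); // 1001 code points * 4 bytes
}
```

For mostly-ASCII text the ratio approaches four, which is the trade-off being weighed against O(1) indexing and slicing.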
Oct 13 2006
prev sibling next sibling parent reply Fredrik Olsson <peylow gmail.com> writes:
Chris Miller skrev:
 Check out the FAQ at http://www.dprogramming.com/dstring.php and give it 
 a spin.
 Documentation is online at 
 http://www.dprogramming.com/docs/dstring/dstring.html
 
 Let me know what you think!

I love it! This is very much needed and should go into Phobos yesterday! It solves the problem of:

    char[] foo = "hög";
    assert(foo.length == 3); // Sorry, in UTF-8 this is == 4
    assert(foo[1] == 'ö'); // Not a chance!

Your implementation of string could be a perfect wrapper that makes the fact that UTF-8 has a variable character size invisible to the programmer.

// Fredrik Olsson
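
The mismatch in that snippet can be checked directly. This is a minimal sketch assuming a current D compiler with Phobos (std.range.walkLength and std.utf.toUTF32); the helper names are made up for illustration and none of it is dstring's API.

```d
import std.range : walkLength;
import std.utf : toUTF32;

/// Number of UTF-8 code units (bytes) in `s`.
size_t codeUnits(string s)
{
    return s.length;
}

/// Number of code points in `s`; this walks the string, so it is O(n).
size_t codePoints(string s)
{
    return s.walkLength;
}

/// The i-th code point, found by converting to UTF-32 first.
dchar codePointAt(string s, size_t i)
{
    return toUTF32(s)[i];
}

unittest
{
    string foo = "h\u00F6g"; // "hög"
    assert(codeUnits(foo) == 4);  // 'ö' takes two bytes
    assert(codePoints(foo) == 3);
    assert(codePointAt(foo, 1) == '\u00F6');
}
```

Indexing by code point on a char[] costs a scan or a conversion, which is the work a wrapper type would hide.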
Oct 24 2006
parent reply Aarti_pl <aarti interia.pl> writes:
Fredrik Olsson napisał(a):
 Chris Miller skrev:
 Check out the FAQ at http://www.dprogramming.com/dstring.php and give 
 it a spin.
 Documentation is online at 
 http://www.dprogramming.com/docs/dstring/dstring.html

 Let me know what you think!

 I love it! This is very much needed and should go into Phobos yesterday! It solves the problem of:

     char[] foo = "hög";
     assert(foo.length == 3); // Sorry, in UTF-8 this is == 4
     assert(foo[1] == 'ö'); // Not a chance!

 Your implementation of string could be a perfect wrapper that makes the fact that UTF-8 has a variable character size invisible to the programmer.

 // Fredrik Olsson

I didn't write when DString was introduced, but I also have very positive feelings about it. As a programmer, in common cases I should not have to bother with the implementation details of a string. It should not matter whether I work with char[], wchar[] or dchar[].

I agree that it should be put into Phobos immediately! (Maybe some size optimizations could be added, so that adding one dchar to a char[] string does not convert the whole string from char[] to dchar[], but rather converts the dchar to chars.)

Regards
Marcin Kuszczak
Aarti_pl
Oct 24 2006
parent reply "Chris Miller" <chris dprogramming.com> writes:
On Tue, 24 Oct 2006 05:12:21 -0400, Aarti_pl <aarti interia.pl> wrote:

 Fredrik Olsson napisał(a):
 Chris Miller skrev:
 Check out the FAQ at http://www.dprogramming.com/dstring.php and give  
 it a spin.
 Documentation is online at  
 http://www.dprogramming.com/docs/dstring/dstring.html

 Let me know what you think!

 I love it! This is very much needed and should go into Phobos yesterday! It solves the problem of:

     char[] foo = "hög";
     assert(foo.length == 3); // Sorry, in UTF-8 this is == 4
     assert(foo[1] == 'ö'); // Not a chance!

 Your implementation of string could be a perfect wrapper that makes the fact that UTF-8 has a variable character size invisible to the programmer.

 // Fredrik Olsson

 I didn't write when DString was introduced, but I also have very positive feelings about it. As a programmer, in common cases I should not have to bother with the implementation details of a string. It should not matter whether I work with char[], wchar[] or dchar[].

Thanks guys.
 I agree that it should be put into Phobos immediately! (Maybe some
 size optimizations could be added, so that adding one dchar to a
 char[] string will not cause conversion from char[] to dchar[], but
 rather dchar to char).

But then it won't be ultra fast at finding dchar codepoints.
Oct 24 2006
parent reply Fredrik Olsson <peylow gmail.com> writes:
Chris Miller skrev:
<snip>
 I agree that it should be put into Phobos immediately! (Maybe some
 size optimizations could be added, so that adding one dchar to a
 char[] string will not cause conversion from char[] to dchar[], but
 rather dchar to char).

But then it won't be ultra fast at finding dchar codepoints.

A little thought: two bits are now used to represent the internal format, but there are only three formats available. Maybe the fourth format code could be "size optimal, but slightly slower"? I mean, just what else should quad 2 GHz machines do? :)

// Fredrik Olsson
Oct 24 2006
parent "Chris Miller" <chris dprogramming.com> writes:
On Tue, 24 Oct 2006 06:20:44 -0400, Fredrik Olsson <peylow gmail.com>  
wrote:

 Chris Miller skrev:
 <snip>
 I agree that it should be put into Phobos immediately! (Maybe some
 size optimizations could be added, so that adding one dchar to a
 char[] string will not cause conversion from char[] to dchar[], but
 rather dchar to char).


 A little thought: two bits are now used to represent the internal format, but there are only three formats available. Maybe the fourth format code could be "size optimal, but slightly slower"? I mean, just what else should quad 2 GHz machines do? :)

Neat idea. Currently it uses that 4th state to represent an uninitialized string, but that's not so important. I'll think about this.
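
To make the two-bit discussion concrete, here is a purely hypothetical sketch of a format tag packed into the top bits of a 32-bit word. The thread does not show dstring's real layout, so the field positions, the enum values, and every name below are assumptions made for illustration.

```d
/// Hypothetical two-bit format tag; the fourth value is the one
/// described above as marking an uninitialized string.
enum Format : uint
{
    utf8 = 0,
    utf16 = 1,
    utf32 = 2,
    uninitialized = 3
}

enum uint formatShift = 30;                  // assumed: tag lives in the top two bits
enum uint lengthMask = (1u << formatShift) - 1;

/// Packs a format tag and a length into one word.
uint pack(Format f, uint length)
{
    assert(length <= lengthMask, "length does not fit in 30 bits");
    return ((cast(uint) f) << formatShift) | length;
}

/// Extracts the format tag from a packed word.
Format formatOf(uint word)
{
    return cast(Format)(word >> formatShift);
}

/// Extracts the length from a packed word.
uint lengthOf(uint word)
{
    return word & lengthMask;
}

unittest
{
    uint w = pack(Format.utf32, 5);
    assert(formatOf(w) == Format.utf32);
    assert(lengthOf(w) == 5);
}
```

With two bits there are exactly four tag values, so a "size optimal" mode would have to take over the value currently used for the uninitialized state.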
Oct 24 2006
prev sibling parent reply Olli Aalto <olli.aalto cardinal.fi> writes:
Chris Miller wrote:
 Check out the FAQ at http://www.dprogramming.com/dstring.php and give it 
 a spin.
 Documentation is online at 
 http://www.dprogramming.com/docs/dstring/dstring.html
 
 Let me know what you think!

Hi!

The dstring module is very nice, but it's lacking one thing that I, at least personally, have gotten used to: you cannot assign null to it.

    string str = null;

or

    string getAStringWhichMightBeNull() { return null; }

It's not a big thing, but it makes using the string a bit clumsy.

One other thing, not necessarily dstring's fault: I tried compiling it with C::B in release mode. I had set every compiler option available for the release build, and when I compiled I started getting errors about functions not returning any values. These were functions containing switches whose default cases had return statements. I don't know which flag causes this behavior, but a debug build works just fine.

O.
Oct 24 2006
parent reply "Chris Miller" <chris dprogramming.com> writes:
On Tue, 24 Oct 2006 06:30:48 -0400, Olli Aalto <olli.aalto cardinal.fi>
wrote:

 Chris Miller wrote:
 Check out the FAQ at http://www.dprogramming.com/dstring.php and give
 it a spin.
 Documentation is online at
 http://www.dprogramming.com/docs/dstring/dstring.html
  Let me know what you think!

 Hi! The dstring module is very nice, but it's lacking one thing that I, at
 least personally, have gotten used to. It's that you cannot assign a null
 to it.

 string str = null;

I don't think this is possible. The string is also automatically initialized to an empty/null string.
 or

 string getAStringWhichMightBeNull() { return null; }

There is string(), such as:

    string mystring = string();

or:

    myfunction(string());
 It's not a big thing but makes using the string a bit clumsy.

 One other thing. Not necessarily dstring's fault, but I tried
 compiling it with C::B in release mode. I had set every compiler option
 available for the release build and when I compiled I started getting
 errors about functions not returning any values. They were functions
 which had switches, which default cases had return statements.
 I don't know which flag causes this behavior, but a debug build works
 just fine.

I'm only getting this when using -w to get DMD to output warnings. I think DMD's warnings are terrible and I never use or consider them. Please don't use this switch.
Oct 24 2006
parent Olli Aalto <oaalto gmail.com> writes:
Chris Miller wrote:
 On Tue, 24 Oct 2006 06:30:48 -0400, Olli Aalto <olli.aalto cardinal.fi> 
 wrote:
 
 Chris Miller wrote:
 Check out the FAQ at http://www.dprogramming.com/dstring.php and give 
 it a spin.
 Documentation is online at 
 http://www.dprogramming.com/docs/dstring/dstring.html
  Let me know what you think!

 Hi! The dstring module is very nice, but it's lacking one thing that I, at least personally, have gotten used to. It's that you cannot assign a null to it.

 string str = null;

I don't think this is possible. The string is also automatically initialized to an empty/null string.

Yes, and that is what limits the usefulness of the module. An empty string is not the same as a null string. Maybe this is just my Java background.
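
For comparison, D's built-in arrays do keep a null/empty distinction, although == hides it. The sketch below assumes a current D compiler and uses the built-in string type, not dstring's string; the helper names are made up for illustration.

```d
/// True only for a slice with no memory behind it at all.
bool isNullString(string s)
{
    return s is null;
}

/// True for any slice of length zero, whether null or not.
bool isEmptyString(string s)
{
    return s.length == 0;
}

unittest
{
    string none = null;
    string empty = "";
    assert(isNullString(none));
    assert(!isNullString(empty)); // the literal "" has a non-null pointer
    assert(isEmptyString(none) && isEmptyString(empty));
    assert(none == empty);        // == compares contents, so the two look equal
}
```

Whether a wrapper type should expose the same distinction is the design question being argued here.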
 or

 string getAStringWhichMightBeNull() { return null; }

 There is string(), such as:

     string mystring = string();

 or:

     myfunction(string());

I didn't get this part.
 It's not a big thing but makes using the string a bit clumsy.

 One other thing. Not necessarily dstring's fault, but I tried 
 compiling it with C::B in release mode. I had set every compiler 
 option available for the release build and when I compiled I started 
 getting errors about functions not returning any values. They were 
 functions which had switches, which default cases had return statements.
 I don't know which flag causes this behavior, but a debug build works 
 just fine.

I'm only getting this when using -w to get DMD to output warnings. I think DMD's warnings are terrible and I never use or consider them. Please don't use this switch.

I didn't have the -w switch on. I used the -inline -O -release switches.

O.
Oct 24 2006