D - pleading for String

reply Helmut Leitner <helmut.leitner chello.at> writes:
Currently there is no official String identifier in the D language.
One can only guess why this is so: I would assume that this is a 
void left for an object String to come. For now "char []" fills its
place for all practical purposes.

I would plead for an official

  alias char [] String;

to fill this void.

I'll try to add a few arguments. 

First, it's effortless. Anyone can define it on their own and use
it seamlessly even now, as in:

  int main(String [] args)

There are no hidden problems. I used "Str" for my Venus library
and had no problems anywhere. I could rename it to "String" but
I would prefer an official solution.
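
A minimal sketch of the alias in use, assuming a present-day D compiler (which spells the declaration `alias String = char[];`). The helper `countSpaces` is hypothetical and exists only for illustration; `String` is the name proposed in this thread, not something Phobos is known to define.

```d
import std.stdio : writeln;

// The proposed alias, in the spelling current compilers prefer.
// The post's own form is: alias char [] String;
alias String = char[];

// Hypothetical helper, for illustration only.
size_t countSpaces(String s)
{
    size_t n = 0;
    foreach (c; s)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    // An alias introduces no new type: String and char[] are the same type.
    static assert(is(String == char[]));

    char[] raw = "a b c".dup;   // plain char[]
    String s = raw;             // no cast needed in either direction
    char[] back = s;

    writeln(countSpaces(back));
}
```

Because the alias is only another name, existing functions that take "char []" keep working unchanged with values declared as String.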

Second, the "String" type is already deeply engraved in the
current API system. There are Phobos identifiers like 
   - toString
   - writeString
   - readString
   - ...
and DIG identifiers like
   - getString
   - saveString
   - colorString
   - ...
that all use "char []". If some new String class were defined,
these APIs would have to be renamed or left with a serious
inconsistency.

On the other hand: a big and powerful String class might look 
attractive. It could include lots of functions usable by calls
like 
   s.cvtUpper(); or s.toDouble();

But: 

  - Using
      cvtUpper(s); or toDouble(s);
    isn't much worse. Technically it's identical. You wouldn't be
    able to inherit from "String", but you also can't from "int".
    An alias would just give "String" the status of a primitive.

  - A String class can never be complete. You may provide a hundred
    functions and people will still add utilities of their own.
    And people will cry because they can't use this functionality 
    easily for some StringBuffer (outbuffer) class that they need 
    for performance reasons.

  - Such a class would bloat the code. As far as I know, the 
    compiler / linker / system has no way to get rid of unneeded
    methods. It's clear that this is hard, because the method 
    addresses must be part of some vtable that's needed in case
    of inheritance. So the linker would have to know about vtables
    and clean them up and strip methods during linking.

    So the situation is: any method of a String class would add to 
    the footprint of almost any statically linked D executable.
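
The free-function style from the first point can be sketched as follows, assuming a present-day D compiler and Phobos. The names `cvtUpper` and `toDouble` come from the post; their bodies here are illustrative assumptions, and the upper-casing only handles ASCII.

```d
import std.ascii : toUpper;
import std.conv : to;

alias String = char[];

// Free-function counterpart of the hypothetical s.cvtUpper().
String cvtUpper(String s)
{
    auto result = s.dup;        // leave the caller's buffer untouched
    foreach (ref c; result)
        c = toUpper(c);         // ASCII-only upper-casing
    return result;
}

// Free-function counterpart of the hypothetical s.toDouble().
double toDouble(String s)
{
    return to!double(s);
}

void main()
{
    String s = "1.5e2".dup;

    assert(cvtUpper(s) == "1.5E2");
    assert(toDouble(s) == 150.0);

    // Current compilers also accept method-call syntax on free functions,
    // so the two call styles in the post can coexist.
    assert(s.cvtUpper() == "1.5E2");
}
```

Since these are ordinary functions rather than class methods, nothing here depends on a vtable, and anyone can add a further function of their own next to them.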

====

Therefore I think "alias char [] String;" is the way to go. 
I suggest adding it to the Phobos library as soon as possible.

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com
Sep 13 2003
next sibling parent reply "Matthew Wilson" <matthew stlsoft.org> writes:
I could be convinced, if it was string_t. The reason is that this would then
be unambiguously a type(def) rather than a fully-fledged class.

Especially so, since String would be the first name of any future string
class.

Sep 13 2003
next sibling parent Mike Wynn <mike l8night.co.uk> writes:
I would like to see a true "string" type, not just an alias for char[], so that a string can be Unicode (UTF-8, 16 or 32 internally, as required), ideally with a format function (sprintf / Delphi Format), something usable as

  x = String.format( "%d, %x", a, b );
  x = String.format( "%d, %x", [a, b] );

or even as a member function:

  x = "%d, %x".format( a, b );
  x = "%d, %x".format( [a, b] );
Sep 13 2003
prev sibling parent reply "Riccardo De Agostini" <riccardo.de.agostini email.it> writes:
"Matthew Wilson" <matthew stlsoft.org> wrote in message
news:bjutpt$n4f$1 digitaldaemon.com...
 I could be convinced, if it was string_t. The reason is that this would then
 be unambiguously a type(def) rather than a fully-fledged class.

If string was a base type I don't think there would be any need for the _t suffix, just as int is not int_t. Furthermore, the reason why String classes exist is mainly because there's not a string type...

Ric
Sep 15 2003
next sibling parent reply "Matthew Wilson" <matthew stlsoft.org> writes:
 I could be convinced, if it was string_t. The reason is that this would then
 be unambiguously a type(def) rather than a fully-fledged class.

 If string was a base type I don't think there would be any need for the _t
 suffix, just as int is not int_t.

Sure. My point was that because what Helmut wanted was specifically not a unique type, the _t was appropriate: a visual reminder to all users that they're using an alias. However, today I've hypocrited myself by defining a "boolean" alias (from int, of course :) ) in the SynSoft libraries. What can ya do??

 Furthermore, the reason why String classes exist is mainly because there's
 not a string type...

I don't know enough about the various localisation issues to comment on that side of things, but I'm very nervous about having a string type, purely out of a fear of feature-creep.
Sep 15 2003
parent "Riccardo De Agostini" <riccardo.de.agostini email.it> writes:
"Matthew Wilson" <matthew stlsoft.org> wrote in message
news:bk448q$1j2a$2 digitaldaemon.com...
 However, today I've hypocrited myself by defining a "boolean" alias (from
 int, of course :) ) in the SynSoft libraries.

 What can ya do??

Now _that_'s another story... I wish there were bool8, bool16 and bool32, and surely not as aliases of bit... They are simply ugly, but a great help to serialization and interfacing to APIs.

 I don't know enough about the various localisation issues to comment on that
 side of things, but I'm very nervous about having a string type, purely out
 of a fear of feature-creep.

If it were a type (not a class) and functions acting on it were simply global functions, there shouldn't be much to worry about. Localization (case insensitivity, collating order, non-Latin alphabets...) must be dealt with anyway; IMO a string type would even help the development of localization functions.

Ric
Sep 15 2003
prev sibling parent "Sean L. Palmer" <palmer.sean verizon.net> writes:
Why should you care if it's a builtin or a typedef?  Works the same either
way, it's opaque to the programmer.  You need to visit the declaration for
the type and familiarize yourself with it.  This is always true.. you can't
just assume anything about it.  Little 'hints' like _t are often misleading;
they're just going to encourage people like me to "alias string_t string;"

People are moving away from Hungarian notation and decorated names.  It's hard
to maintain, and it's an eyesore, and nowadays you just put your cursor on a
symbol and hit a button and it brings you right to the declaration... how
much more can it hold your hand than that?  Plus then "typedef char[]
string_t;" is incompatible with char[] without a cast, you'd want "alias
char[] string_a;"  hehe and if string was a class instead, it'd have to be
what, "class CString {}" ?!    Here we go with the name proliferation again.
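
A rough sketch of the alias-versus-distinct-type difference, assuming a present-day D compiler. The `typedef` keyword of 2003-era D is, as far as I know, no longer part of the language, so a one-field struct stands in for it here; that substitution and the helper `len` are assumptions made for illustration.

```d
// alias: just another name for char[].
alias string_a = char[];

// Stand-in for a distinct type in the spirit of "typedef char[] string_t;".
struct string_t
{
    char[] data;
}

// Hypothetical helper that accepts a plain char[].
size_t len(char[] s)
{
    return s.length;
}

void main()
{
    static assert(is(string_a == char[]));    // same type
    static assert(!is(string_t == char[]));   // different type

    string_a a = "moon".dup;
    assert(len(a) == 4);                      // passes straight through

    auto t = string_t("moon".dup);
    // The distinct type does not convert implicitly to char[]...
    static assert(!__traits(compiles, len(t)));
    // ...so the caller has to unwrap it explicitly.
    assert(len(t.data) == 4);
}
```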

I just don't see much point in cluttering up the type names.  If *you* want
to do it, well there's always alias... that's what people tell me when I
bitch about not liking the identifiers.

Sean
Sep 15 2003
prev sibling parent reply "Sean L. Palmer" <palmer.sean verizon.net> writes:
I would want it to be called "string" not "String" just to be consistent
with the rest of the basic types.  It's for that same reason that I don't
like "string_t".  If you have _t on the end of the type names it should be
on all the type names, and I don't think that is a good idea.

I think I'm with Mike though;   a string should be more than a simple
typedef for char[].  It should support Unicode for one.  We should make all
the string functions work on string instead of char[], and have an implicit
conversion from char[] to string.  String literals should be of type string
instead of char[] as well.
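
To make the Unicode concern concrete, here is a small sketch assuming a present-day D compiler, where char[] holds UTF-8 code units. The helper `codePoints` is hypothetical; it shows that the array length and the number of characters can differ.

```d
// Hypothetical helper: counts code points rather than UTF-8 code units.
size_t codePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)   // a dchar loop variable makes foreach decode UTF-8
        ++n;
    return n;
}

void main()
{
    char[] s = "gr\u00FCn".dup;    // "gruen" spelled with u-umlaut

    assert(s.length == 5);         // five UTF-8 code units
    assert(codePoints(s) == 4);    // four code points
}
```

Whether that decoding belongs in a dedicated string type or in global functions over char[] is exactly the question under discussion here.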

If you want user extendability and no bloat, then all the "methods" of the
string should in fact be global functions taking a string as argument.

I'd like to get away from the printf-style formatting and go to something
more like:

string format(string formatstring, formatobject[]);

used like

string res = format("The %0 is %1 %2.", "moon", "very", "bright");

Then you can translate it:

string res = format("La %0 es %1 %2.", "luna", "muy", "brillante");

Even rearrange the text during translation:

string res = format("%1 %2 la %0 es.", "luna", "muy", "brillante");

And formatobject can be made to support any kind of formatting.

string res = format("%0", rightjustify("foo",12));

string res = format("%0", floatprecision(math.pi,16,12));
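
The positional idea can be sketched in a few lines, assuming a present-day D compiler. The function `formatPositional` is a hypothetical stand-in for the proposed `format`: it accepts only string arguments and single-digit indices, and leaves out the `formatobject` machinery entirely.

```d
import std.array : appender;

// Hypothetical helper: replaces %0 .. %9 with the argument at that index.
string formatPositional(string fmt, string[] args...)
{
    auto result = appender!string();
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable c = fmt[i];
        if (c == '%' && i + 1 < fmt.length
            && fmt[i + 1] >= '0' && fmt[i + 1] <= '9')
        {
            immutable index = fmt[i + 1] - '0';
            if (index < args.length)
                result.put(args[index]);   // out-of-range indices expand to nothing
            ++i;                           // skip the digit
        }
        else
        {
            result.put(c);                 // copy everything else unchanged
        }
    }
    return result.data;
}

void main()
{
    assert(formatPositional("The %0 is %1 %2.", "moon", "very", "bright")
           == "The moon is very bright.");

    // Reordering during translation needs no change to the argument list.
    assert(formatPositional("%1 %2 la %0 es.", "luna", "muy", "brillante")
           == "muy brillante la luna es.");
}
```

Phobos' `std.format` today also documents POSIX-style positional specifiers such as `%2$s`, which address the same reordering need.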

Sean
Sep 13 2003
next sibling parent Helmut Leitner <helmut.leitner chello.at> writes:
"Sean L. Palmer" wrote:
 
 I would want it to be called "string" not "String" just to be consistent
 with the rest of the basic types.  It's for that same reason that I don't
 like "string_t".  If you have _t on the end of the type names it should be
 on all the type names, and I don't think that is a good idea.

Ok. It makes no difference whether it is "string" or "String", as long as we agree that a string primitive and a String class should never exist at the same time.

 I think I'm with Mike though;   a string should be more than a simple
 typedef for char[].  It should support unicode for one.  We should make all
 the string functions work on string instead of char[], and have an implicit
 conversion from char[] to string.  String literals should be of type string
 instead of char[] as well.

But that means that you are proposing a major redesign of the language that will affect almost all existing code! D is just becoming popular. When do you want to do this? Why not add another, more powerful string class, named e.g. Ustring, at any time, without any problems, and allow for a gradual transition?

 If you want user extendability and no bloat, then all the "methods" of the
 string should in fact be global functions taking a string as argument.

That's right.
 I'd like to get away from the printf-style formatting and go to something
 more like:
 
 string format(string formatstring, formatobject[]);
 
 used like
 
 string res = format("The %0 is %1 %2.", "moon", "very", "bright");
 
 Then you can translate it:
 
 string res = format("La %0 es %1 %2.", "luna", "muy", "brillante");
 
 Even rearrange the text during translation:
 
 string res = format("%1 %2 la %0 es.", "luna", "muy", "brillante");
 
 And formatobject can be made to support any kind of formatting.
 
 string res = format("%0", rightjustify("foo",12));
 
 string res = format("%0", floatprecision(math.pi,16,12));

I like this too, but I think it has nothing to do with the current "String" discussion. There will always be a need to do this on a low level (to a char [], to an outbuffer) as well.

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com
Sep 13 2003
prev sibling parent "Matthew Wilson" <matthew stlsoft.org> writes:
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:bjvnjq$1opq$1 digitaldaemon.com...
 I would want it to be called "string" not "String" just to be consistent
 with the rest of the basic types.  It's for that same reason that I don't
 like "string_t".  If you have _t on the end of the type names it should be
 on all the type names, and I don't think that is a good idea.

_t is for typedef, not for type (at least in this case)

 I think I'm with Mike though;   a string should be more than a simple
 typedef for char[].  It should support unicode for one.  We should make all
 the string functions work on string instead of char[], and have an implicit
 conversion from char[] to string.  String literals should be of type string
 instead of char[] as well.

 If you want user extendability and no bloat, then all the "methods" of the
 string should in fact be global functions taking a string as argument.

Yes, this would have to be the way. There will always be the "one essential method that is missing", and it's just instinctively wrong to keep lumping stuff into the one class. Look at std::basic_string!
Sep 13 2003