digitalmars.D - toString issue

Johan Granberg <lijat.meREM OVEgmail.com> writes:

As a result of the discussion about char[] above I have been converting
some of my code from dchar[] to char[], but that reminded me of an issue
I have with the current state of Phobos. In Object there is the method
toString, which happens to have the same name as the COMMONLY used
function std.string.toString. This causes Object's toString to shadow
std.string's toString inside class methods. I know that an FQN can be
used as a workaround, but it makes the code unnecessarily hard to read,
and I think that name clashes such as this should be avoided in the
standard library.

PROPOSAL: change all methods in Object to have some prefix; for toString
I suggest opString, as the op prefix is already in use.
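
A minimal sketch of the collision described above, assuming the D 1.x Phobos of the time, where Object.toString returns char[] and std.string.toString(int) is a free function. The class name Counter and its field are hypothetical, chosen only for illustration.

```d
import std.string;   // free function toString(int), among other overloads
import std.stdio;

class Counter
{
    int count = 42;

    // Overrides Object.toString. Inside the class, the name toString
    // resolves to this method first and hides std.string.toString.
    char[] toString()
    {
        // return "Counter " ~ toString(count);
        // The line above is rejected: lookup stops at the member
        // toString, which takes no arguments.

        // The fully qualified name (FQN) reaches the library function.
        return "Counter " ~ std.string.toString(count);
    }
}

void main()
{
    writefln("%s", (new Counter).toString());
}
```

The fully qualified call works, but every conversion inside a class body has to be spelled this way, which is the readability cost the proposal is aimed at.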

Sep 29 2006

Vladimir Kulev <me lightoze.net> writes:

Johan Granberg wrote:
 PROPOSAL: change all methods in Object to have some prefix; for toString
 I suggest opString, as the op prefix is already in use.

I agree, and the same about toHash. Naming consistency is the right thing.

Sep 30 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Vladimir Kulev wrote:
 Johan Granberg wrote:
 PROPOSAL: change all methods in Object to have some prefix; for toString
 I suggest opString, as the op prefix is already in use.

 
 I agree, and the same about toHash. Naming consistency is the right thing.

I totally disagree, what consistency are you talking about?
toString and toHash are *not* operators, so prefixing them with op is 
misleading and inconsistent.

Sep 30 2006

Johan Granberg <lijat.meREM OVEgmail.com> writes:

Hasan Aljudy wrote:
 
 
 Vladimir Kulev wrote:
 Johan Granberg wrote:
 PROPOSAL: change all methods in Object to have some prefix; for toString
 I suggest opString, as the op prefix is already in use.

 I agree, and the same about toHash. Naming consistency is the right 
 thing.

 
 I totally disagree, what consistency are you talking about?
 toString and toHash are *not* operators, so prefixing them with op is 
 misleading and inconsistent.

The prefix is not important; the name collision is. The problem is
that two commonly used identifiers collide, and the use of an op prefix
is one way to solve that (and would open the way for making them
operators if desired at some later time).

Sep 30 2006

Vladimir Kulev <me lightoze.net> writes:

Hasan Aljudy wrote:
 I totally disagree, what consistency are you talking about?
 toString and toHash are *not* operators, so prefixing them with op is
 misleading and inconsistent.

These methods are implicitly available on all objects, so you can use
them just like other unary operators such as ~, except that there are no
special symbols for them.

Anyway, the collision between Object.toString and std.string.toString
should be resolved, and renaming the second one would also suit me.

Sep 30 2006

Sean Kelly <sean f4.ca> writes:

Johan Granberg wrote:
 As a result of the discussion about char[] above I have been converting
 some of my code from dchar[] to char[], but that reminded me of an issue
 I have with the current state of Phobos. In Object there is the method
 toString, which happens to have the same name as the COMMONLY used
 function std.string.toString. This causes Object's toString to shadow
 std.string's toString inside class methods. I know that an FQN can be
 used as a workaround, but it makes the code unnecessarily hard to read,
 and I think that name clashes such as this should be avoided in the
 standard library.
 
 PROPOSAL: change all methods in Object to have some prefix; for toString
 I suggest opString, as the op prefix is already in use.

How about toUtf8() for classes and structs :-)


Sean

Sep 30 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Sean Kelly wrote:
 Johan Granberg wrote:
 
 As a result of the discussion about char[] above I have been
 converting some of my code from dchar[] to char[], but that reminded
 me of an issue I have with the current state of Phobos. In Object
 there is the method toString, which happens to have the same name as
 the COMMONLY used function std.string.toString. This causes Object's
 toString to shadow std.string's toString inside class methods. I know
 that an FQN can be used as a workaround, but it makes the code
 unnecessarily hard to read, and I think that name clashes such as
 this should be avoided in the standard library.

 PROPOSAL: change all methods in Object to have some prefix; for
 toString I suggest opString, as the op prefix is already in use.

 
 
 How about toUtf8() for classes and structs :-)
 
 
 Sean

Gets my vote.  Note that Mango classes typically already do this (with
toString just calling toUtf8 in most cases), and provide toUtf16/toUtf32
counterparts.  It is indeed effective.  :)

-- Chris Nicholson-Sauls

Sep 30 2006

Charlie <charlies nowhere.com> writes:

Gets my vote too; it's also more descriptive than 'toString'.

Chris Nicholson-Sauls wrote:
 Sean Kelly wrote:
 Johan Granberg wrote:

 As a result of the discussion about char[] above I have been
 converting some of my code from dchar[] to char[], but that reminded
 me of an issue I have with the current state of Phobos. In Object
 there is the method toString, which happens to have the same name as
 the COMMONLY used function std.string.toString. This causes Object's
 toString to shadow std.string's toString inside class methods. I know
 that an FQN can be used as a workaround, but it makes the code
 unnecessarily hard to read, and I think that name clashes such as
 this should be avoided in the standard library.

 PROPOSAL: change all methods in Object to have some prefix; for
 toString I suggest opString, as the op prefix is already in use.


 How about toUtf8() for classes and structs :-)


 Sean

 
 Gets my vote.  Note that Mango classes typically already do this (with 
 toString just calling toUtf8 in most cases), and provide toUtf16/toUtf32 
 counterparts.  It is indeed effective.  :)
 
 -- Chris Nicholson-Sauls

Oct 01 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Sean Kelly wrote:
 
 How about toUtf8() for classes and structs :-)
 
 
 Sean

I think there's a fundamental problem with the way D deals with strings.
The spec claims that D natively supports strings through char[] and, at
the same time, that D fully supports Unicode.
The fundamental issue is that UTF-8 is only one encoding for Unicode
strings, and it's not always the best choice. Phobos mostly deals only
with char[], and mixing code that uses wchar[] with code that uses
char[] isn't very straightforward.

Consider the simple case of reading a text file and detecting "words".
To detect a word, you must first recognize letters, no .. not English
letters; letters of any language, and for that purpose we have the
isUniAlpha function. Now, if you encode the string as char[], then how
are you gonna determine whether or not the next character is a Unicode
alpha?

The following definitely shouldn't work:
//assuming text is char[]
for( int i = 0; i < text.length; i++ )
{
     bool isLetter = isUniAlpha( text[i] );
     ....
}

because isUniAlpha takes a dchar parameter, and of course, because a 
single char doesn't necessarily encode a Unicode character just by 
itself; if you're dealing with non-English text, then most likely a 
single char will only hold half the encoding for that letter.
Surprisingly, the compiler allows this kind of code, but that's not the 
point. The point is, this code will never work, because char[] is not a 
very good way to hold a Unicode string.
Of course there are ways around this, but they are still just "workarounds".

Should you choose wchar[] (or dchar[]) to represent strings, you will 
get into all kinds of troubles dealing with phobos. The standard library 
always deals with strings using char[], this includes std.string and 
std.regexp, and even the Exception class. So, if you're using wchar[] to 
represent strings, and you want to throw an exception, you can't just say:

because the compiler will complain (can't cast wchar[] to char[]), so
you'll need toUtf8( myString ), and your code can quickly become full
of calls to toUtf* functions.
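
A short sketch of the conversion being described, assuming D 1.x Phobos, where Exception takes a char[] message and std.utf provides toUTF8(wchar[]). The function name fail and the message text are hypothetical.

```d
import std.utf;     // toUTF8 transcodes wchar[] (UTF-16) to char[] (UTF-8)
import std.stdio;

// Hypothetical helper: raise an error whose message is held as UTF-16.
void fail(wchar[] msg)
{
    // throw new Exception(msg);
    // The line above is rejected: wchar[] does not convert to char[].

    throw new Exception(toUTF8(msg));   // explicit transcoding
}

void main()
{
    try
    {
        fail("disk full"w);   // the w suffix makes a wchar[] literal
    }
    catch (Exception e)
    {
        writefln("%s", e.msg);
    }
}
```

Each boundary between UTF-16 code and a char[]-only library call needs a transcoding step like this one, which is where the repeated toUtf* calls come from.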

Personally, I think D needs a proper String class built into the 
language and the standard library.

Or at least, casting between the different encodings should be seamless
to the coder; just let the compiler call the appropriate toUtf* function
and allow implicit casting.

Oct 01 2006

Derek Parnell <derek nomail.afraid.org> writes:

On Mon, 02 Oct 2006 00:52:44 -0600, Hasan Aljudy wrote:

 Sean Kelly wrote:
 
 How about toUtf8() for classes and structs :-)
 
 Sean

 
 I think there's a fundamental problem with the way D deals with strings.
 The spec claims that D natively supports strings through char[] and, at
 the same time, that D fully supports Unicode.
 The fundamental issue is that UTF-8 is only one encoding for Unicode
 strings, and it's not always the best choice. Phobos mostly deals only
 with char[], and mixing code that uses wchar[] with code that uses
 char[] isn't very straightforward.
 
 Consider the simple case of reading a text file and detecting "words".
 To detect a word, you must first recognize letters, no .. not English
 letters; letters of any language, and for that purpose we have the
 isUniAlpha function. Now, if you encode the string as char[], then how
 are you gonna determine whether or not the next character is a Unicode
 alpha?
 
 The following definitely shouldn't work:
 //assuming text is char[]
 for( int i = 0; i < text.length; i++ )
 {
      bool isLetter = isUniAlpha( text[i] );
      ....
 }

  foreach(int i, dchar c; text)
  {
       bool isLetter = isUniAlpha( c );
       ...
  }


-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
2/10/2006 5:10:26 PM

Oct 02 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Derek Parnell wrote:
 On Mon, 02 Oct 2006 00:52:44 -0600, Hasan Aljudy wrote:
 
 Sean Kelly wrote:
 How about toUtf8() for classes and structs :-)

 Sean

 I think there's a fundamental problem with the way D deals with strings.
 The spec claims that D natively supports strings through char[] and, at
 the same time, that D fully supports Unicode.
 The fundamental issue is that UTF-8 is only one encoding for Unicode
 strings, and it's not always the best choice. Phobos mostly deals only
 with char[], and mixing code that uses wchar[] with code that uses
 char[] isn't very straightforward.

 Consider the simple case of reading a text file and detecting "words".
 To detect a word, you must first recognize letters, no .. not English
 letters; letters of any language, and for that purpose we have the
 isUniAlpha function. Now, if you encode the string as char[], then how
 are you gonna determine whether or not the next character is a Unicode
 alpha?

 The following definitely shouldn't work:
 //assuming text is char[]
 for( int i = 0; i < text.length; i++ )
 {
      bool isLetter = isUniAlpha( text[i] );
      ....
 }

 
   foreach(int i, dchar c; text)
   {
        bool isLetter = isUniAlpha( c );
        ...
   }
 
 

I know, but that's still a work-around. What if you need to iterate back
and forth? You're gonna need to convert it to dchar[] (or wchar[]).

However, that brings up a good point:
Notice how foreach lets you iterate over a string by Unicode characters
(a.k.a. code points)? Shouldn't this kind of iteration be supported
outside of foreach as well?
Sure, I know you can write your own String class and even an iterator,
but that just proves that string support isn't really/fully built-in.

Oct 02 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Hasan Aljudy wrote:
 Derek Parnell wrote:
   foreach(int i, dchar c; text)
   {
        bool isLetter = isUniAlpha( c );
        ...
   }

 
 I know, but that's still a work-around. What if you need to iterate back 
 and forth? You're gonna need to convert it to dchar[] (or wchar[]).
 
 However, that brings up a good point:
 Notice how foreach lets you iterate over a string by Unicode characters
 (a.k.a. code points)? Shouldn't this kind of iteration be supported
 outside of foreach as well?

see std.utf.decode and std.utf.stride.
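
A hedged sketch of stepping through a char[] by code point outside of foreach, assuming the D 1.x Phobos signatures decode(char[], inout size_t) and stride(char[], size_t) from std.utf and isUniAlpha from std.uni. The helper names countLetters, nextIndex and prevIndex are hypothetical, and prevIndex is hand-written rather than a library call; all three assume well-formed UTF-8.

```d
import std.utf;   // decode, stride
import std.uni;   // isUniAlpha

// Forward: count the alphabetic code points in a UTF-8 string.
int countLetters(char[] text)
{
    int letters = 0;
    size_t i = 0;
    while (i < text.length)
    {
        dchar c = decode(text, i);   // decodes one code point, advances i
        if (isUniAlpha(c))
            letters++;
    }
    return letters;
}

// Forward without decoding: index of the code point following the one at i.
size_t nextIndex(char[] text, size_t i)
{
    return i + stride(text, i);      // stride = code units in this sequence
}

// Backward: index where the code point ending just before i starts.
// UTF-8 continuation bytes have the bit pattern 10xxxxxx, so step back
// over them until a lead byte (or an ASCII byte) is reached.
size_t prevIndex(char[] text, size_t i)
{
    assert(i > 0 && i <= text.length);
    do
    {
        i--;
    } while (i > 0 && (text[i] & 0xC0) == 0x80);
    return i;
}

void main()
{
    char[] text = "caf\u00E9";            // 'e' with acute takes two code units
    assert(text.length == 5);
    assert(countLetters(text) == 4);
    assert(nextIndex(text, 3) == 5);      // skips both bytes of U+00E9
    assert(prevIndex(text, 5) == 3);      // back to the lead byte
}
```

This supports moving in both directions over a char[] without converting to dchar[], at the cost of doing the index bookkeeping by hand.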

/Oskar

Oct 02 2006

Hasan Aljudy <hasan.aljudy gmail.com> writes:

Oskar Linde wrote:
 Hasan Aljudy wrote:
 Derek Parnell wrote:
   foreach(int i, dchar c; text)
   {
        bool isLetter = isUniAlpha( c );
        ...
   }

 I know, but that's still a work-around. What if you need to iterate 
 back and forth? You're gonna need to convert it to dchar[] (or wchar[]).

 However, that brings up a good point:
 Notice how foreach lets you iterate over a string by Unicode characters
 (a.k.a. code points)? Shouldn't this kind of iteration be supported
 outside of foreach as well?

 
 see std.utf.decode and std.utf.stride.
 
 /Oskar

I have, and I know the functions are all there. But hey, the C
standard library also has all sorts of string processing functions.

I'm talking about the "built-in" string type, which doesn't really
exist, even though the spec claims it does.

Oct 02 2006
