
D - drop ASCII characters from D?

reply "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> writes:
Walter's comment in the "Delegates" thread about the code

    void foo(char[]);
    void foo(wchar[]);
    ...
    foo("hello");
being ambiguous made me wonder about the point of supporting ASCII
characters in the first place.



Why not drop "wchar" and make "char" always mean a 2-byte UNICODE character
(or even a 4-byte ISO10646 character)?  With the release of Windows XP last
fall, the need for ASCII support is going to diminish as WinXP replaces
Win98/WinME.



If you really need a single-byte character to interface with legacy APIs,
use "ubyte" (or "ulong" for "wchar_t" APIs on *IX) and convert to/from "char" as
needed.  Yeah, that makes such code more difficult, but it should all be
buried in some class anyway.



It seems that supporting both in D goes against current trends (Java, VB and
C# are all UNICODE-only); it also implicitly encourages the continued use of
ASCII, a decision that is usually regretted in real-world applications.



   Dan
Apr 05 2002
parent reply "Pavel Minayev" <evilone omen.ru> writes:
"J. Daniel Smith" <j_daniel_smith HoTMaiL.com> wrote in message
news:a8kprh$4h2$1 digitaldaemon.com...

 Walter's comment in the "Delegates" thread about the code

     void foo(char[]);
     void foo(wchar[]);
     ...
     foo("hello");
 being ambiguous made me wonder about the point of supporting ASCII
 characters in the first place.



 Why not drop "wchar" and make "char" always mean a 2-byte UNICODE character
 (or even a 4-byte ISO10646 character)?  With the release of Windows XP last
 fall, the need for ASCII support is going to diminish as WinXP replaces
 Win98/WinME.

I believe no more than 10% of my friends have WinNT, 2K or XP. Your suggestion would make it very hard to write programs that run on the 9x series, which is still the most popular.
 If you really need a single-byte character to interface with legacy APIs,
 use "ubyte" (or "ulong" for "wchar_t" APIs on *IX) and convert to/from "char" as
 needed.  Yeah, that makes such code more difficult, but it should all be
 buried in some class anyway.

You can convert a single char; but what about strings? D doesn't convert arrays, AFAIK...
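Pavel's point can be sketched as follows (in Python here, purely as an illustration; the `widen`/`narrow` helpers are hypothetical names, not anything from D's library): widening a single-byte string to wide characters is an explicit element-by-element copy, and narrowing back can lose information for non-ASCII code points.

```python
# Hypothetical sketch: a single char widens trivially, but a whole string
# needs an explicit per-element conversion, and narrowing back to a
# single-byte encoding can fail for characters outside ASCII.

def widen(narrow_str: bytes) -> str:
    """ASCII char[] -> wchar[]: one explicit copy per element."""
    return "".join(chr(b) for b in narrow_str)

def narrow(wide_str: str) -> bytes:
    """wchar[] -> char[]: reject characters that don't fit in ASCII."""
    if any(ord(c) > 127 for c in wide_str):
        raise ValueError("string not representable in ASCII")
    return bytes(ord(c) for c in wide_str)

print(widen(b"hello"))   # the same text, now as wide characters
print(narrow("hello"))   # back to single-byte characters
```

This is exactly the kind of loop that, as Dan says, would end up buried in some class; the question in the thread is whether the language should do it implicitly for arrays.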
 It seems that supporting both in D goes against current trends (Java, VB and
 C# are all UNICODE-only); it also implicitly encourages the continued use of
 ASCII, a decision that is usually regretted in real-world applications.
VB is bloated, partially because of UNICODE-only strings. Java is
platform-independent and doesn't really care about the underlying system.
C# is Microsoft's reply to Java, and is bloated as well.

D is a practical tool. Since most systems and most programs today still
work with ASCII strings, they should be in the language.
Apr 05 2002
parent reply "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> writes:
Today there are still a lot of Win9x/WinME boxes out there, but that's not
going to be the case for long.  I don't know what Walter's timeline is for
officially releasing D to the world, but let's just say it's 1-Jan-2003.
Add another year onto that for people to actually start adopting the
language and developing/shipping programs en masse and we're at 2004.  I
think the Win9x/WinME numbers will look a lot different in 18+ months.  I
don't think it's much of a stretch to say that in the not-too-distant
future, ASCII will largely be considered legacy.

If you don't want to drop ASCII support completely from D, how about making
it (much) more difficult to use by making UNICODE the default? "char" is
UNICODE, "achar" is ASCII; a string/character literal is UNICODE, you have
to use an ugly A prefix to get ASCII; there are no implicit conversions
between UNICODE/ASCII - you've got to call some library routine (or maybe
cast) instead.

D is a new language, it should look to the future.

   Dan

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a8kttp$ued$1 digitaldaemon.com...
 "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> wrote in message
 news:a8kprh$4h2$1 digitaldaemon.com...

 Walter's comment in the "Delegates" thread about the code

     void foo(char[]);
     void foo(wchar[]);
     ...
     foo("hello");
 being ambiguous made me wonder about the point of supporting ASCII
 characters in the first place.



 Why not drop "wchar" and make "char" always mean a 2-byte UNICODE character
 (or even a 4-byte ISO10646 character)?  With the release of Windows XP last
 fall, the need for ASCII support is going to diminish as WinXP replaces
 Win98/WinME.

I believe no more than 10% of my friends have WinNT, 2K or XP. Your suggestion would make it very hard to write programs that run on the 9x series, which is still the most popular.
 If you really need a single-byte character to interface with legacy APIs,
 use "ubyte" (or "ulong" for "wchar_t" APIs on *IX) and convert to/from "char" as
 needed.  Yeah, that makes such code more difficult, but it should all be
 buried in some class anyway.

You can convert a single char; but what about strings? D doesn't convert arrays, AFAIK...
 It seems that supporting both in D goes against current trends (Java, VB and
 C# are all UNICODE-only); it also implicitly encourages the continued use of
 ASCII, a decision that is usually regretted in real-world applications.

 VB is bloated, partially because of UNICODE-only strings. Java is
 platform-independent and doesn't really care of underlying system.
 C# is Microsoft's reply to Java, and is bloated as well.

 D is a practical tool. Since most systems and most programs today still
 work with ASCII strings, they should be in the language.

Apr 05 2002
next sibling parent reply roland <nancyetroland free.fr> writes:
"J. Daniel Smith" wrote:

 Today there are still a lot of Win9x/WinME boxes out there, but that's not
 going to be the case for long.  I don't know what Walter's timeline is for
 officially releasing D to the world, but let's just say it's 1-Jan-2003.
 Add another year onto that for people to actually start adopting the
 language and developing/shipping programs en-masse and we're to 2004.  I
 think the Win9x/WinME numbers will look a lot different in 18+ months.  I
 don't think it's much of a stretch to say that in the not too distant
 future, ASCII will largely be considered legacy.

 If you don't want to drop ASCII support completely from D, how about making
 it (much) more difficult to use by making UNICODE the default? "char" is
 UNICODE, "achar" is ASCII; a string/character literal is UNICODE, you have
 to use an ugly A prefix to get ASCII; there are no implicit conversions
 between UNICODE/ASCII - you've got to call some library routine (or maybe
 cast) instead.

 D is a new language, it should look to the future.

    Dan

 "Pavel Minayev" <evilone omen.ru> wrote in message
 news:a8kttp$ued$1 digitaldaemon.com...
 "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> wrote in message
 news:a8kprh$4h2$1 digitaldaemon.com...

 Walter's comment in the "Delegates" thread about the code

     void foo(char[]);
     void foo(wchar[]);
     ...
     foo("hello");
 being ambiguous made me wonder about the point of supporting ASCII
 characters in the first place.



 Why not drop "wchar" and make "char" always mean a 2-byte UNICODE character
 (or even a 4-byte ISO10646 character)?  With the release of Windows XP last
 fall, the need for ASCII support is going to diminish as WinXP replaces
 Win98/WinME.

I believe no more than 10% of my friends have WinNT, 2K or XP. Your suggestion would make it very hard to write programs that run on the 9x series, which is still the most popular.
 If you really need a single-byte character to interface with legacy APIs,
 use "ubyte" (or "ulong" for "wchar_t" APIs on *IX) and convert to/from "char" as
 needed.  Yeah, that makes such code more difficult, but it should all be
 buried in some class anyway.

You can convert a single char; but what about strings? D doesn't convert arrays, AFAIK...
 It seems that supporting both in D goes against current trends (Java, VB and
 C# are all UNICODE-only); it also implicitly encourages the continued use of
 ASCII, a decision that is usually regretted in real-world applications.

 VB is bloated, partially because of UNICODE-only strings. Java is
 platform-independent and doesn't really care of underlying system.
 C# is Microsoft's reply to Java, and is bloated as well.

 D is a practical tool. Since most systems and most programs today still
 work with ASCII strings, they should be in the language.


How does Linux handle character size? I personally see my future nearer Linux than XP.

roland
Apr 05 2002
parent reply "Walter" <walter digitalmars.com> writes:
"roland" <nancyetroland free.fr> wrote in message
news:3CAE2667.1375F510 free.fr...
 How does Linux handle character size?
 I personally see my future nearer Linux than XP.

Linux uses 4 byte wchars. This uses up memory real fast. Unicode may be the future, but it is still many years away, and D should be agnostic about whether the app is ASCII or Unicode.
Apr 05 2002
next sibling parent reply "OddesE" <OddesE_XYZ hotmail.com> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a8lbqb$1d7s$1 digitaldaemon.com...
 "roland" <nancyetroland free.fr> wrote in message
 news:3CAE2667.1375F510 free.fr...
 How does Linux handle character size?
 I personally see my future nearer Linux than XP.

 Linux uses 4 byte wchars. This uses up memory real fast. Unicode may be the
 future, but it is still many years away, and D should be agnostic about
 whether the app is ASCII or Unicode.

I think memory shouldn't be a concern. I don't think text, as in characters
and strings of characters, is the real memory user in today's computing, is
it? A typical e-mail or document isn't big at all. It's things like images,
textures in games, and audio and video that consume most space on disk or
in memory.

I agree that ASCII, although it isn't dead, deserves to die. I think the
idea to make 32-bit characters the standard is a good one, although I have
to admit I don't know much about the standardisation that is going on in
that field. But I do know that 256 characters is way too little! Maybe it
is just too early to make a decision, when the standardisation hasn't
settled down...

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
Apr 06 2002
parent "Walter" <walter digitalmars.com> writes:
"OddesE" <OddesE_XYZ hotmail.com> wrote in message
news:a8n3tp$18pi$1 digitaldaemon.com...
 I think memory shouldn't be a concern. I
 don't think text, as in characters and
 strings of characters, is the real memory
 user in today's computing is it? A
 typical e-mail or document isn't big at
 all. It's things like images, textures in
 games and audio and video that consume
 most space on disk or in memory.

It still is a concern. I have an app on linux with wchars, and it still uses 200 megs of ram, mostly because of the 4 bytes per char. Secondly, if you're distributing an executable with a lot of text strings in it, it can bloat up the download size quite a bit. I can also neatly fit all my source code on a CD. I don't want it 4 times bigger <g>.
 I agree that ASCII, although it isn't
 dead, deserves to die. I think the idea
 to make 32-bit characters the standard
 is a good one, although I have to admit
 I don't know much about the standardisation
 that is going on in that field.
 But I do know that 256 characters is way
 too little!
 maybe it is just too early to make a
 decision, when the standardisation
 hasn't settled down...

Another huge reason to support ASCII in D is because D is meant to interface with C functions. C apps are nearly all written to use ASCII. ASCII support isn't going away anytime soon in operating systems, so D must support it easily.
Apr 06 2002
prev sibling parent reply "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> writes:
So what about my suggestion of making ASCII a bit more difficult to use -
that is, Unicode is the preferred/default character type in D.  'char' is a
Unicode character and "abc" is a Unicode string.

With the release of Windows XP, it's not going to be very long (months, not
years) before a Unicode-enabled platform is the norm for most people.

And I'm not sure I buy the "memory" argument - my PocketPC, which is easily
more memory constrained than any desktop PC, only supports Unicode.

   Dan

"Walter" <walter digitalmars.com> wrote in message
news:a8lbqb$1d7s$1 digitaldaemon.com...
 "roland" <nancyetroland free.fr> wrote in message
 news:3CAE2667.1375F510 free.fr...
 how is linux concerning caractere size ?
 i personaly see my future rather near linux than XP

 Linux uses 4 byte wchars. This uses up memory real fast. Unicode may be the
 future, but it is still many years away, and D should be agnostic about
 whether the app is ASCII or Unicode.

Apr 08 2002
parent reply "Walter" <walter digitalmars.com> writes:
"J. Daniel Smith" <j_daniel_smith HoTMaiL.com> wrote in message
news:a8s2ng$1jg0$1 digitaldaemon.com...
 So what about my suggestion of making ASCII a bit more difficult to use -
 that is, Unicode is the prefered/default character type in D.  'char' is a
 Unicode character and "abc" is a Unicode string.

There is no default char type in D. char is ascii, wchar is unicode. The type of "abc" depends on context. The source text can be ascii or unicode (try it!).
 With the release of Windows XP, it's not going to be very long (months, not
 years) before a Unicode-enabled platform is the norm for most people.

All win32 platforms support unicode already.
 And I'm not sure I buy the "memory" argument - my PocketPC, which is easily
 more memory constrained than any desktop PC, only supports Unicode.

You can shrink down the memory for unicode quite a bit by using UTF8, at the expense of slowing things down.
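Walter's trade-off is easy to quantify. A rough illustration (Python is used here just to count bytes; the exact figures depend on the text being mostly ASCII):

```python
# Byte cost of the same text under three encodings, showing why UTF-8
# saves memory for ASCII-heavy text while 4-byte wide chars (as on
# Linux) quadruple it.
text = "hello, world"  # 12 ASCII characters

utf8 = len(text.encode("utf-8"))       # 1 byte per ASCII character
utf16 = len(text.encode("utf-16-le"))  # 2 bytes per character here
utf32 = len(text.encode("utf-32-le"))  # 4 bytes per character, like a Linux wchar

print(utf8, utf16, utf32)  # 12 24 48
```

The flip side, as Walter notes, is speed: UTF-8's variable width means indexing the n-th character or taking a length in characters becomes a scan over the bytes rather than simple pointer arithmetic.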
Apr 08 2002
parent reply "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> writes:
But then aren't we full-circle back to
    void foo(char[]);
    void foo(wchar[]);
    ...
    foo("hello");
being ambiguous?  Although it sounds like you can (almost?) support both
Unicode and ASCII transparently, I'd still like to see Unicode implicitly
given more emphasis; for example, the above code snippet would be
    void foo(achar[]);    // ASCII
    void foo(char[]);    // Unicode
    ...
    foo("hello");    // Unicode string - calls foo(char[])
    foo(A"hello");    // ASCII string - calls foo(achar[])

As far as Win32 platforms go, I guess it depends on what one means by
"supporting Unicode."  Only a small handful of Win32 APIs are Unicode on
Win9x (although the recently released MSLU expands that list considerably).

Dell has a complete 1.8GHz system with 256MB of RAM for $999; given numbers
like that, I'm not overly concerned with either processing power or memory.

   Dan

"Walter" <walter digitalmars.com> wrote in message
news:a8t2j5$1mc$1 digitaldaemon.com...
 "J. Daniel Smith" <j_daniel_smith HoTMaiL.com> wrote in message
 news:a8s2ng$1jg0$1 digitaldaemon.com...
 So what about my suggestion of making ASCII a bit more difficult to use -
 that is, Unicode is the preferred/default character type in D.  'char' is a
 Unicode character and "abc" is a Unicode string.

There is no default char type in D. char is ascii, wchar is unicode. The type of "abc" depends on context. The source text can be ascii or unicode (try it!).
 With the release of Windows XP, it's not going to be very long (months, not
 years) before a Unicode-enabled platform is the norm for most people.

All win32 platforms support unicode already.
 And I'm not sure I buy the "memory" argument - my PocketPC, which is easily
 more memory constrained than any desktop PC, only supports Unicode.

 You can shrink down the memory for unicode quite a bit by using UTF8, at the
 expense of slowing things down.

Apr 09 2002
parent "Walter" <walter digitalmars.com> writes:
"J. Daniel Smith" <j_daniel_smith HoTMaiL.com> wrote in message
news:a8und0$3e7$1 digitaldaemon.com...
 But then aren't we full-circle back to
     void foo(char[]);
     void foo(wchar[]);
     ...
     foo("hello");
 being ambiguous?

Yes, but that doesn't in any way impede a programmer who wants to write a full unicode app.
Apr 09 2002
prev sibling parent "Pavel Minayev" <evilone omen.ru> writes:
"J. Daniel Smith" <j_daniel_smith HoTMaiL.com> wrote in message
news:a8l5o5$17di$1 digitaldaemon.com...

 D is a new language, it should look to the future.

Unicode is not necessarily the future. Not in the next few years, at least. And after all, you can always use alias in your programs:

    alias wchar Char;
Apr 05 2002