digitalmars.D - Whitespace for Walter

Arcane Jill <Arcane_member pathlink.com> writes:
Another mad suggestion coming up, but this one might actually make some sort of
sense.

Unicode whitespace is defined as any of the following characters, and no other:

0009..000D    <control-0009>..<control-000D>
0020          SPACE
0085          <control-0085>
00A0          NO-BREAK SPACE
1680          OGHAM SPACE MARK
180E          MONGOLIAN VOWEL SEPARATOR
2000..200A    EN QUAD..HAIR SPACE
2028          LINE SEPARATOR
2029          PARAGRAPH SEPARATOR
202F          NARROW NO-BREAK SPACE
205F          MEDIUM MATHEMATICAL SPACE
3000          IDEOGRAPHIC SPACE

How straightforward would it be to allow the DMD compiler to accept /precisely/
this list as whitespace in a D source file?
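
For illustration, the check itself is small. Below is a minimal sketch of a predicate over exactly the code points listed above. It is written in Java only because that is the language this thread compares against; the class and method names (UnicodeWhiteSpace, isUnicodeWhiteSpace) are hypothetical and are not part of DMD, Phobos, or the JDK.

```java
// Hypothetical helper: mirrors the whitespace list quoted in this post.
// It is an illustration only, not how DMD's lexer is implemented.
final class UnicodeWhiteSpace {
    private UnicodeWhiteSpace() {}

    /** Returns true if the code point is in the list given in this post. */
    static boolean isUnicodeWhiteSpace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D)   // control characters TAB..CR
            || cp == 0x0020                     // SPACE
            || cp == 0x0085                     // <control-0085>
            || cp == 0x00A0                     // NO-BREAK SPACE
            || cp == 0x1680                     // OGHAM SPACE MARK
            || cp == 0x180E                     // MONGOLIAN VOWEL SEPARATOR
            || (cp >= 0x2000 && cp <= 0x200A)   // EN QUAD..HAIR SPACE
            || cp == 0x2028                     // LINE SEPARATOR
            || cp == 0x2029                     // PARAGRAPH SEPARATOR
            || cp == 0x202F                     // NARROW NO-BREAK SPACE
            || cp == 0x205F                     // MEDIUM MATHEMATICAL SPACE
            || cp == 0x3000;                    // IDEOGRAPHIC SPACE
    }

    public static void main(String[] args) {
        // A few sample code points, including two that are not in the list.
        int[] samples = {0x0020, 0x00A0, 0x3000, 0x200B, 'x'};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %b%n", cp, isUnicodeWhiteSpace(cp));
        }
    }
}
```

A lexer would presumably decode the source to code points first and then call a test like this wherever it currently checks for ASCII whitespace.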

Java got itself into a bit of a pickle by defining whitespace differently from
Unicode. They ended up having to have two separate functions (which from memory
I think are called isWhitespace() and isJavaWhitespace(), but I could be wrong).

It would be quite cool to have D whitespace and Unicode whitespace as one and
the same thing, don't you think?

Arcane Jill

PS. I /don't/ recommend changing the value of const char[] whitespace; in
std.string, however. To do so would set an AWFUL precedent which const char
letters would NOT want to follow. You might, however, consider renaming those
constants to ASCII_WHITESPACE, ASCII_LETTERS, etc., once the new Unicode stuff
is up.
Jun 26 2004
"Phill" <phill pacific.net.au> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cbkqcc$usj$1 digitaldaemon.com...
 Another mad suggestion coming up, but this one might actually make some sort of
 sense.

 Unicode whitespace is defined as any of the following characters, and no other:
 0009..000D    <control-0009>..<control-000D>
 0020          SPACE
 0085          <control-0085>
 00A0          NO-BREAK SPACE
 1680          OGHAM SPACE MARK
 180E          MONGOLIAN VOWEL SEPARATOR
 2000..200A    EN QUAD..HAIR SPACE
 2028          LINE SEPARATOR
 2029          PARAGRAPH SEPARATOR
 202F          NARROW NO-BREAK SPACE
 205F          MEDIUM MATHEMATICAL SPACE
 3000          IDEOGRAPHIC SPACE

 How straightforward would it be to allow the DMD compiler to accept /precisely/
 this list as whitespace in a D source file?

 Java got itself into a bit of a pickle by defining whitespace differently from
 Unicode. They ended up having to have two separate functions (which from memory
 I think are called isWhitespace() and isJavaWhitespace(), but I could be wrong).

In Java:
Character.isSpace(char ch)
is deprecated and replaced by
Character.isWhitespace(char ch)
Also
Character.isSpaceChar(char ch)
for the Unicode space char.

There is no "isJavaWhitespace()" or
"Whitespace()"

There is Character.isJavaLetterOrDigit(char ch), which
is deprecated; maybe you were confusing it with this.
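
To make the difference concrete, here is a small sketch that runs a few code points from the quoted list through the two non-deprecated java.lang.Character checks, isWhitespace and isSpaceChar. The class name JavaSpaceChecks and the describe helper are made up for this example; only the two Character methods are real JDK calls.

```java
// Hypothetical demo class; only Character.isWhitespace and
// Character.isSpaceChar are real JDK methods.
final class JavaSpaceChecks {
    private JavaSpaceChecks() {}

    /** Formats one code point together with the result of both checks. */
    static String describe(int cp) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                cp, Character.isWhitespace(cp), Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        // TAB, SPACE, <control-0085>, NO-BREAK SPACE, LINE SEPARATOR
        int[] samples = {0x0009, 0x0020, 0x0085, 0x00A0, 0x2028};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

Neither method matches the quoted list exactly. For example, NO-BREAK SPACE (00A0) is a space char but not Java whitespace, TAB (0009) is Java whitespace but not a space char, and <control-0085> is rejected by both.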

Phill.
Jun 26 2004
"Walter" <newshound digitalmars.com> writes:
I think it's a good idea.
Jun 26 2004