digitalmars.D - identifiers & "unialpha"

Thomas Kuehne <thomas-dloop kuehne.cn> writes:


http://www.digitalmars.com/d/lex.html#identifier





Why does D reference "ISO/IEC 9899:1999 (E) Appendix D" to define
"universal alpha"? "ISO/IEC 9899:1999 (E) Appendix D" doesn't list
"universal alpha" characters.

Sample:
\u00B7 (MIDDLE DOT, Other_Punctuation) isn't a "universal alpha" but
is allowed in identifiers by Appendix D.
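
The classification of \u00B7 can be checked against the Unicode Character Database. Below is a minimal sketch in Python (not D), using the standard `unicodedata` module. The helper name `describe` is made up for illustration, and the result reflects whatever Unicode version ships with the interpreter, not the 2006 data.

```python
import unicodedata

def describe(ch):
    """Return (character name, general category) for a single character."""
    return unicodedata.name(ch), unicodedata.category(ch)

# MIDDLE DOT is punctuation (category "Po", Other_Punctuation), not a letter.
print(describe("\u00B7"))
# An ordinary lowercase letter, for comparison.
print(describe("a"))
```

Categories beginning with "L" are letters; "Po" is not one of them, which is the mismatch described above.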

"ISO/IEC 9899:1999 (E) Appendix D" itself references
"ISO/IEC TR 10176:1998" for the character data. I strongly suggest
dropping the redirection via "Appendix D" and using
"ISO/IEC TR 10176 (current)" instead of the dated version
"ISO/IEC TR 10176:1998". The 1998 version didn't yet include quite a
chunk of CJK and Math characters that can be found in the current version.

Thomas



Sep 22 2006

Sean Kelly <sean f4.ca> writes:

Thomas Kuehne wrote:
 
 http://www.digitalmars.com/d/lex.html#identifier




 
 Why is D referencing "ISO/IEC 9899:1999 (E) Appendix D" for defining
 "universal alpha"? "ISO/IEC 9899:1999 (E) Appendix D" isn't listing
 "universal alpha".
 
 Sample:
 \u00B7 (MIDDLE DOT, Other_Punctuation) isn't an "universal alpha" but
 allowed by Appendix D in identifiers.
 
 "ISO/IEC 9899:1999 (E) Appendix D" itself is referencing
 "ISO/IEC TR 10176:1998" for the character data. I strongly suggest to
 drop the redirection via "Appendix D" and use
 "ISO/IEC TR 10176 (current)" instead of the dated version
 "ISO/IEC TR 10176:1998". The 1998 version didn't yet include quite a
 chunk of CJK and Math characters that can be found in the current version.

Agreed.  Incidentally, the 2003 revision to the C++ standard ("ISO/IEC 
14882:2003(E)"), Appendix E, contains a revised copy of the character 
table (which is likely from "ISO/IEC TR 10176:2003") and appears to have 
done away with the "special characters" section entirely.  So I suspect 
your suggestion would eliminate the problem you mention above as well?


Sean

Sep 22 2006

Thomas Kuehne <thomas-dloop kuehne.cn> writes:


Sean Kelly schrieb am 2006-09-22:
 Thomas Kuehne wrote:
 
 http://www.digitalmars.com/d/lex.html#identifier




 
 Why is D referencing "ISO/IEC 9899:1999 (E) Appendix D" for defining
 "universal alpha"? "ISO/IEC 9899:1999 (E) Appendix D" isn't listing
 "universal alpha".
 
 Sample:
 \u00B7 (MIDDLE DOT, Other_Punctuation) isn't an "universal alpha" but
 allowed by Appendix D in identifiers.
 
 "ISO/IEC 9899:1999 (E) Appendix D" itself is referencing
 "ISO/IEC TR 10176:1998" for the character data. I strongly suggest to
 drop the redirection via "Appendix D" and use
 "ISO/IEC TR 10176 (current)" instead of the dated version
 "ISO/IEC TR 10176:1998". The 1998 version didn't yet include quite a
 chunk of CJK and Math characters that can be found in the current version.

 Agreed.  Incidentally, the 2003 revision to the C++ standard ("ISO/IEC 
 14882:2003(E)"), Appendix E, contains a revised copy of the character 
 table (which is likely from "ISO/IEC TR 10176:2003") and appears to have 
 done away with the "special characters" section entirely.  So I suspect 
 your suggestion would eliminate the problem you mention above as well?

Yes. How about this rewrite:






















Accessing ISO standards can be complicated. Here are the cross-references 
for Unicode's UnicodeData.txt. For the relation between Unicode and 
ISO 10176 see 
http://en.wikipedia.org/wiki/ISO/IEC_10646#Differences_between_ISO_10646_and_Unicode

Letters:
	Uppercase_Letter (Lu)
	Lowercase_Letter (Ll)
	Titlecase_Letter (Lt)
	Modifier_Letter (Lm)
	Other_Letter (Lo)

NonspacingMarks:
	Nonspacing_Mark (Mn)

Numbers:
	Decimal_Number (Nd)
	Letter_Number (Nl)
	Other_Number (No)
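
As a rough illustration of how the three groups above map onto UnicodeData.txt general categories, here is a minimal Python sketch. The function name `group_of` and the set names are hypothetical, the grouping simply mirrors the list above, and the standard `unicodedata` module uses the interpreter's bundled Unicode version rather than the 2006 one.

```python
import unicodedata

# General-category abbreviations, grouped exactly as in the list above.
LETTERS = {"Lu", "Ll", "Lt", "Lm", "Lo"}
NONSPACING_MARKS = {"Mn"}
NUMBERS = {"Nd", "Nl", "No"}

def group_of(ch):
    """Return the group name for a character, or None if it is in no group."""
    cat = unicodedata.category(ch)
    if cat in LETTERS:
        return "Letters"
    if cat in NONSPACING_MARKS:
        return "NonspacingMarks"
    if cat in NUMBERS:
        return "Numbers"
    return None

for ch in ("A", "\u0301", "5", "\u00B7"):
    print(f"U+{ord(ch):04X}", group_of(ch))
```

With this grouping, \u00B7 (MIDDLE DOT) falls into none of the three groups.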

Thomas



Sep 22 2006

Walter Bright <newshound digitalmars.com> writes:

Thomas Kuehne wrote:
 
 http://www.digitalmars.com/d/lex.html#identifier




 
 Why is D referencing "ISO/IEC 9899:1999 (E) Appendix D" for defining
 "universal alpha"? "ISO/IEC 9899:1999 (E) Appendix D" isn't listing
 "universal alpha".
 
 Sample:
 \u00B7 (MIDDLE DOT, Other_Punctuation) isn't an "universal alpha" but
 allowed by Appendix D in identifiers.
 
 "ISO/IEC 9899:1999 (E) Appendix D" itself is referencing
 "ISO/IEC TR 10176:1998" for the character data. I strongly suggest to
 drop the redirection via "Appendix D" and use
 "ISO/IEC TR 10176 (current)" instead of the dated version
 "ISO/IEC TR 10176:1998". The 1998 version didn't yet include quite a
 chunk of CJK and Math characters that can be found in the current version.

I'd like to leave things as they are for 1.0. I don't think that 
anyone's code will be adversely affected by not having the latest alpha 
character additions to identifiers, and I also don't think math 
characters should be part of identifiers. What is CJK?

As it is now, it matches standard C's definition of identifiers, which 
is the intent of the reference. I haven't checked, but I think it 
matches Java's idea of an identifier character, too.

P.S. It also bugs me that the Unicode people can't seem to make up their 
minds. Do character sets really need to change every 2 or 3 years?

Sep 22 2006

Pragma <ericanderton yahoo.removeme.com> writes:

 Thomas Kuehne wrote:
 What is CJK?


Just a guess: "Chinese, Japanese & Korean"?

- Eric

Sep 22 2006

nobody <nobody mailinator.com> writes:

Pragma wrote:
 Thomas Kuehne wrote:
 What is CJK?


 
 Just a guess: "Chinese, Japanese & Korean"?
 
 - Eric

Your guess is correct. Wikipedia does a great job explaining CJK:

http://en.wikipedia.org/wiki/CJK

Sep 22 2006

Thomas Kuehne <thomas-dloop kuehne.cn> writes:


Walter Bright schrieb am 2006-09-22:
 Thomas Kuehne wrote:
 
 http://www.digitalmars.com/d/lex.html#identifier




 
 Why is D referencing "ISO/IEC 9899:1999 (E) Appendix D" for defining
 "universal alpha"? "ISO/IEC 9899:1999 (E) Appendix D" isn't listing
 "universal alpha".
 
 Sample:
 \u00B7 (MIDDLE DOT, Other_Punctuation) isn't an "universal alpha" but
 allowed by Appendix D in identifiers.
 
 "ISO/IEC 9899:1999 (E) Appendix D" itself is referencing
 "ISO/IEC TR 10176:1998" for the character data. I strongly suggest to
 drop the redirection via "Appendix D" and use
 "ISO/IEC TR 10176 (current)" instead of the dated version
 "ISO/IEC TR 10176:1998". The 1998 version didn't yet include quite a
 chunk of CJK and Math characters that can be found in the current version.

 I'd like to leave things as they are for 1.0. I don't think that 
 anyone's code will be adversely affected by not having the latest alpha 
 character additions to identifiers, and I also don't think math 
 characters should be part of identifiers. What is CJK?

CJK: Chinese, Japanese & Korean
0x20000 .. 0x2A6D6 CJK Ideograph Extension B
0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS

 As it is now, it matches standard C's definition of identifiers, which 
 is the intent of the reference. I haven't checked, but I think it 
 matches Java's idea of an identifier character, too.

ISO/IEC 9899:1999 (E) Appendix D



Whereas Appendix D defines valid characters in identifiers, D uses it
as a source for "universal alpha". As a consequence std.uni.isUniAlpha
claims that \u00B7 (MIDDLE DOT) is a letter...

 P.S. It also bugs me that the unicode people can't seem to make up their 
 minds. Do character sets really need to change every 2 or 3 years?

Task at hand: Create a table of all characters used by humans all over
the world and minimize friction due to political issues
(e.g. characters' names). Except for bug fixes (typos...), the Unicode people
usually only extend previous versions of the standard.

Thomas



Sep 22 2006

Walter Bright <newshound digitalmars.com> writes:

Thomas Kuehne wrote:
 Walter Bright schrieb am 2006-09-22:
 What is CJK?

 
 CJK: Chinese, Japanese & Korean
 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS

Thank you.

 As it is now, it matches standard C's definition of identifiers, which 
 is the intent of the reference. I haven't checked, but I think it 
 matches Java's idea of an identifier character, too.

 
 ISO/IEC 9899:1999 (E) Appendix D


 
 Whereas Appendix D defines valid characters in identifiers, D uses it
 as a source for "universal alpha". As a consequence std.uni.isUniAlpha
 claims that \u00B7 (MIDDLE DOT) is a letter...

I guess I don't see why C99 would say \u00B7 is a valid identifier character 
if it isn't an alpha. It's all confusing to me, and I think needlessly 
complicated. Is \u00B7 the only difference?

 
 P.S. It also bugs me that the unicode people can't seem to make up their 
 minds. Do character sets really need to change every 2 or 3 years?

 
 Task at hand: Create a table of all characters used by humans all over
 the world and minimize friction due to political issues
 (e.g. characters' names). Except for bug fixes (typos...) the unicode people
 usually only extend previous versions of the standard.

Chinese, Japanese, and Korean are hardly obscure so I don't see why the 
character sets for them seem to need large numbers of additions this 
late in the game.

Sep 22 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 Thomas Kuehne wrote:
 ISO/IEC 9899:1999 (E) Appendix D



 Whereas Appendix D defines valid characters in identifiers, D uses it
 as a source for "universal alpha". As a consequence std.uni.isUniAlpha
 claims that \u00B7 (MIDDLE DOT) is a letter...

 
 I guess I don't see why C99 would say . is a valid identifier character 
 if it isn't an alpha. It's all confusing to me, and I think needlessly 
 complicated. Is \u00B7 the only difference?

No, there are other differences as well.  I think C99 was simply 
referring to the latest version of the document available in 1999, and 
it has since been revised (in 2003, apparently).  But I have no idea why 
characters present in the 1999 doc are not present in the 2003 doc.  To 
pass the buck even further, "ISO/IEC TR 10176:2003" Annex A says the 
following:

     This list comprises the letters (combining or not), syllables, and
     ideographs from ISO/IEC 10646-1, together with the modifier letters
     and marks conventionally used as parts of words.

So their list of characters is copied from the Unicode standard (ISO/IEC 
10646).  I can only conclude that the Unicode standard changed between 
1999 and 2003 and ISO/IEC 10176 simply incorporated the new list.  But who 
knows why the list was changed.

This does raise an interesting point, however.  Since the C and C++ 
standards separately refer to ISO/IEC 10176 for their character list, the 
identifiers a compliant C99 and C++2003 compiler should accept are 
different.  This seems contrary to the usual C++ practice of deferring 
to the C standard on semantic issues.

 P.S. It also bugs me that the unicode people can't seem to make up 
 their minds. Do character sets really need to change every 2 or 3 years?

 Task at hand: Create a table of all characters used by humans all over
 the world and minimize friction due to political issues
 (e.g. characters' names). Except for bug fixes (typos...) the unicode 
 people
 usually only extend previous versions of the standard.

 
 Chinese, Japanese, and Korean are hardly obscure so I don't see why the 
 character sets for them seem to need large numbers of additions this 
 late in the game.

Me either.  But then I'm not terribly inclined to read the Unicode 
standards committee minutes to find out either :-)


Sean

Sep 22 2006

Thomas Kuehne <thomas-dloop kuehne.cn> writes:


Walter Bright schrieb am 2006-09-22:
 Thomas Kuehne wrote:
 Walter Bright schrieb am 2006-09-22:
 What is CJK?

 
 CJK: Chinese, Japanese & Korean
 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS

 Thank-you.

 As it is now, it matches standard C's definition of identifiers, which 
 is the intent of the reference. I haven't checked, but I think it 
 matches Java's idea of an identifier character, too.

 
 ISO/IEC 9899:1999 (E) Appendix D


 
 Whereas Appendix D defines valid characters in identifiers, D uses it
 as a source for "universal alpha". As a consequence std.uni.isUniAlpha
 claims that \u00B7 (MIDDLE DOT) is a letter...

 I guess I don't see why C99 would say . is a valid identifier character 
 if it isn't an alpha. It's all confusing to me, and I think needlessly 
 complicated. Is \u00B7 the only difference?

No, see attachment.
Format: "[first_in_range, last_in_range],"
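
A table in that "[first_in_range, last_in_range]," form lends itself to a binary search. The Python sketch below assumes the rows are sorted and non-overlapping. The rows shown are placeholders (ASCII letters plus the two CJK ranges quoted earlier in this thread), not the contents of the attachment, and the names `TABLE`, `STARTS` and `in_table` are made up for illustration.

```python
import bisect

# Placeholder rows only: sorted, non-overlapping [first, last] pairs.
TABLE = [
    [0x0041, 0x005A],    # ASCII capital letters
    [0x0061, 0x007A],    # ASCII small letters
    [0x20000, 0x2A6D6],  # CJK Ideograph Extension B
    [0x2F800, 0x2FA1D],  # CJK compatibility ideographs (supplement)
]

# First code point of every row, used as the search key.
STARTS = [first for first, _ in TABLE]

def in_table(cp):
    """Return True if code point cp lies inside one of the ranges."""
    i = bisect.bisect_right(STARTS, cp) - 1  # last row starting at or before cp
    return i >= 0 and TABLE[i][0] <= cp <= TABLE[i][1]

print(in_table(0x20000))  # inside Extension B
print(in_table(0x00B7))   # MIDDLE DOT is in none of the placeholder rows
```

Storing ranges rather than single code points keeps the table small even when a block such as Extension B covers tens of thousands of characters.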

Thomas

[uuencoded attachment: isalpha.zip]


Sep 22 2006

Kevin Bealer <kevinbealer gmail.com> writes:

Walter Bright wrote:
 Thomas Kuehne wrote:
 Walter Bright schrieb am 2006-09-22:
 What is CJK?

 CJK: Chinese, Japanese & Korean
 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS

 
 Thank-you.
 

...
 Task at hand: Create a table of all characters used by humans all over
 the world and minimize friction due to political issues
 (e.g. characters' names). Except for bug fixes (typos...) the unicode 
 people
 usually only extend previous versions of the standard.

 
 Chinese, Japanese, and Korean are hardly obscure so I don't see why the 
 character sets for them seem to need large numbers of additions this 
 late in the game.

I think the big-alphabet languages tend to coin new letters somewhat 
like other languages do words (but maybe less frequently), but I'm not 
sure about that.

I have heard, though, that Chinese was simplified to a smaller set with 
different appearances during the revolution and the various political 
upheavals since.  They have been adding letters back since as they 
discover they are really needed -- so these get put into Unicode.

If you've read "1984" by Orwell, it's something like the motivation for 
Newspeak.  Old literature is written in the old letters, and is 
disappearing because the public can't read it.

It's a kind of history censorship - you can't translate the old Chinese 
literature because they want to destroy the old culture as it competes 
philosophically with Communism.

Essentially, they didn't have to burn all the old books -- they just 
burned all the old printing presses.

Kevin

Sep 23 2006

Kristian <kjkilpi gmail.com> writes:

On Sat, 23 Sep 2006 11:40:08 +0300, Kevin Bealer <kevinbealer gmail.com>  
wrote:

 Walter Bright wrote:
 Thomas Kuehne wrote:
 Walter Bright schrieb am 2006-09-22:
 What is CJK?

 CJK: Chinese, Japanese & Korean
 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS

  Thank-you.

 ...
 Task at hand: Create a table of all characters used by humans all over
 the world and minimize friction due to political issues
 (e.g. characters' names). Except for bug fixes (typos...) the unicode  
 people
 usually only extend previous versions of the standard.

  Chinese, Japanese, and Korean are hardly obscure so I don't see why  
 the character sets for them seem to need large numbers of additions  
 this late in the game.

 I think the big-alphabet languages tend to coin new letters somewhat  
 like other languages do words (but maybe less frequently), but I'm not  
 sure about that.

 I have heard, though, that Chinese was simplified to a smaller set with  
 different appearances during the revolution and the various political  
 upheavals since.  They have been adding letters back since as they  
 discover they are really needed -- so these get put into Unicode.

 If you've read "1984" by Orwell, it's something like the motivation for  
 NewSpeak.  Old literature is written in the old letters, and is  
 disappearing because the public can't read it.





 It's a kind of history censorship - you can't translate the old Chinese  
 literature because they want to destroy the old culture as it competes  
 philosophically with Communism.

 Essentially, they didn't have to burn all the old books -- they just  
 burned all the old printing presses.

 Kevin

If that's the case, I'm very sorry to hear that! :(

Sep 23 2006

Sean Kelly <sean f4.ca> writes:

Kristian wrote:
 On Sat, 23 Sep 2006 11:40:08 +0300, Kevin Bealer <kevinbealer gmail.com> 
 wrote:
 
 It's a kind of history censorship - you can't translate the old 
 Chinese literature because they want to destroy the old culture as it 
 competes philosophically with Communism.

 Essentially, they didn't have to burn all the old books -- they just 
 burned all the old printing presses.

 
 If that's the case, I'm very sorry to hear that! :(

This is completely off-topic, but if you're interested in learning a bit 
about the Communist Revolution in China the fun way, go find the movie 
"To Live" in the foreign film section of your favorite video store. 
It's an excellent film that spans maybe 30 years of Chinese history, 
including the Communist Revolution.


Sean

Sep 23 2006

Kristian <kjkilpi gmail.com> writes:

On Sat, 23 Sep 2006 19:01:36 +0300, Sean Kelly <sean f4.ca> wrote:

 Kristian wrote:
 On Sat, 23 Sep 2006 11:40:08 +0300, Kevin Bealer  
 <kevinbealer gmail.com> wrote:

 It's a kind of history censorship - you can't translate the old  
 Chinese literature because they want to destroy the old culture as it  
 competes philosophically with Communism.

 Essentially, they didn't have to burn all the old books -- they just  
 burned all the old printing presses.

  If that's the case, I'm very sorry to hear that! :(

 This is completely off-topic, but if you're interested in learning a bit  
 about the Communist Revolution in China the fun way, go find the movie  
 "To Live" in the foreign film section of your favorite video store. It's  
 an excellent film that spans maybe 30 years of Chinese history,  
 including the Communist Revolution.


 Sean

Thanks for the tip.

Sep 25 2006

Kevin Bealer <kevinbealer gmail.com> writes:

Kristian wrote:
 On Sat, 23 Sep 2006 19:01:36 +0300, Sean Kelly <sean f4.ca> wrote:
 
 Kristian wrote:
 On Sat, 23 Sep 2006 11:40:08 +0300, Kevin Bealer 
 <kevinbealer gmail.com> wrote:

 It's a kind of history censorship - you can't translate the old 
 Chinese literature because they want to destroy the old culture as 
 it competes philosophically with Communism.

 Essentially, they didn't have to burn all the old books -- they just 
 burned all the old printing presses.

  If that's the case, I'm very sorry to hear that! :(

 This is completely off-topic, but if you're interested in learning a 
 bit about the Communist Revolution in China the fun way, go find the 
 movie "To Live" in the foreign film section of your favorite video 
 store. It's an excellent film that spans maybe 30 years of Chinese 
 history, including the Communist Revolution.


 Sean

 
 Thanks for the tip.

Yes - I really enjoyed that movie.

The site where I got the history of this, which I tried to summarize 
above, was a Unicode-related article.  What I wrote above is somewhat 
negative (intentionally) toward the PRC -- I don't take any of that 
back, but I thought I should post the link as well.

It also has some interesting Unicode-related info (which is maybe 
marginally on-topic?) but the technical stuff might be outdated.

http://www.hastingsresearch.com/net/04-unicode-limitations.shtml

Kevin

Sep 26 2006

Thomas Kuehne <thomas-dloop kuehne.cn> writes:


Kevin Bealer schrieb am 2006-09-26:
 Kristian wrote:

<snip>

 It also has some interesting unicode related info (which is maybe 
 marginally on-topic?) but the technical stuff might be out-dated.

 http://www.hastingsresearch.com/net/04-unicode-limitations.shtml

The technical stuff is way outdated. The article is based on Unicode
version 3; the current one is 5. Version 4 fixed most of the CJK issues;
however, the compatibility ideographs and variation selectors might turn
out to be monsters like the infamous tags (0xE0001, 0xE0020 - 0xE007F).

Thomas



Sep 26 2006

Thomas Kuehne <thomas-dloop kuehne.cn> writes:


Thomas Kuehne schrieb am 2006-09-22:
 Walter Bright schrieb am 2006-09-22:
 Thomas Kuehne wrote:


<snip>
 I'd like to leave things as they are for 1.0. I don't think that 
 anyone's code will be adversely affected by not having the latest alpha 
 character additions to identifiers, and I also don't think math 
 characters should be part of identifiers. What is CJK?

 CJK: Chinese, Japanese & Korean
 0x20000 .. 0x2A6D6 CJK Ideograph Extension B
 0x2F800 .. 0x2FA1D CJK COMPATIBILITY IDEOGRAPHS

A closer look reveals that Appendix D is also missing
(among many others):

0x0712 .. 0x072F SYRIAC LETTER 
0x1200 .. 0x1248 ETHIOPIC SYLLABLE
0x13A0 .. 0x13F4 CHEROKEE LETTER
0x3400 .. 0x4DB5 CJK Ideograph Extension A
0xA016 .. 0xA48C YI SYLLABLE
0xF900 .. 0xFAD9 CJK COMPATIBILITY IDEOGRAPH
0xFB46 .. 0xFBB1 HEBREW / ARABIC LETTER
0xFF21 .. 0xFF3A FULLWIDTH LATIN CAPITAL LETTER
0xFF41 .. 0xFF5A FULLWIDTH LATIN SMALL LETTER
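
The ranges listed above can be spot-checked against current Unicode data. The Python sketch below tests only the first and last code point of each range, using the standard `unicodedata` module. The names `RANGES` and `is_letter` are hypothetical, and the interpreter's Unicode version is newer than the one discussed in 2006, so individual categories may have changed since.

```python
import unicodedata

# (first, last) pairs copied from the list above.
RANGES = [
    (0x0712, 0x072F),  # SYRIAC LETTER
    (0x1200, 0x1248),  # ETHIOPIC SYLLABLE
    (0x13A0, 0x13F4),  # CHEROKEE LETTER
    (0x3400, 0x4DB5),  # CJK Ideograph Extension A
    (0xA016, 0xA48C),  # YI SYLLABLE
    (0xF900, 0xFAD9),  # CJK COMPATIBILITY IDEOGRAPH
    (0xFB46, 0xFBB1),  # HEBREW / ARABIC LETTER
    (0xFF21, 0xFF3A),  # FULLWIDTH LATIN CAPITAL LETTER
    (0xFF41, 0xFF5A),  # FULLWIDTH LATIN SMALL LETTER
]

def is_letter(cp):
    """True if the code point's general category is one of the L* categories."""
    return unicodedata.category(chr(cp)).startswith("L")

for first, last in RANGES:
    print(f"{first:04X}..{last:04X}", is_letter(first), is_letter(last))
```

This checks the endpoints only; a full comparison would walk every code point in each range.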

Thomas


Sep 22 2006
