
D.gnu - OS X bug: universal alpha identifiers

Anders F Björklund <afb algonet.se> writes:
Programs with non-ascii identifiers do not
link, under Mac OS X 10.3 using GDC 0.10...


They use the mangled name as a label for
the assembler, and then choke on the UTF-8:

unialpha.d:
 void anders() {}
 void björklund() {}
 
 void main()
 {
   anders(); björklund();
 }

Gives the error:
 /var/tmp//ccW4WWE0.s:44:Invalid mnemonic '?rklundFZv'

Here's the generated assembly:
 	bl __D8unialpha10björklundFZv

Similar errors for variables: unialpha2.d:
 int anders;
 int björklund;
 
 void main()
 {
   anders = 1; björklund = 2;
 }

gdc:
 /var/tmp//cc7ur4wd.s:31:Parameter syntax error (parameter 3)
 /var/tmp//cc7ur4wd.s:31:Invalid mnemonic '?rklundi-L1$pb)'
 /var/tmp//cc7ur4wd.s:32:Parameter error: expression must be absolute
(parameter 2)
 /var/tmp//cc7ur4wd.s:32:Invalid mnemonic '?rklundi-L1$pb)(r9)'

asm:
 	addis r9,r31,ha16(__D9unialpha210björklundi-L1$pb)
 	la r9,lo16(__D9unialpha210björklundi-L1$pb)(r9)

Not sure how this can be fixed without changing the way that D mangles the names...
Both programs compile just fine on Linux.

--anders

PS: Assembler is:
 Apple Computer, Inc. version cctools-525.obj~1, GNU assembler version 1.38
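
For reference, the symbol names in the assembly above appear to follow D's length-prefixed mangling: each name component is preceded by its length in bytes, which would explain why the 9-letter "björklund" carries the prefix 10 (the "ö" takes two bytes in UTF-8). The sketch below only illustrates that assumption. `component` and `mangleSketch` are hypothetical helpers, not part of GDC, and the type codes (`FZv`, `i`) are simply copied from the output above. The extra leading underscore in the assembly is presumably the usual Mach-O symbol prefix.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: prefix one name component with its length.
// .length on a D string counts UTF-8 code units (bytes), not characters.
string component(string name)
{
    return name.length.to!string ~ name;
}

// Hypothetical helper: assemble "_D" + module + symbol + type code.
// The type code is passed in as-is; this sketch does not derive it.
string mangleSketch(string moduleName, string symbol, string typeCode)
{
    return "_D" ~ component(moduleName) ~ component(symbol) ~ typeCode;
}

void main()
{
    // \u00F6 is "ö"; written as an escape so the source file stays ASCII.
    writeln(mangleSketch("unialpha", "bj\u00F6rklund", "FZv"));
    writeln(mangleSketch("unialpha2", "bj\u00F6rklund", "i"));
}
```

The raw UTF-8 bytes of the identifier end up verbatim in the label, which is the part the Apple assembler rejects.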

Jan 24 2005
"Thomas Kuehne" <eisvogel users.sourceforge.net> writes:
Added to DStress as
http://dstress.kuehne.cn/run/unicode_03.d
http://dstress.kuehne.cn/run/unicode_04.d
http://dstress.kuehne.cn/run/unicode_05.d
http://dstress.kuehne.cn/run/unicode_06.d
http://dstress.kuehne.cn/run/unicode_07.d

Thomas

Jan 27 2005
Anders F Björklund <afb algonet.se> writes:
Thomas Kuehne wrote:

 Added to DStress as
 http://dstress.kuehne.cn/run/unicode_03.d
 http://dstress.kuehne.cn/run/unicode_04.d
 http://dstress.kuehne.cn/run/unicode_05.d
 http://dstress.kuehne.cn/run/unicode_06.d
 http://dstress.kuehne.cn/run/unicode_07.d

If you are feeling like testing or something, here are the rest of the Universal Alphas:

http://www.algonet.se/~afb/d/universalalphas/

 Identifiers start with a letter, _, or unicode alpha, and are followed
 by any number of letters, _, digits, or universal alphas. Universal
 alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the
 C99 Standard.)

http://www.digitalmars.com/d/lex.html#identifier

I think Walter officially acknowledged the phrase "unicode alpha" as just a typo for "universal alpha"... (the meaning is that an identifier can't start with a digit)

--anders
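
To make the quoted identifier rule concrete, here is a minimal sketch of such a check. It assumes that "letter" means [a-zA-Z] and "digit" means [0-9], which the spec text does not actually state, and it covers only a small excerpt of the Latin ranges from the C99 table rather than the full list. The function names are hypothetical and are not taken from the D frontend.

```d
import std.stdio : writeln;

// Small excerpt of the Latin "universal alpha" ranges from C99 Annex D.
// The full table is much longer; this is for illustration only.
bool isUniversalAlpha(dchar c)
{
    return (c >= 0x00C0 && c <= 0x00D6)
        || (c >= 0x00D8 && c <= 0x00F6)
        || (c >= 0x00F8 && c <= 0x01F5);
}

// Assumption: "letter" in the spec means ASCII letters only.
bool isAsciiLetter(dchar c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Assumption: "digit" in the spec means ASCII digits only.
bool isAsciiDigit(dchar c)
{
    return c >= '0' && c <= '9';
}

bool isIdentifier(string s)
{
    if (s.length == 0)
        return false;

    bool first = true;
    foreach (dchar c; s) // decodes the UTF-8 string into code points
    {
        // Digits are allowed anywhere except the first position.
        bool ok = isAsciiLetter(c) || c == '_' || isUniversalAlpha(c)
            || (!first && isAsciiDigit(c));
        if (!ok)
            return false;
        first = false;
    }
    return true;
}

void main()
{
    writeln(isIdentifier("bj\u00F6rklund")); // "ö" is U+00F6
    writeln(isIdentifier("1abc"));
}
```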
Jan 27 2005
"Thomas Kuehne" <eisvogel users.sourceforge.net> writes:
Anders F Björklund wrote in news:ctairr$1ngb$1 digitaldaemon.com:
 If you are feeling like testing or something,
 here are the rest of the Universal Alphas :

 http://www.algonet.se/~afb/d/universalalphas/

I've only been testing the name mangling, so it shouldn't matter which scripts I check.

 Identifiers start with a letter, _, or unicode alpha, and are followed
 by any number of letters, _, digits, or universal alphas. Universal
 alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the
 C99 Standard.)

 http://www.digitalmars.com/d/lex.html#identifier

 I think Walter officially acknowledged the phrase "unicode alpha" as
 just a typo for universal... (the meaning is that it can't start with
 a digit)

This should be clarified. I suppose that "digits" means only "0123456789"; there are loads of other digits in Unicode.

Why is an ancient (1999) version used in the documentation? I've tried codepoints that are assigned in the current standard but weren't in the 1999 one, and as you might have guessed, even currently reserved codepoints weren't caught by the frontend...

Thomas
Jan 27 2005
Anders F Björklund <afb algonet.se> writes:
Thomas Kuehne wrote:

Identifiers start with a letter, _, or unicode alpha, and are followed
by any number of letters, _, digits, or universal alphas. Universal
alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the
C99 Standard.)



 This should be clarified. I suppose that "digits" are only "0123456789"
 - there are loads of other digits in Unicode.

Yes, the quoted C99 standard (which isn't all that "ancient") used:
 Digits: 0660-0669, 06F0-06F9, 0966-096F, 09E6-09EF, 0A66-0A6F,
 0AE6-0AEF, 0B66-0B6F, 0BE7-0BEF, 0C66-0C6F, 0CE6-0CEF, 0D66-0D6F,
 0E50-0E59, 0ED0-0ED9, 0F20-0F33

But I'm also thinking that a "digit" here meant [0-9]...
I think a "letter" to Walter is just [a-zA-Z] as well?
And I agree, it would be a lot easier to just say that.

--anders
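
The difference between the two readings of "digit" can be sketched as follows. The ranges are copied from the list quoted above; `isC99Digit` and `isAsciiDigit` are hypothetical names used only for this illustration, and which reading the D frontend actually implements is not settled in this thread.

```d
import std.stdio : writeln;

// Digit ranges as quoted above from the C99 table (inclusive bounds).
immutable int[2][] c99DigitRanges = [
    [0x0660, 0x0669], [0x06F0, 0x06F9], [0x0966, 0x096F],
    [0x09E6, 0x09EF], [0x0A66, 0x0A6F], [0x0AE6, 0x0AEF],
    [0x0B66, 0x0B6F], [0x0BE7, 0x0BEF], [0x0C66, 0x0C6F],
    [0x0CE6, 0x0CEF], [0x0D66, 0x0D6F], [0x0E50, 0x0E59],
    [0x0ED0, 0x0ED9], [0x0F20, 0x0F33],
];

// Reading 1: a "digit" is one of the non-ASCII digits in the table.
bool isC99Digit(dchar c)
{
    int cp = cast(int) c;
    foreach (r; c99DigitRanges)
        if (cp >= r[0] && cp <= r[1])
            return true;
    return false;
}

// Reading 2: a "digit" is just [0-9].
bool isAsciiDigit(dchar c)
{
    return c >= '0' && c <= '9';
}

void main()
{
    dchar arabicIndicZero = 0x0660;
    writeln(isC99Digit(arabicIndicZero), " ", isAsciiDigit(arabicIndicZero));
    writeln(isC99Digit('5'), " ", isAsciiDigit('5'));
}
```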
Jan 27 2005