
digitalmars.D.learn - Encoding problems...

reply Robert Fraser <fraserofthenight gmail.com> writes:
Hi all,

Quick question: I want to use some unicode identifiers, but I get 
"unsupported char 0xe2", both with using and not using a BOM. The 
characters in question are the superset/subset-equals operators: ⊇ and 
⊆... Perhaps these are just unsupported by DMD (in which case, I'll file 
a bug)?

Thanks,
Robert
May 27 2009
next sibling parent reply Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Wed, May 27, 2009 at 8:55 PM, Robert Fraser
<fraserofthenight gmail.com> wrote:
 Hi all,

 Quick question: I want to use some unicode identifiers, but I get
 "unsupported char 0xe2", both with using and not using a BOM. The characters
 in question are the superset/subset-equals operators: ⊇ and ⊆... Perhaps
 these are just unsupported by DMD (in which case, I'll file a bug)?

 Thanks,
 Robert
If they're not classified as "universal alpha" I don't think you can use them in identifiers.
May 27 2009
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Jarrett Billingsley wrote:
 On Wed, May 27, 2009 at 8:55 PM, Robert Fraser
 <fraserofthenight gmail.com> wrote:
 Hi all,

 Quick question: I want to use some unicode identifiers, but I get
 "unsupported char 0xe2", both with using and not using a BOM. The characters
 in question are the superset/subset-equals operators: ⊇ and ⊆... Perhaps
 these are just unsupported by DMD (in which case, I'll file a bug)?

 Thanks,
 Robert
If they're not classified as "universal alpha" I don't think you can use them in identifiers.
Lame. K; thanks.
May 27 2009
parent reply grauzone <none example.net> writes:
Robert Fraser wrote:
 Jarrett Billingsley wrote:
 On Wed, May 27, 2009 at 8:55 PM, Robert Fraser
 <fraserofthenight gmail.com> wrote:
 Hi all,

 Quick question: I want to use some unicode identifiers, but I get
 "unsupported char 0xe2", both with using and not using a BOM. The characters
 in question are the superset/subset-equals operators: ⊇ and ⊆... Perhaps
 these are just unsupported by DMD (in which case, I'll file a bug)?

 Thanks,
 Robert
If they're not classified as "universal alpha" I don't think you can use them in identifiers.
How the hell did your news client switch from UTF-8 to Japanese-something? (charset=UTF-8 => charset=ISO-2022-JP)
 
 Lame. K; thanks.
Don't worry, people working with your code will be thankful!
May 28 2009
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
grauzone wrote:
 Robert Fraser wrote:
 Jarrett Billingsley wrote:
 On Wed, May 27, 2009 at 8:55 PM, Robert Fraser
 <fraserofthenight gmail.com> wrote:
 Hi all,

 Quick question: I want to use some unicode identifiers, but I get
 "unsupported char 0xe2", both with using and not using a BOM. The characters
 in question are the superset/subset-equals operators: ⊇ and ⊆... Perhaps
 these are just unsupported by DMD (in which case, I'll file a bug)?

 Thanks,
 Robert
If they're not classified as "universal alpha" I don't think you can use them in identifiers.
How the hell did your news client switch from UTF-8 to Japanese-something? (charset=UTF-8 => charset=ISO-2022-JP)
 Lame. K; thanks.
Don't worry, people working with your code will be thankful!
Hmm... I'd say x.⊆(y) is preferable to x.isSubsetOf(y), but it's not a huge deal.
May 28 2009
parent reply BCS <ao pathlink.com> writes:
Reply to Robert,


 Hmm... I'd say x.⊆(y) is preferable to x.isSubsetOf(y), but it's not a
 huge deal.
 
Only until you have to type it. I think universal alpha includes only the union of things that can be easily typed on standard keyboards. I don't think any keyboard (ok maybe an APL keyboard) has the subset symbol on it.
May 28 2009
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
BCS wrote:
 Reply to Robert,
 
 
 Hmm... I'd say x.⊆(y) is preferable to x.isSubsetOf(y), but it's not a
 huge deal.
Only until you have to type it. I think universal alpha includes only the union of things that can be easily typed on standard keyboards.
<snip>

What inspired you to form that opinion? My impression was that it's some standard list of Unicode characters that are letters (or logogram or ideogram or whatever) in some language somewhere in the world.

Anyway....

http://www.digitalmars.com/d/1.0/lex.html

"Identifiers start with a letter, _, or universal alpha, and are followed by any number of letters, _, digits, or universal alphas. Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the C99 Standard.)"

I eventually managed to find this:
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

Stewart.
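
The rule quoted from lex.html can be sketched as a small checker. This is only an illustration in Python, not how DMD's lexer is written. The names (`UNIVERSAL_ALPHA_SAMPLE`, `is_identifier`) are made up, "letter" is assumed to mean an ASCII letter, and the range table holds just two Latin ranges as recalled from C99 Annex D. The real list is far longer, so check the standard before relying on it.

```python
# Illustrative subset only: two Latin ranges recalled from C99 Annex D.
# The full universal-alpha list is much longer; this table is an assumption.
UNIVERSAL_ALPHA_SAMPLE = [(0x00C0, 0x00D6), (0x00D8, 0x00F6)]


def is_universal_alpha(ch, ranges=UNIVERSAL_ALPHA_SAMPLE):
    """True if the character's code point lies in one of the given ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_ascii_letter(ch):
    # Assumption: "letter" in the quoted rule means A-Z or a-z.
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_identifier(s, ranges=UNIVERSAL_ALPHA_SAMPLE):
    """Apply the quoted rule: start with letter, _ or universal alpha;
    continue with letters, _, digits or universal alphas."""
    if not s:
        return False

    def start_ok(ch):
        return ch == "_" or is_ascii_letter(ch) or is_universal_alpha(ch, ranges)

    def rest_ok(ch):
        return start_ok(ch) or ("0" <= ch <= "9")

    return start_ok(s[0]) and all(rest_ok(ch) for ch in s[1:])


for name in ["isSubsetOf", "_x1", "1x", "\u00e9", "\u2286"]:
    print(repr(name), is_identifier(name))
```

With this sample table, U+00E9 (é) is accepted and U+2286 (⊆) is rejected, simply because no range covers it.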
May 28 2009
parent reply BCS <ao pathlink.com> writes:
Reply to Stewart,

 BCS wrote:
 
 Only until you have to type it. I think universal alpha includes only
 the union of things that can be easily typed on standard keyboards.
 
<snip> What inspired you to form that opinion? My impression was that it's some standard list of Unicode characters that are letters (or logogram or ideogram or whatever) in some language somewhere in the world.
That's more or less the same thing (although I'll admit, my original comment is not well stated). I'm not just talking about standard QWERTY keyboard but also standard keyboards for other languages and alphabets. I rather suspect that for every char in universal alpha, there is a standard keyboard somewhere that has it.
May 28 2009
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
BCS wrote:
 Reply to Stewart,
<snip>
 My impression was that it's some standard list of Unicode characters
 that are letters (or logogram or ideogram or whatever) in some
 language somewhere in the world.
That's more or less the same thing (although I'll admit, my original comment is not well stated).
Indeed, my keyboard has a number of punctuation characters, most of which aren't valid in identifiers.
 I'm not just talking about standard QWERTY 
 keyboard but also standard keyboards for other languages and alphabets. 
I'd got that far.
 I rather suspect that for every char in universal alpha, there is a 
 standard keyboard somewhere that has it.
So I guess it's therefore likely to exclude ancient scripts with not enough modern use to have warranted the invention of a standard keyboard therefor. (One omission I noticed is Phoenician, though that may be also due to its later arrival in Unicode.) Stewart.
May 28 2009
parent BCS <none anon.com> writes:
Hello Stewart,

 So I guess it's therefore likely to exclude ancient scripts with not
 enough modern use to have warranted the invention of a standard
 keyboard therefor.  (One omission I noticed is Phoenician, though that
 may be also due to its later arrival in Unicode.)
Anyone who really wants to use Phoenician for symbol names should be taken out and shot (with a nerf gun).
 
 Stewart.
 
May 28 2009
prev sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
BCS wrote:
 Reply to Robert,
 
 
 Hmm... I'd say x.⊆(y) is preferable to x.isSubsetOf(y), but it's not a
 huge deal.
Only until you have to type it. I think universal alpha includes only the union of things that can be easily typed on standard keyboards. I don't think any keyboard (ok maybe an APL keyboard) has the subset symbol on it.
I have 10 configurable keys on my keyboard, none of which are in use. I could also remap my numpad (cause, seriously, who uses this?) Also, many editors can be configured so that a sequence of characters converts to a single one. There appears to be no reason that mathematical symbols aren't allowed in identifiers... Think of how awesome it would be to write assert(x⊇y→∀a∈x∃b∈y(a⊇b)) ... Okay, that would require overloading of those operators (and instantiating variables in a new way), but still!
May 28 2009
next sibling parent BCS <none anon.com> writes:
Hello Robert,

 BCS wrote:
 
 Reply to Robert,
 
 Hmm... I'd say x.⊆(y) is preferable to x.isSubsetOf(y), but it's not a
 huge deal.
 
Only until you have to type it. I think universal alpha includes only the union of things that can be easily typed on standard keyboards. I don't think any keyboard (ok maybe an APL keyboard) has the subset symbol on it.
I have 10 configurable keys on my keyboard, none of which are in use. I could also remap my numpad (cause, seriously, who uses this?) Also, many editors can be configured so that a sequence of characters converts to a single one. There appears to be no reason that mathematical symbols aren't allowed in identifiers... Think of how awesome it would be to write assert(x⊇y→∀a∈x∃b∈y(a⊇b)) ... Okay, that would require overloading of those operators (and instantiating variables in a new way), but still!
Allowing them as operators would be cool (and won't happen for another whole host of reasons that have nothing to do with this) but in identifiers? Not a chance. I don't care what you can type, what matters is what /I/ can type (the generic 'I', assuming I can read your comments -> I use your language -> I use your alphabet).
May 28 2009
prev sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Robert Fraser wrote:
 BCS wrote:
 Reply to Robert,


 Hmm... I'd say x.⊆(y) is preferable to x.isSubsetOf(y), but it's not a
 huge deal.
Only until you have to type it. I think universal alpha includes only the union of things that can be easily typed on standard keyboards. I don't think any keyboard (ok maybe an APL keyboard) has the subset symbol on it.
I have 10 configurable keys on my keyboard, none of which are in use. I could also remap my numpad (cause, seriously, who uses this?) Also, many editors can be configured so that a sequence of characters converts to a single one.
Which would possibly make D the first language to *require* a specialised keyboard and/or editor since APL. Not a good precedent. Oh, and don't try to argue it isn't mandatory. If you can overload those operators, people WILL use them and WILL complain that it's too hard.
 There appears to be no reason that mathematical symbols aren't allowed
 in identifiers... Think of how awesome it would be to write
 assert(x⊇y→∀a∈x∃b∈y(a⊇b)) ... Okay, that would require overloading of
 those operators (and instantiating variables in a new way), but still!
I think that example you gave is an excellent reason not to allow them. :D It would be nice, but it's really not feasible without widespread editor and/or keyboard support for extra symbols, which I just don't see happening.
May 28 2009
parent Kagamin <spam here.lot> writes:
Daniel Keep Wrote:

 It would be nice, but it's really not feasible without widespread editor
 and/or keyboard support for extra symbols, which I just don't see happening.
http://www.microsoft.com/globaldev/tools/msklc.mspx :)))
May 29 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Robert Fraser wrote:
 Hi all,
 
 Quick question: I want to use some unicode identifiers, but I get 
 "unsupported char 0xe2", both with using and not using a BOM. The 
 characters in question are the superset/subset-equals operators: ⊇ and 
 ⊆... Perhaps these are just unsupported by DMD (in which case, I'll file 
 a bug)?
 
 Thanks,
 Robert
www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf (As an aside, Google's link obfuscation is hella annoying.) The relevant range is U+2200 to U+22FF (specifically U+2286, U+2287). It's not included.
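
For what it's worth, the 0xe2 in the error message fits with this: every code point from U+2000 to U+2FFF encodes in UTF-8 with 0xe2 as its first byte, so that is presumably the byte the lexer stops on. A quick sketch in Python (used here only to show the bytes; `utf8_hex` is a made-up helper name):

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a single character as space-separated hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


for ch in "\u2286\u2287":  # the subset-equals and superset-equals signs
    # Both code points fall in U+2000..U+2FFF, so the lead byte is 0xe2.
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
```

This prints `U+2286 -> e2 8a 86` and `U+2287 -> e2 8a 87`.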
May 28 2009
parent reply BCS <none anon.com> writes:
Hello Christopher,

 (As an aside, Google's link obfuscation is hella annoying.)
??
May 28 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
BCS wrote:
 Hello Christopher,
 
 (As an aside, Google's link obfuscation is hella annoying.)
??
You do the google search for ISO9899. The link they give you: http://www.google.com/url?sa=t&source=web&ct=res&cd=4&url=http%3A%2F%2Fwww.open-std.org%2FJTC1%2FSC22%2Fwg14%2Fwww%2Fdocs%2Fn1124.pdf&ei=IQofSs23FNjXlAeJmeXGBQ&usg=AFQjCNGZNITNpxvZKard5pSr7RQvxmTDkQ&sig2=8T5gS1aSODl4KdKmy2jp_w Eugh.
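
If you just want the real target back, it sits percent-encoded in the `url` query parameter. A minimal sketch in Python using only the standard library (the helper name `real_target` is made up for illustration, and it assumes the link has that `url` parameter):

```python
from urllib.parse import urlparse, parse_qs

# The redirect link quoted above, split only to keep the lines short.
GOOGLE_LINK = (
    "http://www.google.com/url?sa=t&source=web&ct=res&cd=4"
    "&url=http%3A%2F%2Fwww.open-std.org%2FJTC1%2FSC22%2Fwg14%2Fwww%2Fdocs%2Fn1124.pdf"
    "&ei=IQofSs23FNjXlAeJmeXGBQ&usg=AFQjCNGZNITNpxvZKard5pSr7RQvxmTDkQ"
    "&sig2=8T5gS1aSODl4KdKmy2jp_w"
)


def real_target(link: str) -> str:
    """Return the percent-decoded value of the `url` query parameter."""
    query = parse_qs(urlparse(link).query)
    return query["url"][0]


print(real_target(GOOGLE_LINK))
```

For the link above this gives http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf.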
May 28 2009
next sibling parent BCS <ao pathlink.com> writes:
Reply to Christopher,

 BCS wrote:
 
 Hello Christopher,
 
 (As an aside, Google's link obfuscation is hella annoying.)
 
??
You do the google search for ISO9899. The link they give you: http://www.google.com/url?sa=t&source=web&ct=res&cd=4&url=http%3A%2F%2Fwww.open-std.org%2FJTC1%2FSC22%2Fwg14%2Fwww%2Fdocs%2Fn1124.pdf&ei=IQofSs23FNjXlAeJmeXGBQ&usg=AFQjCNGZNITNpxvZKard5pSr7RQvxmTDkQ&sig2=8T5gS1aSODl4KdKmy2jp_w Eugh.
Only if you are logged in to a Google account. The mangling is so they can tell what you click on for ( you.are(paranoid) ? "stalking you" : "creating better personalized search results" )
May 28 2009
prev sibling parent Kagamin <spam here.lot> writes:
Christopher Wright Wrote:

 BCS wrote:
 Hello Christopher,
 
 (As an aside, Google's link obfuscation is hella annoying.)
??
You do the google search for ISO9899. The link they give you: http://www.google.com/url?sa=t&source=web&ct=res&cd=4&url=http%3A%2F%2Fwww.open-std.org%2FJTC1%2FSC22%2Fwg14%2Fwww%2Fdocs%2Fn1124.pdf&ei=IQofSs23FNjXlAeJmeXGBQ&usg=AFQjCNGZNITNpxvZKard5pSr7RQvxmTDkQ&sig2=8T5gS1aSODl4KdKmy2jp_w Eugh.
http://en.wikipedia.org/wiki/C99 huh...
May 29 2009