digitalmars.D.learn - what's the correct way to handle unicode? - trying to print out
- aliak (22/22) Jul 03 2018 Hi, trying to figure out how to loop through a string of
- aliak (6/28) Jul 03 2018 Hehe I guess the forum really is using D :p
- ag0aep6g (5/10) Jul 03 2018 Looks like forum.dlang.org has a problem when they appear side
- crimaniak (3/7) Jul 04 2018 For me, it looks as the used font has ligatures for these faces.
- Steven Schveighoffer (8/32) Jul 03 2018 Yeah, it appears that you can't actually print a grapheme. I would have
- Adam D. Ruppe (12/15) Jul 03 2018 What system are you on? Successfully printing this stuff depends
- aliak (5/19) Jul 04 2018 Just 'c' didn't but 'c[]' seems like the thing to do! Thankies!
- ag0aep6g (6/16) Jul 03 2018 You're looking for `c[]`. But that won't work, because std.uni
- Steven Schveighoffer (5/25) Jul 03 2018 Oops! I didn't realize this, ignore my message about reporting a bug.
Hi, trying to figure out how to loop through a string of characters and then spit them back out. E.g.:

    foreach (c; "👩👩👦👦🏳️🌈") { writeln(c); }

So basically the above just doesn't work. Prints gibberish.

So I figured std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out. Is there a way?

    foreach (c; "👩👩👦👦🏳️🌈".byGrapheme) { writeln(c.<????>); }

And then if I type the loop variable as dchar, it seems that the family emoji is printed out as 4 faces (so the code points, I guess) and the rainbow flag as other stuff (also its code points, I assume).

Is there a type that I can use to store graphemes and then output them as a grapheme as well? Or do I have to use something like ICU?

Cheers,
- Ali
Jul 03 2018
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
> [...]

Hehe I guess the forum really is using D :p

The two graphemes I'm talking about (which seem to not be rendered correctly above) are:

family emoji: https://emojipedia.org/family-woman-woman-boy-boy/
rainbow flag: https://emojipedia.org/rainbow-flag/
Jul 03 2018
On Tuesday, 3 July 2018 at 13:36:56 UTC, aliak wrote:
> Hehe I guess the forum really is using D :p
>
> The two graphemes I'm talking about (which seem to not be rendered correctly above) are:
>
> family emoji: https://emojipedia.org/family-woman-woman-boy-boy/
> rainbow flag: https://emojipedia.org/rainbow-flag/

Looks like forum.dlang.org has a problem when they appear side by side.

Works (in the preview): 👩👩👦👦 🏳️🌈

Doesn't work: 👩👩👦👦🏳️🌈
Jul 03 2018
On Tuesday, 3 July 2018 at 14:39:34 UTC, ag0aep6g wrote:
> Looks like forum.dlang.org has a problem when they appear side by side.
>
> Works (in the preview): 👩👩👦👦 🏳️🌈
>
> Doesn't work: 👩👩👦👦🏳️🌈

For me, it looks as if the font in use has ligatures for these faces. Mozilla under Linux, I guess it's the 'EmojiOne Mozilla' font.
Jul 04 2018
On 7/3/18 9:32 AM, aliak wrote:
> Hi, trying to figure out how to loop through a string of characters and then spit them back out. E.g.:
>
>     foreach (c; "👩👩👦👦🏳️🌈") { writeln(c); }
>
> So basically the above just doesn't work. Prints gibberish.
>
> So I figured std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out. Is there a way?
>
>     foreach (c; "👩👩👦👦🏳️🌈".byGrapheme) { writeln(c.<????>); }
>
> And then if I type the loop variable as dchar, it seems that the family emoji is printed out as 4 faces (so the code points, I guess) and the rainbow flag as other stuff (also its code points, I assume).

Yeah, it appears that you can't actually print a grapheme. I would have assumed writeln(c) works. It does work, it just prints the struct data instead of converting back to UTF.

> Is there a type that I can use to store graphemes and then output them as a grapheme as well? Or do I have to use something like ICU?

I honestly can't figure it out. I think directly writing graphemes as viewable UTF was not something that was considered. Definitely needs a bugzilla issue.

-Steve
Jul 03 2018
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
> So basically the above just doesn't work. Prints gibberish.

What system are you on? Successfully printing this stuff depends on a lot of display details too: writeln goes to a terminal/console, and those are rarely configured to support such characters by default.

You might actually be better off printing it to a file instead of to a display, then opening that file in your browser or something, just to confirm the code printed is correctly displayed by the other program.

>     foreach (c; "👩👩👦👦🏳️🌈".byGrapheme) { writeln(c.<????>);

prolly just printing `c` itself would work and if not try `c[]`, but then again it might see it as multiple graphemes, idk if it is even implemented.
Jul 03 2018
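The "print it to a file" check suggested above could look roughly like this. It is only a sketch: the helper name `dumpUtf8` and the file name `emoji-check.txt` are made up for illustration, and the rainbow flag is spelled with escapes (U+1F3F3 U+FE0F U+200D U+1F308) so the source does not depend on how an editor or forum renders it.

```d
import std.file : readText, tempDir, write;
import std.path : buildPath;

/// Hypothetical helper: writes `text` to `path`, bypassing the terminal.
void dumpUtf8(string path, string text)
{
    // A D `string` is already UTF-8, so the bytes are written unchanged.
    write(path, text);
}

void main()
{
    // Rainbow flag as escapes: U+1F3F3 U+FE0F U+200D U+1F308.
    enum flag = "\U0001F3F3\uFE0F\u200D\U0001F308";

    // Illustrative file name in the system temp directory.
    auto path = buildPath(tempDir, "emoji-check.txt");
    dumpUtf8(path, flag ~ "\n");

    // Reading the file back shows the program's output is intact,
    // whatever the terminal would have displayed.
    assert(readText(path) == flag ~ "\n");
}
```

Opening the resulting file in a browser or an emoji-aware editor then shows whether the gibberish comes from the program or from the terminal.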
On Tuesday, 3 July 2018 at 14:37:32 UTC, Adam D. Ruppe wrote:
> On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>> [...]
> What system are you on? Successfully printing this stuff depends on a lot of display details too: writeln goes to a terminal/console, and those are rarely configured to support such characters by default.
>
> You might actually be better off printing it to a file instead of to a display, then opening that file in your browser or something, just to confirm the code printed is correctly displayed by the other program.
>> [...]
> prolly just printing `c` itself would work and if not try `c[]`, but then again it might see it as multiple graphemes, idk if it is even implemented.

Just 'c' didn't, but 'c[]' seems like the thing to do! Thankies!

Terminal on osx, and yeah you're right. Seems like just trying to paste the rainbow flag right into the terminal results in the 3 separate code points.
Jul 04 2018
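A minimal sketch of the `c[]` approach settled on above. It uses "noël" spelled with a combining diaeresis rather than the emoji, because (as noted elsewhere in the thread) std.uni may not treat the newer emoji sequences as single clusters. The helper `toUtf8` is a hypothetical name, not a Phobos function.

```d
import std.stdio : writeln;
import std.uni : byGrapheme, Grapheme;

/// Hypothetical helper: re-encodes one grapheme cluster as a UTF-8 string.
string toUtf8(Grapheme g)
{
    string result;
    foreach (dchar codePoint; g[]) // g[] is a range over the cluster's code points
        result ~= codePoint;       // appending a dchar to a string encodes it as UTF-8
    return result;
}

void main()
{
    // "noël" written as 'e' followed by U+0308 COMBINING DIAERESIS.
    foreach (g; "noe\u0308l".byGrapheme)
    {
        // Slicing yields a range of dchar, which writeln prints as text.
        writeln(g[], " (", g.length, " code point(s))");
    }
}
```

Here `g.length` counts code points inside the cluster, so the third cluster reports two while the others report one.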
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>     foreach (c; "👩👩👦👦🏳️🌈") { writeln(c); }
>
> So basically the above just doesn't work. Prints gibberish.

Because you're printing one UTF-8 code unit (`char`) per line.

> So I figured std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out. Is there a way?
>
>     foreach (c; "👩👩👦👦🏳️🌈".byGrapheme) { writeln(c.<????>); }

You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.
Jul 03 2018
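To make the code unit vs. code point distinction concrete, here is a small sketch. The function names `codeUnitCount` and `codePointCount` are invented for illustration; the rainbow flag is written with escapes as U+1F3F3 U+FE0F U+200D U+1F308.

```d
import std.range : walkLength;
import std.stdio : writefln, writeln;

/// Number of UTF-8 code units (`char`s) in the string.
size_t codeUnitCount(string s)
{
    return s.length;
}

/// Number of code points (`dchar`s); walkLength decodes the string as it walks.
size_t codePointCount(string s)
{
    return s.walkLength;
}

void main()
{
    enum flag = "\U0001F3F3\uFE0F\u200D\U0001F308";

    writeln(codeUnitCount(flag));  // 14: what a plain foreach (c; flag) iterates
    writeln(codePointCount(flag)); // 4: what foreach (dchar c; flag) iterates

    // Typing the loop variable as dchar makes foreach decode whole code points.
    foreach (dchar cp; flag)
        writefln("U+%04X", cast(uint) cp);
}
```

Printing each `char` on its own line splits those 14 bytes apart, which is why the terminal shows gibberish; iterating by `dchar` keeps each code point whole but still does not group them into one visible emoji.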
On 7/3/18 10:37 AM, ag0aep6g wrote:
> On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>>     foreach (c; "👩👩👦👦🏳️🌈") { writeln(c); }
>>
>> So basically the above just doesn't work. Prints gibberish.
> Because you're printing one UTF-8 code unit (`char`) per line.
>> So I figured std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out. Is there a way?
>>
>>     foreach (c; "👩👩👦👦🏳️🌈".byGrapheme) { writeln(c.<????>); }
> You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.

Oops! I didn't realize this, ignore my message about reporting a bug.

I still think it's very odd for printing a grapheme to print the data structure.

-Steve
Jul 03 2018
On Tuesday, 3 July 2018 at 14:43:37 UTC, Steven Schveighoffer wrote:
> On 7/3/18 10:37 AM, ag0aep6g wrote:
>> On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>>>     foreach (c; "👩👩👦👦🏳️🌈") { writeln(c); }
>>>
>>> So basically the above just doesn't work. Prints gibberish.
>> Because you're printing one UTF-8 code unit (`char`) per line.
>>> So I figured std.uni.byGrapheme would help, since that's what they are, but I can't get it to print them back out. Is there a way?
>>>
>>>     foreach (c; "👩👩👦👦🏳️🌈".byGrapheme) { writeln(c.<????>); }
>> You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.
> Oops! I didn't realize this, ignore my message about reporting a bug.
>
> I still think it's very odd for printing a grapheme to print the data structure.
>
> -Steve

Aha, ok I see. Many gracias! Though, seems by a couple years old you mean 6 years! :)

Is updating unicode stuff to the latest a matter of some config file somewhere with the code point configurations that result in specific graphemes? Feels kinda ... quite bad that we're 6 years behind the current standard.

Also, any reason (technical or otherwise) that we have to slice a grapheme to get it printed? Or did just no one implement something like toString? It's quite non-intuitive as it is right now IMO. I can't really imagine anyone figuring out that they have to slice a grapheme to get it to print 🤔

Cheers,
- Ali
Jul 04 2018
On 07/04/2018 05:12 PM, aliak wrote:
> Is updating unicode stuff to the latest a matter of some config file somewhere with the code point configurations that result in specific graphemes?

I don't know.

> [...]
> Also, any reason (technical or otherwise) that we have to slice a grapheme to get it printed? Or did just no one implement something like toString?

I don't know.

> [...]
> I can't really imagine anyone figuring out that they have to slice a grapheme to get it to print 🤔

You can figure it out by reading the documentation for `Grapheme`. However, the documentation doesn't make it clear that `byGrapheme` is a range of `Grapheme`s. That's an easy fix, though: https://github.com/dlang/phobos/pull/6627
Jul 04 2018