digitalmars.D.learn - what's the correct way to handle unicode? - trying to print out

aliak <something something.com> writes:
Hi, trying to figure out how to loop through a string of 
characters and then spit them back out.

Eg:

foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
   writeln(c);
}

So basically the above just doesn't work. Prints gibberish.

So I figured, std.uni.byGrapheme would help, since that's what 
they are, but I can't get it to print them back out? Is there a 
way?

foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
   writeln(c.<????>);
}

And then if I type the loop variable as dchar, then it seems 
that the family emoji is printed out as 4 faces - so the code 
points I guess - and the rainbow flag is other stuff (also its 
code points I assume).

Is there a type that I can use to store graphemes and then output 
them as a grapheme as well? Or do I have to use like lib ICU 
maybe or something similar?

Cheers,
- Ali
Jul 03 2018
aliak <something something.com> writes:
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
 [...]
Hehe I guess the forum really is using D :p

The two graphemes I'm talking about (which seem to not be rendered correctly above) are:

family emoji: https://emojipedia.org/family-woman-woman-boy-boy/
rainbow flag: https://emojipedia.org/rainbow-flag/
Jul 03 2018
ag0aep6g <anonymous example.com> writes:
On Tuesday, 3 July 2018 at 13:36:56 UTC, aliak wrote:
 Hehe I guess the forum really is using D :p

 The two graphemes I'm talking about (which seem to not be 
 rendered correctly above) are:

 family emoji: https://emojipedia.org/family-woman-woman-boy-boy/
 rainbow flag: https://emojipedia.org/rainbow-flag/
Looks like forum.dlang.org has a problem when they appear side by side.

Works (in the preview): 👩‍👩‍👦‍👦 🏳️‍🌈
Doesn't work: 👩‍👩‍👦‍👦🏳️‍🌈
Jul 03 2018
crimaniak <crimaniak gmail.com> writes:
On Tuesday, 3 July 2018 at 14:39:34 UTC, ag0aep6g wrote:

 Looks like forum.dlang.org has a problem when they appear side 
 by side.

 Works (in the preview): 👩‍👩‍👦‍👦 🏳️‍🌈
 Doesn't work: 👩‍👩‍👦‍👦🏳️‍🌈
For me, it looks as if the font in use has ligatures for these faces. Mozilla under Linux; I guess it's the 'EmojiOne Mozilla' font.
Jul 04 2018
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 7/3/18 9:32 AM, aliak wrote:
 Hi, trying to figure out how to loop through a string of characters and 
 then spit them back out.
 
 Eg:
 
 foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
    writeln(c);
 }
 
 So basically the above just doesn't work. Prints gibberish.
 
 So I figured, std.uni.byGrapheme would help, since that's what they are, 
 but I can't get it to print them back out? Is there a way?
 
 foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
    writeln(c.<????>);
 }
 
 And then if I type the loop variable as dchar, then it seems that the 
 family emoji is printed out as 4 faces - so the code points I guess - 
 and the rainbow flag is other stuff (also its code points I assume)
Yeah, it appears that you can't actually print a grapheme. I would have assumed writeln(c) works. It does compile and run, but it prints the struct data instead of converting back to UTF.
 Is there a type that I can use to store graphemes and then output them 
 as a grapheme as well? Or do I have to use like lib ICU maybe or 
 something similar?
I honestly can't figure it out. I think directly writing graphemes as viewable UTF was not something that was considered. Definitely needs a Bugzilla issue.

-Steve
Jul 03 2018
Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
 So basically the above just doesn't work. Prints gibberish.
What system are you on? Successfully printing this stuff depends on a lot of display details too: writeln goes to a terminal/console, and those are rarely configured to support such characters by default. You might actually be better off printing it to a file instead of to a display, then opening that file in your browser or something, just to confirm that the output is displayed correctly by the other program.
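A minimal sketch of that write-to-a-file check, assuming a hypothetical file name `graphemes.txt` and a short test string, neither of which comes from the thread. It stores the string's UTF-8 bytes with `std.file.write`, so no terminal configuration is involved, and reads them back to confirm nothing changed:

```d
import std.file : readText, write;

void main()
{
    // "graphemes.txt" is a hypothetical file name chosen for this sketch.
    enum fileName = "graphemes.txt";

    // "e" + U+0301 COMBINING ACUTE ACCENT, followed by a plain "x".
    string s = "e\u0301x";

    // std.file.write stores the UTF-8 bytes exactly as they are in memory.
    write(fileName, s);

    // Reading the file back yields the same string. Open the file in a
    // browser or editor to see how that program renders the text.
    assert(readText(fileName) == s);
}
```

If the file looks right in another program but the terminal output does not, the problem is on the display side rather than in the D code.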
 foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
   writeln(c.<????>);
Probably just printing `c` itself would work, and if not, try `c[]`. But then again it might see it as multiple graphemes; I don't know if it is even implemented.
Jul 03 2018
aliak <something something.com> writes:
On Tuesday, 3 July 2018 at 14:37:32 UTC, Adam D. Ruppe wrote:
 On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
 [...]
What system are you on? Successfully printing this stuff depends on a lot of display details too, like writeln goes to a terminal/console and they are rarely configured to support such characters by default. You might actually be better off printing it to a file instead of to a display, then opening that file in your browser or something, just to confirm the code printed is correctly displayed by the other program.
   [...]
prolly just printing `c` itself would work and if not try `c[]` but then again it might see it as multiple graphemes, idk if it is even implemented.
Just `c` didn't work, but `c[]` seems like the thing to do! Thankies! Terminal on OS X, and yeah, you're right. Seems like just pasting the rainbow flag right into the terminal results in 3 separate code points.
Jul 04 2018
ag0aep6g <anonymous example.com> writes:
On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
 foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
   writeln(c);
 }

 So basically the above just doesn't work. Prints gibberish.
Because you're printing one UTF-8 code unit (`char`) per line.
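A minimal sketch of the difference, using two hypothetical helper functions (`countCodeUnits` and `countCodePoints` are illustration names, not Phobos functions). With a `char` loop variable, `foreach` walks the raw UTF-8 code units. With a `dchar` loop variable, it decodes whole code points:

```d
import std.stdio : writeln;

// Counts UTF-8 code units: a char loop variable means no decoding.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Counts code points: a dchar loop variable makes foreach decode UTF-8.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "\U0001F308"; // U+1F308 RAINBOW, encoded as 4 UTF-8 code units
    writeln(countCodeUnits(s));  // 4
    writeln(countCodePoints(s)); // 1
}
```

Printing each `char` on its own line therefore splits every multi-byte character into invalid fragments, which is the gibberish.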
 So I figured, std.uni.byGrapheme would help, since that's what 
 they are, but I can't get it to print them back out? Is there a 
 way?

 foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
   writeln(c.<????>);
 }
You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.
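A minimal sketch of the `c[]` approach. It assumes a test string built from a base letter plus a combining accent, which std.uni does treat as a single grapheme cluster, instead of the emoji from the question:

```d
import std.algorithm.comparison : equal;
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // "e" + U+0301 COMBINING ACUTE ACCENT is one grapheme made of two code points.
    string s = "e\u0301x";

    assert(s.walkLength == 3);            // 3 code points
    assert(s.byGrapheme.walkLength == 2); // 2 graphemes

    foreach (g; s.byGrapheme)
    {
        // g is a std.uni.Grapheme. Slicing it gives a range over its code
        // points, which writeln prints as text rather than as struct data.
        writeln(g[]);
    }

    // The first grapheme holds both the base letter and the accent.
    auto graphemes = s.byGrapheme;
    auto first = graphemes.front;
    assert(first[].equal("e\u0301"));
}
```

For sequences that std.uni does not recognize as one cluster, the same loop still runs, but each piece is printed on its own line.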
Jul 03 2018
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 7/3/18 10:37 AM, ag0aep6g wrote:
 [...]
You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.
Oops! I didn't realize this. Ignore my message about reporting a bug. I still think it's very odd for printing a grapheme to print the data structure.

-Steve
Jul 03 2018
aliak <something something.com> writes:
On Tuesday, 3 July 2018 at 14:43:37 UTC, Steven Schveighoffer wrote:
 On 7/3/18 10:37 AM, ag0aep6g wrote:
 [...]
You're looking for `c[]`. But that won't work, because std.uni apparently doesn't recognize those as grapheme clusters. The emojis may be too new. std.uni is based on Unicode version 6.2, which is a couple years old.
Oops! I didn't realize this, ignore my message about reporting a bug. I still think it's very odd for printing a grapheme to print the data structure. -Steve
Aha, ok I see. Many gracias! Though, it seems that by "a couple years old" you mean 6 years! :)

Is updating the Unicode stuff to the latest version a matter of some config file somewhere with the code point configurations that result in specific graphemes? Feels kinda ... quite bad that we're 6 years behind the current standard.

Also, is there any reason (technical or otherwise) that we have to slice a grapheme to get it printed? Or has just no one implemented something like toString or the like? It's quite non-intuitive as it is right now IMO. I can't really imagine anyone figuring out that they have to slice a grapheme to get it to print 🤔

Cheers,
- Ali
Jul 04 2018
ag0aep6g <anonymous example.com> writes:
On 07/04/2018 05:12 PM, aliak wrote:
 Is updating unicode stuff to the latest a matter of some config file
 somewhere with the code point configurations that result in specific
 graphemes?
I don't know. [...]
 Also, any reason (technical or otherwise) that we have to slice a 
 grapheme to get it printed? Or just no one implemented something like
 toString or the like?
I don't know. [...]
 I can't really imagine anyone figuring out that they have to slice a
 grapheme to get it to print 🤔
You can figure it out by reading the documentation for `Grapheme`. However, the documentation doesn't make it clear that `byGrapheme` returns a range of `Grapheme`s. That's an easy fix, though: https://github.com/dlang/phobos/pull/6627
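Because `byGrapheme` yields `Grapheme` values, they can also be stored and turned back into a string later. A minimal sketch, closely following the `byCodePoint` example in the std.uni documentation and assuming a short test string rather than the emoji from the question:

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;

void main()
{
    string s = "noe\u0308l"; // "noël" written as e + U+0308 COMBINING DIAERESIS

    // Collect the graphemes into an array so they can be stored and reused.
    auto graphemes = s.byGrapheme.array;
    assert(graphemes.length == 4);

    // byCodePoint flattens a range of Graphemes back into code points,
    // and text turns those code points into a UTF-8 string again.
    string roundTrip = graphemes.byCodePoint.text;
    assert(roundTrip == s);

    // Reversing grapheme by grapheme keeps the diaeresis on the "e".
    string reversed = graphemes.retro.byCodePoint.text;
    assert(reversed == "le\u0308on");
}
```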
Jul 04 2018