
digitalmars.D.learn - How to decode UTF-8 text?

Andrey <saasecondbox yandex.ru> writes:
Hello,
I have some UTF-8 encoded text. For example, this part:
 <title>Παράλληλη αναζήτηση</title>
How can I decode it to get this result?
 <title>Παράλληλη αναζήτηση</title>
I have tried functions like "decode", "byUTF", "to!wchar"... but without success. The input string is correct; I checked it with "https://www.browserling.com/tools/utf8-decode".
Mar 27
kdevel <kdevel vogtner.de> writes:
On Wednesday, 27 March 2019 at 13:39:07 UTC, Andrey wrote:
 I have some UTF-8 encoded text. For example, this part:
 <title>Παράλληλη αναζήτηση</title>
This looks like UTF-8 text that has been UTF-8 encoded a second time.
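To illustrate how such double encoding arises, here is a minimal sketch. It assumes the bytes of correctly encoded UTF-8 text were misread as Latin-1 characters and then encoded as UTF-8 again. The helper name `doubleEncode` is made up for this example and is not part of Phobos.

```d
import std.stdio : writeln;

/// Hypothetical helper: treats every UTF-8 byte of `s` as one Latin-1
/// character (code points U+0000..U+00FF) and encodes the result as UTF-8.
string doubleEncode (string s)
{
   string result;
   foreach (ubyte b; cast (immutable (ubyte)[]) s)
      result ~= cast (dchar) b;   // appending a dchar encodes it as UTF-8
   return result;
}

void main ()
{
   string once  = "Π";                  // U+03A0, UTF-8 bytes CE A0
   string twice = doubleEncode (once);  // U+00CE U+00A0, bytes C3 8E C2 A0

   assert (once.length  == 2);
   assert (twice.length == 4);
   assert (twice == "\u00CE\u00A0");    // shows up as "Î" plus a no-break space
   writeln (twice.length);
}
```

Every non-ASCII byte of the original becomes two bytes, which is why the garbled text is noticeably longer than the intended one.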
 How can I decode it to get this result?
 <title>Παράλληλη αναζήτηση</title>
Undo the second UTF-8 encoding by transcoding the UTF-8 text into the 8-bit character set (Latin-1, Windows-1252, etc.) that was wrongly assumed for the second encoding. You have to guess which one it was.
 I have tried functions like "decode", "byUTF", "to!wchar"... but without success. The input string is correct; I checked it with "https://www.browserling.com/tools/utf8-decode".
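```decode.d
import std.stdio;
import std.encoding;

void main ()
{
   // Doubly encoded input: each byte of the original UTF-8 text was read
   // as a Latin-1 character and then encoded as UTF-8 again.
   string src = "<title>Î\u00a0Î±Ï\u0081Î¬Î»Î»Î·Î»Î· Î±Î½Î±Î¶Î®Ï\u0084Î·Ï\u0083Î·</title>";
   Latin1String ls;
   transcode (src, ls);             // UTF-8 -> Latin-1 undoes the second encoding
   string targ = cast (string) ls;  // the Latin-1 bytes are the original UTF-8
   targ.writeln;
}
```
$ ./decode
<title>Παράλληλη αναζήτηση</title>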
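Since the 8-bit character set has to be guessed, it may help to check the guess before trusting the result. The following is only a sketch under the assumption that Latin-1 was the intermediate character set. `undoDoubleUtf8` is a made-up name, and Windows-1252 input would need a different mapping for the range 0x80..0x9F. The function returns its input unchanged when the text does not look doubly encoded.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical helper: undoes one Latin-1 based double encoding, or returns
/// `s` unchanged if `s` does not look doubly encoded.
string undoDoubleUtf8 (string s)
{
   char[] bytes;
   foreach (dchar c; s)            // iterate over decoded code points
   {
      if (c > 0xFF)
         return s;                 // not representable in Latin-1
      bytes ~= cast (char) c;      // one code point becomes one raw byte
   }
   try
   {
      validate (bytes);            // the raw bytes must form valid UTF-8
   }
   catch (UTFException e)
   {
      return s;                    // e.g. genuine Latin-1 range text such as "café"
   }
   return bytes.idup;
}

void main ()
{
   assert (undoDoubleUtf8 ("\u00CE\u00A0") == "Π");  // doubly encoded Π
   assert (undoDoubleUtf8 ("Π") == "Π");             // already correct text
   assert (undoDoubleUtf8 ("café") == "café");       // é alone is not valid UTF-8
   writeln (undoDoubleUtf8 ("<title>\u00CE\u00A0</title>"));
}
```

Pure ASCII text passes through unchanged either way, so the check only matters for characters above 0x7F.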
Mar 27
Andrey <saasecondbox yandex.ru> writes:
On Wednesday, 27 March 2019 at 19:16:21 UTC, kdevel wrote:
Thank you!
Mar 29