digitalmars.D.learn - Best way to read/write Chinese (GBK/GB18030) files?
- John Xu (8/8) Mar 06 2023 I'm new to dlang. I didn't find much tutorials on internet about
- Steven Schveighoffer (7/19) Mar 06 2023 It appears that encoding is not supported.
- ryuukk_ (5/13) Mar 06 2023 I found this:
- John Xu (7/11) Mar 09 2023 Thanks for quick answers. Now I found I can read both UTF8 and
- zjh (15/15) Mar 09 2023 On Friday, 10 March 2023 at 02:48:43 UTC, John Xu wrote:
- zjh (3/3) Mar 09 2023 On Friday, 10 March 2023 at 06:19:38 UTC, zjh wrote:
- 0xEAB (5/7) Mar 11 2023 D’s char + string types are Unicode.
- zjh (5/7) Mar 11 2023 There is no example. An example should be added in an obvious
- 0xEAB (15/19) Mar 12 2023 To read binary data from a file and dump it into another, you do:
- zjh (3/4) Mar 12 2023 Thank you for your reply, but is there any way to output `gbk`
- Steven Schveighoffer (9/15) Mar 13 2023 What is required is an addition to the `std.encoding` module, to allow
- zjh (3/5) Mar 13 2023 Thank you for your information.
- Kagamin (3/5) Mar 14 2023 I guess if your console is in gbk encoding, you can just write
I'm new to dlang. I couldn't find many tutorials online about how to read/write Chinese easily. `std.encoding` doesn't seem to support GBK or GB18030: "Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251 and WINDOWS-1252." Then what is the best way to read GBK/GB18030 content? And what about GBK/GB18030 file names?
Mar 06 2023
On 3/6/23 8:45 PM, John Xu wrote:
> I'm new to dlang. I couldn't find many tutorials online about how to read/write Chinese easily. `std.encoding` doesn't seem to support GBK or GB18030: "Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251 and WINDOWS-1252."

It appears that encoding is not supported. There is only a scant mention of it, in the BOM detection. But I don't think there's any mechanism to encode/decode it.

> Then what is the best way to read GBK/GB18030 content? And what about GBK/GB18030 file names?

D can call C directly, so one option is to use a C library. Nothing on code.dlang.org jumps out at me.

-Steve
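Steve's suggestion of going through C can be made concrete with iconv, which most POSIX systems provide. The following is a minimal sketch, not a tested library: it assumes a glibc-style libc whose iconv recognizes the names "GB18030" and "GBK" (on macOS you may need to link with `-liconv`), it declares the standard POSIX iconv functions by hand, and `gbkToUtf8` is a hypothetical helper name.

```d
// Sketch: decode GBK/GB18030 bytes into a UTF-8 D string via the C iconv API.
// Assumes a POSIX libc whose iconv knows "GB18030"/"GBK" (e.g. glibc).
import std.exception : enforce;
import std.string : toStringz;

// Hand-written declarations of the standard POSIX iconv functions.
extern (C) nothrow @nogc
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

/// Hypothetical helper: converts GBK/GB18030 bytes to a UTF-8 string.
string gbkToUtf8(const(ubyte)[] input, string fromCode = "GB18030")
{
    iconv_t cd = iconv_open("UTF-8", fromCode.toStringz);
    enforce(cd != cast(iconv_t) size_t.max, "iconv_open failed for " ~ fromCode);
    scope (exit) iconv_close(cd);

    // A GB18030 character takes 1, 2 or 4 bytes and its UTF-8 form at most
    // 1.5 times as many, so twice the input length is always enough.
    auto output = new char[input.length * 2 + 4];
    char* inPtr = cast(char*) input.ptr; // iconv does not modify the input
    char* outPtr = output.ptr;
    size_t inLeft = input.length;
    size_t outLeft = output.length;

    enforce(iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) != size_t.max,
            "invalid or incomplete " ~ fromCode ~ " sequence");
    return output[0 .. output.length - outLeft].idup;
}

void main()
{
    import std.stdio : writeln;

    ubyte[] gbk = [0xD6, 0xD0, 0xCE, 0xC4]; // "中文" encoded as GBK
    writeln(gbkToUtf8(gbk));
}
```

Decoded this way, a GBK file becomes an ordinary `string`, e.g. `gbkToUtf8(cast(const(ubyte)[]) std.file.read("gbk.txt"))`.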
Mar 06 2023
On Tuesday, 7 March 2023 at 01:45:27 UTC, John Xu wrote:
> I'm new to dlang. I couldn't find many tutorials online about how to read/write Chinese easily. `std.encoding` doesn't seem to support GBK or GB18030: "Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), ISO-8859-2 (LATIN-2), WINDOWS-1250, WINDOWS-1251 and WINDOWS-1252." Then what is the best way to read GBK/GB18030 content? And what about GBK/GB18030 file names?

I found this: https://github.com/meatatt/exCode/blob/master/source/excode/package.d

It mentions Unicode/GBK conversion; maybe it could help.
Mar 06 2023
> I found this: https://github.com/meatatt/exCode/blob/master/source/excode/package.d
>
> It mentions Unicode/GBK conversion; maybe it could help.

Thanks for the quick answers. I've now found that I can read both UTF-8 and UTF-16LE Chinese files:

`string txt = std.file.read(chineseFile).to!string;`

and write them to a UTF-8 file:

`std.file.write(utf8ChineseFile, txt);`

But I still need to figure out how to read/write GBK directly.
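A UTF-16LE file has to be transcoded before its contents behave as a UTF-8 D `string`. A minimal sketch of doing that step explicitly, assuming a little-endian host and a file that may start with an `FF FE` byte-order mark; `readUtf16le` is a hypothetical helper name, not a Phobos function:

```d
// Sketch: read a UTF-16LE text file (with or without BOM) into a UTF-8 string.
// Assumes a little-endian host, so the raw bytes can be reinterpreted as wchar.
import std.conv : to;
import std.file : read;

version (BigEndian)
    static assert(0, "this sketch assumes a little-endian host");

/// Hypothetical helper: decodes a UTF-16LE file and returns it as UTF-8.
string readUtf16le(string path)
{
    auto bytes = cast(const(ubyte)[]) read(path);
    if (bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        bytes = bytes[2 .. $]; // skip the UTF-16LE byte-order mark
    auto units = cast(const(wchar)[]) bytes; // fails at run time if the length is odd
    return units.to!string; // transcodes the UTF-16 code units to UTF-8
}

void main(string[] args)
{
    import std.stdio : writeln;

    // Usage: ./app some_utf16le_file.txt
    if (args.length > 1)
        writeln(readUtf16le(args[1]));
}
```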
Mar 09 2023
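On Friday, 10 March 2023 at 02:48:43 UTC, John Xu wrote:

```d
module chinese;
import std.stdio : writeln;
import std.conv : text, to;
import std.windows.charset : toMBSz; // Windows-only module

int main(string[] argv)
{
    auto s1 = "中文"; // UTF-8 string
    writeln("word:" ~ s1); // garbled on a GBK console
    // toMBSz converts to the ANSI code page (CP936/GBK on Chinese Windows)
    writeln("word:" ~ to!string(toMBSz(text(s1)))); // displays correctly after conversion
    writeln("Hello D-World!");
    return 0;
}
```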
Mar 09 2023
On Friday, 10 March 2023 at 06:19:38 UTC, zjh wrote:

The D language is too unfriendly to Chinese users! You can't even write GBK files.
Mar 09 2023
On Friday, 10 March 2023 at 07:16:32 UTC, zjh wrote:
> The D language is too unfriendly to Chinese users! You can't even write GBK files.

D’s `char` and `string` types are Unicode. To quote the tour: “In D, *all* strings are Unicode strings”. If you desire to use other encodings, how about using ubyte + ubyte[]?
Mar 11 2023
On Saturday, 11 March 2023 at 19:56:09 UTC, 0xEAB wrote:
> If you desire to use other encodings, how about using ubyte + ubyte[]?

There is no example. An example should be added somewhere prominent. I tried for a long time but couldn't output GBK, and finally gave up.
Mar 11 2023
On Sunday, 12 March 2023 at 00:54:53 UTC, zjh wrote:
> On Saturday, 11 March 2023 at 19:56:09 UTC, 0xEAB wrote:
>> If you desire to use other encodings, how about using ubyte + ubyte[]?
>
> There is no example.

To read binary data from a file and dump it into another, you do:

```d
import std.file : read, write;
void main() {
    void[] data = read("infile.txt"); // raw bytes, no decoding
    write("outfile.txt", data);       // written back byte-for-byte
}
```

To write binary data to a file:

```d
import std.file : write;
void main() {
    ubyte[] data = [0xA0, 0x0A, 0x30, 0x01, 0xFF, 0x00, 0xFE];
    write("myfile.txt", data); // written as-is, no encoding applied
}
```

`data` could contain GBK-encoded text, for example. (Just don’t use `"Unicode literals"`.)
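Going the other way, GBK bytes for `std.file.write` can be produced from a UTF-8 `string` with the C iconv API. A minimal sketch, assuming a libc whose iconv knows "GBK" (e.g. glibc; on macOS you may need `-liconv`); the POSIX declarations are written by hand and `utf8ToGbk` is a hypothetical helper name:

```d
// Sketch: encode a UTF-8 string as GBK bytes via POSIX iconv and write them out.
// Assumes a libc whose iconv knows "GBK" (e.g. glibc).
import std.exception : enforce;
import std.string : toStringz;

// Hand-written declarations of the standard POSIX iconv functions.
extern (C) nothrow @nogc
{
    alias iconv_t = void*;
    iconv_t iconv_open(const(char)* tocode, const(char)* fromcode);
    size_t iconv(iconv_t cd, char** inbuf, size_t* inbytesleft,
                 char** outbuf, size_t* outbytesleft);
    int iconv_close(iconv_t cd);
}

/// Hypothetical helper: converts UTF-8 text to GBK (or another iconv target).
ubyte[] utf8ToGbk(string text, string toCode = "GBK")
{
    iconv_t cd = iconv_open(toCode.toStringz, "UTF-8");
    enforce(cd != cast(iconv_t) size_t.max, "iconv_open failed for " ~ toCode);
    scope (exit) iconv_close(cd);

    // GBK/GB18030 never need more than twice as many bytes as the UTF-8 input.
    auto output = new ubyte[text.length * 2 + 4];
    char* inPtr = cast(char*) text.ptr; // iconv does not modify the input
    char* outPtr = cast(char*) output.ptr;
    size_t inLeft = text.length;
    size_t outLeft = output.length;

    enforce(iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft) != size_t.max,
            "conversion to " ~ toCode ~ " failed (unmappable or invalid input)");
    return output[0 .. output.length - outLeft];
}

void main(string[] args)
{
    import std.file : write;

    // Usage: ./app out.txt  (writes "中文" to out.txt as GBK bytes)
    if (args.length > 1)
        write(args[1], utf8ToGbk("中文"));
}
```

Because the result is a plain `ubyte[]`, it fits the binary `std.file.write` pattern above unchanged.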
Mar 12 2023
On Sunday, 12 March 2023 at 20:03:23 UTC, 0xEAB wrote:
> ...

Thank you for your reply, but is there any way to output GBK-encoded text to the console?
Mar 12 2023
On 3/12/23 8:32 PM, zjh wrote:
> On Sunday, 12 March 2023 at 20:03:23 UTC, 0xEAB wrote:
>> ...
>
> Thank you for your reply, but is there any way to output GBK-encoded text to the console?

What is required is an addition to the `std.encoding` module to allow such an encoding. An encoding scheme simply translates from one encoding (e.g. UTF) to another (e.g. GBK). If you look at `std.encoding`, you can get an idea of what it might require. It will take some effort, and especially some help from a knowledgeable user (such as yourself).

-Steve
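To see the machinery Steve refers to, here is how `std.encoding` transcodes between encodings it already supports, using ISO-8859-1 (`Latin1String`). This only illustrates the existing API; a GBK string type usable with the same `transcode` call does not exist and would be the hypothetical addition discussed above.

```d
// Sketch: std.encoding transcoding between encodings it already supports.
// A GBK type working the same way is hypothetical; it does not exist in Phobos.
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;

void main()
{
    Latin1String latin1;
    transcode("café", latin1); // UTF-8 -> ISO-8859-1: "é" becomes the single byte 0xE9
    writeln(cast(const(ubyte)[]) latin1); // print the raw encoded bytes

    string back;
    transcode(latin1, back); // ISO-8859-1 -> UTF-8
    writeln(back);
}
```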
Mar 13 2023
On Monday, 13 March 2023 at 15:50:37 UTC, Steven Schveighoffer wrote:
> What is required is an addition to the `std.encoding` module to allow such an encoding.

Thank you for the information.
Mar 13 2023
On Monday, 13 March 2023 at 00:32:07 UTC, zjh wrote:
> Thank you for your reply, but is there any way to output GBK-encoded text to the console?

I guess if your console uses GBK encoding, you can just write the bytes with `stdout.write`.
Mar 14 2023
On Tuesday, 14 March 2023 at 09:20:54 UTC, Kagamin wrote:
> I guess if your console uses GBK encoding, you can just write the bytes with `stdout.write`.

Thank you for your reply, but that only displays the byte values, not the GBK text.
Mar 14 2023
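On Wednesday, 22 March 2023 at 15:23:42 UTC, Kagamin wrote:
> https://dlang.org/phobos/std_stdio.html#rawWrite

It's really amazing; it worked. Thank you!

```d
import std.file : read;
import std.stdio : stdout;
void main() {
    void[] d = read("test.txt"); // GBK-encoded file, read as raw bytes
    stdout.rawWrite(d);          // bytes reach the console unchanged
}
```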
Mar 22 2023