digitalmars.D.learn - Read a unicode character from the terminal
- Jacob Carlborg <doob me.com> Mar 31 2012
- Ali Çehreli <acehreli yahoo.com> Mar 31 2012
- Jordi Sayol <g.sayol yahoo.es> Mar 31 2012
- Jordi Sayol <g.sayol yahoo.es> Mar 31 2012
- Ali Çehreli <acehreli yahoo.com> Mar 31 2012
- Jacob Carlborg <doob me.com> Apr 01 2012
- Ali Çehreli <acehreli yahoo.com> Apr 01 2012
- Jacob Carlborg <doob me.com> Apr 01 2012
- Jacob Carlborg <doob me.com> Apr 01 2012
- Stewart Gordon <smjg_1998 yahoo.com> Mar 31 2012
- Jacob Carlborg <doob me.com> Apr 01 2012
- Stewart Gordon <smjg_1998 yahoo.com> Apr 04 2012
- Jacob Carlborg <doob me.com> Apr 04 2012
- Stewart Gordon <smjg_1998 yahoo.com> Apr 04 2012
- Jacob Carlborg <doob me.com> Apr 04 2012
- Stewart Gordon <smjg_1998 yahoo.com> Apr 05 2012
- Jacob Carlborg <doob me.com> Apr 05 2012
- Stewart Gordon <smjg_1998 yahoo.com> Apr 07 2012
- Jacob Carlborg <doob me.com> Apr 07 2012
- Stewart Gordon <smjg_1998 yahoo.com> Apr 07 2012
- Jacob Carlborg <doob me.com> Apr 07 2012
- Stewart Gordon <smjg_1998 yahoo.com> Apr 07 2012
- Jacob Carlborg <doob me.com> Apr 04 2012
How would I read a Unicode character from the terminal? I've tried using "std.cstream.din.getc" but it seems to only work for ASCII characters. If I try to read and print something that isn't ASCII, it just prints a question mark.

-- /Jacob Carlborg
Mar 31 2012
On 03/31/2012 08:56 AM, Jacob Carlborg wrote:
> How would I read a unicode character from the terminal? I've tried using
> "std.cstream.din.getc"

I recommend using stdin. The destiny of std.cstream is uncertain and stdin is sufficient. (I know that it lacks support for BOMs but I don't need them.)

> but it seems to only work for ascii characters. If I try to read and
> print something that isn't ascii, it just prints a question mark.

The word 'character' used to mean characters of the Latin-based alphabets, but with Unicode support that's not the case anymore. In D, 'character' means UTF code unit, nothing else. Unfortunately, although 'Unicode character' is the correct term to use, it conflicts with D's characters, which are not Unicode characters. 'Unicode code point' is the non-conflicting term that matches what we mean by 'Unicode character'. Only dchar can hold code points.

That's the part about characters. The other side is what is being fed into the program through its standard input. On my Linux consoles, the text comes as a stream of chars, i.e. UTF-8 encoded text. You must ensure through its settings that your terminal is capable of supporting Unicode. On Windows terminals, one must enter 'chcp 65001' to set the terminal to UTF-8.

Then, it is the program that must know what the data represents. If you are expecting a Unicode code point, then you may think that it should be as simple as reading into a dchar:

    import std.stdio;

    void main()
    {
        dchar letter;
        readf("%s", &letter);    // <-- does not work!
        writeln(letter);
    }

The output:

    $ ./deneme
    ç
    Ã    <-- will be different on different consoles

The problem is, char can implicitly be converted to dchar. Since the letter ç consists of two chars (two UTF-8 code units), the dchar receives only the first one, converted to a dchar.

To see this, read and write two chars in a loop without a newline in between:

    import std.stdio;

    void main()
    {
        foreach (i; 0 .. 2) {
            char code;
            readf("%s", &code);
            write(code);
        }

        writeln();
    }

This time two code units are read and then written out to form a Unicode character on the console:

    $ ./deneme
    ç
    ç    <-- result of two write(code) expressions

The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).

For now, this is a way of getting Unicode characters from the input:

    import std.stdio;

    void main()
    {
        string line = readln();

        foreach (dchar c; line) {
            writeln(c);
        }
    }

Once you have the input as a string, std.utf.decode can also be used.

Ali
Mar 31 2012
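A side note on the last remark in the post above: the following is a minimal sketch, not something taken from the thread, of how std.utf.decode could be applied once the input is held in a string. The helper name codePoints and the sample text are made up for illustration; the literal stands in for a line obtained from readln().

```d
import std.stdio;
import std.range : walkLength;
import std.utf : decode;

/// Hypothetical helper: decodes every code point of a UTF-8 string.
dchar[] codePoints(string text)
{
    dchar[] result;
    size_t index = 0;                   // position measured in code units
    while (index < text.length)
        result ~= decode(text, index);  // decode advances index past the sequence
    return result;
}

void main()
{
    string line = "aç€";                // stand-in for a line obtained from readln()
    writeln(line.length);               // counts UTF-8 code units
    writeln(line.walkLength);           // counts code points
    writeln(codePoints(line));
}
```

The point of the sketch is that .length counts code units while decoding yields code points, which is the distinction between char and dchar drawn above.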
Many thanks for being so educational.

Best regards,
-- Jordi Sayol
Mar 31 2012
BTW, for those who do not know, Ali Çehreli is writing a book to learn "D" from scratch. It's very educational. There are two formats: HTML (on-line) and PDF.

http://ddili.org/ders/d.en/index.html

Best regards,
-- Jordi Sayol
Mar 31 2012
On 03/31/2012 02:31 PM, Jordi Sayol wrote:
> BTW, for those who do not know, Ali Çehreli is writing a book to learn "D" from scratch. It's very educational. There are two formats: HTML (on-line) and PDF.
>
> http://ddili.org/ders/d.en/index.html
>
> Best regards,

Thank you very much for the free plug! :) I have translated eleven more chapters since the last announcement. I am on the assert chapter as we speak.

It is taking longer than I had expected because I constantly make improvements to the original: corrections, consistency improvements, additions, adapting code samples to the current state of D, etc.

Ali
Mar 31 2012
On 03/31/2012 11:53 AM, Ali Çehreli wrote:
> The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).

Here is a Unicode character range, which is unfortunately pretty inefficient because it relies on an exception that is thrown from isValidDchar! :p

    import std.stdio;
    import std.utf;
    import std.array;

    struct UnicodeRange
    {
        File file;
        char[4] codes;
        bool ready;

        this(File file)
        {
            this.file = file;
            this.ready = false;
        }

        bool empty() const @property
        {
            return file.eof();
        }

        dchar front() const @property
        {
            if (!ready) {
                // Sorry, no 'mutable' in D! :p
                UnicodeRange * mutable_this = cast(UnicodeRange*)&this;
                mutable_this.readNext();
            }

            return codes.front;
        }

        void popFront()
        {
            codes = codes.init;
            ready = false;
        }

        void readNext()
        {
            foreach (ref code; codes) {
                file.readf("%s", &code);

                if (file.eof()) {
                    codes[] = '\0';
                    ready = false;
                    break;
                }

                // Expensive way of determining "ready"!
                try {
                    if (isValidDchar(codes.front)) {
                        ready = true;
                        break;
                    }

                } catch (Exception) {
                    // not ready
                }
            }
        }
    }

    UnicodeRange byUnicode(File file = stdin)
    {
        return UnicodeRange(file);
    }

    void main()
    {
        foreach(c; byUnicode()) {
            writeln(c);
        }
    }

Ali
Mar 31 2012
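A side note on the "expensive way of determining ready" in the range above: in UTF-8 the first code unit of a sequence already encodes how many code units the sequence has, so the length can be found without catching exceptions. The following is only a sketch of that idea; the function name sequenceLength is hypothetical, and the check is deliberately simplified (it does not reject lead bytes that can only start overlong or out-of-range sequences).

```d
import std.stdio;

/// Hypothetical helper: number of code units in the UTF-8 sequence that
/// starts with `lead`, or 0 if `lead` cannot start a sequence.
size_t sequenceLength(char lead) pure nothrow @nogc @safe
{
    if (lead < 0x80)           return 1; // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // continuation byte or invalid lead
}

void main()
{
    foreach (s; ["a", "ç", "€", "𝄞"])
        writeln(s, " starts a sequence of ", sequenceLength(s[0]), " code unit(s)");
}
```

A range built on this could read one code unit, look up the length, read the remaining units, and only then decode.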
On 2012-04-01 01:17, Ali Çehreli wrote:
> On 03/31/2012 11:53 AM, Ali Çehreli wrote:
>> The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).
>
> Here is a Unicode character range, which is unfortunately pretty inefficient because it relies on an exception that is thrown from isValidDchar! :p

Ok, what's the difference compared to the example in your first post:

    void main()
    {
        string line = readln();

        foreach (dchar c; line) {
            writeln(c);
        }
    }

-- /Jacob Carlborg
Apr 01 2012
On 04/01/2012 05:00 AM, Jacob Carlborg wrote:
> On 2012-04-01 01:17, Ali Çehreli wrote:
>> On 03/31/2012 11:53 AM, Ali Çehreli wrote:
>>> The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).
>>
>> Here is a Unicode character range, which is unfortunately pretty inefficient because it relies on an exception that is thrown from isValidDchar! :p
>
> Ok, what's the difference compared to the example in your first post:
>
>     void main()
>     {
>         string line = readln();
>
>         foreach (dchar c; line) {
>             writeln(c);
>         }
>     }

No difference in that example because it consumes the entire input as dchars. But in general, with that inefficient range, it is possible to pull just one dchar from the input and leave the rest of the stream untouched. For example, it would be possible to readf() an int right after that:

    auto u = byUnicode();
    dchar d = u.front;    // <-- reads just one dchar from the range

    int i;
    readf("%s", &i);      // <-- continues with std.stdio functions
    writeln(i);

With the readln() method, the int must be looked up in the line first, then from the input.

Ali
Apr 01 2012
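To illustrate the last point of the post above, here is a hedged sketch of what "looking the int up in the line" could look like when the input has been read with readln(). The function name firstCharThenInt and the sample line are hypothetical; std.utf.decode, std.string.stripLeft and std.conv.parse are existing Phobos functions.

```d
import std.conv : parse;
import std.stdio;
import std.string : stripLeft;
import std.typecons : tuple;
import std.utf : decode;

/// Hypothetical helper: splits a line into its first code point and the
/// integer that follows it.
auto firstCharThenInt(string line)
{
    size_t index = 0;
    dchar first = decode(line, index);           // index now points past the first code point
    string rest = line[index .. $].stripLeft();  // skip the separating whitespace
    int number = parse!int(rest);                // parse consumes the digits from `rest`
    return tuple(first, number);
}

void main()
{
    auto result = firstCharThenInt("ç 42");      // stand-in for a line obtained from readln()
    writeln(result[0]);
    writeln(result[1]);
}
```

Here both values come out of the same line, whereas the range approach leaves the remainder in the stream for later readf() calls.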
On 2012-04-01 16:02, Ali Çehreli wrote:
> No difference in that example because it consumes the entire input as dchars. But in general, with that inefficient range, it is possible to pull just one dchar from the input and leave the rest of the stream untouched. For example, it would be possible to readf() an int right after that:
>
>     auto u = byUnicode();
>     dchar d = u.front;    // <-- reads just one dchar from the range
>
>     int i;
>     readf("%s", &i);      // <-- continues with std.stdio functions
>     writeln(i);
>
> With the readln() method, the int must be looked up in the line first, then from the input.
>
> Ali

Ok, I see, thanks.

-- /Jacob Carlborg
Apr 01 2012
On 2012-03-31 20:53, Ali Çehreli wrote:
> I recommend using stdin. The destiny of std.cstream is uncertain and stdin is sufficient. (I know that it lacks support for BOMs but I don't need them.)

I thought std.cstream was a stream wrapper around stdin.

> The word 'character' used to mean characters of the Latin-based alphabets, but with Unicode support that's not the case anymore. In D, 'character' means UTF code unit, nothing else. Unfortunately, although 'Unicode character' is the correct term to use, it conflicts with D's characters, which are not Unicode characters. 'Unicode code point' is the non-conflicting term that matches what we mean by 'Unicode character'. Only dchar can hold code points.
>
> That's the part about characters.

Yeah, exactly. When I think about it, I don't know why I thought "getc" would work since it only returns a "char" and not a "dchar".

> The other side is what is being fed into the program through its standard input. On my Linux consoles, the text comes as a stream of chars, i.e. UTF-8 encoded text. You must ensure through its settings that your terminal is capable of supporting Unicode. On Windows terminals, one must enter 'chcp 65001' to set the terminal to UTF-8.

I'm on Mac OS X, the terminal is capable of handling Unicode.

> Then, it is the program that must know what the data represents. If you are expecting a Unicode code point, then you may think that it should be as simple as reading into a dchar:
>
>     import std.stdio;
>
>     void main()
>     {
>         dchar letter;
>         readf("%s", &letter);    // <-- does not work!
>         writeln(letter);
>     }
>
> The output:
>
>     $ ./deneme
>     ç
>     Ã    <-- will be different on different consoles

I tried that as well.

> The problem is, char can implicitly be converted to dchar. Since the letter ç consists of two chars (two UTF-8 code units), the dchar receives only the first one, converted to a dchar.
>
> To see this, read and write two chars in a loop without a newline in between:
>
>     import std.stdio;
>
>     void main()
>     {
>         foreach (i; 0 .. 2) {
>             char code;
>             readf("%s", &code);
>             write(code);
>         }
>
>         writeln();
>     }
>
> This time two code units are read and then written out to form a Unicode character on the console:
>
>     $ ./deneme
>     ç
>     ç    <-- result of two write(code) expressions
>
> The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).
>
> For now, this is a way of getting Unicode characters from the input:
>
>     import std.stdio;
>
>     void main()
>     {
>         string line = readln();
>
>         foreach (dchar c; line) {
>             writeln(c);
>         }
>     }
>
> Once you have the input as a string, std.utf.decode can also be used.
>
> Ali

I'll give that a try, thanks.

-- /Jacob Carlborg
Apr 01 2012
On 31/03/2012 16:56, Jacob Carlborg wrote:
> How would I read a unicode character from the terminal? I've tried using "std.cstream.din.getc" but it seems to only work for ascii characters. If I try to read and print something that isn't ascii, it just prints a question mark.

What OS are you using? And what codepage is the console set to?

You might want to try the console module in my utility library:
http://pr.stewartsplace.org.uk/d/sutil/

(For D1 at the moment, but a D2 version will be available any day now!)

Stewart.
Mar 31 2012
On 2012-04-01 00:14, Stewart Gordon wrote:
> On 31/03/2012 16:56, Jacob Carlborg wrote:
>> How would I read a unicode character from the terminal? I've tried using "std.cstream.din.getc" but it seems to only work for ascii characters. If I try to read and print something that isn't ascii, it just prints a question mark.
>
> What OS are you using? And what codepage is the console set to?

I'm using Mac OS X and the terminal is set to handle UTF-8.

> You might want to try the console module in my utility library:
> http://pr.stewartsplace.org.uk/d/sutil/
>
> (For D1 at the moment, but a D2 version will be available any day now!)
>
> Stewart.

I'll have a look, thanks.

-- /Jacob Carlborg
Apr 01 2012
On 31/03/2012 23:14, Stewart Gordon wrote:
<snip>
> You might want to try the console module in my utility library:
> http://pr.stewartsplace.org.uk/d/sutil/
>
> (For D1 at the moment, but a D2 version will be available any day now!)

The D2 version is now up on the site.

Jacob - would you be up for helping me with testing/implementation of my library on Mac OS? If you do a search for "todo" you'll see what needs to be done. Some of it will benefit Unix-type systems generally.

If perchance you have a big-endian CPU, testing the bit arrays on it would also be of value.

Stewart.
Apr 04 2012
On 2012-04-04 18:06, Stewart Gordon wrote:
> The D2 version is now up on the site.
>
> Jacob - would you be up for helping me with testing/implementation of my library on Mac OS? If you do a search for "todo" you'll see what needs to be done. Some of it will benefit Unix-type systems generally.
>
> If perchance you have a big-endian CPU, testing the bit arrays on it would also be of value.
>
> Stewart.

Sure, I can help you with testing. I have a lot on my own table, so I don't have any time for implementing things (maybe some small things).

If I may ask, what is the point of this library? Doesn't it duplicate functionality that's already available in Phobos and/or Tango?

For Mac OS X, if you just follow the Posix standard you'll get very far. I have an x86 CPU; it has been a couple of years since Apple last had a PPC-based computer.

-- /Jacob Carlborg
Apr 04 2012
On 04/04/2012 17:37, Jacob Carlborg wrote:
<snip>
> Sure, I can help you with testing. I have a lot on my own table, so I don't have any time for implementing things (maybe some small things).
>
> If I may ask, what is the point of this library?

Just to hold some miscellaneous utility classes/structs/functions.

> Doesn't it duplicate functionality that's already available in Phobos and/or Tango?

It certainly does in places. But what matters is that it contains functionality that isn't present in Phobos (or wasn't present in Phobos at the time I wrote it).

Stewart.
Apr 04 2012
On 2012-04-05 01:21, Stewart Gordon wrote:
> On 04/04/2012 17:37, Jacob Carlborg wrote:
> <snip>
>> Sure, I can help you with testing. I have a lot on my own table, so I don't have any time for implementing things (maybe some small things).
>>
>> If I may ask, what is the point of this library?
>
> Just to hold some miscellaneous utility classes/structs/functions.
>
>> Doesn't it duplicate functionality that's already available in Phobos and/or Tango?
>
> It certainly does in places. But what matters is that it contains functionality that isn't present in Phobos (or wasn't present in Phobos at the time I wrote it).
>
> Stewart.

Ok, I see. The functions that need a Posix implementation are mostly in datetime and commandline, if I recall correctly. These are already present in Phobos?

-- /Jacob Carlborg
Apr 04 2012
On 05/04/2012 07:18, Jacob Carlborg wrote:
<snip>
> Ok, I see. The functions that need a Posix implementation are mostly in datetime and commandline, if I recall correctly. These are already present in Phobos?

Maybe it contains the code I need to finish datetime off. Though I can't really just copy someone else's code, I suppose I can at least see what functions it uses.

I haven't noticed much along the lines of command line manipulation in Phobos - only the code (now in druntime) to populate the args argument to main (which under Posix just uses argc/argv from the C main). Or is there something I haven't found?

Stewart.
Apr 05 2012
On 2012-04-05 12:55, Stewart Gordon wrote:
> On 05/04/2012 07:18, Jacob Carlborg wrote:
> <snip>
>> Ok, I see. The functions that need a Posix implementation are mostly in datetime and commandline, if I recall correctly. These are already present in Phobos?
>
> Maybe it contains the code I need to finish datetime off. Though I can't really just copy someone else's code, I suppose I can at least see what functions it uses.
>
> I haven't noticed much along the lines of command line manipulation in Phobos - only the code (now in druntime) to populate the args argument to main (which under Posix just uses argc/argv from the C main). Or is there something I haven't found?
>
> Stewart.

http://dlang.org/phobos/std_getopt.html

But it might not do what you want.

-- /Jacob Carlborg
Apr 05 2012
On 05/04/2012 14:51, Jacob Carlborg wrote:
<snip>
> http://dlang.org/phobos/std_getopt.html
>
> But it might not do what you want.

Where is the code in std.getopt that has any relevance whatsoever to what smjg.libs.util.datetime or smjg.libs.util.commandline is for?

Stewart.
Apr 07 2012
On 2012-04-07 14:36, Stewart Gordon wrote:
> On 05/04/2012 14:51, Jacob Carlborg wrote:
> <snip>
>> http://dlang.org/phobos/std_getopt.html
>>
>> But it might not do what you want.
>
> Where is the code in std.getopt that has any relevance whatsoever to what smjg.libs.util.datetime or smjg.libs.util.commandline is for?
>
> Stewart.

Both std.getopt and smjg.libs.util.commandline handle command line arguments?

-- /Jacob Carlborg
Apr 07 2012
On 07/04/2012 17:54, Jacob Carlborg wrote:
<snip>
> Both std.getopt and smjg.libs.util.commandline handle command line arguments?

What's that to do with anything?

If the code I need to finish smjg.libs.util.commandline is somewhere in std.getopt, please tell me where exactly it is. If it isn't, then why did you refer me to it?

That's like telling someone who's writing a bigint library and struggling to implement multiplication to just look in std.math. After all, they both handle numbers.

Stewart.
Apr 07 2012
On 2012-04-07 19:57, Stewart Gordon wrote:
> On 07/04/2012 17:54, Jacob Carlborg wrote:
> <snip>
>> Both std.getopt and smjg.libs.util.commandline handle command line arguments?
>
> What's that to do with anything?
>
> If the code I need to finish smjg.libs.util.commandline is somewhere in std.getopt, please tell me where exactly it is. If it isn't, then why did you refer me to it?
>
> That's like telling someone who's writing a bigint library and struggling to implement multiplication to just look in std.math. After all, they both handle numbers.
>
> Stewart.

I don't know what your module is supposed to do.

-- /Jacob Carlborg
Apr 07 2012
On 07/04/2012 20:16, Jacob Carlborg wrote:
<snip>
> I don't know what your module is supposed to do.

Then how about reading its documentation?

http://pr.stewartsplace.org.uk/d/sutil/doc/commandline.html

If there's something you don't understand about it, this is the issue that needs to be addressed, rather than wildly guessing that some Phobos module provides the answer.

Stewart.
Apr 07 2012
On 2012-03-31 17:56, Jacob Carlborg wrote:
> How would I read a unicode character from the terminal? I've tried using "std.cstream.din.getc" but it seems to only work for ascii characters. If I try to read and print something that isn't ascii, it just prints a question mark.

I solved it like this:

    dchar readChar ()
    {
        char[4] buffer;

        buffer[0] = din.getc();
        auto len = codeLength!(char)(buffer[0]);

        foreach (i ; 1 .. len)
            buffer[i] = din.getc();

        size_t i;
        return decode(buffer, i);
    }

-- /Jacob Carlborg
Apr 04 2012
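A cautionary note on the solution above, offered as a reading of the std.utf documentation rather than something stated in the thread: codeLength!char reports how many code units are needed to encode a given code point, so applying it to a lead byte (whose numeric value is always below 0x800) appears to yield at most 2. That would cover ç but not three- or four-unit characters such as €. std.utf.stride is the function that inspects a lead byte and returns the length of its sequence. The sketch below uses a delegate named nextUnit as a hypothetical stand-in for din.getc() so that it can run on an in-memory string.

```d
import std.stdio;
import std.utf : decode, stride;

/// Reads one code point, pulling UTF-8 code units from `nextUnit` one at a time.
/// `nextUnit` is a hypothetical stand-in for a call such as din.getc().
dchar readChar(char delegate() nextUnit)
{
    char[4] buffer;
    buffer[0] = nextUnit();

    // stride looks at the lead byte and reports the length of its sequence;
    // it throws a UTFException if the byte cannot start a sequence.
    immutable len = stride(buffer[], 0);

    foreach (i; 1 .. len)
        buffer[i] = nextUnit();

    const(char)[] sequence = buffer[0 .. len];
    size_t index = 0;
    return decode(sequence, index);
}

void main()
{
    string input = "€uro";              // in-memory stand-in for terminal input
    size_t pos = 0;
    char next() { return input[pos++]; }

    writeln(readChar(&next));           // consumes the three code units of €
    writeln(readChar(&next));           // consumes the single code unit of u
}
```

The structure is otherwise the same as in the post: read the lead unit, work out the sequence length, read the rest, then decode.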








