
digitalmars.D.learn - Wrong output of quotes in Windows (encoding?)

reply Hugo Florentino <hugo acdam.cu> writes:
Hi,

A short while ago I had minor difficulties escaping quotes, and noticed 
(I don't remember where) a simple function by a D user which I have now 
tried to enhance. The problem is that output is incorrect in Windows 
(even with unicode-supporting fonts). I tried to use transcode but could 
not get it to work.

Check the following code, and please advise me what to do in order to 
get the correct output:


import std.stdio, std.string, std.encoding;

@trusted string quote(in string str, in char chr = 'd') pure {
   switch(chr) {
     case 'b': return '`' ~ str ~ '`'; // backtick
     case 'd': return `"` ~ str ~ `"`; // double
     case 'f': return `«` ~ str ~ `»`; // french
     case 's': return `'` ~ str ~ `'`; // single
     case 't': return `“` ~ str ~ `”`; // typographic
     default: return `"` ~ str ~ `"`; // double
   }
}

void main() {
   char[] a = ['b', 'd', 'f', 's', 't'];
   auto input = "just a test";
   foreach(char type; a)
     writeln(format("Quote type %s:\t%s", type, quote(input, type)));
}
Dec 18 2013
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 12/18/2013 05:32 AM, Hugo Florentino wrote:

 output is incorrect in Windows (even with unicode-supporting
 fonts).
Is the code page also set to UTF-8? I think you must issue the command
'chcp 65001'.

I have changed your program to print the code units individually in hex.
I changed the test string to a single space character so that you can
identify it easily on the output:

import std.stdio, std.string, std.encoding;

@trusted string quote(in string str, in char chr = 'd') pure {
   switch(chr) {
     case 'b': return '`' ~ str ~ '`'; // backtick
     case 'd': return `"` ~ str ~ `"`; // double
     case 'f': return `«` ~ str ~ `»`; // french
     case 's': return `'` ~ str ~ `'`; // single
     case 't': return `“` ~ str ~ `”`; // typographic
     default: return `"` ~ str ~ `"`; // double
   }
}

void main() {
   char[] a = ['b', 'd', 'f', 's', 't'];
   auto input = " ";
   foreach(char type; a)
     writeln(format("Quote type %s:\t%(%02x %)", type,
                    cast(ubyte[])quote(input, type)));
}

Does the output of the program look correct according to UTF-8? Then your
compiler has produced a correct program. :)

Here is the output I get on SL6.1 compiled with dmd v2.065-devel-41ebb59:

Quote type b:   60 20 60
Quote type d:   22 20 22
Quote type f:   c2 ab 20 c2 bb
Quote type s:   27 20 27
Quote type t:   e2 80 9c 20 e2 80 9d

I trust the correctness of this feature of D so much that I am too lazy
to check whether those code units correspond to the intended Unicode
characters. :)

Ali
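For readers who do want to check those code units, here is a minimal sketch. It assumes the intended characters are U+00AB/U+00BB (French quotes) and U+201C/U+201D (typographic quotes); the helper name encodesTo is made up for this illustration and is not part of the original program.

```d
import std.stdio;
import std.string : representation;

// Hypothetical helper: true if the UTF-8 code units of `s`
// are exactly the bytes in `expected`.
bool encodesTo(string s, const(ubyte)[] expected)
{
    return s.representation == expected;
}

void main()
{
    // French quotes, U+00AB and U+00BB
    assert(encodesTo("\u00AB", [0xc2, 0xab]));
    assert(encodesTo("\u00BB", [0xc2, 0xbb]));
    // Typographic quotes, U+201C and U+201D
    assert(encodesTo("\u201C", [0xe2, 0x80, 0x9c]));
    assert(encodesTo("\u201D", [0xe2, 0x80, 0x9d]));
    writeln("all code units match");
}
```

If the asserts pass, the hex dump above is the correct UTF-8 for those characters, which points at the console's code page rather than at the compiled program.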
Dec 18 2013
parent reply Hugo Florentino <hugo acdam.cu> writes:
On Wed, 18 Dec 2013 10:05:49 -0800, Ali Çehreli wrote:
 On 12/18/2013 05:32 AM, Hugo Florentino wrote:

 output is incorrect in Windows (even with unicode-supporting
 fonts).
 Is the code page also set to UTF-8? I think you must issue the command
 'chcp 65001'.
Changing the codepage worked indeed. Thanks. Now, how could I do that programmatically, so that if my application runs on a system with a different codepage, the output looks correct? After all, not all users feel comfortable typing unknown commands.
Dec 18 2013
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 12/18/2013 01:17 PM, Hugo Florentino wrote:

 Changing the codepage worked indeed. Thanks.
 Now, how could I do that programmatically, so that if my application
 runs on a system with a different codepage, the output looks correct?
It is not solvable in general because stdout is nothing but a stream
that accepts characters. (Well, UTF-8 code units when it comes to Unicode.)

The program can detect or assume that it is running in a console and
change that environment if it is allowed to do so.

Google searches like "change code page console programmatically windows"
produce some answers but I don't have any experience. :)

Ali
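The "detect that it is running in a console" step could be sketched roughly as follows. This is only one possible approach, not something taken from the thread: it assumes the druntime bindings in core.sys.windows.windows and core.sys.posix.unistd are available, and the function name stdoutIsConsole is made up for this example.

```d
import std.stdio;

version (Windows)
{
    import core.sys.windows.windows;

    // GetConsoleMode succeeds only for real console handles, so it
    // fails when stdout is redirected to a file or a pipe.
    bool stdoutIsConsole()
    {
        DWORD mode;
        auto handle = GetStdHandle(STD_OUTPUT_HANDLE);
        return handle != INVALID_HANDLE_VALUE
            && GetConsoleMode(handle, &mode) != 0;
    }
}
else version (Posix)
{
    import core.sys.posix.unistd : isatty;

    // File descriptor 1 is standard output.
    bool stdoutIsConsole()
    {
        return isatty(1) != 0;
    }
}
else
{
    // Unknown platform: assume no console.
    bool stdoutIsConsole()
    {
        return false;
    }
}

void main()
{
    writeln("stdout is a console: ", stdoutIsConsole());
}
```

A program would only touch the console's code page when this returns true; when output is redirected, the bytes written are already valid UTF-8 and nothing needs changing.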
Dec 18 2013
parent reply Simon <s.d.hammett gmail.com> writes:
On 18/12/2013 22:11, Ali Çehreli wrote:
 On 12/18/2013 01:17 PM, Hugo Florentino wrote:

  > Changing the codepage worked indeed. Thanks.
  > Now, how could I do that programmatically, so that if my application
  > runs on a system with a different codepage, the output looks correct?

 It is not solvable in general because stdout is nothing but a stream
 that accepts characters. (Well, UTF-8 code units when it comes to Unicode).

 The program can detect or assume that it is running in a console and
 change that environment if it is allowed to do so.

 Google searches like "change code page console programmatically windows"
 produce some answers but I don't have any experience. :)

 Ali
Call:

   SetConsoleOutputCP(65001);

Works for me on Win7 64-bit. Not sure how far back it's supported though.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686036(v=vs.85).aspx

You might need your own definition of it; I don't know whether it's
available in the Windows part of Phobos.

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
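A minimal sketch of that suggestion in D, declaring the two Win32 functions by hand as described above. The constant name CP_UTF8 is chosen here to mirror the Windows SDK, and restoring the previous code page on exit is an extra courtesy that nobody in the thread asked for.

```d
import std.stdio;

version (Windows)
{
    // Hand-written declarations of the kernel32 functions, in case the
    // bindings shipped with the compiler do not provide them.
    extern (Windows) nothrow
    {
        uint GetConsoleOutputCP();
        int SetConsoleOutputCP(uint wCodePageID);
    }
}

// Windows code page identifier for UTF-8.
enum uint CP_UTF8 = 65001;

void main()
{
    version (Windows)
    {
        // Remember the user's code page and put it back when main exits.
        immutable uint oldCP = GetConsoleOutputCP();
        SetConsoleOutputCP(CP_UTF8);
        scope (exit)
        {
            stdout.flush(); // write pending output before switching back
            SetConsoleOutputCP(oldCP);
        }
    }

    writeln("«french» “typographic”");
}
```

On other platforms the version block compiles to nothing and the program simply writes UTF-8 to stdout. The console font still has to contain the glyphs, as noted at the start of the thread.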
Dec 19 2013
parent Hugo Florentino <hugo acdam.cu> writes:
On Thu, 19 Dec 2013 19:38:20 +0000, Simon wrote:
 Call:

   SetConsoleOutputCP(65001);

 Works for me on win7 64bit. Not sure how far back it's supported 
 though.
Interesting, thanks.
Dec 19 2013