digitalmars.D.bugs - [Issue 5173] New: std.process.shell cannot handle non-UTF8 output
- d-bugmail puremagic.com (57/57) Nov 05 2010 http://d.puremagic.com/issues/show_bug.cgi?id=5173
 - d-bugmail puremagic.com (6/6) Nov 05 2010 http://d.puremagic.com/issues/show_bug.cgi?id=5173
 - d-bugmail puremagic.com (10/10) Nov 05 2010 http://d.puremagic.com/issues/show_bug.cgi?id=5173
 
http://d.puremagic.com/issues/show_bug.cgi?id=5173
           Summary: std.process.shell cannot handle non-UTF8 output
           Product: D
           Version: D2
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: lars.holowko gmail.com
PDT ---
std.process.shell dies with an exception when the utility's output is UTF-16.
For example:
import std.process, std.stdio, std.string;
int main(string[] args)
{
    auto output = shell("wmic NTDOMAIN GET DomainName /value");
    writefln("Output: %s", output);
    return 0;
}
produces this output:
dchar decode(in char[], ref size_t): Invalid UTF-8 sequence [255, 254, 13, 0,
10, 0, 13, 0, 10, 0, 68, 0, 111, 0, 109, 0, 97, 0, 105, 0, 110, 0, 78, 0, 97,
0, 109, 0, 101, 0, 61, 0, 13, 0, 10, 0, 13, 0, 10, 0, 13, 0, 10, 0] around
index 0
wmic's output looks like UTF-16 (little-endian): the first two bytes, 255 254
(0xFF 0xFE), are the UTF-16LE byte-order mark.
As a workaround, I modified std.process.shell slightly to return a wstring
instead:
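import std.array, std.random, std.file, std.format, std.exception;
import std.process : system;

// std.process.shell, modified to read the captured output back as
// UTF-16 (readText!wstring) instead of UTF-8.
wstring shell2(string cmd)
{
    // Build a random temporary file name from 8 hex-formatted random numbers.
    auto a = appender!string();
    foreach (i; 0 .. 8)
    {
        formattedWrite(a, "%x", rndGen.front);
        rndGen.popFront;
    }
    auto filename = a.data;
    scope(exit) if (exists(filename)) remove(filename);
    // Redirect the command's stdout into the temporary file.
    errnoEnforce(system(cmd ~ "> " ~ filename) == 0);
    // readText!wstring reinterprets the bytes in native byte order without
    // conversion, so on x86 the UTF-16LE BOM stays as a leading U+FEFF.
    return readText!wstring(filename);
}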
Things seem to work for this case. But a proper fix would be to make readText
determine the encoding from the byte-order mark (BOM) at the start of the file
and then do the necessary conversion before calling std.utf.validate.
readText currently looks like this:
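S readText(S = string)(in char[] name)
{
    // Reinterpret the raw bytes as S; no transcoding takes place.
    auto result = cast(S) read(name);
    // Throws if the bytes are not valid in S's encoding (UTF-8 for string).
    std.utf.validate(result);
    return result;
}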
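The BOM-based fix suggested above could look roughly like the D sketch below. It is not the code from the attachment and not a Phobos API: the names decodeByBom, codeUnits, toValidUtf8 and readTextAnyUtf are hypothetical, it targets a current D2 compiler rather than 2.050, and it assumes that input without a BOM is UTF-8 (which is what readText assumes today). It assembles UTF-16/UTF-32 code units in the byte order the BOM announces, validates them, and transcodes to UTF-8.

```d
// Sketch only: BOM-based encoding detection for a readText-style function.
// All names here (decodeByBom, codeUnits, toValidUtf8, readTextAnyUtf) are hypothetical.
import std.algorithm.searching : startsWith;
import std.conv : to;
import std.exception : enforce;
import std.file : read;
import std.utf : validate;

// Byte-order marks. UTF-32LE must be tested before UTF-16LE because
// its mark FF FE 00 00 begins with the UTF-16LE mark FF FE.
immutable ubyte[] bomUtf8    = [0xEF, 0xBB, 0xBF];
immutable ubyte[] bomUtf32LE = [0xFF, 0xFE, 0x00, 0x00];
immutable ubyte[] bomUtf32BE = [0x00, 0x00, 0xFE, 0xFF];
immutable ubyte[] bomUtf16LE = [0xFF, 0xFE];
immutable ubyte[] bomUtf16BE = [0xFE, 0xFF];

// Reassemble raw bytes into code units of type C (wchar or dchar) using
// the stated byte order, independent of the host's native endianness.
C[] codeUnits(C)(const(ubyte)[] bytes, bool littleEndian)
{
    enum width = C.sizeof;
    enforce(bytes.length % width == 0, "input ends with a partial code unit");
    auto units = new C[bytes.length / width];
    foreach (i, ref unit; units)
    {
        uint value = 0;
        foreach (k; 0 .. width)
        {
            // Visit this unit's bytes from most to least significant.
            immutable idx = i * width + (littleEndian ? width - 1 - k : k);
            value = (value << 8) | bytes[idx];
        }
        unit = cast(C) value;
    }
    return units;
}

// Validate in the source encoding (throws UTFException), then transcode to UTF-8.
string toValidUtf8(C)(const(C)[] units)
{
    validate(units);
    return to!string(units);
}

// Detect the encoding from the BOM, drop the BOM and return UTF-8.
// Input without a BOM is assumed to be UTF-8.
string decodeByBom(const(ubyte)[] bytes)
{
    if (bytes.startsWith(bomUtf8))
        return toValidUtf8(cast(const(char)[]) bytes[bomUtf8.length .. $]);
    if (bytes.startsWith(bomUtf32LE))
        return toValidUtf8(codeUnits!dchar(bytes[bomUtf32LE.length .. $], true));
    if (bytes.startsWith(bomUtf32BE))
        return toValidUtf8(codeUnits!dchar(bytes[bomUtf32BE.length .. $], false));
    if (bytes.startsWith(bomUtf16LE))
        return toValidUtf8(codeUnits!wchar(bytes[bomUtf16LE.length .. $], true));
    if (bytes.startsWith(bomUtf16BE))
        return toValidUtf8(codeUnits!wchar(bytes[bomUtf16BE.length .. $], false));
    return toValidUtf8(cast(const(char)[]) bytes);
}

// Hypothetical readText replacement built on decodeByBom.
string readTextAnyUtf(string name)
{
    return decodeByBom(cast(const(ubyte)[]) read(name));
}

void main()
{
    import std.stdio : writeln;
    // The wmic output bytes quoted in the report: UTF-16LE with a BOM.
    const(ubyte)[] wmic = [255, 254, 13, 0, 10, 0, 13, 0, 10, 0, 68, 0, 111, 0,
        109, 0, 97, 0, 105, 0, 110, 0, 78, 0, 97, 0, 109, 0, 101, 0, 61, 0,
        13, 0, 10, 0, 13, 0, 10, 0, 13, 0, 10, 0];
    writeln(decodeByBom(wmic));
}
```

Detection only works when the producer writes a BOM, as wmic does here. Input without one falls through to the UTF-8 path, so BOM-less UTF-16 is either rejected or, for plain ASCII content, silently decoded with embedded NUL characters.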
-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
 Nov 05 2010
http://d.puremagic.com/issues/show_bug.cgi?id=5173
PDT ---
Forgot to mention: this is on 2.050.
 Nov 05 2010
http://d.puremagic.com/issues/show_bug.cgi?id=5173
PDT ---
Created an attachment (id=801)
replacement std.file.readText that would fix the issue

The attached std.file.readText function uses the UTF encoding detection
"algorithm" described in TDPL (The D Programming Language) and does the
necessary conversions to fix the described bug.
 Nov 05 2010