digitalmars.D.bugs - [Issue 8384] New: Poor wchar/dchar* to string conversion support
http://d.puremagic.com/issues/show_bug.cgi?id=8384

           Summary: Poor wchar/dchar* to string conversion support
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: thecybershadow gmail.com

--- Comment #0 from Vladimir Panteleev <thecybershadow gmail.com> 2012-07-13 05:23:29 PDT ---
import std.conv;
import std.string;

unittest
{
    static void test(T)(T lp)
    {
        assert(format("%s", lp) == "Hello, world!");
        assert(to!string(lp) == "Hello, world!");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

wchar* conversion is commonly needed for Windows programming, as UTF-16 is the native encoding for Unicode Windows API functions.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 13 2012
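For the Windows case described in the report, the conversion can be written by hand. The following is only a minimal sketch: `fromWideZ` is a hypothetical helper name, not a Phobos function, and it assumes the pointer refers to a valid zero-terminated UTF-16 string.

```d
import std.conv : to;

/// Hypothetical helper (not part of Phobos): converts a zero-terminated
/// UTF-16 string, such as one returned by a Windows API call, to a D string.
string fromWideZ(const(wchar)* p)
{
    if (p is null)
        return null;

    // Find the terminating zero code unit.
    size_t len = 0;
    while (p[len] != 0)
        ++len;

    // Slice without copying, then let std.conv.to transcode UTF-16 to UTF-8.
    return to!string(p[0 .. len]);
}

unittest
{
    // String literals are zero-terminated, so .ptr is safe to use here.
    const(wchar)* wide = "Hello, world!"w.ptr;
    assert(fromWideZ(wide) == "Hello, world!");
    assert(fromWideZ(null) is null);
}
```

The slice `p[0 .. len]` does not copy; the copy happens only when `to!string` transcodes it.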
Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com

--- Comment #1 from Jonathan M Davis <jmdavisProg gmx.com> 2012-07-13 12:00:53 PDT ---
So, you expect %s on a pointer to give you the string that it points to? Why? It's a pointer, not a string. It's going to convert the pointer. That works as expected.

to!string should take null-terminated string and give you a string, and it does that. This code passes:

import std.conv;
import std.string;

void main()
{
    static void test(T)(T lp)
    {
        assert(to!string(lp), "hello world");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

So, I'd say that as far as your code goes, there's nothing wrong with it. It functions exactly as expected. What _doesn't_ work is this:

import std.conv;
import std.string;

void main()
{
    static void test(T)(T lp)
    {
        assert(to!wstring(lp), "hello world");
        assert(to!dstring(lp), "hello world");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

The code doesn't even compile, giving these errors:

/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(819): Error: incompatible types for ((cast(immutable(dchar)[])_adDupT(&_D12TypeInfo_Aya6__initZ,value[cast(ulong)0..strlen(cast(const(char*))value)])) ? (null)): 'immutable(dchar)[]' and 'string'
/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(268): Error: template instance std.conv.toImpl!(immutable(dchar)[],immutable(char)*) error instantiating
q.d(8):        instantiated from here: to!(immutable(char)*)
q.d(11):        instantiated from here: test!(immutable(char)*)
q.d(8): Error: template instance std.conv.to!(immutable(dchar)[]).to!(immutable(char)*) error instantiating
q.d(11):        instantiated from here: test!(immutable(char)*)
q.d(11): Error: template instance q.main.test!(immutable(char)*) error instantiating
Jul 13 2012
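The distinction drawn in this comment can be checked directly. Note that the asserts in the comment use the two-argument form `assert(condition, message)`, so they pass for any non-null result rather than comparing against the expected text. The sketch below compares with `==` instead; it assumes a Phobos version in which `%s` on a pointer formats the address.

```d
import std.conv : to;
import std.format : format;

unittest
{
    // A string literal is zero-terminated, so its .ptr is a valid C string.
    immutable(char)* p = "Hello, world!".ptr;

    // Explicit conversion: reads up to the terminating zero and copies.
    assert(to!string(p) == "Hello, world!");

    // Formatting: %s on a pointer formats the address, not the text.
    assert(format("%s", p) != "Hello, world!");
}
```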
--- Comment #2 from Vladimir Panteleev <thecybershadow gmail.com> 2012-07-13 13:36:05 PDT ---
> to!string should take null-terminated string and give you a string, and it
> does that. This code passes:

Is it something that was fixed recently (within the last two weeks)? My two-week-old dmd git build and dpaste still print offsets for wchar* and dchar*:

http://dpaste.dzfl.pl/26a2b284

> So, you expect %s on a pointer to give you the string that it points to? Why?

I think that, before all else, we should be looking for good reasons why format("%s", foo) and to!string(foo) produce different results. Why should one format the offset and the other do a conversion?

Second, I believe that the principle of least surprise is making this case rather clear: if the programmer tries to print a char*, it's almost certain that they want to print the null-terminated string at the given address, rather than a hexadecimal representation of the address (which is rarely useful to the end-user). Generic code is the only exception I can think of, in which case a cast to void* is in order.

> What _doesn't_ work is this:

I think this should call the appropriate toUTFx functions from std.utf.
Jul 13 2012
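For reference, the toUTFx family in std.utf already transcodes between the three D string types once the data is in slice form. A small sketch of those calls, using only `toUTF8`, `toUTF16` and `toUTF32`; the non-ASCII sample text is an illustrative choice:

```d
import std.utf : toUTF8, toUTF16, toUTF32;

unittest
{
    wstring w = "Hello, world!"w;
    dstring d = "Hello, world!"d;

    assert(toUTF8(w) == "Hello, world!");
    assert(toUTF8(d) == "Hello, world!");
    assert(toUTF16("Hello, world!") == w);
    assert(toUTF32("Hello, world!") == d);

    // Non-ASCII text changes length in code units, but not in content.
    dstring word = "na\u00EFve"d;          // U+00EF is one code point
    assert(toUTF16(word).length == 5);     // one UTF-16 code unit
    assert(toUTF8(word).length == 6);      // two UTF-8 code units
}
```

What the request adds on top of this is accepting the pointer forms, which first need a length.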
--- Comment #3 from Vladimir Panteleev <thecybershadow gmail.com> 2012-07-13 13:42:17 PDT ---
> I think this should call the appropriate toUTFx functions from std.utf.

Sorry about that, misread your example. I guess, ideally, conversion between any pair of {|w|d}{char*|string} should work.
Jul 13 2012
--- Comment #4 from Jonathan M Davis <jmdavisProg gmx.com> 2012-07-13 13:59:09 PDT ---
format and writeln are supposed to behave the same, because they both operate on format strings (they _don't_ currently behave 100% the same, but format's current implementation will be replaced with the new xformat's implementation in a few months - after the "scheduled for deprecation" time period).

to!string is an entirely different beast. std.conv.to is asking for an explicit conversion to string, whereas format and writeln are converting according to the format specifiers, and %s indicates the default string representation of the type. char*, wchar*, and dchar* are pointers - _not_ strings - and should not be treated as strings. Pointers print their address with %s. Making char*, wchar*, and dchar* print themselves as strings would be inconsistent with other pointer types, and operating on char*, wchar*, and dchar* should be discouraged, not encouraged. to!string is treated differently, because you're asking for an explicit conversion, and we _do_ need to be able to convert null-terminated strings to D strings.

So, while I can see your point, I really don't think that having format or writeln treat char*, wchar*, or dchar* as null-terminated strings is a good idea. We should provide a means of converting them to D strings but not do anything to encourage using them as-is without converting them.
Jul 13 2012
Vladimir Panteleev <thecybershadow gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Poor wchar/dchar* to string |std.conv.to should allow
                   |conversion support          |conversion between any pair
                   |                            |of
                   |                            |string/wstring/dstring/char
                   |                            |*/wchar*/dchar*

--- Comment #5 from Vladimir Panteleev <thecybershadow gmail.com> 2012-07-13 14:25:36 PDT ---
OK, fair enough. I've updated the enhancement request's title according to my previous comment.

Test:
-----------------------------------------------------------------------------
import std.conv;

void test1(T)(T lp)
{
    test2!( string)(lp);
    test2!(wstring)(lp);
    test2!(dstring)(lp);
    test2!(  char*)(lp);
    test2!( wchar*)(lp);
    test2!( dchar*)(lp);
}

void test2(D, S)(S lp)
{
    D dest = to!D(lp);
    assert(to!string(dest) == "Hello, world!");
}

unittest
{
    test1("Hello, world!" );
    test1("Hello, world!"w);
    test1("Hello, world!"d);
    test1("Hello, world!" .ptr);
    test1("Hello, world!"w.ptr);
    test1("Hello, world!"d.ptr);
}
Jul 13 2012
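Until `to` covers every pair, the pointer cases in the test above can be approximated with existing pieces: `std.utf.toUTFz` produces a zero-terminated pointer of a chosen character type, and slicing a pointer gives `to` something it accepts. This is a sketch only; `zlen` is a hypothetical helper, not a Phobos function.

```d
import std.conv : to;
import std.utf : toUTFz;

/// Hypothetical helper: length in code units of a zero-terminated string.
size_t zlen(C)(const(C)* p)
{
    size_t n = 0;
    while (p[n] != 0)
        ++n;
    return n;
}

unittest
{
    // string -> zero-terminated UTF-16 and UTF-32 pointers
    const(wchar)* wp = toUTFz!(const(wchar)*)("Hello, world!");
    const(dchar)* dp = toUTFz!(const(dchar)*)("Hello, world!");

    // pointer -> slice -> any D string type
    assert(to!string(wp[0 .. zlen(wp)]) == "Hello, world!");
    assert(to!wstring(dp[0 .. zlen(dp)]) == "Hello, world!"w);
    assert(to!dstring(wp[0 .. zlen(wp)]) == "Hello, world!"d);
}
```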
--- Comment #6 from Vladimir Panteleev <thecybershadow gmail.com> 2012-07-13 14:31:04 PDT ---
Oh, I forgot about constness. I guess that raises the number of combinations to (2*3*3)^2 = 324.
Jul 13 2012
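The arithmetic can be spelled out at compile time. The reading of the three factors below (pointer or array, three character widths, three qualifiers) is an assumption about what the comment intends, not something it states.

```d
enum forms      = 2; // pointer or array (assumed meaning of the "2")
enum charWidths = 3; // char, wchar, dchar
enum qualifiers = 3; // mutable, const, immutable

// Number of distinct source (or destination) types.
enum endpoints = forms * charWidths * qualifiers;
static assert(endpoints == 18);

// Every source type paired with every destination type.
static assert(endpoints * endpoints == 324);
```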
klickverbot <code klickverbot.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code klickverbot.at

--- Comment #7 from klickverbot <code klickverbot.at> 2012-07-13 14:37:07 PDT ---
Hooray for using "static" foreach to conveniently enumerate all the cases to test!
Jul 13 2012
--- Comment #8 from Jonathan M Davis <jmdavisProg gmx.com> 2012-07-13 14:48:31 PDT ---
> Hooray for using "static" foreach to conveniently enumerate all the cases to
> test!

Yeah. I do that all of the time when I have to test with multiple types (especially with strings), and I always push for string-related tests to do that when I see that someone is looking to submit code to Phobos for a function that takes one or more strings as templated types, and their tests don't do that. It's just one of those things that everyone who writes much in the way of unit tests in D should learn and know about.
Jul 13 2012
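The pattern being praised looks roughly like this. The sketch uses `std.meta.AliasSeq`, which is the current name; at the time of this thread the same thing was spelled `TypeTuple` from std.typetuple. Only the array-to-array pairs are covered here, since those are the conversions `to` is known to support.

```d
import std.conv : to;
import std.meta : AliasSeq;

unittest
{
    // foreach over an AliasSeq is unrolled at compile time, so the body is
    // instantiated once for every (source, destination) pair.
    foreach (S; AliasSeq!(string, wstring, dstring))
    {
        foreach (D; AliasSeq!(string, wstring, dstring))
        {
            S source = to!S("Hello, world!");
            D dest = to!D(source);
            assert(to!string(dest) == "Hello, world!");
        }
    }
}
```

Adding a type to either list extends the test matrix without writing any new test bodies.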
--- Comment #9 from Vladimir Panteleev <thecybershadow gmail.com> 2012-08-15 13:24:08 PDT ---
Another case of confusion due to format treating C strings as pointers:

http://stackoverflow.com/q/11975353/21501

I still think that the current behavior, regardless of how much it makes sense from a design/consistency/orthogonality/etc. perspective, is simply not useful and fails the principle of least surprise in most expected cases.

I strongly believe that we should either forbid passing char pointers to format/writeln (and force the user to cast to void* or convert to a D string), or print them as C null-terminated strings.
Aug 15 2012
--- Comment #10 from Jonathan M Davis <jmdavisProg gmx.com> 2012-08-15 13:35:28 PDT ---
char* acts identically to the other pointer types, and I fully believe that it should stay that way. We've pretty much removed all of the D features which involved either treating a string as char* or a char* as a string (including disallowing implicit conversion of string to const char*). The _only_ feature that the language has which supports that is the fact that string literals have a null character one past their end and will implicitly convert to const char*.

It would be a huge mistake IMHO to support doing _anything_ with character pointers which treats them as strings without requiring an explicit conversion of some kind. Anyone who continues to think of char* as being a string in D is just asking for trouble. They need to learn to use strings correctly.

If you really want to use char* as a string in functions like format or writeln, then simply either use to!string or ptr[0 .. strlen(ptr)].
Aug 15 2012
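The two explicit conversions named in this comment differ in whether they copy. A minimal sketch, assuming `strlen` from core.stdc.string and a pointer that really is zero-terminated:

```d
import core.stdc.string : strlen;
import std.conv : to;
import std.format : format;

unittest
{
    // String literals convert implicitly to const(char)*.
    const(char)* p = "Hello, world!";

    // Option 1: slice the existing memory. No allocation, but the slice is
    // only valid for as long as the C string is.
    const(char)[] view = p[0 .. strlen(p)];
    assert(format("%s", view) == "Hello, world!");

    // Option 2: copy into a new D string.
    string owned = to!string(p);
    assert(format("%s", owned) == "Hello, world!");
}
```

The slice is the cheaper choice for immediate printing; the copy is the safer one when the C library may free or reuse the buffer.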
--- Comment #11 from Vladimir Panteleev <thecybershadow gmail.com> 2012-08-15 13:48:30 PDT ---
Sorry, I don't think that your categorical point of view is constructive. As long as D will interface with C libraries and programs, people will continue to attempt to use C strings together with or in place of D strings, and issues like the above will continue to appear. How often would a typical D user want to print / format the address of a character, versus the null-terminated string at that address?

> It would be a huge mistake IMHO to support doing _anything_ with character
> pointers which treats them as strings without requiring an explicit
> conversion of some kind.

Why would it be a mistake? What exactly do we lose by allowing writeln/format to understand C strings?

> Anyone who continues to think of char* as being a string in D is just asking
> for trouble.

What kind of trouble?

> They need to learn to use strings correctly.

D printing an address when text was expected will sooner generate a "D sucks" reaction than a "Oops, I need to learn to use strings correctly" one.

> If you really want to use char* as a string in functions like format or
> writeln, then simply either use to!string or ptr[0 .. strlen(ptr)].

That's not really simple, considering some spots where that (verbose) modification needs to be made would be discovered only late at runtime, and even then the actual problem is not obvious to identify (as seen in the SO question above).
Aug 15 2012
--- Comment #12 from Vladimir Panteleev <thecybershadow gmail.com> 2012-08-15 13:56:00 PDT ---
I would like to stress a point that I hope could clear up my view of the logic that writeln/format should use.

Printing/formatting memory addresses is extremely rarely useful! Except for some dirty debugging, I can't imagine a case where the user expects that passing a pointer to something to format would yield the hex representation of that address.

I believe that printing a pointer as a hex address should be the fallback, last-resort behavior, if there is no better representation for the said type. (This also allows discussion of calling toString() on struct pointers.)

For the rare case that the user intends to actually print a pointer, this is easily accomplished by a cast to size_t and using the appropriate hex format specifier.
Aug 15 2012
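The explicit form suggested in the last paragraph would look something like the sketch below. It casts the pointer to `size_t` and formats it with `%x`; the round trip through `to!size_t(hex, 16)` is there only to check that the text really is the address.

```d
import std.conv : to;
import std.format : format;

unittest
{
    int value = 42;
    int* p = &value;

    // Make the intent explicit: format the address as a hexadecimal integer.
    string hex = format("%x", cast(size_t) p);

    // Parsing the text back with radix 16 yields the same address.
    assert(to!size_t(hex, 16) == cast(size_t) p);
}
```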
--- Comment #13 from Jonathan M Davis <jmdavisProg gmx.com> 2012-08-15 13:57:15 PDT ---
Anyone who does not understand that char* is _not_ a string will continue to make mistakes like trying to concatenate a char* to a string ( http://stackoverflow.com/questions/11914070/why-can-i-not-concatenate-a-constchar-to-a-string-in-d ) or try to pass a string directly to a C function. They will constantly run into problems when dealing with strings. char* is _not_ a string and should not be treated as such. Treating it as a string with something like writeln will just help further the misconception that char* is a string and hinder people learning and using D. D programmers need to understand the difference between char* and string. char* should _not_ be treated as special, because it's not.
Aug 15 2012
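Both mistakes mentioned here have a short explicit fix. The sketch below uses `to!string` for the C-to-D direction and `std.string.toStringz` for the D-to-C direction; `cName` is a stand-in for a value that would normally come from a C API.

```d
import core.stdc.string : strlen;
import std.conv : to;
import std.string : toStringz;

unittest
{
    const(char)* cName = "world"; // stands in for a value returned by a C API

    // Concatenating the pointer directly does not compile:
    // a pointer carries no length.
    static assert(!__traits(compiles, "Hello, " ~ cName));

    // Convert first, then concatenate.
    string greeting = "Hello, " ~ to!string(cName) ~ "!";
    assert(greeting == "Hello, world!");

    // Going the other way, toStringz guarantees a terminating zero
    // before the data is handed to a C function.
    assert(strlen(toStringz(greeting)) == greeting.length);
}
```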
--- Comment #14 from Vladimir Panteleev <thecybershadow gmail.com> 2012-08-15 14:01:42 PDT ---
First of all, you are conflating ignorance between the two string types with my arguments. Users who are aware that D has its own way of handling strings are still open to making frustrating mistakes.

Second, getting unexpected output is not a good way to teach people about this. Hence my earlier proposal to make writeln/format REJECT char pointer types, on the basis that the user's intention is ambiguous (I don't think so personally, but obviously that's just my opinion).
Aug 15 2012
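The rejection proposal can be prototyped in user code without touching Phobos. The sketch below is only an illustration of the idea: `strictFormat` and `isCharPointer` are hypothetical names, and the wrapper simply forwards to `std.format.format` after a compile-time check.

```d
import std.format : format;
import std.traits : Unqual;

/// True for pointers to char, wchar, or dchar (with any qualifier).
template isCharPointer(T)
{
    static if (is(T == U*, U))
        enum isCharPointer = is(Unqual!U == char) || is(Unqual!U == wchar)
                          || is(Unqual!U == dchar);
    else
        enum isCharPointer = false;
}

/// Hypothetical wrapper around format (not a Phobos function) that refuses
/// character pointers at compile time.
string strictFormat(Args...)(string fmt, Args args)
{
    foreach (A; Args)
        static assert(!isCharPointer!A,
            "Character pointer passed: convert with to!string or cast to void*.");
    return format(fmt, args);
}

unittest
{
    assert(strictFormat("%s, %s!", "Hello", "world") == "Hello, world!");

    // A character pointer is rejected; an explicit void* is still accepted.
    static assert(!__traits(compiles, strictFormat("%s", "Hello".ptr)));
    static assert( __traits(compiles, strictFormat("%s", cast(void*) null)));
}
```

With such a check, the mistake surfaces at the call site during compilation instead of as an unexpected address in the output.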
--- Comment #15 from Jonathan M Davis <jmdavisProg gmx.com> 2012-08-15 14:06:49 PDT ---
I'm saying that we shouldn't treat char* differently from int* just because some newbies expect char* to act like a string. And if you know D, then you know that char* is _not_ a string, and I don't see how you could expect it to be treated as one. Either making char* act like a string or disallowing printing it would make it act differently from other pointer types just to appease the folks who mistakenly think that char* is a string.
Aug 15 2012
--- Comment #16 from Vladimir Panteleev <thecybershadow gmail.com> 2012-08-15 14:08:44 PDT ---
Well, then how about removing the pointer-printing feature entirely, and issue a compile-time error on all pointer types?
Aug 15 2012
--- Comment #17 from Vladimir Panteleev <thecybershadow gmail.com> 2012-08-15 14:12:50 PDT ---
> And if you know D, then you know that char* is _not_ a string, and I don't see
> how you could expect it to be treated as one.

I don't think this argument is valid, because it assumes that all D users are always aware of the types they pass to writeln/format. In the SO case, the argument is a function result, and the function's return type is not explicitly written in the user's code.

People often expect the compiler to shout at them if they try to pass incompatible types to a function. writeln/format accept char pointers, but ultimately do something with them that in 99% of cases is simply not useful, and put the user in search of their mistake all across the data flow.
Aug 15 2012
Adam D. Ruppe <destructionator gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |destructionator gmail.com

--- Comment #18 from Adam D. Ruppe <destructionator gmail.com> 2012-08-15 14:34:54 PDT ---
I think rejecting might be the best option because if you treat it as a string, what if it doesn't have a 0 terminator?

That could easily happen if you pass it a pointer to a D string. I don't think that is technically unsafe, but it could be a problem anyway to get an unexpected crash because of it.

At least with to!string(char*) you might think about it for a minute and avoid the problem.

So on one hand, I think it should just work, but on the other hand the compile time error might be the most sane.
Aug 15 2012
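The missing-terminator hazard is easy to reproduce with a slice. In the sketch below the scan past the end of the slice is well-defined only because the backing memory is a string literal, which is zero-terminated; with arbitrary memory the scan could read anything or crash.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

unittest
{
    string full = "Hello, world!"; // a literal, so a zero follows the '!'
    string part = full[0 .. 5];    // "Hello": no zero after the 'o'

    // The slice shares memory with the literal, so a C-style scan starting
    // at part.ptr runs past the end of the slice to the literal's terminator.
    assert(part.length == 5);
    assert(strlen(part.ptr) == 13);

    // toStringz yields a pointer that is terminated at the right place.
    assert(strlen(toStringz(part)) == 5);
}
```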
--- Comment #19 from Jonathan M Davis <jmdavisProg gmx.com> 2012-08-15 14:40:14 PDT ---
> Well, then how about removing the pointer-printing feature entirely, and issue
> a compile-time error on all pointer types?

So, you're suggesting that we remove a useful feature because newbies coming from C/C++ keep mistakenly thinking that char* is a string?
Aug 15 2012
--- Comment #20 from Vladimir Panteleev <thecybershadow gmail.com> 2012-08-15 14:44:20 PDT ---
Your formulation is misrepresenting the weight of the scales. Please seriously take into account the overall benefit for D for both decisions. The feature is nearly useless and more harmful, and "newbies coming from C/C++" is, again, a misrepresentation as discussed above. It is also incorrect - someone used to e.g. using SDL bindings in another language may expect that the types returned by the binding would be compatible with the language's native functionality.
Aug 15 2012
Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich gmail.com

--- Comment #21 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2013-01-13 10:34:43 PST ---
*** Issue 6157 has been marked as a duplicate of this issue. ***
Jan 13 2013
--- Comment #22 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2013-01-13 10:35:51 PST ---
(In reply to comment #21)
> *** Issue 6157 has been marked as a duplicate of this issue. ***

FYI: http://d.puremagic.com/issues/show_bug.cgi?id=6157 has an experimental implementation in the attachment (for conv.to), but I'm not an expert on things Unicode.
Jan 13 2013