digitalmars.D - Range functions expand char to dchar

reply "Matt Kline" <matt bitbashing.io> writes:
After seeing Walter's DConf presentation from this year, I've 
been making an effort to use range algorithms more, such as using 
chain() and joiner() as an alternative to array concatenation and 
std.array.join.

Unfortunately, doing so with strings has been problematic, as 
these algorithms expand strings into dstrings.

An example:

import std.algorithm;
import std.range;
import std.stdio;
import std.regex;

void main()
{
     // One would expect this to be a range of chars
     auto test = chain("foo", "bar", "baz");
     // prints "dchar"
     writeln(typeid(typeof(test.front)));

     auto arr = ["foo", "bar", "baz"];
     auto joined = joiner(arr, ", ");
     // Also "dchar"
     writeln(typeid(typeof(joined.front)));

     // Problems ensue if one assumes the result of joined is a char string.
     auto r = regex(joined);
     matchFirst("won't compile", r); // Compiler error
}

Whether by design or by oversight, this is quite undesirable. It 
violates the principle of least astonishment (one wouldn't expect 
joining a bunch of strings would result in a dstring), causing 
issues such as the one shown above. And, if I aim to use UTF-8 
consistently throughout my applications (see 
http://utf8everywhere.org/), what am I to do?
Sep 08 2015
next sibling parent reply "Matt Kline" <matt bitbashing.io> writes:
On Tuesday, 8 September 2015 at 17:52:13 UTC, Matt Kline wrote:

 Whether by design or by oversight, this is quite undesirable.
My apologies for double-posting, but is this intended behavior, or an unfortunate consequence of the metaprogramming used to determine the resulting type of these range functions?
Sep 08 2015
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08-Sep-2015 20:57, Matt Kline wrote:
 On Tuesday, 8 September 2015 at 17:52:13 UTC, Matt Kline wrote:

 Whether by design or by oversight, this is quite undesirable.
My apologies for double-posting, but is this intended behavior, or an unfortunate consequence of the metaprogramming used to determine the resulting type of these range functions?
Historical consequence of enabling auto-decoding for arrays of char and wchar (and only those). Today it's recognized that one should explicitly wrap an array of char as either a code unit range or a code point range, using the byUTF helper.

-- 
Dmitry Olshansky
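To make the auto-decoding point concrete, here is a minimal sketch (not from the thread) that checks element types at compile time and counts elements at run time. It assumes a Phobos version that provides std.utf.byCodeUnit and std.utf.byDchar; byDchar is used here in place of the byUTF helper mentioned above, and the sample string is arbitrary.

```d
import std.range : walkLength;
import std.range.primitives : ElementType;
import std.utf : byCodeUnit, byDchar;

// Range primitives auto-decode narrow strings: the element type is dchar,
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!wstring == dchar));
// even though indexing the array still yields UTF-8 code units.
static assert(is(typeof(string.init[0]) == immutable(char)));

// Explicit wrappers state which view is wanted.
static assert(is(ElementType!(typeof("".byCodeUnit)) == immutable(char)));
static assert(is(ElementType!(typeof("".byDchar)) == dchar));

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two code units in UTF-8

    assert(s.length == 6);                // array length counts code units
    assert(s.walkLength == 5);            // auto-decoded: code points
    assert(s.byCodeUnit.walkLength == 6); // code units, no decoding
    assert(s.byDchar.walkLength == 5);    // code points, decoded explicitly
}
```

The same string therefore looks like six elements or five depending on the view, which is why chain() and joiner() over plain strings end up with dchar elements.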
Sep 08 2015
prev sibling parent reply anonymous <anonymous example.com> writes:
On Tuesday 08 September 2015 19:52, Matt Kline wrote:

 An example:
 
 import std.algorithm;
 import std.range;
 import std.stdio;
 import std.regex;
 
 void main()
 {
      // One would expect this to be a range of chars
      auto test = chain("foo", "bar", "baz");
      // prints "dchar"
      writeln(typeid(typeof(test.front)));
 
      auto arr = ["foo", "bar", "baz"];
      auto joined = joiner(arr, ", ");
      // Also "dchar"
      writeln(typeid(typeof(joined.front)));
 
      // Problems ensue if one assumes the result of joined is a char string.
      auto r = regex(joined);
      matchFirst("won't compile", r); // Compiler error
 }
 
 Whether by design or by oversight,
By design with regrets: http://forum.dlang.org/post/m01r3d$1frl$1@digitalmars.com
 this is quite undesirable. It 
 violates the principle of least astonishment (one wouldn't expect 
 joining a bunch of strings would result in a dstring),
Strictly speaking, the result is a range of dchars, not a dstring.
 causing 
 issues such as the one shown above. And, if I aim to use UTF-8 
 consistently throughout my applications (see 
 http://utf8everywhere.org/), what am I to do?
You can use std.utf.byCodeUnit to get ranges of chars:

----
import std.algorithm;
import std.array: array;
import std.range;
import std.stdio;
import std.regex;
import std.utf: byCodeUnit;

void main()
{
    auto test = chain("foo".byCodeUnit, "bar".byCodeUnit, "baz".byCodeUnit);
    pragma(msg, typeof(test.front)); /* "immutable(char)" */

    auto arr = ["foo".byCodeUnit, "bar".byCodeUnit, "baz".byCodeUnit];
    auto joined = joiner(arr, ", ".byCodeUnit);
    pragma(msg, typeof(joined.front)); /* "immutable(char)" */

    /* Having char elements isn't enough. Need to turn the range into an
    array via std.array.array: */
    auto r = regex(joined.array);
    matchFirst("won't compile", r); /* compiles */
}
----

Alternatively, since you have to materialize `joined` into an array anyway, you can use the dchar range and make a string from it when passing to `regex`:

----
import std.algorithm;
import std.conv: to;
import std.stdio;
import std.regex;

void main()
{
    auto arr = ["foo", "bar", "baz"];
    auto joined = joiner(arr, ", ");
    pragma(msg, typeof(joined.front)); /* "dchar" */

    /* to!string now: */
    auto r = regex(joined.to!string);
    matchFirst("won't compile", r); /* compiles */
}
----
Sep 08 2015
parent reply "Matt Kline" <matt bitbashing.io> writes:
On Tuesday, 8 September 2015 at 18:21:34 UTC, anonymous wrote:
 By design with regrets:
 http://forum.dlang.org/post/m01r3d$1frl$1@digitalmars.com

 On Thursday, 25 September 2014 at 19:40:29 UTC, Walter Bright 
 wrote:
 Top of my list would be the auto-decoding behavior of 
 std.array.front() on character arrays. Every time I'm faced 
 with that I want to throw a chair through the window.
At least I'm not alone. :)
 You can use std.utf.byCodeUnit to get ranges of chars:
A bit verbose, but I suppose that will do.
     /* Having char elements isn't enough. Need to turn the range into an
     array via std.array.array: */
     auto r = regex(joined.array);
     matchFirst("won't compile", r); /* compiles */
 }
If we have a range of char elements, won't that do? regex() uses the standard isSomeString!S constraint to take any range of chars.
Sep 08 2015
next sibling parent anonymous <anonymous example.com> writes:
On Tuesday 08 September 2015 20:28, Matt Kline wrote:

 If we have a range of char elements, won't that do? regex() uses 
 the standard isSomeString!S constraint to take any range of chars.
isSomeString!S doesn't check if S is a range. It checks if S is "some string", meaning: "Char[], where Char is any of char, wchar or dchar, with or without qualifiers".
http://dlang.org/phobos/std_traits.html#isSomeString

Checking for ranges would be done with isInputRange, isForwardRange, etc.
http://dlang.org/phobos/std_range_primitives.html
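The difference between the two kinds of constraint can be checked at compile time. The sketch below is an illustration, not something posted in the thread: `CodeUnits` and `isCharRange` are hypothetical names introduced here, and `isCharRange` only shows what a "range of characters" constraint might look like. It is not what regex() uses.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar, isSomeString;
import std.utf : byCodeUnit;

// The type byCodeUnit returns for a string: a struct wrapping the array.
alias CodeUnits = typeof("foo".byCodeUnit);

// A plain string is "some string" ...
static assert(isSomeString!string);
// ... but the byCodeUnit wrapper is not, even though its elements are chars.
static assert(!isSomeString!CodeUnits);

// What the wrapper does satisfy is the range interface.
static assert(isInputRange!CodeUnits);
static assert(is(ElementType!CodeUnits == immutable(char)));

// Hypothetical constraint (not part of Phobos) for "any input range of
// characters", built from existing traits.
enum isCharRange(R) = isInputRange!R && isSomeChar!(ElementType!R);
static assert(isCharRange!CodeUnits);
static assert(isCharRange!string); // strings qualify too, as ranges of dchar

void main() {}
```

This is why a range with char elements still has to be turned into an array before it can be passed to a function constrained on isSomeString.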
Sep 08 2015
prev sibling parent "Freddy" <Hexagonalstar64 gmail.com> writes:
On Tuesday, 8 September 2015 at 18:28:40 UTC, Matt Kline wrote:
 A bit verbose, but I suppose that will do.
You could use map:

---
import std.algorithm : map;
import std.utf : byCodeUnit;
import std.array : array;
auto arr = ["foo", "bar", "baz"].map!(a => a.byCodeUnit).array;
---
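Putting the map suggestion together with joiner and regex, a complete program might look like the sketch below. `joinUnits` is a hypothetical helper written for this example, not a Phobos function, and the "|" separator is chosen only so that the joined text is a usable regex alternation.

```d
import std.algorithm : joiner, map;
import std.array : array;
import std.regex : matchFirst, regex;
import std.utf : byCodeUnit;

// Hypothetical helper, not a Phobos function: joins `parts` with `sep`
// lazily at the code unit level, then materializes the result once.
auto joinUnits(string[] parts, string sep)
{
    return parts.map!(a => a.byCodeUnit)
                .joiner(sep.byCodeUnit)
                .array;
}

void main()
{
    auto pattern = joinUnits(["foo", "bar", "baz"], "|");
    assert(pattern == "foo|bar|baz");

    // The materialized result is an array of chars, so regex() accepts it.
    auto r = regex(pattern);
    assert(matchFirst("a bar b", r).hit == "bar");
}
```

The byCodeUnit calls stay in one place, so callers keep working with ordinary strings.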
Sep 09 2015