digitalmars.D - ranges of characters and the overabundance of traits that go with them

Jonathan M Davis via Digitalmars-d (398/398) Mar 11 2017 Okay, I'm sorry, but this is a wall of text, and I don't know how to

Jack Stouffer (38/64) Mar 12 2017 First off, I'd like to point out that creating specific overloads

Jonathan M Davis via Digitalmars-d (38/69) Mar 12 2017 I completely agree that in general, we should avoid implicit conversions

Jack Stouffer (7/9) Mar 12 2017 Sorry wrote that at 4 in the morning. Should have been clearer.

Jonathan M Davis via Digitalmars-d (47/56) Mar 12 2017 Yes, they clutter up code, but I don't know how reasonable it is to keep

Jonathan M Davis via Digitalmars-d (4/6) Mar 12 2017 In case it wasn't clear from the context, I meant specifically with

Andrea Fontana (2/8) Mar 13 2017 Very well-written and interesting post!

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

Okay, I'm sorry, but this is a wall of text, and I don't know how to
actually make it short. If you really want the TLDR version, look for the
=='s at the bottom which mark off the conclusion, but you're not going to
understand the reasoning if you skip the wall of text. Now, to the post...

Recently, there was a PR submitted for Phobos which tried to add
isStringLike to std.traits. There was some debate about what the name should
be, but ultimately, Walter pointed out that we really needed to step back
and examine all of the various traits that we have for strings and related
types in Phobos before we add anything like this, regardless of the name.
So, I've been mulling over this and looking at the current situation and
thought I should post about it.

The PR in question: https://github.com/dlang/phobos/pull/5137

First off, I think that the concept of anything "string-like" should be
tabled for now. String-like implies that it acts like a string and is
therefore at least attempting to be a drop-in replacement. Nothing in Phobos
currently tries to do that, and I think that discussions on that sort of
thing are better left for when we look at adding something like Andrei's
RCString to druntime or Phobos. What we're concerned with at the moment is
how strings, things that convert to strings, and ranges of characters are
dealt with in templated code.

The first thing to notice when when looking at strings and traits in Phobos
is that we have too many traits that are very similiar but not quite the
same. Each of them was added over time to solve a particular problem that
the existing traits didn't solve well enough or simply because it was a
concept that the submitter thought was one that we'd need to reuse. The
result is a mish-mash of traits with varying - and often confusing uses. We
have isSomeString, isNarrowString, isAutodecodableString, and
isConvertibleToString. And of course, related to those, we have isSomeChar,
ElementType, ElementEncodingType, isDynamicArray, isInputRange,
isForwardRange, isBidirectionalRange, isRandomAccessRange, hasSlicing,
isDynamicArray, isAggregateType, isStaticArray, is(T == enum),
is(T : const char[]), etc. all of which can be used in combination with or
instead of the main string traits. Exactly which combination of all of those
traits get used depends on what the function in question is trying to do,
and they're often used inconsistently. But I think that the main concern is
the various is*String traits. So,

isSomeString: True for any dynamic array of any character type as well as
any enum type whose base type is a dynamic array of any character type.

isNarrowString: The same as isSomeString except that arrays of dchar and
enum types with a base type that's a dynamic array of dchar do not qualify
as narrow strings.

isAutodecodableString: The same as isNarrowString except that it also
accepts user-defined types which implicitly convert to a dynamic array of
char or wchar.

isConvertibleToString: Any type that is _not_ a dynamic array of characters
but implicitly converts to a dynamic array of characters - so enums with
that as their base type, user-defined types which implicitly convert to such
dynamic arrays, and static arrays which implicitly convert to such dynamic
arrays.

Note that these are all similar but subtly different. Of particular note is
how this affects implicit conversions. Implicit conversions in generic code
are _bad_. It's trivial for a type to pass a template constraint due to an
implicit conversion and then not actually compile with the template - or
worse, to compile but have subtly wrong behavior. So, pretty much any time
that an implicit conversion is allowed with a template constraint, that
conversion needs to be forced. Once it's converted, then it can function as
the targetted type, because it _is_ the targetted type.

Honestly, I'm inclined to think that in most cases that conversion should be
done explicitly by the programmer rather than happening automatically,
because that's far less error-prone (e.g. this is what we do when we require
that you slice a static array to pass it to a range-based function rather
than having the range-based function do it for you). Unfortunately, that's
not always an option - e.g. when templatizing a function that previously
accepted string, if you don't want to break code, then you have to accept
anything that implicitly converts to string.

So, the first mistake here is that isSomeString accepts enums. Annoyingly,
that was actually my fault (Kenji actually fixed isSomeString years ago, and
at the time, I argued that it should accept enums, which in hindsight,
was dumb of me). :(

In some cases, having a templated function accept both strings and enums
_can_ work, but in most cases, it's going to be wrong. You can examine an
enum with a base type of string, but you can't mutate it or assign it
without risking either not compiling or ending up with a variable of an enum
type with an invalid value, and enums don't pass isInputRange even if their
base types is a range, so unlike strings, you can't use range-based
functions with them. So, having code like

auto foo(S)(S str)
    if(isSomeString!S)
{...}

is actually a recipe for disaster. It'll work with strings, but it likely
won't work properly with enums, and most programmers will likely forget to
test it with enums and won't catch the problem. Fortunately, I think that
isSomeString is used primarily in range-based code to differentiate strings
from other ranges, but it _is_ a concern that isSomeString accepts enums. As
such, I think that code using isSomeString in a context where an enum would
pass needs to either be changed so that the function accepts only strings
and uses !is(S == enum) to prevent enums from passing _or_, it should be
templatized in a manner which accepts anything which implicitly converts to
a string type while simultaneously forcing the conversion. e.g.

auto foo(C)(C[] str)
    if(isSomeChar!C)
{...}

That gives you essentially the same semantics as

auto foo(string str)
{...}

except that it accepts any character type (as well as any constness for the
character type), and the conversion is done cleanly without needing any
overloads.

Ideally, isSomeString would be fixed to not accept enums, but that risks
breaking code, and I'd argue that in most cases where the fact that a
template accepts enums would be a problem, templatizing on the character
type is a better solution anyway, because it works with more code and forces
the conversion without needing any extra code.

Now, most of the code that would break if we changed isSomeString is
probably buggy anyway, but we would be risking breaking code if we made the
change. So, if we started from scratch, I would definitely do it, but I
don't know that we can reasonably do so now. We could of course do what
std.conv does internally and add isExactSomeString, which is what
isSomeString _should_ be, but that would mean adding yet another trait,
whereas ideally we'd be consolidating them. However, it's probably a better
approach than changing isSomeString if we don't want to risk breaking code.
Regardless, isSomeString's documentation should be updated to make this
problem clearer.

isNarrowString has the same problem as isSomeString, but AFAIK, its only
real use is in range-based code where the fact that it accepts enums isn't a
problem. So, the risk of breaking code when making it not accept enums is
probably much lower than doing that with isSomeString, but the benefit is
also lower, since the risk of it being used in a buggy manner is less. Aside
from the enum issue though, I think that isNarrowString is fine as-is.

However, I think that isAutodecodableString should definitely be deprecated,
and I think that we should at least consider deprecating
isConvertibleToString. But I'll go into that below, since to understand
that, I think that we first need to look at what we're trying to accomplish
with these traits.

In general, when you templatize a function involving strings, you're doing
one of

- create a generic function that works on any range (which just so happens
  to include strings)
- create a function that specifically operates on strings (but is templated
  so that it can operate on different constness or operate on different
  character types)
- create a function that specifically operates on a range of characters,
  which happens to include strings
- create a function that operates on strings or ranges of characters or
  operates on types that implicitly convert to a string type
- templatize a function that used to take string (or took any dynamic array
  of characters) and make it take any range of characters

If you're creating a generic function, it may not need to do anything
special for strings or arbitrary ranges of characters - this is especially
true if byCodeUnit has been used, and no decoding is taking place - but
sometimes, it may be necessary to specialize on narrow strings to avoid
unnecessary decoding (and the function may or may not then do any decoding
explicitly, depending on what it's trying to do). I think that that's
largely covered by isNarrowString. The constraint usually makes sure that
you're dealing with a range, and ideally, the overload dealing with
isNarrowString would be inside a static if rather than at the top level. If
it's at the top level, then there's some risk of an enum getting through,
because the constraint assumed that isNarrowString was enough to guarantee
that the type was a range, but that goes back to whether we should make
isSomeString and isNarrowString accept enums, and switching to a simpler,
top-level constraint and making the isNarrowString bit be internal with a
static if largely solves the problem in this case. So, I don't think that
this use case requires any changes to any of the string-related traits right
now.

As for functions that specifically operate on strings. As I said above when
discussing isSomeString, I think that the correct thing to do in their case
is to just templatize on the character type

auto foo(C)(C[] str)
    if(isSomeChar!C)
{...}

That does mean that the function will accept types that implicit convert to
strings in addition to strings, but if you're looking for it to specifically
operate on strings, that should be just as okay as it would be if the
function explictly took string instead of being templated. Any code that
wants to avoid that though can do

auto foo(S)(S str)
    if(isSomeString!S && !(S == enum))
{...}

but there usually isn't any point in doing that instead of templatizing on
the character type. It just makes the function work with less code.

As for functions that specifically operate on ranges of characters (and
operate on strings as part of that), I think that they're pretty much in the
same boat as generic range-based functions. They have isNarrowString to
check if they want to specialize on narrow strings, but they also have
byCodeUnit, which they can use without checking isNarrowString, completely
avoiding having to check for narrow strings, though whether using byCodeUnit
is appropriate or not depends on what the function is doing. Still, I don't
think that anything special needs to be done with the existing traits for
this use case.

The last two cases are essentially the same (in both cases, you end up with
a function that accepts types that implicitly convert to a string type in
addition to accepting arbitrary ranges of characters). The primary
difference is whether you're creating a new function or templatizing an
existing one. The reason that I separated them is because I'm inclined to
argue that in most cases, when you're creating a range-based function, you
simply shouldn't be making it accept implicit conversions. It's clearer and
less error-prone for the caller to do the conversion. So, I don't think that
we normally should be creating such functions. However, we _do_ need to
templatize existing functions that take string, and what we need to do to
make that work is pretty much the same thing that we'd need to do when
creating a templated function that accepts types that implicitly convert to
a string type in addition to accepting arbitrary ranges of characters.

isConvertibleToString was created for this case. Basically, you have a
function that accepts string

auto foo(string str) {..}

and you want to templatize it, so you create an overload for ranges of
characters and another for types that implicitly convert to string. e.g.

auto foo(R)(R range)
    is(isInputRange!R &&
       isSomeChar!(ElementType!R) &&
       !isConvertibleToString!S)
{...}

auto foo(T)(auto ref T convertible)
    is(isConvertibleToString!T)
{
    return foo!(StringTypeOf!T)(convertible);
}

Then the types that work as-is are accepted by the first overload, and those
that need to be converted are converted by the second overload and then
passed to the first.

Note the auto ref on the second overload. This is so that if a static array
is passed as a variable, then it won't be copied. And since we don't want
the auto ref on the overload taking ranges (since that would potentially
break code), we're forced to have two separate functions rather having a
single function that has an internal static if.

What I would _like_ to see done here is to have the convertible overload be
something more like

auto foo(C)(C[] str)
    if(isSomeChar!C)
{...}

This makes the conversion automatic and negates the need for
isConvertibleToString. However, this overload would also accept strings,
which means that you either have to duplicate code between the functions, or
you need a helper function. e.g.

auto foo(R)(R range)
    is(!isSomeString!R &&
       isInputRange!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
    is(isInputRange!R &&
       isSomeChar!(ElementType!R))
{...}

And because one of the overloads is templated on character type, whereas
the other is templated on the entire type, there's no way to combine them.
If anything, it's harder to combine them than in the case where
isConvertibleToString is used. So, unless someone sees something I'm missing
here or comes up with a great idea that has escaped me (which is quite
possible), I think that we're stuck with isConvertibleToString so that we
can make functions that accept string accept arbitrary ranges of characters
without breaking code.

Alternatively, we _could_ deprecate the overload which takes
isConvertibleToString and ultimately make the function require explicit
conversions to be used like we would normally do with a range-based function
that we were writing from scratch, but you'd still need
isConvertibleToString during the deprecation phase. And I'm not sure that
it's worth going through that deprecation phase anyway.

Unfortunately, however, using this technique with a function that returns a
portion of the range is going to be wrong when it's passed a static array by
value (or if it's passed a user-defined type that implicitly converts to a
string but does so in a manner similar to slicing static arrays such that
the original variable can't go out of scope before the string does without
having the string refer to memory that is no longer valid). In the case
where you have

auto foo(C)(C[] str)
    if(isSomeChar!C)
{...}

the conversion takes place at the callsite and thus the return value is
potentially usable before the original goes out of scope - which is the
exact same situtation that we had when the function was not templated and
explicitly took string, whereas with

auto foo(T)(auto ref T convertible)
    is(isConvertibleToString!T)
{
    return foo!(StringTypeOf!T)(convertible);
}

unless it's passed by reference, it's going to go out of scope too early.
So, the safe thing to do would be to change it to using the three function
solution discussed above:

auto foo(R)(R range)
    is(!isSomeString!R &&
       isInputRange!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
    is(isInputRange!R &&
       isSomeChar!(ElementType!R))
{...}

in which case, the only reason to keep isConvertibleToString is that in the
case where you know that the return value is not a piece of the original
range, then the two function solution works safely. But given how easy it is
to screw up with the two function solution (e.g. there's been confusion on
github about std.file functions that use this idiom and why they have auto
ref), I'm inclined to argue that the 3 function solution is what should
become the "official" way to templatize a function that accepts string and
make it accept arbitrary ranges of characters in addition to what it
accepted before. And in that case, it would make sense to deprecate
isConvertibleToString and thus reduce the number of stray string traits that
programmers have to understand use correctly.

So, that leaves us with isAutodecodableString. Notice that no use cases have
used it. The only use case I'm aware that makes any sense is byCodeUnit,
which does not work with static arrays (presumably, because they're not
ranges and slicing them for byCodeUnit is likely to be unsafe - though how
unsafe it is exactly the same as what we've been discussing for templatizing
functions that take string). And actually, I'm inclined to think that
byCodeUnit shouldn't have accepted types that implicitly converted to string
and should have required the conversion just like other templated code
normally does, but changing it now risks breaking code, and in the case of
something like an enum, it's probably handy, since byCodeUnit is the sort of
function that's going to be used at the beginning of a chain of ranges. That
being said, if byCodeUnit were to use the three function solution that I
just talked about for templatizing string functions, it could use static
arrays just as safely as any function that has a string parameter and
returns a slice of that string. So, isAutodecodable wouldn't be needed.

And if we didn't want to make that change,

isNarrowString!T || (isConvertibleToString&T && !isStaticArray!T)

could be used instead (the main reason that isConvertibleToString wasn't
used with byCodeUnit in the first place was that isConvertibleToString was
added later), or if we do decide to get rid of isConvertibleToString but
don't want byCodeUnit to accept static arrays, it would simple enough to
make its template constraint do the right thing without either
isAutodecodableString or isConvertibleToString.

In general though, I think that templated functions operating on ranges of
characters should either be accepting only ranges of characters and not
stuff that implicitly converts to them (as we typically do with range-based
functions), or they should accept arbitrary ranges of characters plus
everything that a function accepting strings explicitly would. So, I don't
think that using isAutodecodableString with byCodeUnit ultimately is the
right thing to do, particularly since it's used almost nowhere else. Aside
from isAutodecodableString's unit tests and byCodeUnit, the only other place
in Phobos that isAutodecodableString is used is a static if inside of
std.algorithm.comparison.equal which could have just as easily used
isNarrowString.

As such, I think that isAutodecodableString should definitely be deprecated,
and while it's more questionable with isConvertibleToString, given the
safety concerns, I think that we'd be better of deprecating it as well. And
fortunately, neither of them is particularly old, so not much code outside
of Phobos is going to be using them yet.

As for adding a new trait, which is what started this whole mess, the
issue that we have that a new trait would help solve is that any range-based
code that specifically operates on strings is going to end up with something
like

isInputRange!R && isSomeChar!(ElementType!R)

So, that could be repeated a fair bit in templated code. We _could_ shorten
that to something like isSomeCharRange!R, but as soon as you need a forward
range or greater, you're going to end up with something like

isForwardRange!R && isSomeCharRange!R

whereas right now it would be

isForwardRange!R && isSomeChar!(ElementType!R)

And that really doesn't add much value. Alternatively, it could also require
finite ranges (which is what the original PR proposed), but to make that
clear, you'd probably need something like isSomeFiniteCharRange rather than
isSomeCharRange, which eats away at its value even further (since it's even
longer), and it's often the case that we want to accept infinite ranges,
because we generally try and make algorithms lazy as much as we can.

So, I'm not at all convinced that we want to add a new trait for arbitrary
ranges of characters. And it's not like we're looking at adding something
like isIntegralRange. That sort of thing just doesn't scale, and beyond code
that only accepts basic input ranges, it doesn't even tend to make template
constraints much shorter. Regardless, as stated before, I think that talking
about a range of characters as being "string-like" is incorrect, because
strings are a lot more than finite, basic input ranges of characters.

============================================================================

So, in conclusion, with regards to isSomeString, isNarrowString,
isAutodecodableString, and isConvertibleToString, I propose that we

1. Deprecate isAutodecodableString and make the few places in Phobos that
   currently use it use isNarrowString or isConvertibleToString instead.

2. Strongly consider deprecating isConvertibleToString (given its safety
   concerns with static arrays) and instead use overloads that are
   templated on character type to accept implicit conversions to string
   rather than being templated on the entire type (the overload taking
   arbitrary ranges would of course still be templated on the entire type).

3. Consider changing isSomeString and isNarrowString so that they are false
   for enums - since they really should be - but we probably can't do that
   without breaking at least some code (albeit mostly code that's going to
   be buggy anyway). If we don't make that change, we should consider adding
   isExactSomeString to std.traits rather than just having it as internal to
   std.conv, and then for any non-range based code, we can tell folks to use
   isExactSomeString and avoid isSomeString if they really don't want to
   accept implicit conversions (whereas code accepting implicit conversions
   can just templatize on character type). Regardless, isSomeString and
   isNarrowString's documentation should be updated to make the problem
   with enums clear.

If it were entirely up to me, I would deprecate isAutodecodableString and
isConvertibleToString as soon as Phobos were fixed to not use them, but I'm
divided on whether I would fix isSomeString, since I don't know of a way to
do that without risking breaking code - even if it is likely to be code
that's buggy, and we'd prefer to not be breaking code without any kind of
deprecation phase.

And with regards to adding a new trait specifically for ranges of
characters, I don't think that it adds enough value to be worth it. It would
shorten some constraints, but there are many it would not, and it even the
ones that it would shorten wouldn't be shortened all that much.

- Jonathan M Davis

Mar 11 2017

Jack Stouffer <jack jackstouffer.com> writes:

On Sunday, 12 March 2017 at 03:23:21 UTC, Jonathan M Davis wrote:
 ...

First off, I'd like to point out that creating specific overloads 
for alias this-ed structs is a bad idea, because you just have to 
ask the user to do this

MyCustomType { string b; alias b this; }

MyCustomType a;
foo!(string)(a);

instead of

MyCustomType a;
foo(a);

And I think making the conversion on the user is better, because 
once we go down the rabbit hole of offering custom template 
support for user defined alias this-ed types, there's almost no 
bottom. You have to do the same thing to every templated function 
that uses traits like isNumeric, isPointer, etc.

So with that in mind, your proposal for alias this overloads of

 auto foo(C)(C[] str)
     if(isSomeChar!C)
 {...}

Isn't really helpful for either because it still clutters code 
with an extra overload. I think the problem we need to accept is 
that alias this and template will always be sort of oil and water.

I don't know if we should change the current overloads, as people 
complained with std.file was range-ified and their code using 
DirEntry broke. That's honestly what started this whole mess, and 
honestly in hindsight we should have marked the issue as won't 
fix and told them to use the above method. But people (including 
me) wanted the regression fixed which is totally understandable.

Maybe we should deprecate the isConvertibleToString overloads, 
maybe not. But we really shouldn't add more.

As such, I think that code using isSomeString in a context where 
an
enum would pass needs to either be changed so that the function
accepts only strings and uses !is(S == enum) to prevent enums 
from passing

How much longer until we're at "Effective D: 55 things you 
shouldn't do because it's buggy and you don't know it".

*sigh* I know what my PRs are going to be for the next two weeks.

 So, that could be repeated a fair bit in templated code. We 
 _could_
 shorten that to something like isSomeCharRange!R, but as soon as
 you need a forward range or greater, you're going to end up with
 something like

 isForwardRange!R && isSomeCharRange!R
 whereas right now it would be
 isForwardRange!R && isSomeChar!(ElementType!R)

Well, that kills that idea, doesn't it?

 Consider changing isSomeString and isNarrowString so that they 
 are false
 for enums - since they really should be - but we probably can't 
 do that
 without breaking at least some code (albeit mostly code that's 
 going to
 be buggy anyway).

Can't agree here. We don't know how much code we'd be breaking. 
The move towards deprecation of old behavior is the best for 
users, and that's not really possible here. The best thing to do 
is make the docs very explicit.

 If we don't make that change, we should consider adding
 isExactSomeString to std.traits

Also disagree. Making more traits on top of an already confusing 
situation is a non-starter for me. Again, I think we just need to 
make the docs better here.

Mar 12 2017

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Sunday, March 12, 2017 09:02:53 Jack Stouffer via Digitalmars-d wrote:
 On Sunday, 12 March 2017 at 03:23:21 UTC, Jonathan M Davis wrote:
 ...

 First off, I'd like to point out that creating specific overloads
 for alias this-ed structs is a bad idea, because you just have to
 ask the user to do this

 MyCustomType { string b; alias b this; }

 MyCustomType a;
 foo!(string)(a);

 instead of

 MyCustomType a;
 foo(a);

 And I think making the conversion on the user is better, because
 once we go down the rabbit hole of offering custom template
 support for user defined alias this-ed types, there's almost no
 bottom. You have to do the same thing to every templated function
 that uses traits like isNumeric, isPointer, etc.

 So with that in mind, your proposal for alias this overloads of

 auto foo(C)(C[] str)
     if(isSomeChar!C)
 {...}

 Isn't really helpful for either because it still clutters code
 with an extra overload. I think the problem we need to accept is
 that alias this and template will always be sort of oil and water.

 I don't know if we should change the current overloads, as people
 complained with std.file was range-ified and their code using
 DirEntry broke. That's honestly what started this whole mess, and
 honestly in hindsight we should have marked the issue as won't
 fix and told them to use the above method. But people (including
 me) wanted the regression fixed which is totally understandable.

 Maybe we should deprecate the isConvertibleToString overloads,
 maybe not. But we really shouldn't add more.

I completely agree that in general, we should avoid implicit conversions
with functions. I thought that I made that clear. Allowing implicit
conversions with templates is error-prone and should be generally avoided.

The issue is when you templatize a function that already exist. If you take
a function that accepts string, and you try and make it so that it accepts
arbitrary ranges of characters, you have to still accept all of the implicit
conversions that the original function did, or you break code. So, it
becomes a question of the best way to write the templatized version of the
function.

As it stands, using isConvertibleToString is the solution that seems to have
been introduced to Phobos to fix that problem, and as I pointed out, that's
a risky solution at best and an outright wrong solution at worst, because it
does the conversion inside the function rather that at the call point, which
creates safety issues. And in those cases, I think that we should be
templatizing on character type like with

auto foo(R)(R range)
    if(!isSomeString!R &&
       isInputRange!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
    if(isInputRange!R && isSomeChar!(ElementType!R))
{...}

because that's safer. If we're talking about creating a new function rather
than templatizing an existing function, then I don't think that it makes
sense to add that extra overload. Just make it accept ranges of characters
and be done with it. But the problem of templatizing existing functions
still exists, and in that case, we're stuck doing _something_ to accept the
implicit conversions, or we break code.

- Jonathan M Davis

Mar 12 2017

Jack Stouffer <jack jackstouffer.com> writes:

On Sunday, 12 March 2017 at 09:26:08 UTC, Jonathan M Davis wrote:
 and in that case, we're stuck doing _something_ to accept the 
 implicit conversions, or we break code.

Sorry wrote that at 4 in the morning. Should have been clearer.

I'm saying we should, at best, offer an overload for alias 
this-ed structs only temporarily as a deprecated function. As I 
mentioned, I don't like these hacks, and in reality Phobos 
requires a lot more of them in order to be regression free. They 
also clutter up the code and docs.

Mar 12 2017

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Sunday, March 12, 2017 17:22:09 Jack Stouffer via Digitalmars-d wrote:
 On Sunday, 12 March 2017 at 09:26:08 UTC, Jonathan M Davis wrote:
 and in that case, we're stuck doing _something_ to accept the
 implicit conversions, or we break code.

 Sorry wrote that at 4 in the morning. Should have been clearer.

 I'm saying we should, at best, offer an overload for alias
 this-ed structs only temporarily as a deprecated function. As I
 mentioned, I don't like these hacks, and in reality Phobos
 requires a lot more of them in order to be regression free. They
 also clutter up the code and docs.

Yes, they clutter up code, but I don't know how reasonable it is to keep
deprecating behavior in Phobos like that every time that we get around to
templatizing a string-based function so that it's range based. To make
matters worse, I don't think that it's actually possible to deprecate the
implicit conversions _and_ have it be safe in the case where a portion of
the range is returned from the function.

If isConvertibleToString is used - or any template constraint that is
templated on the whole type - then when the implicit conversion is done,
it's done inside the function, which in the case of static arrays and some
user-defined types would mean returning a reference to memory that is local
to the function (and thus invalid), whereas if the function is templated on
the character type, then the implict conversion occurs at the call site - so
it acts as it did when the function wasn't templated, and it's safe. But if
the function is templated on character type, then it's going to accept
strings, not just the implicit conversions. So, you can't deprecate it. So,
it looks to me like we're faced with either allowing deprecation and being
unsafe or being safe and not being able to do any deprecation. Certainly, I
can't think of any other way to accept an implicit conversion and have the
conversion be done at the call site, and without that, you have a safety
problem.

As such, even if the resulting breakage is acceptable, I don't see how we be
safely templatizing these functions and then deprecating the overloads that
deal with implicit conversions. In some cases, we could get away with it,
because no portion of the original range is returned, but in many cases, it
is returned, and then AFAIK, templatizing on the character type is the only
way to make it safe.

This whole problem would be reduced if we could actually get the implicit
conversion of static arrays to dynamic arrays deprecated and then removed
from the language, but last time I checked, Walter didn't like that idea,
and even if we did that, we'd still have the potential of something unsafe
happening with user-defined types implicitly converting to string where they
work correctly when the function does the conversion at the call site but do
not work correctly if the conversion is done internally.

Honestly, this issue feels a lot to me like the issue with isSomeString.
Ideally, we'd just fix it and deal with the broken code, as annoying as that
would be. But if we did that, we'd be breaking code. We can make a choice
that makes the language or library better in the long run, but it's going to
cost us in the short term.

In _some_ cases we could go through a deprecation phase with the implicit
conversions and avoid immediately breaking code, but it's likely to result
in a number of complaints if we do, and in many cases, we simply can't do it
safely - not unless someone has a bright idea about how we can force
implicit conversions to take place at the call site while still having the
function be templated on the entire type rather than just the character
type.

- Jonathan M Davis

Mar 12 2017

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Sunday, March 12, 2017 01:26:08 Jonathan M Davis via Digitalmars-d wrote:
 I completely agree that in general, we should avoid implicit conversions
 with functions.

In case it wasn't clear from the context, I meant specifically with
templated functions.

- Jonathan M Davis

Mar 12 2017

Andrea Fontana <nospam example.com> writes:

On Sunday, 12 March 2017 at 03:23:21 UTC, Jonathan M Davis wrote:
 Okay, I'm sorry, but this is a wall of text, and I don't know 
 how to actually make it short. If you really want the TLDR 
 version, look for the =='s at the bottom which mark off the 
 conclusion, but you're not going to understand the reasoning if 
 you skip the wall of text. Now, to the post...
 [...]

Very well-written and interesting post!

Mar 13 2017

D Programming

C/C++ Programming

Other

digitalmars.D - ranges of characters and the overabundance of traits that go with them