
digitalmars.D - Implicit conversion of concatenation result to immutable

reply Per Nordlöw <per.nordlow gmail.com> writes:
Can somebody explain the logic behind the compiler disallowing 
both line 3 and 4 in

```d
const(char)[] x;
string y;
string z1 = x ~ y; // errors
string z2 = y ~ x; // errors
```

erroring as

```
Error: cannot implicitly convert expression `x ~ cast(const(char)[])y` of type `char[]` to `string`
Error: cannot implicitly convert expression `cast(const(char)[])y ~ x` of type `char[]` to `string`
```

Does this have something to do with the compiler being defensive about 
possible in-place appending or prepending to the arguments (in 
this case `x` and `y`) passed to the array concatenation 
expression?

For instance, could `x ~ y` return either
- a back-extended slice `x[0 .. x.length + y.length]` with `y` 
appended at the back or
- a front-extended slice `y[-x.length .. y.length]` with `x` 
prepended to the front

provided the GC has information about available free memory there?

This problem regularly crops up for me when assembling strings 
to pass as a `string` parameter, for instance to an exception 
constructor.
Apr 01
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 5:21 PM, Per Nordlöw wrote:

 Error: cannot implicitly convert expression `x ~ cast(const(char)[])y` 
 of type `char[]` to `string`
That makes no sense. The compiler should allow that conversion. It can clearly prove that the result doesn't derive from the parameters.

I'm surprised it doesn't return const(char)[].

-Steve
Apr 01
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 4/1/21 2:41 PM, Steven Schveighoffer wrote:

 On 4/1/21 5:21 PM, Per Nordlöw wrote:

 Error: cannot implicitly convert expression `x ~ cast(const(char)[])y`
 of type `char[]` to `string`
 That makes no sense. The compiler should allow that conversion. It can
 clearly prove that the result doesn't derive from the parameters.

 I'm surprised it doesn't return const(char)[].

 -Steve
It should even return char[], no? Freshly copied data should belong to the programmer.

Ali
Apr 01
parent Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 1 April 2021 at 21:45:08 UTC, Ali Çehreli wrote:
 It should even return char[], no? Freshly copied data should 
 belong to the programmer.
Precisely.
Apr 01
prev sibling parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 1 April 2021 at 21:41:06 UTC, Steven Schveighoffer 
wrote:
 I'm surprised it doesn't return const(char)[].
I think it returning `char[]` is sound. What's unsound is that the compiler lacks knowledge of it being a unique slice (no aliasing). The situation is analogous with

```d
alias T = int;
const n = 42;
auto x = new T[n]; // no conversion
static assert(is(typeof(x) == T[]));
immutable(T)[] y = new T[n]; // implicit conversion to immutable allowed
static assert(is(typeof(y) == immutable(T)[]));
```

which is currently accepted by the compiler.
Apr 01
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 5:51 PM, Per Nordlöw wrote:
 On Thursday, 1 April 2021 at 21:41:06 UTC, Steven Schveighoffer wrote:
 I'm surprised it doesn't return const(char)[].
I think it returning `char[]` is sound.
Not saying it's unsound, it just surprises me.
 What's unsound is that the compiler lacks knowledge of it being a unique 
 slice (no aliasing).
It shouldn't lack this knowledge. That's not soundness, it's just annoying. I'm not saying I disagree with you, it's a limitation and I think there's no good explanation for the problem.

To illustrate the point more cleanly:

```d
auto concat(T, U)(T[] x, U[] y) pure
{
    return x ~ y;
}

void main()
{
    string x;
    const(char)[] y;
    string z = concat(x, y); // compiles
}
```

-Steve
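A minimal sketch of the rule the wrapper above appears to rely on: the result of a `pure` function may convert implicitly to `immutable` when it cannot alias any argument. The helper name `makeFilled` is hypothetical and not from this thread.

```d
// Hypothetical pure factory: its only parameter is a size_t, so the
// returned array cannot alias anything the caller already holds.
char[] makeFilled(size_t n) pure
{
    auto buf = new char[n];
    buf[] = 'a'; // fill every element
    return buf;
}

void main()
{
    // Declared return type is char[], yet the implicit conversion to
    // string is accepted because the result is provably unique.
    string s = makeFilled(3);
    assert(s == "aaa");
}
```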
Apr 01
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 06:34:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 ```d
 auto concat(T, U)(T[] x, U[] y) pure
 {
     return x ~ y;
 }
 
 void main()
 {
     string x;
     const(char)[] y;
     string z = concat(x, y); // compiles
 }
 ```
[...] Put this way, the solution becomes obvious: `~` should be considered a pure operation. Then the compiler (in theory) ought to be able to infer uniqueness from `x ~ y`, and consequently allow implicit conversion to immutable.

T

-- 
Never criticize a man until you've walked a mile in his shoes. Then when you do criticize him, you'll be a mile away and he won't have his shoes.
Apr 01
next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 4/1/21 3:55 PM, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 06:34:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 [...]
 ```d
 auto concat(T, U)(T[] x, U[] y) pure
 {
      return x ~ y;
 }

 void main()
 {
      string x;
      const(char)[] y;
      string z = concat(x, y); // compiles
 }
 ```
 [...] Put this way, the solution becomes obvious: `~` should be considered a pure operation. Then the compiler (in theory) ought to be able to infer uniqueness from `x ~ y`, and consequently allow implicit conversion to immutable.
I admit I've been neglecting indirections. If we are dealing with const(S)[], S being a user-defined type with indirections, then the concatenation cannot be S[] because the original S objects would not allow mutation through them. This is still within the compiler's attribute inference, right?

On the other hand, this would probably complicate template code: I can imagine template code that is tested with simple types and works, but fails as soon as it is used with a const(S)[] type at a customer site.

Ali
Apr 01
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 6:55 PM, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 06:34:04PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 [...]
 ```d
 auto concat(T, U)(T[] x, U[] y) pure
 {
      return x ~ y;
 }

 void main()
 {
      string x;
      const(char)[] y;
      string z = concat(x, y); // compiles
 }
 ```
 [...] Put this way, the solution becomes obvious: `~` should be considered a pure operation. Then the compiler (in theory) ought to be able to infer uniqueness from `x ~ y`, and consequently allow implicit conversion to immutable.
It is considered pure, note that I'm using concatenation inside the function (which is marked pure).

-Steve
Apr 02
prev sibling parent Q. Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 1 April 2021 at 22:34:04 UTC, Steven Schveighoffer 
wrote:
 To illustrate the point more cleanly:

 ```d
 auto concat(T, U)(T[] x, U[] y) pure
 {
     return x ~ y;
 }

 void main()
 {
     string x;
     const(char)[] y;
     string z = concat(x, y); // compiles
 }
 ```
Found by accident that the code does not compile with -dip1000 on 2.095 (run.dlang.org). Actually, I tried inlining the function template and, to my surprise, whether it passes or not depends on -dip1000:

```D
string z1 = (() => x ~ y)(); // fails with and without -dip1000
string z2 = ((x, y) => x ~ y)(x, y); // passes without, fails with -dip1000
```

It makes no difference adding `pure` to any of those since it's inferred.
Apr 01
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 09:21:02PM +0000, Per Nordlöw via Digitalmars-d wrote:
 Can somebody explain the logic behind the compiler disallowing both line 3
 and 4 in
 
 ```d
 const(char)[] x;
 string y;
 string z1 = x ~ y; // errors
 string z2 = y ~ x; // errors
 ```
 
 erroring as
 
 ```
 Error: cannot implicitly convert expression `x ~ cast(const(char)[])y` of
 type `char[]` to `string`
 Error: cannot implicitly convert expression `cast(const(char)[])y ~ x` of
 type `char[]` to `string`
 ```
It is illegal to implicitly convert const to immutable, because there may be a mutable alias to the data somewhere. If so, it will violate immutability. For example:

	char[] evil;
	const(char)[] x = evil; // x now aliases evil
	string y = x;           // y now aliases evil  <----
	evil[0] = 'a';          // Oops, immutability violated

The implicit conversion on the line marked `<----` is the cause of the problem.

Now, when you append a string to a const(char)[], the compiler has to promote `string` to `const(char)[]` first, so that the operands of ~ have the same type. And obviously, the result of concatenating two const(char)[] must be const(char)[], since you don't know if one of them may have mutable aliases somewhere else. So the result must likewise be const(char)[].

One may argue that appending in general will reallocate, and once reallocated it will be unique, and therefore safe to implicitly convert to immutable. However, in general we cannot guarantee this, e.g., one of the strings could be empty and not reallocate at runtime, so it may continue to be aliased by some mutable reference somewhere else. So the result must be typed as const(char)[], along with the restriction that it cannot implicitly convert to immutable.

[...]
 This problem regularly crops up for me during assembling of strings
 passed as a string parameter for instance an exception constructor.
Just use .idup on the result.

T

-- 
What's a "hot crossed bun"? An angry rabbit.
Apr 01
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 4/1/21 2:59 PM, H. S. Teoh wrote:

 the result of concatenating two
 const(char)[] must be const(char)[], since you don't know if one of them
 may have mutable aliases somewhere else.  So the result must likewise be
 const(char)[].

 One may argue that appending in general will reallocate, and once
 reallocated it will be unique, and there safe to implicitly convert to
 immutable.  However, in general we cannot guarantee this
Yes, that's tricky for append because one of many slices does own the potential bytes after the array and will append elements in there. However, concatenation always makes a new array, right? I think the result can be char[] in that case.

Ali
Apr 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 03:07:09PM -0700, Ali Çehreli via Digitalmars-d wrote:
 On 4/1/21 2:59 PM, H. S. Teoh wrote:
[...]
 One may argue that appending in general will reallocate, and once
 reallocated it will be unique, and there safe to implicitly convert
 to immutable.  However, in general we cannot guarantee this
 Yes, that's tricky for append because one of many slices does own the potential bytes after the array and will append elements in there. However, concatenation always makes a new array, right? I think the result can be char[] in that case.
[...] If one of the arguments is an empty array, does concatenation allocate a new array anyway? Or does it simply return the other argument? (I don't know.) If not, then we cannot make it implicitly convertible.

T

-- 
Genius may have its limitations, but stupidity is not thus handicapped. -- Elbert Hubbard
Apr 01
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 6:34 PM, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 03:07:09PM -0700, Ali Çehreli via Digitalmars-d wrote:
 On 4/1/21 2:59 PM, H. S. Teoh wrote:
[...]
 One may argue that appending in general will reallocate, and once
 reallocated it will be unique, and there safe to implicitly convert
 to immutable.  However, in general we cannot guarantee this
 Yes, that's tricky for append because one of many slices does own the potential bytes after the array and will append elements in there. However, concatenation always makes a new array, right? I think the result can be char[] in that case.
 [...] If one of the arguments is an empty array, does concatenation allocate a new array anyway? Or does it simply return the other argument? (I don't know.) If not, then we cannot make it implicitly convertible.
Yes, always an allocation. See point 5 here: https://dlang.org/spec/arrays.html#array-concatenation

-Steve
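A small sketch of what "always an allocation" means in practice, assuming an implementation that follows the spec section linked above. The helper name `catWithEmpty` is hypothetical.

```d
// Hypothetical helper: concatenate `a` with an empty array.
int[] catWithEmpty(int[] a)
{
    int[] empty;
    return a ~ empty; // per the spec, still a copy of `a`
}

void main()
{
    int[] a = [1, 2, 3];
    int[] c = catWithEmpty(a);

    assert(c == a);          // same contents
    assert(c.ptr !is a.ptr); // but a different memory block

    // Writing through the result leaves the original untouched.
    c[0] = 42;
    assert(a[0] == 1);
}
```

Because the result never shares memory with either operand, no mutable alias to it can exist at the point where it is created.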
Apr 01
parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 1 April 2021 at 22:35:57 UTC, Steven Schveighoffer 
wrote:
 Yes, always an allocation. See point 5 here: 
 https://dlang.org/spec/arrays.html#array-concatenation
Good, then the implicit conversion should be allowed. Anybody up for the job? If not, I'm gonna look into it.
Apr 01
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 4/1/21 3:44 PM, Per Nordlöw wrote:
 On Thursday, 1 April 2021 at 22:35:57 UTC, Steven Schveighoffer wrote:
 Yes, always an allocation. See point 5 here:
 https://dlang.org/spec/arrays.html#array-concatenation
 Good, then the implicit conversion should be allowed. Anybody up for the
 job? If not, I'm gonna look into it.
As I mentioned elsewhere in this thread, the element type must not have indirections though. If S is a struct with indirections, concatenating const(S)[] should still produce const(S)[].

Ali
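A compile-time sketch of the distinction drawn here, assuming a hypothetical struct `S` with a pointer member; `hasIndirections` is from `std.traits`.

```d
import std.traits : hasIndirections;

// Hypothetical element type that holds a pointer (an indirection).
struct S { int* p; }

// A copied const(char) is an independent value, so dropping const
// on the copy is harmless and the conversion is implicit.
static assert(!hasIndirections!char);
static assert(is(const(char) : char));

// A copied const(S) still points at the original target, so the
// copy must stay const: there is no implicit conversion to S.
static assert(hasIndirections!S);
static assert(!is(const(S) : S));

void main() {}
```

This is why copying the elements into a fresh array is enough to make a `char` result unique, but not a result whose elements carry pointers.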
Apr 01
parent Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 1 April 2021 at 23:21:56 UTC, Ali Çehreli wrote:
 As I mentioned elsewhere in this thread, the element type must 
 not have indirections though. If S is a struct with 
 indirections, concatenating const(S)[] should still produce 
 const(S)[].
Thanks. Got something working that allows the implicit conversion at https://github.com/dlang/dmd/pull/12341 for the sample code

```d
@safe pure unittest
{
    const(char)[] x;
    string y;
    string z1 = x ~ y; // now passes
    string z2 = y ~ x; // now passes
}
```

but nothing else. Feel perfectly free to fill in the details or even take over the PR.
Apr 01
prev sibling next sibling parent reply Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 1 April 2021 at 21:59:21 UTC, H. S. Teoh wrote:
 Just use .idup on the result.
That creates an unnecessary GC allocation in cases such as

```d
string x;
const(char)[] y;
throw new Exception(x ~ y.idup);
```

which, imho, is very much worth avoiding. The implicit conversion could be special-cased in the compiler to be allowed only on a `CatExp` being an r-value, naturally non-aliased. As mentioned above, this is analogous to the conversion rules of new expressions.
Apr 01
parent Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 1 April 2021 at 22:17:08 UTC, Per Nordlöw wrote:
 On Thursday, 1 April 2021 at 21:59:21 UTC, H. S. Teoh wrote:
 Just use .idup on the result.
 That creates an unnecessary GC allocation in cases such

 ```d
 string x;
 const(char)[] y;
 throw new Exception(x ~ y.idup)
 ```
The alternative

```d
throw new Exception((x ~ y).idup);
```

which I now realize you were referring to, is also unsound because `x ~ y` is already a unique, freshly allocated, unaliased slice that can be freely implicitly converted to `immutable`.
Apr 01
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 5:59 PM, H. S. Teoh wrote:

 Now, when you append a string to a const(char)[], the compiler has to
 promote `string` to `const(char)[]` first, so that the operands of ~
 have the same type. And obviously, the result of concatenating two
 const(char)[] must be const(char)[], since you don't know if one of them
 may have mutable aliases somewhere else.  So the result must likewise be
 const(char)[].
But it's not.

    auto z = x ~ y;
    pragma(msg, typeof(z)); // char[]

This is what's confusing me. The compiler somehow knows it can do this implicit cast, but doesn't know that the result is unique. It should be obvious.
 One may argue that appending in general will reallocate, and once
 reallocated it will be unique, and there safe to implicitly convert to
 immutable.  However, in general we cannot guarantee this, e.g., one of
 the strings could be empty and not reallocate at runtime, so it may
 continue to be aliased by some mutable reference somewhere else. So the
 result must be typed as const(char)[], along with the restriction that
 it cannot implicitly convert to immutable.
a ~ b will always allocate new memory, it's in the spec.

-Steve
Apr 01
prev sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 1 April 2021 at 21:59:21 UTC, H. S. Teoh wrote:
 Just use .idup on the result.
In the meantime, `.assumeUnique` from `std.exception` should be preferred over `.idup` unless duplicate memory allocations are of intrinsic value. ;)
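A minimal sketch of that workaround, assuming the concatenation result really is unaliased (which the spec's always-allocate rule gives us). The helper name `buildMessage` and the sample strings are made up for illustration.

```d
import std.exception : assumeUnique;

// Hypothetical helper: build an exception message without a second copy.
string buildMessage(const(char)[] prefix, string detail)
{
    // `prefix ~ detail` is freshly allocated and not referenced anywhere
    // else, so re-typing it as immutable is safe. assumeUnique performs
    // the cast without copying; the programmer vouches for uniqueness.
    return assumeUnique(prefix ~ detail);
}

void main()
{
    const(char)[] x = "file not found: ";
    string y = "config.d";
    auto e = new Exception(buildMessage(x, y));
    assert(e.msg == "file not found: config.d");
}
```

Note that `assumeUnique` is an unchecked promise: it is only correct here because nothing else holds a reference to the temporary.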
Apr 01
prev sibling parent Per Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 1 April 2021 at 21:21:02 UTC, Per Nordlöw wrote:
 Can somebody explain the logic behind the compiler disallowing 
 both line 3 and 4 in
I'm very intrigued by the fact that this issue hasn't been discussed more. It smells to me like this behaviour is somewhat by design... because I don't think it is at all difficult to fix. During implicit conversion checking, just peek into the expression and see if it's a `CatExp` and, if so, treat it as a unique reference and allow conversion from mutable to immutable.

However, note that to solve this in the general case we have to involve data flow and escape analysis, and have a tag associated with each variable that indicates whether it is aliased or not (no, yes, maybe). Consider, for instance,

```d
auto c = a ~ b;
e = d; // `d` aliased to `e`
f = foo(e); // `e` maybe aliased to `f`
immutable g = e; // allowed?
```
Apr 01