digitalmars.D - Shouldn't hasSwappableElements work on char arrays?

reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Would it be wrong if hasSwappableElements worked on char arrays?

Look:

import std.stdio;
import std.algorithm;
import std.range;
import std.traits;

void main()
{
    char[] r = "abc".dup;

    // fails:
    static assert(hasSwappableElements!(char[]));   
    
    // fails because reverse uses hasSwappableElements(r)
    // as a constraint:
    reverse(r);
    
    // But this works just fine..
    swap(r[0], r[2]);
    
    assert(r == "cba");
}

If you comment out the static assert and the reverse, you'll see that swap
works fine on char arrays if you give it an index. 

Here's an experimental implementation of hasSwappableElements that could work
for char[]'s:

import std.stdio;
import std.algorithm;
import std.range : isForwardRange, ElementType;
import std.traits;

template hasSwappableElements(R)
{
    enum bool hasSwappableElements = isForwardRange!(R) && is(typeof(
    {
        auto r = [ElementType!(R).init];
        swap(r[0], r[0]);
    }()));
}

void main()
{
    char[] r = "abc".dup;

    // now works:
    static assert(hasSwappableElements!(char[]));   
    
    swap(r[0], r[2]);
    assert(r == "cba");
}

Here's another thing that's interesting. If I replace "auto r" with "R r" in
the modified hasSwappableElements implementation, then the assert fails:

import std.stdio;
import std.algorithm;
import std.range : isForwardRange, ElementType;
import std.traits;

template hasSwappableElements(R)
{
    enum bool hasSwappableElements = isForwardRange!(R) && is(typeof(
    {
        R r = [ElementType!(R).init];
        swap(r[0], r[0]);
    }()));
}

void main()
{
    // Fails
    static assert(hasSwappableElements!(char[]));   
}

The same thing happens if I replace "R" with "char[]". This might be some kind
of bug?
Feb 24 2011
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Now I see why using char[] fails. It's because [ElementType!(R).init];
returns a dchar[].
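
The observation above can be checked by asking the compiler for the types involved. This is only a minimal sketch, assuming a Phobos in which narrow strings are auto-decoded (so that ElementType!(char[]) is dchar); the helper name literalTypeName is made up for illustration and is not part of Phobos.

```d
import std.range : ElementType;

// Hypothetical helper: names the type of the array literal used in the
// experimental hasSwappableElements above.
string literalTypeName(R)()
{
    return typeof([ElementType!(R).init]).stringof;
}

// As a range, char[] yields decoded code points, so its element type is dchar...
static assert(is(ElementType!(char[]) == dchar));
// ...and a literal built from that element type is therefore a dchar[].
static assert(is(typeof([ElementType!(char[]).init]) == dchar[]));

void main()
{
    assert(literalTypeName!(char[])() == "dchar[]");
    assert(literalTypeName!(int[])() == "int[]");
}
```

Under that assumption, the "auto r" version of the trait ends up testing a dchar[] rather than the char[] it was given, which would explain why it passes.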
Feb 24 2011
parent reply Jesse Phillips <jessekphillips+D gmail.com> writes:
Andrej Mitrovic Wrote:

 Now I see why using char[] fails. It's because [ElementType!(R).init];
 returns a dchar[].

Yep, Unicode for the win. dchar[] is swappable.
Feb 24 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, February 24, 2011 11:42:38 Andrej Mitrovic wrote:
 On 2/24/11, Jesse Phillips <jessekphillips+D gmail.com> wrote:
 Andrej Mitrovic Wrote:
 Now I see why using char[] fails. It's because [ElementType!(R).init];
 returns a dchar[].

 Yep, Unicode for the win. dchar[] is swappable.

 Oh looks like you're right. I can use reverse on dchar[]. Weird, I thought I've already tried that. P.S. Why do I have to use this gibberish syntax?:
     dchar[] test = to!(dchar[])("test");
 Isn't the compiler smart enough to do this for me automatically? It's a string literal..

It's because the type of an expression has nothing to do with what it's assigned to. So, the type of "test" is string, not dchar[] (on top of the fact that - on Linux at least - "test" _is_ immutable, so assigning it to a dchar[] without duping it is bad anyway). So, the result of the expression on the right-hand side of the assignment does _not_ match the type of the variable being assigned to (or initialized in this case).
 Still, I don't see why char arrays should fail on hasSwappableElements
 when swap can be used on char arrays?

It's because char arrays are ranges of dchar, _not_ char. So, you _can't_ swap them. And since each individual char is potentially meaningless on its own (since in anything other than straight ASCII, you're going to need multiple chars per character), swapping them makes no sense. hasSwappableElements deals with ranges, not arrays. And char[] is a range of dchar, not char.

- Jonathan M Davis
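
The difference between the array view and the range view of char[] can be made visible with a few static asserts. This is a sketch that assumes a Phobos with auto-decoding of narrow strings; swapCodeUnits is a hypothetical helper, not a library function.

```d
import std.algorithm : swap;
import std.range : ElementType, hasSwappableElements;

// Range view: front decodes a code point and returns it by value,
// so there is nothing for swap to bind its ref parameters to.
static assert(is(ElementType!(char[]) == dchar));
static assert(!hasSwappableElements!(char[]));
static assert(hasSwappableElements!(dchar[]));

// Hypothetical helper: swaps two code units by index, bypassing the range API.
// Only meaningful when both positions hold single-byte (ASCII) characters.
void swapCodeUnits(char[] s, size_t i, size_t j)
{
    swap(s[i], s[j]);
}

void main()
{
    char[] r = "abc".dup;
    static assert(is(typeof(r[0]) == char)); // array view: one code unit
    swapCodeUnits(r, 0, 2);
    assert(r == "cba");
}
```

Indexing hands swap an lvalue of type char, which is why swap(r[0], r[2]) compiles even though the range-based trait says no.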
Feb 24 2011
parent Ali Çehreli <acehreli yahoo.com> writes:
On 02/24/2011 12:14 PM, Jonathan M Davis wrote:

 the type of "test" is string

Sorry to take it out of context but that statement is not always correct. String literals can be string, wstring, or dstring:

void foo(string c, wstring w, dstring d) {}

void main()
{
    foo("c", "w", "d"); // <- this compiles

    // But the following fails to compile:
    // string s;
    // foo(s, s, s);
}

Ali
Feb 24 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/24/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 "test" _is_ immutable, so assigning it to a dchar[] without
 duping it is bad anyway).

Can't the compiler figure that out on its own?
Feb 24 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 24 Feb 2011 15:33:52 -0500, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 On 2/24/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 "test" _is_ immutable, so assigning it to a dchar[] without
 duping it is bad anyway).

 Can't the compiler figure that out on its own?

It did figure that out (that it was bad) and told you not to do it :) But what you are asking is for the compiler to implicitly dup it. I have thought this might be good to have in the past as well, but it's also not too bad to have to type "test"d.dup. So while having the compiler save you a bit of typing would be good, it's not the end of the world to require it.

-Steve
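
For reference, the explicit form mentioned above could be used as in the following sketch. reversedCopy is a hypothetical helper introduced only for illustration.

```d
import std.algorithm : reverse;

// Hypothetical helper: returns a reversed, mutable UTF-32 copy of a dstring.
dchar[] reversedCopy(dstring s)
{
    dchar[] copy = s.dup; // the allocation is spelled out in the code
    reverse(copy);        // dchar[] has swappable elements
    return copy;
}

void main()
{
    dchar[] test = "test"d.dup; // UTF-32 literal, then an explicit mutable copy
    reverse(test);
    assert(test == "tset"d);
    assert(reversedCopy("abc"d) == "cba"d);
}
```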
Feb 24 2011
prev sibling next sibling parent reply Jesse Phillips <jessekphillips+D gmail.com> writes:
Andrej Mitrovic Wrote:

 Yes. And you know what's going to happen next, right? Everyone is
 going to create their own implementation of a string type because of
 these non-issues. Happens in C/C++ all the time, I see it in almost
 every mid-large codebase out there.

Well, aside from discussions to create a proper string type, Text in Tango and mText on dprogramming.org, I have yet to see people using a special string type. I'm not sure why you are complaining that the compiler is preventing you from doing something stupid. An array of char is not swappable, a range of char is not swappable. D is not in the habit of hiding complexity; trying to swap a string requires a conversion to dchar or at least handling a sequence of char, so it is best to state that is what is happening in the code.
Feb 24 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/24/11 3:13 PM, Jesse Phillips wrote:
 Andrej Mitrovic Wrote:

 Yes. And you know what's going to happen next, right? Everyone is
 going to create their own implementation of a string type because of
 these non-issues. Happens in C/C++ all the time, I see it in almost
 every mid-large codebase out there.

 Well, aside from discussions to create a proper string type, Text in Tango and mText on dprogramming.org, I have yet to see people using a special string type. I'm not sure why you are complaining that the compiler is preventing you from doing something stupid. An array of char is not swappable, a range of char is not swappable. D is not in the habit of hiding complexity; trying to swap a string requires a conversion to dchar or at least handling a sequence of char, so it is best to state that is what is happening in the code.

Swapping a char[] correctly (preserving the proper code units) without using additional storage is a very interesting problem.

Andrei
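
For the related case of reversing (rather than swapping two arbitrary elements), one known in-place approach is to reverse the bytes of each multi-byte sequence first and then reverse the whole buffer. The sketch below is not necessarily what Andrei had in mind; it assumes the input is valid UTF-8, works on code points rather than graphemes (so combining marks are not kept with their base characters), and the name reverseUtf8InPlace is made up.

```d
import std.algorithm : reverse;
import std.string : representation;
import std.utf : stride;

// Sketch: reverse the code points of a valid UTF-8 char[] without extra storage.
void reverseUtf8InPlace(char[] s)
{
    ubyte[] bytes = s.representation; // same memory, viewed as raw bytes

    // Pass 1: reverse the bytes inside each multi-byte sequence.
    for (size_t i = 0; i < s.length; )
    {
        immutable step = stride(s, i); // number of code units in the sequence at i
        if (step > 1)
            reverse(bytes[i .. i + step]);
        i += step;
    }

    // Pass 2: reverse everything; each sequence ends up in correct byte order again.
    reverse(bytes);
}

void main()
{
    char[] s = "h\u00E9llo".dup; // \u00E9 takes two UTF-8 code units
    reverseUtf8InPlace(s);
    assert(s == "oll\u00E9h");
}
```

Swapping two arbitrary code points in place is harder than this, because the two sequences may differ in length and the bytes between them would have to shift.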
Feb 24 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/24/11, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 But what you are asking is for the compiler to implicitly dup it.

Only when the lhs is a mutable type. If it's immutable (string), then you don't have to dup it. Hence:

    string a = "abc";
    string b = "abc";
    assert(&a[0] == &b[0]);

There's no point in duping the literal in this case, it would just waste memory.

 I have thought this might be good to have in the past as well, but it's
 also not too bad to have to type "test"d.dup.  So while having the
 compiler save you a bit of typing would be good, it's not the end of
 the world to require it.

Of course it's not that hard. But when things can be safely automated, I don't see why they shouldn't be. Unless I'm missing some important factor of duping string literals that was not mentioned already. Btw, "test"d.dup is actually pretty nice. I would have used it before, but I didn't know I could use a postfix form /and/ dup it like that.
Feb 24 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 24 Feb 2011 16:21:08 -0500, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 Of course it's not that hard. But when things can be safely automated,
 I don't see why they shouldn't be. Unless I'm missing some important
 factor of duping string literals that was not mentioned already.

It's a 'hidden allocation'. It leads to low performance code that looks like it's really fast. There are plenty of examples of hidden allocation in D already which I would hope we could get rid of, I wouldn't want to add more. For example, try using an AA literal as an enum, and then use that enum in lots of places. Guess what? Each time you use it, the runtime constructs a new instance of the AA! I think we should strive to require explicit requests for allocations as much as possible.

-Steve
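
The enum AA behaviour described above can be observed with a small experiment. This is a sketch only; the names table and freshCopy are hypothetical.

```d
// Manifest constant: the AA literal is substituted at every use site.
enum int[string] table = ["a": 1, "b": 2];

// Hypothetical helper: each call evaluates the literal again,
// which builds a fresh associative array at run time.
int[string] freshCopy()
{
    return table;
}

void main()
{
    auto x = freshCopy();
    x["c"] = 3;            // modifies only this instance
    auto y = freshCopy();
    assert("c" !in y);     // y was built from the literal anew
    assert(y.length == 2);
}
```

Nothing at the use site hints that an allocation happens, which is the point being made about hidden costs.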
Feb 24 2011
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

 wait, you thought char[] was an array?  You poor poor soul ;)
 
 I predict we shall get 1-2 questions/claims of incredulity like this a  
 month until we get a real string type.

There's a need for both Unicode strings and simpler strings of 7-bit ASCII chars (both mutable and immutable. The immutable ones must not allow their length to be changed. Their hashing value may be computed lazily even for the immutable strings). A ubyte[] is not a good enough replacement for an ASCII string. Even a puny language like Python3 has recognized this.

Bye,
bearophile
Feb 24 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 Honestly, I think that the need for actual ASCII strings is quite rare
 and that it _should_ not be encouraged.

I need ASCII strings (or mutable/immutable arrays of ASCII chars) all the time, they come from English text, genomic data, etc.

Bye,
bearophile
Feb 24 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/24/11, Jesse Phillips <jessekphillips+D gmail.com> wrote:
 Andrej Mitrovic Wrote:

 Now I see why using char[] fails. It's because [ElementType!(R).init];
 returns a dchar[].

 Yep, Unicode for the win. dchar[] is swappable.

Oh looks like you're right. I can use reverse on dchar[]. Weird, I thought I've already tried that. P.S. Why do I have to use this gibberish syntax?:

    dchar[] test = to!(dchar[])("test");

Isn't the compiler smart enough to do this for me automatically? It's a string literal..

Still, I don't see why char arrays should fail on hasSwappableElements when swap can be used on char arrays?
Feb 24 2011
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 24 Feb 2011 14:42:38 -0500, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 On 2/24/11, Jesse Phillips <jessekphillips+D gmail.com> wrote:
 Andrej Mitrovic Wrote:

 Now I see why using char[] fails. It's because [ElementType!(R).init];
 returns a dchar[].

 Yep, Unicode for the win. dchar[] is swappable.

 Oh looks like you're right. I can use reverse on dchar[]. Weird, I thought I've already tried that. P.S. Why do I have to use this gibberish syntax?:
     dchar[] test = to!(dchar[])("test");
 Isn't the compiler smart enough to do this for me automatically? It's a string literal..

A string literal is immutable, dchar[] is mutable. These should work:

    immutable(dchar)[] test = "test";
    dstring test = "test";
    auto test = "test"d;
 Still, I don't see why char arrays should fail on hasSwappableElements
 when swap can be used on char arrays?

wait, you thought char[] was an array? You poor poor soul ;) I predict we shall get 1-2 questions/claims of incredulity like this a month until we get a real string type.

-Steve
Feb 24 2011
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, February 24, 2011 13:55:43 bearophile wrote:
 Steven Schveighoffer:
 wait, you thought char[] was an array?  You poor poor soul ;)
 
 I predict we shall get 1-2 questions/claims of incredulity like this a
 month until we get a real string type.

 There's a need for both Unicode strings and simpler strings of 7-bit ASCII chars (both mutable and immutable. The immutable ones must not allow their length to be changed. Their hashing value may be computed lazily even for the immutable strings). A ubyte[] is not a good enough replacement for an ASCII string. Even a puny language like Python3 has recognized this.

Honestly, I think that the need for actual ASCII strings is quite rare and that it _should_ not be encouraged. However, it would be trivial to declare wrappers for char and wchar (e.g. charRange and wcharRange) which actually use char or wchar as their element type if it's really needed. In most cases, however, using Unicode strings is what should be happening, so the fact that char[] doesn't work as a range of char is a _good_ thing. The only real problem with it is the fact that foreach doesn't use dchar as its default iteration type when iterating over arrays of char or wchar.

- Jonathan M Davis
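
The foreach point and the idea of a code-unit wrapper can both be illustrated briefly. The sketch below assumes a current Phobos, where std.utf.byCodeUnit plays roughly the role of the charRange wrapper suggested above (it did not exist when this thread was written); countDefault and countDecoded are hypothetical helpers.

```d
import std.range : ElementType, walkLength;
import std.utf : byCodeUnit;

// Hypothetical helper: counts what a plain foreach visits.
size_t countDefault(string s)
{
    size_t n;
    foreach (c; s) ++n;       // c is inferred as char: one UTF-8 code unit
    return n;
}

// Hypothetical helper: counts what a decoding foreach visits.
size_t countDecoded(string s)
{
    size_t n;
    foreach (dchar c; s) ++n; // asking for dchar makes foreach decode
    return n;
}

// byCodeUnit wraps the string in a range whose elements are code units.
static assert(is(ElementType!(typeof("".byCodeUnit)) == immutable(char)));

void main()
{
    string s = "h\u00E9llo";          // 5 code points, 6 UTF-8 code units
    assert(countDefault(s) == 6);
    assert(countDecoded(s) == 5);
    assert(s.walkLength == 5);        // the range view decodes
    assert(s.byCodeUnit.length == 6); // the code-unit view does not
}
```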
Feb 24 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/24/11, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 A string literal is immutable, dchar[] is mutable.  These should work:

 immutable(dchar)[] test = "test";
 dstring test = "test";
 auto test = "test"d;

Ah right, the postfix form. That's what I was looking for. I know a literal is immutable, I've tried "test".dup, but it still complained that it's not of type dchar. Anywho..
 wait, you thought char[] was an array?  You poor poor soul ;)

Yes. And you know what's going to happen next, right? Everyone is going to create their own implementation of a string type because of these non-issues. Happens in C/C++ all the time, I see it in almost every mid-large codebase out there. But w/e, strings are perfect in D etc etc..
Feb 24 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, February 24, 2011 14:55:34 bearophile wrote:
 Jonathan M Davis:
 Honestly, I think that the need for actual ASCII strings is quite rare
 and that it _should_ not be encouraged.

 I need ASCII strings (or mutable/immutable arrays of ASCII chars) all the time, they come from English text, genomic data, etc.

And I would strongly argue that you shouldn't be using ASCII for text unless you _need_ to. Unicode does the job just fine and doesn't run into problems when you end up having to have non-ASCII characters. Far, far too many programs have been written with the assumption that ASCII was good enough and then had to be altered to work with Unicode later. Using pure ASCII should be an optimization and only done if it's necessary. For something like genomic data, there's a good chance that such an optimization would be necessary because you needed a random-access range and you risk using too much memory using dstrings (and you know that all of the characters are valid chars, because they're limited to the few characters used to hold genomic data). But most people aren't dealing with genomic data. They're usually dealing with text, and text should be Unicode. Assuming that all you're ever going to need is ASCII characters is generally unwise when dealing with text.

- Jonathan M Davis
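
When the data really is known to be pure ASCII, as in the genomic case, one way to get a random-access range with swappable elements is to view the same memory as ubyte[]. This is only a sketch of that optimization; reverseAscii is a hypothetical helper and the ASCII check is an assumption about how one might guard it.

```d
import std.algorithm : all, reverse;
import std.string : representation;

// Hypothetical helper: reverse a sequence known to be pure ASCII, in place.
void reverseAscii(char[] seq)
{
    ubyte[] bytes = seq.representation; // same memory, random access, no decoding
    // Guard the assumption; this check is compiled out in release builds.
    assert(bytes.all!(b => b < 0x80), "non-ASCII data");
    reverse(bytes);
}

void main()
{
    char[] dna = "GATTACA".dup;
    reverseAscii(dna);
    assert(dna == "ACATTAG");
}
```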
Feb 24 2011