
digitalmars.D.bugs - [Issue 12325] New: Major performance problem with std.array.front

https://d.puremagic.com/issues/show_bug.cgi?id=12325

           Summary: Major performance problem with std.array.front
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bugzilla digitalmars.com


--- Comment #0 from Walter Bright <bugzilla digitalmars.com> 2014-03-08 19:31:09 PST ---
Throughout D's history, there have been regular and repeated proposals to
redesign D's view of char[] to pretend it is not UTF-8, but UTF-32, i.e. so
that D will automatically generate code to decode and encode on every attempt
to index char[].

I have strongly objected to these proposals on the grounds that:

1. It is a MAJOR performance problem to do this.

2. Very, very few manipulations of strings ever actually need decoded values.

3. D is a systems/native programming language, and systems/native programming
languages must not hide the underlying representation (I make similar arguments
about proposals to make ints issue errors on overflow, etc.).

4. Users should choose when decode/encode happens, not the language.

and I have been successful at heading these off. But one slipped by me. See
this in std.array:

  @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
  {
    assert(a.length, "Attempting to fetch the front of an empty array of " ~
           T.stringof);
    size_t i = 0;
    return decode(a, i);
  }
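
To make the effect of that overload concrete, here is a minimal sketch of how indexing a string differs from calling `front` on it. It assumes a Phobos release in which `front` for arrays is importable from `std.range.primitives` (in the report it is quoted from std.array); the helper names `firstCodeUnit` and `firstCodePoint` are hypothetical and exist only for this illustration.

```d
import std.range.primitives : ElementType, front;

// Hypothetical helper: first UTF-8 code unit, taken by plain indexing.
char firstCodeUnit(string s) { return s[0]; }

// Hypothetical helper: first element as seen through the range primitive.
// For narrow strings this goes through decode() and yields a dchar.
dchar firstCodePoint(string s) { return s.front; }

void main()
{
    // "\u00E9" is U+00E9, which occupies two code units (0xC3 0xA9) in UTF-8.
    string s = "\u00E9t\u00E9";

    // Indexing exposes the underlying representation.
    static assert(is(typeof(s[0]) == immutable(char)));

    // The range view of the same data reports dchar elements.
    static assert(is(ElementType!string == dchar));
    auto c = s.front;
    static assert(is(typeof(c) == dchar));

    assert(s.length == 5);                 // five code units
    assert(firstCodeUnit(s) == 0xC3);      // raw byte, no decoding
    assert(firstCodePoint(s) == 0xE9);     // decoded code point
}
```

Any generic algorithm written against the InputRange interface sees only the second view, which is why the decoding happens without the caller asking for it.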

What that means is that if I implement an algorithm that accepts, as input, an
InputRange of char's, it will ALWAYS try to decode it. This means that even:

   from.copy(to)

will decode 'from', and then re-encode it for 'to'. And it will do it SILENTLY.
The user won't notice, and he'll just assume that D performance sux. Even if he
does notice, his options to make his code run faster are poor.

If the user wants decoding, it should be explicit, as in:

    from.decode.copy(encode!to)

The USER should decide where and when the decoding goes. 'decode' should be
just another algorithm.
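
As a rough sketch of what opt-in decoding can look like, the example below uses `std.utf.byCodeUnit` and `std.utf.byDchar`. These are assumptions relative to this report: to my knowledge they were added to Phobos in a later release, and they are not the `decode`/`encode` spellings used in the line above, which I read as illustrative. The helper names `countCodeUnits` and `countCodePoints` are hypothetical.

```d
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper: walk the string as raw code units, no decoding.
size_t countCodeUnits(string s) { return s.byCodeUnit.walkLength; }

// Hypothetical helper: walk the string with decoding requested explicitly.
size_t countCodePoints(string s) { return s.byDchar.walkLength; }

void main()
{
    // U+00EF takes two code units in UTF-8, so this is
    // 5 code points stored in 6 code units.
    string s = "na\u00EFve";

    assert(countCodeUnits(s) == 6);
    assert(countCodePoints(s) == 5);

    // With no wrapper, the range primitives decode implicitly.
    assert(s.walkLength == 5);
}
```

With wrappers of this kind the caller, not the library, picks the element type that each stage of a pipeline sees.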

(Yes, I know that std.algorithm.copy() has some specializations to take care of
this. But these specializations would have to be written for EVERY algorithm,
which is thoroughly unreasonable. Furthermore, copy()'s specializations only
apply if BOTH source and destination are arrays. If just one is, the
decode/encode penalty applies.)

Is there any hope of fixing this?

Newsgroup discussion: http://forum.dlang.org/post/lfbbcn$2th7$1@digitalmars.com

Mar 08 2014