
digitalmars.D - RE: std.array.array broken?

"Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
In reference to this thread:
http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org

Personally I think it was a mistake to provide unsafe APIs by 
default. If I had had my way, I would introduce:

byLine -> safe, doesn't reuse a buffer
byLineBuffer -> reuses a buffer

That way you get safe-by-default operations for the vast majority 
of users, and a speedy version for those who need it when they 
need it.
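Below is a minimal sketch of that split, written as free functions over today's File.byLine. The names byLineSafe and byLineBuffer are hypothetical, not Phobos API. A free function cannot shadow the existing File.byLine member, so the safe variant gets a different name here. The safe variant just idups every line, which costs one allocation per line.

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

// Hypothetical safe default (not a Phobos function): every line is
// copied into a fresh immutable string, so elements may be retained.
auto byLineSafe(File f)
{
    return f.byLine.map!(line => line.idup);
}

// Hypothetical opt-in fast path (not a Phobos function): just the
// existing byLine, whose buffer is overwritten by each popFront.
auto byLineBuffer(File f)
{
    return f.byLine;
}

void main()
{
    // Assumption: a writable temp directory for the demo file.
    auto path = buildPath(tempDir, "byline_demo.txt");
    write(path, "one\ntwo\nsix\n");
    scope (exit) remove(path);

    // Safe to collect: each element owns its characters.
    string[] kept = File(path).byLineSafe.array;
    writeln(kept); // ["one", "two", "six"]

    // Fast streaming: use each line only before the next popFront.
    size_t total;
    foreach (line; File(path).byLineBuffer)
        total += line.length;
    writeln(total); // 9
}
```

The design keeps the allocation cost visible in the name. Code that only streams lines can opt into the buffer-reusing variant explicitly.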

This is similar to how the new regex APIs encode in their name 
exactly what they do, e.g. the new matchAll is self-describing 
rather than making you guess whether match() has a default mode 
of "g" that matches all or not.

It's probably too late to change byLine now. But warnings and 
notes in the documentation comments have so far been unfruitful. 
I can't imagine many people are aware of the warnings, and some 
warnings are so ridiculously long that they make you question why 
a function was made to have so many caveats. For a classic 
example, read the warnings for toUTFz: http://dlang.org/phobos/std_utf.html#.toUTFz

Safe and simple should be the default; leave the "if 
((cast(size_t)p & 3) && *p == '\0') return str.ptr" wizardry to 
a separately named function that provides these speed benefits at 
the cost of safety.
Feb 01 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

> Personally I think it was a mistake to provide unsafe APIs by
> default. If I had had my way, I would introduce:
>
> byLine -> safe, doesn't reuse a buffer
> byLineBuffer -> reuses a buffer

I agree. I proposed something related a long time ago (the original title of this ER was "Safer stdin.byLine()"):
http://d.puremagic.com/issues/show_bug.cgi?id=4474

Bye,
bearophile
Feb 01 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/1/14, 3:07 PM, Andrej Mitrovic wrote:
> In reference to this thread:
> http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org
>
> Personally I think it was a mistake to provide unsafe APIs by default. If
> I had had my way, I would introduce:
>
> byLine -> safe, doesn't reuse a buffer
> byLineBuffer -> reuses a buffer

No. Too much breakage.

Andrei
Feb 01 2014
"deed" <none none.none> writes:
On Sunday, 2 February 2014 at 01:03:25 UTC, Andrei Alexandrescu wrote:
> On 2/1/14, 3:07 PM, Andrej Mitrovic wrote:
>> In reference to this thread:
>> http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org
>>
>> Personally I think it was a mistake to provide unsafe APIs by
>> default. If I had had my way, I would introduce:
>>
>> byLine -> safe, doesn't reuse a buffer
>> byLineBuffer -> reuses a buffer
>
> No. Too much breakage.
>
> Andrei

From the docs, it appears as if array() will handle the required copying.

std.array.array doc:
---
Returns a newly-allocated dynamic array consisting of a copy of the input range, static array, dynamic array, or class or struct with an opApply function r. Note that narrow strings are handled as a special case in an overload.
---

std.stdio.byLine doc:
---
Returns an input range set up to read from the file handle one line at a time. The element type for the range will be Char[]. Range primitives may throw StdioException on I/O error.

Note: Each front will not persist after popFront is called, so the caller must copy its contents (e.g. by calling to!string) if retention is needed.
---
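The catch is that array() copies the range's elements. For byLine, each element is a slice into one reused buffer, so the "copy" is several slices that all point at the same memory. The sketch below reproduces this without a file. ReusingLines is a hypothetical stand-in, not a Phobos type, that overwrites one heap buffer on every popFront the way the byLine note describes. Duplicating each front with idup before it is overwritten (the same idea as the to!string suggestion in the note) gives the expected result.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

// Hypothetical stand-in for File.byLine: every popFront overwrites
// the same heap buffer, so earlier fronts change underneath the caller.
struct ReusingLines
{
    private string[] pending; // lines still to be "read"
    private char[] buf;       // one buffer shared by every front
    private size_t len;       // length of the current line in buf

    this(string[] lines)
    {
        pending = lines;
        buf = new char[](64); // assumption: no line exceeds 64 chars
        load();
    }

    private void load()
    {
        if (pending.length == 0) return;
        len = pending[0].length;
        buf[0 .. len] = pending[0][]; // copy into the shared buffer
    }

    bool empty() const { return pending.length == 0; }
    char[] front() { return buf[0 .. len]; }
    void popFront() { pending = pending[1 .. $]; load(); }
}

void main()
{
    auto lines = ["one", "two", "six"];

    // array() copies the slices, not the characters they point to,
    // so every element aliases the single shared buffer.
    char[][] aliased = ReusingLines(lines).array;
    writeln(aliased); // ["six", "six", "six"]

    // Duplicating each front before the next popFront fixes it.
    string[] copied = ReusingLines(lines).map!(l => l.idup).array;
    writeln(copied); // ["one", "two", "six"]
}
```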
Feb 02 2014
"Peter Alexander" <peter.alexander.au gmail.com> writes:
On Sunday, 2 February 2014 at 01:03:25 UTC, Andrei Alexandrescu wrote:
> On 2/1/14, 3:07 PM, Andrej Mitrovic wrote:
>> In reference to this thread:
>> http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org
>>
>> Personally I think it was a mistake to provide unsafe APIs by
>> default. If I had had my way, I would introduce:
>>
>> byLine -> safe, doesn't reuse a buffer
>> byLineBuffer -> reuses a buffer
>
> No. Too much breakage.
>
> Andrei

Agreed. I wonder if the problem can be fixed another way:

1. Introduce a new function ("File.lines" perhaps) which is like byLine, but safe, and has an option to reuse a buffer (off by default).
2. After a while, remove the documentation for byLine, but leave it in Phobos.

This way, newcomers will never see byLine and will get safe behaviour by default with "lines", and existing code will continue to work using the undocumented byLine.
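Step 1 could look roughly like the sketch below: one entry point that is safe by default, with buffer reuse selected by a compile-time flag. linesOf and the "reuseBuffer" flag name are hypothetical stand-ins for the proposed "File.lines", not Phobos API. The flag is a template parameter because the two modes return different range types (copied strings versus the reused char[] buffer).

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;
import std.typecons : Flag, No, Yes;

// Hypothetical stand-in for the proposed "File.lines" (not Phobos API):
// safe by default, buffer reuse only when requested explicitly.
auto linesOf(Flag!"reuseBuffer" reuse = No.reuseBuffer)(File f)
{
    static if (reuse == Yes.reuseBuffer)
        return f.byLine;                         // fronts share one buffer
    else
        return f.byLine.map!(line => line.idup); // each front is a new string
}

void main()
{
    // Assumption: a writable temp directory for the demo file.
    auto path = buildPath(tempDir, "lines_demo.txt");
    write(path, "one\ntwo\nsix\n");
    scope (exit) remove(path);

    // Default: elements can be kept, e.g. collected with array().
    string[] kept = File(path).linesOf.array;
    writeln(kept); // ["one", "two", "six"]

    // Opt-in: no per-line allocation, so copy anything that must
    // outlive the next popFront.
    size_t longest;
    foreach (line; File(path).linesOf!(Yes.reuseBuffer))
        if (line.length > longest) longest = line.length;
    writeln(longest); // 3
}
```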
Feb 02 2014
"Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Sunday, 2 February 2014 at 01:03:25 UTC, Andrei Alexandrescu wrote:
>> If I had had my way, I would introduce:
>>
>> byLine -> safe, doesn't reuse a buffer
>> byLineBuffer -> reuses a buffer
>
> No. Too much breakage.

How exactly is it breakage? The user code:
- will not stop compiling
- will not stop linking
- will still produce the expected results

The only thing that can "break" is that user code which already makes an explicit copy will lose performance. This can be solved by emitting a message with pragma(msg) that directs users to get rid of the unnecessary copying. The message could stick around for several releases.
Feb 02 2014
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 2/2/14, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> On 2/1/14, 3:07 PM, Andrej Mitrovic wrote:
>> byLine -> safe, doesn't reuse a buffer
>> byLineBuffer -> reuses a buffer
>
> No. Too much breakage.

No, I meant that this should have been done before the function was even introduced. But we could be more careful with new APIs in the future.

byLine producing "strange results" is one of the most frequently asked-about topics on IRC and the D forums. Here's a short list of threads I could find in a quick search:

- std.array.array broken?
  http://forum.dlang.org/thread/ouyuujnzzvfkvxbfzyak forum.dlang.org#post-ouyuujnzzvfkvxbfzyak:40forum.dlang.org
- Reading file by line, weird result
  http://forum.dlang.org/thread/iklwhshvwqbubzpvfcgu forum.dlang.org
- csvReader byLine
  http://forum.dlang.org/thread/mailman.1694.1340281202.24740.digitalmars-d puremagic.com#post-mailman.1713.1340376472.24740.digitalmars-d:40puremagic.com
- persistent byLine
  http://forum.dlang.org/thread/ksj7b6$86b$1 digitalmars.com
- array(file.byLine()) is a problem
  http://forum.dlang.org/thread/bug-6495-3 http.d.puremagic.com%2Fissues%2F
- std.stdio.ByLine is not true input range
  http://forum.dlang.org/thread/bug-8084-3 http.d.puremagic.com%2Fissues%2F
- Read Complete File to Array of Lines
  http://forum.dlang.org/thread/aimdwqgymyuajjbsycfj forum.dlang.org#post-mefabsmxvzwahzdlkvnp:40forum.dlang.org
- File.byLine should return dups?
  http://forum.dlang.org/thread/hubkh9$1k6$1 digitalmars.com
- Safer stdin.byLine()
  http://forum.dlang.org/thread/bug-4474-3 http.d.puremagic.com/issues/
Feb 02 2014