digitalmars.D - Why does std.string.splitLines return an array?

Chad J <chadjoan __spam.is.bad__gmail.com> writes:
std.string.splitLines returns an array, which is pretty grody.  Why not 
return a lazily-evaluated range struct so that we can avoid allocations 
on this simple but common operation?
Oct 21 2012
"bearophile" <bearophileHUGS lycos.com> writes:
Chad J:

> std.string.splitLines returns an array, which is pretty grody.
> Why not return a lazily-evaluated range struct so that we can
> avoid allocations on this simple but common operation?

splitLines is probably modeled on Python's str.splitlines() string method, which returns a list (array) of strings (because Python was originally eager). Phobos has both split() and splitter(); the first is eager and the second is lazy. So maybe you want a splitterLines().

I have asked for a lazy splitLines, vote here:
http://d.puremagic.com/issues/show_bug.cgi?id=4764

But I have suggested a different naming:
http://d.puremagic.com/issues/show_bug.cgi?id=5838

See also:
http://d.puremagic.com/issues/show_bug.cgi?id=6730
http://d.puremagic.com/issues/show_bug.cgi?id=7689

And especially:
http://d.puremagic.com/issues/show_bug.cgi?id=8013

Bye,
bearophile
Oct 21 2012
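
The eager/lazy distinction bearophile draws between split() and splitter() can be illustrated with a small sketch. The sample text and variable names are made up for illustration; the import paths assume a Phobos in which split lives in std.array and splitter is reachable through std.algorithm.

```d
import std.algorithm : equal, splitter;
import std.array : split;

void main()
{
    string text = "alpha,beta,gamma";

    // Eager: split allocates and returns a string[] holding every piece.
    string[] eager = text.split(",");
    assert(eager == ["alpha", "beta", "gamma"]);

    // Lazy: splitter returns a range that yields slices of text on demand,
    // without allocating an array for the results.
    auto lazyParts = text.splitter(",");
    assert(lazyParts.front == "alpha");
    assert(lazyParts.equal(["alpha", "beta", "gamma"]));
}
```

Both calls see the same pieces; the difference is only whether an array of results is built up front.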
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
> std.string.splitLines returns an array, which is pretty grody.  Why not
> return a lazily-evaluated range struct so that we can avoid allocations
> on this simple but common operation?

If you want a lazy range, then use std.algorithm.splitter. std.string operates on and returns strings, not general ranges.

- Jonathan M Davis
Oct 21 2012
Chad J <chadjoan __spam.is.bad__gmail.com> writes:
On 10/21/2012 06:35 PM, Jonathan M Davis wrote:
> On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
>> std.string.splitLines returns an array, which is pretty grody.  Why not
>> return a lazily-evaluated range struct so that we can avoid allocations
>> on this simple but common operation?
>
> If you want a lazy range, then use std.algorithm.splitter. std.string
> operates on and returns strings, not general ranges.
>
> - Jonathan M Davis

std.algorithm.splitter is simply not acceptable for this. It doesn't have this kind of logic:

    bool matchLineEnd( string text, size_t pos )
    {
        if ( pos+1 < text.length
            && text[pos] == '\r'
            && text[pos+1] == '\n' )
            return true;
        else if ( pos < text.length
            && (text[pos] == '\r' || text[pos] == '\n') )
            return true;
        else
            return false;
    }

I've never used std.algorithm.splitter for line splitting, despite trying. It's always more effective to write your own.

I'm with bearophile on this one:
http://d.puremagic.com/issues/show_bug.cgi?id=4764

I think his suggestions about naming also just make *sense*. I'm not sure how practical some of those naming changes would be if there is a lot of wild D2 code that uses the current weirdly-named stuff that emphasizes eager evaluation and extraneous allocations.

I'm not sure how necessary it is to even /have/ functions that return arrays when there are lazy versions: the result of a lazy function can always be fed to std.array.array(range). Heh, even parentheses nesting is nicely handled by UFCS now.
Oct 21 2012
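
Two points from Chad J's post can be checked with a short sketch: splitting on '\n' alone does not handle "\r\n" or a bare '\r', and any lazy range can be turned into an array afterwards with std.array.array through UFCS. The sample text is invented for illustration.

```d
import std.algorithm : equal, splitter;
import std.array : array;

void main()
{
    // Mixed line endings: "\r\n", "\n", and a bare "\r".
    string text = "one\r\ntwo\nthree\rfour";

    // Splitting on '\n' alone is lazy, but it leaves the '\r' of "\r\n"
    // attached and never splits at a bare '\r'.
    auto pieces = text.splitter('\n');
    assert(pieces.equal(["one\r", "two", "three\rfour"]));

    // A lazy range can be made eager on demand; UFCS keeps the chain flat
    // instead of nesting it as array(splitter(text, '\n')).
    string[] eager = text.splitter('\n').array;
    assert(eager.length == 3);
}
```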
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/22/12 1:05 AM, Chad J wrote:
> On 10/21/2012 06:35 PM, Jonathan M Davis wrote:
>> On Sun, 2012-10-21 at 18:00 -0400, Chad J wrote:
>>> std.string.splitLines returns an array, which is pretty grody. Why not
>>> return a lazily-evaluated range struct so that we can avoid allocations
>>> on this simple but common operation?
>>
>> If you want a lazy range, then use std.algorithm.splitter. std.string
>> operates on and returns strings, not general ranges.
>>
>> - Jonathan M Davis
>
> std.algorithm.splitter is simply not acceptable for this. It doesn't have
> this kind of logic:
>
>     bool matchLineEnd( string text, size_t pos )
>     {
>         if ( pos+1 < text.length
>             && text[pos] == '\r'
>             && text[pos+1] == '\n' )
>             return true;
>         else if ( pos < text.length
>             && (text[pos] == '\r' || text[pos] == '\n') )
>             return true;
>         else
>             return false;
>     }

Agreed. We should add a splitter() overload that accepts a single argument of some string type. It would use the line splitting logic above. Could you please adapt your code to do this and package it in a pull request?

Thanks!

Andrei
Oct 22 2012
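
What such a lazy line range might look like is sketched below, under stated assumptions: the names LazyLines and lazyLines are hypothetical and are not the API proposed or merged in this thread, the input is restricted to string, and a trailing line terminator is assumed not to produce an extra empty line. The terminator handling follows the matchLineEnd logic quoted above ("\r\n", '\r', or '\n').

```d
import std.algorithm : equal;

/// Hypothetical lazy line range (illustrative only, not a Phobos type).
struct LazyLines
{
    private string rest;     // text not yet handed out
    private string current;  // the line exposed by front
    private bool done;       // true once every line has been consumed

    this(string text)
    {
        rest = text;
        popFront();  // prime the first line
    }

    @property bool empty() { return done; }
    @property string front() { return current; }
    @property LazyLines save() { return this; }

    void popFront()
    {
        if (rest.length == 0)
        {
            done = true;
            return;
        }

        // Scan to the next '\r' or '\n'; both are ASCII, so scanning
        // UTF-8 code units is safe.
        size_t i = 0;
        while (i < rest.length && rest[i] != '\r' && rest[i] != '\n')
            ++i;

        current = rest[0 .. i];  // a slice of the input, no allocation

        // Consume the terminator; "\r\n" counts as a single line ending.
        if (i + 1 < rest.length && rest[i] == '\r' && rest[i + 1] == '\n')
            i += 2;
        else if (i < rest.length)
            i += 1;

        rest = rest[i .. $];
    }
}

/// Hypothetical convenience wrapper.
LazyLines lazyLines(string text)
{
    return LazyLines(text);
}

void main()
{
    auto lines = lazyLines("one\r\ntwo\nthree\rfour");
    assert(lines.equal(["one", "two", "three", "four"]));
}
```

Each line is a slice of the original string, so iterating allocates nothing; a caller who wants an array can still pass the range to std.array.array.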