digitalmars.D.learn - std.range.byLine

"Nordlöw" (12/12) Sep 10 2014 I'm missing a range variant of byLine that can operate on strings

"Nordlöw" (2/4) Sep 10 2014 Or some other Phobos module.
Ali Çehreli (14/25) Sep 10 2014 There is std.ascii.newline. The following works where newline is '\n'

"Nordlöw" (25/31) Sep 10 2014 Ok, great.

"Nordlöw" (3/16) Sep 10 2014 IMHO, this should be added to std.string and restricted to

monarch_dodra (13/29) Sep 11 2014 Well, the issue is that this isn't very portable for *reading*,

"Nordlöw" (12/24) Sep 11 2014 Good idea. So its "just" a matter of extending splitter with

monarch_dodra (7/23) Sep 11 2014 Hum... no, those are the correct splitting elements. However, I

"Nordlöw" (7/13) Sep 11 2014 So why not simply change the order of the keys to
"Nordlöw" (3/4) Sep 11 2014 Anyway, it shouldn't be too hard to express this in a new range.

"Nordlöw" (3/4) Sep 11 2014 I guess what we need is a variant of splitter with a more greedy

H. S. Teoh via Digitalmars-d-learn (9/14) Sep 11 2014 Why not just use std.regex?

"Nordlöw" (11/16) Sep 12 2014 I'll try the lazy variant of std.regex
"Nordlöw" (5/6) Sep 12 2014 Shouldn't you use

monarch_dodra (4/10) Sep 12 2014 Probably not, as (AFAIK) the splitter engine *itself* will *also*

"Nordlöw" (4/7) Sep 12 2014 I ended up with this.

"Nordlöw" <per.nordlow gmail.com> writes:

I'm missing a range variant of byLine that can operate on strings 
instead of just File.

This is such a common feature so I believe it should have its 
place in std.range.

My suggestion is to define this using

splitter!(std.uni.isNewline)

but I'm missing std.uni.isNewline.

I'm guessing the problem here is that newline separators can be 1 
or 2 bytes long; that is, Separator must be of the same type as 
Range.
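
A minimal sketch of why a per-character predicate alone falls short, assuming a hypothetical local helper isNewlineChar (it stands in for the missing std.uni.isNewline and is not a Phobos function). The predicate overload of splitter treats every matching character as its own separator, so a two-character "\r\n" leaves an empty element behind:

```d
import std.algorithm : equal, splitter;

/// Hypothetical stand-in for the missing std.uni.isNewline.
bool isNewlineChar(dchar c)
{
    return c == '\n' || c == '\r';
}

unittest
{
    // One-character terminators split as expected.
    assert("a\nb".splitter!isNewlineChar.equal(["a", "b"]));
    // "\r\n" counts as two separators, leaving an empty element in between.
    assert("a\r\nb".splitter!isNewlineChar.equal(["a", "", "b"]));
}
```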

Should I add an overload in PR?

Destroy.

Sep 10 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Wednesday, 10 September 2014 at 21:06:30 UTC, Nordlöw wrote:
 This is such a common feature so I believe it should have its 
 place in std.range.

Or some other Phobos module.

Sep 10 2014

Ali Çehreli <acehreli yahoo.com> writes:

On 09/10/2014 02:06 PM, "Nordlöw" wrote:
 I'm missing a range variant of byLine that can operate on strings
 instead of just File.

 This is such a common feature so I believe it should have its place in
 std.range.

 My suggestion is to define this using

 splitter!(std.uni.isNewline)

 but I'm missing std.uni.isNewline.

 I'm guessing the problem here is that newline separators can be 1 or 2
 bytes long; that is, Separator must be of the same type as Range.

 Should I add an overload in PR?

 Destroy.

There is std.ascii.newline. The following works where newline is '\n' 
e.g. on my Linux system. :)

import std.ascii;
import std.algorithm;
import std.range;

void main()
{
     assert("foo\nbar\n"
            .splitter(newline)
            .filter!(a => !a.empty)
            .equal([ "foo", "bar" ]));
}
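
One thing worth checking with the filter approach is what happens to blank lines. The sketch below is only an illustration: it splits on '\n' directly (rather than std.ascii.newline) so the result does not depend on the platform, and nonEmptyLines is a hypothetical helper name.

```d
import std.algorithm : equal, filter, splitter;
import std.range : empty;

/// Hypothetical helper: splits on '\n' and drops every empty element.
auto nonEmptyLines(string text)
{
    return text.splitter('\n').filter!(a => !a.empty);
}

unittest
{
    // A trailing '\n' makes splitter yield a final empty element.
    assert("foo\nbar\n".splitter('\n').equal(["foo", "bar", ""]));
    // The filter removes it, but it also removes the blank line
    // that sits between "foo" and "bar" in the input.
    assert(nonEmptyLines("foo\n\nbar\n").equal(["foo", "bar"]));
}
```

So the filter is fine when blank lines carry no meaning, but it is not a pure "strip the trailing element" step.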

Ali

Sep 10 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Wednesday, 10 September 2014 at 22:29:55 UTC, Ali Çehreli 
wrote:
     assert("foo\nbar\n"
            .splitter(newline)
            .filter!(a => !a.empty)
            .equal([ "foo", "bar" ]));
 }

 Ali

Ok, great.

So I got.

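import std.range: isForwardRange; // needed by the template constraint

auto byLine(Range)(Range input) if (isForwardRange!Range)
{
     import std.algorithm: splitter;
     import std.ascii: newline;
     import std.range: front;
     static if (newline.length == 1)
     {
         // Single-character terminator: split on the element itself.
         return input.splitter(newline.front);
     }
     else
     {
         // Multi-character terminator (e.g. "\r\n"): split on the sub-range.
         return input.splitter(newline);
     }
}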

unittest
{
     import std.algorithm: equal;
     assert(equal("a\nb".byLine, ["a", "b"]));
}

One thing still:

Is my optimization for newline.length == 1 unnecessary or perhaps 
even wrong?
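
As a rough check on the correctness half of that question (it says nothing about speed), the element form and the range form of splitter give the same lines for a one-character terminator. A small sketch, with the separators written out literally instead of taken from std.ascii.newline:

```d
import std.algorithm : equal, splitter;

unittest
{
    // Element separator (a single character) ...
    assert("a\nb".splitter('\n').equal(["a", "b"]));
    // ... and range separator (a string) yield the same lines here.
    assert("a\nb".splitter("\n").equal(["a", "b"]));
    // Only the range form can express a two-character terminator.
    assert("a\r\nb".splitter("\r\n").equal(["a", "b"]));
}
```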

Sep 10 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Wednesday, 10 September 2014 at 22:45:08 UTC, Nordlöw wrote:
 auto byLine(Range)(Range input) if (isForwardRange!Range)
 {
     import std.algorithm: splitter;
     import std.ascii: newline;
     static if (newline.length == 1)
     {
         return input.splitter(newline.front);
     }
     else
     {
         return input.splitter(newline);
     }
 }

IMHO, this should be added to std.string and restricted to 
isSomeString. Should I do a PR?

Sep 10 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 10 September 2014 at 23:01:44 UTC, Nordlöw wrote:
 On Wednesday, 10 September 2014 at 22:45:08 UTC, Nordlöw wrote:
 auto byLine(Range)(Range input) if (isForwardRange!Range)
 {
    import std.algorithm: splitter;
    import std.ascii: newline;
    static if (newline.length == 1)
    {
        return input.splitter(newline.front);
    }
    else
    {
        return input.splitter(newline);
    }
 }

 IMHO, this should be added to std.string and restricted to 
 isSomeString. Should I do a PR?

Well, the issue is that this isn't very portable for *reading*, 
as even on linux, you may read files with "\r\n" line endings 
(It's "standard" for csv files, for example), or read "\n" 
terminated files on windows.

The issue is that (currently) we don't have any splitter that 
operates on multiple needles. *That'd* be what needs to be 
written (probably not too hard either, since "find" already 
exists).

We also have std.string.splitLines. Is that good enough for you, 
by any chance? Or do you need it to actually be lazy?
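
To make the portability point concrete, here is a small sketch (the sample data is made up): splitting "\r\n"-terminated text on '\n' alone leaves a stray '\r' on every line, while splitLines copes with it, at the cost of building the whole array eagerly.

```d
import std.algorithm : equal, splitter;
import std.string : splitLines;

unittest
{
    auto csv = "a,b\r\nc,d\r\n";
    // Splitting on '\n' alone keeps the '\r' and yields a trailing empty element.
    assert(csv.splitter('\n').equal(["a,b\r", "c,d\r", ""]));
    // splitLines is eager (it returns an array) but treats "\r\n" as one terminator.
    assert(csv.splitLines == ["a,b", "c,d"]);
}
```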

Sep 11 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Thursday, 11 September 2014 at 10:19:17 UTC, monarch_dodra 
wrote:
 Well, the issue is that this isn't very portable for *reading*, 
 as even on linux, you may read files with "\r\n" line endings 
 (It's "standard" for csv files, for example), or read "\n" 
 terminated files on windows.
 The issue is that (currently) we don't have any splitter that 
 operates on multiple needles. *That'd* be what needs to be 
 written (probably not too hard either, since "find" already 
 exists).

Good idea. So it's "just" a matter of extending splitter with 
std.algorithm.find with these three keys:
- \n
- \r
- \r\n
then? Or are there more encodings to choose from?

 We also have splitLines, 

 good enough for you by any chance? Or do you need it to 
 actually be lazy?

Laziness is good in this case because my input files are 
gigabytes in size :) I'm playing around with single-pass-parsing 
ConceptNet5 CSV-files at

https://github.com/nordlow/justd/blob/master/conceptnet5.d

Sep 11 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 11 September 2014 at 20:03:26 UTC, Nordlöw wrote:
 On Thursday, 11 September 2014 at 10:19:17 UTC, monarch_dodra 
 wrote:
 Well, the issue is that this isn't very portable for 
 *reading*, as even on linux, you may read files with "\r\n" 
 line endings (It's "standard" for csv files, for example), or 
 read "\n" terminated files on windows.
 The issue is that (currently) we don't have any splitter that 
 operates on multiple needles. *That'd* be what needs to be 
 written (probably not too hard either, since "find" already 
 exists).

 Good idea. So it's "just" a matter of extending splitter with 
 std.algorithm.find with these three keys:
 - \n
 - \r
 - \r\n
 then? Or are there more encodings to choose from?

Hum... no, those are the correct splitting elements. However, I 
don't think that would actually work, as "find" will privilege 
the first whole element to match as a "hit", so "\r\n" will never 
be hit (rather, it will be hit twice, in the form of two individual 
line breaks '\r' and '\n').

Bummer...
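
A small sketch of that behaviour with the multi-needle overload of std.algorithm.find, which returns the advanced haystack together with the 1-based index of the needle that matched. When several needles match at the same position the shortest one is reported, so listing "\r\n" first does not appear to change the outcome:

```d
import std.algorithm : find;

unittest
{
    // Needles in the order "\r\n", "\r", "\n".
    auto hit = "a\r\nb".find("\r\n", "\r", "\n");
    // The haystack is advanced to the first line break ...
    assert(hit[0] == "\r\nb");
    // ... but the reported needle is number 2, the lone "\r",
    // not the two-character "\r\n" listed first.
    assert(hit[1] == 2);
}
```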

Sep 11 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Thursday, 11 September 2014 at 21:29:16 UTC, monarch_dodra 
wrote:
 Hum... no, those are the correct splitting elements. However, I 
 don't think that would actually work, as "find" will privilege 
 the first whole element to match as a "hit", so "\r\n" will never 
 be hit (rather, it will be hit twice, in the form of two 
 individual line breaks '\r' and '\n').

 Bummer...

So why not simply change the order of the keys to
- \r\n
- \r
- \n

then?

Sep 11 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Thursday, 11 September 2014 at 21:29:16 UTC, monarch_dodra 
wrote:
 Bummer...

Anyway, it shouldn't be too hard to express this in a new range.

Sep 11 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Thursday, 11 September 2014 at 21:54:39 UTC, Nordlöw wrote:
 Anyway, it shouldn't be too hard to express this in a new range.

I guess what we need is a variant of splitter with a more greedy 
alias template parameter that will digest two or one bytes.
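
One way such a greedy range could look, as a hand-rolled sketch only: GreedyLines is a hypothetical name, it works on string rather than on generic ranges, and it is not the code from any Phobos module. It scans bytes, which is safe for UTF-8 because '\r' and '\n' are ASCII, and it consumes "\r\n" as a single terminator.

```d
import std.algorithm : equal;

/// Hypothetical lazy line range over a string. "\r\n", '\r' and '\n'
/// each end a line; "\r\n" is consumed greedily as one terminator.
struct GreedyLines
{
    private string rest;   // input not yet consumed
    private string line;   // current line, without its terminator
    private bool hasLine;  // false once the input is exhausted

    this(string input)
    {
        rest = input;
        popFront(); // prime the first line
    }

    @property bool empty() { return !hasLine; }
    @property string front() { return line; }
    @property GreedyLines save() { return this; }

    void popFront()
    {
        if (rest.length == 0)
        {
            hasLine = false;
            return;
        }
        // Scan up to the next '\r' or '\n' (or the end of the input).
        size_t i = 0;
        while (i < rest.length && rest[i] != '\r' && rest[i] != '\n')
            ++i;
        line = rest[0 .. i];
        hasLine = true;
        if (i == rest.length)
        {
            rest = rest[$ .. $]; // last line had no terminator
            return;
        }
        // Greedy step: swallow "\r\n" as two bytes, anything else as one.
        immutable skip =
            (rest[i] == '\r' && i + 1 < rest.length && rest[i + 1] == '\n') ? 2 : 1;
        rest = rest[i + skip .. $];
    }
}

unittest
{
    assert(GreedyLines("a\r\nb\rc\nd").equal(["a", "b", "c", "d"]));
    // Blank lines are kept; a trailing terminator adds no extra element.
    assert(GreedyLines("a\n\nb\n").equal(["a", "", "b"]));
    assert(GreedyLines("").empty);
}
```

Each front is a slice of the input, so nothing is copied and only one line is looked at per step.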

Sep 11 2014

"H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:

On Thu, Sep 11, 2014 at 10:31:33PM +0000, "Nordlöw" via Digitalmars-d-learn
wrote:
 On Thursday, 11 September 2014 at 21:54:39 UTC, Nordlöw wrote:
 Anyway, it shouldn't be too hard to express this in a new range.

 
 I guess what we need is a variant of splitter with a more greedy alias
 template parameter that will digest two or one bytes.

Why not just use std.regex?

	foreach (line; myInput.split(regex(`\n|\r\n|\r`)))
	{
		...
	}


T

-- 
The problem with the world is that everybody else is stupid.

Sep 11 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Thursday, 11 September 2014 at 22:39:40 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 Why not just use std.regex?
 	foreach (line; myInput.split(regex(`\n|\r\n|\r`)))
 	{
 		...
 	}


 T

I'll try the lazy variant of std.regex

  	foreach (line; myInput.splitter(regex(`\n|\r\n|\r`)))
  	{
  		...
  	}

I wonder if this is compatible with a ctRegex as well. I'll try 
later.
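
A minimal sketch of the lazy regex variant, using a run-time regex only (ctRegex is not tried here) and a made-up input string. The splitter comes from std.regex, not std.algorithm:

```d
import std.algorithm : equal;
import std.regex : regex, splitter;

unittest
{
    // Same pattern as in the thread; at a '\r' the alternative `\r\n`
    // is tried before the lone `\r`, so "\r\n" is one separator.
    auto re = regex(`\n|\r\n|\r`);
    assert("a\r\nb\rc\nd".splitter(re).equal(["a", "b", "c", "d"]));
}
```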



Thx

Sep 12 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Thursday, 11 September 2014 at 22:39:40 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 	foreach (line; myInput.split(regex(`\n|\r\n|\r`)))

Shouldn't you use

  	foreach (line; myInput.split(regex("\n|\r\n|\r")))

here?

Sep 12 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 12 September 2014 at 13:25:22 UTC, Nordlöw wrote:
 On Thursday, 11 September 2014 at 22:39:40 UTC, H. S. Teoh via 
 Digitalmars-d-learn wrote:
 	foreach (line; myInput.split(regex(`\n|\r\n|\r`)))

 Shouldn't you use

  	foreach (line; myInput.split(regex("\n|\r\n|\r")))

 here?

Probably not, as (AFAIK) the splitter engine *itself* will *also* 
escape the passed-in characters. I.e., it literally needs the 
characters '\' and 'n'.
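
For what it's worth, a quick sketch comparing the two spellings on a made-up input. With the backquoted literal the regex engine receives a backslash followed by 'n' and interprets the escape itself; with the double-quoted literal the compiler has already produced the control characters, which the engine then matches literally. In this sketch both split the same way:

```d
import std.algorithm : equal;
import std.regex : regex, splitter;

unittest
{
    auto input = "a\r\nb\nc";
    // WYSIWYG literal: the pattern text contains '\' and 'n'.
    auto raw = input.splitter(regex(`\n|\r\n|\r`));
    // Double-quoted literal: the pattern text contains real CR/LF characters.
    auto cooked = input.splitter(regex("\n|\r\n|\r"));
    assert(raw.equal(["a", "b", "c"]));
    assert(cooked.equal(["a", "b", "c"]));
}
```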

Sep 12 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Friday, 12 September 2014 at 14:16:07 UTC, monarch_dodra wrote:
 Probably not, as (AFAIK) the splitter engine *itself* will 
 *also* escape the passed in characters. IE: It literally needs 
 the characters '\' and 'n'.

I ended up with this.

https://github.com/nordlow/justd/blob/30806a85a5c976f3e891ca11bde3d87a16ecf5e6/algorithm_ex.d#L1858

Does it seem ok?
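
For comparison, Phobos releases newer than this thread ship std.string.lineSplitter, a lazy counterpart to splitLines. A short sketch, assuming a compiler recent enough to have it:

```d
import std.algorithm : equal;
import std.string : lineSplitter;

unittest
{
    // Lazy, slices the input, and treats "\r\n" as a single terminator.
    assert("a\r\nb\rc\nd".lineSplitter.equal(["a", "b", "c", "d"]));
}
```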

Sep 12 2014
