digitalmars.D.learn - Reading files using delimiters/terminators

Rekel <paultjeadriaanse gmail.com> writes:

I'm trying to read a file with entries separated by '\n\n' (empty 
line), with entries containing '\n'. I thought 
File.byLine(KeepTerminator, Terminator) might work, as it seems 
to accept strings as terminators, since there seems to have been 
a thread regarding '\r\n' separators.

I don't know if there's some underlying reason, but when I try to 
use "\n\n" as a terminator, I end up getting the entire file into 
1 char[], so it's not delimited.

Should this work or is there a reason one cannot use byLine like 
this?

For context, I'm trying this with the puzzle input of day 6 of 
this year's advent of code. (https://adventofcode.com/)

Dec 26 2020

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Sunday, 27 December 2020 at 00:13:30 UTC, Rekel wrote:
 I'm trying to read a file with entries separated by '\n\n' 
 (empty line), with entries containing '\n'. I thought 
 File.byLine(KeepTerminator, Terminator) might work, as it 
 seems to accept strings as terminators, since there seems to 
 have been a thread regarding '\r\n' separators.

 I don't know if there's some underlying reason, but when I try 
 to use "\n\n" as a terminator, I end up getting the entire file 
 into 1 char[], so it's not delimited.

 Should this work or is there a reason one cannot use byLine 
 like this?

 For context, I'm trying this with the puzzle input of day 6 of 
 this year's advent of code. (https://adventofcode.com/)

Unfortunately std.csv is character based and not string. 
https://dlang.org/phobos/std_csv.html#.csvReader

But your use case sounds like splitter is more aligned with your 
needs.

https://dlang.org/phobos/std_algorithm_iteration.html#.splitter

Dec 26 2020

Rekel <paultjeadriaanse gmail.com> writes:

On Sunday, 27 December 2020 at 02:41:12 UTC, Jesse Phillips wrote:
 Unfortunately std.csv is character based and not string. 
 https://dlang.org/phobos/std_csv.html#.csvReader

 But your use case sounds like splitter is more aligned with 
 your needs.

 https://dlang.org/phobos/std_algorithm_iteration.html#.splitter

But I'm not using csv, right? Additionally, shouldn't byLine also 
work with "\r\n"?

Dec 27 2020

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Sunday, 27 December 2020 at 13:21:44 UTC, Rekel wrote:
 On Sunday, 27 December 2020 at 02:41:12 UTC, Jesse Phillips 
 wrote:
 Unfortunately std.csv is character based and not string. 
 https://dlang.org/phobos/std_csv.html#.csvReader

 But your use case sounds like splitter is more aligned with 
 your needs.

 https://dlang.org/phobos/std_algorithm_iteration.html#.splitter

 But I'm not using csv, right? Additionally, shouldn't byLine also 
 work with "\r\n"?

Right, you weren't using csv. I'm not familiar enough with the file 
terminator to know why it didn't work.

byLine would allow \r\n as well as \n

Dec 27 2020

Ali Çehreli <acehreli yahoo.com> writes:

On 12/26/20 4:13 PM, Rekel wrote:
 I'm trying to read a file with entries separated by '\n\n' (empty line), 
 with entries containing '\n'. I thought 
 File.byLine(KeepTerminator, Terminator) might work, as it seems to 
 accept strings as terminators, since there seems to have been a thread 
 regarding '\r\n' separators.
 
 I don't know if there's some underlying reason, but when I try to use 
 "\n\n" as a terminator, I end up getting the entire file into 1 char[], 
 so it's not delimited.
 
 Should this work or is there a reason one cannot use byLine like this?
 
 For context, I'm trying this with the puzzle input of day 6 of this 
 year's advent of code. (https://adventofcode.com/)

byLine should work:

import std.stdio;

void main() {
   auto f = File("deneme.d");

   // Warning: byLine reuses an internal buffer. Use byLineCopy
   // (or .dup the line) if the line, or slices of it, must persist.
   foreach (line; f.byLine) {
     if (line.length == 0) {
       writeln("EMPTY LINE");

     } else {
       writeln(line);
     }
   }
}
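
Building on the loop above, a minimal sketch of collecting the lines between empty lines into groups. The helper name groupRecords is hypothetical (it is not part of Phobos), and the in-memory string only stands in for a real input file:

```d
import std.stdio : writeln;
import std.string : lineSplitter;

// Hypothetical helper, not part of Phobos: collects consecutive non-empty
// lines into groups and starts a new group at every empty line.
string[][] groupRecords(Lines)(Lines lines) {
    string[][] groups;
    string[] current;
    foreach (line; lines) {
        if (line.length == 0) {
            if (current.length) groups ~= current;
            current = null;
        } else {
            // .idup copies the line; this matters when the range is
            // File.byLine, whose buffer is reused on the next iteration.
            current ~= line.idup;
        }
    }
    if (current.length) groups ~= current;
    return groups;
}

void main() {
    // In-memory stand-in for a file. With a real file one would pass
    // File("input").byLine instead of text.lineSplitter.
    auto text = "abc\n\na\nb\nc\n\nab\nac";
    writeln(groupRecords(text.lineSplitter));
}
```

Since the helper only tests for empty lines, it never needs a multi-character terminator.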

Ali

Dec 26 2020

oddp <oddp posteo.de> writes:

On 27.12.20 01:13, Rekel via Digitalmars-d-learn wrote:
 For context, I'm trying this with the puzzle input of day 6 of this year's
 advent of code. (https://adventofcode.com/)

For that specific puzzle I simply did:

foreach (group; readText("input").splitter("\n\n")) { ... }

Since the input is never that big, I prefer reading in the whole thing and then
doing the processing.

Also, on other days, when the input is more uniform, there's always
https://dlang.org/library/std/file/slurp.html which makes reading it in even
easier, e.g. day02:

alias Record = Tuple!(int, "low", int, "high", char, "needle", string, "hay");
auto input = slurp!Record("input", "%d-%d %s: %s");

P.S.: would've loved to have had multiwayIntersection in the stdlib for day06
part2, especially when there's already multiwayUnion in setops.
fold!setIntersection felt a bit clunky.

Dec 27 2020

Rekel <paultjeadriaanse gmail.com> writes:

On Sunday, 27 December 2020 at 13:27:49 UTC, oddp wrote:
 foreach (group; readText("input").splitter("\n\n")) { ... }

 Also, on other days, when the input is more uniform, there's 
 always https://dlang.org/library/std/file/slurp.html which 
 makes reading it in even easier, e.g. day02:

 alias Record = Tuple!(int, "low", int, "high", char, "needle", string, "hay");
 auto input = slurp!Record("input", "%d-%d %s: %s");

 P.S.: would've loved to have had multiwayIntersection in the 
 stdlib for day06 part2, especially when there's already 
 multiwayUnion in setops. fold!setIntersection felt a bit clunky.

Oh my, all these things are new to me, haha, thanks a lot! I'll 
be looking into those (slurp & tuple). By the way, is there a 
reason to use either 'splitter' or 'split'? I'm not sure I see 
why the difference would matter in the end.

Side tangent: I don't mean to bash the learning tour, as it's been 
really useful for getting started, but I'm surprised stuff like 
tuples and files aren't mentioned there.
Especially since the documentation tends to trip me up, with 
stuff like 'isSomeString' mentioning 'built in string types', 
while I haven't been able to find that concept elsewhere, let 
alone functionality one can expect in this case (like .length and 
the like), and stuff like 'countUntil' not being called 
'indexOf', although it also exists and does basically the same 
thing. Also assumeUnique seems to be a thing?

Dec 27 2020

Rekel <paultjeadriaanse gmail.com> writes:

On Sunday, 27 December 2020 at 23:12:46 UTC, Rekel wrote:
 Side tangent: I don't mean to bash the learning tour, as it's been 
 really useful for getting started, but I'm surprised stuff like 
 tuples and files aren't mentioned there.

Update;
Any clue why there's both "std.file" and "std.stdio.File"?
I was mostly unaware of the former.

Dec 27 2020

Mike Parker <aldacron gmail.com> writes:

On Sunday, 27 December 2020 at 23:18:37 UTC, Rekel wrote:

 Update;
 Any clue why there's both "std.file" and "std.stdio.File"?
 I was mostly unaware of the former.

The very first paragraph at the top of the `std.file` 
documentation explains it:

"Functions in this module handle files as a unit, e.g., read or 
write one file at a time. For opening files and manipulating them 
via handles refer to module std.stdio."

https://dlang.org/phobos/std_file.html
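
To illustrate the distinction, here is a small sketch contrasting the two modules. The scratch file name and the helper countLines are made up for this example:

```d
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

// Hypothetical helper: counts lines by opening a handle (std.stdio)
// and reading the file incrementally.
size_t countLines(string path) {
    auto f = File(path, "r");
    size_t lines;
    foreach (line; f.byLine) ++lines;
    return lines;
}

void main() {
    // Scratch file used only for this demonstration (name is arbitrary)
    auto path = buildPath(tempDir, "std_file_vs_stdio_demo.txt");
    scope (exit) remove(path);

    // std.file: the file is written and read as one unit, no handle involved
    write(path, "first\nsecond\n");
    assert(readText(path) == "first\nsecond\n");

    // std.stdio: a File handle is opened and consumed line by line
    assert(countLines(path) == 2);
}
```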

Dec 27 2020

Ali Çehreli <acehreli yahoo.com> writes:

On 12/27/20 3:12 PM, Rekel wrote:

 is there a reason to use
 either 'splitter' or 'split'? I'm not sure I see why the difference
 would matter in the end.

splitter() is a lazy range algorithm. split() is a range algorithm as 
well but it is eager; it will put the results in an array that it grows. 
The string elements would not be copies of the original range; they will 
still be just the pair of .ptr and .length but it can be expensive if 
there are a lot of parts.

Further, if you want to process just a small number of the initial 
parts, then being eager would be wasteful.

Like all lazy range algorithms, splitter() is just an iteration object 
waiting to be used. It does not allocate any array but serves the parts 
one by one. You can filter the parts as you iterate over or you can stop 
at any point. For example, the following would take the first 3 
non-empty lines:

import std.stdio;
import std.range;
import std.algorithm;

void main() {
   auto s = "hello\n\nworld\n\n\nand\nmoon";
   writefln!"%(%s, %)"(s.splitter('\n').filter!(part => !part.empty).take(3));
}

 Side tangent: I don't mean to bash the learning tour, as it's been really
 useful for getting started, but I'm surprised stuff like tuples and
 files aren't mentioned there.

Alternative place to search: :)

   http://ddili.org/ders/d.en/ix.html

 Especially since the documentation tends to trip me up, with stuff like
 'isSomeString' mentioning 'built in string types', while I haven't been
 able to find that concept elsewhere,

Built in strings are just arrays of character types: char[], wchar[], 
and dchar[]. Commonly used by their respective immutable aliases: 
string, wstring, and dstring.
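
A small sketch of what that means in practice, checked at compile time. It only uses std.traits.isSomeString and the built-in aliases:

```d
import std.traits : isSomeString;

// string, wstring and dstring are aliases for immutable character arrays
static assert(is(string == immutable(char)[]));
static assert(is(wstring == immutable(wchar)[]));
static assert(is(dstring == immutable(dchar)[]));

// isSomeString accepts the aliases and mutable character arrays alike
static assert(isSomeString!string);
static assert(isSomeString!(char[]));

// Other arrays and single characters are not strings
static assert(!isSomeString!(int[]));
static assert(!isSomeString!char);

void main() {
    // Being arrays, strings support .length and slicing
    string s = "hello";
    assert(s.length == 5);
    assert(s[1 .. 3] == "el");
}
```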

 'countUntil' not being called 'indexOf'

countUntil() is more general because it works with any range while 
indexOf requires a string.
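
A quick sketch of the difference, kept to ASCII input so that both functions report the same position:

```d
import std.algorithm.searching : countUntil;
import std.string : indexOf;

void main() {
    // countUntil works on any range, here an int array
    assert([10, 20, 30].countUntil(30) == 2);
    assert([10, 20, 30].countUntil(99) == -1);

    // It works on strings too
    assert("hello".countUntil('l') == 2);

    // indexOf is the string-specific counterpart
    assert("hello".indexOf('l') == 2);
    assert("hello".indexOf('z') == -1);
}
```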

 assumeUnique seems to be a thing?

That appears in the index I posted above as well. ;)

Ali

Dec 27 2020

oddp <oddp posteo.de> writes:

On 28.12.20 00:12, Rekel via Digitalmars-d-learn wrote:
 is there a reason to use either 'splitter' or 'split'?

split gives you a newly allocated array with the results; splitter is the lazy
equivalent and doesn't allocate. Feel free to use either, it doesn't matter
much with these small puzzle inputs.
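
As a rough illustration of the two, assuming a small in-memory string rather than a real puzzle input:

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.array : split;
import std.range : take;
import std.stdio : writeln;

void main() {
    auto s = "one\n\ntwo\n\nthree";

    // split is eager: the result is a newly allocated string[]
    string[] eager = s.split("\n\n");
    assert(eager == ["one", "two", "three"]);

    // splitter is lazy: parts are produced only as they are consumed,
    // so taking two parts never looks at the rest of the input
    auto firstTwo = s.splitter("\n\n").take(2);
    assert(firstTwo.equal(["one", "two"]));

    writeln(eager, " ", firstTwo);
}
```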

 Sidetangent, don't mean to bash the learning tour, as it's been really useful
 for getting started, but I'm surprised stuff like tuples and files aren't
 mentioned there.
 Especially since the documentation tends to trip me up, with stuff like
 'isSomeString' mentioning 'built in string types', while I haven't been able
 to find that concept elsewhere, let alone functionality one can expect in
 this case (like .length and the like), and stuff like 'countUntil' not being
 called 'indexOf', although it also exists and does basically the same thing.
 Also assumeUnique seems to be a thing?

Might be worth discussing that in a new topic. The stdlib is vast and has tons
of useful utilities, not all of which can be explained in detail in a series
of overview posts. Ali's "Programming in D" [1], which has a free online
version, functions as an excellent in-depth introduction to the language,
going over all the important topics.

Regarding function names and docs: Yes, some might seem slightly off coming
from other languages (e.g. find vs. dropWhile, until vs. takeWhile,
cumulativeFold vs. scan/accumulate, etc.), but it's all in there somewhere,
implemented with the utmost care not to waste precious cycles. That might make
it harder to grok when going over the implementation or docs for the very
first time, but it gets easier after a while. Furthermore, alternative names
are often mentioned in the docs, so a quick Google search should bring you to
the right place.

[1] http://ddili.org/ders/d.en/index.html

Dec 27 2020

Rekel <paultjeadriaanse gmail.com> writes:

 http://ddili.org/ders/d.en/index.html

This seems very promising :)
I doubt I'd still be considering D if it weren't for this awesome 
learning forum, thanks for all the help!

Dec 28 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/26/20 7:13 PM, Rekel wrote:
 I'm trying to read a file with entries separated by '\n\n' (empty line), 
 with entries containing '\n'. I thought 
 File.byLine(KeepTerminator, Terminator) might work, as it seems to 
 accept strings as terminators, since there seems to have been a thread 
 regarding '\r\n' separators.
 
 I don't know if there's some underlying reason, but when I try to use 
 "\n\n" as a terminator, I end up getting the entire file into 1 char[], 
 so it's not delimited.
 
 Should this work or is there a reason one cannot use byLine like this?
 
 For context, I'm trying this with the puzzle input of day 6 of this 
 year's advent of code. (https://adventofcode.com/)

Are you on Windows? If so, your double newlines might be \r\n\r\n, 
depending on what editor you used to create the input. Use a hexdump 
program to see what the newlines are in your input file.

Now, you would think that the underlying C stream would do this for you. 
I'm not sure how it works exactly, as I don't use Windows.
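
One way to sidestep the question, sketched under the assumption that the whole input fits in memory: normalize CRLF to LF first, then split on the blank line. The helper name blankLineGroups is hypothetical:

```d
import std.algorithm.iteration : splitter;
import std.array : array, replace;
import std.stdio : writeln;

// Hypothetical helper: splits text into blank-line-separated groups,
// treating "\r\n" and "\n" line endings alike.
string[] blankLineGroups(string text) {
    return text.replace("\r\n", "\n").splitter("\n\n").array;
}

void main() {
    // The same content with Unix and with Windows line endings
    auto unix = "a\nb\n\nc";
    auto windows = "a\r\nb\r\n\r\nc";

    assert(blankLineGroups(unix) == ["a\nb", "c"]);
    assert(blankLineGroups(windows) == ["a\nb", "c"]);
    writeln(blankLineGroups(windows));
}
```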

-Steve

Dec 29 2020

Rekel <paultjeadriaanse gmail.com> writes:

On Tuesday, 29 December 2020 at 14:50:41 UTC, Steven 
Schveighoffer wrote:
 Are you on Windows? If so, your double newlines might be 
 \r\n\r\n, depending on what editor you used to create the 
 input. Use a hexdump program to see what the newlines are in 
 your input file.

I've tried \r\n\r\n as well, which sadly also did not work.
Using vscode I have also switched between CRLF and LF, which also 
did not do the trick.
I'm getting the sense the implementation might have a specific 
workaround for \r\n / CRLF line endings, though I haven't checked 
the source code yet.

Note that this is not really a problem for me specifically, I've 
long used a different approach, however it seemed like a design 
issue. I'll try replicating this in isolation later, maybe 
something was wrong last time I tried.

Dec 30 2020
