digitalmars.D.learn - line terminators


NonNull <non-null use.startmail.com> writes:

Hello,

I notice that readln from std.stdio uses '\n' as the default line 
terminator. What about a single input file that mixes several 
UTF-8 line terminators, such as '\n', NEL, LS, PS? On Windows 
"\r\n" is a line terminator too, so what if NEL, LS, PS also occur 
in a Windows UTF-8 text file?

The following explains the details when reading Unicode.
https://en.wikipedia.org/wiki/Newline#Unicode

If I want to read a text file line by line, treating any one of 
these things as a newline, there seems to be no canned way 
to do that in std.stdio.

1.) What is the best way to achieve this result in D?

It is convenient to be able to read text files of unknown origin 
this way. It is also convenient to discard the newlines, however 
they are represented.

2.) What about reading UTF-16LE text files line by line (e.g. 
from Windows, with a BOM)?

Sep 28 2022

rassoc <rassoc posteo.de> writes:

On 9/28/22 21:36, NonNull via Digitalmars-d-learn wrote:
> If I want to read a text file line by line, treating any one of these things
> as a newline, there would seem to be no canned way to do that in std.stdio.
>
> 1.) What is the best way to achieve this result in D?
>

If you have structured data, you can use byRecord [1] to read the important
parts right into a tuple.
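
A minimal sketch of the byRecord approach, assuming a file of two whitespace-separated integers per line. The helper name `readPairs` and the scratch file name are made up for illustration. As far as I know, byRecord reads each record with readln, so it splits on '\n' only and does not help with NEL, LS or PS.

```d
import std.file : remove, write;
import std.stdio : File, writeln;
import std.typecons : Tuple, tuple;

// Hypothetical helper: parse every line of `path` as "<int> <int>".
Tuple!(int, int)[] readPairs(string path)
{
    Tuple!(int, int)[] result;
    auto f = File(path);
    // byRecord applies the format string to each line and yields a tuple.
    foreach (rec; f.byRecord!(int, int)("%s %s"))
        result ~= rec;
    return result;
}

void main()
{
    enum path = "records_demo.txt"; // hypothetical scratch file
    write(path, "1 2\n4 1\n5 100");
    scope (exit) remove(path);

    foreach (rec; readPairs(path))
        writeln(rec[0], " and ", rec[1]);
}
```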

If the data is unstructured, there's lineSplitter [2], which as far as I know
handles all of the newline variants above ('\n', "\r\n", NEL, LS, PS). Sadly, it
works on text in memory, not on files.

So, for small files you could just read the whole file via readText [3] and
process it via lineSplitter.
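
A minimal sketch of that, assuming the file is valid UTF-8 and fits in memory. The helper name `linesOf` and the scratch file name are made up for illustration. The demo file mixes LF, CRLF, NEL, LS and PS, and the terminators are dropped by default.

```d
import std.array : array;
import std.file : readText, remove, write;
import std.stdio : writeln;
import std.string : lineSplitter;

// Hypothetical helper: split on every terminator lineSplitter recognises.
string[] linesOf(string text)
{
    return text.lineSplitter.array;
}

void main()
{
    enum path = "mixed_newlines_demo.txt"; // hypothetical scratch file
    // LF, CRLF, NEL (U+0085), LS (U+2028) and PS (U+2029) in one file.
    write(path, "a\nb\r\nc\u0085d\u2028e\u2029f");
    scope (exit) remove(path);

    // readText loads and validates the whole file as UTF-8.
    foreach (line; readText(path).linesOf)
        writeln(line);
}
```

If you do want the terminators, lineSplitter takes `Yes.keepTerminator` as a template argument.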

If the file is rather large, then there's the option to use memory-mapped files
instead:

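```d
import std;

void main() {
	// Map the file read-only; `scope` destroys the mapping when main exits.
	scope mmfile = new MmFile("largefile.txt");
	// View the mapped bytes as text without copying; assumes valid UTF-8.
	auto data = cast(string) mmfile[];
	foreach (line; data.lineSplitter) {
		// process line, os handles memory-mapped file buffering
	}
}
```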

Hope that helps.

[1] https://dlang.org/library/std/stdio/file.by_record.html
[2] https://dlang.org/library/std/string/line_splitter.html
[3] https://dlang.org/library/std/file/read_text.html
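
For question 2 (UTF-16LE with a BOM), one possible approach is to decode the bytes to UTF-8 first and then reuse lineSplitter. This is only a sketch: the helper name `decodeUtf16le` and the scratch file name are made up, and it assumes the whole file fits in memory. The bytes are decoded by hand because, as far as I know, readText does no endian conversion.

```d
import std.array : array;
import std.file : read, remove, write;
import std.stdio : writeln;
import std.string : lineSplitter;
import std.utf : toUTF8;

// Hypothetical helper: turn UTF-16LE bytes (optional BOM) into a UTF-8 string.
string decodeUtf16le(const(ubyte)[] bytes)
{
    // Skip the little-endian byte order mark FF FE if present.
    if (bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        bytes = bytes[2 .. $];

    // Combine byte pairs, low byte first; a trailing odd byte is ignored.
    auto units = new wchar[bytes.length / 2];
    foreach (i, ref u; units)
        u = cast(wchar)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    return units.toUTF8;
}

void main()
{
    enum path = "utf16le_demo.txt"; // hypothetical scratch file
    // BOM, then "a", CR LF, "b", LS (U+2028), "c" in UTF-16LE.
    ubyte[] raw = [0xFF, 0xFE, 0x61, 0x00, 0x0D, 0x00, 0x0A, 0x00,
                   0x62, 0x00, 0x28, 0x20, 0x63, 0x00];
    write(path, raw);
    scope (exit) remove(path);

    auto text = decodeUtf16le(cast(const(ubyte)[]) read(path));
    foreach (line; text.lineSplitter)
        writeln(line);
}
```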

Sep 28 2022

NonNull <non-null use.startmail.com> writes:

On Wednesday, 28 September 2022 at 21:17:16 UTC, rassoc wrote:
> On 9/28/22 21:36, NonNull via Digitalmars-d-learn wrote:
>> [...]
>
> If you have structured data, you can use byRecord [1] to read 
> the important parts right into a tuple.
>
> [...]

Thanks --- very helpful.

Sep 28 2022
