
digitalmars.D.learn - What's the simplest way to read a file token by token?

reply "Carl Sturtivant" <sturtivant gmail.com> writes:
What's the simplest way in D to read a file token by token, where 
the read tokens are D strings, and they are separated in the file 
by arbitrary non-zero amounts of white space (including spaces, 
tabs and newlines at a minimum)?
Aug 10 2013
parent reply "Carl Sturtivant" <sturtivant gmail.com> writes:
I couldn't find a function that did just this, and various ways I implemented it seemed too complex. Is there such a function in a D library?
Aug 10 2013
next sibling parent reply Tobias Pankrath <lists pankrath.net> writes:
There are some candidates for std.d.lexer on the way. Try for example: https://github.com/Hackerpilot/Dscanner/blob/master/stdx/d/lexer.d
Aug 10 2013
parent "Carl Sturtivant" <sturtivant gmail.com> writes:
OK, but that seems to solve a more difficult problem: my tokens are all separated by non-empty white space. In other languages I've found a function that simply reads token by token in this simple situation.
Aug 10 2013
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
If you have a string (or any range of dchar) already, you can use std.algorithm.splitter:

    import std.algorithm;

    void main()
    {
        auto str = "hello world goodbye charlie.";
        assert(equal(splitter(str), ["hello", "world", "goodbye", "charlie."]));
    }

However, reading from a file is quite a bit more problematic, as we don't have proper stream support yet (we're still waiting on std.io to be finished so that we can have that). That means that what we have for reading files is a lot less flexible. In general, you're probably going to be reading it in line by line with std.stdio's File.byLine, in chunks of bytes via File.byChunk, or all at once with std.file.readText.

Something that does what you want could certainly be built on top of either byLine or byChunk without a lot of effort, but it doesn't work right out of the box. readText will work great (since you can just use splitter on its result), but it does mean reading the entire file in at once. Still, in most cases, that's what I'd do. It's only going to be a problem if the file is particularly large, and since splitter just slices the string that you give it (rather than copying it), you shouldn't end up with the file in memory more than once.

At some point, we will have full, range-compatible stream support in Phobos, and the situation will definitely improve, but for now, those are probably your best options.

- Jonathan M Davis
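For what it's worth, the two approaches described above might be sketched roughly as follows. This is only an illustration, not something from Phobos: the helper names readTokens and tokensByLine and the file name tokens_demo.txt are made up, and the lazy variant assumes a Phobos recent enough to have File.byLineCopy (with plain byLine the line buffer is reused, so each token would need an .idup).

```d
import std.algorithm : equal, joiner, map, splitter;
import std.array : array;
import std.file : readText;
import std.stdio : File;

/// Hypothetical helper: reads the whole file at once, then splits on
/// runs of whitespace. Each token is a slice of the single string that
/// readText returned, so the file contents are held in memory only once.
string[] readTokens(string path)
{
    return readText(path).splitter.array;
}

/// Hypothetical helper: lazy variant built on top of line-by-line
/// reading. byLineCopy hands out a fresh string per line, so the
/// tokens stay valid after the next line is read.
auto tokensByLine(string path)
{
    return File(path)
        .byLineCopy
        .map!(line => line.splitter) // whitespace-split each line
        .joiner;                     // flatten into one range of tokens
}

void main()
{
    import std.file : remove, write;

    enum path = "tokens_demo.txt"; // made-up file name for the demo
    write(path, "hello   world\n\tgoodbye\n\ncharlie.\n");
    scope (exit) remove(path);

    auto expected = ["hello", "world", "goodbye", "charlie."];
    assert(readTokens(path) == expected);
    assert(tokensByLine(path).equal(expected));
}
```

The first helper is the simpler one and matches the readText suggestion; the second only reads a line at a time, at the cost of allocating a string per line.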
Aug 10 2013
parent "Carl Sturtivant" <sturtivant gmail.com> writes:
Thank you so much, that's exactly the kind of reply I was seeking!

Aug 11 2013