digitalmars.D.learn - splitting numbers from a test file

"Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
Hello, I am trying to read in a set of numbers from a text file.
The file in question looks something like this:

35  2  0  1
     0    0.49463548699999998  0.88077994719999997    0
     1    0.60672109949999997  0.2254208717    0


After each line I want to check how many numbers were on the line
I just read. My code to read this file looks like:

1 import std.stdio;
2 import std.conv;
3
4 int main( string[] argv ) {
5    real[] numbers_read;
6    size_t line_count=1;
7
8    auto f = std.stdio.File("test.txt", "r");
9    foreach( char[] s; f.byLine() ) {
10     string line = std.string.strip( to!string(s) );
11     auto parts = std.array.splitter( line );
12     writeln("There are ", parts.length, " numbers in line ",
line_count++);
13     foreach(string p; parts) {
14     numbers_read ~= to!real(p);
15      }
16    }
17    f.close();
18    return 0;
19 }

When I try to compile this I get an error:
test.d(12): Error undefined identifier 'length;

However, shouldn't splitter be returning an array (that's what the
docs seem to show)? What is the type of 'parts'? (I tried using
std.traits to figure this out, but that just generated more
syntax errors for me).

Cheers,

Craig
Sep 18 2012
"bearophile" <bearophileHUGS lycos.com> writes:
Craig Dillabaugh:

 8    auto f = std.stdio.File("test.txt", "r");
 9    foreach( char[] s; f.byLine() ) {
 10     string line = std.string.strip( to!string(s) );
 11     auto parts = std.array.splitter( line );
 12     writeln("There are ", parts.length, " numbers in line ",
 line_count++);
 13     foreach(string p; parts) {
 14     numbers_read ~= to!real(p);
 15      }
 16    }
 17    f.close();
 18    return 0;
 19 }

 When I try to compile this I get an error:
 test.d(12): Error undefined identifier 'length;

Here to!string() is probably unnecessary; it's a wasted allocation.

splitter() returns a lazy range that doesn't know its length. To solve your problem there are two main solutions: use split() instead of splitter(), or use walkLength() on the range given by splitter(). In theory splitter() should be faster, but in practice this isn't always true.

Keep in mind that "real" is usually more than 64 bits long, and it's not so fast.

Maybe nowadays there are other ways to load that data; I don't know if readfln("%(%f %)%") or something similar works.

Bye,
bearophile
Sep 18 2012
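
The two fixes suggested in the reply above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the helper names countWithSplit and countWithWalkLength are made up for the example, and it assumes a Phobos version in which the whitespace form of splitter is reachable through std.algorithm.

```d
import std.algorithm : splitter;
import std.array : split;
import std.range : walkLength;
import std.stdio : writeln;

/// Count whitespace-separated fields eagerly: split() allocates an array,
/// and arrays know their length.
size_t countWithSplit(const(char)[] line)
{
    return line.split().length;
}

/// Count fields lazily: splitter() builds no array, so walkLength()
/// has to iterate over the range to count its elements.
size_t countWithWalkLength(const(char)[] line)
{
    return line.splitter().walkLength;
}

void main()
{
    // One data line from the original question, without the leading blanks.
    auto line = "0    0.49463548699999998  0.88077994719999997    0";
    writeln(countWithSplit(line));      // eager count
    writeln(countWithWalkLength(line)); // lazy count
}
```

Both helpers report the same count; the difference is only in whether an intermediate array is allocated.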
Ali Çehreli <acehreli yahoo.com> writes:
On 09/18/2012 07:50 PM, Craig Dillabaugh wrote:

 11 auto parts = std.array.splitter( line );
 12 writeln("There are ", parts.length, " numbers in line ",

 When I try to compile this I get an error:
 test.d(12): Error undefined identifier 'length;

That is a very common confusion with ranges.
 However, shouldn't splitter be returning an array (thats what the
 docs seem to show)?

No, parts is a lazy range, which is ready to serve its elements as needed. If you want to convert its elements to an array eagerly, you can call std.array.array:

    import std.array;

    // ...

    writeln("There are ", array(parts).length, " numbers in line ", line_count++);

 What is the type of 'parts'?

    writeln(typeid(parts));

or

    writeln(typeof(parts).stringof);

Ali

--
D Programming Language Tutorial: http://ddili.org/ders/d.en/index.html
Sep 18 2012
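
As a rough illustration of the reply above, the sketch below prints the static type of the range and then converts it to an array to get a length. The helper name fieldCount is hypothetical, and the example assumes a Phobos version where the whitespace form of splitter is available through std.algorithm; the printed type name is an internal detail that varies between releases.

```d
import std.algorithm : splitter;
import std.array : array;
import std.range : hasLength;
import std.stdio : writeln;

/// Hypothetical helper: materialize the lazy range and return its length.
size_t fieldCount(const(char)[] line)
{
    // array() walks the range once and stores the elements in a new array.
    return line.splitter().array.length;
}

void main()
{
    auto line = "35  2  0  1";
    auto parts = line.splitter();

    // typeof(...).stringof is evaluated at compile time.
    writeln(typeof(parts).stringof);

    // The range offers no length, which is why parts.length is rejected.
    static assert(!hasLength!(typeof(parts)));

    writeln(fieldCount(line));
}
```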
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, September 19, 2012 04:50:45 Craig Dillabaugh wrote:
 When I try to compile this I get an error:
 test.d(12): Error undefined identifier 'length;
 
 However, shouldn't splitter be returning an array (thats what the
 docs seem to show)? What is the type of 'parts'? (I tried using
 std.traits to figure this out, but that just generated more
 syntax errors for me).

The docs do not show that splitter returns an array, because it doesn't. It returns a lazy range type which finds each successive element as you iterate over it. It doesn't have a length property, because its length isn't known until you iterate over it. You have three options:

1. Use std.array.split, which returns an array (so, it's eager and requires additional memory allocations to create the array, but you'll have its length without having to iterate over it multiple times).

2. Use std.range.walkLength to get the length of the range. If a range has a length property, then walkLength just returns that; otherwise it iterates over the whole range and counts its elements. So, you won't get extra memory allocations, but you'll have to iterate over the range twice.

3. Simply count up the number of elements as you iterate over them and _then_ print out the length.

Also, there's no need to convert s to a string like that. If you were saving the string or needed an actual string instead of char[], then that would make sense, but you're just splitting it and then converting it to a number. char[] will work just fine for that. So, something like this would probably be better:

    import std.conv;
    import std.stdio;
    import std.string;

    void main()
    {
        real[] numbers_read;
        size_t line_count = 0;

        auto f = std.stdio.File("test.txt", "r");
        foreach(line; f.byLine())
        {
            line = strip(line);
            auto parts = std.array.splitter(line);
            size_t length = 0;

            foreach(p; parts)
            {
                numbers_read ~= to!real(p);
                ++length;
            }

            writeln("There are ", length, " numbers in line ", ++line_count);
        }
    }

If you aren't familiar with ranges, then read this:

    http://ddili.org/ders/d.en/ranges.html

But ranges are used quite heavily in Phobos, so you should be familiar with them if you intend to use D.

- Jonathan M Davis
Sep 18 2012
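
For readers who want to try the counting approach (option 3) without creating test.txt, here is a self-contained variant. It is only a sketch: the parseLine helper and its out parameter are invented for the example, the sample data is embedded as a string, and it assumes a Phobos version that provides std.string.lineSplitter and the whitespace form of splitter in std.algorithm.

```d
import std.algorithm : splitter;
import std.conv : to;
import std.stdio : writeln;
import std.string : lineSplitter, strip;

/// Parse one line of whitespace-separated numbers, counting the fields
/// while iterating so that the lazy range is walked only once.
real[] parseLine(const(char)[] line, out size_t count)
{
    real[] numbers;
    foreach (field; line.strip().splitter())
    {
        numbers ~= to!real(field);
        ++count;
    }
    return numbers;
}

void main()
{
    // Sample data from the original question, embedded so that the
    // example does not depend on a file.
    enum data = "35  2  0  1\n" ~
        "     0    0.49463548699999998  0.88077994719999997    0\n" ~
        "     1    0.60672109949999997  0.2254208717    0\n";

    size_t lineNumber = 0;
    foreach (line; data.lineSplitter())
    {
        size_t count;
        parseLine(line, count);
        writeln("There are ", count, " numbers in line ", ++lineNumber);
    }
}
```

Note that strip() is applied directly to the character slice here; no conversion with to!string is involved.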
Ali Çehreli <acehreli yahoo.com> writes:
On 09/18/2012 09:56 PM, Craig Dillabaugh wrote:
 On Wednesday, 19 September 2012 at 04:03:44 UTC, Jonathan M Davis
 wrote:

 The documentation says that it returns a range.


 From:
 http://dlang.org/phobos/std_array.html#splitter

 The documentation (copied and pasted) for splitter reads:

 auto splitter(C)(C[] s);
 Splits a string by whitespace.

 Example:
 auto a = " a bcd ef gh ";
 assert(equal(splitter(a), ["", "a", "bcd", "ef", "gh"][]));

It is unfortunate that there is also the other splitter, which at least implies ranges: :-/

    http://dlang.org/phobos/std_algorithm.html#splitter

Yes, the documentation can be much better. For example, the documentation for the second splitter above looks exactly like the other one, except that one says "using an element as a separator." while the other one says "using another range as a separator".

I think it is a ddoc limitation: Template constraints are not included in documentation yet.

Ali
Sep 18 2012
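
The two std.algorithm.splitter forms mentioned above ("an element as a separator" and "another range as a separator") can be told apart by the type of the second argument. A small sketch with made-up input strings:

```d
import std.algorithm : equal, splitter;

void main()
{
    // The separator is a single element (here a char). Adjacent
    // separators produce an empty field between them.
    assert(equal(splitter("a,b,,c", ','), ["a", "b", "", "c"]));

    // The separator is itself a range (here a two-character string).
    assert(equal(splitter("a, b, c", ", "), ["a", "b", "c"]));
}
```

In both cases the result is a lazy range, so it is compared with equal rather than with ==.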
Ali Çehreli <acehreli yahoo.com> writes:
On 09/19/2012 10:22 AM, Craig Dillabaugh wrote:

 Thank you for your help. Also Ali thanks for your book, motivated
 by this little problem I've started reading your Chapter on
 ranges. It is very helpful.

Thank you. :) Obviously, I am aware of its shortcomings. In particular, the difference between a container and a range must be stressed. The chapter touches on that idea in different places, but I don't think it is spelled out sufficiently. To be improved some time in the future... :)

Ali
Sep 19 2012
"Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Wednesday, 19 September 2012 at 02:58:33 UTC, bearophile wrote:
 Here to!string() is probably unnecessary, it's a wasted allocation.

Thanks very much. I tried the strip() without to!string and got a syntax error when I tried to compile.

Cheers,
Craig
Sep 18 2012
"Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Wednesday, 19 September 2012 at 03:12:21 UTC, Jonathan M Davis
wrote:

 The docs do not show that splitter returns an array, because it doesn't. It returns a lazy range type which finds each successive element as you iterate over it. It doesn't have a length property, because its length isn't known until you iterate over it. You have three options:

Thanks, a few others have pointed that out to me too. But as a D newbie how would I have any clue what splitter returns since the return type is auto?

There is an example in the docs.

    auto a = " a     bcd   ef gh ";
    assert(equal(splitter(a), ["", "a", "bcd", "ef", "gh"][]));

I guessed that since the return of splitter was equal to:

    ["", "a", "bcd", "ef", "gh"][]

it was returning some sort of 2D array!

When a function returns an 'auto' in Phobos, is this generally indicative of the return value being a range?


 Also, there's no need to convert s to a string like that. If you
 were saving the string or needed an actual string instead of
 char[], then that would make sense, but you're just splitting it
 and then converting it to a number. char[] will work just fine
 for that. So, something like this would probably be better

I think my problem was that I was trying to call strip on it first to remove leading/trailing whitespace and I was getting syntax errors when I called strip() on the char[]. Just calling split works as you say.
Sep 18 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, September 19, 2012 05:36:36 Craig Dillabaugh wrote:
 Thanks, a few others have pointed that out to me too.  But as a D
 newbie how would I have any clue what splitter returns since the
 return type is auto?

The documentation says that it returns a range. Presumably then, the problem is that you're not familiar with ranges, and that needs to be handled better. We really need a proper article/tutorial on the main site which explains them, and we don't. But I don't know what we'd do differently in the documentation for functions in general. Ranges are a concept that is used quite heavily in Phobos, and it wouldn't make sense to try to explain them for every function that uses them.
 There is an example in the docs.
 
 auto a = " a     bcd   ef gh ";
 assert(equal(splitter(a), ["", "a", "bcd", "ef", "gh"][]));

It would have used == if it were an array. equal operates on ranges, so if it's used, odds are that the types on the right and left sides are different.
 I guessed that since the return of splitter was equal to :
 ["", "a", "bcd", "ef", "gh"][]
 it was returning some sort of 2D array!
 
 When a function returns an 'auto' in the Phobos is this generally
 indicative of the return value being a range?

That's the most common, but it's not always the case. It will usually say in the documentation though (and if you're familiar with ranges, it's generally fairly obvious if the return type is a range just based on what the function is doing), and in this case it does.
 I think my problem was that I was trying to call strip on it first
 to remove leading/trailing whitespace and I was getting syntax
 errors when I called strip() on the char[]. Just calling split works as
 you say.

strip works just fine on a char[]. I don't know why you were having problems with it. Maybe you're using an older release of the compiler and strip used to take a string rather than being templated on character type? I don't know. If you're on 2.060 though, strip should work just fine with char[].

- Jonathan M Davis
Sep 18 2012
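
The point made above about equal versus == can be shown with a short sketch. It assumes a Phobos version where the whitespace form of splitter is available through std.algorithm, and it uses an input without leading or trailing blanks so the result does not depend on how those are treated.

```d
import std.algorithm : equal, splitter;
import std.array : array;

void main()
{
    auto parts = splitter("a bcd ef gh");

    // equal() compares two ranges element by element, even when the
    // ranges have different types (a lazy range and an array here).
    assert(equal(parts, ["a", "bcd", "ef", "gh"]));

    // Writing parts == ["a", "bcd", "ef", "gh"] is not expected to
    // compile, because parts is a struct, not an array.

    // Once array() has built a real array, == applies.
    assert(parts.array == ["a", "bcd", "ef", "gh"]);
}
```

So an example in the docs that uses equal is a hint that at least one side is a range that is not a plain array.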
"Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Wednesday, 19 September 2012 at 04:03:44 UTC, Jonathan M Davis
wrote:
 The documentation says that it returns a range. [...]

From:
http://dlang.org/phobos/std_array.html#splitter

The documentation (copied and pasted) for splitter reads:

    auto splitter(C)(C[] s);
    Splits a string by whitespace.

    Example:
    auto a = " a     bcd   ef gh ";
    assert(equal(splitter(a), ["", "a", "bcd", "ef", "gh"][]));

I have this awful feeling that I am missing something blatantly obvious here, and that by posting this reply I am leaving a permanent testament to my stupidity on the internet, but I really want to understand this ...

I just want to figure out how you can explicitly say "the documentation says it returns a range" based on that! Is it simply because you recognize the range from the assert statement in the example?

I am sure the Phobos developers have better things to do than writing documentation that coddles newbies, but could the documentation not say:

    auto splitter(C)(C[] s);
    Splits a string by whitespace. Returns an InputRange of all substrings.

Or something to that effect.

Thanks again for your time.

clip ....
Sep 18 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, September 19, 2012 06:56:23 Craig Dillabaugh wrote:
 From:
 http://dlang.org/phobos/std_array.html#splitter

Ah. I was looking at std.algorithm.splitter (which operates on generic ranges and separators) which _does_ explicitly say that it returns a range. Yeah. The documentation on std.array.splitter is incredibly sparse. It doesn't even state the result is lazy (though if it did, it would be bound to say that it was a lazy range, which would then mean that it was stating that the return type was a range), making the difference between it and split not at all obvious. That should be fixed. Internally, it just does

    return std.algorithm.splitter!(std.uni.isWhite)(s);

- Jonathan M Davis
Sep 18 2012
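
The predicate form referred to above can be tried directly. The sketch below assumes a Phobos version that has the unary-predicate overload of std.algorithm.splitter; it deliberately uses single spaces only, since the handling of runs of whitespace may differ between Phobos versions.

```d
import std.algorithm : equal, splitter;
import std.uni : isWhite;

void main()
{
    // Unary-predicate form: every element for which isWhite returns
    // true is treated as a separator.
    auto fields = splitter!isWhite("35 2 0 1");
    assert(equal(fields, ["35", "2", "0", "1"]));
}
```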
"Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Wednesday, 19 September 2012 at 06:09:38 UTC, Ali Çehreli
wrote:
 Yes, the documentation can be much better. [...]

Ali and Jonathan:

Thank you for your help. Also, Ali, thanks for your book; motivated by this little problem, I've started reading your chapter on ranges. It is very helpful.

Cheers,
Craig
Sep 19 2012