www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - File.byLine ought to return dups?

reply Graham Fawcett <fawcett uwindsor.ca> writes:
Hi folks,

I expected the following program to print the lines of a file in
reverse:

    // bad.d
    import std.stdio;
    import std.range;
    
    void main() {
      auto f = File("bad.d");
      foreach(char[] line; retro(array(f.byLine())))
        writeln(line);
    }

However, this produces very unusual output: fragments of the same
lines are printed repeatedly. I suspect it's because the byLine()
'generator' is not dup'ing the arrays it reads from the file.

This works as expected:

    // good.d
    import std.stdio;
    import std.range;
    
    Retro!(char[][]) retroLines(File f) {
     char[][] lines;
      foreach(line; f.byLine())
        lines ~= line.dup;         // note the .dup
      return retro(lines);
     }
    
    void main() {
      auto f = File("good.d");
      foreach(line; retroLines(f))
        writeln(line);
    }

If you remove the '.dup', then this behaves badly as well.

So is this a bug in File.byLine, or am I just using it badly? :)

Thanks,
Graham
Jun 04 2010
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 04 Jun 2010 15:27:03 -0400, Graham Fawcett <fawcett uwindsor.ca>  
wrote:

 Hi folks,

 I expected the following program to print the lines of a file in
 reverse:

     // bad.d
     import std.stdio;
     import std.range;
    void main() {
       auto f = File("bad.d");
       foreach(char[] line; retro(array(f.byLine())))
         writeln(line);
     }

 However, this produces very unusual output: fragments of the same
 lines are printed repeatedly. I suspect it's because the byLine()
 'generator' is not dup'ing the arrays it reads from the file.

 This works as expected:

     // good.d
     import std.stdio;
     import std.range;
    Retro!(char[][]) retroLines(File f) {
      char[][] lines;
       foreach(line; f.byLine())
         lines ~= line.dup;         // note the .dup
       return retro(lines);
      }
    void main() {
       auto f = File("good.d");
       foreach(line; retroLines(f))
         writeln(line);
     }

 If you remove the '.dup', then this behaves badly as well.

 So is this a bug in File.byLine, or am I just using it badly? :)

The latter. File is re-using the buffer for each line, so you are seeing the data get overwritten. This is for performance reasons. Not everyone wants incur heap allocations for every line of a file ;) As you showed, it's possible to get the desired behavior if you need it. The reverse would be impossible. Now, that being said, a nice addition would be to create a duper range that allows you to do one expression: foreach(char[] line; retro(array(duper(f.byLine())))) -Steve
Jun 04 2010
prev sibling next sibling parent Adam Ruppe <destructionator gmail.com> writes:
Looking at the code, it appears to be by design, though not explictly
documented - it probably should be.

Something to note is if you specifically ask for a string in the
foreach, it will refuse to compile - the compiler knows it is a
mutable char[] too.
Jun 04 2010
prev sibling parent Graham Fawcett <fawcett uwindsor.ca> writes:
Sorry for the double-post to .announce -- I had deleted my .announce 
post, but obviously not thoroughly enough. I'll follow up on the list.

Graham


On Fri, 04 Jun 2010 15:41:43 -0400, Steven Schveighoffer wrote:

 On Fri, 04 Jun 2010 15:27:03 -0400, Graham Fawcett <fawcett uwindsor.ca>
 wrote:
 
 Hi folks,

 I expected the following program to print the lines of a file in
 reverse:

     // bad.d
     import std.stdio;
     import std.range;
    void main() {
       auto f = File("bad.d");
       foreach(char[] line; retro(array(f.byLine())))
         writeln(line);
     }

 However, this produces very unusual output: fragments of the same lines
 are printed repeatedly. I suspect it's because the byLine() 'generator'
 is not dup'ing the arrays it reads from the file.

 This works as expected:

     // good.d
     import std.stdio;
     import std.range;
    Retro!(char[][]) retroLines(File f) {
      char[][] lines;
       foreach(line; f.byLine())
         lines ~= line.dup;         // note the .dup
       return retro(lines);
      }
    void main() {
       auto f = File("good.d");
       foreach(line; retroLines(f))
         writeln(line);
     }

 If you remove the '.dup', then this behaves badly as well.

 So is this a bug in File.byLine, or am I just using it badly? :)

The latter. File is re-using the buffer for each line, so you are seeing the data get overwritten. This is for performance reasons. Not everyone wants incur heap allocations for every line of a file ;) As you showed, it's possible to get the desired behavior if you need it. The reverse would be impossible. Now, that being said, a nice addition would be to create a duper range that allows you to do one expression: foreach(char[] line; retro(array(duper(f.byLine())))) -Steve

Jun 04 2010