digitalmars.D.learn - Reading large files, writing large files?

reply AEon <aeon2001 lycos.de> writes:
Rethinking the way I normally handle files, since I am now faced with 
possibly very large log files (100 MB and more). Ditto, I need to save large log 
files. So it does not seem to be a good idea to use my so far preferred 
method:

// Ensure file exists
if( ! std.file.exists(cfgPathFile) )
...

// Read complete cfg file into array, removes \r\n via splitlines()
char[][] cfgText = std.string.splitlines( cast(char[]) 
std.file.read(cfgPathFile) );

Etc... I have very much come to like splitlines, and read, but with
100 MB log files, loading all that into RAM may turn out ugly?


Let's say I'd ignore the RAM issue for a moment, how would I properly 
use std.file.write() to write into a file?
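
For the writing side, here is a minimal sketch. It assumes a current D compiler and Phobos, whose std.file differs in detail from the 2005 library used in this thread. std.file.write replaces the whole file with one buffer, and std.file.append adds to the end, so a large log can be written in pieces instead of being assembled in RAM first. The helper name writeLog and the file name demo.log are hypothetical.

```d
import std.file : append, readText, remove, write;

// Hypothetical helper: writes a header, then appends each entry in turn.
void writeLog(string name, string header, string[] entries)
{
    // write() creates the file, or overwrites it, with the whole buffer.
    write(name, header);

    // append() adds to the end, so output can be produced piece by piece.
    foreach (entry; entries)
        append(name, entry);
}

void main()
{
    enum logName = "demo.log"; // hypothetical file name
    writeLog(logName, "first line\n", ["second line\n", "third line\n"]);
    assert(readText(logName) == "first line\nsecond line\nthird line\n");
    remove(logName);
}
```

Appending entry by entry reopens the file on every call, so for very many small writes a single open file handle would probably be the better choice.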


The method I fear will need to be applied for such huge files is 
something like this (posted by Martin in this newsgroup):

import std.stream;

void readfile(char[] fn)
{
     File f = new File();
     char[] l;
     f.open(fn);
     while(!f.eof())
     {
         l = f.readLine();
         printf("line: %.*s\n", l);
     }
     f.close();
}

That would be pretty much the ANSI C way... ieek :)... Is there any way 
to avoid the latter method? And go the nicer D way, as in the first code 
example?

AEon
Mar 27 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 28 Mar 2005 05:13:36 +0200, AEon <aeon2001 lycos.de> wrote:
 Rethinking the way I normally handle files, since I am now faced with  
 possibly very large log files (100 MB and more). Ditto, I need to save large log  
 files. So it does not seem to be a good idea to use my so far preferred  
 method:

 // Ensure file exists
 if( ! std.file.exists(cfgPathFile) )
 ...

 // Read complete cfg file into array, removes \r\n via splitlines()
 char[][] cfgText = std.string.splitlines( cast(char[])  
 std.file.read(cfgPathFile) );

 Etc... I have very much come to like splitlines, and read, but with
 100 MB log files, loading all that into RAM may turn out ugly?


 Let's say I'd ignore the RAM issue for a moment, how would I properly  
 use std.file.write() to write into a file?


 The method I fear will need to be applied for such huge files is  
 something like this (posted by Martin in this newsgroup):

 import std.stream;

 void readfile(char[] fn)
 {
      File f = new File();
      char[] l;
      f.open(fn);
      while(!f.eof())
      {
          l = f.readLine();
          printf("line: %.*s\n", l);
      }
      f.close();
 }

 That would be pretty much the ANSI C way... ieek :)... Is there any way  
 to avoid the latter method? And go the nicer D way, as in the first code  
 example?

Try this...

import std.c.stdlib;
import std.stream;
import std.stdio;

class LineReader(Source) : Source
{
    int opApply(int delegate(inout char[]) dg)
    {
        int result = 0;
        char[] line;

        while(!eof())
        {
            line = readLine();
            if (!line) break;
            result = dg(line);
            if (result) break;
        }

        return result;
    }

    int opApply(int delegate(inout size_t, inout char[]) dg)
    {
        int result = 0;
        size_t lineno;
        char[] line;

        for(lineno = 1; !eof(); lineno++)
        {
            line = readLine();
            if (!line) break;
            result = dg(lineno,line);
            if (result) break;
        }

        return result;
    }
}

int main(char[][] args)
{
    LineReader!(BufferedFile) f;

    if (args.length < 2) usage();

    f = new LineReader!(BufferedFile)();

    f.open(args[1],FileMode.In);
    foreach(char[] line; f)
    {
        writefln("READ[",line,"]");
    }
    f.close();

    f.open(args[1],FileMode.In);
    foreach(size_t lineno, char[] line; f)
    {
        writefln("READ[",lineno,"][",line,"]");
    }
    f.close();

    return 0;
}

void usage()
{
    writefln("USAGE: test29 <file>");
    writefln("");
    exit(1);
}

Regan
Mar 27 2005
next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
 void readfile(char[] fn)
 {
      File f = new File();
      char[] l;
      f.open(fn);
      while(!f.eof())
      {
          l = f.readLine();
          printf("line: %.*s\n", l);
      }
      f.close();
 }


One tiny improvement would be to combine the new File() with the open(fn) into new File(fn).
 That would be pretty much the ANSI C way... ieek :)... Is there any way 
 to avoid the latter method? And go the nicer D way, as in the first code 
 example?

Try this... class LineReader(Source) : Source {

That's pretty nice. Maybe opApply iterating over lines should be built into Stream. That would resemble the standard Perl style of reading a file line-by-line. I'll poke around with that. It should be pretty easy and it would make line processing with stream much easier to use.
Mar 27 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Sun, 27 Mar 2005 22:52:52 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 That would be pretty much the ANSI C way... ieek :)... Is there any way
 to avoid the latter method? And go the nicer D way, as in the first  
 code
 example?

Try this... class LineReader(Source) : Source {

That's pretty nice.

:)
 Maybe opApply iterating over lines should be built into Stream.

That would be nice.
 That would resemble the standard Perl style of reading a file
 line-by-line. I'll poke around with that. It should be pretty easy and it
 would make line processing with stream much easier to use.

Agreed. Regan
Mar 27 2005
prev sibling parent reply AEon <aeon2001 lycos.de> writes:
Trying to understand what you did, here. There seem to be several 
concepts I am still missing...

 import std.c.stdlib;
 import std.stream;
 import std.stdio;
 
 class LineReader(Source) : Source

You seem to be "shadowing" some parent class called Source?
 {
     int opApply(int delegate(inout char[]) dg)

Alas I still have no idea what "delegate" does, and why it needs to be used?
     {
         int result = 0;
         char[] line;
 
         while(!eof())
         {
             line = readLine();

How come readLine() knows of the stream?
             if (!line) break;

"if line == null" then break... no idea what this is good for.
             result = dg(line);
             if (result) break;

Don't understand these lines either. Can it be that you are filling up a "buffer" with all the lines of the stream, until you reach an empty line, to let foreach then scan that "buffer" like it does for any other array? If so that could possibly use up a lot of RAM?!
         }
        
         return result;
     }
     
     int opApply(int delegate(inout size_t, inout char[]) dg)
     {       
         int result = 0;
         size_t lineno;

Why did you use size_t for lineno, would int not also work? (I tested this and it works fine to replace all size_t with int).
         char[] line;
 
         for(lineno = 1; !eof(); lineno++)
         {
             line = readLine();
             if (!line) break;
             result = dg(lineno,line);
             if (result) break;
         }
                
         return result;
     }
 }

AFAICT you defined 2 "structures" that will let the user use foreach on "f.open" streams. One version that will "just" read lines another that will also let you retrieve the line numbers as well.
 int main(char[][] args)
 {

     LineReader!(BufferedFile) f;
     f = new LineReader!(BufferedFile)();

Can be reduced to:

     LineReader!(BufferedFile) f = new LineReader!(BufferedFile)();

making the equivalent coding to

     File f = new File();

more obvious. IOW you seem to have defined a new stream?
     if (args.length < 2) usage();   
     
     f.open(args[1],FileMode.In);
     foreach(char[] line; f)

Is this default behavior? I.e. that foreach can parse streams? AFAICT this is the new speciality of your stream, right? Very nice.
     {
         writefln("READ[",line,"]");
     }
     f.close();
     
     f.open(args[1],FileMode.In);
     foreach(size_t lineno, char[] line; f)

Neat.
     {
         writefln("READ[",lineno,"][",line,"]");
     }
     f.close();
     
     return 0;   
 }

I noted when testing this code that it will only read the lines of a stream until an empty line is encountered. Is this indeed intended?

AEon
Mar 28 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"AEon" <aeon2001 lycos.de> wrote in message 
news:d290lj$1ukr$1 digitaldaemon.com...
 Trying to understand what you did, here. There seem to be several concepts 
 I am still missing...

 import std.c.stdlib;
 import std.stream;
 import std.stdio;

 class LineReader(Source) : Source

You seem to be "shadowing" some parent class called Source?

The class is templatized. It is a way of subclassing any stream subclass. I think it would also work to do

    class LineReader(Source : Stream) : Source

to force the class Source to be a Stream or Stream subclass.
 {
     int opApply(int delegate(inout char[]) dg)

Alas I still have no idea what "delegate" does, and why it needs to be used?

opApply is used to implement 'foreach' in classes. See http://www.digitalmars.com/d/statement.html#foreach Also for info about delegate see http://www.digitalmars.com/d/function.html
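
To make the foreach/opApply relationship concrete, here is a minimal sketch in current D syntax. That is an assumption: the code in this thread spells the parameter storage class inout, which today's compilers spell ref. The Countdown type and the collect function are hypothetical and exist only for illustration. The compiler turns the body of the foreach into a delegate and hands it to opApply, which calls it once per element.

```d
import std.stdio : writeln;

// Hypothetical type, used only to show how foreach maps onto opApply.
struct Countdown
{
    int start;

    // foreach calls opApply once and passes the loop body in as the delegate dg.
    int opApply(scope int delegate(ref int) dg)
    {
        for (int i = start; i > 0; i--)
        {
            // Run the loop body for this element.
            int result = dg(i);
            // A nonzero result means the body left the loop early (break, return, goto).
            if (result)
                return result;
        }
        return 0;
    }
}

// Collects the values seen by the loop body until it breaks at 1.
int[] collect(int start)
{
    int[] seen;
    foreach (int n; Countdown(start))
    {
        if (n == 1)
            break; // makes dg return nonzero, which ends the loop in opApply
        seen ~= n;
    }
    return seen;
}

void main()
{
    assert(collect(4) == [4, 3, 2]);
    writeln(collect(4));
}
```

This is why the LineReader code tests result after every call to dg: nothing is buffered, each line is handed to the loop body as soon as it has been read.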
     {
         int result = 0;
         char[] line;

         while(!eof())
         {
             line = readLine();

How come readLine() knows of the stream?

It subclasses Stream.
             if (!line) break;

"if line == null" then break... no idea what this is good for.

I think this isn't needed. I think it probably is why blank lines stop the foreach.
             result = dg(line);
             if (result) break;

Don't understand these lines either.

This is part of the foreach magic.
 Can it be that you are filling up a "buffer" with all the lines of the 
 stream, until you reach an empty line, to let foreach then scan that 
 "buffer" like it does for any other array? If so that could possibly use 
 up a lot of RAM?!

         }
        return result;
     }
     int opApply(int delegate(inout size_t, inout char[]) dg)
     {       int result = 0;
         size_t lineno;

Why did you use size_t for lineno, would int now also work? (I tested this and it works fine to replace all size_t with int).

On a 32-bit machine size_t is uint. On a 64-bit machine it is ulong.
Mar 28 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 28 Mar 2005 13:43:08 -0500, Ben Hinkle <bhinkle mathworks.com>  
wrote:
 "AEon" <aeon2001 lycos.de> wrote in message
 news:d290lj$1ukr$1 digitaldaemon.com...
 Trying to understand what you did, here. There seem to be several  
 concepts
 I am still missing...

 class LineReader(Source) : Source

You seem to be "shadowing" some parent class called Source?

The class is templatized. It is a way of subclassing any stream subclass. I think it would also work to do class LineReader(Source : Stream) : Source to force the class Source to be a Stream or Stream subclass.

Good point, that is probably more correct.
             if (!line) break;

"if line == null" then break... no idea what this is good for.

I think this isn't needed. I think it probably is why blank lines stop the foreach.

I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non-null "line.ptr" and "line is null" is not true.

Regan
Mar 28 2005
next sibling parent Derek Parnell <derek psych.ward> writes:
On Tue, 29 Mar 2005 10:39:49 +1200, Regan Heath wrote:


[snip]

 
 I think readLine is broken. It needs to return "" and not null.
 The difference being that "" has a non null "line.ptr" and "line is null"  
 is not true.

I've mentioned this before. D can not guarantee that a coder will always be able to distinguish between an empty line and an uninitialized line. I believe the two are distinct and useful idioms, and I know that it is theoretically possible, but sometimes when you pass a "", it gets received as null; however not in all situations. :-(

-- 
Derek Parnell
Melbourne, Australia
http://www.dsource.org/projects/build v1.16 released
29/03/2005 9:24:10 AM
Mar 28 2005
prev sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
             if (!line) break;

"if line == null" then break... no idea what this is good for.

I think this isn't needed. I think it probably is why blank lines stop the foreach.

I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non null "line.ptr" and "line is null" is not true.

IMO the right way to check if a string is empty is asking if the length is 0. Setting an array's length to 0 automatically sets the ptr to null. So relying on any specific behavior of the ptr of a 0 length array is dangerous at best (since it would rely on always slicing to resize). For example the statement

   str.length = str.length;

does nothing if length > 0 and sets the ptr to null if length == 0. One can argue about D's behavior about nulling the ptr but that's the current situation.

Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.
Mar 28 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 28 Mar 2005 19:05:39 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
             if (!line) break;

"if line == null" then break... no idea what this is good for.

I think this isn't needed. I think it probably is why blank lines stop the foreach.

I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non null "line.ptr" and "line is null" is not true.

IMO the right way to check if a string is empty is asking if the length is 0.

No. You cannot tell empty from null with length, e.g.

char[] isnull = null;
char[] isempty = "";

assert(isnull.length == 0);
assert(isempty.length == 0);

compile, run, no asserts.
 Setting an array's length to 0 automatically sets the ptr to null. So
 relying on any specific behavior of the ptr of a 0 length array is  
 dangerous at best (since it would rely on always slicing to resize).

I agree. I currently use "is" or "===" to tell them apart, e.g.

char[] isnull = null;
char[] isempty = "";

assert(isnull === null);
assert(isempty !== null);

I, at first, suspected the behaviour above to be a side effect of D's behaviour of appending \0 to hard-coded/static strings (thus ptr cannot be null for ""). If this behaviour were removed ptr would have 'nothing' to point at. However...

char[] isempty;
char[] test;

test.length = 3;
test[0] = 'a';
test[1] = 'b';
test[2] = 'c';
isempty = test[0..0];

assert(isempty.length == 0);
assert(isempty !== null);

it appears not, but, as you mention:
 For example the statement
   str.length = str.length;
 does nothing if length > 0 and sets the ptr to null if length == 0.

isempty.length = isempty.length;
assert(isempty.length == 0);
assert(isempty !== null);

asserts on the 2nd assert statement as it has set the ptr to null.
 One can argue about D's behavior about nulling the ptr but that's the
 current situation.

Indeed. Setting length to 0 should IMO create an empty string, not un-assign or free the string. Setting the reference to null should un-assign or free the string.

To be honest I don't really care what it does *so long as* I can tell an empty string (array assigned to something with length 0) apart from one that does not exist (unassigned array, init to null).

The simple fact of the matter is that in some situations these two things need to be treated differently.

In some cases an AA and the "in" operator can be used as a workaround, as "in" checks for existence. I didn't think of this idea immediately (someone else suggested it). It would be nice if the functionality was more immediately apparent.

To clarify, I don't want to make it harder to treat them the same, which you can currently do with "if (length == 0)". I just want a guaranteed method of telling them apart.
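
The distinction asked for here can be sketched in current D syntax. That is an assumption: in today's compilers the === and !== operators used in this thread are spelled is and !is. The helper name existsButEmpty is hypothetical. The sketch only covers a zero-length slice of a live array; it makes no claim about what happens to the pointer when length is assigned.

```d
// Hypothetical helper: true for an array that exists but holds no elements,
// false for a null array and for any non-empty array.
bool existsButEmpty(const(char)[] s)
{
    return s.length == 0 && s !is null;
}

void main()
{
    char[] missing = null;          // no array at all
    char[] source = "abc".dup;
    char[] empty = source[0 .. 0];  // zero-length slice that still points into source

    // Length and == look at contents only, so they cannot tell the two apart.
    assert(missing.length == 0 && empty.length == 0);
    assert(missing == empty);

    // The identity comparison also looks at the pointer.
    assert(!existsButEmpty(missing));
    assert(existsButEmpty(empty));
    assert(!existsButEmpty(source));
}
```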
 Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.

If the array ptr is null the result will be null, right? I don't see a problem with this.

Regan
Mar 28 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsodiv9b023k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 19:05:39 -0500, Ben Hinkle <ben.hinkle gmail.com> 
 wrote:
             if (!line) break;

"if line == null" then break... no idea what this is good for.

I think this isn't needed. I think it probably is why blank lines stop the foreach.

I think readLine is broken. It needs to return "" and not null. The difference being that "" has a non null "line.ptr" and "line is null" is not true.

IMO the right way to check if a string is empty is asking if the length is 0.

No. You cannot tell empty from null with length, e.g.

char[] isnull = null;
char[] isempty = "";

assert(isnull.length == 0);
assert(isempty.length == 0);

compile, run, no asserts.

Uhh - I think we have different definitions of the word "empty". I take it you define empty to be non-null ptr and 0 length, correct? I take empty to mean anything that compares as equal to "". In D length==0 is equivalent to =="":

  str.length == 0 iff str == ""

That is why I consider testing length to be the simplest/fastest way to test for "empty". For example

int main() {
    char[] x;
    x = new char[5];
    assert(x != "");
    assert(x.length != 0);

    x = x[0..0];
    assert(x == "");
    assert(x.length == 0);

    char[] y = "";
    assert(y == "");
    assert(y.length == 0);

    // a null array also compares equal to ""
    char[] z = null;
    assert(z == "");
    assert(z.length == 0);
    return 0;
}
 Setting an array's length to 0 automatically sets the ptr to null. So
 relying on any specific behavior of the ptr of a 0 length array is 
 dangerous at best (since it would rely on always slicing to resize).

I agree. I currently use "is" or "===" to tell them apart, e.g.

char[] isnull = null;
char[] isempty = "";

assert(isnull === null);
assert(isempty !== null);

I, at first, suspected the behaviour above to be a side effect of D's behaviour of appending \0 to hard-coded/static strings (thus ptr cannot be null for ""). If this behaviour were removed ptr would have 'nothing' to point at. However...

char[] isempty;
char[] test;

test.length = 3;
test[0] = 'a';
test[1] = 'b';
test[2] = 'c';
isempty = test[0..0];

assert(isempty.length == 0);
assert(isempty !== null);

it appears not, but, as you mention:

It is also true that

char[] isempty = "";
char[] isempty2 = test[0..0];
assert( isempty !== isempty2);
 For example the statement
   str.length = str.length;
 does nothing if length > 0 and sets the ptr to null if length == 0.

isempty.length = isempty.length;
assert(isempty.length == 0);
assert(isempty !== null);

asserts on the 2nd assert statement as it has set the ptr to null.
 One can argue about D's behavior about nulling the ptr but that's the
 current situation.

Indeed. Setting length to 0, should IMO create an empty string, not un-assign or free the string. Setting the reference to null should un-assign or free the string. To be honest I don't really care what it does *so long as* I can tell an empty string (array assigned to something with length 0) apart from one that does not exist (unassigned array, init to null).

ah - here I can see what empty means to you. It is true our definitions of "empty" differ.
 The simple fact of the matter being that in some situations these two 
 things need to be treated differently.

That's what "is" and !== are for. But those are rare occasions I would bet.
 In some cases an AA and the "in" operator can be used as a workaround, as 
 "in" checks for existance. I didn't think of this idea immediately 
 (someone else suggested it). It would be nice if the functionality was 
 more immediately apparent.

 To clarify I don't want to make it harder to treat them the same, which 
 you can currently do with "if (length == 0)" I just want a guaranteed 
 method of telling them apart.

 Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.

If the array ptr is null the result will be null, right? I don't see a problem with this.

I was suggesting making it illegal so that casually testing !line would be illegal. Instead it would have to be !line.ptr, which makes it more obvious what is actually being tested (i.e. the length is ignored and just the ptr is checked).

By the way, when would you like readLine to return a null string as opposed to a non-null zero-length string?
Mar 28 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

"empty" - "Holding or containing nothing."

In my mind something is "empty" if it:
 a. contains nothing.
 b. exists.

It cannot be "empty" if it contains something.
It cannot be "empty" if it does not exist.

So, my first question. How do I represent "non-existent" in D?

Some abstract ideas/thoughts.

A pointer/reference/handle/whatever is a construct which we use to access some data. This construct IMO needs the ability to (1) indicate the (non)existence of the data and (2) give us access to the data.

In C I would use a pointer, e.g.

char *ptr = NULL;

ptr = NULL; //no value exists
ptr = "";   //value exists, it is empty.

The humble pointer can indicate that no data exists by pointing at NULL (which is defined to be an invalid address for data). The pointer can indicate the existing data by pointing at its address. The data it points to may be empty if it "contains nothing" (what that means depends on the data itself).

D's char[] is a reference, not a pointer. A reference should be able to represent 1 & 2 above, but its implementation in D blurs the distinction between "non-existent" and "existing but empty" due to its relationship with null and its behaviour when setting length to 0.

In short:
 - A char[] should not go from "empty" to "non-existent" without being explicitly assigned to "non-existent" (AKA null).
 - "empty" (AKA "") should not compare equal to "non-existent" (AKA null).

It appears to me that the only reliable way in D to indicate "non-existent" is to throw an exception. Perhaps this is acceptable, perhaps it's the D way and I simply have to get used to it.

<snip>
 Perhaps it should be illegal to implicitly cast a dynamic array to a  
 ptr.

If the array ptr is null the result will be null, right? I don't see a problem with this.

I was suggesting making it illegal so that casually testing !line would be illegal. Instead it would have to be !line.ptr which makes it more obvious what is actually being tested (ie - the length is ignored and just the ptr is checked)

I don't think this is necessary.
 By the way, when would you like readLine to return a null string as  
 opposed to an non-null-zero-length string?

At the end of file.

readLine() - null means no lines "exist".
readLine() - "" means a line "exists" but is "empty" of chars.

Regan
Mar 29 2005
next sibling parent Derek Parnell <derek psych.ward> writes:
On Tue, 29 Mar 2005 22:47:53 +1200, Regan Heath wrote:

 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>  
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

"empty" - "Holding or containing nothing."

In my mind something is "empty" if it:
 a. contains nothing.
 b. exists.

It cannot be "empty" if it contains something.
It cannot be "empty" if it does not exist.

So, my first question. How do I represent "non-existent" in D?

Some abstract ideas/thoughts.

A pointer/reference/handle/whatever is a construct which we use to access some data. This construct IMO needs the ability to (1) indicate the (non)existence of the data and (2) give us access to the data.

In C I would use a pointer, e.g.

char *ptr = NULL;

ptr = NULL; //no value exists
ptr = "";   //value exists, it is empty.

The humble pointer can indicate that no data exists by pointing at NULL (which is defined to be an invalid address for data). The pointer can indicate the existing data by pointing at its address. The data it points to may be empty if it "contains nothing" (what that means depends on the data itself).

D's char[] is a reference, not a pointer. A reference should be able to represent 1 & 2 above, but its implementation in D blurs the distinction between "non-existent" and "existing but empty" due to its relationship with null and its behaviour when setting length to 0.

In short:
 - A char[] should not go from "empty" to "non-existent" without being explicitly assigned to "non-existent" (AKA null).
 - "empty" (AKA "") should not compare equal to "non-existent" (AKA null).

It appears to me that the only reliable way in D to indicate "non-existent" is to throw an exception. Perhaps this is acceptable, perhaps it's the D way and I simply have to get used to it.

<snip>
 Perhaps it should be illegal to implicitly cast a dynamic array to a  
 ptr.

If the array ptr is null the result will be null, right? I don't see a problem with this.

I was suggesting making it illegal so that casually testing !line would be illegal. Instead it would have to be !line.ptr which makes it more obvious what is actually being tested (ie - the length is ignored and just the ptr is checked)

I don't think this is necessary.
 By the way, when would you like readLine to return a null string as  
 opposed to an non-null-zero-length string?

At the end of file.

readLine() - null means no lines "exist".
readLine() - "" means a line "exists" but is "empty" of chars.

All of this is well said and presented. I'm in total agreement with this point of view. An empty string is a string that is empty.

-- 
Derek Parnell
Melbourne, Australia
29/03/2005 9:03:46 PM
Mar 29 2005
prev sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com> 
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

"empty" - "Holding or containing nothing." In my mind something is "empty" if it: a. contains nothing. b. exists. It cannot be "empty" if it contains something. It cannot be "empty" if it does not exist. So, my first question. How do I represent "non-existent" in D?

What you describe is ok with me but I don't think it maps well to D's arrays. I don't really look at existence or non-existence but instead at the following two rules:
 1) all arrays have a well-defined length
 2) arrays with non-zero length have a well-defined pointer
One can tread carefully to preserve pointers with 0 length arrays but it takes effort.
 By the way, when would you like readLine to return a null string as 
 opposed to an non-null-zero-length string?

At the end of file. readLine() - null means no lines "exist". readLine() - "" means a line "exists" but is "empty" of chars.

The foreach will stop automatically at eof. It's like a foreach stopping at the end of an array when it has no more elements. It doesn't run once more with null - it just stops.
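
A sketch of a line loop that stops at end of file without being stopped by blank lines. It assumes a current D compiler and Phobos rather than the 2005 std.stream used in this thread, and the names Lines, countLines and demo_lines.txt are hypothetical. It relies on File.readln returning 0 only at end of file, so the end-of-input signal is kept separate from the content of the line.

```d
import std.file : remove, write;
import std.stdio : File;

// Hypothetical wrapper, loosely modelled on the LineReader from this thread.
struct Lines
{
    File file;

    int opApply(scope int delegate(ref size_t, ref char[]) dg)
    {
        char[] buf;
        size_t lineno = 1;

        // readln returns 0 only at end of file. A blank line still yields
        // its terminator, so it is never mistaken for the end of input.
        while (file.readln(buf) != 0)
        {
            char[] line = buf;
            // Strip the line terminator, if there is one.
            if (line.length && line[$ - 1] == '\n')
                line = line[0 .. $ - 1];
            if (line.length && line[$ - 1] == '\r')
                line = line[0 .. $ - 1];

            int result = dg(lineno, line);
            if (result)
                return result;
            lineno++;
        }
        return 0;
    }
}

// Returns the number of the last line delivered to the loop body.
size_t countLines(string name)
{
    size_t total;
    foreach (size_t lineno, char[] line; Lines(File(name, "r")))
        total = lineno;
    return total;
}

void main()
{
    enum name = "demo_lines.txt"; // hypothetical file name
    write(name, "one\n\nthree\n");
    assert(countLines(name) == 3); // the blank second line does not end the loop
    remove(name);
}
```

The buffer is reused between iterations, so a loop body that wants to keep a line would need to copy it, for example with line.dup.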
Mar 29 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Tue, 29 Mar 2005 08:29:36 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

"empty" - "Holding or containing nothing." In my mind something is "empty" if it: a. contains nothing. b. exists. It cannot be "empty" if it contains something. It cannot be "empty" if it does not exist. So, my first question. How do I represent "non-existent" in D?

What you describe is ok with me but I don't think it maps well to D's arrays.

Exactly my point. It would only take a few small changes to "fix" the problem as I see it.
 To me I don't really look at existance or non-existance but instead the  
 following two rules
 1) all arrays have a well-defined length
 2) arrays with non-zero length have a well-defined pointer
 One can tread carefully to preserve pointers with 0 length arrays but it
 takes effort.

Indeed. So, how do you handle existence/non-existence?
 By the way, when would you like readLine to return a null string as
 opposed to an non-null-zero-length string?

At the end of file. readLine() - null means no lines "exist". readLine() - "" means a line "exists" but is "empty" of chars.

The foreach will stop automatically at eof. It's like a foreach stopping at the end of an array when it has no more elements. It doesn't run once more with null - it just stops.

Which foreach? My one? Assume now that I remove the eof() check. What happens now?

Regan
Mar 29 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsoe3wkh323k2f5 nrage.netwin.co.nz...
 On Tue, 29 Mar 2005 08:29:36 -0500, Ben Hinkle <ben.hinkle gmail.com> 
 wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

"empty" - "Holding or containing nothing." In my mind something is "empty" if it: a. contains nothing. b. exists. It cannot be "empty" if it contains something. It cannot be "empty" if it does not exist. So, my first question. How do I represent "non-existent" in D?

What you describe is ok with me but I don't think it maps well to D's arrays.

Exactly my point. It would only take a few small changes to "fix" the problem as I see it.

Java arrays have the semantics you describe. They distinguish between null/empty/non-empty and none compare as equal to the others. In fact even trying to compare a null array throws an exception much like trying to call opEquals on a null object reference throws an exception. It's a very reasonable thing to do. The main trouble with Java array semantics is that APIs wind up choosing between null and empty fairly randomly and so many Java array bugs are introduced by guessing some function returns "empty" when it in fact returns null. It's easier to focus instead on only distinguishing empty/non-empty, which is what D does. One can think up APIs where having a third, null, choice would be useful but almost all the time the practical uses of an array are covered by empty/non-empty.
 The foreach will stop automatically at eof. It's like a foreach stopping 
 at the end of an array when it has no more elements. It doesn't run once 
 more with null - it just stops.

Which foreach? My one? Assume now that I remove the eof() check. What happens now?

It would iterate forever just like any loop that doesn't have an ending condition.
Mar 29 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Tue, 29 Mar 2005 19:17:55 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opsoe3wkh323k2f5 nrage.netwin.co.nz...
 On Tue, 29 Mar 2005 08:29:36 -0500, Ben Hinkle <ben.hinkle gmail.com>
 wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

"empty" - "Holding or containing nothing." In my mind something is "empty" if it: a. contains nothing. b. exists. It cannot be "empty" if it contains something. It cannot be "empty" if it does not exist. So, my first question. How do I represent "non-existent" in D?

What you describe is ok with me but I don't think it maps well to D's arrays.

Exactly my point. It would only take a few small changes to "fix" the problem as I see it.

Java arrays have the semantics you describe. They distinguish between null/empty/non-empty and none compare as equal to the others. In fact even trying to compare a null array throws an exception much like trying to call opEquals on a null object reference throws an exception. It's a very reasonable thing to do.

Ok.
 The main trouble with Java array semantics is that
 APIs wind up choosing between null and empty fairly randomly and so many
 Java array bugs are introduced by guessing some function returns "empty"
 when it in fact returns null.

I can see how, if the situation does not call for a distinction between "exists but is empty" and "does not exist", the programmer may choose either "" or null to indicate no value. The choice will likely be based on their personal preference and/or "fear of null" (a phenomenon I have encountered before).

I don't see this possibility as being a good reason to limit flexibility in this way.
 It's easier to focus instead on only
 distinguishing empty/non-empty, which is what D does.

You mean, limit flexibility for the sake of simplicity. I don't like it.
 One can think up APIs where having a third, null, choice would be useful  
 but almost all the time the practical uses of an array are covered by  
 empty/non-empty.

I think it depends on style and the sort of code you write as to whether the situations where a null choice is "required"* are common or not. Personally I come across them often. I also believe that some people just don't see the need for a distinction, e.g. the current readLine implementation.

*(required is perhaps the wrong word, you can probably work around most situations, but the workaround generally is just that, and sub-optimal)
 The foreach will stop automatically at eof. It's like a foreach  
 stopping
 at the end of an array when it has no more elements. It doesn't run  
 once
 more with null - it just stops.

Which foreach? My one? Assume now that I remove the eof() check. What happens now?

It would iterate forever

Not if readLine were implemented the way I assumed it would have been.
 just like any loop that doesn't have an ending
 condition.

Bollocks. :) The ending condition is readLine() returning null (indicating no more lines "exist").

Regan
Mar 29 2005
prev sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 28 Mar 2005 15:25:57 +0200, AEon <aeon2001 lycos.de> wrote:
 Trying to understand what you did, here. There seem to be several  
 concepts I am still missing...

Ben has done a fairly good job of explaining it. I'll have a go too, the combination of our efforts will hopefully explain "everything". :)
 import std.c.stdlib;
 import std.stream;
 import std.stdio;

 class LineReader(Source) : Source

You seem to be "shadowing" some parent class called Source?

This technique is called a "Snap-On". I am creating a new template class "LineReader" which is a child class of an unspecified (at this stage) class. Later when I say: "LineReader!(BufferedFile) f;" it specifies that "Source" is "BufferedFile".
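For anyone following along, here is a minimal sketch of the same "Snap-On" idea with nothing stream-related in it. It assumes a current D compiler (D2 syntax), not the 2005 one used in this thread, and the class names `MemorySource`, `DiskSource` and `Labelled` are made up purely for illustration.

```d
import std.stdio;

// Hypothetical base classes, standing in for real stream types.
class MemorySource
{
    string name() { return "memory"; }
}

class DiskSource
{
    string name() { return "disk"; }
}

// A template class that derives from whatever class is passed as Source.
// Inside it, name() resolves to the method of the chosen parent.
class Labelled(Source) : Source
{
    string label() { return "[" ~ name() ~ "]"; }
}

void main()
{
    auto a = new Labelled!(MemorySource)();
    auto b = new Labelled!(DiskSource)();
    writeln(a.label()); // [memory]
    writeln(b.label()); // [disk]
}
```

The parent class is only fixed at the point of instantiation, which is why one `LineReader` definition can be snapped onto `BufferedFile` or any other stream class.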
 {
     int opApply(int delegate(inout char[]) dg)

Alas I still have no idea what "delegate" does, and why it needs to be used?

A delegate is like a function pointer, except that a delegate points to a (non-static) class member function. So calling it is like calling a class member on a class. In this case the delegate is part of the "magic" that makes foreach work on a custom class like LineReader.
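A small sketch may make the delegate idea more concrete. It assumes current D (D2) syntax; `Counter` and `feed` are hypothetical names invented for this example. The second call shows that a function literal capturing a local variable can be passed the same way, which is roughly what the body of a foreach becomes when it is handed to opApply.

```d
import std.stdio;

// Hypothetical class used only for this illustration.
class Counter
{
    int total;
    void add(int n) { total += n; }
}

// Calls dg once for each of 1, 2, 3. It neither knows nor cares
// what object (if any) stands behind the delegate.
void feed(void delegate(int) dg)
{
    foreach (n; 1 .. 4)
        dg(n);
}

void main()
{
    auto c = new Counter;
    feed(&c.add);          // &c.add pairs the object c with the method add
    writeln(c.total);      // 6

    int sum = 0;
    feed((int n) { sum += n * n; }); // a literal capturing a local works too
    writeln(sum);          // 14
}
```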
     {
         int result = 0;
         char[] line;

         while(!eof())
         {
             line = readLine();

How come readLine() knows of the stream?

Because LineReader is a child class of BufferedFile, which is a stream. The readLine call above calls the readLine of the parent class BufferedFile.
             if (!line) break;

"if line == null" then break... no idea what this is good for.

I was trying to stop at the end of the file, it appears this stops on blank lines. IMO readLine is broken, it is returning null for a blank line, it should return "".

The difference between null and "" in the case of char[] is that null has a null .ptr and "is null" is true, so...

if (!line.ptr) break;
if (line is null) break;

statements should only fire when line is null and not "". But it appears readLine does not differentiate between null and "".
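The null versus "" distinction can be checked directly. This is a minimal sketch assuming a current D compiler (D2 syntax); `describe` is a hypothetical helper written for this example. It relies on the empty string literal having a non-null pointer, which is how current compilers behave as far as I know.

```d
import std.stdio;

// Hypothetical helper: reports which of the three states an array is in.
string describe(const(char)[] line)
{
    if (line is null)     return "null";   // no array at all (.ptr is null)
    if (line.length == 0) return "empty";  // exists, but holds nothing
    return "text";
}

void main()
{
    string missing = null;
    string blank = "";

    writeln(describe(missing)); // null
    writeln(describe(blank));   // empty
    writeln(describe("abc"));   // text

    // == compares contents only, so it cannot tell the two apart.
    writeln(missing == blank);  // true
}
```

So a readLine that wanted to signal "no more lines" with null would have to be tested with `is null`, never with `==` or a length check.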
             result = dg(line);
             if (result) break;

Don't understand these lines either.

As Ben said, it's part of the foreach "magic", his links should explain it. If not, let us know how the docs are deficient and hopefully someone can improve them.
 Can it be that you are filling up a "buffer" with all the lines of the  
 stream, until you reach an empty line, to let foreach then scan that  
 "buffer" like it does for any other array? If so that could possibly use  
 up a lot of RAM?!

No. I am reading one line at a time. When I call the delegate I am effectively executing the body of the foreach statement with the line I pass. Then I discard the line and read the next one. So only 1 line is in memory at a time.
         }

         return result;
     }

     int opApply(int delegate(inout size_t, inout char[]) dg)
     {
         int result = 0;
         size_t lineno;

Why did you use size_t for lineno, would int now also work? (I tested this and it works fine to replace all size_t with int).

As Ben mentioned, size_t is either a 32 or 64 bit type depending on the underlying OS/processor. I believe the idea is that using it chooses the most "sensible" type for holding "size" values on the current OS/processor.
         char[] line;

         for(lineno = 1; !eof(); lineno++)
         {
             line = readLine();
             if (!line) break;
             result = dg(lineno,line);
             if (result) break;
         }

         return result;
     }
 }

AFAICT you defined 2 "structures" that will let the user use foreach on "f.open" streams. One version that will "just" read lines another that will also let you retrieve the line numbers as well.

Not 2 structures in the sense of D structs but 2 methods allowing foreach on my new class LineReader, which extends BufferedFile (by adding the foreach ability).
 int main(char[][] args)
 {

     LineReader!(BufferedFile) f;
     f = new LineReader!(BufferedFile)();

Can be reduced to:

LineReader!(BufferedFile) f = new LineReader!(BufferedFile)();

making the equivalent coding to

File f = new File();

more obvious.

You could, I have chosen not to allocate the class till after my error checking, but then I could have moved "LineReader!(BufferedFile) f;" to after the error checking also.. I guess I'm used to C. :)
 IOW you seem to have defined a new stream?

Yes. I have extended/added foreach-ability to any Stream class.
     if (args.length < 2) usage();

     f.open(args[1],FileMode.In);
     foreach(char[] line; f)

Is this default behavior? I.e. that foreach can parse streams? AFAICT this is the new speciality of your stream, right? Very nice.

It's a new speciality of my stream. I think we should add it to Streams though. In addition we could add foreach(char c; f) {} to read characters one at a time.
     {
         writefln("READ[",line,"]");
     }
     f.close();

     f.open(args[1],FileMode.In);
     foreach(size_t lineno, char[] line; f)

Neat.
     {
         writefln("READ[",lineno,"][",line,"]");
     }
     f.close();

     return 0;
 }

I noted when testing this code, that it will only read the lines of a stream until an empty line is encountered. Is this indeed intended?

No it was not intended. IMO readLine is broken.

Regan
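As a sketch of how the two opApply overloads could be written so that blank lines do not end the loop, here is a self-contained version that walks an in-memory string instead of a stream. Assumptions: current D (D2) syntax, where the delegate parameters are `ref` rather than `inout`; `TextLines` and `takeLine` are hypothetical names, not part of Phobos. The loop condition is "is there any text left", so an empty line is handed to the foreach body like any other.

```d
import std.stdio;
import std.string : indexOf;

// Cuts the first line off the front of rest and returns it without the '\n'.
string takeLine(ref string rest)
{
    auto nl = rest.indexOf('\n');
    if (nl < 0)
    {
        string last = rest;
        rest = null;
        return last;
    }
    string line = rest[0 .. nl];
    rest = rest[nl + 1 .. $];
    return line;
}

// Hypothetical wrapper that makes a block of text usable in foreach.
struct TextLines
{
    string text;

    // foreach (string line; lines)
    int opApply(scope int delegate(ref string) dg)
    {
        string rest = text;
        while (rest.length > 0)
        {
            string line = takeLine(rest);
            if (int result = dg(line)) return result; // body did break/return
        }
        return 0;
    }

    // foreach (size_t lineno, string line; lines)
    int opApply(scope int delegate(ref size_t, ref string) dg)
    {
        string rest = text;
        for (size_t lineno = 1; rest.length > 0; ++lineno)
        {
            string line = takeLine(rest);
            if (int result = dg(lineno, line)) return result;
        }
        return 0;
    }
}

void main()
{
    auto lines = TextLines("first\n\nthird\n");
    foreach (size_t lineno, string line; lines)
        writefln("READ[%s][%s]", lineno, line);
}
```

Only one line is sliced out at a time, and a non-zero result from the delegate is passed straight back, which is how a `break` inside the foreach body stops the iteration.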
Mar 28 2005
parent reply AEon <aeon2001 lycos.de> writes:
Regan Heath wrote (Ben read your feedback as well thanx):

     {
         int result = 0;
         char[] line;

         while(!eof())
         {
             line = readLine();

How come readLine() knows of the stream?

Because LineReader is a child class of BufferedFile, which is a stream. The readLine call above calls the readLine of the parent class BufferedFile.

Ah... that is one of the things I really hate as an OOP beginner. It is very difficult to check where the heck certain "behavior" comes from. If the programmer is indeed fully aware of the parent classes, that may be clearer, but when I only see the "new" code, I find it very confusing. I am not even sure one *can* look up the original definition of the parent classes?
             result = dg(line);
             if (result) break;

Don't understand these lines either.

As Ben said, it's part of the foreach "magic", his links should explain it. If not, let us know how the docs are deficient and hopefully someone can improve them.

Reminds me that I don't actually understand D, and that I only use certain code snippets all over the place so far. :)
 Why did you use size_t for lineno, would int now also work? (I tested  
 this and it works fine to replace all size_t with int).

As Ben mentioned, size_t is either a 32 or 64 bit type depending on the underlying OS/processor. I believe the idea is that using it chooses the most "sensible" type for holding "size" values on the current OS/processor.

Aha... IIRC there was something like that in ANSI C as well... I never trusted it ;)... so size_t is something like a special optimization case. I.e. when do you decide to use good old int, and when do you feel size_t would be a better choice?
 IOW you seem to have defined a new stream?

Yes. I have extended/added foreach-ability to any Stream class.

Neat indeed. BTW, I decided to go the simple way:

File lg = open_Read_Log( glb.log );
File mg = open_Write_Mlg( metafile );

File open_Read_Log( char[] logfile )
{
    char[13] warn = "open_Read_Log";

    if( ! std.file.exists(logfile) )
    {
        Err(warn, "Can't open *read* your log file... '"~logfile~"'",
            "Ensure log file exists and double check path!");
        exit(1);
    }

    // Define/create "handle" for logfile READ
    File lg = new File( logfile, FileMode.In );
    // If logfile open error: "Error: file '...' not found"

    return lg;
}

etc.

What surprised me in open_Read_Log(), when comparing it to my ANSI C code:

if (fgets(line, M2AXCHR, link)==NULL){
    if(ferror(link)!=0){
        puts("Error during log read...");
        exit(1);
    }
    clearerr(link);
    break;}

You can check for file existence. But you do not seem to be able to handle

    "new File( logfile, FileMode.In )"

errors... i.e. if something happens, D will exit with an internal Error message.

Presumably one could "catch" such errors to provide own error messages?

Same seems to be the case with

while( ! lg.eof() )
{
    line = lg.readLine();
}

Should a readLine() error occur, then D throws an internal Error message.

I am not sure I *really* want to catch errors, should this be possible in the above 2 cases. But maybe that could be useful?

AEon
Mar 28 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Tue, 29 Mar 2005 05:35:07 +0200, AEon <aeon2001 lycos.de> wrote:
 Regan Heath wrote (Ben read your feedback as well thanx):

     {
         int result = 0;
         char[] line;

         while(!eof())
         {
             line = readLine();

How come readLine() knows of the stream?

Because LineReader is a child class of BufferedFile, which is a stream. The readLine call above calls the readLine of the parent class BufferedFile.

Ah... that is one of the things I really hate as an OOP beginner. It is very difficult to check where the heck certain "behavior" comes from. If the programmer is indeed fully aware of the parent classes, that may be clearer, but when I only see the "new" code, I find it very confusing. I am not even sure one *can* look up the original definition of the parent classes?

In this case you can look in dmd\src\phobos\std\stream.d for the class definition of BufferedFile.

You may be interested in an old thread on method name resolution:
http://www.digitalmars.com/d/archives/digitalmars/D/6928.html

It's kinda involved but relevant to your comments above as the method name resolution affects the behaviour of a derived class. The idea being D's method name resolution makes it simpler/explicit WRT the behaviour of classes with overloaded methods.
             result = dg(line);
             if (result) break;

Don't understand these lines either.

As Ben said, it's part of the foreach "magic", his links should explain it. If not, let us know how the docs are deficient and hopefully someone can improve them.

Reminds me that I don't actually understand D, and that I only use certain code snippets all over the place so far. :)

I wouldn't worry overmuch. I still find it hard to remember how to code things like opApply, I copy/paste from the docs and then modify each time I do it.
 Why did you use size_t for lineno, would int now also work? (I tested   
 this and it works fine to replace all size_t with int).

As Ben mentioned, size_t is either a 32 or 64 bit type depending on the underlying OS/processor. I believe the idea is that using it chooses the most "sensible" type for holding "size" values on the current OS/processor.

Aha... IIRC there was something like that in ANSI C as well... I never trusted it ;)... so size_t is something like a special optimization case. I.e. when do you decide to use good old int, and when do you feel size_t would be a better choice?

Good question. I would use 'int' when the size of the type is important, i.e. I need 32 bits. I would use size_t when the size is unimportant, so long as it is "big enough".
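A quick way to see the difference for yourself, as a minimal sketch assuming a current D compiler (D2 syntax):

```d
import std.stdio;

void main()
{
    writeln(int.sizeof);    // 4: int is 32 bits on every platform
    writeln(size_t.sizeof); // pointer-sized: 4 on 32-bit targets, 8 on 64-bit ones

    char[] line = "some log line".dup;
    size_t len = line.length; // array lengths are size_t values
    writeln(len);             // 13
}
```

Since array lengths and indices are already `size_t`, using it for counters such as a line number avoids conversions between signed and unsigned values.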
 But you do not seem to be able to handle
      "new File( logfile, FileMode.In )"
 errors... i.e. if something happens, D will exit with an internal Error
 message.

 Presumably one could "catch" such errors to provide own error messages?

Yes.

try {
    File f = new File(logfile, FileMode.In);
} catch (OpenException e) {
    writefln("OPEN ERROR - ",e);
}
 Same seems to be the case with

      while( ! lg.eof() )
      {
          line = lg.readLine();
      }

 Should a readLine() error occur, then D throws an internal Error message.

try {
    while( ! lg.eof() ) {
        line = lg.readLine();
    }
} catch (ReadException e) {
    writefln("READ ERROR - ",e);
}
 I am not sure I *really* want to catch errors, should this be possible  
 in the above 2 cases. But maybe that could be useful?

Exceptions are the recommended error handling mechanism for D. The argument/confusion centers around what is worthy of an exception and what is not. For example, IMO in the code above not being able to open a file is exceptional (you have assumed it exists by opening in FileMode.In), but reaching the end of the file is not exceptional as it's guaranteed to happen eventually.

Uncaught exceptions are automatically handled by the default handler. For trivial applications, allowing it to handle your exceptions (like the failure to open a file) might be exactly what you want. It's your choice.

Regan
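For comparison, here is roughly how the same pattern looks with the current standard library. This is only a sketch under stated assumptions: it uses `std.stdio.File` and `byLine` from today's Phobos (D2), not the std.stream classes discussed in this thread, and `countLines` is a hypothetical function. In that API a failed open throws `ErrnoException`; hitting the end of the file simply ends the loop.

```d
import std.stdio;
import std.exception : ErrnoException;

// Returns the number of lines in path, or -1 if the file cannot be opened.
long countLines(string path)
{
    try
    {
        auto f = File(path, "r");      // throws ErrnoException on failure
        long n = 0;
        foreach (line; f.byLine())     // reads one line at a time
            ++n;
        return n;                      // end of file is not an error
    }
    catch (ErrnoException e)
    {
        stderr.writeln("OPEN ERROR - ", e.msg);
        return -1;
    }
}

void main()
{
    writeln(countLines("/no/such/dir/missing.log")); // -1
}
```

Because `byLine` hands out one line at a time, this also sidesteps the original worry about loading a 100 MB log into RAM.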
Mar 29 2005
parent reply AEon <aeon2001 lycos.de> writes:
Regan Heath wrote:

 But you do not seem to be able to handle
      "new File( logfile, FileMode.In )"
 errors... i.e. if something happens, D will exit with an internal
 Error  message.

 Presumably one could "catch" such errors to provide own error messages?

Yes.

try {
    File f = new File(logfile, FileMode.In);
} catch (OpenException e) {
    writefln("OPEN ERROR - ",e);
}

Have read several examples by now. Is there a complete list of catch "keywords"? The D documentation mentions a few, but probably not all? e.g.

catch (ArrayBoundsError)
catch (Object o)
catch (std.asserterror.AssertError ae)
 Same seems to be the case with

      while( ! lg.eof() )
      {
          line = lg.readLine();
      }

 Should a readLine() error occur, then D throws an internal Error message.

try {
    while( ! lg.eof() ) {
        line = lg.readLine();
    }
} catch (ReadException e) {
    writefln("READ ERROR - ",e);
}

Ahh... info like that could be helpful in the official docs.
 I am not sure I *really* want to catch errors, should this be 
 possible  in the above 2 cases. But maybe that could be useful?

Exceptions are the recommended error handling mechanism for D. The argument/confusion centers around what is worthy of an exception and what is not. For example IMO in the code above not being able to open a file is exceptional (you have assumed it exists by opening in FileMode.In), but, reaching the end of the file is not exceptional as it's guaranteed to happen eventually. Uncaught exceptions are automatically handled by the default handler, for trivial applications allowing it to handle your exceptions (like the failure to open a file) might be exactly what you want. It's your choice.

Well in the above examples it would basically just give me the chance to write out my own messages. But since these cases are serious, there is nothing much one could save.

AEon
Mar 29 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Tue, 29 Mar 2005 19:33:30 +0200, AEon <aeon2001 lycos.de> wrote:
 But you do not seem to be able to handle
      "new File( logfile, FileMode.In )"
 errors... i.e. if something happens, D will exit with an internal
 Error  message.

 Presumably one could "catch" such errors to provide own error messages?

try {
    File f = new File(logfile, FileMode.In);
} catch (OpenException e) {
    writefln("OPEN ERROR - ",e);
}

Have read several examples by now. Is there a complete list of catch "keywords"? The D documentation mentions a few, but probably not all? e.g.

catch (ArrayBoundsError)
catch (Object o)
catch (std.asserterror.AssertError ae)

Each "catch keyword" is a class derived from the Exception or Error classes. They are defined in the modules that use them.

I agree it would be nice to have a complete list. Eventually I can imagine a documentation generator listing all the exceptions that can be thrown by a function.

Regan
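To illustrate that a "catch keyword" is just a class name, here is a minimal sketch assuming current D (D2) syntax. `LogFormatException` and `classify` are hypothetical names made up for this example. The more specific class is listed first, and a final `catch (Exception e)` picks up everything else derived from Exception.

```d
import std.stdio;

// Hypothetical exception type: an ordinary class derived from Exception.
class LogFormatException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Runs action and reports which catch clause, if any, handled it.
string classify(void delegate() action)
{
    try
    {
        action();
        return "ok";
    }
    catch (LogFormatException e)   // most specific class first
    {
        return "log format: " ~ e.msg;
    }
    catch (Exception e)            // anything else derived from Exception
    {
        return "other: " ~ e.msg;
    }
}

void main()
{
    writeln(classify(() { throw new LogFormatException("bad line"); })); // log format: bad line
    writeln(classify(() { throw new Exception("boom"); }));              // other: boom
    writeln(classify(() { }));                                           // ok
}
```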
Mar 29 2005