digitalmars.D.learn - Reading large files, writing large files?


AEon <aeon2001 lycos.de> writes:

Rethinking the way I normally handle files, since now I am faced with 
possibly very huge (100 MB and more) log files. Ditto, I need to save large log 
files. So it does not seem to be a good idea to use my so-far preferred 
method:

// Ensure file exists
if( ! std.file.exists(cfgPathFile) )
...

// Read complete cfg file into array, removes \r\n via splitlines()
char[][] cfgText = std.string.splitlines( cast(char[]) 
std.file.read(cfgPathFile) );

Etc... I have very much come to like splitlines, and read, but with
100 MB log files, loading all that into RAM may turn out ugly?


Let's say I'd ignore the RAM issue for a moment, how would I properly 
use std.file.write() to write into a file?


The method I fear will need to be applied for such huge files is 
something like this (posted by Martin in this newsgroup):

import std.stream;

void readfile(char[] fn)
{
     File f = new File();
     char[] l;
     f.open(fn);
     while(!f.eof())
     {
         l = f.readLine();
         printf("line: %.*s\n", l);
     }
     f.close();
}

That would be pretty much the ANSI C way... ieek :)... Is there any way 
to avoid the latter method? And go the nicer D way, as in the first code 
example?

AEon

Mar 27 2005
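
On the std.file.write() question above, which the replies do not take up: a minimal sketch, assuming a current D2 compiler and Phobos rather than the 2005 library used in this thread. write() replaces the whole file with one buffer and append() adds to the end, so output can be produced in pieces instead of being collected in RAM first. The file name and the writeThenAppend helper are made up for the illustration.

```d
import std.file : append, readText, remove, write;

// Hypothetical helper: writes two chunks to `path`, reads the file back,
// removes it again and returns what was read.
string writeThenAppend(string path)
{
    // write() creates the file (or truncates an existing one).
    write(path, "first line\n");

    // append() adds to the end of the existing file.
    append(path, "second line\n");

    // Read the file back so the result can be checked.
    string text = readText(path);
    remove(path);
    return text;
}

void main()
{
    // "example_output.log" is a made-up name for this illustration.
    assert(writeThenAppend("example_output.log") == "first line\nsecond line\n");
}
```

For a 100 MB log, repeated append() calls reopen the file each time, so a stream or File object kept open for the whole run is probably the better fit.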

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 28 Mar 2005 05:13:36 +0200, AEon <aeon2001 lycos.de> wrote:
 That would be pretty much the ANSI C way... ieek :)... Is there any way  
 to avoid the latter method? And go the nicer D way, as in the first code  
 example?

Try this...

import std.c.stdlib;
import std.stream;
import std.stdio;

class LineReader(Source) : Source
{
	int opApply(int delegate(inout char[]) dg)
	{
		int result = 0;
		char[] line;

		while(!eof())
		{
			line = readLine();
			if (!line) break;
			result = dg(line);
			if (result) break;
		}
		
		return result;
	}
	
	int opApply(int delegate(inout size_t, inout char[]) dg)
	{		
		int result = 0;
		size_t lineno;
		char[] line;

		for(lineno = 1; !eof(); lineno++)
		{
			line = readLine();
			if (!line) break;
			result = dg(lineno,line);
			if (result) break;
		}
				
		return result;
	}
}

int main(char[][] args)
{
	LineReader!(BufferedFile) f;
	
	if (args.length < 2) usage();	
	f = new LineReader!(BufferedFile)();
	
	f.open(args[1],FileMode.In);
	foreach(char[] line; f)
	{
		writefln("READ[",line,"]");
	}
	f.close();
	
	f.open(args[1],FileMode.In);
	foreach(size_t lineno, char[] line; f)
	{
		writefln("READ[",lineno,"][",line,"]");
	}
	f.close();
	
	return 0;	
}

void usage()
{
	writefln("USAGE: test29 <file>");
	writefln("");
	exit(1);
}

Regan

Mar 27 2005
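
For comparison, a sketch of the same line-by-line loop, assuming a current D2 compiler where std.stdio.File provides byLine(). This is not the 2005 std.stream API used above. Only one line is held in memory at a time, which is the point for 100 MB logs, and blank lines do not end the loop. The countLines helper and the program name in the usage text are made up.

```d
import std.stdio : File, writefln, writeln;

// Counts the lines of `path` while holding only one line in memory at a time.
size_t countLines(string path)
{
    size_t lineno = 0;
    auto f = File(path, "r");

    // byLine() reuses an internal buffer, so `line` is only valid until the
    // next iteration; use line.idup to keep a copy.
    foreach (line; f.byLine())
    {
        ++lineno;
        writefln("READ[%s][%s]", lineno, line);
    }
    return lineno;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("USAGE: linecount <file>");   // program name is made up
        return;
    }
    writefln("%s lines", countLines(args[1]));
}
```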

"Ben Hinkle" <ben.hinkle gmail.com> writes:

 void readfile(char[] fn)
 {
      File f = new File();
      char[] l;
      f.open(fn);
      while(!f.eof())
      {
          l = f.readLine();
          printf("line: %.*s\n", l);
      }
      f.close();
 }


One tiny improvement would be to combine the new File() with the open(fn) 
into new File(fn).

 That would be pretty much the ANSI C way... ieek :)... Is there any way 
 to avoid the latter method? And go the nicer D way, as in the first code 
 example?

 Try this...
 class LineReader(Source) : Source
 {

[snip]

That's pretty nice. Maybe opApply iterating over lines should be built into 
Stream. That would resemble the standard Perl style of reading a file 
line-by-line. I'll poke around with that. It should be pretty easy and it 
would make line processing with stream much easier to use.

Mar 27 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 27 Mar 2005 22:52:52 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 That would be pretty much the ANSI C way... ieek :)... Is there any way
 to avoid the latter method? And go the nicer D way, as in the first  
 code
 example?

 Try this...
 class LineReader(Source) : Source
 {

 [snip]

 That's pretty nice.

:)

 Maybe opApply iterating over lines should be built into Stream.

That would be nice.

 That would resemble the standard Perl style of reading a file
 line-by-line. I'll poke around with that. It should be pretty easy and it
 would make line processing with stream much easier to use.

Agreed.

Regan

Mar 27 2005

AEon <aeon2001 lycos.de> writes:

Trying to understand what you did, here. There seem to be several 
concepts I am still missing...

 import std.c.stdlib;
 import std.stream;
 import std.stdio;
 
 class LineReader(Source) : Source

You seem to be "shadowing" some parent class called Source?

 {
     int opApply(int delegate(inout char[]) dg)

Alas I still have no idea what "delegate" does, and why it needs to be used?

     {
         int result = 0;
         char[] line;
 
         while(!eof())
         {
             line = readLine();

How come readLine() knows of the stream?

             if (!line) break;

"if line == null" then break... no idea what this is good for.

             result = dg(line);
             if (result) break;

Don't understand these lines either.

Can it be that you are filling up a "buffer" with all the lines of the 
stream, until you reach an empty line, to let foreach then scan that 
"buffer" like it does for any other array? If so that could possibly use 
up a lot of RAM?!

         }
        
         return result;
     }
     
     int opApply(int delegate(inout size_t, inout char[]) dg)
     {       
         int result = 0;
         size_t lineno;

Why did you use size_t for lineno, would int not also work? (I tested 
this and it works fine to replace all size_t with int).

         char[] line;
 
         for(lineno = 1; !eof(); lineno++)
         {
             line = readLine();
             if (!line) break;
             result = dg(lineno,line);
             if (result) break;
         }
                
         return result;
     }
 }

AFAICT you defined 2 "structures" that will let the user use foreach on 
"f.open" streams. One version that will "just" read lines another that 
will also let you retrieve the line numbers as well.


 int main(char[][] args)
 {

     LineReader!(BufferedFile) f;
     f = new LineReader!(BufferedFile)();

Can be reduced to:

     LineReader!(BufferedFile) f = new LineReader!(BufferedFile)();

making the equivalent coding to

     File f = new File();

more obvious. IOW you seem to have defined a new stream?

     if (args.length < 2) usage();   
     
     f.open(args[1],FileMode.In);
     foreach(char[] line; f)

Is this default behavior, i.e. that foreach can parse streams? AFAICT 
this is the new speciality of your stream, right? Very nice.

     {
         writefln("READ[",line,"]");
     }
     f.close();
     
     f.open(args[1],FileMode.In);
     foreach(size_t lineno, char[] line; f)

Neat.

     {
         writefln("READ[",lineno,"][",line,"]");
     }
     f.close();
     
     return 0;   
 }


I noted when testing this code, that it will only read the lines of a 
stream until an empty line is encountered. Is this indeed intended?

AEon

Mar 28 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"AEon" <aeon2001 lycos.de> wrote in message 
news:d290lj$1ukr$1 digitaldaemon.com...
 Trying to understand what you did, here. There seem to be several concepts 
 I am still missing...

 import std.c.stdlib;
 import std.stream;
 import std.stdio;

 class LineReader(Source) : Source

 You seem to be "shadowing" some parent class called Source?

The class is templatized. It is a way of subclassing any stream subclass. I 
think it would also work to do
class LineReader(Source : Stream) : Source
to force the class Source to be a Stream or Stream subclass.

 {
     int opApply(int delegate(inout char[]) dg)

 Alas I still have no idea what "delegate" does, and why it needs to be 
 used?

opApply is used to implement 'foreach' in classes. See 
http://www.digitalmars.com/d/statement.html#foreach
Also for info about delegate see http://www.digitalmars.com/d/function.html

     {
         int result = 0;
         char[] line;

         while(!eof())
         {
             line = readLine();

 How come readLine() knows of the stream?

It subclasses Stream.

             if (!line) break;

 "if line == null" then break... no idea what this is good for.

I think this isn't needed. I think it probably is why blank lines stop the 
foreach.

             result = dg(line);
             if (result) break;

 Don't understand these lines either.

This is part of the foreach magic.

 Can it be that you are filling up a "buffer" with all the lines of the 
 stream, until you reach an empty line, to let foreach then scan that 
 "buffer" like it does for any other array? If so that could possibly use 
 up a lot of RAM?!

         }
        return result;
     }
     int opApply(int delegate(inout size_t, inout char[]) dg)
     {       int result = 0;
         size_t lineno;

 Why did you use size_t for lineno, would int not also work? (I tested this 
 and it works fine to replace all size_t with int).

On a 32-bit machine size_t is uint; on a 64-bit machine it is ulong.

Mar 28 2005
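
The "foreach magic" can be made concrete with a small sketch, assuming current D2 syntax (what the 2005 code spells inout is written ref today). The compiler turns the body of the foreach into a delegate and passes it to opApply; opApply calls it once per element, and a non-zero return value means the body executed break or return, so opApply must stop and pass that value on. Nothing is buffered: each element is handed to the body as soon as it is produced. Countdown and collectUntil are made-up names.

```d
// Hypothetical type, used only to show the opApply protocol.
struct Countdown
{
    int start;

    // foreach hands its loop body to opApply as the delegate `dg`.
    int opApply(scope int delegate(ref int) dg)
    {
        for (int i = start; i > 0; --i)
        {
            int current = i;
            // Run the loop body once for this element.
            int result = dg(current);
            // Non-zero means the body left the loop (break, return, goto).
            if (result) return result;
        }
        return 0;
    }
}

// Collects values from Countdown(start) until `stopAt` is reached.
int[] collectUntil(int start, int stopAt)
{
    int[] seen;
    foreach (int n; Countdown(start))
    {
        if (n == stopAt) break;
        seen ~= n;
    }
    return seen;
}

void main()
{
    assert(collectUntil(5, 2) == [5, 4, 3]);
}
```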

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 28 Mar 2005 13:43:08 -0500, Ben Hinkle <bhinkle mathworks.com>  
wrote:
 "AEon" <aeon2001 lycos.de> wrote in message
 news:d290lj$1ukr$1 digitaldaemon.com...
 Trying to understand what you did, here. There seem to be several  
 concepts
 I am still missing...

 class LineReader(Source) : Source

 You seem to be "shadowing" some parent class called Source?

 The class is templatized. It is a way of subclassing any stream  
 subclass. I
 think it would also work to do
 class LineReader(Source : Stream) : Source
 to force the class Source to be a Stream or Stream subclass.

Good point, that is probably more correct.

             if (!line) break;

 "if line == null" then break... no idea what this is good for.

 I think this isn't needed. I think it probably is why blank lines stop  
 the
 foreach.

I think readLine is broken. It needs to return "" and not null.
The difference being that "" has a non null "line.ptr" and "line is null"  
is not true.

Regan

Mar 28 2005

Derek Parnell <derek psych.ward> writes:

On Tue, 29 Mar 2005 10:39:49 +1200, Regan Heath wrote:


[snip]

 
 I think readLine is broken. It needs to return "" and not null.
 The difference being that "" has a non null "line.ptr" and "line is null"  
 is not true.

I've mentioned this before. D can not guarantee that a coder will always be
able to distinguish between an empty line and an uninitialized line. I
believe the two are distinct and useful idioms, and I know that it is
theoretically possible, but sometimes when you pass a "", it gets received
as null; however not in all situations. :-(

-- 
Derek Parnell
Melbourne, Australia
http://www.dsource.org/projects/build v1.16 released
29/03/2005 9:24:10 AM

Mar 28 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

             if (!line) break;

 "if line == null" then break... no idea what this is good for.

 I think this isn't needed. I think it probably is why blank lines stop 
 the
 foreach.

 I think readLine is broken. It needs to return "" and not null.
 The difference being that "" has a non null "line.ptr" and "line is null" 
 is not true.

IMO the right way to check if a string is empty is asking if the length is 
0. Setting an array's length to 0 automatically sets the ptr to null. So 
relying on any specific behavior of the ptr of a 0 length array is dangerous 
at best (since it would rely on always slicing to resize). For example the 
statement
  str.length = str.length;
does nothing if length > 0 and sets the ptr to null if length == 0.
One can argue about D's behavior about nulling the ptr but that's the 
current situation. Perhaps it should be illegal to implicitly cast a dynamic 
array to a ptr.

Mar 28 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 28 Mar 2005 19:05:39 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
             if (!line) break;

 "if line == null" then break... no idea what this is good for.

 I think this isn't needed. I think it probably is why blank lines stop
 the
 foreach.

 I think readLine is broken. It needs to return "" and not null.
 The difference being that "" has a non null "line.ptr" and "line is  
 null"
 is not true.

 IMO the right way to check if a string is empty is asking if the length  
 is 0.

No. You cannot tell empty from null with length, eg.

char[] isnull = null;
char[] isempty = "";

assert(isnull.length == 0);
assert(isempty.length == 0);

compile, run, no asserts.

 Setting an array's length to 0 automatically sets the ptr to null. So
 relying on any specific behavior of the ptr of a 0 length array is  
 dangerous at best (since it would rely on always slicing to resize).

I agree. I currently use "is" or "===" to tell them apart. eg.

char[] isnull = null;
char[] isempty = "";

assert(isnull === null);
assert(isempty !== null);

I, at first, suspected the behaviour above to be a side effect of D's  
behaviour of appending \0 to hard-coded/static strings (thus ptr cannot be  
null for ""). If this behaviour were removed ptr would have 'nothing' to  
point at. However...

char[] isempty;
char[] test;

test.length = 3;
test[0] = 'a';
test[1] = 'b';
test[2] = 'c';
	
isempty = test[0..0];
	
assert(isempty.length == 0);
assert(isempty !== null);

it appears not, but, as you mention:

 For example the statement
   str.length = str.length;
 does nothing if length > 0 and sets the ptr to null if length == 0.

isempty.length = isempty.length;
	
assert(isempty.length == 0);
assert(isempty !== null);

asserts on the 2nd assert statement as it has set the ptr to null.

 One can argue about D's behavior about nulling the ptr but that's the
 current situation.

Indeed. Setting length to 0, should IMO create an empty string, not  
un-assign or free the string. Setting the reference to null should  
un-assign or free the string.

To be honest I don't really care what it does *so long as* I can tell an  
empty string (array assigned to something with length 0) apart from one  
that does not exist (unassigned array, init to null).

The simple fact of the matter being that in some situations these two  
things need to be treated differently.

In some cases an AA and the "in" operator can be used as a workaround, as  
"in" checks for existence. I didn't think of this idea immediately  
(someone else suggested it). It would be nice if the functionality was  
more immediately apparent.

To clarify I don't want to make it harder to treat them the same, which  
you can currently do with "if (length == 0)" I just want a guaranteed  
method of telling them apart.

 Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.

If the array ptr is null the result will be null, right? I don't see a  
problem with this.

Regan

Mar 28 2005
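
The two tests argued over above can be put side by side. This is a sketch assuming a current D2 compiler, where the === and !== operators used in this thread are written is and !is; it does not cover what assigning to .length does to the pointer, which may differ from the 2005 behaviour described here. isNullString and isEmptyString are made-up helper names.

```d
// Returns true if `s` is the null array (no pointer, no length).
bool isNullString(string s) { return s is null; }

// Returns true if `s` has no characters, whether it is null or not.
bool isEmptyString(string s) { return s.length == 0; }

void main()
{
    string isnull = null;
    string isempty = "";

    // Both count as empty by length, and they compare equal by content.
    assert(isEmptyString(isnull));
    assert(isEmptyString(isempty));
    assert(isnull == isempty);

    // Identity ("is") looks at pointer and length, so it tells them apart.
    assert(isNullString(isnull));
    assert(!isNullString(isempty));

    // A zero-length slice of existing data keeps a non-null pointer.
    string text = "abc";
    string slice = text[0 .. 0];
    assert(isEmptyString(slice));
    assert(!isNullString(slice));
}
```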

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsodiv9b023k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 19:05:39 -0500, Ben Hinkle <ben.hinkle gmail.com> 
 wrote:
             if (!line) break;

 "if line == null" then break... no idea what this is good for.

 I think this isn't needed. I think it probably is why blank lines stop
 the
 foreach.

 I think readLine is broken. It needs to return "" and not null.
 The difference being that "" has a non null "line.ptr" and "line is 
 null"
 is not true.

 IMO the right way to check if a string is empty is asking if the length 
 is 0.

 No. You cannot tell empty from null with length, eg.

 char[] isnull = null;
 char[] isempty = "";

 assert(isnull.length == 0);
 assert(isempty.length == 0);

 compile, run, no asserts.

uhh - I think we have different definitions of the word "empty". I take it 
you define empty to be non-null ptr and 0 length, correct? I take empty to 
mean anything that compares as equal to "". In D, length == 0 is equivalent to 
== "":
 str.length == 0 iff str == ""
That is why I consider testing length to be the simplest/fastest way to test 
for "empty". For example
int main() {
  char[] x;
  x = new char[5];
  assert(x != "");
  assert(x.length != 0);

  x = x[0..0];
  assert(x == "");
  assert(x.length == 0);

  char[] y = "";
  assert(y == "");
  assert(y.length == 0);

  char[] z = null;
  assert(z == "");
  assert(z.length == 0);

  return 0;
}


 Setting an array's length to 0 automatically sets the ptr to null. So
 relying on any specific behavior of the ptr of a 0 length array is 
 dangerous at best (since it would rely on always slicing to resize).

 I agree. I currently use "is" or "===" to tell them apart. eg.

 char[] isnull = null;
 char[] isempty = "";

 assert(isnull === null);
 assert(isempty !== null);

 I, at first, suspected the behaviour above to be a side effect of D's 
 behaviour of appending \0 to hard-coded/static strings (thus ptr cannot be 
 null for ""). If this behaviour were removed ptr would have 'nothing' to 
 point at. However...

 char[] isempty;
 char[] test;

 test.length = 3;
 test[0] = 'a';
 test[1] = 'b';
 test[2] = 'c';

 isempty = test[0..0];

 assert(isempty.length == 0);
 assert(isempty !== null);

 it appears not, but, as you mention:

It is also true that
char[] isempty = "";
char[] isempty2 = test[0..0];
assert( isempty !== isempty2);

 For example the statement
   str.length = str.length;
 does nothing if length > 0 and sets the ptr to null if length == 0.

 isempty.length = isempty.length;

 assert(isempty.length == 0);
 assert(isempty !== null);

 asserts on the 2nd assert statement as it has set the ptr to null.

 One can argue about D's behavior about nulling the ptr but that's the
 current situation.

 Indeed. Setting length to 0, should IMO create an empty string, not 
 un-assign or free the string. Setting the reference to null should 
 un-assign or free the string.

 To be honest I don't really care what it does *so long as* I can tell an 
 empty string (array assigned to something with length 0) apart from one 
 that does not exist (unassigned array, init to null).

ah - here I can see what empty means to you. It is true our definitions of 
"empty" differ.

 The simple fact of the matter being that in some situations these two 
 things need to be treated differently.

That's what "is" and !== are for. But those are rare occasions I would bet.

 In some cases an AA and the "in" operator can be used as a workaround, as 
 "in" checks for existence. I didn't think of this idea immediately 
 (someone else suggested it). It would be nice if the functionality was 
 more immediately apparent.

 To clarify I don't want to make it harder to treat them the same, which 
 you can currently do with "if (length == 0)" I just want a guaranteed 
 method of telling them apart.

 Perhaps it should be illegal to implicitly cast a dynamic array to a ptr.

 If the array ptr is null the result will be null, right? I don't see a 
 problem with this.

I was suggesting making it illegal so that casually testing !line would be 
illegal. Instead it would have to be !line.ptr which makes it more obvious 
what is actually being tested (ie - the length is ignored and just the ptr 
is checked)

By the way, when would you like readLine to return a null string as opposed 
to a non-null zero-length string?

Mar 28 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

"empty" - "Holding or containing nothing."

In my mind something is "empty" if it:
   a. contains nothing.
   b. exists.

It cannot be "empty" if it contains something.
It cannot be "empty" if it does not exist.

So, my first question. How do I represent "non-existent" in D?

Some abstract ideas/thoughts. A pointer/reference/handle/whatever is a  
construct which we use to access some data. This construct IMO needs the  
ability to (1) indicate the (non-)existence of the data (2) give us access  
to the data.

In C I would use a pointer eg.

char *ptr = NULL;
ptr = NULL;  //no value exists
ptr = "";    //value exists, it is empty.

The humble pointer can indicate that no data exists, by pointing at NULL  
(which is defined to be an invalid address for data). The pointer can  
indicate the existing data by pointing at its address. The data it points  
to may be empty if it "contains nothing" (what that means depends on the  
data itself).

D's char[] is a reference not a pointer. A reference should be able to  
represent 1 & 2 above but its implementation in D blurs the distinction  
between "non-existent" and "existing but empty" due to its relationship  
with null and its behaviour when setting length to 0.

In short:
- A char[] should not go from "empty" to "non-existent" without being  
explicitly assigned to "non-existent" (AKA null).
- "empty" (AKA "") should not compare equal to "non-existent" (AKA null).

It appears to me that the only reliable way in D to indicate "non-existent"  
is to throw an exception. Perhaps this is acceptable, perhaps  
it's the D way and I simply have to get used to it.

<snip>

 Perhaps it should be illegal to implicitly cast a dynamic array to a  
 ptr.

 If the array ptr is null the result will be null, right? I don't see a
 problem with this.

 I was suggesting making it illegal so that casually testing !line would  
 be illegal. Instead it would have to be !line.ptr which makes it more  
 obvious what is actually being tested (ie - the length is ignored and  
 just the ptr is checked)

I don't think this is necessary.

 By the way, when would you like readLine to return a null string as  
 opposed to a non-null zero-length string?

At the end of file.

readLine() - null means no lines "exist".
readLine() - "" means a line "exists" but is "empty" of chars.

Regan

Mar 29 2005
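
One way to get the "no line exists" versus "a line exists but is empty" distinction without relying on a null array pointer, and without throwing, is to return an explicit optional value. The sketch below assumes current D2 and Phobos, where std.typecons.Nullable is available; it is not something proposed in this thread, and LineSource is a made-up stand-in for a real stream.

```d
import std.typecons : Nullable;

// Hypothetical reader over an in-memory list of lines; a real one would
// read from a file or stream instead.
struct LineSource
{
    string[] pending;

    // Returns the next line, or a null Nullable once nothing is left.
    Nullable!string next()
    {
        if (pending.length == 0)
            return Nullable!string.init;      // no line exists
        string line = pending[0];
        pending = pending[1 .. $];
        return Nullable!string(line);         // a line exists, possibly ""
    }
}

void main()
{
    auto src = LineSource(["first", "", "last"]);
    string[] got;
    for (auto line = src.next(); !line.isNull; line = src.next())
        got ~= line.get;

    // The blank line in the middle does not end the loop.
    assert(got == ["first", "", "last"]);
    assert(src.next().isNull);
}
```

The existence flag is stored separately from the string, so later changes to the string's length or pointer cannot turn "empty" into "non-existent".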

Derek Parnell <derek psych.ward> writes:

On Tue, 29 Mar 2005 22:47:53 +1200, Regan Heath wrote:

 readLine() - null means no lines "exist".
 readLine() - "" means a line "exists" but is "empty" of chars.

All of this is well said and presented. I'm in total agreement with this
point of view. 

An empty string is a string that is empty. 

-- 
Derek Parnell
Melbourne, Australia
29/03/2005 9:03:46 PM

Mar 29 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com> 
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

 "empty" - "Holding or containing nothing."

 In my mind something is "empty" if it:
   a. contains nothing.
   b. exists.

 It cannot be "empty" if it contains something.
 It cannot be "empty" if it does not exist.

 So, my first question. How do I represent "non-existent" in D?

What you describe is ok with me but I don't think it maps well to D's 
arrays. To me I don't really look at existence or non-existence but instead 
the following two rules
1) all arrays have a well-defined length
2) arrays with non-zero length have a well-defined pointer
One can tread carefully to preserve pointers with 0 length arrays but it 
takes effort.

 By the way, when would you like readLine to return a null string as 
 opposed to a non-null zero-length string?

 At the end of file.

 readLine() - null means no lines "exist".
 readLine() - "" means a line "exists" but is "empty" of chars.

The foreach will stop automatically at eof. It's like a foreach stopping at 
the end of an array when it has no more elements. It doesn't run once more 
with null - it just stops.

Mar 29 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 29 Mar 2005 08:29:36 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

 "empty" - "Holding or containing nothing."

 In my mind something is "empty" if it:
   a. contains nothing.
   b. exists.

 It cannot be "empty" if it contains something.
 It cannot be "empty" if it does not exist.

 So, my first question. How do I represent "non-existent" in D?

 What you describe is ok with me but I don't think it maps well to D's
 arrays.

Exactly my point. It would only take a few small changes to "fix" the  
problem as I see it.

 To me I don't really look at existence or non-existence but instead the  
 following two rules
 1) all arrays have a well-defined length
 2) arrays with non-zero length have a well-defined pointer
 One can tread carefully to preserve pointers with 0 length arrays but it
 takes effort.

Indeed. So, how do you handle existence/non-existence?

 By the way, when would you like readLine to return a null string as
 opposed to a non-null zero-length string?

 At the end of file.

 readLine() - null means no lines "exist".
 readLine() - "" means a line "exists" but is "empty" of chars.

 The foreach will stop automatically at eof. It's like a foreach stopping  
 at the end of an array when it has no more elements. It doesn't run once  
 more with null - it just stops.

Which foreach? My one? Assume now that I remove the eof() check. What  
happens now?

Regan

Mar 29 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Regan Heath" <regan netwin.co.nz> wrote in message 
news:opsoe3wkh323k2f5 nrage.netwin.co.nz...
 On Tue, 29 Mar 2005 08:29:36 -0500, Ben Hinkle <ben.hinkle gmail.com> 
 wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

 "empty" - "Holding or containing nothing."

 In my mind something is "empty" if it:
   a. contains nothing.
   b. exists.

 It cannot be "empty" if it contains something.
 It cannot be "empty" if it does not exist.

 So, my first question. How do I represent "non-existent" in D?

 What you describe is ok with me but I don't think it maps well to D's
 arrays.

 Exactly my point. It would only take a few small changes to "fix" the 
 problem as I see it.

Java arrays have the semantics you describe. They distinguish between 
null/empty/non-empty and none compare as equal to the others. In fact even 
trying to compare a null array throws an exception much like trying to call 
opEquals on a null object reference throws an exception. It's a very 
reasonable thing to do. The main trouble with Java array semantics is that 
APIs wind up choosing between null and empty fairly randomly and so many 
Java array bugs are introduced by guessing some function returns "empty" 
when it in fact returns null. It's easier to focus instead on only 
distinguishing empty/non-empty, which is what D does. One can think up APIs 
where having a third, null, choice would be useful but almost all the time 
the practical uses of an array are covered by empty/non-empty.

 The foreach will stop automatically at eof. It's like a foreach stopping 
 at the end of an array when it has no more elements. It doesn't run once 
 more with null - it just stops.

 Which foreach? My one? Assume now that I remove the eof() check. What 
 happens now?

It would iterate forever just like any loop that doesn't have an ending 
condition.

Mar 29 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 29 Mar 2005 19:17:55 -0500, Ben Hinkle <ben.hinkle gmail.com>  
wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opsoe3wkh323k2f5 nrage.netwin.co.nz...
 On Tue, 29 Mar 2005 08:29:36 -0500, Ben Hinkle <ben.hinkle gmail.com>
 wrote:
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opsoeax3jt23k2f5 nrage.netwin.co.nz...
 On Mon, 28 Mar 2005 21:13:54 -0500, Ben Hinkle <ben.hinkle gmail.com>
 wrote:
 I take it you define empty to be non-null ptr and 0 length, correct?

 "empty" - "Holding or containing nothing."

 In my mind something is "empty" if it:
   a. contains nothing.
   b. exists.

 It cannot be "empty" if it contains something.
 It cannot be "empty" if it does not exist.

 So, my first question. How do I represent "non-existent" in D?

 What you describe is ok with me but I don't think it maps well to D's
 arrays.

 Exactly my point. It would only take a few small changes to "fix" the
 problem as I see it.

 Java arrays have the semantics you describe. They distinguish between
 null/empty/non-empty and none compare as equal to the others. In fact  
 even
 trying to compare a null array throws an exception much like trying to  
 call
 opEquals on a null object reference throws an exception. It's a very
 reasonable thing to do.

Ok.

 The main trouble with Java array semantics is that
 APIs wind up choosing between null and empty fairly randomly and so many
 Java array bugs are introduced by guessing some function returns "empty"
 when it in fact returns null.

I can see how, if the situation does not call for a distinction between
"exists but is empty" and "does not exist", the programmer may choose
either "" or null to indicate no value. The choice will likely be based on
their personal preference and/or "fear of null" (a phenomenon I have
encountered before).

I don't see this possibility as being a good reason to limit flexibility  
in this way.

 It's easier to focus instead on only
 distinguishing empty/non-empty, which is what D does.

You mean, limit flexibility for the sake of simplicity. I don't like it.

 One can think up APIs where having a third, null, choice would be useful  
 but almost all the time the practical uses of an array are covered by  
 empty/non-empty.

I think it depends on style and the sort of code you write as to whether  
the situations where a null choice is "required"* are common or not.  
Personally I come across them often. I also believe that some people just  
don't see the need for a distinction, i.e. the current readLine  
implementation.

*(required is perhaps the wrong word; you can probably work around most
situations, but the workaround generally is just that, and sub-optimal)

 The foreach will stop automatically at eof. It's like a foreach  
 stopping
 at the end of an array when it has no more elements. It doesn't run  
 once
 more with null - it just stops.

 Which foreach? My one? Assume now that I remove the eof() check. What
 happens now?

 It would iterate forever

Not if readLine were implemented the way I assumed it would have been.

 just like any loop that doesn't have an ending
 condition.

Bollocks. :)
The ending condition is readLine() returning null (indicating no more  
lines "exist").

Regan

Mar 29 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 28 Mar 2005 15:25:57 +0200, AEon <aeon2001 lycos.de> wrote:
 Trying to understand what you did, here. There seem to be several  
 concepts I am still missing...

Ben has done a fairly good job of explaining it. I'll have a go too, the  
combination of our efforts will hopefully explain "everything". :)

 import std.c.stdlib;
 import std.stream;
 import std.stdio;

 class LineReader(Source) : Source

 You seem to be "shadowing" some parent class called Source?

This technique is called a "Snap-On". I am creating a new template class  
"LineReader" which is a child class of an unspecified (at this stage)  
class.

Later when I say: "LineReader!(BufferedFile) f;"

it specifies that "Source" is "BufferedFile".
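
A minimal sketch of the same pattern without any stream classes, in case it helps to see it in isolation. The names Base, Loud and shout are made up for illustration, and the syntax assumes a current D compiler:

```d
// Hypothetical stand-in for the parent class (BufferedFile in the real code).
class Base
{
    string name() { return "base"; }
}

// Template class that derives from whatever type is passed as Source.
class Loud(Source) : Source
{
    // Uses name() inherited from Source, the same way LineReader
    // uses readLine() inherited from BufferedFile.
    string shout() { return name() ~ "!"; }
}
```

Writing "new Loud!(Base)()" then gives an object that is a Base and also has shout().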

 {
     int opApply(int delegate(inout char[]) dg)

 Alas I still have no idea what "delegate" does, and why it needs to be  
 used?

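A delegate is like a function pointer that also carries a context pointer.
It can refer to a (non-static) class member function bound to a particular
object, or to a nested function together with its enclosing stack frame.
So calling it is like calling a class member on a specific object.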

In this case the delegate is part of the "magic" that makes foreach work  
on a custom class like LineReader.
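
A small sketch of that "magic" on its own, assuming a current D compiler (where the delegate parameter is written "ref" rather than the "inout" used in this thread). Countdown and sumDown are hypothetical names used only for illustration:

```d
// Hypothetical container used only to show the opApply protocol.
struct Countdown
{
    int start;

    // foreach passes the loop body in as the delegate dg.
    int opApply(scope int delegate(ref int) dg)
    {
        int result = 0;
        for (int i = start; i > 0; i--)
        {
            result = dg(i);      // run the loop body once with this element
            if (result) break;   // non-zero means the body left the loop early
        }
        return result;
    }
}

// Sums the values visited by foreach, to show the delegate being invoked.
int sumDown(int start)
{
    int total = 0;
    foreach (int n; Countdown(start))
        total += n;
    return total;
}
```

The body of the foreach in sumDown is what arrives in opApply as dg, which is why opApply has to pass a non-zero result straight back to its caller.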

     {
         int result = 0;
         char[] line;

         while(!eof())
         {
             line = readLine();

 How come readLine() knows of the stream?

Because LineReader is a child class of BufferedFile, which is a stream.  
The readLine call above calls the readLine of the parent class  
BufferedFile.

             if (!line) break;

 "if line == null" then break... no idea what this is good for.

I was trying to stop at the end of the file, but it appears this also stops
on blank lines. IMO readLine is broken: it is returning null for a blank
line when it should return "".

The difference between null and "" in the case of char[] is that null has  
a null .ptr and "is null" is true, so...

if (!line.ptr) break;
if (line is null) break;

statements should only fire when line is null and not "". But it appears  
readLine does not differentiate between null and "".
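
To make the distinction concrete, here is a sketch assuming a current D compiler, where the literal "" has a non-null pointer. The helper names isNull and isEmpty are made up for illustration:

```d
// Returns true only for a null array, not for an empty-but-non-null one.
bool isNull(const(char)[] s)
{
    return s is null;
}

// Returns true for both null and "", since both have length 0.
bool isEmpty(const(char)[] s)
{
    return s.length == 0;
}
```

So a length test cannot tell the two cases apart, while "is null" can, provided the function producing the array preserves the difference.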

             result = dg(line);
             if (result) break;

 Don't understand these lines either.

As Ben said, it's part of the foreach "magic", his links should explain  
it. If not, let us know how the docs are deficient and hopefully someone  
can improve them.

 Can it be that you are filling up a "buffer" with all the lines of the  
 stream, until you reach an empty line, to let foreach then scan that  
 "buffer" like it does for any other array? If so that could possibly use  
 up a lot of RAM?!

No. I am reading one line at a time. When I call the delegate I am  
effectively executing the body of the foreach statement with the line I  
pass. Then I discard the line and read the next one. So only 1 line is in  
memory at a time.
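
For comparison, a sketch of the same one-line-at-a-time idea using std.stdio.File.byLine from current Phobos. This is not the std.stream API used in this thread, and countLines is a hypothetical helper:

```d
import std.stdio : File;

// Counts lines by streaming the file; byLine reuses one buffer per line,
// so the whole file is never held in memory.
size_t countLines(string path)
{
    size_t n = 0;
    foreach (line; File(path, "r").byLine)
        ++n;   // blank lines are counted too: they arrive as zero-length slices
    return n;
}
```

Here the end of the range ends the loop, so a blank line cannot be mistaken for end of file.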

         }
         return result;
     }

     int opApply(int delegate(inout size_t, inout char[]) dg)
     {
         int result = 0;
         size_t lineno;

 Why did you use size_t for lineno, would int now also work? (I tested  
 this and it works fine to replace all size_t with int).

As Ben mentioned, size_t is either a 32 or 64 bit type depending on the  
underlying OS/processor. I believe the idea is that using it chooses the  
most "sensible" type for holding "size" values on the current OS/processor.

         char[] line;

         for(lineno = 1; !eof(); lineno++)
         {
             line = readLine();
             if (!line) break;
             result = dg(lineno,line);
             if (result) break;
         }

         return result;
     }
 }

 AFAICT you defined 2 "structures" that will let the user use foreach on  
 "f.open" streams. One version that will "just" read lines another that  
 will also let you retrieve the line numbers as well.

Not 2 structures in the sense of D structs but 2 methods allowing foreach  
on my new class LineReader, which extends BufferedFile (by adding the  
foreach ability).

 int main(char[][] args)
 {

     LineReader!(BufferedFile) f;
     f = new LineReader!(BufferedFile)();

 Can be reduced to:

      LineReader!(BufferedFile) f = new LineReader!(BufferedFile)();

 making the equivalent coding to

      File f = new File();

 more obvious.

You could, I have chosen not to allocate the class till after my error  
checking, but then I could have moved "LineReader!(BufferedFile) f;" to  
after the error checking also.. I guess I'm used to C. :)

 IOW you seem to have defined a new stream?

Yes. I have extended/added foreach-ability to any Stream class.

     if (args.length < 2) usage();

     f.open(args[1],FileMode.In);
     foreach(char[] line; f)

 Is this default behavior? I.e. that foreach can parse streams? AFAICT
 this is the new speciality of your stream, right? Very nice.

It's a new speciality of my stream. I think we should add it to Streams
though.
In addition we could add

foreach(char c; f) {}

to read characters one at a time.

     {
         writefln("READ[",line,"]");
     }
     f.close();

     f.open(args[1],FileMode.In);
     foreach(size_t lineno, char[] line; f)

 Neat.

     {
         writefln("READ[",lineno,"][",line,"]");
     }
     f.close();

     return 0;
 }


 I noted when testing this code, that it will only read the lines of a  
 stream until an empty line is encountered. Is this indeed intended?

No it was not intended. IMO readLine is broken.

Regan

Mar 28 2005

AEon <aeon2001 lycos.de> writes:

Regan Heath wrote (Ben read your feedback as well thanx):

     {
         int result = 0;
         char[] line;
          while(!eof())
         {
             line = readLine();


 How come readLine() knows of the stream?

 
 Because LineReader is a child class of BufferedFile, which is a stream.  
 The readLine call above calls the readLine of the parent class  
 BufferedFile.

Ah... that is one of the things I really hate as an OOP beginner. It is
very difficult to check where the heck certain "behavior" comes from. If
the programmer is indeed fully aware of the parent classes, that may
be clearer, but when I only see the "new" code, I find it very
confusing. I am not even sure one *can* look up the original definition
of the parent classes?

             result = dg(line);
             if (result) break;

 Don't understand these lines either.

 
 As Ben said, it's part of the foreach "magic", his links should explain  
 it. If not, let us know how the docs are deficient and hopefully 
 someone  can improve them.

Reminds me that I don't actually understand D, and that I only use
certain code snippets all over the place so far. :)


 Why did you use size_t for lineno, would int now also work? (I tested  
 this and it works fine to replace all size_t with int).

 
 As Ben mentioned, size_t is either a 32 or 64 bit type depending on the  
 underlying OS/processor. I believe the idea is that using it chooses 
 the  most "sensible" type for holding "size" values on the current 
 OS/processor.

Aha... IIRC there was something like that in ANSI C as well... I never 
trusted it ;)... so size_t is something like a special optimization 
case. I.e. when do you decide to use good old int, and when do you feel 
size_t would be a better choice?


 IOW you seem to have defined a new stream?

 
 Yes. I have extended/added foreach-ability to any Stream class.

Neat indeed.



BTW, I decided to go the simple way:

File lg = open_Read_Log( glb.log );
File mg = open_Write_Mlg( metafile );

File open_Read_Log( char[] logfile )
{
    char[13] warn = "open_Read_Log";

    if( ! std.file.exists(logfile) )
    {
        Err(warn, "Can't open *read* your log file... '"~logfile~"'",
            "Ensure log file exists and double check path!");
        exit(1);
    }

    // Define/create "handle" for logfile READ
    File lg = new File( logfile, FileMode.In );
    // If logfile open error: "Error: file '...' not found"
    return lg;
}

etc.

What surprised me in open_Read_Log(), when comparing it to my ANSI C code:

     if (fgets(line, M2AXCHR, link)==NULL){
       if(ferror(link)!=0){ puts("Error during log read..."); exit(1); }
       clearerr(link); break;}

You can check for file existence.

But you do not seem to be able to handle
     "new File( logfile, FileMode.In )"
errors... i.e. if something goes wrong, D will exit with an internal Error
message.

Presumably one could "catch" such errors to provide own error messages?


Same seems to be the case with

     while( ! lg.eof() )	
     {
	line = lg.readLine();
     }

Should a readLine() error occur, then D throws an internal Error message.

I am not sure I *really* want to catch errors, should this be possible 
in the above 2 cases. But maybe that could be useful?


AEon

Mar 28 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 29 Mar 2005 05:35:07 +0200, AEon <aeon2001 lycos.de> wrote:
 Regan Heath wrote (Ben read your feedback as well thanx):

     {
         int result = 0;
         char[] line;
          while(!eof())
         {
             line = readLine();


 How come readLine() knows of the stream?

  Because LineReader is a child class of BufferedFile, which is a  
 stream.  The readLine call above calls the readLine of the parent  
 class  BufferedFile.

 Ah... that is one of the things I really hate as an OOP beginner. It is
 very difficult to check where the heck certain "behavior" comes from. If
 the programmer is indeed fully aware of the parent classes, that may
 be clearer, but when I only see the "new" code, I find it very
 confusing. I am not even sure one *can* look up the original definition
 of the parent classes?

In this case you can look in dmd\src\phobos\std\stream.d for the class  
definition of BufferedFile.

You may be interested in an old thread on method name resolution:
   http://www.digitalmars.com/d/archives/digitalmars/D/6928.html

It's kinda involved but relevant to your comments above as the method name  
resolution affects the behaviour of a derived class. The idea being D's  
method name resolution makes it simpler/explicit WRT the behaviour of  
classes with overloaded methods.

             result = dg(line);
             if (result) break;

 Don't understand these lines either.

  As Ben said, it's part of the foreach "magic", his links should  
 explain  it. If not, let us know how the docs are deficient and  
 hopefully someone  can improve them.

 Reminds me that I don't actually understand D, and that I only use
 certain code snippets all over the place so far. :)

I wouldn't worry overmuch. I still find it hard to remember how to code  
things like opApply, I copy/paste from the docs and then modify each time  
I do it.

 Why did you use size_t for lineno, would int now also work? (I tested   
 this and it works fine to replace all size_t with int).

  As Ben mentioned, size_t is either a 32 or 64 bit type depending on  
 the  underlying OS/processor. I believe the idea is that using it  
 chooses the  most "sensible" type for holding "size" values on the  
 current OS/processor.

 Aha... IIRC there was something like that in ANSI C as well... I never  
 trusted it ;)... so size_t is something like a special optimization  
 case. I.e. when do you decide to use good old int, and when do you feel  
 size_t would be a better choice?

Good question. I would use 'int' when the size of the type is important,  
i.e. I need 32 bits. I would use size_t when the size is unimportant, so  
long as it is "big enough".
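
A quick sketch of the difference, assuming a current D compiler. The function countChars is a made-up example:

```d
// size_t is an alias for an unsigned integer as wide as a pointer.
static assert(size_t.sizeof == (void*).sizeof);

// int is always 32 bits in D, whatever the platform.
static assert(int.sizeof == 4);

// Array lengths are reported as size_t, so it is the natural type for counts.
size_t countChars(const(char)[] s)
{
    return s.length;
}
```

A line counter like lineno fits the second case: only "big enough" matters, so size_t is a reasonable choice.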

 But you do not seem to be able to handle
      "new File( logfile, FileMode.In )"
 errors... i.e. if something goes wrong, D will exit with an internal Error
 message.

 Presumably one could "catch" such errors to provide own error messages?

Yes.

try {
   File f = new File(logfile, FileMode.In);
}
catch (OpenException e) {
   writefln("OPEN ERROR - ",e);
}

 Same seems to be the case with

      while( ! lg.eof() )	
      {
 	line = lg.readLine();
      }

 Should a readLine() error occur, then D throws an internal Error message.

try {
   while( ! lg.eof() )	
   {
     line = lg.readLine();
   }
}
catch (ReadException e) {
   writefln("READ ERROR - ",e);
}

 I am not sure I *really* want to catch errors, should this be possible  
 in the above 2 cases. But maybe that could be useful?

Exceptions are the recommended error handling mechanism for D. The  
argument/confusion centers around what is worthy of an exception and what  
is not.

For example IMO in the code above not being able to open a file is  
exceptional (you have assumed it exists by opening in FileMode.In), but,  
reaching the end of the file is not exceptional as it's guaranteed to  
happen eventually.

Uncaught exceptions are automatically handled by the default handler. For
trivial applications, allowing it to handle your exceptions (like the
failure to open a file) might be exactly what you want. It's your choice.
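
For readers on a current compiler, here is a sketch of the same idea with std.stdio.File, which is a different type from the std.stream File used above and reports a failed open with ErrnoException. The helper canOpen is hypothetical:

```d
import std.exception : ErrnoException;
import std.stdio : File, stderr;

// Tries to open path for reading; reports failure instead of letting the
// default handler terminate the program.
bool canOpen(string path)
{
    try
    {
        auto f = File(path, "r");   // throws ErrnoException if the open fails
        return true;                // f is closed when it goes out of scope
    }
    catch (ErrnoException e)
    {
        stderr.writeln("OPEN ERROR - ", e.msg);
        return false;
    }
}
```

Whether to catch here or let the exception propagate is the same judgment call described above.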

Regan

Mar 29 2005

AEon <aeon2001 lycos.de> writes:

Regan Heath wrote:

 But you do not seem to be able to handle
      "new File( logfile, FileMode.In )"
 errors... i.e. if something goes wrong, D will exit with an internal
 Error message.

 Presumably one could "catch" such errors to provide own error messages?

 
 Yes.
 
 try {
   File f = new File(logfile, FileMode.In);
 }
 catch (OpenException e) {
   writefln("OPEN ERROR - ",e);
 }

Have read several examples by now. Is there a complete list of catch
"keywords"? The D documentation mentions a few, but probably not all?

e.g.	catch (ArrayBoundsError)
	catch (Object o)
	catch (std.asserterror.AssertError ae)

 Same seems to be the case with

      while( ! lg.eof() )   
      {
     line = lg.readLine();
      }

 Should a readLine() error occur, then D throws an internal Error message.

 
 try {
   while( ! lg.eof() )   
   {
     line = lg.readLine();
   }
 }
 catch (ReadException e) {
   writefln("READ ERROR - ",e);
 }

Ahh... info like that could be helpful in the official docs.


 I am not sure I *really* want to catch errors, should this be 
 possible  in the above 2 cases. But maybe that could be useful?

 
 Exceptions are the recommended error handling mechanism for D. The  
 argument/confusion centers around what is worthy of an exception and 
 what is not.
 
 For example IMO in the code above not being able to open a file is  
 exceptional (you have assumed it exists by opening in FileMode.In), 
 but,  reaching the end of the file is not exceptional as it's guaranteed 
 to  happen eventually.
 
 Uncaught exceptions are automatically handled by the default handler, 
 for  trivial applications allowing it to handle your exceptions (like 
 the  failure to open a file) might be exactly what you want. It's your 
 choice.

Well in the above examples it would basically just give me the chance to 
write out my own messages. But since these cases are serious, there is 
nothing much one could save.

AEon

Mar 29 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 29 Mar 2005 19:33:30 +0200, AEon <aeon2001 lycos.de> wrote:
 But you do not seem to be able to handle
      "new File( logfile, FileMode.In )"
 errors... i.e. if something goes wrong, D will exit with an internal
 Error message.

 Presumably one could "catch" such errors to provide own error messages?

  Yes.
  try {
   File f = new File(logfile, FileMode.In);
 }
 catch (OpenException e) {
   writefln("OPEN ERROR - ",e);
 }

 Have read several examples by now. Is there a complete list of catch
 "keywords"? The D documentation mentions a few, but probably not all?

 e.g.	catch (ArrayBoundsError)
 	catch (Object o)
 	catch (std.asserterror.AssertError ae)

Each "catch keyword" is a class derived from the Exception or Error  
classes. They are defined in the modules that use them. I agree it would  
be nice to have a complete list. Eventually I can imagine a documentation  
generator listing all the exceptions that can be thrown by a function.

Regan

Mar 29 2005
