digitalmars.D.learn - Converting from std.file.read's void[]


Jonathan M Davis <jmdavisProg gmx.com> writes:

Okay, it seems that the way to read in a binary file is to use std.file.read() 
which reads in the file as a void[]. This immediately raises the question as to 
how to convert the void[] into something useful. It seems to me that casting 
void[]  to a ubyte[] is then the appropriate thing to do because then you can 
properly index it and grab the appropriate bytes that need to be converted into 
useful values. However, that still raises the question of how to get anything 
useful out of the bytes. UTF-8 strings are easy because char is the same size as 
ubyte. Casting to char[] for the portion of the data that you want as a string 
seems to work just fine. But what about other types? Is it the correct thing to 
cast to T[] where T is whatever type the data represents and then index into it 
to get the values that you want of that type and then cast the next section of 
the data to U[] where U is the type for the next section of the data, etc.? Or 
is there a better way to handle this?

- Jonathan M Davis

Sep 21 2010

bearophile <bearophileHUGS lycos.com> writes:

Jonathan M Davis:

 UTF-8 strings are easy because they're the same size as ubytes.
 Casting to char[] for the portion of the data that you want as a string 
 seems to work just fine.

D2 strings are immutable(char)[], not char[].
Strings are expected to be valid UTF-8, while the raw bytes you read from a file
may contain anything, so in some situations you need to check them with
std.utf.validate.
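As a rough illustration of that check, the sketch below treats bytes read with std.file.read as UTF-8 only after std.utf.validate has accepted them. The helper name isValidUtf8 and the file name utf8_demo.bin are made up for this example.

```d
import std.file : read, remove, write;
import std.utf : UTFException, validate;

// Hypothetical helper: true if the raw bytes form valid UTF-8.
bool isValidUtf8(const(void)[] raw)
{
    try
    {
        // validate throws UTFException on malformed input.
        validate(cast(const(char)[]) raw);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    enum path = "utf8_demo.bin";   // made-up file name
    write(path, "h\u00e9llo");
    scope(exit) remove(path);

    void[] raw = read(path);       // read() hands back untyped bytes
    assert(isValidUtf8(raw));
    auto text = cast(char[]) raw;  // reasonable only after validation
    assert(text == "h\u00e9llo");

    // A lone 0xFF byte never occurs in valid UTF-8.
    ubyte[] bad = [0xFF];
    assert(!isValidUtf8(bad));
}
```

If the result has to be an actual string rather than char[], the validated array would still need .idup or a similar step.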


 But what about other types? Is it the correct thing to 
 cast to T[] where T is whatever type the data represents and then index into it 
 to get the values that you want of that type and then cast the next section of 
 the data to U[] where U is the type for the next section of the data, etc.? Or 
 is there a better way to handle this?

It's better to avoid casts when possible, and SafeD may even restrict their
usage.
Take a look at the rawWrite/rawRead methods of std.stdio.File.
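A minimal sketch of that approach, assuming a made-up file layout of one int count followed by that many doubles: each section is read into a buffer of the matching type, so no cast from void[] is needed. The function names and the file name are hypothetical, and rawRead/rawWrite use the machine's native byte order, so this only suits files written on a compatible machine.

```d
import std.file : remove;
import std.stdio : File;

// Hypothetical layout: an int count, then that many doubles.
void writeCounted(string path, const(double)[] values)
{
    auto f = File(path, "wb");
    int[1] count = [cast(int) values.length];
    f.rawWrite(count[]);
    f.rawWrite(values);
}   // f is closed when it goes out of scope

double[] readCounted(string path)
{
    auto f = File(path, "rb");
    int[1] count;
    f.rawRead(count[]);            // first section: one int
    if (count[0] == 0)
        return null;
    auto values = new double[count[0]];
    // rawRead returns the slice of the buffer that was actually filled.
    return f.rawRead(values);      // second section: the doubles
}

void main()
{
    enum path = "mixed_demo.raw";  // made-up file name
    writeCounted(path, [0.5, 1.5, 2.5]);
    scope(exit) remove(path);
    assert(readCounted(path) == [0.5, 1.5, 2.5]);
}
```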

Bye,
bearophile

Sep 21 2010

bearophile <bearophileHUGS lycos.com> writes:

 Take a look at the rawWrite/rawRead methods of std.stdio.File.

I have just tried those a little. Python's file objects don't have an eof()
method. This D2 program shows that eof() is false even when the whole file has
been read. Is this correct?


import std.stdio: File;
void main() {
    double[3] data = [0.5, 1.5, 2.5];
    auto f = File("test.raw", "wb");
    f.rawWrite(data);
    f.close();
    f = File("test.raw", "rb");
    assert(!f.eof());
    f.rawRead(data);
    assert(f.eof()); // Assertion failure
}

Bye,
bearophile

Sep 21 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, September 21, 2010 17:34:26 bearophile wrote:
 Take a look at the rawWrite/rawRead methods of std.stdio.File.

 
 I have just tried those a little. Python's file objects don't have an eof()
 method. This D2 program shows that eof() is false even when the whole file
 has been read. Is this correct?
 
 
 import std.stdio: File;
 void main() {
     double[3] data = [0.5, 1.5, 2.5];
     auto f = File("test.raw", "wb");
     f.rawWrite(data);
     f.close();
     f = File("test.raw", "rb");
     assert(!f.eof());
     f.rawRead(data);
     assert(f.eof()); // Assertion failure
 }
 
 Bye,
 bearophile

I believe that the typical behaviour in C and C++ is that eof() is false until 
you've tried to read beyond the end of the file. So, you get one more read than 
you might expect. You do the read, and then check eof() rather than checking 
eof() and then doing the read if it isn't true.
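A sketch of that read-then-check pattern with rawRead, assuming a file that holds nothing but doubles. The loop keeps reading until rawRead returns an empty slice, and only then is eof() true. The function name readAllDoubles and the file name are made up for this example.

```d
import std.file : remove;
import std.stdio : File;

// Hypothetical helper: reads every double from a raw binary file.
double[] readAllDoubles(string path)
{
    auto f = File(path, "rb");
    double[2] buffer;
    double[] values;
    for (;;)
    {
        // rawRead returns the part of the buffer that was actually filled.
        auto got = f.rawRead(buffer[]);
        if (got.length == 0)
            break;         // the read ran into the end of the file
        values ~= got;     // appending copies the data out of the buffer
    }
    assert(f.eof());       // set only after a read has hit the end
    return values;
}

void main()
{
    enum path = "eof_demo.raw";   // made-up file name
    double[3] data = [0.5, 1.5, 2.5];
    auto f = File(path, "wb");
    f.rawWrite(data[]);
    f.close();
    scope(exit) remove(path);

    assert(readAllDoubles(path) == [0.5, 1.5, 2.5]);
}
```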

- Jonathan M Davis

Sep 21 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, September 21, 2010 16:41:57 bearophile wrote:
 Jonathan M Davis:
 UTF-8 strings are easy because they're the same size as ubytes.
 Casting to char[] for the portion of the data that you want as a string
 seems to work just fine.

 
 D2 strings are immutable(char)[] and not char[].
 Strings are UTF-8, while the raw bytes you read from a file may contain
 everything, so in some situations you need to use the validate function.

Well, yes. I was talking about strings in the general sense (though UTF-8 
strings), not necessarily the specific type string. The fact that you can cast to 
char[] makes getting strings easy, while the correct way to deal with types 
which aren't bytes isn't as obvious.

 
 But what about other types? Is it the correct thing to
 cast to T[] where T is whatever type the data represents and then index
 into it to get the values that you want of that type and then cast the
 next section of the data to U[] where U is the type for the next section
 of the data, etc.? Or is there a better way to handle this?

 
 It's better to avoid casts when possible, and SafeD may even restrict
 their usage. Take a look at the rawWrite/rawRead methods of
 std.stdio.File.

That does look like a better way to handle it. Thanks. Normally, I don't mess 
with binary files, so I'm not particularly well-versed in the correct ways to 
read them.

- Jonathan M Davis

Sep 21 2010

Kagamin <spam here.lot> writes:

Jonathan M Davis Wrote:

 Okay, it seems that the way to read in a binary file is to use std.file.read() 
 which reads in the file as a void[]. This immediately raises the question as to 
 how to convert the void[] into something useful.

You may like .NET's BinaryReader interface:
http://msdn.microsoft.com/en-us/library/system.io.binaryreader_members.aspx

Sep 21 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 21 Sep 2010 19:06:43 -0400, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 Okay, it seems that the way to read in a binary file is to use std.file.read()
 which reads in the file as a void[]. This immediately raises the question as to
 how to convert the void[] into something useful. It seems to me that casting
 void[] to a ubyte[] is then the appropriate thing to do because then you can
 properly index it and grab the appropriate bytes that need to be converted into
 useful values. However, that still raises the question of how to get anything
 useful out of the bytes. UTF-8 strings are easy because they're the same size as
 ubytes. Casting to char[] for the portion of the data that you want as a string
 seems to work just fine. But what about other types? Is it the correct thing to
 cast to T[] where T is whatever type the data represents and then index into it
 to get the values that you want of that type and then cast the next section of
 the data to U[] where U is the type for the next section of the data, etc.? Or
 is there a better way to handle this?

You can slice void arrays, even though you cannot index them.  If you know  
for instance that a struct S resides at byte offset 15, you can do:

(cast(S[])arr[15 .. 15 + S.sizeof])[0];

or:

*cast(S*)(arr.ptr + 15);

There are various ways to get the data.  Only if you know the data is an  
*array* of a certain type is it useful to cast the entire array.
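As a small worked example of pulling one struct out of a void[], the sketch below builds a buffer with 15 padding bytes followed by one S, then extracts it two ways: by casting a slice of exactly S.sizeof bytes, and by copying the bytes into a local S, which avoids reading through an unaligned pointer. The layout of S and the helper readAt are assumptions for illustration only.

```d
// Hypothetical record layout, used only for this illustration.
struct S
{
    uint id;
    ushort flags;
}

// Hypothetical helper: copies S.sizeof bytes at `offset` into a fresh S.
S readAt(const(void)[] arr, size_t offset)
{
    S result;
    auto src = cast(const(ubyte)[]) arr[offset .. offset + S.sizeof];
    (cast(ubyte*) &result)[0 .. S.sizeof] = src[];
    return result;
}

void main()
{
    // Stand-in for the result of std.file.read(): 15 bytes, then one S.
    S original = S(42, 7);
    ubyte[] bytes = new ubyte[15];
    bytes ~= (cast(ubyte*) &original)[0 .. S.sizeof];
    void[] arr = bytes;

    // Slice exactly S.sizeof bytes so the length divides evenly for the cast.
    // The resulting pointer is unaligned, which x86 tolerates.
    S viaSlice = (cast(S[]) arr[15 .. 15 + S.sizeof])[0];
    S viaCopy = readAt(arr, 15);

    assert(viaSlice.id == 42 && viaSlice.flags == 7);
    assert(viaCopy.id == 42 && viaCopy.flags == 7);
}
```

Both forms assume the file was written with the same struct layout and byte order as the reading program.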

-Steve

Sep 22 2010
