digitalmars.D.learn - Converting from std.file.read's void[]

reply Jonathan M Davis <jmdavisProg gmx.com> writes:
Okay, it seems that the way to read in a binary file is to use
std.file.read(), which reads in the file as a void[]. This immediately raises
the question of how to convert the void[] into something useful. It seems to
me that casting void[] to a ubyte[] is then the appropriate thing to do,
because then you can properly index it and grab the bytes that need to be
converted into useful values. However, that still raises the question of how
to get anything useful out of the bytes. UTF-8 strings are easy because their
code units are the same size as ubytes. Casting to char[] for the portion of
the data that you want as a string seems to work just fine. But what about
other types? Is it the correct thing to cast to T[], where T is whatever type
the data represents, index into it to get the values that you want of that
type, and then cast the next section of the data to U[], where U is the type
for the next section of the data, etc.? Or is there a better way to handle
this?

- Jonathan M Davis
Sep 21 2010
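For reference, the cast-per-section approach described in the question can be sketched as follows. This is only an illustration under stated assumptions: the file name, the layout (a 4-byte text tag followed by uint values) and the helper name sectionAsUints are all hypothetical, and the numbers are stored in the machine's native byte order.

```d
import std.file : append, read, remove, write;

/// Hypothetical helper: views the bytes in [start, end) of `raw` as uints.
/// The array cast fails at run time if the slice length is not a multiple
/// of uint.sizeof.
uint[] sectionAsUints(void[] raw, size_t start, size_t end)
{
    return cast(uint[]) raw[start .. end];
}

void main()
{
    enum path = "layout.bin";        // hypothetical file name
    uint[] numbers = [10, 20];
    write(path, "DATA");             // 4 bytes of UTF-8 text
    append(path, numbers);           // 8 bytes: two uints, native byte order
    scope (exit) remove(path);

    void[] raw = read(path);
    auto bytes = cast(ubyte[]) raw;          // indexable view, same length
    auto tag = cast(char[]) raw[0 .. 4];     // text section
    auto values = sectionAsUints(raw, 4, raw.length);

    assert(bytes.length == 12 && bytes[0] == 'D');
    assert(tag == "DATA");
    assert(values == [10u, 20u]);
}
```

The casts only reinterpret memory; they do not copy, convert byte order, or check that the bytes make sense for the target type.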
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 UTF-8 strings are easy because they're the same size as ubytes.
 Casting to char[] for the portion of the data that you want as a string 
 seems to work just fine.
D2 strings are immutable(char)[], not char[]. Strings are UTF-8, while the raw bytes you read from a file may contain anything, so in some situations you need to use the validate function.
 But what about other types? Is it the correct thing to
 cast to T[] where T is whatever type the data represents and then index
 into it to get the values that you want of that type and then cast the
 next section of the data to U[] where U is the type for the next section
 of the data, etc.? Or is there a better way to handle this?
It's better to avoid casts when possible, and SafeD may even restrict their usage. Take a look at the rawWrite/rawRead methods of std.stdio.File.

Bye,
bearophile
Sep 21 2010
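The point about validation can be sketched like this. It assumes that the "validate function" mentioned above is std.utf.validate, which throws a UTFException on malformed input; the helper name toValidatedString is hypothetical.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical helper: turns raw bytes into a D2 string, rejecting
/// anything that is not well-formed UTF-8.
string toValidatedString(const(ubyte)[] bytes)
{
    auto text = cast(const(char)[]) bytes;   // reinterpret, no copy yet
    validate(text);                          // throws UTFException if malformed
    return text.idup;                        // immutable copy, i.e. a string
}

void main()
{
    const(ubyte)[] good = [0x44, 0x32];      // the text "D2"
    const(ubyte)[] bad = [0xC3, 0x28];       // 0xC3 must be followed by a continuation byte
    assert(toValidatedString(good) == "D2");
    assertThrown!UTFException(toValidatedString(bad));
}
```

The idup makes a copy so that the result really is immutable, independent of whatever later happens to the buffer that was read from the file.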
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
 Take a look at the rawWrite/rawRead methods of std.stdio.File.
I have just tried those a little. Python's file object doesn't have an eof() method. This D2 program shows that eof() is false even when the whole file has been read. Is this correct?

    import std.stdio: File;

    void main() {
        double[3] data = [0.5, 1.5, 2.5];
        auto f = File("test.raw", "wb");
        f.rawWrite(data);
        f.close();
        f = File("test.raw", "rb");
        assert(!f.eof());
        f.rawRead(data);
        assert(f.eof()); // Assertion failure
    }

Bye,
bearophile
Sep 21 2010
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, September 21, 2010 17:34:26 bearophile wrote:
 Take a look at the rawWrite/rawRead methods of std.stdio.File.
 I have just tried those a little. Python file object doesn't have a eof()
 method. This D2 program shows that eof() is false even when the whole file
 has being read, is this correct?

     import std.stdio: File;

     void main() {
         double[3] data = [0.5, 1.5, 2.5];
         auto f = File("test.raw", "wb");
         f.rawWrite(data);
         f.close();
         f = File("test.raw", "rb");
         assert(!f.eof());
         f.rawRead(data);
         assert(f.eof()); // Assertion failure
     }

 Bye, bearophile
I believe that the typical behaviour in C and C++ is that eof() is false until you've tried to read beyond the end of the file. So you get one more read than you might expect. You do the read and then check eof(), rather than checking eof() first and doing the read only if it is false.

- Jonathan M Davis
Sep 21 2010
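The read-then-check pattern described above can be sketched as a loop that relies on the slice returned by rawRead instead of asking eof() up front. The function name readAllDoubles and the chunk size are hypothetical, and the sketch assumes the file holds doubles written with rawWrite on the same machine.

```d
import std.file : remove;
import std.stdio : File;

/// Hypothetical helper: reads every double in `path`, `chunk` values at a time.
double[] readAllDoubles(string path, size_t chunk = 2)
{
    auto f = File(path, "rb");
    auto buffer = new double[chunk];
    double[] result;
    for (;;)
    {
        auto got = f.rawRead(buffer);   // slice of buffer that was actually filled
        result ~= got;
        if (got.length < buffer.length)
            break;                      // short read: the end of the file was hit
    }
    return result;
}

void main()
{
    enum path = "test.raw";
    double[] data = [0.5, 1.5, 2.5];
    auto f = File(path, "wb");
    f.rawWrite(data);
    f.close();
    scope (exit) remove(path);

    assert(readAllDoubles(path) == data);
}
```

When the file length is an exact multiple of the chunk size, the loop performs one extra read that returns an empty slice, which is the "one more read than you might expect" mentioned above.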
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, September 21, 2010 16:41:57 bearophile wrote:
 Jonathan M Davis:
 UTF-8 strings are easy because they're the same size as ubytes.
 Casting to char[] for the portion of the data that you want as a string
 seems to work just fine.
 D2 string are immutable(char)[] and not char[]. Strings are UTF-8, while the raw bytes you read from a file may contain everything, so in some situations you need to use the validate function.
Well, yes. I was talking about strings in the general sense (though UTF-8 strings), not necessarily the specific type string. The fact that you can cast to char[] makes getting strings easy, while the correct way to deal with types which aren't bytes isn't as obvious.
 
 But what about other types? Is it the correct thing to
 cast to T[] where T is whatever type the data represents and then index
 into it to get the values that you want of that type and then cast the
 next section of the data to U[] where U is the type for the next section
 of the data, etc.? Or is there a better way to handle this?
 It's better to avoid casts when possible, and SafeD may even be restrict their usage. Take a look at the rawWrite/rawRead methods of std.stdio.File.
That does look like a better way to handle it. Thanks. Normally, I don't mess with binary files, so I'm not particularly well-versed in the correct ways to read them.

- Jonathan M Davis
Sep 21 2010
prev sibling next sibling parent Kagamin <spam here.lot> writes:
Jonathan M Davis Wrote:

 Okay, it seems that the way to read in a binary file is to use
 std.file.read() which reads in the file as a void[]. This immediately
 raises the question as to how to convert the void[] into something useful.
You may like the BinaryReader interface http://msdn.microsoft.com/en-us/library/system.io.binaryreader_members.aspx
Sep 21 2010
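A BinaryReader-style sequential read can be approximated in D with std.bitmanip.read, which is available in current Phobos releases; it consumes bytes from the front of a range and decodes them with an explicit byte order. The Header layout and the name parseHeader are hypothetical, and little-endian storage is an assumption made for the example.

```d
import std.bitmanip : read;
import std.system : Endian;

/// Hypothetical on-disk header: a 2-byte version followed by a 4-byte count.
struct Header
{
    ushort ver;
    uint count;
}

/// Decodes a Header from the front of `data` and advances `data` past it.
Header parseHeader(ref ubyte[] data)
{
    Header h;
    h.ver = data.read!(ushort, Endian.littleEndian)();
    h.count = data.read!(uint, Endian.littleEndian)();
    return h;
}

void main()
{
    ubyte[] buffer = [0x02, 0x00, 0x03, 0x00, 0x00, 0x00, 0xFF];
    auto h = parseHeader(buffer);
    assert(h.ver == 2);
    assert(h.count == 3);
    assert(buffer.length == 1);   // six bytes consumed, one left over
}
```

Unlike a plain cast, this copies the bytes out, so it does not depend on the machine's byte order or on the data being suitably aligned.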
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 21 Sep 2010 19:06:43 -0400, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 Okay, it seems that the way to read in a binary file is to use
 std.file.read() which reads in the file as a void[]. This immediately
 raises the question as to how to convert the void[] into something useful.
 It seems to me that casting void[] to a ubyte[] is then the appropriate
 thing to do because then you can properly index it and grab the
 appropriate bytes that need to be converted into useful values. However,
 that still raises the question of how to get anything useful out of the
 bytes. UTF-8 strings are easy because they're the same size as ubytes.
 Casting to char[] for the portion of the data that you want as a string
 seems to work just fine. But what about other types? Is it the correct
 thing to cast to T[] where T is whatever type the data represents and then
 index into it to get the values that you want of that type and then cast
 the next section of the data to U[] where U is the type for the next
 section of the data, etc.? Or is there a better way to handle this?
You can slice void arrays, even though you cannot index them. If you know, for instance, that a struct S resides at byte offset 15, you can do:

    (cast(S[])arr[15..$])[0];

or:

    *cast(S*)(arr.ptr + 15);

(The parentheses matter: cast(S*)arr.ptr + 15 would advance by 15 * S.sizeof bytes rather than by 15 bytes.)

There are various ways to get the data. Only if you know the data is an *array* of a certain type is it useful to cast the entire array.

-Steve
Sep 22 2010
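The slicing approach from the last reply can be sketched as below. The struct layout and the helper names structAt and structAtCopy are hypothetical. The first helper slices exactly S.sizeof bytes before casting, because an array cast fails at run time when the byte length is not a multiple of the element size; it also assumes the platform tolerates an unaligned load at an odd offset such as 15. The second helper copies the bytes instead, which avoids the alignment assumption.

```d
/// Hypothetical 8-byte record stored somewhere inside the buffer.
struct S
{
    uint id;
    ushort flags;
    ushort pad;
}

/// Reinterprets the S.sizeof bytes starting at `offset` as an S (no copy
/// until the return). Assumes unaligned loads are acceptable on the target.
S structAt(const(void)[] arr, size_t offset)
{
    auto one = cast(const(S)[]) arr[offset .. offset + S.sizeof];
    return one[0];
}

/// Copies the bytes into a properly aligned S instead of casting in place.
S structAtCopy(const(void)[] arr, size_t offset)
{
    S result;
    auto src = cast(const(ubyte)[]) arr[offset .. offset + S.sizeof];
    (cast(ubyte*) &result)[0 .. S.sizeof] = src[];
    return result;
}

void main()
{
    auto data = new ubyte[15 + S.sizeof];
    auto original = S(7, 3, 0);
    data[15 .. $] = (cast(ubyte*) &original)[0 .. S.sizeof];

    assert(structAt(data, 15) == original);
    assert(structAtCopy(data, 15) == original);
}
```

Both helpers rely on the slice bounds check to reject an offset that would run past the end of the buffer.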