digitalmars.D.learn - A bit of binary I/O

reply Heinz <billgates microsoft.com> writes:
Hi guys, I'm having great fun writing and reading binary files. It's my first
time doing this and I've got a few questions in mind.
I write the same data (1 ulong and 1 string; I call them primitives) in 3
different ways and I get a different output for one of them. I create 1 file
per method. If you open the created files with a hex editor you can see this.

The first way is to write primitives manually one by one:

// primitive way
ulong i = 9;
char[] s = "hello world";
myFile.writeExact(&i, i.sizeof);
myFile.writeExact(&s, s.sizeof);

Reading data:
// Is done by reading each primitive.
ulong i2; char[] s2;
myFile.readExact(&i2, i2.sizeof);
myFile.readExact(&s2, s2.sizeof);



The second way is to write a structure with all the primitives as members:

// struct way
struct t
{
	ulong i;
	char[] s;
}

t mt;
mt.i = 9;
mt.s = "hello world";
myFile.writeExact(&mt, mt.sizeof);

Reading data:
// We read the entire struct.
t mt2;
myFile.readExact(&mt2, mt2.sizeof);



And the third way is to write a class with all the primitives as members:

// class way
class tt
{
	ulong i;
	char[] s;
}

tt mtt = new tt();
mtt.i = 9;
mtt.s = "hello world";
ResFile.writeExact(&mtt, mtt.sizeof);

Reading data:
// We read the entire class.
tt mtt2;
myFile.readExact(&mtt2, mtt2.sizeof);



All of these methods work perfectly. I'm able to retrieve values from all of
them. Now let's look at the outputs:

// Primitive

09 00 00 00 00 00 00 00 0B 00 00 00 A0 C7 41 00

// Structure

09 00 00 00 00 00 00 00 0B 00 00 00 A0 C7 41 00

// Class

C0 3F 91 00

My questions are:

1) What's the best method to write data (in terms of data protection/encryption
against reverse engineering)? The class way seems to me at first look the most secure way.
2) Which method is the fastest at retrieving data?
3) How the hell does this work? I mean, the string s is 11 chars long but the
first 2 methods use only 8 bytes to store the string and most of them are 0.
Even more interesting, look at the class method, it uses only 4 bytes to store
about 19 bytes of real data! WTF.
I'm really ?

This is a very interesting subject to me and if someone could clear my mind I
would appreciate it very much.

Thank you very much in advance.

Heinz
Jan 20 2007
next sibling parent reply "Frank Benoit (keinfarbton)" <benoit tionex.removethispart.de> writes:
 // Primitive
 09 00 00 00 00 00 00 00 0B 00 00 00 A0 C7 41 00

09 00 00 00 00 00 00 00 // the ulong with value 9
0B 00 00 00 // array size 11
A0 C7 41 00 // pointer value to the start of the data
 // Structure
 09 00 00 00 00 00 00 00 0B 00 00 00 A0 C7 41 00

same here
 // Class
 C0 3F 91 00

These 4 bytes are not the contents of your class: mtt is a reference, so &mtt points at the reference and mtt.sizeof is the size of the reference, not the size of the object itself.

The same applies to the string. s.ptr is the pointer to the array data. &s is the address of the struct that holds the array length and the pointer to the data. To write the string, you might want to try this:

myFile.writeExact( s.ptr, s.length );
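As a rough sketch of the suggestion above (write the characters the array points at, not the array's length/pointer pair): the snippet below assumes a current D compiler and uses std.stdio's File.rawWrite and File.rawRead instead of the std.stream calls used in this thread. The helper names writeString and readString, the 8-byte length prefix and the file name are made up for illustration.

```d
import std.file : remove;
import std.stdio : File, writeln;

// Hypothetical helper: store the length, then the characters themselves.
void writeString(ref File f, const(char)[] s)
{
    ulong len = s.length;        // fixed-width length prefix (an assumption)
    f.rawWrite((&len)[0 .. 1]);  // 8 bytes: the length
    f.rawWrite(s);               // s.length bytes: the actual text
}

// Hypothetical helper: read the length back, then that many characters.
char[] readString(ref File f)
{
    ulong[1] len;
    f.rawRead(len[]);
    auto buf = new char[cast(size_t) len[0]];
    if (buf.length)              // rawRead rejects an empty buffer
        f.rawRead(buf);
    return buf;
}

void main()
{
    enum path = "binary_io_demo.bin"; // hypothetical file name

    auto f = File(path, "wb");
    writeString(f, "hello world");
    f.close();

    f = File(path, "rb");
    auto s2 = readString(f);
    f.close();
    remove(path);

    assert(s2 == "hello world");
    writeln(s2);
}
```

Because the file now holds the characters rather than an address, it can be read back by a different run of the program. The length is stored in the machine's native byte order, so this sketch is not portable across architectures.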
Jan 20 2007
parent reply Heinz <billgates microsoft.com> writes:
I get it, but if I'm actually writing the address of my data and not the data itself, then why am I able to retrieve the data even if it's not there?
Jan 20 2007
next sibling parent "Frank Benoit (keinfarbton)" <benoit tionex.removethispart.de> writes:
 I get it, but if i'm actually writing the address of my data and not the data
itself then why i'm able to retrieve the data even if it's not there? 

Hehe, this works because the string is still in memory. When you read the pointer address back from the file, you overwrite the data pointer of the other array with it, so s2 now points to the data of s. If you do the read in a second program run, it will probably not work.
Jan 20 2007
prev sibling parent Heinz <billgates microsoft.com> writes:
I think I'm getting it: the data retrieved are addresses of the start of the data, but in my RAM, so if I take this file to another computer the data retrieved should be different, am I right? To solve this and write the real data you suggest using .ptr. Is this property available in every object?

I'm sorry to bother you so much Frank, but I'm interested in your opinion about the other 2 questions. Really, thanks man, you rule.
Jan 20 2007
prev sibling next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Heinz" <billgates microsoft.com> wrote in message 
news:eou69k$8tf$1 digitaldaemon.com...

 The first way is to write primitives manually one by one:

 // primitive way
 ulong i = 9;
 char[] s = "hello world";
 myFile.writeExact(&i, i.sizeof);
 myFile.writeExact(&s, s.sizeof);

 Reading data:
 // Is done by reading each primitive.
 ulong i2; char[] s2;
 myFile.readExact(&i2, i2.sizeof);
 myFile.readExact(&s2, s2.sizeof);

You're writing the string wrong. All you're doing is writing the length and pointer of the array data, without actually writing the data. The Stream class (and by extension, the File class) provides functions for writing out every basic type:

ulong i = 9;
char[] s = "hello world";
myFile.write(i);
myFile.write(s);
...
ulong i2; char[] s2;
myFile.read(i2);
myFile.read(s2);

 The second way is to write a structure with all the primitives as members:

 // struct way
 struct t
 {
 ulong i;
 char[] s;
 }

 t mt;
 mt.i = 9;
 mt.s = "hello world";
 myFile.writeExact(&mt, mt.sizeof);

 Reading data:
 // We read the entire struct.
 t mt2;
 myFile.readExact(&mt2, mt2.sizeof);

Again, you're just writing out the array reference without writing its contents. You have to write out each member individually. If there were no reference types in the struct, this would work fine.
 And the third way is to write a class with all the primitives as members:

 // class way
 class tt
 {
 ulong i;
 char[] s;
 }

 tt mtt = new tt();
 mtt.i = 9;
 mtt.s = "hello world";
 ResFile.writeExact(&mtt, mtt.sizeof);

 Reading data:
 // We read the entire class.
 tt mtt2;
 myFile.readExact(&mtt2, mtt2.sizeof);

This is incorrect, and is only working because of how you've written your program. You're not writing the data out at all, you're writing a class reference. The 00913FC0 is just the memory address of the class instance that mtt points to, and when you read that address back in, you're just looking at the data in memory. This program wouldn't work if you wrote the file, exited, then had another program that read the data. You'd end up with a memory access violation, and none of the data in the class is actually written out.

If you want to write a class out to a file, a common way is to have some kind of generic "serialize" and "unserialize" functions for the class:

class C
{
	ulong i;
	char[] s;

	// The parameter is named "stream" so that it does not shadow the member s.
	void serialize(Stream stream)
	{
		stream.write(i);
		stream.write(s);
	}

	static C unserialize(Stream stream)
	{
		C c = new C();
		stream.read(c.i);
		stream.read(c.s);
		return c;
	}
}

...

C c = new C();
c.i = 5;
c.s = "foo";
c.serialize(myFile);

...

C c = C.unserialize(myFile);

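For readers on a current D compiler, where the Stream class from this thread is (as far as I know) no longer shipped with the standard library, the same serialize/unserialize idea might look roughly like the sketch below. It swaps in std.stdio's File.rawWrite and File.rawRead; the class name Record, the method name deserialize, the on-disk layout and the file name are assumptions made for the example.

```d
import std.file : remove;
import std.stdio : File, writeln;

// Hypothetical class mirroring the one in the thread.
class Record
{
    ulong i;
    char[] s;

    // Write the values themselves: i, then the string length, then the characters.
    void serialize(ref File f)
    {
        ulong len = s.length;
        f.rawWrite((&i)[0 .. 1]);
        f.rawWrite((&len)[0 .. 1]);
        f.rawWrite(s);
    }

    // Rebuild a fresh object from the bytes; no addresses are read back.
    static Record deserialize(ref File f)
    {
        auto r = new Record;
        ulong[2] head;               // head[0] = i, head[1] = string length
        f.rawRead(head[]);
        r.i = head[0];
        r.s = new char[cast(size_t) head[1]];
        if (r.s.length)              // rawRead rejects an empty buffer
            f.rawRead(r.s);
        return r;
    }
}

void main()
{
    enum path = "record_demo.bin";   // hypothetical file name

    auto c = new Record;
    c.i = 5;
    c.s = "foo".dup;

    auto f = File(path, "wb");
    c.serialize(f);
    f.close();

    f = File(path, "rb");
    auto c2 = Record.deserialize(f);
    f.close();
    remove(path);

    assert(c2.i == 5 && c2.s == "foo");
    assert(c2 !is c);                // a new object, not the old reference
    writeln(c2.i, " ", c2.s);
}
```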
 All of these methods works perfect. I'm able to retrieve values from all 
 of them. Now lets check at the outputs:

 // Primitive

 09 00 00 00 00 00 00 00 0B 00 00 00 A0 C7 41 00

 // Structure

 09 00 00 00 00 00 00 00 0B 00 00 00 A0 C7 41 00

 // Class

 C0 3F 91 00

 My questions are:

 1) What's the best method to write data (in terms of data 
 protection/encryption against reversion). The class way seems to me at 
 first look the most secure way.

As explained before, the class method is wrong, and there is no encryption going on here. It's just a memory address, and you should never, ever write memory addresses to a file. That being said, the best way is probably to just use the primitive .read and .write methods of File. Just .. never, ever write pointers or references of any kind to a file.
 2) Wich method is the faster in retrieving data?

If you implement them correctly, all three sample programs should make the exact same output file using the same number of writes (and read it in the same number of reads), and so they are all the same in terms of performance.
Jan 20 2007
parent reply Heinz <billgates microsoft.com> writes:
Wow, that covers it all, thanks for your reply.

But can I still write an entire structure with writeExact()? Or do you suggest writing each member of the structure with write()?

Another question: does writing a char[] with write() write the string as ASCII? If so, it is a legible string; how can I protect that data?

Thanks man
Jan 20 2007
next sibling parent janderson <askme me.com> writes:
Heinz wrote:
 Another question: Writing a type char[] with write() writes string as ASCII?
 if so then is a legible string, how can i protect that data?

You have to use some form of encryption. XOR encryption is one of the simplest, although not the most secure. Here's a C doc about it: http://www.cprogramming.com/tutorial/xor.html. Maybe there's already an encryption library in D?

-Joel
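To make the XOR idea concrete, here is a minimal sketch in current D. The function name xorWithKey and the key are invented for the example, and, as noted above, this only obscures the data; it is not real security.

```d
import std.stdio : writeln;

// Hypothetical helper: XOR every byte with a repeating key.
// Applying it a second time with the same key restores the input.
ubyte[] xorWithKey(const(ubyte)[] data, const(ubyte)[] key)
{
    assert(key.length > 0, "key must not be empty");
    auto result = new ubyte[data.length];
    foreach (idx, b; data)
        result[idx] = cast(ubyte)(b ^ key[idx % key.length]);
    return result;
}

void main()
{
    auto key   = cast(const(ubyte)[]) "k3y";          // made-up key
    auto plain = cast(const(ubyte)[]) "hello world";

    auto hidden   = xorWithKey(plain, key);    // bytes you would write to the file
    auto restored = xorWithKey(hidden, key);   // bytes after reading them back

    assert(hidden != plain);
    assert(restored == plain);
    writeln(cast(const(char)[]) restored);
}
```

The scrambled bytes would be written to the file in place of the plain characters, and the same call undoes the scrambling after reading.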
Jan 20 2007
prev sibling next sibling parent Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Well technically it will write it as UTF8, which is as near to ASCII as makes no nevermind. If you don't want it readable (and this is a binary file anyway) you could just use some simple reversible encryption algorithm. Something like this, for a silly random example.

<code>
module silly;

import tango.io.Stdout;

struct SillyCrypt
{
	alias process opCall;

	static const CHUNK_SIZE = 32_U;
	static const ROT        = 16_U;
	static const XOR        = 24_U;

	// Split the input into chunks and scramble each one.
	static char[] process (char[] src)
	{
		char[] result;
		foreach (ch; chunks(src)) {
			result ~= mutate(ch);
		}
		return result;
	}

	private static char[][] chunks (char[] x)
	{
		char[]   source = x;
		char[][] result;
		while (source.length >= CHUNK_SIZE) {
			result ~= source[0 .. CHUNK_SIZE];
			source  = source[CHUNK_SIZE .. $];
		}
		if (source.length) {
			result ~= source;
		}
		return result;
	}

	// Rotate the chunk by ROT characters (if it is long enough), then XOR
	// each character. For a full 32-character chunk, applying this twice
	// restores the original.
	private static char[] mutate (char[] x)
	{
		char[] result;
		if (x.length > ROT) {
			result = x[ROT .. $] ~ x[0 .. ROT];
		}
		else {
			result = x.dup;
		}
		foreach (inout c; result) {
			c ^= XOR;
		}
		return result;
	}
}

const SOURCE = "I would say hello to you, but you couldn't read it even if I did."c;

void main ()
{
	auto enc = SillyCrypt(SOURCE);
	auto dec = SillyCrypt(enc);

	Stdout ("Source  -> "c)(SOURCE).newline()
	       ("Encrypt -> "c)(enc   ).newline()
	       ("Decrypt -> "c)(dec   ).newline()
	.flush;
}
</code>

The output when I tried it was this:
Source -> I would say hello to you, but you couldn't read it even if I did.
Encrypt -> w8lw8awm48zml8awQ8owmt|8kya8p}ttql8}n}v8q~8Q8|q|m8{wmt|v?l8j}y|86
Decrypt -> I would say hello to you, but you couldn't read it even if I did.

I know I don't personally know anyone who can read "w8lw8awm48zml8awQ8owmt|8kya8p}ttql8}n}v8q~8Q8|q|m8{wmt|v?l8j}y|86" at all. :)

-- Chris Nicholson-Sauls
Jan 20 2007
prev sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Heinz" <billgates microsoft.com> wrote in message 
news:eoualo$1rcv$1 digitaldaemon.com...
 Wow, that covers all, thanks for your reply.

 But, can i still write an entire structure with writeExact()? or you 
 suggest writing each member of the structure with write()?

Yeah, that's perfectly fine as long as the structure doesn't contain any reference members (pointers, class references, dynamic arrays). Binary files often have some kind of standard header which can be written or read in one big chunk, and that is possible to do with a structure. But if the structure contains any reference members, writing it out with writeExact will not work, and you'll have to write out the members manually.
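A small sketch of the "header in one chunk" case, assuming a current D compiler and std.stdio's File.rawWrite/rawRead in place of writeExact/readExact. The Header struct, its fields and the file name are hypothetical; what matters is that every member is a value type.

```d
import std.file : remove;
import std.stdio : File, writeln;

// Hypothetical file header: only value types, so its bytes are self-contained.
struct Header
{
    char[4] magic;       // fixed-size array, stored inline in the struct
    uint    version_;
    ulong   recordCount;
}

void main()
{
    enum path = "header_demo.bin";   // hypothetical file name

    Header h;
    h.magic = "DEMO";
    h.version_ = 1;
    h.recordCount = 9;

    // Write the whole struct as one block of Header.sizeof bytes.
    auto f = File(path, "wb");
    f.rawWrite((&h)[0 .. 1]);
    f.close();

    // Read it back as one block.
    Header[1] buf;
    f = File(path, "rb");
    f.rawRead(buf[]);
    f.close();
    remove(path);

    assert(buf[0] == h);
    writeln(buf[0].magic[], " v", buf[0].version_, " records=", buf[0].recordCount);
}
```

The bytes on disk follow the compiler's struct layout (padding, byte order), so a file written this way is only safely readable by a program built with the same layout.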
Jan 20 2007
prev sibling next sibling parent reply Heinz <billgates microsoft.com> writes:
In C++ you can write an entire structure to a binary file:

http://www.gamedev.net/reference/articles/article1127.asp
http://www.codersource.net/cpp_file_io_binary.html

Can you do the same in D?
Jan 20 2007
parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Heinz wrote:
 In C++ you can write an entire structure to a binary file:
 
 http://www.gamedev.net/reference/articles/article1127.asp
 http://www.codersource.net/cpp_file_io_binary.html
 
 Can you do the same in D?

Sure, and it will work between instances of the program so long as none of the structure's members are references: pointers, object variables, dynamic arrays.

-- Chris Nicholson-Sauls
Jan 20 2007
next sibling parent reply Heinz <billgates microsoft.com> writes:
So, you mean I can't have this structure because it has an array?

struct h { }

Could you post an example please?
Jan 20 2007
parent janderson <askme me.com> writes:
Right, because these arrays are essentially a pointer to data somewhere else; the data doesn't live in the same block of memory as the struct. To do it automatically you would need some form of metadata (which would identify pointers) or something like serialization (which handles each element on its own).

In D and C++ you can read a block like the one below in one go:

struct h
{
	int x;
	int y;
	char a;
	char b[100]; // Note: because the size is fixed, the array is included in this block.
};

However, you can't write something like this in D or C++:

struct h
{
	int x;
	int y;
	char a;
	char* b; // This points elsewhere in memory. You'd need to fix this pointer up when you read it in.
};

Since D dynamic arrays are really:

struct Darray
{
	size_t length;
	T* ptr; // Pointer to some location
};

you can't save these out inside a struct. You need to save the data it points to as well.

-Joel
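The difference can be checked directly with .sizeof. This is only an illustrative sketch for a current D compiler; the struct names Inline and Indirect are made up, and the printed sizes and address depend on the platform.

```d
import std.stdio : writeln;

// Fixed-size array: the 100 chars live inside the struct itself.
struct Inline   { int x; int y; char a; char[100] b; }

// Dynamic array: the struct only holds a length and a pointer.
struct Indirect { int x; int y; char a; char[] b; }

void main()
{
    Indirect s;
    s.b = "hello world".dup;

    // A dynamic array is exactly a length plus a pointer.
    static assert((char[]).sizeof == size_t.sizeof + (char*).sizeof);

    // The struct's size does not grow with the string it refers to.
    static assert(Indirect.sizeof < Inline.sizeof);

    writeln("Inline.sizeof   = ", Inline.sizeof);
    writeln("Indirect.sizeof = ", Indirect.sizeof);
    writeln("s.b.length      = ", s.b.length);
    writeln("s.b.ptr         = ", cast(void*) s.b.ptr);
}
```

Dumping Indirect.sizeof bytes to a file therefore captures the length and the address held in s.b, but none of the characters.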
Jan 20 2007
prev sibling parent reply Heinz <billgates microsoft.com> writes:
What about classes? Can they be written under the same rules?
Jan 20 2007
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Heinz" <billgates microsoft.com> wrote in message 
news:eoug19$22to$1 digitaldaemon.com...
 Chris Nicholson-Sauls Wrote:

 What about classes can they be written under the same rules?

Class instances == object variables. All instances of classes are references (pointers) implicitly.
Jan 20 2007
prev sibling parent reply "Frank Benoit (keinfarbton)" <benoit tionex.removethispart.de> writes:
You can take a look at the source of a serialisation library. E.g. see
this thread:
"serialization library" in the group D.announce on 8th Nov 2006
Jan 20 2007
parent "Christian Kamm" <kamm nospam.de> writes:
On Sun, 21 Jan 2007 03:15:38 +0100, Frank Benoit (keinfarbton)  
<benoit tionex.removethispart.de> wrote:

 You can take a look at the source of a serialisation library. E.g. see
 this thread:
 "serialization library" in the group D.announce on 8th Nov 2006

Up to date versions of that library are found at http://www.dsource.org/projects/serialization

Christian
Jan 25 2007