
digitalmars.D.learn - Binary I/O for Newbie

"tjb" <broughtj gmail.com> writes:
All,

I am just starting to learn D. I am an economist - not a
programmer, so I appreciate your patience with my lack of knowledge.

I have some financial data in a binary file that I would like to 
process. In C++ I have the data in a structure like this:

struct TaqIdx {
   char symbol[10];
   int tdate;
   int begrec;
   int endrec;
};

And I use ifstream::read() with a cast to read the data directly into the structure.
I'm struggling to get a handle on I/O in D. Can you give some
pointers? Thanks so much!

TJB
Feb 27 2012
"Robert Jacques" <sandford jhu.edu> writes:
On Mon, 27 Feb 2012 12:21:21 -0600, tjb <broughtj gmail.com> wrote:

 [...]
This is about the simplest way to read binary data in:

auto data = cast(TaqIdx[]) std.file.read(filename);
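A fuller sketch of the cast approach, under stated assumptions: the records are taken to be packed (22 bytes each, as in the C++ struct with __attribute__((packed)) that comes up later in the thread), and the helper name readAll, the demo file name and the sample symbols are hypothetical. The explicit length check turns a truncated file into a readable error instead of a failed array cast.

```d
import std.file : read, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;

struct TaqIdx
{
align(1):              // assumption: packed records, 10 + 4 + 4 + 4 = 22 bytes
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical helper: read the whole file and reinterpret it as records.
TaqIdx[] readAll(string filename)
{
    auto bytes = read(filename);               // void[] with the file contents
    if (bytes.length % TaqIdx.sizeof != 0)
        throw new Exception("file size is not a multiple of TaqIdx.sizeof");
    return cast(TaqIdx[]) bytes;               // no copy; length = bytes / 22
}

void main()
{
    // Write two hypothetical records so the demo has something to read.
    TaqIdx[2] recs;
    recs[0].symbol[] = ' ';
    recs[0].symbol[0 .. 3] = "IBM";
    recs[0].tdate = 20080801;
    recs[1].symbol[] = ' ';
    recs[1].symbol[0 .. 4] = "MSFT";
    recs[1].tdate = 20080804;

    auto path = buildPath(tempDir(), "taqidx_cast_demo.idx");
    write(path, recs[]);
    scope(exit) remove(path);

    foreach (r; readAll(path))
        writeln(r.symbol, " ", r.tdate);
}
```

Because the cast does not copy, the returned slice shares the memory that std.file.read allocated.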
Feb 27 2012
Justin Whear <justin economicmodeling.com> writes:
On Mon, 27 Feb 2012 19:21:21 +0100, tjb wrote:

 [...]
Check out std.stream (http://dlang.org/phobos/std_stream.html). I'd do something like this:

auto input = new File("somefile");
TaqIdx tag;
input.readExact(&tag, TaqIdx.sizeof);

If you get funky results, the file might be using a different endianness.
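std.stream has since been deprecated and removed from Phobos, so this sketch uses std.stdio.File.rawRead instead. It also shows one way to handle the endianness issue: core.bitop.bswap reverses the bytes of each int field. The swapBytes flag, the helper name readOne and the demo file are assumptions for illustration; whether a swap is needed depends on the machine that wrote the file.

```d
import core.bitop : bswap;
import std.file : remove, tempDir;
import std.path : buildPath;
import std.stdio : File, writeln;

struct TaqIdx
{
align(1):                 // assumption: packed 22-byte records
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical helper: read one record with File.rawRead. If swapBytes is
// true (file written with the opposite byte order), reverse each int field.
TaqIdx readOne(File f, bool swapBytes)
{
    TaqIdx[1] buf;
    if (f.rawRead(buf[]).length != 1)
        throw new Exception("unexpected end of file");
    TaqIdx r = buf[0];
    if (swapBytes)
    {
        r.tdate  = cast(int) bswap(cast(uint) r.tdate);
        r.begrec = cast(int) bswap(cast(uint) r.begrec);
        r.endrec = cast(int) bswap(cast(uint) r.endrec);
    }
    return r;
}

void main()
{
    auto path = buildPath(tempDir(), "taqidx_endian_demo.idx");
    TaqIdx[1] rec;
    rec[0].symbol[] = ' ';
    rec[0].symbol[0 .. 3] = "IBM";
    rec[0].tdate = 20080801;

    auto output = File(path, "wb");
    output.rawWrite(rec[]);
    output.close();
    scope(exit) remove(path);

    auto input = File(path, "rb");
    // This machine wrote the file, so no swapping is needed here.
    writeln(readOne(input, false).tdate);
}
```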
Feb 27 2012
Tobias Brandt <tob.brandt googlemail.com> writes:
Doesn't the struct alignment play a role here?
Feb 27 2012
Justin Whear <justin economicmodeling.com> writes:
On Mon, 27 Feb 2012 19:42:36 +0100, Tobias Brandt wrote:

 Doesn't the struct alignment play a role here?
Good point. If the data is packed, you can toss an align(1) on the front of the struct declaration.
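The effect of alignment can be checked at compile time with .sizeof and .offsetof. A minimal sketch, assuming the C layout (a 10-byte symbol followed by three 32-bit ints): by default D pads so that tdate starts on a 4-byte boundary, giving 24 bytes. According to the current D language spec, align(1) written in front of `struct` only sets the alignment of the struct as a whole; it is align(1): inside the body that removes the padding between members. The two struct names are hypothetical.

```d
import std.stdio : writefln;

// Default layout: 2 bytes of padding after symbol, so tdate is at offset 12.
struct TaqIdxPadded
{
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// align(1): inside the body applies to every member that follows it,
// matching a C/C++ struct declared with __attribute__((packed)).
struct TaqIdxPacked
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

static assert(TaqIdxPadded.sizeof == 24);
static assert(TaqIdxPadded.tdate.offsetof == 12);
static assert(TaqIdxPacked.sizeof == 22);
static assert(TaqIdxPacked.tdate.offsetof == 10);

void main()
{
    writefln("padded: %s bytes, packed: %s bytes",
             TaqIdxPadded.sizeof, TaqIdxPacked.sizeof);
}
```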
Feb 27 2012
"tjb" <broughtj gmail.com> writes:
On Monday, 27 February 2012 at 18:56:15 UTC, Justin Whear wrote:
 On Mon, 27 Feb 2012 19:42:36 +0100, Tobias Brandt wrote:

 Doesn't the struct alignment play a role here?
 Good point. If the data is packed, you can toss an align(1) on the front
 of the struct declaration.
So, something like this should work:

import std.stdio : writefln;
import std.stream;

struct TaqIdx {
   align(1):   // packs the members; align(1) in front of "struct" would not
   char[10] symbol;
   int tdate;
   int begrec;
   int endrec;
}
static assert(TaqIdx.sizeof == 22);   // 10 + 4 + 4 + 4 bytes, no padding

void main() {
   auto input = new File("T200808A.IDX");
   TaqIdx taq;
   input.readExact(&taq, TaqIdx.sizeof);
   writefln("%s %s %s %s", taq.symbol, taq.tdate, taq.begrec, taq.endrec);
}

Thanks so much!

TJB
Feb 27 2012
Tobias Brandt <tob.brandt googlemail.com> writes:
 So, something like this should work:
 [...]
It really depends on how you wrote the file originally. If you know that it is packed, i.e. 10+32+32+32=106 bytes per record, then yes. If you wrote to the file with a C++ program, then I guess the compiler aligned the data so that the whole struct is 128 bytes in size. Technically, the C++ compiler is allowed to do anything short of changing the order of the struct fields. You could just let your C++ program print out sizeof(TaqIdx) or manually divide the file size by the number of records (if you know it) to make sure.
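The file-size check can also be done from D. A sketch, assuming you know how many records the file should contain; the helper name recordSize and the command-line interface are hypothetical. A result of 22 suggests packed records, 24 the default padded layout.

```d
import std.conv : to;
import std.file : getSize;
import std.stdio : writefln;

// Hypothetical helper: bytes per record, given the file size and the
// number of records the file is expected to hold.
ulong recordSize(ulong fileSize, ulong expectedRecords)
{
    if (expectedRecords == 0 || fileSize % expectedRecords != 0)
        throw new Exception("file size is not a whole number of records");
    return fileSize / expectedRecords;
}

void main(string[] args)
{
    if (args.length != 3)
    {
        writefln("usage: %s FILE EXPECTED_RECORDS", args[0]);
        return;
    }
    immutable size = recordSize(getSize(args[1]), args[2].to!ulong);
    // 22 = packed TaqIdx (10 + 4 + 4 + 4); 24 = default padded layout.
    writefln("%s bytes per record", size);
}
```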
Feb 27 2012
Ali Çehreli <acehreli yahoo.com> writes:
On 02/27/2012 11:27 AM, Tobias Brandt wrote:
 So, something like this should work:
 [...]
 It really depends on how you wrote the file originally. If you know
 that it is packed, i.e. 10+32+32+32=106 bytes per record, then yes.
You meant 4 bytes per int. :)
 If you wrote to the file with a C++ program, then I guess the
 compiler aligned the data so that the whole struct is 128 bytes
 in size. Technically, the C++ compiler is allowed to do
 anything short of changing the order of the struct fields.
That is correct for non-POD types. The C++ compiler must treat POD structs essentially as if they are C structs.

Ali
Feb 27 2012
Tobias Brandt <tob.brandt googlemail.com> writes:
 It really depends on how you wrote the file originally. If you
 know that it is packed, i.e. 10+32+32+32=106 bytes per record,
 then yes.
 You meant 4 bytes per int. :)
Yep, good catch.
 If you wrote to the file with a C++ program, then I guess the
 compiler aligned the data so that the whole struct is 128 bytes
 in size. Technically, the C++ compiler is allowed to do
 anything short of changing the order of the struct fields.
 That is correct for non-POD types. The C++ compiler must treat
 POD structs essentially as if they are C structs.
Correct me if I'm wrong. But as far as I know the C standard also allows arbitrary alignment.
Feb 27 2012
Ali Çehreli <acehreli yahoo.com> writes:
On 02/27/2012 11:43 AM, Tobias Brandt wrote:

 If you wrote to the file with a C++ program, then I guess the
 compiler aligned the data so that the whole struct is 128 bytes
 in size. Technically, the C++ compiler is allowed to do
 anything short of changing the order of the struct fields.
 That is correct for non-POD types. The C++ compiler must treat
 POD structs essentially as if they are C structs.
 Correct me if I'm wrong. But as far as I know the C standard also
 allows arbitrary alignment.
You were correct. I somehow misread "short of changing the order" as meaning "even changing the order". But even then I wasn't entirely correct. Just found this thread:

http://stackoverflow.com/q/281045

C guarantees that the members are not reordered, but C++ allows reordering across access specifiers:

"The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1)."

Ali
Feb 27 2012
"tjb" <broughtj gmail.com> writes:
On Monday, 27 February 2012 at 19:28:07 UTC, Tobias Brandt wrote:
 [...]
Just looked at my old C++ code. And the struct looks like this:

struct TaqIdx {
   char symbol[10];
   int tdate;
   int begrec;
   int endrec;
}__attribute__((packed));

So I am guessing I want to use the align(1) as Justin suggested. Correct?

TJB
Feb 27 2012
Tobias Brandt <tob.brandt googlemail.com> writes:
 Just looked at my old C++ code. And the struct looks like this:

 struct TaqIdx {
   char symbol[10];
   int tdate;
   int begrec;
   int endrec;
 }__attribute__((packed));

 So I am guessing I want to use the align(1) as Justin suggested. Correct?
Yes.
Feb 27 2012
Ali Çehreli <acehreli yahoo.com> writes:
On 02/27/2012 10:21 AM, tjb wrote:

 I have some financial data in a binary file that I would like to
 process. In C++ I have the data in a structure like this:

 struct TaqIdx {
 char symbol[10];
 int tdate;
 int begrec;
 int endrec;
 };
The equivalent of that C++ (and C) struct would be almost the same in D. Just replace char with 'ubyte' (or 'byte', if the 'char' were signed). Let me stress the fact that character types of D are UTF code units, not bytes.

struct TaqIdx {
   ubyte[10] symbol;
   int tdate;
   int begrec;
   int endrec;
}

On the other hand, if symbol is indeed ASCII and will be treated as such in the program, then char[10] is fine too:

   char[10] symbol;

Also, the int type is always 32 bits in D. Check whether that was the case for the system that the C++ TaqIdx was used on.

You can read binary data with std.stdio.File.rawRead. It is mildly annoying that you still have to use an array even when reading a single TaqIdx:

TaqIdx[1] taqs;
file.rawRead(taqs);

Then use taqs[0] if there is only one.
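Extending this to a whole file: the sketch below sizes one buffer from the file length and fills it with a single rawRead call, then turns the raw ubyte[10] symbol into a string. It assumes packed records and a symbol that is ASCII padded with spaces or NULs; readIndex, symbolOf and the demo file are hypothetical names. The n == 0 check is there because rawRead throws when handed an empty buffer.

```d
import std.file : getSize, remove, tempDir;
import std.path : buildPath;
import std.stdio : File, writeln;

struct TaqIdx
{
align(1):              // assumption: packed, like the C++ __attribute__((packed)) struct
    ubyte[10] symbol;  // raw bytes rather than UTF-8 code units
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical helper: read every complete record with one rawRead call.
TaqIdx[] readIndex(string path)
{
    immutable n = cast(size_t)(getSize(path) / TaqIdx.sizeof);
    if (n == 0)
        return null;                  // rawRead rejects an empty buffer
    auto f = File(path, "rb");
    return f.rawRead(new TaqIdx[n]);  // shorter than n only if the file shrank
}

// Hypothetical helper: view the symbol as ASCII without trailing NULs/spaces.
string symbolOf(const TaqIdx r)
{
    auto s = cast(const(char)[]) r.symbol[];
    size_t end = s.length;
    while (end > 0 && (s[end - 1] == '\0' || s[end - 1] == ' '))
        --end;
    return s[0 .. end].idup;
}

void main()
{
    // Write a small hypothetical index file, then read it back.
    auto path = buildPath(tempDir(), "taqidx_rawread_demo.idx");
    TaqIdx[2] recs;
    recs[0].symbol[] = cast(ubyte) ' ';
    recs[0].symbol[0 .. 3] = cast(const(ubyte)[]) "IBM";
    recs[0].tdate = 20080801;
    recs[1].symbol[0 .. 4] = cast(const(ubyte)[]) "MSFT";  // rest stays 0
    recs[1].tdate = 20080804;

    auto output = File(path, "wb");
    output.rawWrite(recs[]);
    output.close();
    scope(exit) remove(path);

    foreach (r; readIndex(path))
        writeln(symbolOf(r), " ", r.tdate);
}
```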
 Can you give some pointers?
Must... resist... :p

Ali
Feb 27 2012