
digitalmars.D.learn - CSV Data to Binary File

reply "TJB" <broughtj gmail.com> writes:
I am trying to read data in from a csv file into a struct, and 
then turn around and write that data to binary format.

Here is my code:

import std.algorithm;
import std.csv;
import stdio = std.stdio;
import std.stream;

align(1) struct QuotesBin
{
   int qtim;9   int bid;
   int ofr;
   int bidsiz;
   int ofrsiz;
   short mode;
   char[1] ex;
   char[4] mmid;
}

void main()
{
   string infile = "temp.csv";
   string outfile = "temp.bin";
   Stream fin = new BufferedFile(infile);
   Stream fout = new BufferedFile(outfile, FileMode.Out);

   foreach(ulong n, char[] line; fin)
   {
     auto record = csvReader!QuotesBin(line).front;
     fout.writeExact(&record, QuotesBin.sizeof);
   }

   fin.close();
   fout.close();
}

Here is a snippet of my csv data:

34220, 370000, 371200, 1, 1, 12, N,
34220, 369000, 372500, 1, 11, 12, P,
34220, 370000, 371200, 1, 2, 12, N,
34220, 370000, 371100, 1, 33, 12, N,
34220, 369400, 371100, 6, 3, 12, P,
34220, 370000, 371200, 1, 2, 12, N,
34220, 369300, 371200, 9, 2, 12, N,
34220, 369300, 371200, 5, 2, 12, N,
34220, 368900, 371200, 13, 2, 12, N,
34220, 368900, 371100, 13, 1, 12, N,

For some reason this fails miserably. Can anyone help me out as 
to why? What do I need to do differently?

Thanks,
TJB
Aug 07 2014
next sibling parent reply "TJB" <broughtj gmail.com> writes:
Some of the code got messed up when I pasted. Should be:

align(1) struct QuotesBin
{
   int qtim;
   int bid;
   int ofr;
   int bidsiz;
   int ofrsiz;
   short mode;
   char[1] ex;
   char[4] mmid;
}

Thanks!
Aug 07 2014
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Thursday, 7 August 2014 at 15:14:00 UTC, TJB wrote:
 align(1) struct QuotesBin
 {
   int qtim;
   int bid;
   int ofr;
   int bidsiz;
   int ofrsiz;
   short mode;
   char[1] ex;
   char[4] mmid;
 }

 Thanks!
(You forgot to include the error. For other readers: It fails 
to compile with "template std.conv.toImpl cannot deduce 
function from argument types !(char[4])(string)" and similar 
error messages.)

This is caused by the two `char` arrays. `std.conv.to` cannot 
convert strings to fixed-size char arrays, probably because 
it's not clear what should happen if the input string is too 
long or too short.

Would it be a good idea to support this?

As a workaround, you could declare a second struct with the 
same members, but `ex` and `mmid` as strings, read your data 
into these, and assign it to the first structure:

import std.algorithm;
import std.csv;
import stdio = std.stdio;
import std.stream;

align(1) struct QuotesBinDummy
{
   int qtim;
   int bid;
   int ofr;
   int bidsiz;
   int ofrsiz;
   short mode;
   string ex;
   string mmid;
}

align(1) struct QuotesBin
{
   int qtim;
   int bid;
   int ofr;
   int bidsiz;
   int ofrsiz;
   short mode;
   char[1] ex;
   char[4] mmid;
}

void main()
{
   string infile = "temp.csv";
   string outfile = "temp.bin";
   Stream fin = new BufferedFile(infile);
   Stream fout = new BufferedFile(outfile, FileMode.Out);

   foreach(ulong n, char[] line; fin)
   {
     auto temp = csvReader!QuotesBinDummy(line).front;
     QuotesBin record;
     record.tupleof = temp.tupleof;
     fout.writeExact(&record, QuotesBin.sizeof);
   }

   fin.close();
   fout.close();
}

The line "record.tupleof = temp.tupleof;" will however fail 
with your example data, because the `ex` field includes a space 
in the CSV, and the last field is empty, but needs to be 4 
chars long.
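
A minimal sketch of one way to cope with the stray spaces and the empty last field, assuming it is acceptable to split each line by hand instead of going through `csvReader`. `trimmedFields` is a hypothetical helper, not part of Phobos; the sample line is taken from the data above.

```d
import std.algorithm : map, splitter;
import std.array : array;
import std.conv : to;
import std.string : strip;

// Hypothetical helper: split one CSV line on commas and trim
// the whitespace around every field.
string[] trimmedFields(string line)
{
    return line.splitter(',').map!(f => f.strip).array;
}

void main()
{
    auto fields = trimmedFields("34220, 370000, 371200, 1, 1, 12, N,");

    // The trailing comma produces an empty eighth field (mmid).
    assert(fields.length == 8);
    assert(fields[0].to!int == 34220);   // numeric fields convert once trimmed
    assert(fields[6] == "N");            // the leading space is gone
    assert(fields[7] == "");
}
```

Trimming first matters because `to!int` rejects input with leading whitespace, and because `ex` has room for exactly one character.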
Aug 07 2014
parent reply "TJB" <broughtj gmail.com> writes:
Thanks Marc. Not sure what to do here. I need the binary data 
to be exactly the number of bytes specified by the struct.

How to handle the conversion from string to char[]?

TJB

Aug 07 2014
next sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Thursday, 7 August 2014 at 16:08:01 UTC, TJB wrote:
 Thanks Marc. Not sure what to do here. I need the binary 
 data to be exactly the number of bytes specified by the 
 struct.

 How to handle the conversion from string to char[]?
Well, in your CSV data, they don't have the right length, so you 
have to decide how to handle that. The easiest way would be to 
set the length. This will pad the string if it is too short 
(note that D fills the new elements with `char.init`, which is 
0xFF, not "\0"):

align(1) struct QuotesBin
{
   int qtim;
   int bid;
   int ofr;
   int bidsiz;
   int ofrsiz;
   short mode;
   char[1] ex;
   char[4] mmid;

   this(const QuotesBinDummy rhs)
   {
     this.qtim = rhs.qtim;
     this.bid = rhs.bid;
     this.ofr = rhs.ofr;
     this.bidsiz = rhs.bidsiz;
     this.ofrsiz = rhs.ofrsiz;
     this.mode = rhs.mode;
     string tmp;
     tmp = rhs.ex;
     tmp.length = this.ex.length;
     this.ex = tmp;
     tmp = rhs.mmid;
     tmp.length = this.mmid.length;
     this.mmid = tmp;
   }
}

...
auto temp = csvReader!QuotesBinDummy(line).front;
QuotesBin record = temp;
fout.writeExact(&record, QuotesBin.sizeof);
...

But of course, whether this is correct depends on whether your 
binary format allows it, or requires all chars to be non-zero 
ASCII values.
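
If the binary format expects a particular padding byte, a small helper can make that choice explicit instead of relying on what growing `.length` fills in. This is only a sketch: `toFixed` is a hypothetical name, and padding with spaces by default is an assumption about the target format.

```d
import std.algorithm : min;
import std.string : strip;

// Hypothetical helper: copy `src` into a fixed-size char array.
// Surrounding whitespace is trimmed, overlong input is truncated,
// and short input is padded on the right with `pad`.
char[N] toFixed(size_t N)(const(char)[] src, char pad = ' ')
{
    char[N] result = pad;            // every element starts as the pad byte
    auto s = src.strip;
    immutable n = min(s.length, N);
    result[0 .. n] = s[0 .. n];      // copy at most N characters
    return result;
}

void main()
{
    auto ex = toFixed!1(" N");
    assert(ex[] == "N");

    auto empty = toFixed!4("");
    assert(empty[] == "    ");

    auto tooLong = toFixed!4("ABCDEF");
    assert(tooLong[] == "ABCD");

    auto zeroPadded = toFixed!4("AB", '\0');
    assert(zeroPadded[] == "AB\0\0");
}
```

Inside the constructor this would read `this.ex = toFixed!1(rhs.ex);` and `this.mmid = toFixed!4(rhs.mmid);`.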
Aug 07 2014
prev sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Thursday, 7 August 2014 at 16:08:01 UTC, TJB wrote:
 Thanks Marc. Not sure what to do here. I need the binary 
 data to be exactly the number of bytes specified by the 
 struct.
Something else: The `align(1)` on your type definition specifies 
the alignment of the entire struct, but has no effect on the 
alignment of its fields relative to the beginning. You probably 
want this:

align(1) struct QuotesBin
{
align(1):
   int qtim;
   int bid;
   int ofr;
   int bidsiz;
   int ofrsiz;
   short mode;
   char[1] ex;
   char[4] mmid;
}

This aligns the struct as a whole, and all its fields, at byte 
boundaries. Without the second `align(1)`, there should be a 
gap between `mode` and `ex`. Strangely enough, when I test it, 
there's none. Will have to ask...
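
One way to check the layout rather than guess is to let the compiler verify it. The sketch below uses the fully packed definition and asserts the size and field offsets at compile time; the expected numbers are simply the sums of the field sizes.

```d
align(1) struct QuotesBin
{
align(1):
    int qtim;
    int bid;
    int ofr;
    int bidsiz;
    int ofrsiz;
    short mode;
    char[1] ex;
    char[4] mmid;
}

// 5 ints (20 bytes) + short (2) + char[1] (1) + char[4] (4) = 27 bytes
static assert(QuotesBin.sizeof == 27);
static assert(QuotesBin.mode.offsetof == 20);
static assert(QuotesBin.ex.offsetof == 22);
static assert(QuotesBin.mmid.offsetof == 23);

void main() {}
```

If any of these assumptions about the layout were wrong, compilation would fail instead of a malformed file being written.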
Aug 07 2014
parent "Marc Schütz" <schuetzm gmx.net> writes:
On Thursday, 7 August 2014 at 17:12:35 UTC, Marc Schütz wrote:
 This align the struct as a whole, and all its fields at byte 
 boundaries. Without the second `align(1)`, there should be a 
 gap between `mode` and `ex`. Strangely enough, when I test it, 
 there's none. Will have to ask...
Sorry, should have been `ex` and `mmid`. I've posted my question here: http://forum.dlang.org/post/bkearrybmwguqrliexsw forum.dlang.org
Aug 07 2014
prev sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Thursday, 7 August 2014 at 15:11:48 UTC, TJB wrote:
 Here is a snippet of my csv data:

 34220, 370000, 371200, 1, 1, 12, N,
 34220, 369000, 372500, 1, 11, 12, P,
 34220, 370000, 371200, 1, 2, 12, N,
I can't help but think that as long as the data is numbers or 
words, scanf would be useful, even if it's a C function...

Someone mentioned the final empty field; this makes me scratch 
my head...

And a struct of exactly 27 bytes... I'd probably pad that to 28 
or 32 if possible, which allows you to expand your definition 
later, not to mention being 32-bit aligned (if speed becomes 
important).
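
A sketch of the padding idea, assuming the binary format is yours to define. `QuotesBinPadded` and its `reserved` field are hypothetical names, and the temporary file name is made up for the example. It writes with `File.rawWrite` from `std.stdio` rather than `std.stream`.

```d
import std.file : getSize, remove, tempDir;
import std.path : buildPath;
import std.stdio : File;

align(1) struct QuotesBinPadded
{
align(1):
    int qtim;
    int bid;
    int ofr;
    int bidsiz;
    int ofrsiz;
    short mode;
    char[1] ex;
    char[4] mmid;
    ubyte[5] reserved;   // hypothetical padding: 27 + 5 = 32 bytes
}

static assert(QuotesBinPadded.sizeof == 32);

void main()
{
    QuotesBinPadded rec;
    rec.qtim = 34220;

    // Write one record as raw bytes to a scratch file.
    auto path = buildPath(tempDir, "quotes_padded_example.bin");
    auto f = File(path, "wb");
    f.rawWrite((&rec)[0 .. 1]);
    f.close();

    assert(getSize(path) == 32);
    remove(path);
}
```

The reserved bytes leave room for a later field without changing the record size, at the cost of five extra bytes per quote.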
Aug 07 2014