
digitalmars.D.learn - Best way to read CSV data file into Mir (2d array) ndslice?

mw <mingwu gmail.com> writes:
Hi,

I'm just wondering what is the best way to read CSV data file 
into Mir (2d array) ndslice? Esp. if it can parse date into 
int/float.

I searched a bit, but can't find any example.


Thanks.
Sep 20 2022
jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 21 September 2022 at 05:31:48 UTC, mw wrote:
> Hi,
>
> I'm just wondering what is the best way to read CSV data file
> into Mir (2d array) ndslice? Esp. if it can parse date into
> int/float.
>
> I searched a bit, but can't find any example.
>
> Thanks.

It probably can't hurt to try the simplest approach first. `std.csv` can return an input range that you can then use to create an ndslice. Offhand, I don't know what D tools are an alternative to `std.csv` for reading CSVs.

ndslice assumes homogeneous data, but you can put the Dates (as Date types) as part of the labels (as Data Frames). However, there's a bit to be desired in terms of getting that functionality integrated into the rest of the package [1].

[1] https://github.com/libmir/mir-algorithm/issues/426
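
Since ndslice wants a single element type, another option besides labels is to encode each date as a number and store it in the same `double` matrix. The following is only a sketch of that idea, not something Mir provides: `Row` and `dateToDayNumber` are hypothetical names, the input is assumed to use `M/D/YYYY` dates as in the sample data, and the choice of `Date.dayOfGregorianCal` (days since 1 January of year 1) as the encoding is an assumption.

```d
import std.algorithm : map, splitter;
import std.array : appender, array;
import std.conv : to;
import std.csv : csvReader;
import std.datetime.date : Date;
import std.stdio : writeln;
import mir.ndslice.slice : sliced;

// Hypothetical row layout matching the header "date,x1,x2".
struct Row
{
    string date;
    double x1;
    double x2;
}

// Hypothetical helper: parse "M/D/YYYY" and return the day number
// in the proleptic Gregorian calendar (1 January of year 1 is day 1).
int dateToDayNumber(string s)
{
    auto parts = s.splitter('/').map!(to!int).array;
    assert(parts.length == 3, "expected M/D/YYYY");
    return Date(parts[2], parts[0], parts[1]).dayOfGregorianCal;
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    enum size_t nCols = 3;

    // Collect all values row by row into one flat buffer.
    auto flat = appender!(double[]);
    foreach (row; text.csvReader!Row(["date", "x1", "x2"]))
    {
        flat.put(cast(double) dateToDayNumber(row.date));
        flat.put(row.x1);
        flat.put(row.x2);
    }

    // View the flat buffer as a rows x 3 matrix; column 0 holds the dates.
    auto data = flat.data.sliced(flat.data.length / nCols, nCols);
    writeln(data);
}
```

The day number survives the round trip through `double` exactly and can be turned back into a `Date` later, but the matrix itself no longer records that column 0 is a date, which is the gap the label approach is meant to close.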
Sep 21 2022
jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 21 September 2022 at 13:08:14 UTC, jmh530 wrote:
> It probably can't hurt to try the simplest approach first.
> `std.csv` can return an input range that you can then use to
> create an ndslice. [...]

I just tried doing it with `std.csv`, but my version was a bit awkward, since it isn't straightforward to take the result of `csvReader` and put it directly into an array. I had to copy the values in element by element. I also wanted to allocate the array up front, but to do that I needed to know how big it was, so I ended up making two passes over the data, which isn't ideal.

```d
import std.csv;
import std.stdio: writeln;
import mir.ndslice.allocation: slice;

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";

    // csvReader returns an input range, so build one reader per pass.
    // Only the numeric columns x1 and x2 are selected; date is skipped.
    auto records_firstpass = text.csvReader!double(["x1","x2"]);
    auto records_secondpass = text.csvReader!double(["x1","x2"]);

    // First pass: count the rows so the slice can be allocated up front.
    size_t len = 0;
    foreach (record; records_firstpass) {
        len++;
    }

    auto data = slice!double(len, 2);

    // Second pass: copy each value into the 2D slice.
    size_t i = 0;
    size_t j;
    foreach (record; records_secondpass) {
        j = 0;
        foreach (r; record) {
            data[i, j] = r;
            j++;
        }
        i++;
    }
    writeln(data);
}
```
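
One way to avoid the second pass is to append every value to a growable flat buffer and only afterwards view that buffer as a matrix. The following is a minimal sketch under that assumption, not an official Mir recipe: `readColumns` is a hypothetical helper, and it relies on `sliced` from `mir.ndslice.slice` to wrap an existing array with a 2D shape.

```d
import std.array : appender;
import std.csv : csvReader;
import std.stdio : writeln;
import mir.ndslice.slice : sliced;

// Hypothetical helper (not part of std.csv or Mir): read the named
// numeric columns into one flat, row-major array in a single pass.
double[] readColumns(string text, string[] columns)
{
    auto flat = appender!(double[]);
    foreach (record; text.csvReader!double(columns))
    {
        foreach (value; record)
        {
            flat.put(value);
        }
    }
    return flat.data;
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    string[] columns = ["x1", "x2"];

    auto flat = readColumns(text, columns);

    // View the flat buffer as a rows x columns matrix without copying.
    auto data = flat.sliced(flat.length / columns.length, columns.length);
    writeln(data);
}
```

The trade-off is that the appender may reallocate as it grows, whereas the two-pass version parses the text twice but allocates exactly once.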
Sep 21 2022
mw <mingwu gmail.com> writes:
On Wednesday, 21 September 2022 at 19:14:30 UTC, jmh530 wrote:
> I just tried doing it with `std.csv`, but my version was a bit
> awkward since it doesn't seem quite so straightforward to just
> take the result of csvReader and put it in an array. I had to
> read it in there. I also wanted to allocate the array up front,
> but to do that I needed to know how big it was and ended up
> doing two passes on reading the data, which isn't ideal.

Thanks. As you said, this isn't ideal. For Mir to catch up with numpy, being able to read CSV data easily is a must to attract data scientists. In numpy/pandas, it's just a *one*-liner.

I logged an issue here as a feature request: https://github.com/libmir/mir-algorithm/issues/442
Sep 21 2022