digitalmars.D.learn - Best way to read CSV data file into Mir (2d array) ndslice?

mw <mingwu gmail.com> writes:

Hi,

I'm just wondering what is the best way to read a CSV data file
into a Mir (2D array) ndslice? Especially if it can parse dates
into int/float.

I searched a bit, but couldn't find any examples.


Thanks.

Sep 20 2022

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 21 September 2022 at 05:31:48 UTC, mw wrote:
 Hi,

 I'm just wondering what is the best way to read a CSV data file
 into a Mir (2D array) ndslice? Especially if it can parse dates
 into int/float.

 I searched a bit, but couldn't find any examples.


 Thanks.

It probably can't hurt to try the simplest approach first.
`std.csv` can return an input range that you can then use to
create an ndslice. Offhand, I don't know which D tools are
alternatives to `std.csv` for reading CSVs.
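A minimal single-pass sketch of that idea, assuming a mir-algorithm version that ships `mir.ndslice.fuse`: each record from `csvReader!double` is copied into a `double[]` row, and `fuse` turns the resulting array of arrays into one contiguous 2D slice. The helper name `csvColumnsToSlice` and the sample text are illustrative only, not part of Phobos or Mir.

```d
import std.array : array;
import std.csv : csvReader;
import std.stdio : writeln;
import mir.ndslice.fuse : fuse;

/// Hypothetical helper: reads the named numeric columns of `text` into a
/// 2D ndslice (one row per CSV record, one column per name in `columns`).
auto csvColumnsToSlice(string text, string[] columns)
{
    double[][] rows;
    // With a header list, csvReader treats the first CSV line as the header
    // and yields only the named fields, each converted to double.
    foreach (record; text.csvReader!double(columns))
        rows ~= record.array; // copy this record's fields into a new row
    // fuse copies the rectangular array of arrays into one contiguous,
    // row-major slice of shape [rows.length, columns.length].
    return rows.fuse;
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    auto data = csvColumnsToSlice(text, ["x1", "x2"]);
    writeln(data.shape);
    writeln(data);
}
```

The intermediate `double[][]` costs one extra copy, but it needs no row count up front.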

ndslice assumes homogeneous data, but you can put the dates (as
`Date` values) in the labels (as in a data frame). However, that
functionality still leaves a bit to be desired in terms of how
well it is integrated into the rest of the package [1].

[1] https://github.com/libmir/mir-algorithm/issues/426
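One way to keep the dates out of the numeric slice, sketched without Mir's own labeled types: parse the date column into a parallel `Date[]` array that acts as the row labels, and put only the numeric columns into the ndslice. This assumes dates are written as M/D/YYYY like the sample data in this thread; `parseUSDate` is a hypothetical helper. If an int is wanted instead of a label, `Date.dayOfGregorianCal` gives a day count.

```d
import std.algorithm.iteration : map, splitter;
import std.array : array;
import std.conv : to;
import std.csv : csvReader;
import std.datetime.date : Date;
import std.stdio : writeln;
import mir.ndslice.fuse : fuse;

/// Hypothetical helper: parses "M/D/YYYY" (the format assumed here).
Date parseUSDate(string s)
{
    auto parts = s.splitter('/').map!(to!int).array;
    return Date(parts[2], parts[0], parts[1]); // Date(year, month, day)
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    Date[] dates;    // row labels, kept outside the homogeneous slice
    double[][] rows; // numeric columns only
    // Read every selected field as a string, then convert per column.
    foreach (record; text.csvReader!string(["date", "x1", "x2"]))
    {
        auto fields = record.array;
        dates ~= parseUSDate(fields[0]);
        rows ~= fields[1 .. $].map!(to!double).array;
    }
    auto data = rows.fuse;
    // Numeric encoding of each date: days since 1 January of year 1.
    int[] dayNumbers = dates.map!(d => d.dayOfGregorianCal).array;
    writeln(dates);
    writeln(data);
    writeln(dayNumbers);
}
```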

Sep 21 2022

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 21 September 2022 at 13:08:14 UTC, jmh530 wrote:
 On Wednesday, 21 September 2022 at 05:31:48 UTC, mw wrote:
 Hi,

 I'm just wondering what is the best way to read a CSV data file
 into a Mir (2D array) ndslice? Especially if it can parse dates
 into int/float.

 I searched a bit, but couldn't find any examples.


 Thanks.

 It probably can't hurt to try the simplest approach first.
 `std.csv` can return an input range that you can then use to
 create an ndslice. Offhand, I don't know which D tools are
 alternatives to `std.csv` for reading CSVs.

 ndslice assumes homogeneous data, but you can put the dates (as
 `Date` values) in the labels (as in a data frame). However, that
 functionality still leaves a bit to be desired in terms of how
 well it is integrated into the rest of the package [1].

 [1] https://github.com/libmir/mir-algorithm/issues/426

I just tried doing it with `std.csv`, but my version was a bit
awkward, since it doesn't seem straightforward to just take the
result of csvReader and put it in an array; I had to copy the
values in one by one. I also wanted to allocate the array up
front, but to do that I needed to know how big it was, so I ended
up making two passes over the data, which isn't ideal.

```d
import std.csv;
import std.stdio: writeln;
import mir.ndslice.allocation: slice;

void main() {
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    // CsvReader is an input range and cannot be restarted, so a second
    // reader over the same text is used for the copying pass.
    auto records_firstpass = text.csvReader!double(["x1","x2"]);
    auto records_secondpass = text.csvReader!double(["x1","x2"]);
    // First pass: count the records so the slice can be allocated up front.
    size_t len = 0;
    foreach (record; records_firstpass) {
        len++;
    }
    auto data = slice!double(len, 2);
    // Second pass: copy each selected field into row i, column j.
    size_t i = 0;
    foreach (record; records_secondpass)
    {
        size_t j = 0;
        foreach (r; record) {
            data[i, j] = r;
            j++;
        }
        i++;
    }
    writeln(data);
}
```
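The two passes can be avoided by appending every value to one flat buffer while counting the rows, then wrapping the buffer with `sliced` from `mir.ndslice.slice`, which views it as a row-major 2D slice without copying. A sketch only: `readColumns` is a hypothetical helper name, and it assumes every record yields exactly `columns.length` values.

```d
import std.array : appender;
import std.csv : csvReader;
import std.stdio : writeln;
import mir.ndslice.slice : sliced;

/// Hypothetical helper. Single pass: append every selected value to one
/// flat buffer and count the rows, then view the buffer as 2D.
auto readColumns(string text, string[] columns)
{
    auto buffer = appender!(double[]);
    size_t rowCount = 0;
    foreach (record; text.csvReader!double(columns))
    {
        foreach (value; record)
            buffer.put(value);
        rowCount++;
    }
    // sliced does not copy: it wraps the buffer's memory in row-major order
    // and expects buffer.data.length == rowCount * columns.length.
    return buffer.data.sliced(rowCount, columns.length);
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    auto data = readColumns(text, ["x1", "x2"]);
    writeln(data);
}
```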

Sep 21 2022

mw <mingwu gmail.com> writes:

On Wednesday, 21 September 2022 at 19:14:30 UTC, jmh530 wrote:
 I just tried doing it with `std.csv`, but my version was a bit
 awkward, since it doesn't seem straightforward to just take the
 result of csvReader and put it in an array; I had to copy the
 values in one by one. I also wanted to allocate the array up
 front, but to do that I needed to know how big it was, so I ended
 up making two passes over the data, which isn't ideal.

Thanks. As you said, this isn't ideal.

For Mir to catch up with numpy, being able to easily read a CSV
file to import data is a must to attract data scientists.

In numpy/pandas, it's just a *one*-liner.

I logged an issue here as a feature request:

https://github.com/libmir/mir-algorithm/issues/442
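In the meantime, a small wrapper can get the call site down to one line. `readCsvSlice` below is hypothetical (not a Mir or Phobos function); it reads the whole file with `std.file.readText` and reuses the collect-then-`fuse` approach, assuming `mir.ndslice.fuse` is available in the installed mir-algorithm.

```d
import std.array : array;
import std.csv : csvReader;
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;
import mir.ndslice.fuse : fuse;

/// Hypothetical convenience wrapper (not part of Mir or Phobos):
/// reads the named numeric columns of a CSV file into a 2D ndslice.
auto readCsvSlice(string path, string[] columns)
{
    double[][] rows;
    foreach (record; readText(path).csvReader!double(columns))
        rows ~= record.array; // one row per CSV record
    return rows.fuse;         // contiguous [records, columns] slice
}

void main()
{
    // Write a small sample file so the example is self-contained.
    auto path = buildPath(tempDir, "mir_csv_sketch.csv");
    write(path, "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5");
    scope (exit) remove(path);

    auto data = readCsvSlice(path, ["x1", "x2"]); // the one-line call site
    writeln(data);
}
```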

Sep 21 2022
