digitalmars.D.learn - nested csv into tsv

reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
Hi, I have this data:
________________________________
data1	data2	data3a;data3b;data3c
cata1	cata2	cata3a;cata3b;cata3c
tata1	tata2	tata3a;tata3b;tata3c
________________________________

The fields are separated by tabs, but the third field contains data separated by semicolons.

I tried:
________________________________
import std.csv;
import std.string;
import std.stdio;

struct Data{
    public:
        string field1;
        string field2;

    @property void field3( string field ){
        _field3 = field.split(";");
    }
    @property string[] field3(  ){
        return _field3;
    }

    private:
        string[] _field3;
}

void main(){
    Data[] result;
    File f = File( "data.csv", "r" );
    foreach( char[] line; f.byLine() ){
        result ~= csvReader!Data(line, '\t').front;
    }
}
________________________________


This builds fine but does not work at runtime:

________________________________
$ ./test_csv
std.csv.CSVException /usr/include/d/std/csv.d(1047): Can't parse string:
"[" is missing
std.conv.ConvException /usr/include/d/std/conv.d(2714): Can't parse
string: "[" is missing
std.conv.ConvException /usr/include/d/std/conv.d(1597): Unexpected 'd'
when converting from type string to type string[]
________________________________
Mar 18 2012
next sibling parent reply "Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Sunday, 18 March 2012 at 14:45:42 UTC, bioinfornatics wrote:
> ________________________________
> $ ./test_csv
> std.csv.CSVException /usr/include/d/std/csv.d(1047): Can't parse string:
> "[" is missing
> std.conv.ConvException /usr/include/d/std/conv.d(2714): Can't parse
> string: "[" is missing
> std.conv.ConvException /usr/include/d/std/conv.d(1597): Unexpected 'd'
> when converting from type string to type string[]
> ________________________________

I'm going to hazard a guess that you have confused std.conv.to by using two different types for field3:

    @property void field3( string field ){
        _field3 = field.split(";");
    }
    @property string[] field3(  ){
        return _field3;
    }

The first says it is a string, the second a string[]. I assume that std.conv.to sees field3 as a string[] and is trying to convert a string to it. In this case it expects the string to be formatted like an array literal: ["this is an","array","of string"]
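
To make the std.conv.to behaviour described above concrete, here is a minimal sketch. The helper name parsesAsStringArray is hypothetical (it does not come from this thread); the only assumption is that std.conv.to throws ConvException on input it cannot parse, as the error output above shows.

```d
import std.array : split;
import std.conv : to, ConvException;

// Hypothetical helper: reports whether std.conv.to accepts `s` as a string[].
bool parsesAsStringArray(string s)
{
    try
    {
        auto parsed = to!(string[])(s);
        return true;
    }
    catch (ConvException)
    {
        return false;
    }
}

void main()
{
    // Array-literal syntax is what to!(string[]) expects.
    assert(parsesAsStringArray(`["data3a","data3b","data3c"]`));

    // The raw CSV field has no leading '[', so the conversion fails.
    assert(!parsesAsStringArray("data3a;data3b;data3c"));

    // Splitting on ';' is what the data actually calls for.
    assert("data3a;data3b;data3c".split(";") == ["data3a", "data3b", "data3c"]);
}
```

So the third column has to be read as a plain string first and split afterwards.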
Mar 18 2012
parent bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Sunday, 18 March 2012 at 16:53 +0100, Jesse Phillips wrote:
If I do this:
________________________________
import std.csv;
import std.string;
import std.stdio;

struct Data{
    public:
        string field1;
        string field2;

    @property void field3( string field ){
        _field3 = field.split(";");
    }
    @property string field3(  ){
        string result;
        foreach( item; _field3 )
            result ~= " %s;".format( item );
        return result;
    }
    @property attributes(){
        return _field3;
    }

    private:
        string[] _field3;
}

void main(){
    Data[] result;
    File f = File( "data.csv", "r" );
    foreach( char[] line; f.byLine() ){
        result ~= csvReader!Data(line, '\t').front;
    }
}
________________________________

Same result.
Mar 18 2012
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 03/18/2012 07:45 AM, bioinfornatics wrote:
> Hi, I have this data:
> ________________________________
> data1	data2	data3a;data3b;data3c
> cata1	cata2	cata3a;cata3b;cata3c
> tata1	tata2	tata3a;tata3b;tata3c
> ________________________________
>
> The fields are separated by tabs, but the third field contains data
> separated by semicolons.
>
> I tried:
> ________________________________
> import std.csv;
> import std.string;
> import std.stdio;
>
> struct Data{
>      public:
>          string field1;
>          string field2;
>
>      @property void field3( string field ){
>          _field3 = field.split(";");
>      }
>      @property string[] field3(  ){
>          return _field3;
>      }

Besides the confusion that Jesse Phillips has pointed out, csvReader cannot decide to treat those two property functions as if they represent a member of Data.

>
>      private:
>          string[] _field3;

Data still has three members: field1, field2, and _field3.

The problem is, although the format clearly states that there are three strings that are delimited by '\t', the third field of the struct is not a string.

> }
>
> void main(){
>      Data[] result;
>      File f = File( "data.csv", "r" );
>      foreach( char[] line; f.byLine() ){
>          result ~= csvReader!Data(line, '\t').front;
>      }
> }

So the solution is that _field3 must be a string:

import std.csv;
import std.string;
import std.stdio;

struct Data{
     public:
         string field1;
         string field2;

     private:
         string _field3;
}

void main(){
     Data[] result;
     File f = File( "data.csv", "r" );
     foreach( char[] line; f.byLine() ){
         result ~= csvReader!Data(line, '\t').front;
     }

     writeln(result);
}

You must provide the properties on top of that:

import std.csv;
import std.string;
import std.stdio;

struct Data{
     public:
         string field1;
         string field2;

     void field3( string[] field ) @property {
         _field3 = field.join();
     }

     string[] field3(  ) @property {
         return _field3.split(";");
     }

     string toString() {
         return format("%s,%s,%s", field1, field2, field3);
     }

     private:
         string _field3;
}

void main(){
     Data[] result;
     File f = File( "data.csv", "r" );
     foreach( char[] line; f.byLine() ){
         result ~= csvReader!Data(line, '\t').front;
     }

     writeln(result);
}

Note that to avoid confusing the readers, the property functions both use string[], not string. (I've also put @property at the end of the function signature, which I started to favor recently.)

The optimizations can come after that. The following calls split() only when necessary:

import std.csv;
import std.string;
import std.stdio;

struct Data{
     public:
         string field1;
         string field2;

     void field3( string[] field ) @property {
         _field3 = field;
         _raw_field3 = null;
     }

     string[] field3(  ) @property {
         if (_raw_field3 !is null) {
             _field3 = _raw_field3.split(";");
         }
         return _field3;
     }

     string toString() {
         return format("%s,%s,%s", field1, field2, field3);
     }

     private:
         string _raw_field3;
         string[] _field3;
}

void main(){
     Data[] result;
     File f = File( "data.csv", "r" );
     foreach( char[] line; f.byLine() ){
         result ~= csvReader!Data(line, '\t').front;
     }

     writeln(result);
}

Ali
Mar 18 2012
parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Sunday, 18 March 2012 at 09:53 -0700, Ali Çehreli wrote:
Very interesting, big thanks for this code snippet.
Mar 18 2012
parent Ali Çehreli <acehreli yahoo.com> writes:
Bug fix release: :)

On 03/18/2012 10:13 AM, bioinfornatics wrote:
> On Sunday, 18 March 2012 at 09:53 -0700, Ali Çehreli wrote:
>>      void field3( string[] field ) @property {
>>          _field3 = field.join();

I think that should have been field.join(";"). (But join() is not used in the final version of the program anyway.)

>>      string[] field3(  ) @property {
>>          if (_raw_field3 !is null) {
>>              _field3 = _raw_field3.split(";");

This line must be added so that split() is not called every time:

    _raw_field3 = null;

>>          }
>>          return _field3;
>>      }

Ali
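
For readers who want the gist of the thread in one place, here is a simplified sketch of the approach: read the third column as a plain string, then split it on ';' when it is requested. It is not the exact code posted above. The names rawField3 and parseRows are hypothetical, the fields are all public, the input is an inline string standing in for data.csv, and the cached string[] member is left out to keep the struct at exactly three data members, one per column.

```d
import std.array : array, split;
import std.csv : csvReader;
import std.stdio : writeln;

// Hypothetical record type: three string members, one per tab-separated column.
struct Data
{
    string field1;
    string field2;
    string rawField3; // third column exactly as read, e.g. "data3a;data3b;data3c"

    // Split the raw third column on ';' each time it is requested.
    @property string[] field3()
    {
        return rawField3.split(";");
    }
}

// Hypothetical helper: parse tab-separated text into Data records.
Data[] parseRows(string text)
{
    return csvReader!Data(text, '\t').array;
}

void main()
{
    // Inline sample standing in for data.csv.
    string text = "data1\tdata2\tdata3a;data3b;data3c\n"
                ~ "cata1\tcata2\tcata3a;cata3b;cata3c\n"
                ~ "tata1\ttata2\ttata3a;tata3b;tata3c";

    Data[] result = parseRows(text);

    assert(result.length == 3);
    assert(result[0].field1 == "data1");
    assert(result[1].field3 == ["cata3a", "cata3b", "cata3c"]);

    foreach (row; result)
        writeln(row.field1, " ", row.field2, " ", row.field3);
}
```

Passing the whole text to csvReader once, instead of calling it per line, is a choice made for this sketch; the per-line loop over f.byLine() used earlier in the thread follows the same idea.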
Mar 18 2012