
digitalmars.D.learn - about std.csv and derived format

bioinfornatics <bioinfornatics fedoraproject.org> writes:
Dear,

I would like to parse this file:
http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt

struct Bed{
    string    chrom;        // 0
    size_t    chromStart;   // 1
    size_t    chromEnd;     // 2
    string    name;         // 3
    size_t    score;        // 4
    char      strand;       // 5
    size_t    thickStart;   // 6
    size_t    thickEnd;     // 7
    size_t[3] itemRgb;      // 8
    size_t    blockCount;   // 9
    size_t    blockSizes;   // 10
    size_t    blockStarts;  // 11
}

Fields 3 to 11 are optional, so a line can contain (that is, 3 to 12 columns):
* fields 0 - 2
* fields 0 - 3
* fields 0 - 4
... up to fields 0 - 11
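One way to cope with optional trailing fields is to split each line on tabs by hand and fill only the fields that are actually present. A minimal sketch, assuming tab-separated lines and covering only the first six columns; `BedLine`, `fieldCount` and `parseBedLine` are hypothetical names, not part of std.csv:

```d
import std.array : split;
import std.conv : to;
import std.exception : enforce;

// Hypothetical, simplified record: only the first six BED columns.
struct BedLine
{
    string chrom;         // 0 (required)
    size_t chromStart;    // 1 (required)
    size_t chromEnd;      // 2 (required)
    string name;          // 3 (optional)
    size_t score;         // 4 (optional)
    char   strand = '.';  // 5 (optional)
    size_t fieldCount;    // how many columns the line actually had
}

// Parses one tab-separated line; missing optional columns keep defaults.
BedLine parseBedLine(string line)
{
    auto cols = line.split('\t');
    enforce(cols.length >= 3, "a BED line needs at least 3 columns");

    BedLine b;
    b.fieldCount = cols.length;
    b.chrom      = cols[0];
    b.chromStart = cols[1].to!size_t;
    b.chromEnd   = cols[2].to!size_t;
    if (cols.length > 3) b.name  = cols[3];
    if (cols.length > 4) b.score = cols[4].to!size_t;
    if (cols.length > 5 && cols[5].length) b.strand = cols[5][0];
    return b;
}
```

Storing `fieldCount` lets later code tell "column absent" apart from "column present with a default-looking value".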
Feb 29 2012
bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Wednesday, 29 February 2012 at 12:42 +0100, bioinfornatics wrote:
 Dear,

 I would like to parse this file:
 http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt

Lines 0 to 2 of ItemRGBDemo.txt are metadata, so they should be parsed by hand:
-------------------------
browser position chr7:127471196-127495720
browser hide all
track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2 itemRgb="On"
-------------------------

My problems are:
- I need to parse the data in CSV format.
- How do I handle the optional fields?
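The header lines can be handled by hand before the data reaches any CSV parser. A minimal sketch, assuming the `browser`/`track` lines only appear at the top of the file and that track values are either bare words or double-quoted strings; `splitHeader` and `parseTrack` are hypothetical helpers:

```d
import std.algorithm.searching : startsWith;
import std.string : indexOf, splitLines, stripLeft;

// Separates leading "browser"/"track" lines from the data lines.
// Assumption: header lines only appear before the first data line.
void splitHeader(string text, out string[] header, out string[] data)
{
    foreach (line; text.splitLines)
    {
        if (line.length == 0)
            continue; // ignore blank lines
        if (data.length == 0 &&
            (line.startsWith("browser") || line.startsWith("track")))
            header ~= line;
        else
            data ~= line;
    }
}

// Parses the key=value pairs of a "track" line; values may be quoted.
string[string] parseTrack(string line)
{
    string[string] attrs;
    auto rest = line["track".length .. $];
    while (true)
    {
        rest = rest.stripLeft;
        auto eq = rest.indexOf('=');
        if (eq < 0)
            break; // no more key=value pairs
        auto key = rest[0 .. eq];
        rest = rest[eq + 1 .. $];
        string value;
        if (rest.length && rest[0] == '"')
        {
            // Quoted value: read up to the closing quote.
            auto close = rest[1 .. $].indexOf('"');
            if (close < 0) { value = rest[1 .. $]; rest = null; }
            else { value = rest[1 .. 1 + close]; rest = rest[close + 2 .. $]; }
        }
        else
        {
            // Bare value: read up to the next space.
            auto sp = rest.indexOf(' ');
            if (sp < 0) { value = rest; rest = null; }
            else { value = rest[0 .. sp]; rest = rest[sp .. $]; }
        }
        attrs[key] = value;
    }
    return attrs;
}
```

The data lines returned by `splitHeader` can then be handed to a tab-separated parser on their own.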
Feb 29 2012
"Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Wednesday, 29 February 2012 at 11:51:29 UTC, bioinfornatics 
wrote:
 On Wednesday, 29 February 2012 at 12:42 +0100, bioinfornatics
 wrote:
 Dear,
 
 I would like to parse this file:
 http://genome.ucsc.edu/goldenPath/help/ItemRGBDemo.txt


 My problem is:
 - need to parse data in csv format
 - how manage with optional field

It looks like the data is tab delimited, so the separator is a tab. There are no optional fields in CSV, but you can disable exceptions:

auto records = csvReader!(Bed,Malformed.ignore)(str,'\t');
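A minimal sketch of this suggestion on an in-memory string, using the thread's four-field BedData3 struct so that every column in the sample maps to a field; `recordNames` is a hypothetical helper. csvReader returns a lazy range, so each record is parsed as the foreach loop pulls it:

```d
import std.csv : csvReader, Malformed;

// Four-field record (columns 0 - 3), as defined later in this thread.
struct BedData3
{
    string chrom;       // 0
    size_t chromStart;  // 1
    size_t chromEnd;    // 2
    string name;        // 3
}

// Collects the `name` column of every tab-separated record.
string[] recordNames(string text)
{
    string[] names;
    // '\t' is passed as the field delimiter instead of the default ','.
    // Malformed.ignore turns off the exceptions std.csv would otherwise
    // throw on input it considers malformed.
    foreach (rec; csvReader!(BedData3, Malformed.ignore)(text, '\t'))
        names ~= rec.name;
    return names;
}
```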
Feb 29 2012
bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Wednesday, 29 February 2012 at 13:23 +0100, Jesse Phillips wrote:
 auto records = csvReader!(Bed,Malformed.ignore)(str,'\t');

Thanks Jesse. How can I convert the input range returned by csvReader to Bed? csvReader returns a type that depends on the struct it is given, so if I use a template function the type is never the same, and I can't hard-code a copy to the Bed type. For example, if I use BedData3 or BedData4:
-------------------------
struct BedData3{
    string chrom;       // 0
    size_t chromStart;  // 1
    size_t chromEnd;    // 2
    string name;        // 3
}

struct BedData4{
    string chrom;       // 0
    size_t chromStart;  // 1
    size_t chromEnd;    // 2
    string name;        // 3
    size_t score;       // 4
}
------------------------
I have tried to deal with ReturnType, but I failed.

Paste: https://gist.github.com/1946288

At line 294, bedReader takes any of BedData3 to BedData11; then at line 338, how do I get an array of records and store this array in struct Bed (line 192)?

Thanks a lot
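One way around the differing record types is a single template that copies fields by name into the common Bed type, so any BedDataN converts without a hand-written copy. A sketch, assuming every BedDataN field has a same-named, same-typed field in Bed; `toBed` and `toBeds` are hypothetical names, the Bed here is trimmed to five fields, and `static foreach` needs DMD 2.076 or later:

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.traits : FieldNameTuple;

// Common target type (trimmed subset of the thread's Bed struct).
struct Bed
{
    string chrom;
    size_t chromStart;
    size_t chromEnd;
    string name;
    size_t score;
}

struct BedData3 { string chrom; size_t chromStart; size_t chromEnd; string name; }
struct BedData4 { string chrom; size_t chromStart; size_t chromEnd; string name; size_t score; }

// Copies every field of `rec` into the same-named field of a Bed.
// Bed fields that T does not have keep their default values.
Bed toBed(T)(T rec)
{
    Bed b;
    static foreach (f; FieldNameTuple!T)
        __traits(getMember, b, f) = __traits(getMember, rec, f);
    return b;
}

// Converts any range of BedDataN records into a Bed[].
Bed[] toBeds(R)(R records)
{
    return records.map!(r => toBed(r)).array;
}
```

With csvReader, the same conversion would read `csvReader!(BedData3, Malformed.ignore)(text, '\t').map!(r => toBed(r)).array`, so the rest of the code only ever sees `Bed[]`.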
Feb 29 2012
bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Thursday, 1 March 2012 at 01:52 +0100, bioinfornatics wrote:
 How can I convert the input range returned by csvReader to Bed?

It's OK, I have found a way. Maybe it is not an efficient way, but it works:
https://gist.github.com/1946669

A minor bug exists in parsing the track line; it will be fixed tomorrow. Time for bed.

Big thanks to all
Feb 29 2012
"Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Thursday, 1 March 2012 at 02:07:44 UTC, bioinfornatics wrote:

 It is ok i have found a way maybe is not an efficient way but 
 it works:
 https://gist.github.com/1946669

 a minor bug exist for parse track line will be fixed tomorrow. 
 time to
 bed


 Big thanks to all

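You can edit a gist instead of creating a new one.

This seems like a very fragile implementation, and hard to follow. My quick untested code:
-------------------------
import std.array : join;
import std.csv : csvReader, Malformed;
import std.file : readText;
import std.string : KeepTerminator, splitLines;

auto str = readText(filePath);
// Ignoring first three lines (keep the line terminators of the rest).
str = str.splitLines(KeepTerminator.yes)[3 .. $].join();
auto bedInstances = csvReader!(BedData11,Malformed.ignore)(str,'\t');
-------------------------

But if you must keep the separate structs, I don't have any better suggestions.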
Feb 29 2012
bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Thursday, 1 March 2012 at 04:36 +0100, Jesse Phillips wrote:
 auto bedInstances = csvReader!(BedData11,Malformed.ignore)(str,'\t');

And how do I convert the bedInstances input range to BedData11[]? Should I add a constructor to BedData11 and use std.algorithm.map?

map!"BedData11(a.field1, a.field2...)"(bedInstances);
Mar 01 2012
"Jesse Phillips" <jessekphillips+D gmail.com> writes:
On Thursday, 1 March 2012 at 10:09:55 UTC, bioinfornatics wrote:

 and how convert bedInstances input array to BedData11[] ?

std.array.array()
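Putting the pieces together: `std.array.array` drains the lazy csvReader range into a plain slice. A sketch, assuming exactly three metadata lines at the top of the file and using the thread's four-field BedData3 in place of BedData11; `loadBed` is a hypothetical name, and the `[3 .. $]` slice fails a bounds check if the text has fewer than three lines:

```d
import std.array : array, join;
import std.csv : csvReader, Malformed;
import std.string : KeepTerminator, splitLines;

// Stand-in for BedData11: columns 0 - 3 only.
struct BedData3
{
    string chrom;       // 0
    size_t chromStart;  // 1
    size_t chromEnd;    // 2
    string name;        // 3
}

BedData3[] loadBed(string text)
{
    // Drop the three metadata lines, keeping the terminators of the
    // remaining lines so they still form valid tab-separated input.
    auto rows = text.splitLines(KeepTerminator.yes)[3 .. $].join;
    // csvReader is lazy; std.array.array collects it into a BedData3[].
    return csvReader!(BedData3, Malformed.ignore)(rows, '\t').array;
}
```

Because the result is an ordinary slice, it can be indexed, sorted or stored in another struct, unlike the input range csvReader returns.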
Mar 01 2012