
digitalmars.D.learn - Reading bigger file

reply "bioinfornatics" <bioinfornatics fedoraproject.org> writes:
Hi,

I already asked some questions about this; I am back with a new one ^^

Why, when reading a huge file, does it take longer to get a line the further I advance into the file?

The first line is read in 0 msec, and the time keeps increasing; that is a problem when your file is around 30 GB!

For example, uncompress one FASTQ file from
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00096/sequence_read/
and run this tiny code: http://dpaste.dzfl.pl/47838d8d

You will see that the higher the line number, the longer it takes to get it.
Mar 08 2013
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Friday, 8 March 2013 at 15:25:02 UTC, bioinfornatics wrote:
 Why, when reading a huge file, does it take longer to get a line 
 the further I advance into the file?
----
StopWatch sw;
while( !fastq1.empty ){
    sw.start();
    auto q1 = fastq1.next();
    sw.stop();
    writeln( sw.peek().msecs() );
}
----
That's because you never reset the StopWatch.
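A minimal sketch of the fix: call reset() at the top of each iteration so peek() reports only the time of the last call rather than a running total. (The sleep below is just a stand-in for the poster's fastq1.next(); this uses the std.datetime.StopWatch of that era. In current D it lives in std.datetime.stopwatch, and peek() returns a Duration, read with total!"msecs".)

```d
import std.datetime : StopWatch;
import std.stdio : writeln;
import core.thread : Thread;
import core.time : msecs;

void main()
{
    StopWatch sw;
    foreach (i; 0 .. 3)
    {
        sw.reset();                 // drop the time accumulated so far
        sw.start();
        Thread.sleep(10.msecs);     // stand-in for fastq1.next()
        sw.stop();
        writeln(sw.peek().msecs);   // per-iteration time, not cumulative
    }
}
```

Without the reset() the watch keeps accumulating, which is exactly why the printed times appeared to grow with the line number.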
Mar 08 2013
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 08 Mar 2013 16:31:45 +0100
schrieb "Chris Cain" <clcain uncg.edu>:

 On Friday, 8 March 2013 at 15:25:02 UTC, bioinfornatics wrote:
 Why, when reading a huge file, does it take longer to get a line 
 the further I advance into the file?
----
StopWatch sw;
while( !fastq1.empty ){
    sw.start();
    auto q1 = fastq1.next();
    sw.stop();
    writeln( sw.peek().msecs() );
}
----
That's because you never reset the StopWatch.
Ha ha!

On a different note... if you still just parse linearly without seeking inside the file, you should consider parsing the compressed file directly. GZIP decompression is very fast: you may get 80 MiB/s for the decompression as well as for the HDD read speed. So as long as you parallelize reading new data and decompressing it, that is your actual read speed. Now, the compression factor you indicated is ~15x, so that makes it effectively a 15 * 80 MiB/s = 1.2 GiB/s read speed. Sounds good? :)

-- 
Marco
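Marco's suggestion can be sketched with std.zlib's streaming UnCompress. The file name and chunk size here are illustrative, and the parallel read/decompress pipeline (e.g. via std.concurrency) is omitted for brevity; this just shows decompressing while reading, without ever materializing the 30 GB file on disk:

```d
import std.stdio : File, writeln;
import std.zlib : UnCompress, HeaderFormat;

void main()
{
    auto gz = new UnCompress(HeaderFormat.gzip);  // streaming gzip decoder
    size_t total;

    // Read the compressed file in 64 KiB chunks and inflate each one.
    foreach (chunk; File("reads.fastq.gz", "rb").byChunk(64 * 1024))
    {
        auto plain = cast(const(ubyte)[]) gz.uncompress(chunk.dup);
        total += plain.length;
        // ... feed `plain` straight into the FASTQ parser here ...
    }
    writeln(total, " bytes decompressed");
}
```

With reading and decompression overlapped, throughput is bounded by the slower of the two stages, and the ~15x compression factor multiplies the effective parse rate, which is where the 15 * 80 MiB/s = 1.2 GiB/s estimate comes from.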
Mar 08 2013
prev sibling parent "bioinfornatics" <bioinfornatics fedoraproject.org> writes:
On Friday, 8 March 2013 at 15:31:46 UTC, Chris Cain wrote:
 On Friday, 8 March 2013 at 15:25:02 UTC, bioinfornatics wrote:
 Why, when reading a huge file, does it take longer to get a line 
 the further I advance into the file?
----
StopWatch sw;
while( !fastq1.empty ){
    sw.start();
    auto q1 = fastq1.next();
    sw.stop();
    writeln( sw.peek().msecs() );
}
----
That's because you never reset the StopWatch.
Oh OK, I thought sw.start() reset it to 0. Let's say it's Friday ... thanks ^^
Mar 08 2013