
digitalmars.D.learn - Reading bigger file

reply "bioinfornatics" <bioinfornatics fedoraproject.org> writes:
Hi,

I already asked some questions about this; I am back with a new one ^^

Why, when reading a huge file, does it take longer to get a line the further I advance into the file?

The first line is read in 0 msec, and the time keeps increasing; that is a problem when your file is around 30 GB!

For example, uncompress one FASTQ file from
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00096/sequence_read/
and run this tiny code: http://dpaste.dzfl.pl/47838d8d

You will see that the higher the line number, the longer it takes to get it.
Mar 08 2013
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Friday, 8 March 2013 at 15:25:02 UTC, bioinfornatics wrote:
 Why, when reading a huge file, does it take longer to get a line 
 the further I advance into the file?
----
StopWatch sw;
while( !fastq1.empty ){
    sw.start();
    auto q1 = fastq1.next();
    sw.stop();
    writeln( sw.peek().msecs() );
}
----
That's because you never reset the StopWatch.
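A minimal sketch of the fix: call reset() at the top of each iteration so peek() reports only the time of the last call rather than a running total. (The sleep below is just a stand-in for the poster's fastq1.next(); this uses the std.datetime.StopWatch of that era. In current D it lives in std.datetime.stopwatch, and peek() returns a Duration, read with total!"msecs".)

```d
import std.datetime : StopWatch;
import std.stdio : writeln;
import core.thread : Thread;
import core.time : msecs;

void main()
{
    StopWatch sw;
    foreach (i; 0 .. 3)
    {
        sw.reset();                 // drop the time accumulated so far
        sw.start();
        Thread.sleep(10.msecs);     // stand-in for fastq1.next()
        sw.stop();
        writeln(sw.peek().msecs);   // per-iteration time, not cumulative
    }
}
```

Without the reset() the watch keeps accumulating, which is exactly why the printed times appeared to grow with the line number.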
Mar 08 2013
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 08 Mar 2013 16:31:45 +0100
schrieb "Chris Cain" <clcain uncg.edu>:

 On Friday, 8 March 2013 at 15:25:02 UTC, bioinfornatics wrote:
 Why, when reading a huge file, does it take longer to get a line 
 the further I advance into the file?
----
StopWatch sw;
while( !fastq1.empty ){
    sw.start();
    auto q1 = fastq1.next();
    sw.stop();
    writeln( sw.peek().msecs() );
}
----
That's because you never reset the StopWatch.
Ha ha!

On a different note... if you still just parse linearly without seeking inside the file, you should consider parsing the compressed file directly. GZIP decompression is very fast: you may get 80 MiB/s for the decompression as well as for the HDD read speed. So as long as you parallelize reading new data and decompressing it, that is your actual read speed. Now, the compression factor you indicated is ~15x, so that makes it effectively a 15 * 80 MiB/s = 1.2 GiB/s read speed. Sounds good? :)

-- 
Marco
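Marco's suggestion can be sketched with std.zlib's streaming UnCompress. The file name and chunk size here are illustrative, and the parallel read/decompress pipeline (e.g. via std.concurrency) is omitted for brevity; this just shows decompressing while reading, without ever materializing the 30 GB file on disk:

```d
import std.stdio : File, writeln;
import std.zlib : UnCompress, HeaderFormat;

void main()
{
    auto gz = new UnCompress(HeaderFormat.gzip);  // streaming gzip decoder
    size_t total;

    // Read the compressed file in 64 KiB chunks and inflate each one.
    foreach (chunk; File("reads.fastq.gz", "rb").byChunk(64 * 1024))
    {
        auto plain = cast(const(ubyte)[]) gz.uncompress(chunk.dup);
        total += plain.length;
        // ... feed `plain` straight into the FASTQ parser here ...
    }
    writeln(total, " bytes decompressed");
}
```

With reading and decompression overlapped, throughput is bounded by the slower of the two stages, and the ~15x compression factor multiplies the effective parse rate, which is where the 15 * 80 MiB/s = 1.2 GiB/s estimate comes from.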
Mar 08 2013
prev sibling parent "bioinfornatics" <bioinfornatics fedoraproject.org> writes:
On Friday, 8 March 2013 at 15:31:46 UTC, Chris Cain wrote:
 On Friday, 8 March 2013 at 15:25:02 UTC, bioinfornatics wrote:
 Why, when reading a huge file, does it take longer to get a line 
 the further I advance into the file?
----
StopWatch sw;
while( !fastq1.empty ){
    sw.start();
    auto q1 = fastq1.next();
    sw.stop();
    writeln( sw.peek().msecs() );
}
----
That's because you never reset the StopWatch.
Oh OK, I thought sw.start() reset it to 0. Let's say it's Friday ... thanks ^^
Mar 08 2013