
digitalmars.D.learn - How to open a compressed file in gz format ?

sharkloc <dswew4er qq.com> writes:
I want to read the contents of file.gz line by line. The following 
code is not friendly to large files of hundreds of GB, and the 
memory overhead is also very large.


import std.stdio;
import std.process;
import std.string;

void main(string[] args)
{
    string fileName = args[1];
    string command = "gzip -dc " ~ fileName;
    // executeShell waits for gzip to exit and returns its whole output as one string
    auto result = executeShell(command);

    if (result.status != 0)
    {
        writeln("gzip failed:\n", result.output);
    }
    else
    {
        // the complete decompressed text is in memory here and is split again into lines
        auto all = chomp(result.output).split("\n");
        writeln(typeid(all));
        foreach (line; all)
        {
            writeln(line);
        }
    }
}
Mar 14 2021
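
The memory overhead in the code above comes from executeShell, which only returns after gzip has exited and hands back the whole decompressed output as one string. A minimal sketch of the same gzip-based idea that streams instead, assuming gzip is on the PATH, could use pipeProcess and read the child's stdout with byLine. The helper name eachLineViaGzip is made up for this illustration.

```d
import std.process : pipeProcess, Redirect, wait;
import std.stdio;

// Hypothetical helper (not from the thread): runs `gzip -dc fileName` and
// passes each decompressed line to `sink`. Returns gzip's exit status.
int eachLineViaGzip(string fileName, scope void delegate(const(char)[]) sink)
{
    // An argument array is passed directly to gzip, so no shell quoting is involved.
    auto pipes = pipeProcess(["gzip", "-dc", fileName], Redirect.stdout);

    // byLine reuses its buffer: `line` is only valid inside the loop body,
    // and only one line is held in memory at a time.
    foreach (line; pipes.stdout.byLine)
        sink(line);

    return wait(pipes.pid);
}

int main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: ", args[0], " file.gz");
        return 2;
    }

    auto status = eachLineViaGzip(args[1],
        delegate(const(char)[] line) { writeln(line); });
    if (status != 0)
        stderr.writeln("gzip exited with status ", status);
    return status;
}
```

Because the lines are consumed while gzip is still running, memory use should stay roughly at the length of the longest line rather than the size of the decompressed file.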
frame <frame86 live.com> writes:
On Monday, 15 March 2021 at 01:36:08 UTC, sharkloc wrote:
 I want to read the contents of file.gz line by line. The following 
 code is not friendly to large files of hundreds of GB, and the 
 memory overhead is also very large.
You can use the internal zlib instead of a shell. This example uses stdin, but you can also replace it with a file handle:

import std.zlib;
import std.stdio;
import std.conv : to;
import std.array : split;
import std.algorithm.iteration : map;

void main()
{
    UnCompress decmp = new UnCompress;
    string buf;

    // read 4096 bytes of the compressed stream per iteration
    foreach (chunk; stdin.byChunk(4096).map!(x => decmp.uncompress(x)))
    {
        // chunk has an unknown length of decompressed data
        auto lines = to!string(chunk).split("\n");

        foreach (i, line; lines)
        {
            if (i + 1 == lines.length)
            {
                // the last piece may be incomplete, so it is never output
                // directly; if it is also the first piece, it continues
                // whatever is already in the buffer
                buf = (i == 0) ? buf ~ line : line;
            }
            else if (i == 0)
            {
                // anything in the buffer belongs to the previous line
                writeln(buf ~ line);
                // reset buffer
                buf.length = 0;
            }
            else
            {
                writeln(line);
            }
        }
    }

    // rest
    if (buf.length)
    {
        write(buf);
    }
}
Mar 16 2021
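
The std.zlib reply above mentions that stdin can be replaced with a file handle. A minimal sketch of that variant, under a few assumptions: the helper name eachGzLine and the 64 KiB chunk size are chosen only for this illustration, and the file is assumed to contain a single gzip (or zlib) stream with '\n' line endings. Like the reply, it relies on uncompress returning the data as it becomes available and does not call flush.

```d
import std.stdio;
import std.string : indexOf;
import std.zlib : UnCompress;

// Hypothetical helper (not from the thread): decompresses a range of
// compressed chunks and calls `sink` once per line, without the trailing '\n'.
// The slice passed to `sink` is only valid during the call.
void eachGzLine(Chunks)(Chunks chunks, scope void delegate(const(char)[]) sink)
{
    auto decmp = new UnCompress; // default: header format is detected from the data
    char[] pending;              // text after the last newline seen so far

    void consume(const(void)[] data)
    {
        pending ~= cast(const(char)[]) data;
        size_t start = 0;
        for (;;)
        {
            auto nl = pending[start .. $].indexOf('\n');
            if (nl < 0)
                break; // no complete line left in the buffer
            sink(pending[start .. start + nl]);
            start += nl + 1;
        }
        // keep only the incomplete tail for the next chunk
        pending = pending[start .. $];
    }

    foreach (chunk; chunks)
        consume(decmp.uncompress(chunk));

    // a final line without a trailing newline
    if (pending.length)
        sink(pending);
}

void main(string[] args)
{
    // args[1] is the .gz file, as in the original post
    auto f = File(args[1], "rb");
    eachGzLine(f.byChunk(64 * 1024),
        delegate(const(char)[] line) { writeln(line); });
}
```

Only one compressed chunk, its decompressed output, and the current partial line are alive at any time, which is the point of reading in chunks rather than collecting the whole output first.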
Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/14/21 9:36 PM, sharkloc wrote:
 I want to read the contents of file.gz line by line. The following code is 
 not friendly to large files of hundreds of GB, and the memory overhead 
 is also very large.
 
 
It's not super user-friendly, but iopipe excels at this kind of stuff (untested):

// dub dependencies: [iopipe, io]
import iopipe.bufpipe;
import iopipe.textpipe;
import iopipe.zip;
import iopipe.refc;
import std.io;
import std.stdio : writeln; // only writeln is needed from std.stdio

void main(string[] args)
{
    string fileName = args[1];
    auto lineRange = File(fileName) // open file
        .refCounted // make it copyable
        .bufd // buffer it
        .unzip // unzip it
        .assumeText // assume the binary data is utf8 text
        .byLineRange!true; // true = discard newlines

    foreach (line; lineRange)
        writeln(line);
}

-Steve
Mar 16 2021