
digitalmars.D.learn - Problems with Zlib - data error

reply Era Scarecrow <rtcvb32 yahoo.com> writes:
I took the UnCompress example and tried to make use of it, but 
it breaks midway through my program with nothing more than 
'Data Error'.

[code]
// shamelessly taken for experimenting with
UnCompress decmp = new UnCompress;
foreach (chunk; stdin.byChunk(4096).map!(x => decmp.uncompress(x)))
[/code]

Two things to note, though. First, I'm using an xor block of data 
that's compressed (either with gzip or using only zlib), and 
second, the uncompressed data is 660Mb, while the compressed gzip 
file is about 3Mb. It dies right when the data gets past the 
large null blocks. The first 5Mb fits in 18k of compressed space 
(and could be re-compressed to save another 17%).

Is this a bug with zlib? With the Dlang library? Or is it a 
memory issue with allocation (which is what drove me to use this 
rather than the straight compress/decompress in the first place)?

[code]
File xor = File(args[2], "r"); // line 53

foreach (chunk; xor.byChunk(2^^16).map!(x => cast(ubyte[]) decmp.uncompress(x))) // line 59, where it's breaking; doesn't matter if it's 4k, 8k, or 64k
[/code]


std.zlib.ZlibException std\zlib.d(96): data error
----------------
0x00407C62 in void std.zlib.UnCompress.error(int)
0x00405134 in ubyte[] 
xortool.main(immutable(char)[][]).__lambda2!(ubyte[]).__lambda2(ubyte[])
0x00405291 in  property ubyte[] 
std.algorithm.iteration.__T9MapResultS297xortool
4mainFAAyaZ9__lambda2TS3std5stdio4File7ByChunkZ.MapResult.front() 
at 
c:\D\dmd2\windows\bin\..\..\src\phobos\std\algorithm\iteration.d(582)
0x0040243F in _Dmain at g:\\Patch-Datei\xortool.d(59)
0x00405F43 in 
D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv
0x00405F07 in void rt.dmain2._d_run_main(int, char**, extern (C) 
int function(char[][])*).runAll()
0x00405E08 in _d_run_main
0x00405BF8 in main at g:\\Patch-Datei\xortool.d(7)
0x0044E281 in mainCRTStartup
0x764333CA in BaseThreadInitThunk
0x77899ED2 in RtlInitializeExceptionChain
0x77899EA5 in RtlInitializeExceptionChain
Apr 20
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 20 April 2017 at 20:19:31 UTC, Era Scarecrow wrote:
 I took the UnCompress example and tried to make use of it, but 
 it breaks midway through my program with nothing more than 
 'Data Error'.
See the tip of the week here: http://arsdnet.net/this-week-in-d/2016-apr-24.html

In short, byChunk reuses its buffer, and std.zlib holds on to the pointer. That combination leads to corrupted data.

Easiest fix is to .dup the chunk... I don't know of one off the top of my head that avoids the allocation using any of the std.zlib functions.
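As a minimal sketch of the .dup fix (the helper name and chunk size here are illustrative assumptions, not the original code), the idea is to hand UnCompress a copy of each slice, so the pointer it keeps doesn't get overwritten by the next read:

```d
import std.zlib : UnCompress, compress;

// Decompress `data` in fixed-size slices, duplicating each slice before
// handing it to UnCompress -- mimicking the .dup fix for byChunk's
// reused buffer. uncompressChunked and its chunkSize are illustrative.
ubyte[] uncompressChunked(const(ubyte)[] data, size_t chunkSize = 4096)
{
    auto decmp = new UnCompress;
    ubyte[] result;
    for (size_t lo = 0; lo < data.length; lo += chunkSize)
    {
        auto hi = lo + chunkSize < data.length ? lo + chunkSize : data.length;
        // .dup gives UnCompress its own copy to hold on to
        result ~= cast(ubyte[]) decmp.uncompress(data[lo .. hi].dup);
    }
    result ~= cast(ubyte[]) decmp.flush();
    return result;
}
```

In the byChunk pipeline from the original post the same fix is just `decmp.uncompress(x.dup)`.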
Apr 20
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Thursday, 20 April 2017 at 20:24:15 UTC, Adam D. Ruppe wrote:
 In short, byChunk reuses its buffer, and std.zlib holds on to 
 the pointer. That combination leads to corrupted data.

 Easiest fix is to .dup the chunk...
So that's what's going on. But if I have to dup the blocks then I have the same problem as before with limited memory. I kinda wish there was something like the gz_open that's in the C interface, and let it deal with the decompression and memory management as appropriate.

I suppose I could incorporate an 8-byte header that has the lengths of the runs that are 0's before/after, and just drop the 630Mb of data that can be skipped... which is the bulk of the compressed data. I just hoped to keep it very simple.
Apr 21
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 21 April 2017 at 11:18:55 UTC, Era Scarecrow wrote:
 So that's what's going on. But if I have to dup the blocks then 
 I have the same problem as before with limited memory. I kinda 
 wish there was something like the gz_open that's in the C 
 interface, and let it deal with the decompression and memory 
 management as appropriate.
You could always declare it with extern(C) and call it yourself.

But I didn't realize your thing was a literal example from the docs. Ugh, can't even trust that.
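For reference, Phobos already ships the raw C bindings in etc.c.zlib, so the gz* interface can be reached without hand-writing the extern(C) declarations. A sketch (the readGz name and 4096-byte buffer are made up for illustration):

```d
import etc.c.zlib;            // Phobos' raw zlib C bindings (gzopen & co.)
import std.string : toStringz;

// Read a whole .gz file through zlib's buffered gz* interface, letting
// the C library manage the decompression state and its own window.
ubyte[] readGz(string path)
{
    gzFile f = gzopen(path.toStringz, "rb");
    if (f is null)
        return null;
    scope (exit) gzclose(f);

    ubyte[] result;
    ubyte[4096] buf;
    int n;
    while ((n = gzread(f, buf.ptr, cast(uint) buf.length)) > 0)
        result ~= buf[0 .. n];   // gzread hands back decompressed bytes
    return result;
}
```

Since zlib owns the decompression buffers here, there's no per-chunk GC allocation to worry about.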
 I suppose I could incorporate an 8-byte header that has the 
 lengths of the runs that are 0's before/after, and just drop 
 the 630Mb of data that can be skipped... which is the bulk of 
 the compressed data. I just hoped to keep it very simple.
Take a look at zlib.d's source:

http://dpldocs.info/experimental-docs/source/std.zlib.d.html#L232

It isn't a long function, so if you take that, you can copy/paste the C parts to get you started with your own function that manages the memory more efficiently and drops the parts you don't care about.
Apr 21
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Friday, 21 April 2017 at 12:57:25 UTC, Adam D. Ruppe wrote:
 But I didn't realize your thing was a literal example from the 
 docs. Ugh, can't even trust that.
That was a large part of why I was so confused by it all. Still, it would be much easier to salvage if I knew how the returned memory was allocated, and whether it could be de-allocated after I was done with it, vs letting the GC manage it. The black-box vs white-box approach.
 Take a look at zlib.d's source

 http://dpldocs.info/experimental-docs/source/std.zlib.d.html#L232

 It isn't a long function, so if you take that you can 
 copy/paste the C parts to get you started with your own 
 function that manages the memory more efficiently to drop the 
 parts you don't care about.
I've worked directly with the zlib API in the past; however, that was mainly to get it to work with AHK, allowing me to instantly compress text and see its UUEncode64 output (which was fun), as well as having multiple source references for better compression.

I think I'll just go with full memory compression and make a quick simple filter to manage the large blocks of 0's to something more manageable. That will reduce the memory allocation issues.
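A zero-run filter along those lines could be sketched like this (encodeZeroRuns/decodeZeroRuns and the 0-marker-plus-uint-length scheme are illustrative assumptions, not the actual tool):

```d
import std.bitmanip : nativeToLittleEndian, littleEndianToNative;

// Encode: every run of 0 bytes becomes a single 0 marker followed by a
// little-endian uint run length; non-zero bytes pass through untouched.
ubyte[] encodeZeroRuns(const(ubyte)[] data)
{
    ubyte[] output;
    size_t i = 0;
    while (i < data.length)
    {
        if (data[i] == 0)
        {
            size_t j = i;
            while (j < data.length && data[j] == 0)
                ++j;
            output ~= 0;  // marker: a zero run follows
            output ~= nativeToLittleEndian(cast(uint)(j - i))[];
            i = j;
        }
        else
            output ~= data[i++];
    }
    return output;
}

// Decode: expand each (0, length) pair back into a run of zero bytes.
ubyte[] decodeZeroRuns(const(ubyte)[] enc)
{
    ubyte[] output;
    size_t i = 0;
    while (i < enc.length)
    {
        if (enc[i] == 0)
        {
            ubyte[4] raw = enc[i + 1 .. i + 5];
            output.length += littleEndianToNative!uint(raw); // new bytes are 0
            i += 5;
        }
        else
            output ~= enc[i++];
    }
    return output;
}
```

A 660Mb file that's mostly null blocks collapses to a few bytes per run before zlib even sees it, which is the point of the pre-filter.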
Apr 21
parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Friday, 21 April 2017 at 17:40:03 UTC, Era Scarecrow wrote:
 I think I'll just go with full memory compression and make a 
 quick simple filter to manage the large blocks of 0's to 
 something more manageable. That will reduce the memory 
 allocation issues.
Done, and I'm happy with the results. After getting all my tests to work, running it on the 660Mb input brought it down to 3.8Mb, and compressing that with Zlib brought it to 2.98Mb. Alas, the tool will probably be more useful in limited scopes (rom hacking, for example) than anywhere else... although if there's any request for the source, I can spruce it up before submitting it for public use.
Apr 21