
digitalmars.D.learn - Problems with Zlib - data error

reply Era Scarecrow <rtcvb32 yahoo.com> writes:
I took the UnCompress example and tried to make use of it, but 
it breaks midway through my program with nothing more than 
'Data Error'.

[code]
// shamelessly taken for experimenting with
UnCompress decmp = new UnCompress;
foreach (chunk; stdin.byChunk(4096).map!(x => decmp.uncompress(x)))
[/code]

Two things to note, though. First, I'm using an xor block of data 
that's compressed (either with gzip or using only zlib), and 
second, the uncompressed data is 660Mb, while the compressed gzip 
file is about 3Mb. It dies right when the data gets past the 
large null blocks. The first 5Mb fits in 18k of compressed space 
(and could be re-compressed to save another 17%).

Is this a bug with zlib? With the Dlang library? Or is it a 
memory issue with allocation (which is what drove me to use this 
rather than the straight compress/decompress in the first place)?

[code]
File xor = File(args[2], "r"); // line 53

foreach (chunk; xor.byChunk(2^^16).map!(x => cast(ubyte[]) decmp.uncompress(x))) // line 59, where it's breaking; doesn't matter if it's 4k, 8k, or 64k
[/code]


std.zlib.ZlibException std\zlib.d(96): data error
----------------
0x00407C62 in void std.zlib.UnCompress.error(int)
0x00405134 in ubyte[] 
xortool.main(immutable(char)[][]).__lambda2!(ubyte[]).__lambda2(ubyte[])
0x00405291 in  property ubyte[] 
std.algorithm.iteration.__T9MapResultS297xortool
4mainFAAyaZ9__lambda2TS3std5stdio4File7ByChunkZ.MapResult.front() 
at 
c:\D\dmd2\windows\bin\..\..\src\phobos\std\algorithm\iteration.d(582)
0x0040243F in _Dmain at g:\\Patch-Datei\xortool.d(59)
0x00405F43 in 
D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv
0x00405F07 in void rt.dmain2._d_run_main(int, char**, extern (C) 
int function(char[][])*).runAll()
0x00405E08 in _d_run_main
0x00405BF8 in main at g:\\Patch-Datei\xortool.d(7)
0x0044E281 in mainCRTStartup
0x764333CA in BaseThreadInitThunk
0x77899ED2 in RtlInitializeExceptionChain
0x77899EA5 in RtlInitializeExceptionChain
Apr 20
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 20 April 2017 at 20:19:31 UTC, Era Scarecrow wrote:
 I took the UnCompress example and tried to make use of it, but 
 it breaks midway through my program with nothing more than 
 'Data Error'.
See the tip of the week here: http://arsdnet.net/this-week-in-d/2016-apr-24.html

In short, byChunk reuses its buffer, and std.zlib holds on to the pointer. That combination leads to corrupted data.

Easiest fix is to .dup the chunk... I don't know of one off the top of my head that avoids the allocation using any of the std.zlib functions.
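As a minimal sketch of the .dup fix (the helper name and chunk size here are illustrative assumptions, not the original code), the idea is to hand UnCompress a copy of each slice, so the pointer it keeps doesn't get overwritten by the next read:

```d
import std.zlib : UnCompress, compress;

// Decompress `data` in fixed-size slices, duplicating each slice before
// handing it to UnCompress -- mimicking the .dup fix for byChunk's
// reused buffer. uncompressChunked and its chunkSize are illustrative.
ubyte[] uncompressChunked(const(ubyte)[] data, size_t chunkSize = 4096)
{
    auto decmp = new UnCompress;
    ubyte[] result;
    for (size_t lo = 0; lo < data.length; lo += chunkSize)
    {
        auto hi = lo + chunkSize < data.length ? lo + chunkSize : data.length;
        // .dup gives UnCompress its own copy to hold on to
        result ~= cast(ubyte[]) decmp.uncompress(data[lo .. hi].dup);
    }
    result ~= cast(ubyte[]) decmp.flush();
    return result;
}
```

In the byChunk pipeline from the original post the same fix is just `decmp.uncompress(x.dup)`.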
Apr 20
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Thursday, 20 April 2017 at 20:24:15 UTC, Adam D. Ruppe wrote:
 In short, byChunk reuses its buffer, and std.zlib holds on to 
 the pointer. That combination leads to corrupted data.

 Easiest fix is to .dup the chunk...
So that's what's going on. But if I have to dup the blocks then I have the same problem as before with limited memory. I kinda wish there was something like the gz_open that's in the C interface, and let it deal with the decompression and memory management as appropriate.

I suppose I could incorporate an 8-byte header that has the lengths of the runs that are 0's before/after, and just drop the 630Mb of data that can be skipped... which is the bulk of the compressed data. I just hoped to keep it very simple.
Apr 21
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 21 April 2017 at 11:18:55 UTC, Era Scarecrow wrote:
 So that's what's going on. But if I have to dup the blocks then 
 I have the same problem as before with limited memory. I kinda 
 wish there was something like the gz_open that's in the C 
 interface, and let it deal with the decompression and memory 
 management as appropriate.
You could always declare it with extern(C) and call it yourself.

But I didn't realize your thing was a literal example from the docs. Ugh, can't even trust that.
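For reference, Phobos already ships the raw C bindings in etc.c.zlib, so the gz* interface can be reached without hand-writing the extern(C) declarations. A sketch (the readGz name and 4096-byte buffer are made up for illustration):

```d
import etc.c.zlib;            // Phobos' raw zlib C bindings (gzopen & co.)
import std.string : toStringz;

// Read a whole .gz file through zlib's buffered gz* interface, letting
// the C library manage the decompression state and its own window.
ubyte[] readGz(string path)
{
    gzFile f = gzopen(path.toStringz, "rb");
    if (f is null)
        return null;
    scope (exit) gzclose(f);

    ubyte[] result;
    ubyte[4096] buf;
    int n;
    while ((n = gzread(f, buf.ptr, cast(uint) buf.length)) > 0)
        result ~= buf[0 .. n];   // gzread hands back decompressed bytes
    return result;
}
```

Since zlib owns the decompression buffers here, there's no per-chunk GC allocation to worry about.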
 I suppose I could incorporate an 8-byte header that has the 
 lengths of the runs that are 0's before/after, and just drop 
 the 630Mb of data that can be skipped... which is the bulk of 
 the compressed data. I just hoped to keep it very simple.
Take a look at zlib.d's source:

http://dpldocs.info/experimental-docs/source/std.zlib.d.html#L232

It isn't a long function, so if you take that, you can copy/paste the C parts to get you started with your own function that manages the memory more efficiently and drops the parts you don't care about.
Apr 21
parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Friday, 21 April 2017 at 12:57:25 UTC, Adam D. Ruppe wrote:
 But I didn't realize your thing was a literal example from the 
 docs. Ugh, can't even trust that.
That was a large part of why I was so confused by it all. Still, it would be much easier to salvage if I knew how the returned memory was allocated, and whether it could be de-allocated after I was done with it, vs letting the GC manage it. The black-box vs white-box approach.
 Take a look at zlib.d's source

 http://dpldocs.info/experimental-docs/source/std.zlib.d.html#L232

 It isn't a long function, so if you take that you can 
 copy/paste the C parts to get you started with your own 
 function that manages the memory more efficiently to drop the 
 parts you don't care about.
I've worked directly with the zlib API in the past; however, that was mainly to get it to work with AHK, allowing me to instantly compress text and see its UUEncode64 output (which was fun), as well as having multiple source references for better compression.

I think I'll just go with full memory compression and make a quick simple filter to manage the large blocks of 0's to something more manageable. That will reduce the memory allocation issues.
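A zero-run filter along those lines could be sketched like this (encodeZeroRuns/decodeZeroRuns and the 0-marker-plus-uint-length scheme are illustrative assumptions, not the actual tool):

```d
import std.bitmanip : nativeToLittleEndian, littleEndianToNative;

// Encode: every run of 0 bytes becomes a single 0 marker followed by a
// little-endian uint run length; non-zero bytes pass through untouched.
ubyte[] encodeZeroRuns(const(ubyte)[] data)
{
    ubyte[] output;
    size_t i = 0;
    while (i < data.length)
    {
        if (data[i] == 0)
        {
            size_t j = i;
            while (j < data.length && data[j] == 0)
                ++j;
            output ~= 0;  // marker: a zero run follows
            output ~= nativeToLittleEndian(cast(uint)(j - i))[];
            i = j;
        }
        else
            output ~= data[i++];
    }
    return output;
}

// Decode: expand each (0, length) pair back into a run of zero bytes.
ubyte[] decodeZeroRuns(const(ubyte)[] enc)
{
    ubyte[] output;
    size_t i = 0;
    while (i < enc.length)
    {
        if (enc[i] == 0)
        {
            ubyte[4] raw = enc[i + 1 .. i + 5];
            output.length += littleEndianToNative!uint(raw); // new bytes are 0
            i += 5;
        }
        else
            output ~= enc[i++];
    }
    return output;
}
```

A 660Mb file that's mostly null blocks collapses to a few bytes per run before zlib even sees it, which is the point of the pre-filter.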
Apr 21
parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Friday, 21 April 2017 at 17:40:03 UTC, Era Scarecrow wrote:
 I think I'll just go with full memory compression and make a 
 quick simple filter to manage the large blocks of 0's to 
 something more manageable. That will reduce the memory 
 allocation issues.
Done, and I'm happy with the results. After getting all my tests to work, running it on the 660Mb input brought it down to 3.8Mb, and compressing that with Zlib brought it to 2.98Mb. Alas, the tool will probably be more useful in limited scopes (rom hacking, for example) than anywhere else... although if there's any request for the source, I can spruce it up before submitting it for public use.
Apr 21