www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - std.zlib.decompress may have flawed logic for allocating buffer

reply Lynn A. <Lynn_member pathlink.com> writes:
It appears there may be a bug in std.zlib.uncompress. The size of the output
buffer is apparently estimated from the size of the input, and can be inadequate
when the decompression ratio is over 2x.

Rather than repeat other info, there is discussion in the following thread from
the digitalmars.d newsgroup:
"Having problems with uncompress of zip file created by std.zlib"

There is a corresponding thread that is more specific about the uncompress
behavior, and has more info (including code) at:
http://dsource.org/forums/viewtopic.php?t=321

HTH,
Lynn A.
Aug 30 2004
parent reply Sean Kelly <sean f4.ca> writes:
In article <cgv2k4$2dau$1 digitaldaemon.com>, Lynn A. says...
It appears there may be a bug in std.zlib.uncompress. The size of the output
buffer is apparently estimated from the size of the input, and can be inadequate
when the decompression ratio is over 2x.

Rather than repeat other info, there is discussion in the following thread from
the digitalmars.d newsgroup:
"Having problems with uncompress of zip file created by std.zlib"

There is a corresponding thread that is more specific about the uncompress
behavior, and has more info (including code) at:
http://dsource.org/forums/viewtopic.php?t=321

Working on it. I'm just about done updating etc.c.zlib from v1.1.4 to v1.2.1. After that I'm going to look at this bug. Sean
Aug 31 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <ch29mp$ugn$1 digitaldaemon.com>, Sean Kelly says...
In article <cgv2k4$2dau$1 digitaldaemon.com>, Lynn A. says...
It appears there may be a bug in std.zlib.uncompress. The size of the output
buffer is apparently estimated from the size of the input, and can be inadequate
when the decompression ratio is over 2x.

Rather than repeat other info, there is discussion in the following thread from
the digitalmars.d newsgroup:
"Having problems with uncompress of zip file created by std.zlib"

There is a corresponding thread that is more specific about the uncompress
behavior, and has more info (including code) at:
http://dsource.org/forums/viewtopic.php?t=321

Working on it. I'm just about done updating etc.c.zlib from v1.1.4 to v1.2.1. After that I'm going to look at this bug. Sean

Before you posted this, I checked into it and believe I found a fix. I'm not a zlib expert, but this seems to take care of the problem with 1.1.4. Change line 159 from: err = etc.c.zlib.inflate(&zs, Z_FINISH); to: err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH); Here's how I tested it: #import std.zlib; #import std.stdio; #import std.random; # #bool CompressThenUncompress (ubyte[] src) #{ # try { # ubyte[] dst = cast(ubyte[])std.zlib.compress(cast(void[])src); # double ratio = (dst.length / cast(double)src.length); # writef("src.length: ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio); # ubyte[] uncompressedBuf; # uncompressedBuf = cast(ubyte[])std.zlib.uncompress(cast(void[])dst); # assert(src.length == uncompressedBuf.length); # assert(src == uncompressedBuf); # } # catch { # writefln(" ... Exception thrown when src.length = ", src.length, "."); # return false; # } # return true; #} # #void main (char[][] args) #{ # // smallish buffers # for(int idx = 0; idx < 25; idx++) { # char[] buf = new char[rand() % 100]; # # // Alternate between more & less compressible # foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 2)); # # if(CompressThenUncompress(cast(ubyte[])buf)) { # printf("; Success.\n"); # } else { # return; # } # } # # // larger buffers # for(int idx = 0; idx < 25; idx++) { # char[] buf = new char[rand() % 10000000]; # # // Alternate between more & less compressible # foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 10)); # # if(CompressThenUncompress(cast(ubyte[])buf)) { # printf("; Success.\n"); # } else { # return; # } # } # # printf("PASSED all tests.\n"); #}
Aug 31 2004
next sibling parent reply Lynn <Lynn_member pathlink.com> writes:
#        // Alternate between more & less compressible
#        foreach(inout char c; buf) c = ' ' + (rand() % (idx % 2 ? 91 : 10));

<alert comment="newbie> Looks like great progress is taking place. THANKS from a prospective user of std.zlib. One suggestion: The above test is ingenious (and over my head), but with large buffers the resulting compression ratios are invariably about 46.9% for the more compressible buffer, and about 82.4% for the less compressible buffer. I would advocate logic that generated test buffers with much greater variability of the compression ratio ... perhaps from 1:10 to 2:1. And also a contrived buffer that was all one letter (e.g. '1'), all two letters repeated many times (e.g. "1212121212 ... 1212"), all three letters (e.g. "123"), etc. Specifically, I anticipate using zlib for quite regular specialized text that compresses the original from about 4.1 mb to 1.1 mb. = 73%. It would be an ironic implementation of "Murphy's Law" if the fix worked well with a limited range of compression ratios, but couldn't handle a buffer of 5000 "Test Test Test Test ... repeat 4995 more times ... Test". Another suggestion: perhaps the tests that are being put together could eventually be incorporated as zlib unittests? Also, could the people doing the fixing post the modified zlib object code somewhere for other people to try out? My impression is that recompiling Phobos is not for faint-hearted newbies. (Another suggestion: perhaps I should do more than advocate the suggestions above, but "roll up my sleeves" and contribute time and effort. "People tend to be as useless as they can get away with, and as resourceful as the circumstances demand." L.Allan :-) </alert>
Sep 01 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <ch4f53$26q7$1 digitaldaemon.com>, Lynn says...
My impression is that recompiling Phobos is not for faint-hearted newbies.

1) On Windows, make sure the dm\bin and dmd\bin folders are in your PATH (you may have to move them towards the 'front' of the PATH if other make tools are already present in the PATH). 2) Next, cd to each: dmd/src/phobos/etc/c/zlib/ dmd/src/phobos/etc/c/recls/ dmd/src/phobos/internal/gc/ dmd/src/phobos/ 3) and edit the win32.mak files to make sure that the CC and DMD vars. are set properly (if your path is setup Ok, then you can just change them to dmc and dmd respectively). 4) Then run "make -f win32.mak" in each. 5) copy dmd/src/phobos/phobos.lib to dmd/lib (after making a backup ;) The steps are similar for Linux, except the make files are named linux.mak, the C compiler won't be DMC and you need to copy dmd/src/phobos/libphobos.a to /usr/lib after the build.
</alert>

Sep 01 2004
parent "Lynn Allan" <l_d_allan adelphia.net> writes:
 2) Next, cd to each:
 dmd/src/phobos/etc/c/zlib/
 dmd/src/phobos/etc/c/recls/
 dmd/src/phobos/internal/gc/
 dmd/src/phobos/

 3) and edit the win32.mak files to make sure that the CC and DMD vars. are 
 set
 properly (if your path is setup Ok, then you can just change them to dmc 
 and dmd
 respectively).

 4) Then run "make -f win32.mak" in each.

<alert comment="newbie"> I wasn't able to get dmd/src/phobos/c/recls to recompile. I was able to rebuild zlib.lib, but couldn't figure out how to get D code to reference my changes rather than phobos.lib. I suppose I've used IDE's too much. After some head scratching, I wonder if you really meant that I should attempt to rebuild: dmd/src/photos/std/zlib.d As a quick hack, I put test.d and zlib.d in the same directory, and put in several : writefln("Hello Lynn"); "fingerprints" within std.zlib.compress and std.zlib.uncompress. I compiled with: dmd test.d zlib.d The "Hello Lynn" outputs shows up when running test.exe, so it seems like I have an ad hoc environment from which I can make further tests. Quick sanity check ... am I on the right track with this? My intent is to incorporate the "fix candidates" identified in earlier posts, and then exercise the "CompressThenUncompress" test scaffolding with test cases that have compression ranges from 100:1 to 1:2. (such as "1111111 .... 1111" "12121212 ... 1212" "123123123 ... 123123" </alert>
Sep 01 2004
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <ch3ajm$1gks$1 digitaldaemon.com>, Dave says...
Before you posted this, I checked into it and believe I found a fix.

I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.

Change line 159 from:
err = etc.c.zlib.inflate(&zs, Z_FINISH);

to:
err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);

Yup, that should do it. I actually missed the while(1) bit and thought inflate was only being called once. It should also work if you set the value to Z_SYNC_FLUSH. Sean
Sep 01 2004
parent reply Dave <Dave_member pathlink.com> writes:
In article <ch4uav$2dfk$1 digitaldaemon.com>, Sean Kelly says...
In article <ch3ajm$1gks$1 digitaldaemon.com>, Dave says...
Before you posted this, I checked into it and believe I found a fix.

I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.

Change line 159 from:
err = etc.c.zlib.inflate(&zs, Z_FINISH);

to:
err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);

Yup, that should do it. I actually missed the while(1) bit and thought inflate was only being called once. It should also work if you set the value to Z_SYNC_FLUSH. Sean

Thanks! Timed both with the following on Linux and there was virtually no difference between the two averaged over 3 runs each. Also timed the previous code post - again no difference. The Z_NO_FLUSH was tested on Windows as well and "passed". What is the best way to submit this so it is included in the next build? import std.zlib; import std.stdio; import std.random; bool CompressThenUncompress (char[] src) { try { char[] dst = cast(char[])std.zlib.compress(cast(void[])src); double ratio = (src.length / cast(double)dst.length); if(src != cast(char[])std.zlib.uncompress(cast(void[])dst)) { writefln(" ... NO MATCH when src.length = ", src.length, "."); return false; } else { writef("src.length: ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio); } } catch { writefln(" ... Exception thrown when src.length = ", src.length, "."); return false; } return true; } void main (char[][] args) { for(int idx = 0; idx < 50; idx++) { char[] buf = new char[(idx + 1) * 20000]; // Alternate between more & less compressible foreach(inout char c; buf) c = ' ' + (rand() % (90 / ((idx % 5) + 1))); if(CompressThenUncompress(buf)) { printf("; Success.\n"); } else { return; } } printf("PASSED all tests.\n"); }
Sep 01 2004
parent reply Dave <Dave_member pathlink.com> writes:
By "What is the best way to submit this so it is included in the next build?" I
meant the Z_NO_FLUSH change, not the included code as one might expect given the
context ;)

Geesh, I guess I should review my own posts more often /before/ I send them <g>

- Dave

In article <ch52eq$2fbi$1 digitaldaemon.com>, Dave says...
Timed both with the following on Linux and there was virtually no difference
between the two averaged over 3 runs each. Also timed the previous code post -
again no difference.

The Z_NO_FLUSH was tested on Windows as well and "passed".

What is the best way to submit this so it is included in the next build?

import std.zlib;
import std.stdio;
import std.random;

bool CompressThenUncompress (char[] src)
{
try {
char[] dst = cast(char[])std.zlib.compress(cast(void[])src);
double ratio = (src.length / cast(double)dst.length);
if(src != cast(char[])std.zlib.uncompress(cast(void[])dst)) {
writefln(" ... NO MATCH when src.length = ", src.length, ".");
return false;
} else {
writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio);
}
}
catch {
writefln(" ... Exception thrown when src.length = ", src.length, ".");
return false;
}

return true;
}

void main (char[][] args)
{
for(int idx = 0; idx < 50; idx++) {
char[] buf = new char[(idx + 1) * 20000];

// Alternate between more & less compressible
foreach(inout char c; buf) c = ' ' + (rand() % (90 / ((idx % 5) + 1)));

if(CompressThenUncompress(buf)) {
printf("; Success.\n");
} else {
return;
}
}

printf("PASSED all tests.\n");
}

Sep 02 2004
parent reply Sean Kelly <sean f4.ca> writes:
In article <ch7ceo$fn2$1 digitaldaemon.com>, Dave says...
By "What is the best way to submit this so it is included in the next build?" I
meant the Z_NO_FLUSH change, not the included code as one might expect given the
context ;)

Try emailing Walter with the fix. Sean
Sep 02 2004
next sibling parent "Lynn Allan" <l_d_allan adelphia.net> writes:
meant the Z_NO_FLUSH change, not the included code as one might expect 
given the


My 2 .... I would advocate at least some of "the included code" remain for unittest. We have a decent **start** of a rigorous set of test scaffolding for the zlib module. Without implying the slightest criticism of the truly amazing WB, this problem would have been caught during week one (if not day one) of porting zlib from C to D if DBC and unittest was incorporated (unittest may not have been implemented when the port was done?). From The WB circa 2002-April: http://www.digitalmars.com/d/archives/4607.html "Unit testing [along with DBC] has in my mind risen to become a main feature of D." Amen. Amen! AMEN! I would only humbly add that unittest has the potential to significantly subdue that truism which is typically attributed to D. Knuth, "premature optimization is the root of [most | all] software evil." (may actually be T. Hoare's: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." ) Lynn A.
Sep 02 2004
prev sibling parent reply "Lynn Allan" <l.allan att.net> writes:
"Sean Kelly" <sean f4.ca> wrote in message
news:ch7ikr$ipi$1 digitaldaemon.com...
 In article <ch7ceo$fn2$1 digitaldaemon.com>, Dave says...
 Try emailing Walter with the fix [for std.zlib.uncompress].

<alert comment="newbie"> Walter, Did this notification happen? I'm a D newbie and not familiar with the procedure to submit 'candidates' for code fixes to phobos. There was a moderately extensive series of posts related to a possible flaw in std.zlib.uncompress. A probable fix was identified, as well as a number of routines to possibly incorporate as zlib unittests. "std.zlib.decompress may have flawed logic for allocating buffer" news:<cgp8gc$2edj$1 digitaldaemon.com news:cgv2k4$2dau$1 digitaldaemon.com HTH, Lynn A. </alert>
Sep 04 2004
parent Dave <Dave_member pathlink.com> writes:
I'll take care of it..

- Dave

In article <chcq3n$2u7m$1 digitaldaemon.com>, Lynn Allan says...
"Sean Kelly" <sean f4.ca> wrote in message
news:ch7ikr$ipi$1 digitaldaemon.com...
 In article <ch7ceo$fn2$1 digitaldaemon.com>, Dave says...
 Try emailing Walter with the fix [for std.zlib.uncompress].

<alert comment="newbie"> Walter, Did this notification happen? I'm a D newbie and not familiar with the procedure to submit 'candidates' for code fixes to phobos. There was a moderately extensive series of posts related to a possible flaw in std.zlib.uncompress. A probable fix was identified, as well as a number of routines to possibly incorporate as zlib unittests. "std.zlib.decompress may have flawed logic for allocating buffer" news:<cgp8gc$2edj$1 digitaldaemon.com news:cgv2k4$2dau$1 digitaldaemon.com HTH, Lynn A. </alert>

Sep 04 2004
prev sibling parent "Lynn Allan" <l_d_allan adelphia.net> writes:
<alert comment="newbie">

With the Z_NO_FLUSH change noted in earlier posts, I was able to recompile 
all previous code using a rebuilt zlib.obj and ... (drum roll, please) ... 
IT WORKED.  THANKS!

I also appended the following code to the code submitted by Dave. It has a 
test buffer than has extremely regular text in a buffer than grows from 
len=6 to about len=6,291,456. (6 * 2 **  20).  This caused the "compression 
ratio" to go from 1:2.33 for 6 bytes of "123123" to about 1000:1 for 6291456 
bytes of "123123 ... 123123".

#   // increasingly long buffers with VERY high compressibility
#    // "123123 .... 123123"
#   for (int len = 6; len <= 10000000; len += len) {
#        char[] buf = new char[len];
#
#        for (int idx = 0; idx < len; idx += 3) {
#          buf[idx] = '1';
#          buf[idx + 1] = '2';
#          buf[idx + 2] = '3';
#        }
#        if(CompressThenUncompress(cast(ubyte[])buf)) {
#            printf("; Success.\n");
#        }
#        else {
#            return;
#        }
#    }
Sep 01 2004