digitalmars.D.bugs - std.zlib.decompress may have flawed logic for allocating buffer

Lynn A. (11/11) Aug 30 2004 It appears there may be a bug in std.zlib.uncompress. The size of the ou...

Sean Kelly (4/13) Aug 31 2004 Working on it. I'm just about done updating etc.c.zlib from v1.1.4 to v...

Dave (63/79) Aug 31 2004 Before you posted this, I checked into it and believe I found a fix.

Lynn (24/26) Sep 01 2004 Dave (17/19) Sep 01 2004 1) On Windows, make sure the dm\bin and dmd\bin folders are in your PATH...

Lynn Allan (23/34) Sep 01 2004

Sean Kelly (5/11) Sep 01 2004 Yup, that should do it. I actually missed the while(1) bit and thought ...

Dave (42/57) Sep 01 2004 Thanks!

Dave (6/46) Sep 02 2004 By "What is the best way to submit this so it is included in the next bu...

Sean Kelly (3/6) Sep 02 2004 Try emailing Walter with the fix.

Lynn Allan (19/21) Sep 02 2004 My 2� ....
Lynn Allan (15/17) Sep 04 2004

Dave (3/20) Sep 04 2004 I'll take care of it..

Lynn Allan (26/26) Sep 01 2004

Lynn A. <Lynn_member pathlink.com> writes:

It appears there may be a bug in std.zlib.uncompress. The size of the output
buffer is apparently estimated from the size of the input, and can be inadequate
when the decompression ratio is over 2x.

Rather than repeat other info, there is discussion in the following thread from
the digitalmars.d newsgroup:
"Having problems with uncompress of zip file created by std.zlib"

There is a corresponding thread that is more specific about the uncompress
behavior, and has more info (including code) at:
http://dsource.org/forums/viewtopic.php?t=321

HTH,
Lynn A.

Aug 30 2004

Sean Kelly <sean f4.ca> writes:

In article <cgv2k4$2dau$1 digitaldaemon.com>, Lynn A. says...
It appears there may be a bug in std.zlib.uncompress. The size of the output
buffer is apparently estimated from the size of the input, and can be inadequate
when the decompression ratio is over 2x.

Rather than repeat other info, there is discussion in the following thread from
the digitalmars.d newsgroup:
"Having problems with uncompress of zip file created by std.zlib"

There is a corresponding thread that is more specific about the uncompress
behavior, and has more info (including code) at:
http://dsource.org/forums/viewtopic.php?t=321

Working on it.  I'm just about done updating etc.c.zlib from v1.1.4 to v1.2.1.
After that I'm going to look at this bug.

Sean

Aug 31 2004

Dave <Dave_member pathlink.com> writes:

In article <ch29mp$ugn$1 digitaldaemon.com>, Sean Kelly says...
In article <cgv2k4$2dau$1 digitaldaemon.com>, Lynn A. says...
It appears there may be a bug in std.zlib.uncompress. The size of the output
buffer is apparently estimated from the size of the input, and can be inadequate
when the decompression ratio is over 2x.

Rather than repeat other info, there is discussion in the following thread from
the digitalmars.d newsgroup:
"Having problems with uncompress of zip file created by std.zlib"

There is a corresponding thread that is more specific about the uncompress
behavior, and has more info (including code) at:
http://dsource.org/forums/viewtopic.php?t=321

Working on it.  I'm just about done updating etc.c.zlib from v1.1.4 to v1.2.1.
After that I'm going to look at this bug.

Sean

Before you posted this, I checked into it and believe I found a fix.

I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.

Change line 159 from:
err = etc.c.zlib.inflate(&zs, Z_FINISH);

to:
err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);


Here's how I tested it:

#import std.zlib;
#import std.stdio;
#import std.random;

#bool CompressThenUncompress (ubyte[] src)





ratio);












#void main (char[][] args)

Aug 31 2004

Lynn <Lynn_member pathlink.com> writes:




<alert comment="newbie>

Looks like great progress is taking place. THANKS from a prospective user of
std.zlib.

One suggestion: The above test is ingenious (and over my head), but with large
buffers the resulting compression ratios are invariably about 46.9% for the more
compressible buffer, and about 82.4% for the less compressible buffer. I would
advocate logic that generated test buffers with much greater variability of the
compression ratio ... perhaps from 1:10 to 2:1.  And also a contrived buffer
that was all one letter (e.g. '1'), all two letters repeated many times (e.g.
"1212121212 ... 1212"), all three letters (e.g. "123"), etc.

Specifically, I anticipate using zlib for quite regular specialized text that
compresses the original from about 4.1 mb to 1.1 mb. = 73%. It would be an
ironic implementation of "Murphy's Law" if the fix worked well with a limited
range of compression ratios, but couldn't handle a buffer of 5000 "Test Test
Test Test ... repeat 4995 more times ... Test".

Another suggestion: perhaps the tests that are being put together could
eventually be incorporated as zlib unittests? Also, could the people doing the
fixing post the modified zlib object code somewhere for other people to try out?
My impression is that recompiling Phobos is not for faint-hearted newbies.

(Another suggestion: perhaps I should do more than advocate the suggestions
above, but "roll up my sleeves" and contribute time and effort. "People tend to
be as useless as they can get away with, and as resourceful as the circumstances
demand." L.Allan :-)

</alert>

Sep 01 2004

Dave <Dave_member pathlink.com> writes:

In article <ch4f53$26q7$1 digitaldaemon.com>, Lynn says...
My impression is that recompiling Phobos is not for faint-hearted newbies.

1) On Windows, make sure the dm\bin and dmd\bin folders are in your PATH (you
may have to move them towards the 'front' of the PATH if other make tools are
already present in the PATH).

2) Next, cd to each:

dmd/src/phobos/etc/c/zlib/
dmd/src/phobos/etc/c/recls/
dmd/src/phobos/internal/gc/
dmd/src/phobos/

3) and edit the win32.mak files to make sure that the CC and DMD vars. are set
properly (if your path is setup Ok, then you can just change them to dmc and dmd
respectively).

4) Then run "make -f win32.mak" in each.

5) copy dmd/src/phobos/phobos.lib to dmd/lib (after making a backup ;)

The steps are similar for Linux, except the make files are named linux.mak, the
C compiler won't be DMC and you need to copy dmd/src/phobos/libphobos.a to
/usr/lib after the build.

</alert>

Sep 01 2004

"Lynn Allan" <l_d_allan adelphia.net> writes:

 2) Next, cd to each:
 dmd/src/phobos/etc/c/zlib/
 dmd/src/phobos/etc/c/recls/
 dmd/src/phobos/internal/gc/
 dmd/src/phobos/

 3) and edit the win32.mak files to make sure that the CC and DMD vars. are 
 set
 properly (if your path is setup Ok, then you can just change them to dmc 
 and dmd
 respectively).

 4) Then run "make -f win32.mak" in each.

<alert comment="newbie">

I wasn't able to get dmd/src/phobos/c/recls to recompile. I was able to 
rebuild zlib.lib, but couldn't figure out how to get D code to reference my 
changes rather than phobos.lib. I suppose I've used IDE's too much.

After some head scratching, I wonder if you really meant that I should 
attempt to rebuild:
dmd/src/photos/std/zlib.d

As a quick hack, I put test.d and zlib.d in the same directory, and put in 
several :
writefln("Hello Lynn");
"fingerprints" within std.zlib.compress and std.zlib.uncompress. I compiled 
with:
dmd test.d zlib.d
The "Hello Lynn" outputs shows up when running test.exe, so it seems like I 
have an ad hoc environment from which I can make further tests.

Quick sanity check ... am I on the right track with this? My intent is to 
incorporate the "fix candidates" identified in earlier posts, and then 
exercise the "CompressThenUncompress" test scaffolding with test cases that 
have compression ranges from 100:1 to 1:2. (such as
"1111111   .... 1111"
"12121212 ...  1212"
"123123123 ... 123123"

</alert>

Sep 01 2004

Sean Kelly <sean f4.ca> writes:

In article <ch3ajm$1gks$1 digitaldaemon.com>, Dave says...
Before you posted this, I checked into it and believe I found a fix.

I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.

Change line 159 from:
err = etc.c.zlib.inflate(&zs, Z_FINISH);

to:
err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);

Yup, that should do it.  I actually missed the while(1) bit and thought inflate
was only being called once.  It should also work if you set the value to
Z_SYNC_FLUSH.


Sean

Sep 01 2004

Dave <Dave_member pathlink.com> writes:

In article <ch4uav$2dfk$1 digitaldaemon.com>, Sean Kelly says...
In article <ch3ajm$1gks$1 digitaldaemon.com>, Dave says...
Before you posted this, I checked into it and believe I found a fix.

I'm not a zlib expert, but this seems to take care of the problem with 1.1.4.

Change line 159 from:
err = etc.c.zlib.inflate(&zs, Z_FINISH);

to:
err = etc.c.zlib.inflate(&zs, Z_NO_FLUSH);

Yup, that should do it.  I actually missed the while(1) bit and thought inflate
was only being called once.  It should also work if you set the value to
Z_SYNC_FLUSH.


Sean

Thanks!

Timed both with the following on Linux and there was virtually no difference
between the two averaged over 3 runs each. Also timed the previous code post -
again no difference.

The Z_NO_FLUSH was tested on Windows as well and "passed".

What is the best way to submit this so it is included in the next build?

import std.zlib;
import std.stdio;
import std.random;

bool CompressThenUncompress (char[] src)
{
try {
char[] dst = cast(char[])std.zlib.compress(cast(void[])src);
double ratio = (src.length / cast(double)dst.length);
if(src != cast(char[])std.zlib.uncompress(cast(void[])dst)) {
writefln(" ... NO MATCH when src.length = ", src.length, ".");
return false;
} else {
writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio);
}
}
catch {
writefln(" ... Exception thrown when src.length = ", src.length, ".");
return false;
}

return true;
}

void main (char[][] args)
{
for(int idx = 0; idx < 50; idx++) {
char[] buf = new char[(idx + 1) * 20000];

// Alternate between more & less compressible
foreach(inout char c; buf) c = ' ' + (rand() % (90 / ((idx % 5) + 1)));

if(CompressThenUncompress(buf)) {
printf("; Success.\n");
} else {
return;
}
}

printf("PASSED all tests.\n");
}

Sep 01 2004

Dave <Dave_member pathlink.com> writes:

By "What is the best way to submit this so it is included in the next build?" I
meant the Z_NO_FLUSH change, not the included code as one might expect given the
context ;)

Geesh, I guess I should review my own posts more often /before/ I send them <g>

- Dave

In article <ch52eq$2fbi$1 digitaldaemon.com>, Dave says...
Timed both with the following on Linux and there was virtually no difference
between the two averaged over 3 runs each. Also timed the previous code post -
again no difference.

The Z_NO_FLUSH was tested on Windows as well and "passed".

What is the best way to submit this so it is included in the next build?

import std.zlib;
import std.stdio;
import std.random;

bool CompressThenUncompress (char[] src)
{
try {
char[] dst = cast(char[])std.zlib.compress(cast(void[])src);
double ratio = (src.length / cast(double)dst.length);
if(src != cast(char[])std.zlib.uncompress(cast(void[])dst)) {
writefln(" ... NO MATCH when src.length = ", src.length, ".");
return false;
} else {
writef("src.length:  ", src.length, ", dst: ", dst.length, ", Ratio = ", ratio);
}
}
catch {
writefln(" ... Exception thrown when src.length = ", src.length, ".");
return false;
}

return true;
}

void main (char[][] args)
{
for(int idx = 0; idx < 50; idx++) {
char[] buf = new char[(idx + 1) * 20000];

// Alternate between more & less compressible
foreach(inout char c; buf) c = ' ' + (rand() % (90 / ((idx % 5) + 1)));

if(CompressThenUncompress(buf)) {
printf("; Success.\n");
} else {
return;
}
}

printf("PASSED all tests.\n");
}

Sep 02 2004

Sean Kelly <sean f4.ca> writes:

In article <ch7ceo$fn2$1 digitaldaemon.com>, Dave says...
By "What is the best way to submit this so it is included in the next build?" I
meant the Z_NO_FLUSH change, not the included code as one might expect given the
context ;)

Try emailing Walter with the fix.

Sean

Sep 02 2004

"Lynn Allan" <l_d_allan adelphia.net> writes:

meant the Z_NO_FLUSH change, not the included code as one might expect 
given the


My 2� ....

I would advocate at least some of "the included code" remain for unittest. 
We have a decent **start** of a rigorous set of test scaffolding for the 
zlib module. Without implying the slightest criticism of the truly amazing 
WB, this problem would have been caught during week one (if not day one) of 
porting zlib from C to D if DBC and unittest was incorporated (unittest may 
not have been implemented when the port was done?).

From The WB circa 2002-April:
http://www.digitalmars.com/d/archives/4607.html
"Unit testing [along with DBC] has in my mind risen to become a main feature 
of D."

Amen. Amen! AMEN!

I would only humbly add that unittest has the potential to significantly 
subdue that truism which is typically attributed to D. Knuth, "premature 
optimization is the root of [most | all] software evil."

(may actually be T. Hoare's:
"We should forget about small efficiencies, say about 97% of the time: 
premature optimization is the root of all evil." )

Lynn A.

Sep 02 2004

"Lynn Allan" <l.allan att.net> writes:

"Sean Kelly" <sean f4.ca> wrote in message
news:ch7ikr$ipi$1 digitaldaemon.com...
 In article <ch7ceo$fn2$1 digitaldaemon.com>, Dave says...
 Try emailing Walter with the fix [for std.zlib.uncompress].

<alert comment="newbie">

Walter,

Did this notification happen? I'm a D newbie and not familiar with the
procedure to submit 'candidates' for code fixes to phobos. There was a
moderately extensive series of posts related to a possible flaw in
std.zlib.uncompress. A probable fix was identified, as well as a number of
routines to possibly incorporate as zlib unittests.

"std.zlib.decompress may have flawed logic for allocating buffer"

news:<cgp8gc$2edj$1 digitaldaemon.com
news:cgv2k4$2dau$1 digitaldaemon.com

HTH,
Lynn A.

</alert>

Sep 04 2004

Dave <Dave_member pathlink.com> writes:

I'll take care of it..

- Dave

In article <chcq3n$2u7m$1 digitaldaemon.com>, Lynn Allan says...
"Sean Kelly" <sean f4.ca> wrote in message
news:ch7ikr$ipi$1 digitaldaemon.com...
 In article <ch7ceo$fn2$1 digitaldaemon.com>, Dave says...
 Try emailing Walter with the fix [for std.zlib.uncompress].

<alert comment="newbie">

Walter,

Did this notification happen? I'm a D newbie and not familiar with the
procedure to submit 'candidates' for code fixes to phobos. There was a
moderately extensive series of posts related to a possible flaw in
std.zlib.uncompress. A probable fix was identified, as well as a number of
routines to possibly incorporate as zlib unittests.

"std.zlib.decompress may have flawed logic for allocating buffer"

news:<cgp8gc$2edj$1 digitaldaemon.com
news:cgv2k4$2dau$1 digitaldaemon.com

HTH,
Lynn A.

</alert>

Sep 04 2004

"Lynn Allan" <l_d_allan adelphia.net> writes:

<alert comment="newbie">

With the Z_NO_FLUSH change noted in earlier posts, I was able to recompile 
all previous code using a rebuilt zlib.obj and ... (drum roll, please) ... 
IT WORKED.  THANKS!

I also appended the following code to the code submitted by Dave. It has a 
test buffer than has extremely regular text in a buffer than grows from 
len=6 to about len=6,291,456. (6 * 2 **  20).  This caused the "compression 
ratio" to go from 1:2.33 for 6 bytes of "123123" to about 1000:1 for 6291456 
bytes of "123123 ... 123123".

Sep 01 2004

D Programming

C/C++ Programming

Other

digitalmars.D.bugs - std.zlib.decompress may have flawed logic for allocating buffer