digitalmars.D - Having problems with uncompress of zip file created by std.zlib

reply "Lynn Allan" <l.allan att.net> writes:
I'm puzzled why the code below doesn't work. I'm attempting to use the
std.zlib.uncompress on a file that was created with std.zlib.compress. In
the code, there is a "version" that:
* reads Test.vpl into step01 (plain text with <crlf>'s)
* compresses into step02 and writes Test.zip
* reads Test.zip into buffer step03
* checks length of step02 and step03 equal to confirm read+write+read
* attempts to uncompress the buffer from Test.zip into char[] buffer step04
* gets exception with message, "Error: buf error"

The same file without -version=WriteFileBeforeReading skips the 1st, 2nd,
and 4th steps above to see if it works better to let the Test.zip file
close. I've tried a variety of combinations with reusing buffers, small
files, big files.

Am I doing something wrong? Leaving out a step or three? Do I need to
incorporate std.zip.ZipArchive and ArchiveMember for a Test.zip that only
includes one file?

Lynn A.


//********************************************************
//********************************************************
import std.file;
import std.zlib;

int main (char[][] args)
{
  version(WriteFileBeforeReading) // dmd -version=WriteFileBeforeReading test.d
  {
    printf("Reached version: WriteFileBeforeReading\n");
    char[] inputStep01 = cast(char[])std.file.read("Test.vpl");
    ubyte[] compressedStep02 = cast(ubyte[])compress(inputStep01);
    printf("inputStep01 size:          %d\n", inputStep01.length);
    printf("compressedStep02 size:     %d\n", compressedStep02.length);
    std.file.write("Test.zip", compressedStep02);
  }
  printf("Reached past: WriteFileBeforeReading\n");
  ubyte [] compressedStep03 = cast(ubyte[])std.file.read("Test.zip");
  printf("Test.zip size:               %d\n", compressedStep03.length);

  version(WriteFileBeforeReading)
  { assert(compressedStep02.length == compressedStep03.length);
  }
  char[] textUncompressedStep04 = cast(char[])uncompress(compressedStep03);
  printf("textUncompressedStep04 size: %d\n", textUncompressedStep04.length);

  return 0;
}
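
On the ZipArchive question above: std.zlib.compress produces a raw zlib stream, which is not the same thing as a .zip archive, even if the file is named Test.zip. A minimal sketch in current D (not the 2004 compiler) that inspects the leading bytes to tell the two apart; the helper names looksLikeZipArchive and looksLikeZlibStream are made up for this illustration:

```d
import std.stdio : writefln;
import std.string : representation;
import std.zlib : compress;

/// True if `data` starts with the ZIP local-file-header signature "PK\x03\x04".
bool looksLikeZipArchive(const(ubyte)[] data)
{
    static immutable ubyte[4] signature = [0x50, 0x4B, 0x03, 0x04];
    return data.length >= 4 && data[0 .. 4] == signature[];
}

/// True if `data` starts with a plausible zlib header (RFC 1950): the low
/// nibble of byte 0 is 8 (deflate) and the first two bytes, read as a
/// big-endian number, are a multiple of 31.
bool looksLikeZlibStream(const(ubyte)[] data)
{
    return data.length >= 2
        && (data[0] & 0x0F) == 8
        && ((data[0] << 8) | data[1]) % 31 == 0;
}

void main()
{
    auto packed = compress("the quick brown fox jumps over the lazy dog\r\n".representation);
    writefln("first two bytes: %02x %02x", packed[0], packed[1]);
    assert(looksLikeZlibStream(packed));   // what std.zlib.compress writes
    assert(!looksLikeZipArchive(packed));  // so a zip tool would not open it
}
```

A file written this way only needs std.zlib.uncompress to read it back; std.zip.ZipArchive is for real .zip containers.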
Aug 27 2004
parent reply "Walter" <newshound digitalmars.com> writes:
The following test program does a simple read/write of a zip file. It might
be helpful.
----------------------------

import std.file;
import std.date;
import std.zip;
import std.zlib;

int main(char[][] args)
{
    byte[] buffer;
    std.zip.ZipArchive zr;
    char[] zipname;
    ubyte[] data;

    testzlib();
    if (args.length > 1)
        zipname = args[1];
    else
        zipname = "test.zip";
    buffer = cast(byte[])std.file.read(zipname);
    zr = new std.zip.ZipArchive(cast(void[])buffer);
    printf("comment = '%.*s'\n", zr.comment);
    zr.print();

    foreach (ArchiveMember de; zr.directory)
    {
        de.print();
        printf("date = '%.*s'\n", std.date.toString(std.date.toDtime(de.time)));

        arrayPrint(de.compressedData);

        data = zr.expand(de);
        printf("data = '%.*s'\n", data);
    }

    printf("**Success**\n");

    zr = new std.zip.ZipArchive();
    ArchiveMember am = new ArchiveMember();
    am.compressionMethod = 8;
    am.name = "foo.bar";
    //am.extra = cast(ubyte[])"ExTrA";
    am.expandedData = cast(ubyte[])"We all live in a yellow submarine, a yellow submarine";
    am.expandedSize = am.expandedData.length;
    zr.addMember(am);
    void[] data2 = zr.build();
    std.file.write("foo.zip", cast(byte[])data2);

    return 0;
}

void arrayPrint(ubyte[] array)
{
    //printf("array %p,%d\n", (void*)array, array.length);
    for (int i = 0; i < array.length; i++)
    {
        printf("%02x ", array[i]);
        if (((i + 1) & 15) == 0)
            printf("\n");
    }
    printf("\n\n");
}

void testzlib()
{
    ubyte[] src = cast(ubyte[])
"the quick brown fox jumps over the lazy dog\r
the quick brown fox jumps over the lazy dog\r
";
    ubyte[] dst;

    arrayPrint(src);
    dst = cast(ubyte[])std.zlib.compress(cast(void[])src);
    arrayPrint(dst);
    src = cast(ubyte[])std.zlib.uncompress(cast(void[])dst);
    arrayPrint(src);
}
Aug 28 2004
parent reply "Lynn Allan" <l.allan att.net> writes:
<alert comment="newbie">

I'm still having problems so I removed the std.file logic to better
illustrate the misbehavior I'm seeing. The exceptions thrown seem related to
the size of the buffer handled by std.zlib.uncompress. Or that I never
really woke up this morning???

The code is similar to Walter B.'s sample code for using zip, except that it
uses larger buffers and doesn't "reuse" the original src buffer as the
destination of uncompress. Eventually, I want to read in a file of about 1.1
meg that was compressed with std.zlib.compress from about 4.1 meg of plain
text. The application will use std.zlib.uncompress and proceed. The original
uncompressed buffer will be read in from a file, but this simplified sample
code just uses arrays to check what happens when a plain text buffer is
compressed, and then uncompressed.

To summarize, main declares different text buffers of varying sizes and then
calls CompressThenUncompress. Oddly, the same CompressThenUncompress code
(below) that works for a buffer of 30 ubytes may fail inconsistently with 60
ubytes. The way the buffer is declared also seems to make a difference.

There seems to be a 'threshold' of about 50 bytes, but that isn't consistent
either. I suspect that I'm confused about declaring arrays of ubytes??

I've included the code below, which may be hard to view depending on word
wrap. It may be more viewable at:
http://dsource.org/forums/viewtopic.php?t=321

Am I doing something wrong or leaving out a step or three? The output from
running the program is shown at the bottom.

</alert>

// *******************************
// *******************************
import std.zlib;
import std.stdio;

void CompressThenUncompress (ubyte[] src)
{
  try {
    ubyte[] dst = cast(ubyte[])std.zlib.compress(cast(void[])src);
    writef("src.length:  ", src.length, " dst: ", dst.length);
    ubyte[] uncompressedBuf;
    uncompressedBuf = cast(ubyte[])std.zlib.uncompress(cast(void[])dst);
    writefln(" ... Got past std.zlib.uncompress. dst.length: ", dst.length);
    assert(src.length == uncompressedBuf.length);
    assert(src == uncompressedBuf);
  }
  catch {
    writefln(" ... Exception thrown when src.length = ", src.length, ". Keep going");
  }
}

char[] outerBuf30 =  "000000000011111111112222222222";
char[] outerBuf40 =  "0000000000111111111122222222223333333333";
char[] outerBuf50 =  "00000000001111111111222222222233333333334444444444";
char[] outerBuf100 = "00000000001111111111222222222233333333334444444444"
                     "01234567890123456789012345678901234567890123456789";

void main (char[][] args)
{
  char[] buf32 = "0123456789 0123456789 0123456789";
  CompressThenUncompress(cast(ubyte[])buf32);  // Works ok

  char[] buf40 = "0123456789 0123456789 0123456789 0123456";
  CompressThenUncompress(cast(ubyte[])buf40);  // Works ok

  char[] buf60 = "0123456789 0123456789 0123456789 0123456790 123456789 123456";
  CompressThenUncompress(cast(ubyte[])buf60);  // Throws exception

  ubyte[] ubuf60 = cast(ubyte[])"0123456789 0123456789 0123456789 "
                                "0123456790 123456789 123456";
  CompressThenUncompress(ubuf60);              // Throws exception

  char[] buf80 = "0123456789012345678901234567890123456789"
                 "0123456789012345678901234567890123456789";
  CompressThenUncompress(cast(ubyte[])buf80);  // Throws exception

  CompressThenUncompress(cast(ubyte[])"This string is 28 chars long"); // ok
  CompressThenUncompress(cast(ubyte[])"This string is 42 chars long "
                                      "0123456789012");                // ok
  CompressThenUncompress(cast(ubyte[])"This string is 46 chars long "
                                      "01234567890123456");            // ok
  CompressThenUncompress(cast(ubyte[])"This string is 60 chars long "
                                      "0123456789012345678901234567890"); // ok
  CompressThenUncompress(cast(ubyte[])"This string is 80 chars long "
                                      "0123456789012345678901234567890"
                                      "12345678901234567890");         // ok

  CompressThenUncompress(cast(ubyte[])outerBuf30);      // ok
  CompressThenUncompress(cast(ubyte[])outerBuf40);      // Throws exception
  CompressThenUncompress(cast(ubyte[])outerBuf50);      // Throws exception
  CompressThenUncompress(cast(ubyte[])outerBuf100);     // Throws exception
}

// Results from running above code for different array declarations
src.length:  32 dst: 22 ... Got past std.zlib.uncompress. dst.length: 22
src.length:  40 dst: 22 ... Got past std.zlib.uncompress. dst.length: 22
src.length:  60 dst: 28 ... Exception thrown when src.length = 60. Keep going
src.length:  60 dst: 28 ... Exception thrown when src.length = 60. Keep going
src.length:  80 dst: 21 ... Exception thrown when src.length = 80. Keep going
src.length:  28 dst: 34 ... Got past std.zlib.uncompress. dst.length: 34
src.length:  42 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  46 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  60 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  80 dst: 46 ... Got past std.zlib.uncompress. dst.length: 46
src.length:  30 dst: 16 ... Got past std.zlib.uncompress. dst.length: 16
src.length:  40 dst: 19 ... Exception thrown when src.length = 40. Keep going
src.length:  50 dst: 21 ... Exception thrown when src.length = 50. Keep going
src.length: 100 dst: 33 ... Exception thrown when src.length = 100. Keep going


Aug 28 2004
parent reply Ben Hinkle <bhinkle4 juno.com> writes:
It looks like a bug in std.zlib.uncompress. The code
    if (!destlen)
        destlen = srcbuf.length * 2 + 1;
doesn't always allocate enough space for the result. When I change the 1 to
100 (or something big like that) all the examples in your test work. I have
no idea what the "right" value should be. I was just playing around with
different values.

-Ben
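
The sizes Lynn posted appear to line up with this diagnosis: every run that threw has a source length larger than 2 * dst + 1, and every run that succeeded does not. A small sketch in current D that replays that arithmetic; the table is copied from the posted output, and the names Run, guessedDestLen and guessTooSmall are invented for the illustration:

```d
import std.stdio : writefln;

/// One line of the posted results: source size, compressed size, outcome.
struct Run { size_t srcLen; size_t dstLen; bool threw; }

/// The initial output-buffer size from the snippet quoted above.
size_t guessedDestLen(size_t compressedLen)
{
    return compressedLen * 2 + 1;
}

/// True when that first guess cannot hold the uncompressed data.
bool guessTooSmall(size_t srcLen, size_t dstLen)
{
    return guessedDestLen(dstLen) < srcLen;
}

void main()
{
    immutable Run[] runs = [
        Run(32, 22, false), Run(40, 22, false), Run(60, 28, true),
        Run(60, 28, true),  Run(80, 21, true),  Run(28, 34, false),
        Run(42, 46, false), Run(46, 46, false), Run(60, 46, false),
        Run(80, 46, false), Run(30, 16, false), Run(40, 19, true),
        Run(50, 21, true),  Run(100, 33, true),
    ];
    foreach (r; runs)
    {
        writefln("src %3d  dst %2d  guess %2d  too small: %s  threw: %s",
                 r.srcLen, r.dstLen, guessedDestLen(r.dstLen),
                 guessTooSmall(r.srcLen, r.dstLen), r.threw);
        // The two columns agree for every posted run.
        assert(guessTooSmall(r.srcLen, r.dstLen) == r.threw);
    }
}
```

That would also explain why there is no fixed "threshold": what matters is how well the particular text compresses, not its length alone.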

Aug 28 2004
next sibling parent reply "Lynn Allan" <l.allan att.net> writes:
Interesting ... I found the snippet you noted below in the phobos zlib code.
Does that mean that there isn't really a workaround for someone using
std.zlib? My impression is that std.zlib was ported from the original C
code.

 if (!destlen)
   destlen = srcbuf.length * 2 + 1;

"Ben Hinkle" <bhinkle4 juno.com> wrote in message news:<cgr95t$u3q$1 digitaldaemon.com>...
 It looks like a bug in std.zlib.uncompress. The code
     if (!destlen)
         destlen = srcbuf.length * 2 + 1;
 doesn't always allocate enough space for the result. When I change the 1 to
 100 (or something big like that) all the examples in your test work. I have
 no idea what the "right" value should be. I was just playing around with
 different values.

 -Ben
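
One possible workaround, sketched here as an assumption rather than a confirmed fix: the quoted check only runs when destlen is zero, which suggests the caller can pass a size. In current Phobos, std.zlib.uncompress does take an optional destlen argument, so a caller who knows the original size can supply it. Whether this avoided the failure in the 2004 release is not established in this thread. The helper name roundTripWithHint is made up for the example:

```d
import std.array : replicate;
import std.string : representation;
import std.zlib : compress, uncompress;

/// Hypothetical helper: compress `src`, then uncompress it while telling
/// uncompress how large the result is expected to be.
ubyte[] roundTripWithHint(const(ubyte)[] src)
{
    ubyte[] packed = compress(src);
    // src.length is the known uncompressed size, passed as the destlen hint
    // so the library does not have to guess from the compressed length.
    return cast(ubyte[]) uncompress(packed, src.length);
}

void main()
{
    // 80 highly repetitive bytes, similar to the buf80 case that threw.
    auto text = replicate("0123456789", 8).representation;
    assert(roundTripWithHint(text) == text);
}
```

For data read from a file, this means storing the original length alongside the compressed bytes, since the zlib stream itself does not record it.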
Aug 28 2004
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
 My impression is that std.zlib was ported from the original C
 code.
interestingly, those lines of code do not appear in the original zlib source. i think what walter tried to do was "approximate" a buffer size, which is not really the best way to go about it, as the size of the uncompressed data is not necessarily (2*compressed)+1. it would be better just to fail than try to carry on half-assedly in this case. or, rather than returning a void[], it could accept an out void[] for the dest buffer. though it wouldn't be as elegant :P
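
A rough sketch of the caller-supplied buffer idea, written against the C bindings in etc.c.zlib from current Phobos. The function inflateInto is hypothetical (it is not part of std.zlib), and the starting size of 64 bytes is arbitrary:

```d
import etc.c.zlib;
import std.array : replicate;
import std.string : representation;
import std.zlib : compress;

/// Hypothetical helper (not part of Phobos): inflate the zlib stream in `src`
/// into the caller's buffer `dest`, growing `dest` only when it fills up.
/// Returns the slice of `dest` that holds the uncompressed bytes.
ubyte[] inflateInto(const(ubyte)[] src, ref ubyte[] dest)
{
    z_stream zs;
    if (inflateInit(&zs) != Z_OK)
        throw new Exception("inflateInit failed");
    scope (exit) inflateEnd(&zs);

    if (dest.length == 0)
        dest.length = 64;               // arbitrary starting size

    zs.next_in = cast(typeof(zs.next_in)) src.ptr;
    zs.avail_in = cast(uint) src.length;

    size_t filled = 0;
    while (true)
    {
        zs.next_out = dest.ptr + filled;
        zs.avail_out = cast(uint) (dest.length - filled);

        immutable err = inflate(&zs, Z_NO_FLUSH);
        filled = dest.length - zs.avail_out;

        if (err == Z_STREAM_END)
            return dest[0 .. filled];   // all data extracted
        if (err != Z_OK && err != Z_BUF_ERROR)
            throw new Exception("inflate failed");
        if (zs.avail_out != 0)          // room left, so input ran out early
            throw new Exception("compressed data is truncated");
        dest.length = dest.length * 2;  // output full: grow and continue
    }
}

void main()
{
    auto text = replicate("0123456789", 10).representation;
    ubyte[] scratch = new ubyte[16];    // deliberately too small
    auto unpacked = inflateInto(compress(text), scratch);
    assert(unpacked == text);
    assert(scratch.length >= text.length); // the caller's buffer was grown
}
```

The same scratch buffer can be passed to later calls, so a new allocation happens only when a result is larger than anything seen before.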
Aug 28 2004
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
Ben Hinkle wrote:
 It looks like a bug in std.zlib.uncompress. The code
     if (!destlen)
         destlen = srcbuf.length * 2 + 1;
 doesn't always allocate enough space for the result. When I change the 1 to
 100 (or something big like that) all the examples in your test work. I have
 no idea what the "right" value should be. I was just playing around with
 different values.
Typical usage of zlib is to loop on inflate until all the data has been
extracted--it looks like the current implementation is trying to do everything
in one pass. I'd be happy to fix this, though I won't have time until tomorrow.

Also, the core zlib inflate/deflate functions do not generate or parse a zip
header. This process is only taken care of by the printf-type functions in the
library (which don't operate on memory buffers). While not having a header is
fine (and probably preferable) for application-specific data, it means that
std.zlib will not be able to read or write zip files usable by other programs.
I've written in-memory wrappers for zlib before that take care of this issue
and would be happy to do something about it if folks are interested. For the
free functions the best way to do this would be to add a bit parameter at the
end to specify whether the header should be processed/generated. For the
classes this could be a value passed on construction. The default would be off.

Frankly, it would be nice if the zlib routines didn't allocate a new buffer
for every function call. Maybe a new set of functions that take both the input
and output buffers as parameters? The output buffer might still have to grow
if it's not big enough.

Sean
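
For comparison, a minimal sketch of the "loop until everything is extracted" pattern using the UnCompress class from std.zlib in current Phobos (the 2004 class may have behaved differently). The helper name inflateInChunks and the 16-byte chunk size are made up for the example:

```d
import std.array : replicate;
import std.string : representation;
import std.zlib : compress, UnCompress;

/// Hypothetical helper: feed `packed` to the decompressor a small piece at a
/// time and collect whatever output each call produces.
ubyte[] inflateInChunks(const(ubyte)[] packed, size_t chunkSize = 16)
{
    auto inflater = new UnCompress();
    ubyte[] result;
    for (size_t pos = 0; pos < packed.length; pos += chunkSize)
    {
        immutable end = pos + chunkSize < packed.length
                      ? pos + chunkSize : packed.length;
        // Each call returns only the bytes that could be produced so far.
        result ~= cast(const(ubyte)[]) inflater.uncompress(packed[pos .. end]);
    }
    // Collect anything still buffered once the input is exhausted.
    result ~= cast(const(ubyte)[]) inflater.flush();
    return result;
}

void main()
{
    auto text = replicate("the quick brown fox jumps over the lazy dog\r\n", 20)
                .representation;
    assert(inflateInChunks(compress(text)) == text);
}
```

Because output is appended as it arrives, no single guess about the final size has to be right.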
Aug 29 2004
parent "Lynn Allan" <l.allan att.net> writes:
"Sean Kelly" <sean f4.ca> wrote in message
news:cgtc4a$1ojk$1 digitaldaemon.com...
 Ben Hinkle wrote:
  It looks like a bug in std.zlib.uncompress. The code
      if (!destlen)
          destlen = srcbuf.length * 2 + 1;
  doesn't always allocate enough space for the result. When I change the 1 to
  100 (or something big like that) all the examples in your test work. I have
  no idea what the "right" value should be. I was just playing around with
  different values.
 Typical usage of zlib is to loop on inflate until all the data has been
 extracted--it looks like the current implementation is trying to do
 everything in one pass. I'd be happy to fix this, though I won't have time
 until tomorrow.

I've posted this as a std.zlib.uncompress bug, and appreciate Sean K's offer
to fix it.
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/1677

Lynn A.
Aug 30 2004