digitalmars.D.learn - std.zip expand: memory allocation failed

Selim Ozel <sozel wpi.edu> writes:
I am simply trying to unzip a compressed zip file slightly over
1 GB. The decompressed size is about 4 GB.

The code is very similar to what's explained in the documentation
[1], and it works for smaller files.

Does anyone have a solution? Memory mapping [2] previously solved
part of my issue, but expand is still throwing a memory allocation
failure.

Selim

[1] https://dlang.org/phobos/std_zip.html
[2] https://forum.dlang.org/thread/mfnleztnwrbgivjvzvdp@forum.dlang.org
Oct 15 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 15 October 2021 at 20:41:36 UTC, Selim Ozel wrote:
 [...]
Did you try the MmFile workaround?
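For anyone following along, the MmFile workaround is roughly the following. This is a minimal sketch under assumptions (the linked thread's exact code isn't reproduced here, and `openMapped` is a hypothetical helper name): the archive is memory-mapped with std.mmfile instead of being read into a GC-allocated array, so only what `expand` returns gets allocated.

```d
import std.mmfile : MmFile;
import std.zip : ZipArchive;

/// Hypothetical helper: opens a zip archive through a read-only memory
/// mapping instead of reading the whole file into a GC-allocated array.
/// `mapping` is handed back to the caller because the archive's data
/// points into the mapped pages: keep it alive while the archive is used.
ZipArchive openMapped(string path, out MmFile mapping)
{
    mapping = new MmFile(path);        // read-only mapping by default
    return new ZipArchive(mapping[]);  // ZipArchive is built over a void[] buffer
}
```

Expanding a member still allocates its full uncompressed size in one array, e.g. `auto data = archive.expand(archive.directory["big.bin"]);` (member name made up), so mapping only takes the compressed input off the GC heap.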
Oct 16 2021
Selim Ozel <sozel wpi.edu> writes:
 Did you try the MmFile workaround?
I did. I also pinpointed the problem: I use x86_mscoff to run dub, and the failure is specific to that architecture selection. It's related to MapViewOfFileEx [1]. I still haven't found a way around it, though.

[1] https://stackoverflow.com/questions/12121843/mapviewoffileex-valid-lpbaseaddress
Oct 23 2021
Selim Ozel <sozel wpi.edu> writes:
On Friday, 15 October 2021 at 20:41:36 UTC, Selim Ozel wrote:
 [...]
It turns out my computer was literally running out of memory as the file was being unzipped. For some reason, to uncompress a 1 GB file with an uncompressed size of 4 GB, D's ZipArchive tries to use more than 16 GB of RAM. I don't know why; maybe I missed something. I'm using Windows 10 and DMD v2.091 with x86_mscoff.

My workaround was to call 7z from D and do the decompression there. That worked like a charm.

It seems that zip.d [1] calls the uncompress routine from zlib.d [2]. Would calling zlib's uncompress in chunks solve this memory issue? Any ideas?

S

[1] https://github.com/dlang/phobos/blob/master/std/zip.d
[2] https://github.com/dlang/phobos/blob/master/std/zlib.d
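On the chunking question, zlib itself can inflate in pieces. A minimal sketch, assuming Phobos's C bindings in `etc.c.zlib` and a hypothetical helper `inflateRawChunked`: zip members stored with deflate are raw deflate streams without the 2-byte zlib header and 4-byte trailer, which is what a negative `windowBits` (-15) selects (as far as I can tell, std.zip's `expand` passes the same -15 to `std.zlib.uncompress`). Only one `chunkSize` output buffer exists at a time; each filled piece goes to `sink`, which could for instance write to a `File`.

```d
import etc.c.zlib : z_stream, inflateInit2, inflate, inflateEnd,
    Z_OK, Z_STREAM_END, Z_NO_FLUSH;
import std.algorithm.comparison : min;
import std.exception : enforce;

/// Hypothetical helper: inflates a raw deflate stream (no zlib header or
/// trailer, as used inside zip members) piece by piece. At most `chunkSize`
/// bytes of output are held at once; each filled piece is passed to `sink`.
void inflateRawChunked(const(ubyte)[] compressed,
                       scope void delegate(const(ubyte)[]) sink,
                       size_t chunkSize = 64 * 1024)
{
    z_stream zs;
    // Negative windowBits selects raw deflate; 15 means a 32 KiB window.
    enforce(inflateInit2(&zs, -15) == Z_OK, "inflateInit2 failed");
    scope (exit) inflateEnd(&zs);

    auto outBuf = new ubyte[chunkSize];
    size_t pos = 0;
    int ret = Z_OK;
    while (ret != Z_STREAM_END)
    {
        // Refill input in slices small enough for the uint avail_in field.
        if (zs.avail_in == 0 && pos < compressed.length)
        {
            immutable n = min(compressed.length - pos, chunkSize);
            zs.next_in = cast(typeof(zs.next_in)) (compressed.ptr + pos);
            zs.avail_in = cast(uint) n;
            pos += n;
        }
        zs.next_out = outBuf.ptr;
        zs.avail_out = cast(uint) outBuf.length;

        ret = inflate(&zs, Z_NO_FLUSH);
        // Anything else (e.g. Z_DATA_ERROR, or Z_BUF_ERROR once the input
        // runs out before the end of the stream) is treated as a failure.
        enforce(ret == Z_OK || ret == Z_STREAM_END,
                "inflate failed: corrupt or truncated data");

        immutable produced = outBuf.length - zs.avail_out;
        if (produced > 0)
            sink(outBuf[0 .. produced]);
    }
}
```

This does not plug into `ZipArchive` as-is: `expand` is what locates a member's compressed bytes and returns the whole result, so using something like this means getting at those bytes another way (not shown).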
Oct 24 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Sunday, 24 October 2021 at 12:00:39 UTC, Selim Ozel wrote:
 [...]
Create an issue and we can solve it.
Oct 24 2021
Selim Ozel <sozel wpi.edu> writes:
On Sunday, 24 October 2021 at 14:14:08 UTC, Imperatorn wrote:
 Create an issue and we can solve it
Thanks. I opened an issue. https://issues.dlang.org/show_bug.cgi?id=22436
Oct 25 2021
Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/24/21 8:00 AM, Selim Ozel wrote:

 It turns out my computer was literally running out of memory as the file
 was being unzipped. For some reason, to uncompress a 1 GB file with an
 uncompressed size of 4 GB, D's ZipArchive tries to use more than 16 GB
 of RAM. I don't know why; maybe I missed something. I'm using Windows 10
 and DMD v2.091 with x86_mscoff.
Wait, x86 is 32-bit. Max address space is 4 GB. So maybe it was just trying to use 4 GB and running out of memory?

-Steve
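Rough numbers behind this: an x86 build has a 32-bit `size_t`, so its entire address space is 2^32 bytes = 4 GiB. A ~1 GB archive (mapped or read) plus a ~4 GB array from `expand` already exceeds that before counting code, stacks, DLLs and the GC heap, and on Windows a 32-bit process normally gets only 2 GiB of user address space unless it is large-address-aware. A hypothetical back-of-the-envelope check (the helper names are made up, not part of std.zip):

```d
/// Total virtual address space of this build in bytes: 2^^32 (4 GiB) when
/// size_t is 32 bits (x86). On 64-bit builds this saturates at ulong.max.
ulong addressSpaceBytes()
{
    static if (size_t.sizeof == 4)
        return 1UL << 32;
    else
        return ulong.max; // 2^^64 does not fit in a ulong
}

/// Hypothetical check: could the archive bytes plus one fully expanded
/// member (what expand returns) fit in the address space even in theory?
/// Real limits are lower: code, stacks, DLLs and the GC heap need room too.
bool couldFitInAddressSpace(ulong archiveBytes, ulong expandedBytes)
{
    immutable total = addressSpaceBytes();
    // Subtract instead of adding so the check cannot overflow.
    return archiveBytes <= total && expandedBytes <= total - archiveBytes;
}
```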
Oct 25 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Monday, 25 October 2021 at 20:50:40 UTC, Steven Schveighoffer wrote:
 [...]
 Wait, x86 is 32-bit. Max address space is 4 GB. So maybe it was just trying to use 4 GB and running out of memory?
Good catch, but still, should it use so much memory?
Oct 25 2021
bauss <jj_1337 live.dk> writes:
On Monday, 25 October 2021 at 22:38:38 UTC, Imperatorn wrote:
 [...]
 Good catch, but still, should it use so much memory?
Definitely not. It shouldn't use a lot of memory when unzipping as it should be done in chunks!
Oct 25 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 26 October 2021 at 06:32:21 UTC, bauss wrote:
 [...]
 Definitely not. It shouldn't use a lot of memory when unzipping as it should be done in chunks!
Exactly my thoughts
Oct 26 2021
Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/26/21 2:32 AM, bauss wrote:
 [...]
 Definitely not. It shouldn't use a lot of memory when unzipping as it should be done in chunks!
You guys aren't getting it:

```
ubyte[] expand(ArchiveMember de);

Decompress the contents of a member.

Fills in properties extractVersion, flags, compressionMethod, time, crc32, compressedSize, expandedSize, expandedData[], name[], extra[].
```

Where is it supposed to store that `ubyte[]`?

-Steve
Oct 26 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 26 October 2021 at 13:43:36 UTC, Steven Schveighoffer wrote:
 [...]
 Where is it supposed to store that `ubyte[]`?
That's the current implementation. I don't know about *nix, but my Windows machine can easily extract a file bigger than my RAM. It of course also depends on the dictionary.
Oct 26 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 26 October 2021 at 17:38:22 UTC, Imperatorn wrote:
 [...]
 That's the current implementation. I don't know about *nix, but my Windows machine can easily extract a file bigger than my RAM. It of course also depends on the dictionary.
The biggest file I've ever decompressed on my own hardware was about 200 GB. Needless to say, it wasn't using the algorithm in std.zip
Oct 26 2021
Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/26/21 1:38 PM, Imperatorn wrote:

 
 That's the current implementation.
No, that's the API. You cannot fix the implementation with that API and not end up allocating an array to hold the entire unzipped contents. You can't even decompress to a file and then mmap those contents; the address space isn't there.
 I don't know about *nix, but my Windows machine can easily extract a 
 file bigger than my RAM.
zlib does not require decompressing in entirety. This is just the way std.zip decided to expose the API. For example, iopipe uses an expandable buffer for decompression and compression, which does not have to contain the entire file.

-Steve
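Phobos itself shows that zlib can run incrementally: `std.zlib.UnCompress` takes input piece by piece and returns the output for each piece. A minimal sketch with a hypothetical `uncompressInPieces` helper, assuming a zlib- or gzip-wrapped stream (UnCompress detects those headers; it doesn't appear to offer the raw-deflate mode that zip members need). Each call holds roughly what one input piece expands to, not the whole output.

```d
import std.zlib : UnCompress;

/// Hypothetical helper: decompresses a zlib/gzip stream fed in pieces of
/// `inChunk` bytes, handing each decompressed piece to `sink` as it appears
/// instead of collecting the whole result in one array.
void uncompressInPieces(const(ubyte)[] compressed, size_t inChunk,
                        scope void delegate(const(void)[]) sink)
{
    assert(inChunk > 0, "inChunk must be positive");
    auto decomp = new UnCompress();   // header format detected from the data
    for (size_t pos = 0; pos < compressed.length; pos += inChunk)
    {
        immutable end = pos + inChunk < compressed.length ? pos + inChunk
                                                          : compressed.length;
        auto piece = decomp.uncompress(compressed[pos .. end]);
        if (piece.length > 0)
            sink(piece);
    }
    auto rest = decomp.flush();       // any output still buffered
    if (rest.length > 0)
        sink(rest);
}
```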
Oct 26 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 26 October 2021 at 20:33:17 UTC, Steven Schveighoffer wrote:
 [...]
 zlib does not require decompressing in entirety. This is just the way std.zip decided to expose the API.
Yes, that's how it's written. That's the problem.
Oct 26 2021
bauss <jj_1337 live.dk> writes:
On Tuesday, 26 October 2021 at 13:43:36 UTC, Steven Schveighoffer wrote:
 [...]
 Where is it supposed to store that `ubyte[]`?
It's not supposed to, but a new implementation could use something like a stream.
Oct 27 2021