digitalmars.D - Working with files over 2GB in D2

dsimcha <dsimcha yahoo.com> writes:
Does anyone know how to work with huge (2GB+) files in D2?  std.stream has
overflow bugs (I haven't isolated them yet) and can't report file sizes
correctly, std.stdio.File throws a ConvOverflowError in seek() because fseek()
apparently takes an int when it should take a long, and std.file only supports
reading the whole file, which I can't do in a 2GB address space.

It appears that none of the file I/O in Phobos has been tested on huge files
(until now).
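
For anyone following along, here is a minimal C sketch of why a seek past 2GB can overflow. In C, fseek() declares its offset as a `long`, which is 32 bits wide on 32-bit Linux and on Windows, so an offset of 3GB cannot be represented. The helper names `fits_in_32bit_long` and `checked_fseek` are made up for illustration; they are not part of Phobos or the C library.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if `offset` is representable as a 32-bit signed integer,
   which is the width of `long` on 32-bit Linux and on Windows. */
static int fits_in_32bit_long(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Hypothetical wrapper: refuses to seek instead of silently truncating
   when the offset does not fit the platform's `long`. */
static int checked_fseek(FILE *f, int64_t offset, int whence)
{
    assert(f != NULL);
    if (offset < LONG_MIN || offset > LONG_MAX)
        return -1;              /* offset would overflow fseek()'s long */
    return fseek(f, (long)offset, whence);
}
```

A range check of this kind is presumably what surfaces as the conversion error in seek(): the 64-bit offset is rejected before it reaches fseek() rather than being truncated.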
Oct 16 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
dsimcha wrote:
 Does anyone know how to work with huge (2GB+) files in D2?  std.stream has
 overflow bugs (I haven't isolated them yet) and can't return their size
 correctly, std.stdio.File throws a ConvOverflowError in seek() because fseek()
 apparently takes an int when it should take a long, and std.file only supports
 reading the whole file, which I can't do in 2GB address space.
 
 It appears none of the file I/O on Phobos has been tested on huge files (until
 now).

What platform are you using? You should report your issue on Bugzilla. I had similar issues on Windows when using stdio's fseek and ftell. I had no problems using GetFilePointerEx; you could try that until it is fixed. Jeremie
Oct 16 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Jeremie Pelletier wrote:
What platform are you using? You should report your issue on Bugzilla. I had similar issues on Windows when using stdio's fseek and ftell. I had no problems using GetFilePointerEx; you could try that until it is fixed. Jeremie

I meant SetFilePointerEx :x
Oct 16 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Jeremie Pelletier (jeremiep gmail.com)'s article
I had similar issues on Windows when using stdio's fseek and ftell. I had no problems using GetFilePointerEx; you could try that until it is fixed. Jeremie

Mostly Linux. Everything seems to be working on Windows, though I haven't tested it that thoroughly. I will file Bugzillas eventually, but I'm still trying to understand some of these issues, i.e. to what extent they're limitations vs. real bugs. What I'm really interested in knowing is:

1. To what extent is the trouble with 2GB+ files a platform limitation rather than a real bug? (I vaguely understand that it has to do with files being indexed by signed ints, but I don't know the details of how it's implemented on each platform and what is different between platforms.)
2. Does anyone know of a method of doing file I/O in D2 that is well-tested with files above 2GB?
Oct 16 2009
Frank Benoit <keinfarbton googlemail.com> writes:
dsimcha wrote:
Mostly Linux. Everything seems to be working on Windows, though I haven't tested it that thoroughly. I will file Bugzillas eventually, but I'm still trying to understand some of these issues, i.e. to what extent they're limitations vs. real bugs. What I'm really interested in knowing is:

1. To what extent is the trouble with 2GB+ files a platform limitation rather than a real bug? (I vaguely understand that it has to do with files being indexed by signed ints, but I don't know the details of how it's implemented on each platform and what is different between platforms.)
2. Does anyone know of a method of doing file I/O in D2 that is well-tested with files above 2GB?

Tango has full support for that. On Linux there are two C APIs: one limited to 2GB, and one for LFS (Large File Support).
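
To make the LFS point concrete, here is a minimal C sketch, assuming a glibc-style Linux system. Defining `_FILE_OFFSET_BITS` as 64 before any include asks the C library to make `off_t` 64 bits wide, and fseeko()/ftello() then take and return that type instead of `long`. The helper name `seek_and_tell` is hypothetical and only there for illustration.

```c
/* Both macros must be defined before the first #include. */
#define _FILE_OFFSET_BITS 64      /* request a 64-bit off_t (LFS) */
#define _POSIX_C_SOURCE 200809L   /* expose fseeko() and ftello() */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Seeks to an absolute offset and returns the position reported by
   ftello(), or -1 if the seek fails. */
static off_t seek_and_tell(FILE *f, off_t offset)
{
    assert(f != NULL);
    if (fseeko(f, offset, SEEK_SET) != 0)
        return (off_t)-1;
    return ftello(f);
}
```

With this in place, an offset such as `(off_t)3 << 30` (3GB) should be usable even in a 32-bit build, which is the part plain fseek() cannot do there.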
Oct 16 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Frank Benoit wrote:
Tango has full support for that. On Linux there are two C APIs: one limited to 2GB, and one for LFS (Large File Support).

I just had a quick peek at std.stdio; it uses the C standard library for file I/O on every platform. Phobos should support the CreateFile-related APIs on Windows and LFS on Linux to get around quirks like that 2GB limitation. Jeremie
Oct 16 2009
Frank Benoit <keinfarbton googlemail.com> writes:
Jeremie Pelletier wrote:
I just had a quick peek at std.stdio; it uses the C standard library for file I/O on every platform. Phobos should support the CreateFile-related APIs on Windows and LFS on Linux to get around quirks like that 2GB limitation. Jeremie

In Tango, search for "__USE_LARGEFILE64" to find the relevant places. Not only are different functions used; the types and structures are different as well.
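
As a rough C illustration of "different functions, types and structures", assuming glibc on Linux: defining `_LARGEFILE64_SOURCE` exposes the explicit 64-bit names next to the ordinary ones, so `off64_t`, `struct stat64`, fstat64() and lseek64() sit alongside `off_t`, `struct stat`, fstat() and lseek(). As far as I know this is the macro that turns on `__USE_LARGEFILE64` inside the glibc headers. The wrapper names `file_size64` and `seek64` are hypothetical.

```c
/* Both macros must be defined before the first #include. */
#define _LARGEFILE64_SOURCE       /* expose off64_t, stat64, lseek64 */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the size of the file open on `fd`, or -1 on error.
   Note the separate structure type, not just a separate function. */
static off64_t file_size64(int fd)
{
    struct stat64 st;
    assert(fd >= 0);
    if (fstat64(fd, &st) != 0) {
        perror("fstat64");
        return (off64_t)-1;
    }
    return st.st_size;
}

/* Moves the file position using the explicit 64-bit seek call. */
static off64_t seek64(int fd, off64_t offset)
{
    return lseek64(fd, offset, SEEK_SET);
}
```

A binding layer such as Tango or Phobos has to mirror each of these declarations, which is why switching to large file support touches type and struct definitions and not only function names.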
Oct 17 2009
Christopher Wright <dhasenan gmail.com> writes:
language_fan wrote:
 Sat, 17 Oct 2009 10:58:15 +0200, Frank Benoit thusly wrote:
 
 In Tango search for "__USE_LARGEFILE64" to find the relevant places. Not
 only other functions are used, also types and structures are different.

I think there was some talk about merging Tango and Phobos, but now that Tango has been abandoned (no D2 port is planned, it seems), would it make sense to rewrite the parts of Tango that are missing from Phobos and license them under a more liberal, practical license?

Abandoned?! Nobody has abandoned Tango. Tango hasn't been ported to D2 because it's too much of a moving target.
Oct 17 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
Christopher Wright wrote:
Abandoned?! Nobody has abandoned Tango. Tango hasn't been ported to D2 because it's too much of a moving target.

I think he took the April 1st post about Tango moving to Python seriously :)
Oct 17 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
Mostly Linux. Everything seems to be working on Windows, though I haven't tested it that thoroughly. I will file Bugzillas eventually, but I'm still trying to understand some of these issues, i.e. to what extent they're limitations vs. real bugs. What I'm really interested in knowing is:

1. To what extent is the trouble with 2GB+ files a platform limitation rather than a real bug? (I vaguely understand that it has to do with files being indexed by signed ints, but I don't know the details of how it's implemented on each platform and what is different between platforms.)
2. Does anyone know of a method of doing file I/O in D2 that is well-tested with files above 2GB?

No, but I'd be glad to fix any bugs you may find in std.stdio. I fixed a couple myself, but it looks like there are more to go. Andrei
Oct 17 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
No, but I'd be glad to fix any bugs you may find in std.stdio. I fixed a couple myself, but it looks like there are more to go. Andrei

Yeah, I've filed a few Bugzillas. I really didn't anticipate large file support being missing, and I need it badly, pronto, but I'd be willing to help make that happen.
Oct 17 2009
language_fan <foo bar.com.invalid> writes:
Sat, 17 Oct 2009 10:58:15 +0200, Frank Benoit thusly wrote:

 In Tango search for "__USE_LARGEFILE64" to find the relevant places. Not
 only other functions are used, also types and structures are different.

I think there was some talk about merging Tango and Phobos, but now that Tango has been abandoned (no D2 port is planned, it seems), would it make sense to rewrite the parts of Tango that are missing from Phobos and license them under a more liberal, practical license?
Oct 17 2009