
digitalmars.D - Ranges longer than size_t.max

reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
There was a discussion a while ago about the type returned by 
.length for ranges. I believe the conclusion was that it should 
always be size_t.

My question now is what to do with really large ranges? For 
example, if we have a range of permutations of the elements in a 
set, when the set is larger than 21 the number of permutations is 
21 factorial, which is over 2^64. As far as I see, there are 
three options:

1. Allow ranges to return BigInt for length.
2. Allow ranges like this but assert when .length overflows.
3. Allow ranges like this and just allow .length to overflow 
dangerously.
4. Do not allow ranges like this.

Option 1 solves the problem, but significantly complicates Phobos 
development for the rare case.

Option 2 works, and is practical, although runtime asserts are 
undesirable.

Option 3 is current Phobos practice (presumably because overflow 
is unlikely). See example below. This may be acceptable 
currently, but perhaps less so when overflow is more likely.
--------------------------------
auto a = iota(size_t.max / 2 + 1);	
auto b = chain(a, a);
writeln(a.length); // 9223372036854775808
writeln(b.length); // 0
--------------------------------
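For comparison, option 2's check could look something like the sketch below (hedged: `checkedTotalLength` is a hypothetical helper, not Phobos code; it assumes druntime's core.checkedint `addu`):

```d
import core.checkedint : addu;

// Hypothetical helper: sum range lengths, asserting on overflow
// instead of silently wrapping (option 2).
size_t checkedTotalLength(size_t[] lengths...)
{
    size_t total = 0;
    foreach (len; lengths)
    {
        bool overflow = false;
        total = addu(total, len, overflow); // overflow-checked add
        assert(!overflow, ".length overflowed size_t");
    }
    return total;
}

unittest
{
    assert(checkedTotalLength(1, 2, 3) == 6);
    // checkedTotalLength(size_t.max / 2 + 1, size_t.max / 2 + 1)
    // would trip the assert instead of returning 0.
}
```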

Option 4 works, but it would be a shame to lose these ranges.

Thoughts?
Dec 28 2012
next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On 2012-12-29 02:18, Peter Alexander <peter.alexander.au gmail.com> wrote:

 There was a discussion a while ago about the type returned by .length  
 for ranges. I believe the conclusion was that it should always be size_t.

 My question now is what to do with really large ranges? For example, if  
 we have a range of permutations of the elements in a set, when the set  
 is larger than 21 the number of permutations is 21 factorial, which is  
 over 2^64. As far as I see, there are three options:

 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow dangerously.
 4. Do not allow ranges like this.

 Option 1 solves the problem, but significantly complicates Phobos  
 development for the rare case.

 Option 2 works, and is practical, although runtime asserts are  
 undesirable.

 Option 3 is current Phobos practice (presumably because overflow is  
 unlikely). See example below. This may be acceptable currently, but  
 perhaps less so when overflow is more likely.
 --------------------------------
 auto a = iota(size_t.max / 2 + 1);	
 auto b = chain(a, a);
 writeln(a.length); // 9223372036854775808
 writeln(b.length); // 0
 --------------------------------

 Option 4 works, but it would be a shame to lose these ranges.

 Thoughts?

64 bits ought to be enough for anyone. :p

This is a bit of a tough one, and I believe the root problem is that user-defined types are second-class citizens. Specifically, CommonType is incapable of finding the common type of BigInt and int. This makes it hard to find the correct return type for e.g. chain.

Behind the scenes, CommonType uses ?: to figure out the correct type. If D supported opImplicitCastFrom (as suggested in WalterAndrei.pdf), ?: could figure out the correct type, and chain's .length could be:

--------------------------------
private template LengthType(Range)
{
    alias typeof(Range.length) LengthType;
}

@property auto length()
{
    CommonType!(staticMap!(LengthType, R)) result = 0;
    foreach (i, Unused; R)
    {
        result += source[i].length;
    }
    return result;
}
--------------------------------

This would thus support BigInt .length just as well as size_t, byte, or whatever.

--
Simen
Dec 28 2012
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
 There was a discussion a while ago about the type returned by
 .length for ranges. I believe the conclusion was that it should
 always be size_t.
 
 My question now is what to do with really large ranges? For
 example, if we have a range of permutations of the elements in a
 set, when the set is larger than 21 the number of permutations is
 21 factorial, which is over 2^64. As far as I see, there are
 three options:
 
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow
 dangerously.
 4. Do not allow ranges like this.
 
 Option 1 solves the problem, but significantly complicates Phobos
 development for the rare case.
 
 Option 2 works, and is practical, although runtime asserts are
 undesirable.
 
 Option 3 is current Phobos practice (presumably because overflow
 is unlikely). See example below. This may be acceptable
 currently, but perhaps less so when overflow is more likely.
 --------------------------------
 auto a = iota(size_t.max / 2 + 1);
 auto b = chain(a, a);
 writeln(a.length); // 9223372036854775808
 writeln(b.length); // 0
 --------------------------------
 
 Option 4 works, but it would be a shame to lose these ranges.
 
 Thoughts?

I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit. Needing ranges of length greater than uint.max is definitely not the norm, and I would expect people caring that much about computational stuff to be using 64-bit anyway, since odds are they'll need the memory. Allowing for length to be anything other than size_t is extremely annoying for generic code. - Jonathan M Davis
Dec 28 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/12 9:21 PM, Peter Alexander wrote:
 On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
 On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
 I'd be very strongly inclined to go with #4 and just say that anyone who
 actually cares about numbers that large should use 64-bit. Needing
 ranges of
 length greater than uint.max is definitely not the norm, and I would
 expect
 people caring that much about computational stuff to be using 64-bit
 anyway,
 since odds are they'll need the memory. Allowing for length to be
 anything
 other than size_t is extremely annoying for generic code.

Sorry, I don't think I was clear on what the issue is. I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes isn't even enough (for example in the case of a permutations range). I'm asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases. Andrei
Dec 28 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/29/12 4:00 AM, Peter Alexander wrote:
 On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
 On 12/28/12 9:21 PM, Peter Alexander wrote:
 I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
 isn't even enough (for example in the case of a permutations range). I'm
 asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

 I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

 * Don't provide .length at all
 * Provide it as size_t, and assert on overflow
 * Provide it as size_t, and silently overflow

I'd say give it a different name, i.e. bigLength.

Andrei
Dec 29 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/29/12 3:38 PM, Peter Alexander wrote:
 How about, have .length, and allow it to dangerously overflow, but also
 provide .bigLength, which returns a BigInt for when people need it?

I'd think you convinced yourself it's a bad idea while writing it. I did the same :o). Andrei
Dec 29 2012
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12/29/2012 1:30 PM, Jonathan M Davis wrote:
 On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
 On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis

 wrote:
 I'd be very strongly inclined to go with #4 and just say that
 anyone who actually cares about numbers that large should use
 64-bit.

Uhm, 64-bit? What about large files (> 4 GiB) on 32-bit systems? (Say, if a range wants to provide random access to the bytes of an 8 GiB file?)

You bring up a good point, but not even arrays support that, so stuff like std.mmfile can't possibly work with large files on 32-bit systems, not with the current design anyway (as it gives you an array to operate on). It may be that you just need to do something different if you want to operate on large files on 32-bit systems. I don't know.

s/array/slice

It maps a view of a part of the file. Obviously one can't map parts greater than the address space.
 Plenty of stuff right now in Phobos isn't going to work right now if you try
 and have a length of 64 bits on a 32-bit system. size_t is used pretty much
 everywhere. And it's going to make generic code a _lot_ less pleasant if we
 have to worry about using anything other than size_t. It's already been
 causing us problems that iota does, and we've discussed making it so that it
 doesn't. But even if we allow ulong for length and indexing on 32-bit systems,
 that's a completely different ball game from permitting BigInt.

 So, I'd _very_ much like to always restruct length and indices to size_t for
 ranges. It will make all of the generic handling of that stuff far more
 pleasant. But we may very well be forced to permit ulong on 32-bit systems due
 to practical concerns such as large file handling on 32-bit systems.

I think the 32-bit version has to bite the bullet and try ulong/uint deduction.
 Oh, how I wish that 32-bit systems would just die out.
It would make so many
 things so much easier. No such luck on that yet though, much as most PC
 processors are 64-bit now.

Yeah, but sadly they are practical ;) 32-bit is now dominating the MCU world...

--
Dmitry Olshansky
Dec 29 2012
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12/30/2012 10:23 PM, SomeDude wrote:
 On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
 Uhm, 64-bit?

 What about large files (> 4 GiB) on 32-bit systems?

 (Say, if a range wants to provide random access to the bytes of an 8
 GiB file?)

 And what about large bit vectors? A 600 MiB bit vector needs > 2^32
 bits for length...

AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

False. It's solely a question of the FS used. NTFS supports files larger than 4 GB regardless of Windows version (Win2K+ or even earlier). It doesn't matter what the bitness of the OS in question is. I suspect 32-bit Linux also handles > 4 GB files even with ext2, no problem.

And e.g. FAT32 remarkably can't handle 4 GB+ files with any OS.

--
Dmitry Olshansky
Dec 30 2012
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 31/12/2012 17:50, Era Scarecrow wrote:
 On Sunday, 30 December 2012 at 19:11:51 UTC, Dmitry Olshansky wrote:
 False. It's solely a question of FS used.
 NTFS supports files larger then 4 Gb regardless of version Windows
 (Win2K+ or even earlier). It doesn't matter what the bitness of OS in
 question is. I suspect 32bit linux also has > 4Gb files even with ext2
 no problem.

 And e.g. FAT32 remarkably can't handle 4 Gb+ files with any OS.

I was writing some code to go through FAT12/16/32, and found some interesting information (although there was so much overhead I kinda stopped in the middle of it). FAT32 actually uses something like 28/29 bits for the sector id; the other 3-4 bits were flags like bad sectors, last sector and other minor data. At least that's what I remember offhand.

Sector? Do you mean cluster?

I would have thought it used the whole 32 bits for the cluster number, with magic values for "unused", "end of chain" and "bad". In each case you don't need to point to the next cluster as well. Unless it supports something like marking an in-use cluster as bad but leaving until another day the task of moving the data from it into a good cluster.

But looking through
   http://en.wikipedia.org/wiki/Fat32
there are indeed a handful of magic values for things like this. Anyway, that states that it uses 28 bits for the cluster number, but nothing about what the other 4 bits are for. A possibility I can see is that these 4 bits were reserved for bit flags that may be added in the future.

Stewart.
Dec 31 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis 
wrote:
 On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
 I'd be very strongly inclined to go with #4 and just say that 
 anyone who
 actually cares about numbers that large should use 64-bit. 
 Needing ranges of
 length greater than uint.max is definitely not the norm, and I 
 would expect
 people caring that much about computational stuff to be using 
 64-bit anyway,
 since odds are they'll need the memory. Allowing for length to 
 be anything
 other than size_t is extremely annoying for generic code.

Sorry, I don't think I was clear on what the issue is. I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes isn't even enough (for example in the case of a permutations range). I'm asking if allowing length to return BigInt would be reasonable.
Dec 28 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 01:56:04 UTC, Simen Kjaeraas 
wrote:
    property auto length()
   {
       CommonType!(staticMap!(LengthType, R)) result = 0;
       foreach (i, Unused; R)
       {
           result += source[i].length;
       }
       return result;
   }

 This would thus support BigInt .length just as well as size_t, 
 byte,
 or whatever.

This doesn't solve the issue. For example, even with your update, the code snippet I provided in the original post with iota and chain would still break (assuming iota(size_t.max).length returns a size_t). Also, if we allow BigInt, even with better support in CommonType it would significantly complicate Phobos with LengthType!Range popping up everywhere.
Dec 28 2012
prev sibling next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis 
wrote:
 I'd be very strongly inclined to go with #4 and just say that 
 anyone who actually cares about numbers that large should use 
 64-bit.

Uhm, 64-bit? What about large files (> 4 GiB) on 32-bit systems? (Say, if a range wants to provide random access to the bytes of an 8 GiB file?) And what about large bit vectors? A 600 MiB bit vector needs > 2^32 bits for length...
Dec 28 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei 
Alexandrescu wrote:
 On 12/28/12 9:21 PM, Peter Alexander wrote:
 I'm already assuming 64-bit. What I'm saying is that 64-bit 
 sometimes
 isn't even enough (for example in the case of a permutations 
 range). I'm
 asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

* Don't provide .length at all
* Provide it as size_t, and assert on overflow
* Provide it as size_t, and silently overflow
Dec 29 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 29, 2012 10:00:44 Peter Alexander wrote:
 On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei
 
 Alexandrescu wrote:
 On 12/28/12 9:21 PM, Peter Alexander wrote:
 I'm already assuming 64-bit. What I'm saying is that 64-bit
 sometimes
 isn't even enough (for example in the case of a permutations
 range). I'm
 asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

 I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

 * Don't provide .length at all
 * Provide it as size_t, and assert on overflow
 * Provide it as size_t, and silently overflow

I expect that it'll silently overflow in most cases, because that's a lot simpler and the problem is almost never an issue. Asserting also incurs a performance penalty in non-release mode, where there's almost never any benefit from it. But if you want to provide pull requests with assertions for overflow in various Phobos algorithms, then they'll probably be merged in.

- Jonathan M Davis
Dec 29 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
 On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis
 
 wrote:
 I'd be very strongly inclined to go with #4 and just say that
 anyone who actually cares about numbers that large should use
 64-bit.

Uhm, 64-bit? What about large files (> 4 GiB) on 32-bit systems? (Say, if a range wants to provide random access to the bytes of an 8 GiB file?)

You bring up a good point, but not even arrays support that, so stuff like std.mmfile can't possibly work with large files on 32-bit systems, not with the current design anyway (as it gives you an array to operate on). It may be that you just need to do something different if you want to operate on large files on 32-bit systems. I don't know.

Plenty of stuff right now in Phobos isn't going to work if you try to have a length of 64 bits on a 32-bit system. size_t is used pretty much everywhere. And it's going to make generic code a _lot_ less pleasant if we have to worry about using anything other than size_t. It's already been causing us problems that iota does, and we've discussed making it so that it doesn't. But even if we allow ulong for length and indexing on 32-bit systems, that's a completely different ball game from permitting BigInt.

So, I'd _very_ much like to always restrict length and indices to size_t for ranges. It will make all of the generic handling of that stuff far more pleasant. But we may very well be forced to permit ulong on 32-bit systems due to practical concerns such as large file handling on 32-bit systems.

Oh, how I wish that 32-bit systems would just die out. It would make so many things so much easier. No such luck on that yet though, much as most PC processors are 64-bit now.

- Jonathan M Davis
Dec 29 2012
prev sibling next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Saturday, 29 December 2012 at 09:31:02 UTC, Jonathan M Davis 
wrote:
 not even arrays support that, so stuff like std.mmfile can't 
 possibly work with large files on 32-bit systems

I didn't mean mmfile, I meant random access to the bytes of a regular file. No need for mapping it to memory.
Dec 29 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 14:23:09 UTC, Andrei 
Alexandrescu wrote:
 On 12/29/12 4:00 AM, Peter Alexander wrote:
 On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei 
 Alexandrescu wrote:
 On 12/28/12 9:21 PM, Peter Alexander wrote:
 I'm already assuming 64-bit. What I'm saying is that 64-bit 
 sometimes
 isn't even enough (for example in the case of a permutations 
 range). I'm
 asking if allowing length to return BigInt would be 
 reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

 I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

 * Don't provide .length at all
 * Provide it as size_t, and assert on overflow
 * Provide it as size_t, and silently overflow

I'd say give it a different name, i.e. bigLength.

But then the range doesn't have a length, even though it would be useful to have it in a lot of cases.

How about: have .length, and allow it to dangerously overflow, but also provide .bigLength, which returns a BigInt for when people need it?
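A rough sketch of that shape, with a permutations range exposing both properties (the Permutations struct and its layout are purely illustrative, using std.bigint):

```d
import std.bigint;

// Hypothetical sketch: expose a wrapping size_t length alongside
// an exact BigInt bigLength.
struct Permutations
{
    size_t n; // number of elements in the underlying set

    @property size_t length() const
    {
        size_t result = 1;
        foreach (i; 2 .. n + 1)
            result *= i; // silently wraps once n >= 21 on 64-bit
        return result;
    }

    @property BigInt bigLength() const
    {
        BigInt result = BigInt(1);
        foreach (i; 2 .. n + 1)
            result *= i; // exact, arbitrary precision
        return result;
    }
}
```

For n up to 20 the two agree; beyond that, only bigLength stays exact.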
Dec 29 2012
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 29/12/2012 01:18, Peter Alexander wrote:
 There was a discussion a while ago about the type returned by .length
 for ranges. I believe the conclusion was that it should always be size_t.

 My question now is what to do with really large ranges? For example, if
 we have a range of permutations of the elements in a set, when the set
 is larger than 21 the number of permutations is 21 factorial, which is
 over 2^64. As far as I see, there are three options:

They look like four to me.
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow dangerously.
 4. Do not allow ranges like this.

5. Don't allow such ranges to define a .length property.

From the std.range documentation:

    template hasLength(R)
    Returns true if R has a length member that returns an integral type. R does not have to be a range. Note that length is an optional primitive as no range must implement it. Some ranges do not store their length explicitly, some cannot compute it without actually exhausting the range (e.g. socket streams), and some other ranges may be infinite.

There you go. For reasons such as this, ranges aren't required to define a length property. As well as I/O ranges and infinite ranges, ranges where the length is liable to be greater than ulong.max are among those that naturally won't have this property.

If we define a length that may overflow, then sooner or later some code will be called on it that calls hasLength on the range, finds it to be true, and then relies on length and malfunctions because the value returned doesn't match the actual length.

Stewart.
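As a concrete illustration of that contract, generic code can already branch on hasLength at compile time instead of assuming .length exists (a minimal sketch; describe is a made-up function):

```d
import std.range : hasLength, isInputRange, iota;
import std.stdio : writeln, stdin;

// Branch at compile time on whether the range defines .length.
void describe(R)(R r) if (isInputRange!R)
{
    static if (hasLength!R)
        writeln("length = ", r.length);
    else
        writeln("no length (stream, infinite, or too large)");
}

void main()
{
    describe(iota(10));        // has a length
    describe(stdin.byLine());  // an input range without length
}
```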
Dec 29 2012
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 29/12/2012 21:13, Peter Alexander wrote:
 On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon wrote:

 If we define a length that may overflow, then sooner or later some
 code will be called on it that calls hasLength on the range, finds it
 to be true, and then relies on length and malfunctions because the
 value returned doesn't match the actual length.

Using this logic, chain shouldn't have length either (it can overflow, as I demonstrated in the original post), but it does, and it should. There are many ranges in Phobos that may overflow in length.

Yes, you have a point there. Any facility for creating a range by combining ranges is liable to overflow if it defines a length property. But I would expect that in practice 99% of finite ranges would be nowhere near 2^64 in size, so an overflow would be an exceptional case.
 The problem with permutations is that, for the kinds of permutations you
 are likely to use often, length will not overflow. It would be really
 useful to have length available when this is the case.

Maybe the solution is to define two permutation range classes: one with length that asserts that the number of elements <= 20, and one without length that places no bound. If the set size is a template parameter rather than being set at run time, then it can define length conditionally. Stewart.
Dec 30 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon 
wrote:
 On 29/12/2012 01:18, Peter Alexander wrote:
 My question now is what to do with really large ranges? For 
 example, if
 we have a range of permutations of the elements in a set, when 
 the set
 is larger than 21 the number of permutations is 21 factorial, 
 which is
 over 2^64. As far as I see, there are three options:

They look like four to me.

It's also three, for large values of three :-)
 If we define a length that may overflow, then sooner or later 
 some code will be called on it that calls hasLength on the 
 range, finds it to be true, and then relies on length and 
 malfunctions because the value returned doesn't match the 
 actual length.

Using this logic, chain shouldn't have length either (it can overflow, as I demonstrated in the original post), but it does, and it should. There are many ranges in Phobos that may overflow in length. The problem with permutations is that, for the kinds of permutations you are likely to use often, length will not overflow. It would be really useful to have length available when this is the case.
Dec 29 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon
wrote:
 5. Don't allow such ranges to define a .length property.

I still think there is a fundamental difference between:

5a: Don't allow such ranges to define a .length property.

and

5b: Don't guarantee *support* for such ranges.

The current status quo in Phobos is "5b" BTW. There are a few select algorithms that do support it, but only because the implementers decided to go the extra mile, and it usually makes things more complicated than it needs to be.

BTW iota, in certain cases, will have a length returned in ulong. This is used to test *certain* algorithms, but really, nothing is guaranteed.

At the very least, there was a discussion some time ago that for a (finite) RA range, range[range.length] must compile. Ditto for slicing. This was never enforced though... which is a shame (IMO)... Well, I was the one to suggest that actually, but I never did it, so blame falls on me :/ I'll put it back on my todo list.
Dec 29 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 29, 2012 22:18:53 monarch_dodra wrote:
 At the very least, there was a discussion some time ago that for
 a (finite) RA range, range[range.length] must compile. Ditto for
 slicing.
 
 This was never enforced though... Which is a shame (IMO)...
 ...Well, I was the one to suggest that actually, but I never did
 it, so blame falls on me :/
 
 I'll put it back on my todo list.

It's now enforced that finite random access ranges have length, so we're part of the way there. The big thing that we really need though is for issue# 7177 to be implemented.

http://d.puremagic.com/issues/show_bug.cgi?id=7177

If that gets implemented, then we can require that random access ranges and ranges with slicing fully support $. We'd probably still need to check range[range.length] though, just to check that length and indexing are properly compatible.

- Jonathan M Davis
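For reference, a minimal sketch of what "fully support $" means for a user-defined random-access range (the Evens struct is purely illustrative):

```d
// Illustrative range: opDollar lets r[$ - 1] compile and agree
// with r.length.
struct Evens
{
    @property size_t length() const { return 10; }
    size_t opDollar() const { return length; }

    size_t opIndex(size_t i) const
    {
        assert(i < length);
        return 2 * i;
    }
}

void main()
{
    Evens r;
    assert(r[$ - 1] == 18); // $ is rewritten to r.opDollar()
}
```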
Dec 29 2012
prev sibling next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
 Uhm, 64-bit?

 What about large files (> 4 GiB) on 32-bit systems?

 (Say, if a range wants to provide random access to the bytes of 
 an 8 GiB file?)

 And what about large bit vectors? A 600 MiB bit vector needs > 
 2^32 bits for length...

AFAIK, no 32-bit OS supports files larger than 4Gb anyway.
Dec 30 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 30, 2012 19:23:13 SomeDude wrote:
 On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
 Uhm, 64-bit?
 
 What about large files (> 4 GiB) on 32-bit systems?
 
 (Say, if a range wants to provide random access to the bytes of
 an 8 GiB file?)
 
 And what about large bit vectors? A 600 MiB bit vector needs >
 2^32 bits for length...

AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

Um. No. There's a reason that things like the macro _LARGE_FILE_API exist. While some file systems won't allow larger file sizes, that's generally an FS limitation, not an OS limitation. The OSes provide ways to operate on 64-bit files on 32-bit systems. If they didn't, it would be impossible to do something like read a Blu-ray on a 32-bit system.

http://en.wikipedia.org/wiki/Large_file_support

- Jonathan M Davis
Dec 30 2012
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 30 Dec 2012 19:23:13 +0100, "SomeDude" <lovelydear mailmetrash.com> wrote:

 On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
 AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

Anachronism!!!

4.7 GB DVD burners: ~1997
64-bit personal computers: 2003

(Also, backups of large partitions into disk images were made.)

That said, my father owns a stand-alone, multi-media, USB disk drive that cannot store files > 4 GB because of the FAT32 it uses.

--
Marco
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 29 December 2012 at 01:18:37 UTC, Peter Alexander 
wrote:
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow 
 dangerously.
 4. Do not allow ranges like this.

Option 5: Incorporate (emulation code?) for cent/ucent, then consider that the same as 1 but returning cent rather than BigInt.
Dec 31 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Monday, 31 December 2012 at 17:43:34 UTC, Era Scarecrow wrote:
 On Saturday, 29 December 2012 at 01:18:37 UTC, Peter Alexander 
 wrote:
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow 
 dangerously.
 4. Do not allow ranges like this.

Option 5: Incorporate (emulation code?) for cent/ucent, then consider that the same as 1 but returning cent rather than BigInt.

Two problems with this:

1. It still complicates range code (which has to accommodate both size_t and ucent return values for .length).
2. ucent probably isn't enough either for these sorts of ranges.
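On the second point, a quick check with std.bigint confirms how fast factorials outgrow even 128 bits (a sketch, not Phobos code):

```d
import std.bigint;

void main()
{
    // Find the smallest n whose factorial exceeds 2^128,
    // i.e. no longer fits in a hypothetical ucent.
    BigInt factorial = BigInt(1);
    int n = 0;
    while (factorial <= BigInt(1) << 128)
    {
        ++n;
        factorial *= n;
    }
    assert(n == 35); // a 35-element set already overflows 128 bits
}
```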
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Sunday, 30 December 2012 at 19:11:51 UTC, Dmitry Olshansky 
wrote:
 False. It's solely a question of FS used.
 NTFS supports files larger then 4 Gb regardless of version 
 Windows (Win2K+ or even earlier). It doesn't matter what the 
 bitness of OS in question is. I suspect 32bit linux also has > 
 4Gb files even with ext2 no problem.

 And e.g. FAT32 remarkably can't handle 4 Gb+ files with any OS.

I was writing some code to go through FAT12/16/32, and found some interesting information (although there was so much overhead I kinda stopped in the middle of it). FAT32 actually uses something like 28/29 bits for the sector id; the other 3-4 bits were flags like bad sectors, last sector and other minor data. At least that's what I remember offhand.

http://en.wikipedia.org/wiki/Fat32
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Monday, 31 December 2012 at 17:47:36 UTC, Peter Alexander 
wrote:
 On Monday, 31 December 2012 at 17:43:34 UTC, Era Scarecrow 
 wrote:
 On Saturday, 29 December 2012 at 01:18:37 UTC, Peter Alexander 
 wrote:
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow 
 dangerously.
 4. Do not allow ranges like this.

Option 5: Incorporate (emulation code?) for cent/ucent, then consider that the same as 1 but returning cent rather than BitInt.

Two problems with this:

1. It still complicates range code (which has to accommodate both size_t and ucent return values for .length).
2. ucent probably isn't enough either for these sorts of ranges.

Perhaps, but having cent/ucent available would make a few people happy at least, although except for a few cases with databases and encryption it may not be that important right now.

I remember trying to learn C/C++ and writing a very, very simple program to double the number 1 something like 100 times. After 31 doublings it turned to garbage (hey, I was 14 at the time; god, that was a long time ago). It took me a while to understand the problem. I ended up re-writing the whole thing in assembly that took 101 bytes (a COM file) and could double the number as many times as I wanted.
Dec 31 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 31 December 2012 at 17:56:37 UTC, Era Scarecrow wrote:
  Perhaps, but having cent/ucent available would make a few 
 people happy at least

Not to change the subject, but couldn't we just implement Cent and UCent as library types, like Complex or HalfFloat? I'd be tempted to open an ER and see if Walter is up for it...
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Monday, 31 December 2012 at 18:18:08 UTC, monarch_dodra wrote:
 On Monday, 31 December 2012 at 17:56:37 UTC, Era Scarecrow 
 wrote:
 Perhaps, but having cent/ucent available would make a few 
 people happy at least

Not to change the subject, but couldn't we just implement Cent and UCent as library types, such as Complex, or HalfFloat? I'd be tempted to open an ER and see if walter is up for it...

Yes we could. Just that they are already defined built-in types according to the specs. *shrugs* But changing your code from Cent to cent after it's properly implemented seems like a minimal change. If walter says go ahead (and I'm sure he won't have a problem with it), then I guess it's up to whoever to make it.
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Monday, 31 December 2012 at 18:55:04 UTC, Stewart Gordon wrote:
 Sector?  Do you mean cluster?

Probably. They mean the same thing to me.
 I would have thought it used the whole 32 bits for cluster 
 number, with magic values for "unused", "end of chain" and 
 "bad".  In each case you don't need to point to the next 
 cluster as well.  Unless it supports something like marking an 
 in-use cluster as bad but leaving until another day the task of 
 moving the data from it into a good cluster.

Wouldn't have been practical. Historically the FAT table was a layout of all the clusters, where each entry pointed to the next cluster, with the largest values used to denote EOF (there were 8 or so reserved codes; I don't remember exactly). Doing the math, if you were to lay it all out in FAT32 using the same scheme with full 32-bit entries, you'd end up with 2^34 bytes for a single table. FAT by default had 2 tables (one as a backup), meaning 2^35 bytes would be needed, and that's pure overhead (assuming you needed it all). Back then you had 8 GB drives at the most, and with space being sparse, 28 bits makes more sense (1-2 GB of tables vs 16-32 GB). Of course, the tables wouldn't be that big if the drive didn't have that many clusters to begin with.
 But looking through
  http://en.wikipedia.org/wiki/Fat32

there are indeed a handful of magic values for things like this. Anyway, that states that it uses 28 bits for the cluster number, but nothing about what the other 4 bits are for. But a possibility I can see is that these 4 bits were reserved for bit flags that may be added in the future.

I don't remember where I read it, but I was certain they were used for overhead/flags; of course I was also reading up on Long File Names (LFN) and directory structures, so it's likely lost somewhere in those texts. FAT32 would/could have supported files over 4 GB, but the file size field was 32-bit, and anything over that would likely have been more complex to incorporate, along with programs depending on sizes being 32-bit. I guess it was easier to set the hard limit than to extend it to larger sizes. Plus programmers are going to be lazy and prefer int whenever possible. Hmm, actually back then long longs weren't supported (except maybe by gcc), so I don't think that was much of an option.
Dec 31 2012
prev sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Sunday, 30 December 2012 at 22:08:59 UTC, Jonathan M Davis 
wrote:
 On Sunday, December 30, 2012 19:23:13 SomeDude wrote:
 AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

Um. No. There's a reason that macros like _LARGE_FILE_API exist. While some file systems won't allow larger file sizes, that's generally an FS limitation, not an OS limitation. The OSes provide ways to operate on 64-bit file offsets on 32-bit systems. If they didn't, it would be impossible to do something like read a Blu-ray on a 32-bit system. http://en.wikipedia.org/wiki/Large_file_support - Jonathan M Davis

That's true, I stand corrected.
Jan 02 2013