
digitalmars.D - Ranges longer than size_t.max

reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
There was a discussion a while ago about the type returned by 
.length for ranges. I believe the conclusion was that it should 
always be size_t.

My question now is what to do with really large ranges? For 
example, if we have a range of permutations of the elements in a 
set, when the set is larger than 21 the number of permutations is 
21 factorial, which is over 2^64. As far as I see, there are 
three options:

1. Allow ranges to return BigInt for length.
2. Allow ranges like this but assert when .length overflows.
3. Allow ranges like this and just allow .length to overflow 
dangerously.
4. Do not allow ranges like this.

Option 1 solves the problem, but significantly complicates Phobos 
development for the rare case.

Option 2 works, and is practical, although runtime asserts are 
undesirable.

Option 3 is current Phobos practice (presumably because overflow 
is unlikely). See example below. This may be acceptable 
currently, but perhaps less so when overflow is more likely.
--------------------------------
auto a = iota(size_t.max / 2 + 1);	
auto b = chain(a, a);
writeln(a.length); // 9223372036854775808
writeln(b.length); // 0
--------------------------------
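For comparison, option 2's check could look something like the sketch below (hedged: `checkedTotalLength` is a hypothetical helper, not Phobos code; it assumes druntime's core.checkedint `addu`):

```d
import core.checkedint : addu;

// Hypothetical helper: sum range lengths, asserting on overflow
// instead of silently wrapping (option 2).
size_t checkedTotalLength(size_t[] lengths...)
{
    size_t total = 0;
    foreach (len; lengths)
    {
        bool overflow = false;
        total = addu(total, len, overflow); // overflow-checked add
        assert(!overflow, ".length overflowed size_t");
    }
    return total;
}

unittest
{
    assert(checkedTotalLength(1, 2, 3) == 6);
    // checkedTotalLength(size_t.max / 2 + 1, size_t.max / 2 + 1)
    // would trip the assert instead of returning 0.
}
```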

Option 4 works, but it would be a shame to lose these ranges.

Thoughts?
Dec 28 2012
next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On 2012-12-29 02:18, Peter Alexander <peter.alexander.au gmail.com> wrote:

 There was a discussion a while ago about the type returned by .length  
 for ranges. I believe the conclusion was that it should always be size_t.

 My question now is what to do with really large ranges? For example, if  
 we have a range of permutations of the elements in a set, when the set  
 is larger than 21 the number of permutations is 21 factorial, which is  
 over 2^64. As far as I see, there are three options:

 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow dangerously.
 4. Do not allow ranges like this.

 Option 1 solves the problem, but significantly complicates Phobos  
 development for the rare case.

 Option 2 works, and is practical, although runtime asserts are  
 undesirable.

 Option 3 is current Phobos practice (presumably because overflow is  
 unlikely). See example below. This may be acceptable currently, but  
 perhaps less so when overflow is more likely.
 --------------------------------
 auto a = iota(size_t.max / 2 + 1);	
 auto b = chain(a, a);
 writeln(a.length); // 9223372036854775808
 writeln(b.length); // 0
 --------------------------------

 Option 4 works, but it would be a shame to lose these ranges.

 Thoughts?

64 bits ought to be enough for anyone. :p

This is a bit of a tough one, and I believe the root problem is that user-defined types are second-class citizens. Specifically, CommonType is incapable of finding the common type of BigInt and int. This makes it hard to find the correct return type for e.g. chain.

Behind the scenes, CommonType uses ?: to figure out the correct type. If D supported opImplicitCastFrom (as suggested in WalterAndrei.pdf), ?: could figure out the correct type, and chain's .length could be:

--------------------------------
private template LengthType(Range)
{
    alias typeof(Range.length) LengthType;
}

@property auto length()
{
    CommonType!(staticMap!(LengthType, R)) result = 0;
    foreach (i, Unused; R)
    {
        result += source[i].length;
    }
    return result;
}
--------------------------------

This would thus support BigInt .length just as well as size_t, byte, or whatever.

--
Simen
Dec 28 2012
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
 There was a discussion a while ago about the type returned by
 .length for ranges. I believe the conclusion was that it should
 always be size_t.
 
 My question now is what to do with really large ranges? For
 example, if we have a range of permutations of the elements in a
 set, when the set is larger than 21 the number of permutations is
 21 factorial, which is over 2^64. As far as I see, there are
 three options:
 
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow
 dangerously.
 4. Do not allow ranges like this.
 
 Option 1 solves the problem, but significantly complicates Phobos
 development for the rare case.
 
 Option 2 works, and is practical, although runtime asserts are
 undesirable.
 
 Option 3 is current Phobos practice (presumably because overflow
 is unlikely). See example below. This may be acceptable
 currently, but perhaps less so when overflow is more likely.
 --------------------------------
 auto a = iota(size_t.max / 2 + 1);
 auto b = chain(a, a);
 writeln(a.length); // 9223372036854775808
 writeln(b.length); // 0
 --------------------------------
 
 Option 4 works, but it would be a shame to lose these ranges.
 
 Thoughts?

I'd be very strongly inclined to go with #4 and just say that anyone who actually cares about numbers that large should use 64-bit. Needing ranges of length greater than uint.max is definitely not the norm, and I would expect people caring that much about computational stuff to be using 64-bit anyway, since odds are they'll need the memory. Allowing for length to be anything other than size_t is extremely annoying for generic code. - Jonathan M Davis
Dec 28 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/28/12 9:21 PM, Peter Alexander wrote:
 On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis wrote:
 On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
 I'd be very strongly inclined to go with #4 and just say that anyone who
 actually cares about numbers that large should use 64-bit. Needing
 ranges of
 length greater than uint.max is definitely not the norm, and I would
 expect
 people caring that much about computational stuff to be using 64-bit
 anyway,
 since odds are they'll need the memory. Allowing for length to be
 anything
 other than size_t is extremely annoying for generic code.

Sorry, I don't think I was clear on what the issue is. I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes isn't even enough (for example in the case of a permutations range). I'm asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases. Andrei
Dec 28 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/29/12 4:00 AM, Peter Alexander wrote:
 On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei Alexandrescu wrote:
 On 12/28/12 9:21 PM, Peter Alexander wrote:
 I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes
 isn't even enough (for example in the case of a permutations range). I'm
 asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

 I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

 * Don't provide .length at all
 * Provide it as size_t, and assert on overflow
 * Provide it as size_t, and silently overflow

I'd say give it a different name, i.e. bigLength.

Andrei
Dec 29 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/29/12 3:38 PM, Peter Alexander wrote:
 How about, have .length, and allow it to dangerously overflow, but also
 provide .bigLength, which returns a BigInt for when people need it?

I'd think you convinced yourself it's a bad idea while writing it. I did the same :o). Andrei
Dec 29 2012
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12/29/2012 1:30 PM, Jonathan M Davis wrote:
 On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
 On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis

 wrote:
 I'd be very strongly inclined to go with #4 and just say that
 anyone who actually cares about numbers that large should use
 64-bit.

Uhm, 64-bit? What about large files (> 4 GiB) on 32-bit systems? (Say, if a range wants to provide random access to the bytes of an 8 GiB file?)

You bring up a good point, but not even arrays support that, so stuff like std.mmfile can't possibly work with large files on 32-bit systems, not with the current design anyway (as it gives you an array to operate on). It may be that you just need to do something different if you want to operate on large files on 32-bit systems. I don't know.

s/array/slice

It maps a view of a part of the file. Obviously one can't map parts greater than the address space.
 Plenty of stuff right now in Phobos isn't going to work right now if you try
 and have a length of 64 bits on a 32-bit system. size_t is used pretty much
 everywhere. And it's going to make generic code a _lot_ less pleasant if we
 have to worry about using anything other than size_t. It's already been
 causing us problems that iota does, and we've discussed making it so that it
 doesn't. But even if we allow ulong for length and indexing on 32-bit systems,
 that's a completely different ball game from permitting BigInt.

 So, I'd _very_ much like to always restruct length and indices to size_t for
 ranges. It will make all of the generic handling of that stuff far more
 pleasant. But we may very well be forced to permit ulong on 32-bit systems due
 to practical concerns such as large file handling on 32-bit systems.

I think the 32-bit version has to bite the bullet and try ulong/uint deduction.
 Oh, how I wish that 32-bit systems would just die out.
It would make so many
 things so much easier. No such luck on that yet though, much as most PC
 processors are 64-bit now.

Yeah, but sadly they are practical ;) 32-bit is now dominating the MCU world...

--
Dmitry Olshansky
Dec 29 2012
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12/30/2012 10:23 PM, SomeDude wrote:
 On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
 Uhm, 64-bit?

 What about large files (> 4 GiB) on 32-bit systems?

 (Say, if a range wants to provide random access to the bytes of an 8
 GiB file?)

 And what about large bit vectors? A 600 MiB bit vector needs > 2^32
 bits for length...

AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

False. It's solely a question of the FS used. NTFS supports files larger than 4 GB regardless of Windows version (Win2K+ or even earlier). It doesn't matter what the bitness of the OS in question is. I suspect 32-bit Linux also handles > 4 GB files even with ext2, no problem.

And e.g. FAT32 remarkably can't handle 4 GB+ files with any OS.

--
Dmitry Olshansky
Dec 30 2012
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 31/12/2012 17:50, Era Scarecrow wrote:
 On Sunday, 30 December 2012 at 19:11:51 UTC, Dmitry Olshansky wrote:
 False. It's solely a question of FS used.
 NTFS supports files larger then 4 Gb regardless of version Windows
 (Win2K+ or even earlier). It doesn't matter what the bitness of OS in
 question is. I suspect 32bit linux also has > 4Gb files even with ext2
 no problem.

 And e.g. FAT32 remarkably can't handle 4 Gb+ files with any OS.

I was writing some code to go through FAT12/16/32, and found some interesting information (although there was so much overhead I kinda stopped in the middle of it). FAT32 actually uses something like 28/29 bits for the sector id; the other 3-4 bits were flags like bad sectors, last sector and other minor data. At least that's what I remember offhand.

Sector? Do you mean cluster?

I would have thought it used the whole 32 bits for the cluster number, with magic values for "unused", "end of chain" and "bad". In each case you don't need to point to the next cluster as well. Unless it supports something like marking an in-use cluster as bad but leaving until another day the task of moving the data from it into a good cluster.

But looking through
   http://en.wikipedia.org/wiki/Fat32
there are indeed a handful of magic values for things like this. Anyway, that states that it uses 28 bits for the cluster number, but nothing about what the other 4 bits are for. A possibility I can see is that these 4 bits were reserved for bit flags that may be added in the future.

Stewart.
Dec 31 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis 
wrote:
 On Saturday, December 29, 2012 02:18:36 Peter Alexander wrote:
 I'd be very strongly inclined to go with #4 and just say that 
 anyone who
 actually cares about numbers that large should use 64-bit. 
 Needing ranges of
 length greater than uint.max is definitely not the norm, and I 
 would expect
 people caring that much about computational stuff to be using 
 64-bit anyway,
 since odds are they'll need the memory. Allowing for length to 
 be anything
 other than size_t is extremely annoying for generic code.

Sorry, I don't think I was clear on what the issue is. I'm already assuming 64-bit. What I'm saying is that 64-bit sometimes isn't even enough (for example in the case of a permutations range). I'm asking if allowing length to return BigInt would be reasonable.
Dec 28 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 01:56:04 UTC, Simen Kjaeraas 
wrote:
    property auto length()
   {
       CommonType!(staticMap!(LengthType, R)) result = 0;
       foreach (i, Unused; R)
       {
           result += source[i].length;
       }
       return result;
   }

 This would thus support BigInt .length just as well as size_t, 
 byte,
 or whatever.

This doesn't solve the issue. For example, even with your update, the code snippet I provided in the original post with iota and chain would still break (assuming iota(size_t.max).length returns a size_t). Also, if we allow BigInt, even with better support in CommonType it would significantly complicate Phobos with LengthType!Range popping up everywhere.
Dec 28 2012
prev sibling next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis 
wrote:
 I'd be very strongly inclined to go with #4 and just say that 
 anyone who actually cares about numbers that large should use 
 64-bit.

Uhm, 64-bit? What about large files (> 4 GiB) on 32-bit systems? (Say, if a range wants to provide random access to the bytes of an 8 GiB file?) And what about large bit vectors? A 600 MiB bit vector needs > 2^32 bits for length...
Dec 28 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei 
Alexandrescu wrote:
 On 12/28/12 9:21 PM, Peter Alexander wrote:
 I'm already assuming 64-bit. What I'm saying is that 64-bit 
 sometimes
 isn't even enough (for example in the case of a permutations 
 range). I'm
 asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

* Don't provide .length at all
* Provide it as size_t, and assert on overflow
* Provide it as size_t, and silently overflow
Dec 29 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 29, 2012 10:00:44 Peter Alexander wrote:
 On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei
 
 Alexandrescu wrote:
 On 12/28/12 9:21 PM, Peter Alexander wrote:
 I'm already assuming 64-bit. What I'm saying is that 64-bit
 sometimes
 isn't even enough (for example in the case of a permutations
 range). I'm
 asking if allowing length to return BigInt would be reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

 I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

 * Don't provide .length at all
 * Provide it as size_t, and assert on overflow
 * Provide it as size_t, and silently overflow

I expect that it'll silently overflow in most cases, because that's a lot simpler and the problem is almost never an issue. Asserting also incurs a performance penalty in non-release mode, where there's almost never any benefit from it. But if you want to provide pull requests with assertions for overflow in various Phobos algorithms, then they'll probably be merged in.

- Jonathan M Davis
Dec 29 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 29, 2012 07:12:49 Mehrdad wrote:
 On Saturday, 29 December 2012 at 02:14:22 UTC, Jonathan M Davis
 
 wrote:
 I'd be very strongly inclined to go with #4 and just say that
 anyone who actually cares about numbers that large should use
 64-bit.

Uhm, 64-bit? What about large files (> 4 GiB) on 32-bit systems? (Say, if a range wants to provide random access to the bytes of an 8 GiB file?)

You bring up a good point, but not even arrays support that, so stuff like std.mmfile can't possibly work with large files on 32-bit systems, not with the current design anyway (as it gives you an array to operate on). It may be that you just need to do something different if you want to operate on large files on 32-bit systems. I don't know.

Plenty of stuff right now in Phobos isn't going to work if you try to have a length of 64 bits on a 32-bit system. size_t is used pretty much everywhere. And it's going to make generic code a _lot_ less pleasant if we have to worry about using anything other than size_t. It's already been causing us problems that iota does, and we've discussed making it so that it doesn't. But even if we allow ulong for length and indexing on 32-bit systems, that's a completely different ball game from permitting BigInt.

So, I'd _very_ much like to always restrict length and indices to size_t for ranges. It will make all of the generic handling of that stuff far more pleasant. But we may very well be forced to permit ulong on 32-bit systems due to practical concerns such as large file handling on 32-bit systems.

Oh, how I wish that 32-bit systems would just die out. It would make so many things so much easier. No such luck on that yet though, much as most PC processors are 64-bit now.

- Jonathan M Davis
Dec 29 2012
prev sibling next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Saturday, 29 December 2012 at 09:31:02 UTC, Jonathan M Davis 
wrote:
 not even arrays support that, so stuff like std.mmfile can't 
 possibly work with large files on 32-bit systems

I didn't mean mmfile, I meant random access to the bytes of a regular file. No need for mapping it to memory.
Dec 29 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 14:23:09 UTC, Andrei 
Alexandrescu wrote:
 On 12/29/12 4:00 AM, Peter Alexander wrote:
 On Saturday, 29 December 2012 at 03:28:47 UTC, Andrei 
 Alexandrescu wrote:
 On 12/28/12 9:21 PM, Peter Alexander wrote:
 I'm already assuming 64-bit. What I'm saying is that 64-bit 
 sometimes
 isn't even enough (for example in the case of a permutations 
 range). I'm
 asking if allowing length to return BigInt would be 
 reasonable.

I think it would complicate a lot of things for the benefit of a few cases.

 I agree, but the range of permutations will be longer than 2^64 on occasion, so what do we do?

 * Don't provide .length at all
 * Provide it as size_t, and assert on overflow
 * Provide it as size_t, and silently overflow

I'd say give it a different name, i.e. bigLength.

But then the range doesn't have a length, even though it would be useful to have it in a lot of cases.

How about: have .length, and allow it to dangerously overflow, but also provide .bigLength, which returns a BigInt for when people need it?
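A rough sketch of that shape, with a permutations range exposing both properties (the Permutations struct and its layout are purely illustrative, using std.bigint):

```d
import std.bigint;

// Hypothetical sketch: expose a wrapping size_t length alongside
// an exact BigInt bigLength.
struct Permutations
{
    size_t n; // number of elements in the underlying set

    @property size_t length() const
    {
        size_t result = 1;
        foreach (i; 2 .. n + 1)
            result *= i; // silently wraps once n >= 21 on 64-bit
        return result;
    }

    @property BigInt bigLength() const
    {
        BigInt result = BigInt(1);
        foreach (i; 2 .. n + 1)
            result *= i; // exact, arbitrary precision
        return result;
    }
}
```

For n up to 20 the two agree; beyond that, only bigLength stays exact.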
Dec 29 2012
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 29/12/2012 01:18, Peter Alexander wrote:
 There was a discussion a while ago about the type returned by .length
 for ranges. I believe the conclusion was that it should always be size_t.

 My question now is what to do with really large ranges? For example, if
 we have a range of permutations of the elements in a set, when the set
 is larger than 21 the number of permutations is 21 factorial, which is
 over 2^64. As far as I see, there are three options:

They look like four to me.
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow dangerously.
 4. Do not allow ranges like this.

5. Don't allow such ranges to define a .length property.

From the std.range documentation:

    template hasLength(R)
    Returns true if R has a length member that returns an integral type. R does not have to be a range. Note that length is an optional primitive as no range must implement it. Some ranges do not store their length explicitly, some cannot compute it without actually exhausting the range (e.g. socket streams), and some other ranges may be infinite.

There you go. For reasons such as this, ranges aren't required to define a length property. As well as I/O ranges and infinite ranges, ranges where the length is liable to be greater than ulong.max are among those that naturally won't have this property.

If we define a length that may overflow, then sooner or later some code will be called on it that calls hasLength on the range, finds it to be true, and then relies on length and malfunctions because the value returned doesn't match the actual length.

Stewart.
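As a concrete illustration of that contract, generic code can already branch on hasLength at compile time instead of assuming .length exists (a minimal sketch; describe is a made-up function):

```d
import std.range : hasLength, isInputRange, iota;
import std.stdio : writeln, stdin;

// Branch at compile time on whether the range defines .length.
void describe(R)(R r) if (isInputRange!R)
{
    static if (hasLength!R)
        writeln("length = ", r.length);
    else
        writeln("no length (stream, infinite, or too large)");
}

void main()
{
    describe(iota(10));        // has a length
    describe(stdin.byLine());  // an input range without length
}
```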
Dec 29 2012
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 29/12/2012 21:13, Peter Alexander wrote:
 On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon wrote:

 If we define a length that may overflow, then sooner or later some
 code will be called on it that calls hasLength on the range, finds it
 to be true, and then relies on length and malfunctions because the
 value returned doesn't match the actual length.

Using this logic, chain shouldn't have length either (it can overflow, as I demonstrated in the original post), but it does, and it should. There are many ranges in Phobos that may overflow in length.

Yes, you have a point there. Any facility for creating a range by combining ranges is liable to overflow if it defines a length property. But I would expect that in practice 99% of finite ranges would be nowhere near 2^64 in size, so an overflow would be an exceptional case.
 The problem with permutations is that, for the kinds of permutations you
 are likely to use often, length will not overflow. It would be really
 useful to have length available when this is the case.

Maybe the solution is to define two permutation range classes: one with length that asserts that the number of elements <= 20, and one without length that places no bound. If the set size is a template parameter rather than being set at run time, then it can define length conditionally. Stewart.
Dec 30 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon 
wrote:
 On 29/12/2012 01:18, Peter Alexander wrote:
 My question now is what to do with really large ranges? For 
 example, if
 we have a range of permutations of the elements in a set, when 
 the set
 is larger than 21 the number of permutations is 21 factorial, 
 which is
 over 2^64. As far as I see, there are three options:

They look like four to me.

It's also three, for large values of three :-)
 If we define a length that may overflow, then sooner or later 
 some code will be called on it that calls hasLength on the 
 range, finds it to be true, and then relies on length and 
 malfunctions because the value returned doesn't match the 
 actual length.

Using this logic, chain shouldn't have length either (it can overflow, as I demonstrated in the original post), but it does, and it should. There are many ranges in Phobos that may overflow in length. The problem with permutations is that, for the kinds of permutations you are likely to use often, length will not overflow. It would be really useful to have length available when this is the case.
Dec 29 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 29 December 2012 at 20:59:56 UTC, Stewart Gordon
wrote:
 5. Don't allow such ranges to define a .length property.

I still think there is a fundamental difference between:

5a: Don't allow such ranges to define a .length property.

and

5b: Don't guarantee *support* for such ranges.

The current status quo in Phobos is "5b" BTW. There are a few select algorithms that do support it, but only because the implementers decided to go the extra mile, and it usually makes things more complicated than it needs to be.

BTW iota, in certain cases, will have a length returned in ulong. This is used to test *certain* algorithms, but really, nothing is guaranteed.

At the very least, there was a discussion some time ago that for a (finite) RA range, range[range.length] must compile. Ditto for slicing. This was never enforced though... which is a shame (IMO)... Well, I was the one to suggest that actually, but I never did it, so blame falls on me :/ I'll put it back on my todo list.
Dec 29 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 29, 2012 22:18:53 monarch_dodra wrote:
 At the very least, there was a discussion some time ago that for
 a (finite) RA range, range[range.length] must compile. Ditto for
 slicing.
 
 This was never enforced though... Which is a shame (IMO)...
 ...Well, I was the one to suggest that actually, but I never did
 it, so blame falls on me :/
 
 I'll put it back on my todo list.

It's now enforced that finite random access ranges have length, so we're part of the way there. The big thing that we really need though is for issue# 7177 to be implemented.

http://d.puremagic.com/issues/show_bug.cgi?id=7177

If that gets implemented, then we can require that random access ranges and ranges with slicing fully support $. We'd probably still need to check range[range.length] though, just to check that length and indexing are properly compatible.

- Jonathan M Davis
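For reference, a minimal sketch of what "fully support $" means for a user-defined random-access range (the Evens struct is purely illustrative):

```d
// Illustrative range: opDollar lets r[$ - 1] compile and agree
// with r.length.
struct Evens
{
    @property size_t length() const { return 10; }
    size_t opDollar() const { return length; }

    size_t opIndex(size_t i) const
    {
        assert(i < length);
        return 2 * i;
    }
}

void main()
{
    Evens r;
    assert(r[$ - 1] == 18); // $ is rewritten to r.opDollar()
}
```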
Dec 29 2012
prev sibling next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
 Uhm, 64-bit?

 What about large files (> 4 GiB) on 32-bit systems?

 (Say, if a range wants to provide random access to the bytes of 
 an 8 GiB file?)

 And what about large bit vectors? A 600 MiB bit vector needs > 
 2^32 bits for length...

AFAIK, no 32-bit OS supports files larger than 4Gb anyway.
Dec 30 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 30, 2012 19:23:13 SomeDude wrote:
 On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
 Uhm, 64-bit?
 
 What about large files (> 4 GiB) on 32-bit systems?
 
 (Say, if a range wants to provide random access to the bytes of
 an 8 GiB file?)
 
 And what about large bit vectors? A 600 MiB bit vector needs >
 2^32 bits for length...

AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

Um. No. There's a reason that things like the macro _LARGE_FILE_API exist. While some file systems won't allow larger file sizes, that's generally an FS limitation, not an OS limitation. The OSes provide ways to operate on 64-bit files on 32-bit systems. If they didn't, it would be impossible to do something like read a Blu-ray on a 32-bit system.

http://en.wikipedia.org/wiki/Large_file_support

- Jonathan M Davis
Dec 30 2012
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
On Sun, 30 Dec 2012 19:23:13 +0100, "SomeDude" <lovelydear mailmetrash.com> wrote:

 On Saturday, 29 December 2012 at 06:12:52 UTC, Mehrdad wrote:
 AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

Anachronism!!!

4.7 GB DVD burners: ~1997
64-bit personal computers: 2003

(Also, backups of large partitions into disk images were made.)

That said, my father owns a stand-alone, multi-media, USB disk drive that cannot store files > 4 GB because of the FAT32 it uses.

--
Marco
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 29 December 2012 at 01:18:37 UTC, Peter Alexander 
wrote:
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow 
 dangerously.
 4. Do not allow ranges like this.

Option 5: Incorporate (emulation code?) for cent/ucent, then consider that the same as 1 but returning cent rather than BigInt.
Dec 31 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Monday, 31 December 2012 at 17:43:34 UTC, Era Scarecrow wrote:
 On Saturday, 29 December 2012 at 01:18:37 UTC, Peter Alexander 
 wrote:
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow 
 dangerously.
 4. Do not allow ranges like this.

Option 5: Incorporate (emulation code?) for cent/ucent, then consider that the same as 1 but returning cent rather than BigInt.

Two problems with this:

1. It still complicates range code (which has to accommodate both size_t and ucent return values for .length).
2. ucent probably isn't enough either for these sorts of ranges.
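On the second point, a quick check with std.bigint confirms how fast factorials outgrow even 128 bits (a sketch, not Phobos code):

```d
import std.bigint;

void main()
{
    // Find the smallest n whose factorial exceeds 2^128,
    // i.e. no longer fits in a hypothetical ucent.
    BigInt factorial = BigInt(1);
    int n = 0;
    while (factorial <= BigInt(1) << 128)
    {
        ++n;
        factorial *= n;
    }
    assert(n == 35); // a 35-element set already overflows 128 bits
}
```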
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Sunday, 30 December 2012 at 19:11:51 UTC, Dmitry Olshansky 
wrote:
 False. It's solely a question of FS used.
 NTFS supports files larger then 4 Gb regardless of version 
 Windows (Win2K+ or even earlier). It doesn't matter what the 
 bitness of OS in question is. I suspect 32bit linux also has > 
 4Gb files even with ext2 no problem.

 And e.g. FAT32 remarkably can't handle 4 Gb+ files with any OS.

I was writing some code to go through FAT12/16/32, and found some interesting information (although there was so much overhead I kinda stopped in the middle of it). FAT32 actually uses something like 28/29 bits for the sector id; the other 3-4 bits were flags like bad sectors, last sector and other minor data. At least that's what I remember offhand.

http://en.wikipedia.org/wiki/Fat32
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Monday, 31 December 2012 at 17:47:36 UTC, Peter Alexander 
wrote:
 On Monday, 31 December 2012 at 17:43:34 UTC, Era Scarecrow 
 wrote:
 On Saturday, 29 December 2012 at 01:18:37 UTC, Peter Alexander 
 wrote:
 1. Allow ranges to return BigInt for length.
 2. Allow ranges like this but assert when .length overflows.
 3. Allow ranges like this and just allow .length to overflow 
 dangerously.
 4. Do not allow ranges like this.

Option 5: Incorporate (emulation code?) for cent/ucent, then consider that the same as 1 but returning cent rather than BitInt.

Two problems with this:

1. It still complicates range code (which has to accommodate both size_t and ucent return values for .length).
2. ucent probably isn't enough either for these sorts of ranges.

Perhaps, but having cent/ucent available would make a few people happy at least, although except for a few cases with databases and encryption it may not be that important right now.

I remember trying to learn C/C++ and writing a very, very simple program to double the number 1 something like 100 times. After 31 doublings it turned to garbage (hey, I was 14 at the time; god, that was a long time ago). It took me a while to understand the problem. I ended up re-writing the whole thing in assembly that took 101 bytes (a COM file) and could double the number as many times as I wanted.
Dec 31 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 31 December 2012 at 17:56:37 UTC, Era Scarecrow wrote:
  Perhaps, but having cent/ucent available would make a few 
 people happy at least

Not to change the subject, but couldn't we just implement Cent and UCent as library types, like Complex or HalfFloat? I'd be tempted to open an ER and see if Walter is up for it...
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Monday, 31 December 2012 at 18:18:08 UTC, monarch_dodra wrote:
 On Monday, 31 December 2012 at 17:56:37 UTC, Era Scarecrow 
 wrote:
 Perhaps, but having cent/ucent available would make a few 
 people happy at least

Not to change the subject, but couldn't we just implement Cent and UCent as library types, such as Complex, or HalfFloat? I'd be tempted to open an ER and see if walter is up for it...

Yes we could. Just that they are already defined built-in types according to the specs. *shrugs* But changing your code from Cent to cent after it's properly implemented seems like a minimal change. If walter says go ahead (and I'm sure he won't have a problem with it), then I guess it's up to whoever to make it.
Dec 31 2012
prev sibling next sibling parent "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Monday, 31 December 2012 at 18:55:04 UTC, Stewart Gordon wrote:
 Sector?  Do you mean cluster?

Probably. They mean the same thing to me.
 I would have thought it used the whole 32 bits for cluster 
 number, with magic values for "unused", "end of chain" and 
 "bad".  In each case you don't need to point to the next 
 cluster as well.  Unless it supports something like marking an 
 in-use cluster as bad but leaving until another day the task of 
 moving the data from it into a good cluster.

Wouldn't have been practical. Historically the FAT table was a layout of all the clusters, where each entry pointed to the next cluster, with the largest values used to denote EOF (there were 8 or so reserved codes; I don't remember exactly). Doing the math, if you were to lay it all out in FAT32 using the same scheme with full 32-bit entries, you'd end up with 2^34 bytes for a single table. FAT by default had 2 tables (one as a backup), meaning 2^35 bytes would be needed, and that's pure overhead (assuming you needed it all). Back then you had 8 GB drives at the most, and with space being sparse, 28 bits makes more sense (1-2 GB of tables vs 16-32 GB). Of course, the tables wouldn't be that big if the drive didn't have that many clusters to begin with.
 But looking through
  http://en.wikipedia.org/wiki/Fat32

there are indeed a handful of magic values for things like this. Anyway, that states that it uses 28 bits for the cluster number, but nothing about what the other 4 bits are for. But a possibility I can see is that these 4 bits were reserved for bit flags that may be added in the future.

I don't remember where I read it, but I was certain they were used for overhead/flags; of course I was also reading up on Long File Names (LFN) and directory structures, so it's likely lost somewhere in those texts. FAT32 would/could have supported files over 4 GB, but the file size field was 32-bit, and anything over that would likely have been more complex to incorporate, along with programs depending on sizes being 32-bit. I guess it was easier to set the hard limit than to extend it to larger sizes. Plus programmers are going to be lazy and prefer int whenever possible. Hmm, actually back then long longs weren't supported (except maybe by gcc), so I don't think that was much of an option.
Dec 31 2012
prev sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Sunday, 30 December 2012 at 22:08:59 UTC, Jonathan M Davis 
wrote:
 On Sunday, December 30, 2012 19:23:13 SomeDude wrote:
 AFAIK, no 32-bit OS supports files larger than 4Gb anyway.

Um. No. There's a reason that macros like _LARGE_FILE_API exist. While some file systems won't allow larger file sizes, that's generally an FS limitation, not an OS limitation. The OSes provide ways to operate on 64-bit file offsets on 32-bit systems. If they didn't, it would be impossible to do something like read a Blu-ray on a 32-bit system. http://en.wikipedia.org/wiki/Large_file_support - Jonathan M Davis

That's true, I stand corrected.
Jan 02 2013