digitalmars.D.learn - how to handle very large array?

MichaelBi <shunjie.bi gmail.com> writes:
Day 6 of Advent of Code 2021 needs to handle an array of 10^12
length, or even bigger, plus changing elements and appending
elements. Normal implementations such as length, appender, and
ref element access seem unable to handle an array that big. Is
there any alternative data structure or algorithm that can
handle such a large array properly? Thanks.
Feb 09 2022
MichaelBi <shunjie.bi gmail.com> writes:
On Wednesday, 9 February 2022 at 10:03:21 UTC, MichaelBi wrote:
 Day 6 of Advent of Code 2021 needs to handle an array of
 10^12 length, or even bigger, plus changing elements and
 appending elements. Normal implementations such as length,
 appender, and ref element access seem unable to handle an
 array that big. Is there any alternative data structure or
 algorithm that can handle such a large array properly? Thanks.
I got an out-of-memory error: core.exception.OutOfMemoryError src\core\lifetime.d(126): Memory allocation failed
Feb 09 2022
MichaelBi <shunjie.bi gmail.com> writes:
On Wednesday, 9 February 2022 at 10:05:23 UTC, MichaelBi wrote:
 On Wednesday, 9 February 2022 at 10:03:21 UTC, MichaelBi wrote:
 Day 6 of Advent of Code 2021 needs to handle an array of
 10^12 length, or even bigger, plus changing elements and
 appending elements. Normal implementations such as length,
 appender, and ref element access seem unable to handle an
 array that big. Is there any alternative data structure or
 algorithm that can handle such a large array properly? Thanks.
 I got an out-of-memory error: core.exception.OutOfMemoryError src\core\lifetime.d(126): Memory allocation failed
https://adventofcode.com/2021/day/6#part2 "Suppose the lanternfish live forever and have unlimited food and space. Would they take over the entire ocean? After 256 days in the example above, there would be a total of 26984457539 lanternfish! How many lanternfish would there be after 256 days?" The 26984457539 above is the array length.
Feb 09 2022
ag0aep6g <anonymous example.com> writes:
On 09.02.22 11:09, MichaelBi wrote:
 On Wednesday, 9 February 2022 at 10:05:23 UTC, MichaelBi wrote:
[...]
 I got an out-of-memory error:
 core.exception.OutOfMemoryError src\core\lifetime.d(126): Memory
 allocation failed
 https://adventofcode.com/2021/day/6#part2 "Suppose the lanternfish live forever and have unlimited food and space. Would they take over the entire ocean? After 256 days in the example above, there would be a total of 26984457539 lanternfish! How many lanternfish would there be after 256 days?" The 26984457539 above is the array length.
If you store one byte per lanternfish, that's 25 GiB. You don't seem to have enough RAM for such a large array. Try to think of a more efficient way of storing the information.
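For scale: 26984457539 fish at one byte each is 26984457539 bytes, and 26984457539 / 2^30 ≈ 25.1, so roughly 25 GiB.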
Feb 09 2022
user1234 <user1234 12.de> writes:
On Wednesday, 9 February 2022 at 10:29:03 UTC, ag0aep6g wrote:
 Try to think of a more efficient way of storing the information.
I can't agree more. The OP's problem is not dynamic arrays; it's that he is using an inadequate data structure.
Feb 09 2022
bauss <jj_1337 live.dk> writes:
On Wednesday, 9 February 2022 at 10:03:21 UTC, MichaelBi wrote:
 Day 6 of Advent of Code 2021 needs to handle an array of
 10^12 length, or even bigger, plus changing elements and
 appending elements. Normal implementations such as length,
 appender, and ref element access seem unable to handle an
 array that big. Is there any alternative data structure or
 algorithm that can handle such a large array properly? Thanks.
Use a memory-mapped file that holds the array values; in theory the array can then be practically unbounded. Since each entry of an array has a known size, you can treat the file as an array.

Let's say you have an array of ints. A memory-mapped file only gives you bytes to work with, so each entry occupies 4 bytes, since a 32-bit integer (int) is 4 bytes. To read the element at a specific index, you simply read at byte offset N x I, where N is the size of the type and I is the index you want.

Otherwise the size of an array cannot exceed the RAM you have, unless your system can use disk space as RAM, of course.
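A minimal sketch of the idea in D, using std.mmfile (the file name "data.bin" and the length are made up for illustration):

import std.mmfile;

void main()
{
    enum n = 1_000_000; // number of int entries (illustrative)

    // Create a file big enough for n ints and map it into memory.
    auto mmf = new MmFile("data.bin", MmFile.Mode.readWriteNew,
                          n * int.sizeof, null);

    // Reinterpret the mapped bytes as a slice of ints.
    auto arr = cast(int[]) mmf[];

    arr[123_456] = 42;          // write the entry at index 123456
    assert(arr[123_456] == 42); // read it back
}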
Feb 09 2022
MoonlightSentinel <moonlightsentinel disroot.org> writes:
On Wednesday, 9 February 2022 at 10:03:21 UTC, MichaelBi wrote:
 Day 6 of Advent of Code 2021 needs to handle an array of
 10^12 length, or even bigger...
As others have mentioned, huge arrays require appropriate memory / the use of memory-mapped files to actually store them. But the calculation will take a *long* time even if your program could allocate an array of that size. There's a way to use a much smaller array to manage the lanternfish population...
Feb 09 2022
Ali Çehreli <acehreli yahoo.com> writes:
On 2/9/22 10:38, MoonlightSentinel wrote:

 There's a way to use a much smaller array to manage the lanternfish
 population...
As soon as I read your comment, I was reminded of a certain ingenious sorting algorithm that is O(N). ;) After reading the problem description, I see your hint was useful.

Ali
Feb 09 2022
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Feb 09, 2022 at 11:05:23AM -0800, Ali Çehreli via Digitalmars-d-learn
wrote:
 On 2/9/22 10:38, MoonlightSentinel wrote:
 
 There's a way to use a much smaller array to manage the lanternfish
 population...
As soon as I read your comment, I was reminded of a certain ingenious sorting algorithm that is O(N). ;) After reading the problem description, I see your hint was useful.
[...] If you know beforehand that your data is discrete and bounded, you can achieve O(N) sort complexity (as opposed to O(N log N), which is the lower bound if you cannot make any assumptions about your data beyond their ordering). You can use pigeonhole sort, for example: create an M-element array of counters, where M is the number of distinct values your elements may have, then just iterate over your data and increment the counter corresponding to each element you find (using array indexing for O(1) access to each counter). Then iterate over the array of counters in order and replace the input data with that many copies of each element. Your overall complexity then would be O(N+M), which would be O(N) if M is about the same order of magnitude as N or less.

However, this algorithm quickly becomes impractical when M grows large, because you will soon need to create many more slots than there are input elements. For example, sorting a general int[] this way would require 2^32 counters, which is usually overkill unless your input has close to 2^32 elements. (You could do it if you knew your ints are ≤ M for some small value of M, though. But it won't work for a general int[] that can contain any value.) And trying to sort a general ulong[] this way will not work, because you would need an array of 2^64 counters, which would not only exhaust all available RAM in today's computers, but also take a ridiculously long time to iterate over in the second step.

The insight here is *taking advantage of additional structure in your data*. If you take the general, minimal-assumptions approach, the best you can do is the general O(N log N) algorithm. However, by exploiting additional structure in your data, you can do better. Similarly, if you take the naïve approach to modelling your lanternfish, you'll end up with an unwieldy array that's far too large to fit in RAM. If you exploit the additional structure in your data, however, you can accomplish the same task with a much smaller array.

T

-- 
Latin's a dead language, as dead as can be; it killed off all the Romans, and now it's killing me! -- Schoolboy
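To make the pigeonhole idea concrete, here is a minimal D sketch (the function name and the bound m = 6 in the example are illustrative):

import std.stdio;

// Sort values known to lie in 0 .. m by counting occurrences.
uint[] pigeonholeSort(const uint[] input, uint m)
{
    auto counts = new size_t[m]; // one counter per possible value
    foreach (x; input)
        ++counts[x];             // O(1) per element

    auto result = new uint[input.length];
    size_t pos;
    foreach (value, count; counts) // emit each value `count` times
        foreach (_; 0 .. count)
            result[pos++] = cast(uint) value;
    return result;
}

void main()
{
    writeln(pigeonholeSort([3, 1, 4, 1, 5], 6)); // [1, 1, 3, 4, 5]
}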
Feb 09 2022
MichaelBi <shunjie.bi gmail.com> writes:
On Wednesday, 9 February 2022 at 19:48:49 UTC, H. S. Teoh wrote:
 [...]
Thanks, very helpful! I am using an assocArray now...
Feb 09 2022
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Feb 10, 2022 at 01:32:00AM +0000, MichaelBi via Digitalmars-d-learn
wrote:
 On Wednesday, 9 February 2022 at 19:48:49 UTC, H. S. Teoh wrote:
 [...]
 Thanks, very helpful! I am using an assocArray now...
Are you sure that's what you need?

T

-- 
Computers are like a jungle: they have monitor lizards, rams, mice, c-moss, binary trees... and bugs.
Feb 09 2022
forkit <forkit gmail.com> writes:
On Thursday, 10 February 2022 at 01:43:54 UTC, H. S. Teoh wrote:
 On Thu, Feb 10, 2022 at 01:32:00AM +0000, MichaelBi via 
 Digitalmars-d-learn wrote:
 On Wednesday, 9 February 2022 at 19:48:49 UTC, H. S. Teoh 
 wrote:
 [...]
 Thanks, very helpful! I am using an assocArray now...
Are you sure that's what you need? T
https://youtu.be/yJjpXJm7x0o?t=213
Feb 09 2022
Era Scarecrow <rtcvb32 yahoo.com> writes:
On Thursday, 10 February 2022 at 01:43:54 UTC, H. S. Teoh wrote:
 On Thu, Feb 10, 2022 at 01:32:00AM +0000, MichaelBi via 
 Digitalmars-d-learn wrote:
 Thanks, very helpful! I am using an assocArray now...
Are you sure that's what you need?
Depends. If you use, say, TYPE[long/int], then you effectively have a sparse array; if there are only a few entries (*say, a hundred million or so*), it will probably be fine. Depending on what you're storing, say if it's a few bits per entry, you can probably use a BitArray to store the differing values. The 10^12 entries would take up... 119 GB? That won't work. I wonder how the 25 GiB figure was calculated.

Data of that size sounds more like a job for a database. So maybe an array-like object whose index access does file access to storage might be better, swapping a single 1 MiB block (*or some power-of-two size*) in and out for reads/writes. If the sparseness/density isn't heavy, you could get away with a zram drive, leaving the allocation/compression to the OS so it all remains in memory (*though that's not a universal workaround and only works on Linux*), or compress blocks of data with zlib and have the array-like object swap them in and out, but I'm more iffy on that.

If the array just holds, say, the results of a formula, you could instead use a range with index support that generates each value on demand; that uses very little space (*depending on the minimum state needed to generate a value*), though it may be slower than direct memory access.
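As a rough sketch of the sparse-array idea in D (the struct name and all values are made up for illustration):

// An AA keyed by index behaves like a sparse array: only the
// indices actually touched consume memory.
struct SparseArray(T)
{
    T[ulong] data;
    T defaultValue;

    T opIndex(ulong i) const
    {
        if (auto p = i in data) // pointer to the stored value, or null
            return *p;
        return defaultValue;
    }

    void opIndexAssign(T value, ulong i) { data[i] = value; }
}

void main()
{
    SparseArray!long a;
    a[999_999_999_999] = 42; // index far beyond any real allocation
    assert(a[999_999_999_999] == 42);
    assert(a[5] == 0);       // unset slots read as the default
}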
Feb 12 2022
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Feb 12, 2022 at 06:41:14PM +0000, Era Scarecrow via Digitalmars-d-learn
wrote:
 On Thursday, 10 February 2022 at 01:43:54 UTC, H. S. Teoh wrote:
 On Thu, Feb 10, 2022 at 01:32:00AM +0000, MichaelBi via
 Digitalmars-d-learn wrote:
 Thanks, very helpful! I am using an assocArray now...
Are you sure that's what you need?
 Depends. If you use, say, TYPE[long/int], then you effectively have a sparse array; if there are only a few entries (*say, a hundred million or so*), it will probably be fine. Depending on what you're storing, say if it's a few bits per entry, you can probably use a BitArray to store the differing values. The 10^12 entries would take up... 119 GB? That won't work. I wonder how the 25 GiB figure was calculated.
[...]

That was not my point. My point was to question whether the OP has discovered the insight that would allow him to accomplish his task with a LOT less space than the naïve approach of storing everything in a gigantic array.

Substituting an AA for a gigantic array matters little as long as the basic approach remains the same -- you have only modified the implementation details, but the algorithm is still a (highly) suboptimal one.

--T
Feb 12 2022
MichaelBi <shunjie.bi gmail.com> writes:
On Saturday, 12 February 2022 at 20:31:29 UTC, H. S. Teoh wrote:
 On Sat, Feb 12, 2022 at 06:41:14PM +0000, Era Scarecrow via 
 Digitalmars-d-learn wrote:
 [...]
[...] That was not my point. My point was to question whether the OP has discovered the insight that would allow him to accomplish his task with a LOT less space than the naïve approach of storing everything in a gigantic array. Substituting an AA for a gigantic array matters little as long as the basic approach remains the same -- you have only modified the implementation details but the algorithm is still a (highly) suboptimal one. --T
Thanks, you are all correct. I just changed the algorithm and used an AA; previously I was using the naïve method. :) Now it's solved perfectly. Thanks again.
Feb 14 2022
Matheus <matheus gmail.com> writes:
On Monday, 14 February 2022 at 13:20:45 UTC, MichaelBi wrote:
 Thanks, you are all correct. I just changed the algorithm and
 used an AA; previously I was using the naïve method. :) Now
 it's solved perfectly. Thanks again.
You could have used a normal int array for this task too: you're dealing with counts, so you can go with an array of 9 elements (indices 0..8), shift the data circularly each day, and add new fish whenever a timer would hit -1.

I'm pretty sure there is a mathematical way to solve this, I mean without any arrays, but I didn't bother with it, since even my basic JavaScript implementation runs in less than 1 ms for 256 days on an old computer.

Matheus.
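A minimal D sketch of that 9-counter approach (the function name is made up; the example input and the 256-day total come from the puzzle text quoted earlier in the thread):

import std.algorithm : sum;

// count[t] holds how many fish currently have timer value t.
// One day is a circular shift: the fish at timer 0 respawn at
// timer 6, and each of them also spawns a newborn at timer 8.
ulong lanternfish(const int[] timers, int days)
{
    ulong[9] count;
    foreach (t; timers) // initial timers are in 0 .. 8
        ++count[t];

    foreach (_; 0 .. days)
    {
        immutable spawning = count[0];
        foreach (t; 1 .. 9)   // shift every bucket down one day
            count[t - 1] = count[t];
        count[6] += spawning; // parents restart at 6
        count[8] = spawning;  // newborns start at 8
    }
    return count[].sum;
}

void main()
{
    // Example from the puzzle: 5 fish, 256 days -> 26984457539 fish.
    assert(lanternfish([3, 4, 3, 1, 2], 256) == 26_984_457_539);
}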
Feb 14 2022