digitalmars.D.learn - Multi-threaded sorting of text file

MGW <mgw yandex.ru> writes:
Need help:
There's a large text file (hundreds of thousands of lines).
The structure is as follows:
2345|wedwededwedwedwe ......
872625|rfrferwewweww .....
23|rergrferfefer ....
................

The file needs to be sorted by the first field, giving:
23|rergrferfefer.......
2345|wedwededwedwedwe.......
872625|rfrferwewweww.......

There are N CPUs (from 4 to 8) and 16 GB of memory available. I need to
come up with an algorithm in D for fast sorting using multithreading.
Jan 03 2020
Alex <sascha.orlov gmail.com> writes:
On Saturday, 4 January 2020 at 07:51:49 UTC, MGW wrote:
> Need help:
> There's a large text file (hundreds of thousands of lines).
> The structure is as follows:
> 2345|wedwededwedwedwe ......
> 872625|rfrferwewweww .....
> 23|rergrferfefer ....
> ................
>
> The file needs to be sorted by the first field, giving:
> 23|rergrferfefer.......
> 2345|wedwededwedwedwe.......
> 872625|rfrferwewweww.......
>
> There are N CPUs (from 4 to 8) and 16 GB of memory available. I need to
> come up with an algorithm in D for fast sorting using multithreading.

As far as I know, D has no native implementation of this. Maybe I overlooked one at code.dlang.org. But there are plenty of implementations out there in the wild. I found this one on the first try: https://stackoverflow.com/questions/23531625/multithreaded-sorting-application/23532317
Jan 03 2020
Ali Çehreli <acehreli yahoo.com> writes:
On 1/3/20 11:51 PM, MGW wrote:
> Need help:
> There's a large text file (hundreds of thousands of lines).

How long are the lines? If they are around 1 KB each, that is on the order of 100 MB, which would fit in memory just fine. There is a parallel quick sort example on the std.parallelism page: https://dlang.org/phobos/std_parallelism.html
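
A minimal sketch of how that parallel quick sort could be adapted to this file format, assuming every line starts with a non-negative decimal number followed by '|'. The names Record, parseRecord, parallelSortByKey and sortLines, and the cutoff of 1_000 elements, are made up for illustration; only the std.parallelism and std.algorithm calls are library API.

```d
import std.algorithm : map, partition, sort, swap;
import std.array : array;
import std.conv : to;
import std.parallelism : task, taskPool;
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical record type: the numeric first field plus the whole line.
struct Record
{
    ulong key;
    string line;
}

// Assumption: the line contains '|' and the text before it is a valid number.
Record parseRecord(string line)
{
    immutable sep = line.indexOf('|');
    return Record(line[0 .. sep].to!ulong, line);
}

// Quick sort on the key, recursing into one half on the task pool.
void parallelSortByKey(Record[] data)
{
    // Small slices are sorted serially; the cutoff is an arbitrary guess.
    if (data.length < 1_000)
    {
        sort!((a, b) => a.key < b.key)(data);
        return;
    }

    // Move the middle element to the end and use its key as the pivot.
    swap(data[$ / 2], data[$ - 1]);
    immutable pivot = data[$ - 1].key;

    // partition returns the right-hand part (keys >= pivot).
    auto greaterEqual = partition!(r => r.key < pivot)(data[0 .. $ - 1]);
    swap(data[$ - greaterEqual.length - 1], data[$ - 1]);

    auto less = data[0 .. $ - greaterEqual.length - 1];
    greaterEqual = data[$ - greaterEqual.length .. $];

    // One branch goes to the pool, the other runs in the current thread.
    auto recurseTask = task!parallelSortByKey(less);
    taskPool.put(recurseTask);
    parallelSortByKey(greaterEqual);
    recurseTask.yieldForce;
}

// Parse, sort and return the original lines in key order.
string[] sortLines(string[] lines)
{
    auto records = lines.map!parseRecord.array;
    parallelSortByKey(records);
    return records.map!(r => r.line).array;
}

void main()
{
    // In-memory stand-in for the file; a real program could collect the
    // lines with File(path).byLineCopy instead.
    auto lines = ["2345|wedwededwedwedwe", "872625|rfrferwewweww",
                  "23|rergrferfefer"];
    foreach (line; sortLines(lines))
        writeln(line);
}
```

Parsing the key once up front keeps the comparison a plain integer compare, so the sort does not re-scan each line for '|' on every comparison.
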
> The structure is as follows:
> 2345|wedwededwedwedwe ......
> 872625|rfrferwewweww .....
> 23|rergrferfefer ....
> .................
>
> The file needs to be sorted by the first field, giving:
> 23|rergrferfefer.......
> 2345|wedwededwedwedwe.......
> 872625|rfrferwewweww.......

Are you going to write the result back to a file? Then you would hardly notice any improvement from parallelism, because the relative slowness of I/O would determine the overall performance.

> There are N CPUs (from 4 to 8) and 16 GB of memory available. I need to
> come up with an algorithm in D for fast sorting using multithreading.

Ali
Jan 04 2020