
digitalmars.D - Why is Json parsing slower in multiple threads?

reply Alexandre Bourquelot <alexandre.bourquelot ahrefs.com> writes:
Hello everyone. We have some D code running in production that 
reads files containing lines of JSON data that we would like to 
parse and process.

These files can be processed in parallel, so we create one thread 
per file. However, I noticed significant slowdowns when 
processing multiple files in parallel, as opposed to processing 
only one.

Here is a simple code snippet reproducing the issue. It reads 
from a file containing the same JSON object copy-pasted 100k 
times, like so:
```json
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
...
```

It gives the following output:
```
➜ ./test 1
(file ) (thread id 140310703728384) starting processing file
(file  )Done in 1 sec, 549 ms, 257 μs, and 6 hnsecs

➜ ./test 3
(file ) (thread id 140071550318336) starting processing file
(file ) (thread id 140078235236096) starting processing file
(file ) (thread id 140078221063936) starting processing file
(file  )Done in 4 secs, 296 ms, 780 μs, and 9 hnsecs
(file  )Done in 4 secs, 360 ms, 498 μs, and 3 hnsecs
(file  )Done in 4 secs, 393 ms, 342 μs, and 6 hnsecs
```
Another curious thing is that this behaviour is not present when 
compiling the code with the `--build=profile` option.

For reference:
```bash
➜ ldc2 --version
LDC - the LLVM D compiler (1.24.0):
   based on DMD v2.094.1 and LLVM 11.0.1
```

```d
// Repro: spawn one thread per input file; each thread parses every line
// with std.json.parseJSON and reports how long it took.
import core.thread.osthread;              // Thread
import std.concurrency;                   // thisThreadID
import std.conv;                          // to
import std.datetime.systime : Clock;
import std.json;                          // parseJSON
import std.stdio;                         // File, writefln

void parseInThread(string[] lines)
{
    writefln("(file ) (thread id %s) starting processing file", thisThreadID);

    auto startTime = Clock.currTime;

    foreach (line; lines)
    {
        line.parseJSON;
    }

    writefln("(file  )Done in %s", Clock.currTime - startTime);
}

// Thin wrapper owning one worker thread and its private copy of the lines.
class T
{
    Thread t_;
    string[] _lines;

    this(string[] lines)
    {
        _lines = lines.dup;
        t_ = new Thread(() { parseInThread(_lines); });
    }

    void opCall()
    {
        t_.start;
    }

    void join()
    {
        t_.join;
    }
}

int main(string[] args)
{
    T[] threads;

    // args[1] is the number of files to process: ./file1, ./file2, ...
    string filenameBase = "./file";
    foreach (k; 1 .. args[1].to!int + 1)
    {
        auto newFile = File(filenameBase ~ k.to!string, "r");

        string[] lines;
        foreach (line; newFile.byLine)
        {
            lines ~= line.to!string;
        }
        newFile.close;

        threads ~= new T(lines);
    }

    // Start all threads, then wait for them all to finish.
    foreach (thread; threads)
    {
        thread();
    }

    foreach (thread; threads)
    {
        thread.join;
    }

    return 0;
}
```

Thanks in advance; this has been annoying me for a couple of days 
and I have no idea what might be the problem. Strangely enough, I 
also have the same problem when using the `vibe-d` JSON library 
for parsing.
Jun 20 2023
next sibling parent reply Kagamin <spam here.lot> writes:
The program does 3 times more work and gets it done in 3 times 
more time: 1.5*3=4.5
Jun 20 2023
parent reply Alexandre Bourquelot <alexandre.bourquelot ahrefs.com> writes:
On Tuesday, 20 June 2023 at 10:29:04 UTC, Kagamin wrote:
 The program does 3 times more work and gets it done in 3 times 
 more time: 1.5*3=4.5
Thanks for your reply. I am using threads, so the work should get 
done in the same time it takes to process one file, since it's 
distributed across cores. To be sure, I even wrote a similar C++ 
program, and it performs as expected.
Jun 20 2023
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 20 June 2023 at 10:39:42 UTC, Alexandre Bourquelot 
wrote:
 On Tuesday, 20 June 2023 at 10:29:04 UTC, Kagamin wrote:
 The program does 3 times more work and gets it done in 3 times 
 more time: 1.5*3=4.5
 Thanks for your reply. I am using threads, so the work should 
 get done in the same time it takes to process one file, since 
 it's distributed across cores. To be sure, I even wrote a 
 similar C++ program, and it performs as expected.
Try preallocating the memory you need. It might very well be that 
the GC allocation lock is slowing you down.
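A minimal sketch of that suggestion (the helper name and the 100k-line estimate are illustrative, not from the original program): reserving the destination array's capacity once replaces many small GC growths, each of which must take the global allocator lock, with a single allocation.

```d
import std.array : appender;

// Illustrative helper: collect lines into a preallocated array.
// `expected` is an assumed rough line count (100k in the repro input).
string[] readLinesPrealloc(string[] rawLines, size_t expected = 100_000)
{
    auto lines = appender!(string[]);
    lines.reserve(expected); // one allocation instead of repeated growths
    foreach (l; rawLines)
        lines.put(l);
    return lines.data;
}
```

This only removes the allocations the caller controls; the allocations inside `parseJSON` itself are untouched.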
Jun 20 2023
prev sibling next sibling parent FeepingCreature <feepingcreature gmail.com> writes:
On Tuesday, 20 June 2023 at 09:31:57 UTC, Alexandre Bourquelot 
wrote:
 Hello everyone. We have some D code running in production that 
 reads files containing lines of JSON data, that we would like 
 to parse and process.

 These files can be processed in parallel, so we create one 
 thread for processing each file. However I noticed significant 
 slowdowns when processing multiple files in parallel, as 
 opposed to processing only one file.

 ...
 
 Thanks in advance, this has been annoying me for a couple of 
 days and I have no idea what might be the problem. Strangely 
 enough I also have the same problem when using `vibe-d` json 
 library for parsing.
Yeah, if you look with `perf record`, you will see that the 
program spends approximately all its runtime in the garbage 
collector. JSON parsing is very memory hungry. So you get no 
parallelization, because the allocator takes a lock, and you also 
get the overhead of lots and lots of lock waits.

I recommend using a streaming JSON parser like std_data_json 
(https://github.com/dlang-community/std_data_json) and loading 
into a well-typed data structure directly, to keep the amount of 
unnecessary allocations to a minimum.
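A minimal illustration of the "well-typed data structure" part, using only std.json (the `Record` struct and the field names `s`/`i` mirror the sample input; a true streaming parser such as std_data_json would avoid the intermediate DOM allocation entirely):

```d
import std.json : parseJSON;

// Hypothetical record matching the sample lines { "s": ..., "i": ... }.
struct Record
{
    string s;
    long i;
}

// Parse one line and copy the two fields into a plain struct, so the
// JSONValue DOM becomes garbage immediately instead of being retained.
Record toRecord(string line)
{
    auto j = parseJSON(line);
    return Record(j["s"].str, j["i"].integer);
}
```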
Jun 20 2023
prev sibling next sibling parent Sergey <kornburn yandex.ru> writes:
On Tuesday, 20 June 2023 at 09:31:57 UTC, Alexandre Bourquelot 
wrote:
 Hello everyone. We have some D code running in production that 
 reads files containing lines of JSON data, that we would like 
 to parse and process.
 Thanks in advance, this has been annoying me for a couple of 
 days and I have no idea what might be the problem. Strangely 
 enough I also have the same problem when using `vibe-d` json 
 library for parsing.
Btw, if you want a really fast solution, I can recommend trying 
the ASDF library. There is also its successor, Mir-ION, but I 
haven't tried that one. With the help of ASDF I was able to 
prepare almost the best solution for the JSON serde problem, with 
low memory consumption as well!

https://programming-language-benchmarks.vercel.app/problem/json-serde
Jun 20 2023
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/20/23 5:31 AM, Alexandre Bourquelot wrote:

 Thanks in advance, this has been annoying me for a couple of days and I 
 have no idea what might be the problem. Strangely enough I also have the 
 same problem when using `vibe-d` json library for parsing.
The issue, undoubtedly, is memory allocation. Your JSON parsers 
(both std.json and vibe-d) allocate an AA for each object, and 
parse the entire string into a DOM structure. The D GC has a 
single global lock to allocate memory -- even memory that might 
be on a free list. So the threads are all bottlenecked on waiting 
their turn for the lock.

Depending on what you want to do, like others here, I'd recommend 
a stream-based JSON parser. And then you also don't have to split 
it into lines. If the goal is to build a huge representation of 
all the data, then there's not much else to be done, unless you 
want to pre-allocate. But you may end up having to drive that 
yourself.

I can possibly recommend, in addition to what others have 
mentioned, my jsoniopipe library.

-Steve
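The lock-contention claim is easy to reproduce in isolation; a tiny sketch (the function name, thread count, and iteration count are arbitrary choices for illustration) that parses the same small document from several threads at once:

```d
import core.thread.osthread : Thread;
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.json : parseJSON;

// Each thread parses the same small document `iters` times. With the stock
// GC's single global allocation lock, wall time tends to grow with nThreads
// instead of staying roughly flat; exact numbers are machine-dependent.
Duration parseMany(size_t nThreads, size_t iters)
{
    auto sw = StopWatch(AutoStart.yes);
    Thread[] workers;
    foreach (t; 0 .. nThreads)
        workers ~= new Thread({
            foreach (i; 0 .. iters)
                cast(void) parseJSON(`{ "s" : "string", "i" : 42}`);
        }).start;
    foreach (w; workers)
        w.join;
    return sw.peek;
}
```

Comparing `parseMany(1, N)` against `parseMany(4, N)` on a multi-core box shows the shape of the slowdown without any file I/O in the picture.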
Jun 20 2023
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On Wednesday, 21 June 2023 at 00:35:42 UTC, Steven Schveighoffer 
wrote:
 The D GC has a single global lock to allocate memory -- even 
 memory that might be on a free list. So the threads are all 
 bottlenecked on waiting their turn for the lock.
This would be something that's important enough to list on the 
spec page for the GC: https://dlang.org/spec/garbage.html

It's only mentioned in passing here: 
https://dlang.org/articles/d-array-article.html#caching in the 
sentence "not to mention acquiring the global GC lock".

In theory the GC is replaceable, but I think we should document 
the behavior of the default one. I'll submit an issue for it.
Jun 21 2023
prev sibling parent Alexandre Bourquelot <alexandre.bourquelot ahrefs.com> writes:
On Wednesday, 21 June 2023 at 00:35:42 UTC, Steven Schveighoffer 
wrote:
 The issue, undoubtedly, is memory allocation. Your JSON parsers 
 (both std.json and vibe-d) allocate an AA for each object, and 
 parse the entire string into a DOM structure. The D GC has a 
 single global lock to allocate memory -- even memory that might 
 be on a free list. So the threads are all bottlenecked on 
 waiting their turn for the lock.
This makes a lot of sense. I ended up using asdf and it works 
great. Thank you everyone for your insight.
Jun 21 2023