www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Help optimize D solution to phone encoding problem: extremely slow

reply Renato <renato athaydes.com> writes:
I like to use a phone encoding problem to determine the 
strenghtness and weaknesses of programming languages because this 
problem is easy enough I can write solutions in any language in a 
few hours, but complex enough to exercise lots of interesting 
parts of the language.

You can check [my initial blog 
post](https://renato.athaydes.com/posts/revisiting-prechelt-paper-
omparing-languages) about this, which was an analysis or the original study by
Prechelt about the productivity differences between programmers using different
languages.

This original problem had a flaw when used in modern computers as 
the programs can find solutions so quickly that most of the time 
in the benchmarks was being used to actually print solutions to 
stdout, so I modified the problem so that there's an option to 
either print the solutions, or just count the number of solutions 
- so the programs still need to do all the work, but are not 
required to print anything other than how many solutions were 
found.

Anyway, I ported the Common Lisp solution to D because, like CL, 
D has built-in data structures like associative arrays and 
`BigInt` (at least it's in the stdlib)... I thought this would 
actually give D an edge! But it turns out D performs very badly 
for larger input sets. It starts quite well on smaller inputs, 
but scales very poorly.

My initial Rust solution also performed very poorly, so I'm 
afraid the same is happening with my initial D solution, despite 
my best effort to write something "decent".

You can find my D solution [in this Pull 
Request](https://github.com/renatoathaydes/prechelt-phone-number-encoding/pull/16).

The solution is almost identical, to the extent possible, to the 
Common Lisp solution, which can be [found 
here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/lisp/main.lisp).

The secret to high performance for the algorithm being used is 
having a very efficient `BigInt` implementation, and fast hashing 
function for the hash table (associative array in D, with 
`BigInt` as keys). Hence, I suppose D's hash function or `BigInt` 
are not that fast (Rust's default hash is also not great due to 
security concerns, but it's very easy to use a custom one which 
is much faster by changing a single line of code).

Anyway, here's the current relative performance (the other 
languages are pretty heavily optimised, the Java solution uses a 
different algorithm so it's not directly comparable, [the Rust 
solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/rust/phone_
ncoder/src/main.rs) uses approx. the same algorithm as used in CL and D, but
instead of `BigInt`, it uses a `Vec<u8>` as key as that turned out to be faster
- I may try that technique in D as well - but notice that even using Rust's
slower BigInt, it was orders of magnitude faster than D).

```
Benchmarking...
Proc,Run,Memory(bytes),Time(ms)
===> java -Xms20M -Xmx100M -cp build/java Main
java-Main,109895680,377
java-Main,179634176,1025
java-Main,167149568,1621
java-Main,180203520,2493
java-Main,96481280,6112
java-Main,95526912,7989
===> sbcl --script src/lisp/main.fasl
sbcl,31780864,74
sbcl,79437824,888
sbcl,79613952,3991
sbcl,80654336,7622
sbcl,80420864,18623
sbcl,83402752,29503
===> ./rust
./rust,23257088,58
./rust,23437312,260
./rust,23433216,1077
./rust,23416832,2017
./rust,7106560,6757
./rust,7110656,10165
===> src/d/dencoder
src/d/dencoder,38748160,223
src/d/dencoder,75800576,3154
src/d/dencoder,75788288,14905
src/d/dencoder,75751424,30271
```

I had to abort D as it was taking too long.

The above is with the `print` option (that's normally slow when 
stdout is not buffered, but I did buffer D's writes and it's 
still very slow).

With the `count` option, which does not print anything except a 
number at the end, D took so long I couldn't even collect its 
results (I waited several minutes)... the other languages 
finished in much less than a minute:

```
Benchmarking...
Proc,Run,Memory(bytes),Time(ms)
===> java -Xms20M -Xmx100M -cp build/java Main
java-Main,124112896,7883
java-Main,107487232,9273
===> sbcl --script src/lisp/main.fasl
sbcl,82669568,25638
sbcl,83759104,33501
===> ./rust
./rust,7061504,9488
./rust,7127040,11441
===> src/d/dencoder
^C
(ldc-1.36.0)
```

I tried to profile the D code but the profiler seems to be broken 
on my OS (Mac):

```
â–¶ dub build -b profile
     Starting Performing "profile" build using 
/Users/renato/dlang/ldc-1.36.0/bin/ldc2 for x86_64.
     Building prechelt ~master: building configuration 
[application]
      Linking dencoder
(ldc-1.36.0)
prechelt-phone-number-encoding/src/d  dlang ✗                     
                                                             9m â—’
â–¶ cd ../..
(ldc-1.36.0)
programming/projects/prechelt-phone-number-encoding  dlang ✗      
                                                             9m â—’
â–¶ src/d/dencoder
[1]    17877 illegal hardware instruction  src/d/dencoder
(ldc-1.36.0)
```

It's a bit hard to do any optmisation while "blind".

So, I decided to post it here to collect some hints about what to 
try and whether I hit some common performance bottlenecks in my 
current solution.

Would appreciate anything you can suggest, as long as you respect 
the fact that I'm still learning D so you can't expect me to know 
everything about it.
Jan 13
next sibling parent reply Renato <renato athaydes.com> writes:
On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:
 I tried to profile the D code but the profiler seems to be 
 broken on my OS (Mac):
I profiled it on Linux and stored [the trace.log file](https://gist.github.com/renatoathaydes/fd8752ed81b0cf792ed7ef5aa3d68acd) on a public Gist. I tried using the HTML viewer [recommended in the wiki](https://wiki.dlang.org/Development_tools) but that doesn't work... first try: ``` Running ../../../.dub/packages/d-profile-viewer/~master/d-profile-viewer/d-profile-viewer Corrupt trace.log (can't compute ticks per second), please re-profile and try again ``` Second try with a longer trace (the one I saved in the gist): ``` Running ../../../.dub/packages/d-profile-viewer/~master/d-profile-viewer/d-profile-viewer std.conv.ConvException /home/renato/dlang/ldc-1.36.0/bin/../impo t/std/conv.d(2533): Unexpected '-' when converting from type char[] to type ulong ---------------- ??:? [0x55c4be9bc99e] ??:? [0x55c4be9bc612] ??:? [0x55c4be9de81e] ??:? [0x55c4be9c4fbf] /home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:2533 [0x55c4be98ba2f] /home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:2002 [0x55c4be98b6fc] /home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:210 [0x55c4be9665ec] ../../../.dub/packages/d-profile-viewer/~master/d-profile-view r/source/app.d:1095 [0x55c4be965e21] ../../../.dub/packages/d-profile-viewer/~master/d-profile-view r/source/app.d:1138 [0x55c4be96698a] ??:? [0x55c4be9c4c9c] ``` Not a great profiling experience :). Anyone has a better suggestion to "parse" the trace file?
Jan 13
parent reply Anonymouse <zorael gmail.com> writes:
On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:
 [...]
 Not a great profiling experience :). Anyone has a better 
 suggestion to "parse" the trace file?
As a drive-by suggestion and I hope it doesn't derail anything, but if you have the opportunity to run it on linux, have you tried profiling with callgrind instead, with {Q,K}Cachegrind to visualise things? Your repositories probably have them. (callgrind is a part of valgrind.) The wiki only mentions callgrind in passing, but it has worked well for me. [(example)](https://i.imgur.com/WWZAwy3.png)
Jan 13
parent reply Renato <renato athaydes.com> writes:
On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:
 On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:
 [...]
 Not a great profiling experience :). Anyone has a better 
 suggestion to "parse" the trace file?
As a drive-by suggestion and I hope it doesn't derail anything, but if you have the opportunity to run it on linux, have you tried profiling with callgrind instead, with {Q,K}Cachegrind to visualise things? Your repositories probably have them. (callgrind is a part of valgrind.) The wiki only mentions callgrind in passing, but it has worked well for me. [(example)](https://i.imgur.com/WWZAwy3.png)
Thanks for the suggestion, this looks promising as I do have a Linux laptop (just not my main one). I will have to try it... I thought that `BigInt` was to blame for the slowness (from what I could read from the trace logs), but after replacing that with basically a byte array key (see [commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649 e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp and very, very far from Java and Rust.
Jan 13
parent reply Sergey <kornburn yandex.ru> writes:
On Saturday, 13 January 2024 at 19:35:57 UTC, Renato wrote:
 On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:
 On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:
 [...]
I will have to try it... I thought that `BigInt` was to blame for the slowness (from what I could read from the trace logs), but after replacing that with basically a byte array key (see [commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649 e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp and very, very far from Java and Rust.
In the repo is hard to find the proper version. I've checked the Rust from master branch and it looks a bit different from D implementation.. I would suggest to rewrite in the same way as Rust implemented. Probably you would like to try: * do not use BigInt from std. It could be quite slow. Try to use GMP library from Dub instead * don't do "idup" every time * instead of byLine, try byLineCopy * instead of "arr ~= data" try to use Appender (https://dlang.org/library/std/array/appender.html) * also you could try to use splitter (https://dlang.org/library/std/algorithm/iteration/splitter.html) to lazily process each part of the data * isLastDigit function has many checks, but I think it could be implemented easier in a Rust way * also consider to use functions from Range (filter, map) as you use it in Rust, instead of using for loops
Jan 13
next sibling parent Anonymouse <zorael gmail.com> writes:
On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:
 I would suggest to rewrite in the same way as Rust implemented.
 Probably you would like to try:
 [...]
I would strongly argue for profiling first instead of optimising based on conjecture. If you profile you have solid evidence on what is actually slow. If you're very good at analysing D, well-educated hypotheses *may* be enough, until they suddenly aren't and you will have spent a lot of time on the wrong problem.
Jan 14
prev sibling parent Renato <renato athaydes.com> writes:
On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:
 On Saturday, 13 January 2024 at 19:35:57 UTC, Renato wrote:
 On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:
 On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:
 [...]
I will have to try it... I thought that `BigInt` was to blame for the slowness (from what I could read from the trace logs), but after replacing that with basically a byte array key (see [commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649 e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp and very, very far from Java and Rust.
In the repo is hard to find the proper version. I've checked the Rust from master branch and it looks a bit different from D implementation..
I explicitly linked to the Rust implementation in my question:
  the [Rust 
 solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/rust/phone_
ncoder/src/main.rs) uses approx. the same algorithm as used in CL and D, but
instead of BigInt, it uses a Vec<u8> as key
Can you be more specific about which part of the Rust implementation is not roughly equivalent to the D implementation?? There are obvious differences in style and syntax, but I hope that it's mostly equivalent algorithm-wise. But to clarify: the branch where the fastest solutions are is called `fastest-implementations-print-or-count`. The D branches with my alternative implementations are all called `dlang-*`.
 I would suggest to rewrite in the same way as Rust implemented.
 Probably you would like to try:
 * do not use BigInt from std. It could be quite slow. Try to 
 use GMP library from Dub instead
Right, but as I posted above, even using a byte-array just like in Rust, the performance was still very bad (but around 2x faster than using BigInt). Also, using GMP is always suggested to me, but I can't accept that because my goal is not to use C libraries to achieve the fastest speed (which I could do by using C or FFI in any other language), it's to use D (and the other languages) to solve my problem in an idiomatic way. I [ended up using `Int128`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/98dcbcf1c77d1ded3406c 03af9026e126df5b2d) (the problem description requires more than i64, but int128 is enough), which grealy improved performance over my D baseline AND on the byte-array solution.
 * don't do "idup" every time
 * instead of byLine, try byLineCopy
`idup` is necessary because the strings are stored in a Map after the iteration ends. Anyway, I replaced that with `byLineCopy` and it became sligthtly slower.
 * instead of "arr ~= data" try to use Appender 
 (https://dlang.org/library/std/array/appender.html)
I was worried about `~=` allocating too much, though IIUC it shouldn't be a problem the way I used it... in any case, I [replaced it with `Appender`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/f1364d0897f1f37882f1d 9b92c16b84b1abdc31) and the performance was completely unchanged - which is good as I think it means I used `~=` correctly to avoid overallocation (or I messed up using `Appender` :D - pls have a look).
 * also you could try to use splitter 
 (https://dlang.org/library/std/algorithm/iteration/splitter.html) to lazily
process each part of the data
This is not necessary because that would only affect the time spent loading the dictionary, which is the faster part of the problem... nearly all of the time is spent looking for solutions instead.
 * isLastDigit function has many checks, but I think it could be 
 implemented easier in a Rust way
The Rust solution uses sum types for that, but that had a very minor effect on the overall performance (though in Rust, that's "cleaner" to use anyway)... I know D does have SumType in the stdlib, but I thought it is not as "zero cost" as in Rust, and because both variants wrap a String, it's awkward to use that... so I instead tried using a struct like this: ```d struct WordOrDigit { string value; bool isDigit; } ``` You can check [the commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0f6b4ce83373c46b14b5bb40c53bb0e2d0905e66). This change made the code slightly slower.
 * also consider to use functions from Range (filter, map) as 
 you use it in Rust, instead of using for loops
Why would for loops be slower? Or you're just saying I should make the D code nicer? Anyway, thanks for the suggestions! On Sunday, 14 January 2024 at 08:33:24 UTC, Anonymouse wrote:
 On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:
 I would suggest to rewrite in the same way as Rust implemented.
 Probably you would like to try:
 [...]
I would strongly argue for profiling first instead of optimising based on conjecture. If you profile you have solid evidence on what is actually slow. If you're very good at analysing D, well-educated hypotheses *may* be enough, until they suddenly aren't and you will have spent a lot of time on the wrong problem.
Totally agree. I will spend some time on my Linux machine to see if I can get more useful profiling data. For now, with the int128 change from BigInt, the code is about 4x faster than before, but on larger problems, it's still scaling very badly compared to other languages (this suggests there's still some "exponential growth" going on, which should not happen as the problem is not exponential). ``` Proc,Memory(bytes),Time(ms) ===> ./rust ./rust,23252992,59 ./rust,23420928,263 ./rust,23412736,1025 ./rust,7069696,8661 ===> src/d/dencoder src/d/dencoder,11268096,72 src/d/dencoder,23781376,938 src/d/dencoder,23818240,4443 src/d/dencoder,10723328,134703 ``` On the bright side: D is using almost as little memory as Rust, which is orders of magnitude better than the other, usual GC languages.
Jan 14
prev sibling parent reply Jordan Wilson <wilsonjord gmail.com> writes:
On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:
 I like to use a phone encoding problem to determine the 
 strenghtness and weaknesses of programming languages because 
 this problem is easy enough I can write solutions in any 
 language in a few hours, but complex enough to exercise lots of 
 interesting parts of the language.

 [...]
Hello Renato, This seems to be quite a lot of calls: ``` ======== Timer frequency unknown, Times are in Megaticks ======== Num Tree Func Per Calls Time Time Call 19204964 3761 3756 0 pure nothrow ref trusted immutable(char)[][] core.internal.array.appending._d_arrayappendcTX!(immutable(char)[][], immutable(char)[])._d_arrayappendcTX(scope return ref immutable(char)[][], ulong) 19204924 8957 3474 0 safe void dencoder.printTranslations(immutable(char)[][][dencoder.Key], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], immutable(char)[][]) ``` This is when using the `words-quarter.txt` input (the `dictionary.txt` input seems to finish much faster, although still slower than `java`/`rust`). I also used only 100 phone numbers as input. My final observation is that `words-quarter.txt` contains some 1-letter inputs, (for example, `i` or `m`)...this may result in a large number of encoding permutations, which may explain the high number of recursion calls? Jordan
Jan 14
parent reply Renato <renato athaydes.com> writes:
On Sunday, 14 January 2024 at 10:02:58 UTC, Jordan Wilson wrote:
 On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:
 I like to use a phone encoding problem to determine the 
 strenghtness and weaknesses of programming languages because 
 this problem is easy enough I can write solutions in any 
 language in a few hours, but complex enough to exercise lots 
 of interesting parts of the language.

 [...]
Hello Renato, This seems to be quite a lot of calls: ``` ======== Timer frequency unknown, Times are in Megaticks ======== Num Tree Func Per Calls Time Time Call 19204964 3761 3756 0 pure nothrow ref trusted immutable(char)[][] core.internal.array.appending._d_arrayappendcTX!(immutable(char)[][], immutable(char)[])._d_arrayappendcTX(scope return ref immutable(char)[][], ulong) 19204924 8957 3474 0 safe void dencoder.printTranslations(immutable(char)[][][dencoder.Key], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], immutable(char)[][]) ``` This is when using the `words-quarter.txt` input (the `dictionary.txt` input seems to finish much faster, although still slower than `java`/`rust`). I also used only 100 phone numbers as input. My final observation is that `words-quarter.txt` contains some 1-letter inputs, (for example, `i` or `m`)...this may result in a large number of encoding permutations, which may explain the high number of recursion calls? Jordan
The characteristics of the dictionary impact the number of solutions greatly. I explored this in my blog post about [Common Lisp, Part 2](https://renato.athaydes.com/posts/revenge_of_lisp-part-2). The fact there's a ridiculous amount of function calls is why this problem can take minutes even without having to allocate much memory or print anything. ** I've managed to improve the D code enough that it is now faster than Common Lisp and the equivalent algorithm in Java.** It took some profiling to do that, though... thanks to Anonymouse for the suggestion to use Valgrind... with that, I was able to profile the code nicely (Valgrind works nicely with D, it doesn't even show mangled names!). Here's what I did. First: the solution using int128 was much faster than the BigInt solution, as I had already mentioned. But when profiling that, it was clear that for the problems with a very large number of solutions, GC became a problem: ``` -------------------------------------------------------------------------------- 23,044,096,944 (100.0%) PROGRAM TOTALS -------------------------------------------------------------------------------- Ir file:function -------------------------------------------------------------------------------- 7,079,878,197 (30.72%) ???:core.internal.gc.impl.conservative.gc.Gcx.mark!(false, true, true).mark(core.internal.gc.impl.conservative.gc.Gcx.ScanRang !(false).ScanRange) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 2,375,100,857 (10.31%) ???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], immutable(char)[][])'2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 1,971,210,820 ( 8.55%) ???:_aaInX [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 1,922,961,924 ( 8.34%) ???:_d_arraysetlengthT [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 1,298,747,622 ( 5.64%) ???:core.int128.mul(core.int128.Cent, core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 1,134,644,706 ( 4.92%) ???:core.internal.gc.bits.GCBits.setLocked(ulong) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 849,587,834 ( 3.69%) ???:core.internal.gc.impl.conservative.gc.Gcx.smallAlloc(ulong, ref ulong, uint, const(TypeInfo)) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 827,407,988 ( 3.59%) ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcp _avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6] 688,845,027 ( 2.99%) ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset avx2_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6] 575,415,884 ( 2.50%) ???:_DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZ QDd6memory8BlkInfo_ [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 562,146,592 ( 2.44%) ???:core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), core.internal.gc.impl.conservative.gc.mallocTime, core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo)) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 526,067,586 ( 2.28%) ???:core.internal.spinlock.SpinLock.lock() shared [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] ``` So, I [decided to use `std.container.Array`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/b927bc9cc4e33f9f6ba45 d8abc3bccbc17c7f1e) (I had tried Appender before but that didn't really work) to make sure no allocation was happening when adding/removing words from the `words` which are carried through the `printTranslations` calls (that's the hot path). As you can see in the commit, the code is just slightly less convenient... this made the code run considerably faster for the very long runs! Here's the Callgrind profiling data AFTER this change: ``` 6,547,316,944 (32.42%) ???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], std.container.array.Array!(immutable(char)[]).Array)'2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 5,229,076,596 (25.90%) ???:_aaInX [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 3,871,644,402 (19.17%) ???:core.int128.mul(core.int128.Cent, core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 677,533,800 ( 3.36%) ???:std.int128.Int128.toHash() const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 543,388,688 ( 2.69%) ???:std.int128.Int128.this(core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 543,388,688 ( 2.69%) ???:std.int128.Int128.this(long, long) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 542,027,045 ( 2.68%) ???:object.TypeInfo_Struct.getHash(scope const(void*)) const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 424,849,937 ( 2.10%) ???:object.TypeInfo_Struct.equals(in void, in void) const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] ``` No more GC! It seems that now, as expected, most time is spent calculating hashes for each possible solution (as it should be). I am not entirely sure what the `_aaInX` call is, but I believe that's the associative array's membership check? The D profiler seems to show similar data: ``` ======== Timer frequency unknown, Times are in Megaticks ======== Num Tree Func Per Calls Time Time Call 104001600 158 158 0 const pure nothrow nogc safe std.int128.Int128 std.int128.Int128.opBinary!("*").opBinary(std.int128.Int128) 402933936 63 63 0 const pure nothrow property nogc safe bool std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.RefCountedStore.isInitialized() 183744166 87 59 0 inout pure nothrow ref property nogc return safe inout(std.container.array.Array!(immutable(char)[]).Array.Payload) std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.refCountedPayload() 103784880 68 38 0 const pure nothrow nogc safe std.int128.Int128 std.int128.Int128.opBinary!("+", int).opBinary(const(int)) 103784880 99 31 0 pure nothrow ref nogc safe std.int128.Int128 std.int128.Int128.opOpAssign!("+", int).opOpAssign(int) 104001600 30 30 0 const pure nothrow nogc safe std.int128.Int128 std.int128.Int128.opBinary!("+").opBinary(std.int128.Int128) 61167377 67 29 0 pure nothrow nogc void std.container.array.Array!(immutable(char)[]).Array.__fieldDtor() 43444575 61 27 0 pure nothrow nogc safe void core.internal.lifetime.emplaceRef!(immutable(char)[], immutable(char)[], immutable(char)[]).emplaceRef(ref immutable(char)[], ref immutable(char)[]) 48427530 79 27 0 const pure nothrow property nogc safe bool std.container.array.Array!(immutable(char)[]).Array.empty() 61167377 38 27 0 pure nothrow nogc void std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.__dtor() 43444575 130 24 0 pure nothrow nogc ulong std.container.array.Array!(immutable(char)[]).Array.Payload.insertBack!(immutable(char)[]).insertBack(immutable(char)[]) 61167376 32 23 0 pure nothrow nogc safe void std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.__postblit() ``` Anyway, after this change, the code now scaled much better. But there's nothing much lest to be optimised... except the arithmetics (notice how the `("*").opBinary` call is near the top. I know how to make that faster using a similar strategy that I used in Rust (you can check my [journey to optimise the Rust code on my blog post](https://renato.athaydes.com/posts/how-to-write-fast-rust-code) about that - specifically, check the `Using packed bytes for more efficient storage` section)... knowing how the encoding works, I knew that multiplication is not really needed. So, instead of doing this: ```d n *= 10; n += c - '0'; ``` I could do this: ```d n <<= 4; n += c; ``` This changes the actual number, but that doesn't matter: the code is still unique for each number, which is all that we need. You can see [the full commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c). This improved the performance for sure, even if not by much. The profiling data after this arithmetic trick looks like this: Valgrind: ``` 4,821,217,799 (35.75%) ???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], std.container.array.Array!(immutable(char)[]).Array)'2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 4,241,404,850 (31.45%) ???:_aaInX [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 1,038,733,782 ( 7.70%) ???:core.int128.shl(core.int128.Cent, uint) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 575,372,270 ( 4.27%) ???:std.int128.Int128.toHash() const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 461,659,456 ( 3.42%) ???:std.int128.Int128.this(core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 460,297,816 ( 3.41%) ???:object.TypeInfo_Struct.getHash(scope const(void*)) const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 347,198,989 ( 2.57%) ???:object.TypeInfo_Struct.equals(in void, in void) const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 288,537,160 ( 2.14%) ???:core.int128.add(core.int128.Cent, core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] ``` D Profiler: ``` ======== Timer frequency unknown, Times are in Megaticks ======== Num Tree Func Per Calls Time Time Call 29879388 44037 11267 0 void dencoder.printTranslations(immutable(char)[][][std.int128.Int128], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], std.container.array.Array!(immutable(char)[]).Array) 126827950 4306 3599 0 inout pure nothrow ref property nogc return safe inout(std.container.array.Array!(immutable(char)[]).Array.Payload) std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.refCountedPayload() 29879268 7293 3100 0 pure nothrow nogc ulong std.container.array.Array!(immutable(char)[]).Array.Payload.insertBack!(immutable(char)[]).insertBack(immutable(char)[]) 33534720 4896 2780 0 const pure nothrow property nogc safe bool std.container.array.Array!(immutable(char)[]).Array.empty() 29879268 11834 2479 0 pure nothrow nogc ulong std.container.array.Array!(immutable(char)[]).Array.insertBack!(immutable(char)[]).insertBack(immutable(char)[]) 29879268 8702 2279 0 pure nothrow nogc safe void std.container.array.Array!(immutable(char)[]).Array.removeBack() 282919518 2045 2045 0 const pure nothrow property nogc safe bool std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.RefCountedStore.isInitialized() 29879268 2534 1859 0 pure nothrow nogc safe void core.internal.lifetime.emplaceRef!(immutable(char)[], immutable(char)[], immutable(char)[]).emplaceRef(ref immutable(char)[], ref immutable(char)[]) 44511077 3264 1505 0 pure nothrow nogc void std.container.array.Array!(immutable(char)[]).Array.__fieldDtor() 44511076 2588 1254 0 pure nothrow nogc scope void std.container.array.Array!(immutable(char)[]).Array.__fieldPostblit() 49758574 1684 1243 0 const pure nothrow nogc safe std.int128.Int128 std.int128.Int128.opBinary!("+", char).opBinary(const(char)) 49758574 2907 1223 0 pure nothrow ref nogc safe std.int128.Int128 std.int128.Int128.opOpAssign!("+", immutable(char)).opOpAssign(ref immutable(char)) 49975294 1796 1208 0 pure nothrow ref nogc safe std.int128.Int128 std.int128.Int128.opOpAssign!("<<", int).opOpAssign(int) ``` As far as I can tell, the only two bottlenecks now bacame: * `std.typecons.RefCounted!(Array.Payload, 0).RefCounted.refCountedPayload()` * the `Int128` implementation! To fix the former, I would need to implement my own `Array`, I suppose... using ref-counting here seems unnecessary?? But that is a bit beyond what I am willing to do! The latter could probably be fixed by not using a numeric key at all, just bytes - but my previous attempt at doing that didn't really give better results. Or perhaps by overriding the hashing for Int128 (I didn't try that because I assume the default should be pretty optimal already)?? So, I ran out of options and am going to call it done. [Here's my final D solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/src/d/src/dencoder.d). The code is still readable (I think everyone can agree D is one of the most readable languages of all)! You can see how it performs in the last comparison run I did ([all data, including a nice chart, in this gist](https://gist.github.com/renatoathaydes/ab5c86b0ea59152693a7236c333ac334)): ``` Proc,Run,Memory(bytes),Time(ms) ===> java -Xms20M -Xmx100M -cp build/java Main java-Main,3190730752,441 java-Main,3194789888,1440 java-Main,3193692160,2316 java-Main,3194720256,3604 java-Main,3192963072,12861 java-Main,3261128704,31282 ===> sbcl --script src/lisp/main.fasl sbcl,1275994112,83 sbcl,1275998208,1396 sbcl,1275998208,5925 sbcl,1275998208,9752 sbcl,1275998208,64811 sbcl,1276006400,153799 ===> ./rust ./rust,23138304,56 ./rust,23138304,234 ./rust,23138304,1288 ./rust,23138304,2444 ./rust,9027584,15867 ./rust,9027584,36985 ===> src/d/dencoder src/d/dencoder,219041792,67 src/d/dencoder,229904384,1114 src/d/dencoder,229904384,4421 src/d/dencoder,229904384,9087 src/d/dencoder,219041792,51315 src/d/dencoder,219041792,120818 ``` Using `Int128` as a key instead of `BigInt` made the code much faster (around 4x faster). Using `Array` to avoid too many reallocations with primitive dynamic arrays made the code run much faster for very large numbers of calls (very little difference when running for less than 10s, but at least 2x faster at round 1min runs where the tiny overhead adds up). As you can see [in the chart I posted](https://gist.github.com/renatoathaydes/ab5c86b0ea5915 693a7236c333ac334), D's speed performance is close to that of Common Lisp for all runs, though between 10 and 20% faster. It's nowhere near Rust, unfortunately, with Rust being almost 4x faster (pls ignore the Java solution as that's using a Trie instead of the "numeric" solution - so it's not a fair comparison). I had expected to get very close to Rust, but that didn't happen... I just can't see in the profiling data what's causing D to fall so far behind! On the memory data: D uses much less memory than Java and Common Lisp, but still a lot higher than Rust. If anyone can find any flaw in my methodology or optmise my code so that it can still get a couple of times faster, approaching Rust's performance, I would greatly appreciate that! But for now, my understanding is that the most promising way to get there would be to write D in `betterC` style?!
Jan 14
parent reply Sergey <kornburn yandex.ru> writes:
On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:
 If anyone can find any flaw in my methodology or optmise my 
 code so that it can still get a couple of times faster, 
 approaching Rust's performance, I would greatly appreciate 
 that! But for now, my understanding is that the most promising 
 way to get there would be to write D in `betterC` style?!
I've added port from Rust in the PR comment. Can you please check this solution? Most probably it need to be optimized with profiler. Just interesting how close-enough port will work.
Jan 14
parent reply Renato <renato athaydes.com> writes:
On Monday, 15 January 2024 at 01:10:14 UTC, Sergey wrote:
 On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:
 If anyone can find any flaw in my methodology or optmise my 
 code so that it can still get a couple of times faster, 
 approaching Rust's performance, I would greatly appreciate 
 that! But for now, my understanding is that the most promising 
 way to get there would be to write D in `betterC` style?!
I've added port from Rust in the PR comment. Can you please check this solution? Most probably it need to be optimized with profiler. Just interesting how close-enough port will work.
As discussed on GitHub, the line-by-line port of the Rust code is 5x slower than [my latest solution using int128](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/sr /d/src/dencoder.d), which is itself 3 to 4x slower than the Rust implementation (at around the same order of magnitude as algorithm-equivalent Java and Common Lisp implementations, D is perhaps 15% faster). I did the best I could to make D run faster, but we hit a limit that's a bit hard to get past now. Happy to be given suggestions (see profiling information in previous messages), but I've run out of ideas myself.
Jan 15
next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Jan 15, 2024 at 08:10:55PM +0000, Renato via Digitalmars-d-learn wrote:
 On Monday, 15 January 2024 at 01:10:14 UTC, Sergey wrote:
 On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:
 If anyone can find any flaw in my methodology or optmise my code so
 that it can still get a couple of times faster, approaching Rust's
 performance, I would greatly appreciate that! But for now, my
 understanding is that the most promising way to get there would be
 to write D in `betterC` style?!
I've added port from Rust in the PR comment. Can you please check this solution? Most probably it need to be optimized with profiler. Just interesting how close-enough port will work.
As discussed on GitHub, the line-by-line port of the Rust code is 5x slower than [my latest solution using int128](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/src/d/src/dencoder.d), which is itself 3 to 4x slower than the Rust implementation (at around the same order of magnitude as algorithm-equivalent Java and Common Lisp implementations, D is perhaps 15% faster). I did the best I could to make D run faster, but we hit a limit that's a bit hard to get past now. Happy to be given suggestions (see profiling information in previous messages), but I've run out of ideas myself.
This problem piqued my interest, so yesterday and today I worked on it and came up with my own solution (I did not look at existing solutions in order to prevent bias). I have not profiled it or anything, but the runtime seems quite promising. Here it is: ---------------------------------snip---------------------------------- /** * Encoding phone numbers according to a dictionary. */ import std; /** * Table of digit mappings. */ static immutable ubyte[dchar] digitOf; shared static this() { digitOf = [ 'E': 0, 'J': 1, 'N': 1, 'Q': 1, 'R': 2, 'W': 2, 'X': 2, 'D': 3, 'S': 3, 'Y': 3, 'F': 4, 'T': 4, 'A': 5, 'M': 5, 'C': 6, 'I': 6, 'V': 6, 'B': 7, 'K': 7, 'U': 7, 'L': 8, 'O': 8, 'P': 8, 'G': 9, 'H': 9, 'Z': 9, ]; } /** * Trie for storing dictionary words according to the phone number mapping. */ class Trie { Trie[10] edges; string[] words; private void insert(string word, string suffix) { const(ubyte)* dig; while (!suffix.empty && (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null) { suffix = suffix[1 .. $]; } if (suffix.empty) { words ~= word; return; } auto node = new Trie; auto idx = *dig; if (edges[idx] is null) { edges[idx] = new Trie; } edges[idx].insert(word, suffix[1 .. $]); } /** * Insert a word into the Trie. * * Characters that don't map to any digit are ignored in building the Trie. * However, the original form of the word will be retained as-is in the * leaf node. */ void insert(string word) { insert(word, word[]); } /** * Iterate over all words stored in this Trie. */ void foreachEntry(void delegate(string path, string word) cb) { void impl(Trie node, string path = "") { if (node is null) return; foreach (word; node.words) { cb(path, word); } foreach (i, child; node.edges) { impl(child, path ~ cast(char)('0' + i)); } } impl(this); } } /** * Loads the given dictionary into a Trie. */ Trie loadDictionary(R)(R lines) if (isInputRange!R & is(ElementType!R : const(char)[])) { Trie result = new Trie; foreach (line; lines) { result.insert(line.idup); } return result; } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto app = appender!(string[]); dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); }); assert(app.data == [ "10: je", "105513: jemand", "107: neu", "1550: Name", "253302: Wasser", "35: da", "38: so", "400: Fee", "4021: fern", "4034: Fest", "482: Tor", "4824: fort", "4824: Torf", "51: an", "562: mir", "562: Mix", "56202: Mixer", "78: Bo\"", "783: bo\"s", "7857: blau", "7884: Boot", "824: Ort", "83: o\"d" ]); } /** * Find all encodings of the given phoneNumber according to the given * dictionary, and write each encoding to the given sink. */ void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink) if (isOutputRange!(W, string)) { bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit) { if (node is null) return false; // Ignore non-digit characters in phone number while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9')) suffix = suffix[1 .. $]; if (suffix.empty) { // Found a match, print result foreach (word; node.words) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(only(word)))); } return !node.words.empty; } bool ret; foreach (word; node.words) { // Found a matching word, try to match the rest of the phone // number. if (impl(dict, suffix, path ~ word, true)) { allowDigit = false; ret = true; } } if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false)) { allowDigit = false; ret = true; } if (allowDigit) { // If we got here, it means that if we take the current node as the // next word choice, then the following digit will have no further // matches, and we may encode it as a single digit. auto nextSuffix = suffix[1 .. $]; if (nextSuffix.empty) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(suffix[0 .. 1].only))); ret = true; } else { if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false)) ret = true; } } return ret; } impl(dict, phoneNumber[], [], true); } /** * Encode the given input range of phone numbers according to the given * dictionary, writing the output to the given sink. */ void encodePhoneNumbers(R,W)(R input, Trie dict, W sink) if (isInputRange!R & is(ElementType!R : const(char)[]) && isOutputRange!(W, string)) { foreach (line; input) { findMatches(dict, line, sink); } } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto input = [ "112", "5624-82", "4824", "0721/608-4067", "10/783--5", "1078-913-5", "381482", "04824", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); //writefln("%-(%s\n%)", app.data); assert(app.data.sort.release == [ "04824: 0 Tor 4", "04824: 0 Torf", "04824: 0 fort", "10/783--5: je Bo\" da", "10/783--5: je bo\"s 5", "10/783--5: neu o\"d 5", "381482: so 1 Tor", "4824: Tor 4", "4824: Torf", "4824: fort", "5624-82: Mix Tor", "5624-82: mir Tor", ]); } /** * Program entry point. */ int main(string[] args) { File input = stdin; bool countOnly; auto info = getopt(args, "c|count", "Count solutions only", &countOnly, ); int showHelp() { stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]", args[0]); defaultGetoptPrinter("", info.options); return 1; } if (info.helpWanted || args.length < 2) return showHelp(); auto dictfile = args[1]; if (args.length > 2) input = File(args[2]); Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(input.byLine, dict, (string match) { count++; }); writefln("Number of solutions: %d", count); } else { encodePhoneNumbers(input.byLine, dict, (string match) { writeln(match); }); } return 0; } ---------------------------------snip---------------------------------- The underlying idea is to store the dictionary entries in a trie (radix tree) keyed by the prescribed digit mapping. Once this is done, encoding a phone number is as simple as following each digit down the trie and enumerating words along the path. This means that once the dictionary is encoded, we no longer need to do any digit mappings; the digits of the input phone number is enough to determine the path down the trie. Of course, pursuant to the original specifications of the problem, the original word forms are stored as-is along each node corresponding with the end of the word, so once a match is found the program simply enumerates the words found at that node. // Unfortunately there seems to be some discrepancy between the output I got and the prescribed output in your repository. For example, in your output the number 1556/0 does not have an encoding, but isn't "1 Mai 0" a valid encoding according to your dictionary and the original problem description? T -- Engineering almost by definition should break guidelines. -- Ali Çehreli
Jan 16
parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Tuesday, 16 January 2024 at 15:50:35 UTC, H. S. Teoh wrote:
 Unfortunately there seems to be some discrepancy between the 
 output I got and the prescribed output in your repository. For 
 example, in your output the number 1556/0 does not have an 
 encoding, but isn't "1 Mai 0" a valid encoding according to 
 your dictionary and the original problem description?
You are not allowed to emit "1" as the first token in the output as long as there are any dictionary word matches at that position. The relevant paragraph from the problem statement: ==snip== Encodings of phone numbers can consist of a single word or of multiple words separated by spaces. The encodings are built word by word from left to right. If and only if at a particular point no word at all from the dictionary can be inserted, a single digit from the phone number can be copied to the encoding instead. Two subsequent digits are never allowed, though. To put it differently: In a partial encoding that currently covers k digits, digit k+1 is encoded by itself if and only if, first, digit k was not encoded by a digit and, second, there is no word in the dictionary that can be used in the encoding starting at digit k+1. ==snip== I also spent a bit of time trying to figure out this nuance when implementing my solution. It doesn't make much sense visually (no back-to-back digits in the output either way), but that's how it is.
Jan 16
parent reply Renato <renato athaydes.com> writes:
On Tuesday, 16 January 2024 at 16:56:04 UTC, Siarhei Siamashka 
wrote:
 On Tuesday, 16 January 2024 at 15:50:35 UTC, H. S. Teoh wrote:
 Unfortunately there seems to be some discrepancy between the 
 output I got and the prescribed output in your repository. For 
 example, in your output the number 1556/0 does not have an 
 encoding, but isn't "1 Mai 0" a valid encoding according to 
 your dictionary and the original problem description?
You are not allowed to emit "1" as the first token in the output as long as there are any dictionary word matches at that position. The relevant paragraph from the problem statement: ==snip== Encodings of phone numbers can consist of a single word or of multiple words separated by spaces. The encodings are built word by word from left to right. If and only if at a particular point no word at all from the dictionary can be inserted, a single digit from the phone number can be copied to the encoding instead. Two subsequent digits are never allowed, though. To put it differently: In a partial encoding that currently covers k digits, digit k+1 is encoded by itself if and only if, first, digit k was not encoded by a digit and, second, there is no word in the dictionary that can be used in the encoding starting at digit k+1. ==snip== I also spent a bit of time trying to figure out this nuance when implementing my solution. It doesn't make much sense visually (no back-to-back digits in the output either way), but that's how it is.
Exactly, this is one of the things that make this problem a bit annoying to solve :) "H. S. Teoh" you implemented the solution as a Trie!! Nice, that's also what I did when I "participated" in the study. Here's [my Trie solution in Java](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/java/Main.java). These are basically the two common approaches to the problem: a Trie or a numeric-based table. According to the study, people who use scripting languages almost always go with the numeric approach, while people coming from lower level languages tend to use a data structure like Trie (if they don't know Trie, they come up with something similar which is fascinating), which is harder to implement but more efficient in general. Can I ask you why didn't you use the [D stdlib Trie](https://dlang.org/phobos/std_uni.html#codepointTrie)? Not sure that would've worked, but did you consider that?
Jan 16
next sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jan 16, 2024 at 06:54:56PM +0000, Renato via Digitalmars-d-learn wrote:
 On Tuesday, 16 January 2024 at 16:56:04 UTC, Siarhei Siamashka wrote:
[...]
 You are not allowed to emit "1" as the first token in the output as
 long as there are any dictionary word matches at that position. The
 relevant paragraph from the problem statement:
Ohhh now I get it. Initially I misunderstood that as saying that if the rest of the phone number has at least one match, then a digit is not allowed. Now I see that what it's actually saying is that even if some random dictionary word matches at that position, even if it does not lead to any full matches, then a digit is excluded. [...]
 I also spent a bit of time trying to figure out this nuance when
 implementing my solution. It doesn't make much sense visually (no
 back-to-back digits in the output either way), but that's how it is.
Exactly, this is one of the things that make this problem a bit annoying to solve :)
It's a strange requirement, for sure, but I don't think it's annoying. It makes the problem more Interesting(tm). ;-) Anyway, I've fixed the problem, now my program produces the exact same output as Renato's repo. Code is posted below. Interestingly enough, the running time has now halved to about 0.9 seconds for 1 million phone numbers. I guess that's caused by the more stringent requirement excluding many more matching possibilities, effectively pruning away large parts of the search tree.
  "H. S. Teoh" you implemented the solution as a Trie!! Nice, that's
 also what I did when I "participated" in the study. Here's [my Trie
 solution in
 Java](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/java/Main.java).
 
 These are basically the two common approaches to the problem: a Trie
 or a numeric-based table. According to the study, people who use
 scripting languages almost always go with the numeric approach, while
 people coming from lower level languages tend to use a data structure
 like Trie (if they don't know Trie, they come up with something
 similar which is fascinating), which is harder to implement but more
 efficient in general.
Interesting. I guess my C/C++ background is showing. ;-) I'm not sure what exactly motivated me to go this route; I guess it was just my default preference of choosing the path of least work as far as the algorithm is concerned: I chose the algorithm that did the least amount of work needed to produce the right answer. Scanning through sections of the dictionary to find a match was therefore excluded; so my first thought was an AA. But then how much of the initial prefix to look up an in AA? Since it can't be known beforehand, I'd have to gradually lengthen the prefix to search for, which does a lot of repetitive work (we keep looking up the first few digits repeatedly each time we search for a longer prefix). Plus, multiple consecutive AA lookups is not cache-friendly. So my next thought was, what should I do such that I don't have to look at the initial digits anymore once I already processed it? This line of thought naturally led to a trie structure. Once I arrived at a trie structure, the next question was how exactly dictionary entries would be stored in it. Again, in the vein of doing the least amount of work I could get away with, I thought, if I stored words in the trie directly, with each edge encoding a letter, then during the search I'd have to repeatedly convert letters to the corresponding phone number digit and vice versa. So why not do this conversion beforehand, and store only phone digits in the trie? This also had the additional benefit of letting me effectively search multiple letters simultaneously, since multiple letters map to the same digit, so scanning a digit is equivalent to searching multiple letters at the same time. The output, of course, required the original form of the words -- so the obvious solution was to attach the original words as a list of words attached to the trie node representing the end of that word. Once this was all decided, the only remaining question was the search algorithm. This turned out to take the most time in solving this problem, due to the recursive nature of the search, I had to grapple with where and how to make the recursive calls, and how to propagate return values correctly. The initial implementation only found word matches, and did not allow the single digits. Happily, the recursive algorithm turned out to have enough structure to encode the single digit requirements as well, although it took a bit of trial and error to find the correct implementation.
 Can I ask you why didn't you use the [D stdlib
 Trie](https://dlang.org/phobos/std_uni.html#codepointTrie)? Not sure
 that would've worked, but did you consider that?
Haha, I didn't even think of that. :-D I wouldn't have wanted to use it anyway, because it was optimized for single codepoint lookups, and wouldn't have matched what I wanted to do. Besides, I also understood it as an internal helper structure used by std.uni, so it isn't really designed as a trie implementation for general use. Also, since performance was an issue in your OP, I wanted to avoid off-the-bat any hidden costs that may be present in a pre-existing trie implementation that I did not fully understand. Maybe I'm just affected with NIH syndrome, but IME, when performance is important you usually want to write a custom data structure for your inner loop, one that's fine-tuned to your specific algorithm. Otherwise the impedance mismatch may give you suboptimal results. T -- Knowledge is that area of ignorance that we arrange and classify. -- Ambrose Bierce
Jan 16
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via Digitalmars-d-learn
wrote:
[...]
 Anyway, I've fixed the problem, now my program produces the exact same
 output as Renato's repo. Code is posted below.
[...] Oops, forgot to actually paste the code. Here it is: ------------------------------------snip------------------------------------ /** * Encoding phone numbers according to a dictionary. */ import std; /** * Table of digit mappings. */ static immutable ubyte[dchar] digitOf; shared static this() { digitOf = [ 'E': 0, 'J': 1, 'N': 1, 'Q': 1, 'R': 2, 'W': 2, 'X': 2, 'D': 3, 'S': 3, 'Y': 3, 'F': 4, 'T': 4, 'A': 5, 'M': 5, 'C': 6, 'I': 6, 'V': 6, 'B': 7, 'K': 7, 'U': 7, 'L': 8, 'O': 8, 'P': 8, 'G': 9, 'H': 9, 'Z': 9, ]; } /** * Trie for storing dictionary words according to the phone number mapping. */ class Trie { Trie[10] edges; string[] words; private void insert(string word, string suffix) { const(ubyte)* dig; while (!suffix.empty && (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null) { suffix = suffix[1 .. $]; } if (suffix.empty) { words ~= word; return; } auto node = new Trie; auto idx = *dig; if (edges[idx] is null) { edges[idx] = new Trie; } edges[idx].insert(word, suffix[1 .. $]); } /** * Insert a word into the Trie. * * Characters that don't map to any digit are ignored in building the Trie. * However, the original form of the word will be retained as-is in the * leaf node. */ void insert(string word) { insert(word, word[]); } /** * Iterate over all words stored in this Trie. */ void foreachEntry(void delegate(string path, string word) cb) { void impl(Trie node, string path = "") { if (node is null) return; foreach (word; node.words) { cb(path, word); } foreach (i, child; node.edges) { impl(child, path ~ cast(char)('0' + i)); } } impl(this); } } /** * Loads the given dictionary into a Trie. */ Trie loadDictionary(R)(R lines) if (isInputRange!R & is(ElementType!R : const(char)[])) { Trie result = new Trie; foreach (line; lines) { result.insert(line.idup); } return result; } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto app = appender!(string[]); dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); }); assert(app.data == [ "10: je", "105513: jemand", "107: neu", "1550: Name", "253302: Wasser", "35: da", "38: so", "400: Fee", "4021: fern", "4034: Fest", "482: Tor", "4824: fort", "4824: Torf", "51: an", "562: mir", "562: Mix", "56202: Mixer", "78: Bo\"", "783: bo\"s", "7857: blau", "7884: Boot", "824: Ort", "83: o\"d" ]); } /** * Find all encodings of the given phoneNumber according to the given * dictionary, and write each encoding to the given sink. */ void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink) if (isOutputRange!(W, string)) { bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit) { if (node is null) return false; // Ignore non-digit characters in phone number while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9')) suffix = suffix[1 .. $]; if (suffix.empty) { // Found a match, print result foreach (word; node.words) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(only(word)))); } return !node.words.empty; } bool ret; foreach (word; node.words) { // Found a matching word, try to match the rest of the phone // number. ret = true; if (impl(dict, suffix, path ~ word, true)) allowDigit = false; } if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false)) { allowDigit = false; ret = true; } if (allowDigit) { // If we got here, it means that if we take the current node as the // next word choice, then the following digit will have no further // matches, and we may encode it as a single digit. auto nextSuffix = suffix[1 .. $]; if (nextSuffix.empty) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(suffix[0 .. 1].only))); ret = true; } else { if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false)) ret = true; } } return ret; } // Trim trailing non-digits from phone number auto suffix = phoneNumber[]; while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9')) { suffix = suffix[0 .. $-1]; } impl(dict, suffix, [], true); } /** * Encode the given input range of phone numbers according to the given * dictionary, writing the output to the given sink. */ void encodePhoneNumbers(R,W)(R input, Trie dict, W sink) if (isInputRange!R & is(ElementType!R : const(char)[]) && isOutputRange!(W, string)) { foreach (line; input) { findMatches(dict, line, sink); } } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto input = [ "112", "5624-82", "4824", "0721/608-4067", "10/783--5", "1078-913-5", "381482", "04824", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); //writefln("\n%-(%s\n%)", app.data); assert(app.data.sort.release == [ "04824: 0 Tor 4", "04824: 0 Torf", "04824: 0 fort", "10/783--5: je Bo\" da", "10/783--5: je bo\"s 5", "10/783--5: neu o\"d 5", "381482: so 1 Tor", "4824: Tor 4", "4824: Torf", "4824: fort", "5624-82: Mix Tor", "5624-82: mir Tor", ]); } unittest { auto dict = loadDictionary(q"ENDDICT Bias ja Mai Reck Weib USA ENDDICT".splitLines); auto input = [ "/7-357653152/0677-", "/7-3576-", "/8-", "1556/0", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); //writefln("\n%-(%s\n%)", app.data); assert(app.data.sort.release == [ "/7-357653152/0677-: USA Bias ja Reck 7", "/7-357653152/0677-: USA Bias ja Weib 7", "/8-: 8", /* Note: 1556/0 should NOT encode as "1 Mai 0" because the initial "15" * matches "ja", thus excluding a digit in that position. */ ]); } /** * Program entry point. */ int main(string[] args) { File input = stdin; bool countOnly; auto info = getopt(args, "c|count", "Count solutions only", &countOnly, ); int showHelp() { stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]", args[0]); defaultGetoptPrinter("", info.options); return 1; } if (info.helpWanted || args.length < 2) return showHelp(); auto dictfile = args[1]; if (args.length > 2) input = File(args[2]); Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(input.byLine, dict, (string match) { count++; }); writefln("Number of solutions: %d", count); } else { encodePhoneNumbers(input.byLine, dict, (string match) { writeln(match); }); } return 0; } ------------------------------------snip------------------------------------ T -- The early bird gets the worm. Moral: ewww...
Jan 16
parent reply Renato <renato athaydes.com> writes:
On Tuesday, 16 January 2024 at 20:34:48 UTC, H. S. Teoh wrote:
 On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via 
 Digitalmars-d-learn wrote: [...]
 Anyway, I've fixed the problem, now my program produces the 
 exact same output as Renato's repo. Code is posted below.
[...]
Great, I ran the benchmarks for you :) I had to change how you accept arguments, even though you did "the right thing" using `getopt`, the solutions should just take a `count` or `print` argument first... Anyway, here's your result: ``` ===> ./rust ./rust,24133632,25 ./rust,24739840,130 ./rust,24477696,536 ./rust,25247744,1064 ./rust,8175616,6148 ./rust,8306688,8315 ===> src/d/dencoder src/d/dencoder,46055424,43 src/d/dencoder,96337920,146 src/d/dencoder,102350848,542 src/d/dencoder,102268928,1032 src/d/dencoder,40206336,99936 ^C ``` It took too long with the `count` option, so I had to abort before the last run ended... there's probably some bug there, otherwise the Trie runs very fast, as I had expected. For the record (I already posted this on GitHub)... here's [my current fastest solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s c/d/src/dencoder.d) time using the same algorithm as Rust (after I fixed my solution that was using `int128`, which was not valid, as ssvb pointed out): ``` Proc,Run,Memory(bytes),Time(ms) ===> ./rust ./rust,23986176,24 ./rust,24346624,118 ./rust,24543232,525 ./rust,24346624,1027 ./rust,8011776,3830 ./rust,8486912,18403 ===> src/d/dencoder src/d/dencoder,13615104,30 src/d/dencoder,24723456,374 src/d/dencoder,24592384,1750 src/d/dencoder,24788992,3472 src/d/dencoder,11517952,10235 src/d/dencoder,11517952,51211 ``` Not great :( But ssvb came up with a really fast implementation, though totally different algorithm (so dont' take this as evidence of D being faster than Rust or anything like that): ``` Proc,Run,Memory(bytes),Time(ms) ===> ./rust ./rust,23986176,24 ./rust,24461312,135 ./rust,24494080,514 ./rust,24526848,981 ./rust,8175616,3647 ./rust,8011776,15404 ===> src/d/dencoder src/d/dencoder,16433152,30 src/d/dencoder,16613376,125 src/d/dencoder,16613376,468 src/d/dencoder,16613376,941 src/d/dencoder,5570560,12 src/d/dencoder,6701056,18 ``` I can't explain why it's so incredibly fast, specially for the `count` case. I tried using the same hashing function on my solution, but that didn't really help me! Pretty cool to see different solutions to the problem, but I'm still curious to know where the solution I wrote is being slow compared to Rust which is using identical, to my understanding, code! I'm sure people could write incredibly fast code to solve this problem, but what I am really curious about is what the code I wrote is doing wrong that causes it to run 4x slower than Rust despite doing "the same thing"...
Jan 16
next sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:
 I can't explain why it's so incredibly fast, specially for the 
 `count` case. I tried using the same hashing function on my 
 solution, but that didn't really help me!
That's dynamic programming with memoization. Basically caching the already calculated results (in the `dp` array) and avoiding recalculations when the recursive function revisits the same state again.
Jan 16
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jan 16, 2024 at 09:15:19PM +0000, Renato via Digitalmars-d-learn wrote:
 On Tuesday, 16 January 2024 at 20:34:48 UTC, H. S. Teoh wrote:
 On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via
 Digitalmars-d-learn wrote: [...]
 Anyway, I've fixed the problem, now my program produces the exact
 same output as Renato's repo. Code is posted below.
[...]
Great, I ran the benchmarks for you :) I had to change how you accept arguments, even though you did "the right thing" using `getopt`, the solutions should just take a `count` or `print` argument first...
Oops, haha :-P
 Anyway, here's your result:
 
 ```
 ===> ./rust
 ./rust,24133632,25
 ./rust,24739840,130
 ./rust,24477696,536
 ./rust,25247744,1064
 ./rust,8175616,6148
 ./rust,8306688,8315
 ===> src/d/dencoder
 src/d/dencoder,46055424,43
 src/d/dencoder,96337920,146
 src/d/dencoder,102350848,542
 src/d/dencoder,102268928,1032
 src/d/dencoder,40206336,99936
 ^C
 ```
 
 It took too long with the `count` option, so I had to abort before the
 last run ended... there's probably some bug there, otherwise the Trie
 runs very fast, as I had expected.
[...] Do you have the problematic data file handy? I'd like to look into any potential bugs. Also, the profiler revealed that a lot of time was spent in the GC and in small allocations. The cause is in all likelihood the .format() call for each found match, and array append being used for the recursive calls. Getting rid of the .format ought to speed it up a bit. Will try that now... T -- If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
Jan 16
parent reply Renato <renato athaydes.com> writes:
On Tuesday, 16 January 2024 at 22:13:55 UTC, H. S. Teoh wrote:
 used for the recursive calls. Getting rid of the .format ought 
 to speed it up a bit. Will try that now...
That will make no difference for the `count` option which is where your solution was very slow. To run the slow test manually use the `words_quarter.txt` dictionary (the phone numbers file doesn't matter much - it's all in the dictionary). But pls run the benchmarks yourself as I am not going to keep running it for you, and would be nice if you posted your solution on a Gist for example, pasting lots of code in the forum makes it difficult to follow.
Jan 16
next sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote:
 On Tuesday, 16 January 2024 at 22:13:55 UTC, H. S. Teoh wrote:
 used for the recursive calls. Getting rid of the .format ought to
 speed it up a bit. Will try that now...
 
That will make no difference for the `count` option which is where your solution was very slow.
Of course it will. Passing the data directly to the callback that bumps a counter is faster than allocating a new string, formatting the data, and then passing it to the callback that bumps a counter. It may not look like much, but avoiding unnecessary GC allocations means the GC will have less work to do later when a collection is run, thus you save time over the long term.
 To run the slow test manually use the `words_quarter.txt` dictionary
 (the phone numbers file doesn't matter much - it's all in the
 dictionary).
 
 But pls run the benchmarks yourself as I am not going to keep running
 it for you, and would be nice if you posted your solution on a Gist
 for example, pasting lots of code in the forum makes it difficult to
 follow.
I'll push the code to github. T -- "No, John. I want formats that are actually useful, rather than over-featured megaliths that address all questions by piling on ridiculous internal links in forms which are hideously over-complex." -- Simon St. Laurent on xml-dev
Jan 17
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote:
[...]
 But pls run the benchmarks yourself as I am not going to keep running
 it for you, and would be nice if you posted your solution on a Gist
 for example, pasting lots of code in the forum makes it difficult to
 follow.
I can't. I spent half an hour trying to get ./benchmark.sh to run, but no matter what it could not compile benchmark_runner. It complains that my rustc is too old and some dependencies do not support it. I tried running the suggested cargo update command to pin the versions but none of them worked. Since I'm not a Rust user, I'm not feeling particularly motivated right now to spend any more time on this. Upgrading my rustc isn't really an option because that's the version currently in my distro and I really don't feel like spending more time to install a custom version of rustc just for this benchmark. T -- Today's society is one of specialization: as you grow, you learn more and more about less and less. Eventually, you know everything about nothing.
Jan 17
parent Renato <renato athaydes.com> writes:
On Wednesday, 17 January 2024 at 16:30:08 UTC, H. S. Teoh wrote:
 On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via 
 Digitalmars-d-learn wrote: [...]
 But pls run the benchmarks yourself as I am not going to keep 
 running it for you, and would be nice if you posted your 
 solution on a Gist for example, pasting lots of code in the 
 forum makes it difficult to follow.
I can't. I spent half an hour trying to get ./benchmark.sh to run, but no matter what it could not compile benchmark_runner. It complains that my rustc is too old and some dependencies do not support it. I tried running the suggested cargo update command to pin the versions but none of them worked. Since I'm not a Rust user, I'm not feeling particularly motivated right now to spend any more time on this. Upgrading my rustc isn't really an option because that's the version currently in my distro and I really don't feel like spending more time to install a custom version of rustc just for this benchmark. T
I've just updated the Rust version to the benchmark monitor could work on Linux (it only worked on Mac before) :D that's probably why your rustc didn't work, though as the project is still using edition2018 I would've thought even a very old compiler would have worked... anyway, if you ever find yourself actually using Rust, you should use `rustup` (https://rustup.rs/) which makes it trivial to update Rust. About the "count" option: I had assumed you didn't call format on the count option as it's never needed, there's nothing to print.
Jan 17
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn
wrote:
[...]
 I'll push the code to github.
[...] Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d T -- Why do conspiracy theories always come from the same people??
Jan 17
next sibling parent Renato <renato athaydes.com> writes:
On Wednesday, 17 January 2024 at 16:54:00 UTC, H. S. Teoh wrote:
 On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via 
 Digitalmars-d-learn wrote: [...]
 I'll push the code to github.
[...] Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d T
Ok, last time I'm running this for someone else :D ``` Proc,Run,Memory(bytes),Time(ms) ===> ./rust ./rust,23920640,30 ./rust,24018944,147 ./rust,24068096,592 ./rust,24150016,1187 ./rust,7766016,4972 ./rust,8011776,46101 ===> src/d/dencoder src/d/dencoder,44154880,42 src/d/dencoder,51347456,87 src/d/dencoder,51380224,273 src/d/dencoder,51462144,441 src/d/dencoder,18644992,4414 src/d/dencoder,18710528,43548 ``` Congratulations on beating Rust :D but remember: you're using a much more efficient algorithm! I must conclude that the Rust translation of the Trie algorithm would be much faster still, unfortunately (you may have noticed that I am on D's side here!).
Jan 18
prev sibling parent Renato <renato athaydes.com> writes:
On Wednesday, 17 January 2024 at 16:54:00 UTC, H. S. Teoh wrote:
 On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via 
 Digitalmars-d-learn wrote: [...]
 I'll push the code to github.
[...] Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d T
BTW here's you main function so it can run on the benchmark: ```d int main(string[] args) { bool countOnly = args.length > 1 ? (() { final switch (args[1]) { case "count": return true; case "print": return false; } })() : false; auto dictfile = args.length > 2 ? args[2] : "tests/words.txt"; auto input = args.length > 3 ? args[3] : "tests/numbers.txt"; Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(File(input).byLine, dict, (phone, match) { count++; }); writefln("%d", count); } else { encodePhoneNumbers(File(input).byLine, dict, (phone, match) { writefln("%s: %s", phone, match); }); } return 0; } ```
Jan 18
prev sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:
 For the record (I already posted this on GitHub)... here's [my 
 current fastest 
 solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s
c/d/src/dencoder.d) time using the same algorithm as Rust ...
[...]
 ... what I am really curious about is what the code I wrote is 
 doing wrong that causes it to run 4x slower than Rust despite 
 doing "the same thing"...
It's a GC allocations fest. Things like this make it slow: ```diff { - string digit = [digits[0]]; + string digit = digits[0 .. 1]; words.insertBack(digit); ``` And at the top is the associative array lookup (when profiling the handling of the "phones_1_000_000_with_empty.txt" input file): ``` 36.85% dencoder dencoder [.] _aaInX 12.38% dencoder dencoder [.] void dencoder.printTranslations(immutable(char)[][][dencoder.Key], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], std.container.array.Array!(immutable(char)[]).Array) 6.43% dencoder dencoder [.] nothrow nogc scope void core.internal.gc.impl.conservative.gc.Gcx.mark!(false, true, true).mark(core.internal.gc.impl.conservative.gc.Gcx.ScanRange!(false).ScanRange) 4.53% dencoder dencoder [.] pure safe void std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult.popFront() 4.08% dencoder dencoder [.] _d_array_slice_copy 2.43% dencoder dencoder [.] _d_newarrayU 2.21% dencoder dencoder [.] shared nothrow nogc trusted void core.internal.spinlock.SpinLock.lock() 2.00% dencoder dencoder [.] pure safe void std.array.Appender!(immutable(char)[]).Appender.put!(dchar).put(dchar) 1.91% dencoder dencoder [.] nothrow void* core.internal.gc.impl.conservative.gc.Gcx.smallAlloc(ulong, ref ulong, uint, const(TypeInfo)) 1.67% dencoder dencoder [.] _DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZSQDd6memory8BlkInfo_ 1.66% dencoder dencoder [.] pure nothrow nogc scope trusted ulong core.internal.gc.bits.GCBits.setLocked(ulong) 1.60% dencoder dencoder [.] const pure nothrow trusted ulong object.TypeInfo_Struct.getHash(scope const(void*)) 1.53% dencoder dencoder [.] nothrow safe void core.internal.util.array.enforceRawArraysConformable(const(char[]), const(ulong), const(void[]), const(void[]), const(bool)) 1.49% dencoder dencoder [.] nothrow void* core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), core.internal.gc.impl.conservative.gc.mallocTime, core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo)) 1.27% dencoder dencoder [.] pure nothrow safe void std.array.Appender!(immutable(char)[]).Appender.ensureAddable(ulong) 0.81% dencoder dencoder [.] pure property safe dchar std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult.front() 0.79% dencoder dencoder [.] nothrow safe void core.internal.util.array._enforceNoOverlap(const(char[]), ulong, ulong, const(ulong)) 0.74% dencoder dencoder [.] nothrow void core.internal.gc.impl.conservative.gc.Pool.setBits(ulong, uint) 0.73% dencoder dencoder [.] pure safe void std.array.Appender!(immutable(char)[]).Appender.put!(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult).put(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult) 0.70% dencoder dencoder [.] pure safe ulong std.range.primitives.walkLength!(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult).walkLength(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult) 0.60% dencoder dencoder [.] bool dencoder.lastItemIsDigit(std.container.array.Array!(immutable(char)[]).Array) 0.54% dencoder dencoder [.] _d_newarrayT [...] ```
Jan 16
next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jan 16, 2024 at 10:15:04PM +0000, Siarhei Siamashka via
Digitalmars-d-learn wrote:
 On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:
[...]
 ... what I am really curious about is what the code I wrote is doing
 wrong that causes it to run 4x slower than Rust despite doing "the
 same thing"...
It's a GC allocations fest.
Indeed. I have just completed 2 rounds of optimizations of my version of the code, and both times the profiler also showed the problem to be excessive allocations in the inner loop. So, I did the following optimizations: 1) Get rid of .format in the inner loop. Not only does .format cause a lot of allocations, it is also a known performance hog. So instead of constructing the output string in the search function, I changed it to take a delegate instead, and the delegate either counts the result or prints it directly (bypassing the construction of an intermediate string). This improved performance quite a bit for the count-only runs, but also wins some performance even when output is generated. Overall, however, this optimization only gave me some minor savings. 2) Changed the `path` parameter from string[] to string, since I didn't really need it to be an array of strings anyway. This in itself only improved performance marginally, barely noticeable, but it led to (3), which gave a huge performance boost. 3) Basically, in the earlier version of the code, the `path` parameter was appended to every time I recursed, and furthermore the same initial segment gets appended to many times with different trailers as the algorithm walks the trie. As a result, this triggers a lot of array reallocations to store the new strings. Most of these allocations are unnecessary, because we already know that the initial segment of the string will stay constant, only the tail end changes. Furthermore, we only ever have a single instance of .path at any point in time in the algorithm. So we could use a single buffer to hold all of these instances of .path, and simply return slices to it as we go along, overwriting the tail end each time we need to append something. This significantly cut down on the number of allocations, and along with (1) and (2), performance improved by about 3x (!). It didn't completely remove all allocations, but I'm reasonably happy with the performance now that I probably won't try to optimize it more unless it's still losing out to another language. ;-) (I'm especially curious to see if this beats the Rust version. :-P) Optimized version of the code: -----------------------------------snip------------------------------------ /** * Encoding phone numbers according to a dictionary. */ import std; /** * Table of digit mappings. */ static immutable ubyte[dchar] digitOf; shared static this() { digitOf = [ 'E': 0, 'J': 1, 'N': 1, 'Q': 1, 'R': 2, 'W': 2, 'X': 2, 'D': 3, 'S': 3, 'Y': 3, 'F': 4, 'T': 4, 'A': 5, 'M': 5, 'C': 6, 'I': 6, 'V': 6, 'B': 7, 'K': 7, 'U': 7, 'L': 8, 'O': 8, 'P': 8, 'G': 9, 'H': 9, 'Z': 9, ]; } /** * Trie for storing dictionary words according to the phone number mapping. */ class Trie { Trie[10] edges; string[] words; private void insert(string word, string suffix) { const(ubyte)* dig; while (!suffix.empty && (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null) { suffix = suffix[1 .. $]; } if (suffix.empty) { words ~= word; return; } auto node = new Trie; auto idx = *dig; if (edges[idx] is null) { edges[idx] = new Trie; } edges[idx].insert(word, suffix[1 .. $]); } /** * Insert a word into the Trie. * * Characters that don't map to any digit are ignored in building the Trie. * However, the original form of the word will be retained as-is in the * leaf node. */ void insert(string word) { insert(word, word[]); } /** * Iterate over all words stored in this Trie. */ void foreachEntry(void delegate(string path, string word) cb) { void impl(Trie node, string path = "") { if (node is null) return; foreach (word; node.words) { cb(path, word); } foreach (i, child; node.edges) { impl(child, path ~ cast(char)('0' + i)); } } impl(this); } } /** * Loads the given dictionary into a Trie. */ Trie loadDictionary(R)(R lines) if (isInputRange!R & is(ElementType!R : const(char)[])) { Trie result = new Trie; foreach (line; lines) { result.insert(line.idup); } return result; } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto app = appender!(string[]); dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); }); assert(app.data == [ "10: je", "105513: jemand", "107: neu", "1550: Name", "253302: Wasser", "35: da", "38: so", "400: Fee", "4021: fern", "4034: Fest", "482: Tor", "4824: fort", "4824: Torf", "51: an", "562: mir", "562: Mix", "56202: Mixer", "78: Bo\"", "783: bo\"s", "7857: blau", "7884: Boot", "824: Ort", "83: o\"d" ]); } alias MatchCallback = void delegate(const(char)[] phone, const(char)[] match); /** * Find all encodings of the given phoneNumber according to the given * dictionary, and write each encoding to the given sink. */ void findMatches(Trie dict, const(char)[] phoneNumber, MatchCallback cb) { /* * Optimization: use a common buffer for constructing the output string, * instead of using string append, which allocates many new strings. * * The `path` parameter passed to .impl is always a slice of .buffer, * either its current instance or a previous instance left over from a * buffer reallocation. As such, its contents are always an initial segment * of whatever's in .buffer, so appending is just a matter of writing to * the tail end of the buffer and returning a slice of the new buffer. * * The fact that .path higher up the call tree may be pointing to old * versions of .buffer is not a problem; contentwise they are always an * initial segment of the current .buffer, so any subsequent appends will * copy the new word to the right place in the current .buffer and return a * slice to it, and the contents will always be consistent. */ static char[] buffer; const(char)[] appendPath(const(char)[] path, const(char)[] word) { // Assumption: path is an initial segment of buffer (either its current // incarnation or a pre-reallocated initial copy). So we don't need to // copy the initial segment, just whatever needs to be appended. if (path.length == 0) { auto newlen = word.length; if (buffer.length < newlen) buffer.length = newlen; buffer[0 .. newlen] = word[]; return buffer[0 .. newlen]; } else { auto newlen = path.length + 1 + word.length; if (buffer.length < newlen) buffer.length = newlen; buffer[path.length] = ' '; buffer[path.length + 1 .. newlen] = word[]; return buffer[0 .. newlen]; } } bool impl(Trie node, const(char)[] suffix, const(char)[] path, bool allowDigit) { if (node is null) return false; // Ignore non-digit characters in phone number while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9')) suffix = suffix[1 .. $]; if (suffix.empty) { // Found a match, print result foreach (word; node.words) { cb(phoneNumber, appendPath(path, word)); } return !node.words.empty; } bool ret; foreach (word; node.words) { // Found a matching word, try to match the rest of the phone // number. ret = true; if (impl(dict, suffix, appendPath(path, word), true)) allowDigit = false; } if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false)) { allowDigit = false; ret = true; } if (allowDigit) { // If we got here, it means that if we take the current node as the // next word choice, then the following digit will have no further // matches, and we may encode it as a single digit. auto nextSuffix = suffix[1 .. $]; if (nextSuffix.empty) { cb(phoneNumber, appendPath(path, suffix[0 .. 1])); ret = true; } else { if (impl(dict, suffix[1 .. $], appendPath(path, suffix[0 .. 1]), false)) ret = true; } } return ret; } // Trim trailing non-digits from phone number auto suffix = phoneNumber[]; while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9')) { suffix = suffix[0 .. $-1]; } impl(dict, suffix, buffer[0 .. 0], true); } /** * Encode the given input range of phone numbers according to the given * dictionary, writing the output to the given sink. */ void encodePhoneNumbers(R)(R input, Trie dict, MatchCallback cb) if (isInputRange!R & is(ElementType!R : const(char)[])) { foreach (line; input) { findMatches(dict, line, cb); } } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto input = [ "112", "5624-82", "4824", "0721/608-4067", "10/783--5", "1078-913-5", "381482", "04824", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (phone, match) { app.put(format("%s: %s", phone, match)); }); //writefln("\n%-(%s\n%)", app.data); assert(app.data.sort.release == [ "04824: 0 Tor 4", "04824: 0 Torf", "04824: 0 fort", "10/783--5: je Bo\" da", "10/783--5: je bo\"s 5", "10/783--5: neu o\"d 5", "381482: so 1 Tor", "4824: Tor 4", "4824: Torf", "4824: fort", "5624-82: Mix Tor", "5624-82: mir Tor", ]); } unittest { auto dict = loadDictionary(q"ENDDICT Bias ja Mai Reck Weib USA ENDDICT".splitLines); auto input = [ "/7-357653152/0677-", "/7-3576-", "/8-", "1556/0", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (phone, match) { app.put(format("%s: %s", phone, match)); }); //writefln("\n%-(%s\n%)", app.data); assert(app.data.sort.release == [ "/7-357653152/0677-: USA Bias ja Reck 7", "/7-357653152/0677-: USA Bias ja Weib 7", "/8-: 8", /* Note: 1556/0 should NOT encode as "1 Mai 0" because the initial "15" * matches "ja", thus excluding a digit in that position. */ ]); } /** * Program entry point. */ int main(string[] args) { File input = stdin; bool countOnly; auto info = getopt(args, "c|count", "Count solutions only", &countOnly, ); int showHelp() { stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]", args[0]); defaultGetoptPrinter("", info.options); return 1; } if (info.helpWanted || args.length < 2) return showHelp(); auto dictfile = args[1]; if (args.length > 2) input = File(args[2]); Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(input.byLine, dict, (phone, match) { count++; }); writefln("Number of solutions: %d", count); } else { encodePhoneNumbers(input.byLine, dict, (phone, match) { writefln("%s: %s", phone, match); }); } return 0; } -----------------------------------snip------------------------------------ T -- People tell me that I'm paranoid, but they're just out to get me.
Jan 16
parent reply Renato <renato athaydes.com> writes:
On Tuesday, 16 January 2024 at 23:43:16 UTC, H. S. Teoh wrote:
 (I'm especially curious to see if this beats the Rust version. 
 :-P)
That's just irrelevant when you're using a completely different algorithm. My conclusion in the study was exactly that: much more important than the language is the algorithm. The language does matter, but much, much less. If you want to check your performance, you know you can run the `./benchmark.sh` yourself?
Jan 16
parent reply evilrat <evilrat666 gmail.com> writes:
On Wednesday, 17 January 2024 at 07:11:02 UTC, Renato wrote:
 If you want to check your performance, you know you can run the 
 `./benchmark.sh` yourself?
Out of curiosity I've tried to manually run this on Windows and it seems that Java generator for these numbers files is "broken", the resulting count or print runs fine for both Java and D versions provided in your D branch, but fails with generated files. D version complains about bad utf8 encoding. I've opened the generated file in text editor and it is UTF-16 (little-endian with BOM). Tried with adoptium jdk 17 and 21 (former openjdk), but I guess it doesn't matter since UTF-16 is default on Windows.
Jan 17
parent reply Renato <renato athaydes.com> writes:
On Wednesday, 17 January 2024 at 09:15:02 UTC, evilrat wrote:
 On Wednesday, 17 January 2024 at 07:11:02 UTC, Renato wrote:
 If you want to check your performance, you know you can run 
 the `./benchmark.sh` yourself?
Out of curiosity I've tried to manually run this on Windows and it seems that Java generator for these numbers files is "broken", the resulting count or print runs fine for both Java and D versions provided in your D branch, but fails with generated files. D version complains about bad utf8 encoding. I've opened the generated file in text editor and it is UTF-16 (little-endian with BOM). Tried with adoptium jdk 17 and 21 (former openjdk), but I guess it doesn't matter since UTF-16 is default on Windows.
It's not Java writing the file, it's the bash script [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31): ``` java -cp "build/util" util.GeneratePhoneNumbers 1000 > phones_1000.txt ``` Java is just printing to stdout. I wasn't aware that piping like this would use the OS default encoding. Unfortunately I don't have Windows here, but try to change the encoding used by the bash script maybe?
Jan 17
parent reply Renato <renato athaydes.com> writes:
On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
 It's not Java writing the file, it's the bash script 
 [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31):

 ```
 java -cp "build/util" util.GeneratePhoneNumbers 1000 > 
 phones_1000.txt
 ```
Perhaps using this option when running Java will help: ``` java -DFile.Encoding=UTF-8 ... ```
Jan 17
parent reply evilrat <evilrat666 gmail.com> writes:
On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:
 On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
 It's not Java writing the file, it's the bash script 
 [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31):

 ```
 java -cp "build/util" util.GeneratePhoneNumbers 1000 > 
 phones_1000.txt
 ```
Perhaps using this option when running Java will help: ``` java -DFile.Encoding=UTF-8 ... ```
I've used powershell env var to set output to utf8, D version now works but java doesn't. ``` java -Xms20M -Xmx100M -cp build/java Main print words-quarter.txt phones_1000.txt Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of bounds for length 10 at Trie.completeSolution(Main.java:216) at Trie.forEachSolution(Main.java:192) at PhoneNumberEncoder.encode(Main.java:132) at Main.lambda$main$1(Main.java:38) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) at Main.main(Main.java:38) ```
Jan 17
parent reply Renato <renato athaydes.com> writes:
On Wednesday, 17 January 2024 at 10:50:26 UTC, evilrat wrote:
 On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:
 On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
 It's not Java writing the file, it's the bash script 
 [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31):

 ```
 java -cp "build/util" util.GeneratePhoneNumbers 1000 > 
 phones_1000.txt
 ```
Perhaps using this option when running Java will help: ``` java -DFile.Encoding=UTF-8 ... ```
I've used powershell env var to set output to utf8, D version now works but java doesn't. ``` java -Xms20M -Xmx100M -cp build/java Main print words-quarter.txt phones_1000.txt Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of bounds for length 10 at Trie.completeSolution(Main.java:216) at Trie.forEachSolution(Main.java:192) at PhoneNumberEncoder.encode(Main.java:132) at Main.lambda$main$1(Main.java:38) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) at Main.main(Main.java:38) ```
This is this line: ``` var digit = chars[ index ] - 48; ``` That means the input file is still not ASCII (or UTF-8) as it should. Java is reading files with the ASCII encoding so it should've worked fine.
Jan 17
parent reply evilrat <evilrat666 gmail.com> writes:
On Wednesday, 17 January 2024 at 11:20:14 UTC, Renato wrote:
 That means the input file is still not ASCII (or UTF-8) as it 
 should. Java is reading files with the ASCII encoding so it 
 should've worked fine.
It seems that it is only works with ASCII encoding though.
Jan 17
parent Renato <renato athaydes.com> writes:
On Wednesday, 17 January 2024 at 11:56:19 UTC, evilrat wrote:
 On Wednesday, 17 January 2024 at 11:20:14 UTC, Renato wrote:
 That means the input file is still not ASCII (or UTF-8) as it 
 should. Java is reading files with the ASCII encoding so it 
 should've worked fine.
It seems that it is only works with ASCII encoding though.
Yes, that's according to the rules - only ASCII for everything.
Jan 17
prev sibling parent reply Renato <renato athaydes.com> writes:
On Tuesday, 16 January 2024 at 22:15:04 UTC, Siarhei Siamashka 
wrote:
 On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:
 For the record (I already posted this on GitHub)... here's [my 
 current fastest 
 solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s
c/d/src/dencoder.d) time using the same algorithm as Rust ...
[...]
 ... what I am really curious about is what the code I wrote is 
 doing wrong that causes it to run 4x slower than Rust despite 
 doing "the same thing"...
It's a GC allocations fest. Things like this make it slow: ```diff { - string digit = [digits[0]]; + string digit = digits[0 .. 1]; words.insertBack(digit); ```
I was under the impression that `[digits[0]]` would just use a stack allocation?? The profiler does not show any GC anymore, are you sure it's a "GC allocations fest"???
 And at the top is the associative array lookup (when profiling 
 the handling of the "phones_1_000_000_with_empty.txt" input 
 file):

 ```
     36.85%  dencoder  dencoder              [.] _aaInX
     12.38%  dencoder  dencoder              [.] void
Well, I know, but I did everything to make the hash "cheap" to compute so I still don't see a way to improve it.
Jan 16
parent evilrat <evilrat666 gmail.com> writes:
On Wednesday, 17 January 2024 at 07:06:25 UTC, Renato wrote:
 On Tuesday, 16 January 2024 at 22:15:04 UTC, Siarhei Siamashka 
 wrote:
 On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:

 It's a GC allocations fest. Things like this make it slow:

 ```diff
      {
 -        string digit = [digits[0]];
 +        string digit = digits[0 .. 1];
          words.insertBack(digit);
 ```
I was under the impression that `[digits[0]]` would just use a stack allocation?? The profiler does not show any GC anymore, are you sure it's a "GC allocations fest"???
nah, you are allocating new array out of single digit while the latter is just takes a slice. there is 'scope' storage specifier for when you know your variable won't escape the scope to place it on stack (works for classes too), but I'm not sure if it will work with array. `scope string digit = [digits[0]];`
Jan 17
prev sibling next sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Jan 16, 2024 at 07:50:35AM -0800, H. S. Teoh via Digitalmars-d-learn
wrote:
[...]
 Unfortunately there seems to be some discrepancy between the output I
 got and the prescribed output in your repository. For example, in your
 output the number 1556/0 does not have an encoding, but isn't "1 Mai 0"
 a valid encoding according to your dictionary and the original problem
 description?
[...] Also, found a bug in my program that misses some solutions when the phone number has trailing non-digits. Here's the updated code. It still finds extra encodings from the output in your repo, though. Maybe I misunderstood part of the requirements? ------------------------------snip-------------------------------------- /** * Encoding phone numbers according to a dictionary. */ import std; /** * Table of digit mappings. */ static immutable ubyte[dchar] digitOf; shared static this() { digitOf = [ 'E': 0, 'J': 1, 'N': 1, 'Q': 1, 'R': 2, 'W': 2, 'X': 2, 'D': 3, 'S': 3, 'Y': 3, 'F': 4, 'T': 4, 'A': 5, 'M': 5, 'C': 6, 'I': 6, 'V': 6, 'B': 7, 'K': 7, 'U': 7, 'L': 8, 'O': 8, 'P': 8, 'G': 9, 'H': 9, 'Z': 9, ]; } /** * Trie for storing dictionary words according to the phone number mapping. */ class Trie { Trie[10] edges; string[] words; private void insert(string word, string suffix) { const(ubyte)* dig; while (!suffix.empty && (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null) { suffix = suffix[1 .. $]; } if (suffix.empty) { words ~= word; return; } auto node = new Trie; auto idx = *dig; if (edges[idx] is null) { edges[idx] = new Trie; } edges[idx].insert(word, suffix[1 .. $]); } /** * Insert a word into the Trie. * * Characters that don't map to any digit are ignored in building the Trie. * However, the original form of the word will be retained as-is in the * leaf node. */ void insert(string word) { insert(word, word[]); } /** * Iterate over all words stored in this Trie. */ void foreachEntry(void delegate(string path, string word) cb) { void impl(Trie node, string path = "") { if (node is null) return; foreach (word; node.words) { cb(path, word); } foreach (i, child; node.edges) { impl(child, path ~ cast(char)('0' + i)); } } impl(this); } } /** * Loads the given dictionary into a Trie. */ Trie loadDictionary(R)(R lines) if (isInputRange!R & is(ElementType!R : const(char)[])) { Trie result = new Trie; foreach (line; lines) { result.insert(line.idup); } return result; } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto app = appender!(string[]); dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); }); assert(app.data == [ "10: je", "105513: jemand", "107: neu", "1550: Name", "253302: Wasser", "35: da", "38: so", "400: Fee", "4021: fern", "4034: Fest", "482: Tor", "4824: fort", "4824: Torf", "51: an", "562: mir", "562: Mix", "56202: Mixer", "78: Bo\"", "783: bo\"s", "7857: blau", "7884: Boot", "824: Ort", "83: o\"d" ]); } /** * Find all encodings of the given phoneNumber according to the given * dictionary, and write each encoding to the given sink. */ void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink) if (isOutputRange!(W, string)) { bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit) { if (node is null) return false; // Ignore non-digit characters in phone number while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9')) suffix = suffix[1 .. $]; if (suffix.empty) { // Found a match, print result foreach (word; node.words) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(only(word)))); } return !node.words.empty; } bool ret; foreach (word; node.words) { // Found a matching word, try to match the rest of the phone // number. if (impl(dict, suffix, path ~ word, true)) { allowDigit = false; ret = true; } } if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false)) { allowDigit = false; ret = true; } if (allowDigit) { // If we got here, it means that if we take the current node as the // next word choice, then the following digit will have no further // matches, and we may encode it as a single digit. auto nextSuffix = suffix[1 .. $]; if (nextSuffix.empty) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(suffix[0 .. 1].only))); ret = true; } else { if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false)) ret = true; } } return ret; } // Trim trailing non-digits from phone number auto suffix = phoneNumber[]; while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9')) { suffix = suffix[0 .. $-1]; } impl(dict, suffix, [], true); } /** * Encode the given input range of phone numbers according to the given * dictionary, writing the output to the given sink. */ void encodePhoneNumbers(R,W)(R input, Trie dict, W sink) if (isInputRange!R & is(ElementType!R : const(char)[]) && isOutputRange!(W, string)) { foreach (line; input) { findMatches(dict, line, sink); } } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto input = [ "112", "5624-82", "4824", "0721/608-4067", "10/783--5", "1078-913-5", "381482", "04824", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); //writefln("%-(%s\n%)", app.data); assert(app.data.sort.release == [ "04824: 0 Tor 4", "04824: 0 Torf", "04824: 0 fort", "10/783--5: je Bo\" da", "10/783--5: je bo\"s 5", "10/783--5: neu o\"d 5", "381482: so 1 Tor", "4824: Tor 4", "4824: Torf", "4824: fort", "5624-82: Mix Tor", "5624-82: mir Tor", ]); } unittest { auto dict = loadDictionary(q"ENDDICT Bias ja Reck Weib USA ENDDICT".splitLines); auto input = [ "/7-357653152/0677-", "/8-", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); assert(app.data.sort.release == [ "/7-357653152/0677-: USA Bias ja Reck 7", "/7-357653152/0677-: USA Bias ja Weib 7", "/8-: 8", ]); } /** * Program entry point. */ int main(string[] args) { File input = stdin; bool countOnly; auto info = getopt(args, "c|count", "Count solutions only", &countOnly, ); int showHelp() { stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]", args[0]); defaultGetoptPrinter("", info.options); return 1; } if (info.helpWanted || args.length < 2) return showHelp(); auto dictfile = args[1]; if (args.length > 2) input = File(args[2]); Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(input.byLine, dict, (string match) { count++; }); writefln("Number of solutions: %d", count); } else { encodePhoneNumbers(input.byLine, dict, (string match) { writeln(match); }); } return 0; } ------------------------------snip-------------------------------------- T -- "You are a very disagreeable person." "NO."
Jan 16
prev sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
P.S. Compiling my program with `ldc -O2`, it runs so fast that I
couldn't measure any meaningful running time that's greater than startup
overhead.  So I wrote a helper program to generate random phone numbers
up to 50 characters long, and found that it could encode 1 million phone
numbers in 2.2 seconds (using the 75,000 entry dictionary from your
repository).  Counting vs. printing the results made no significant
difference to this.


T

-- 
People tell me that I'm skeptical, but I don't believe them.
Jan 16