digitalmars.D.learn - Help optimize D solution to phone encoding problem: extremely slow
- Renato (121/121) Jan 13 I like to use a phone encoding problem to determine the
- Renato (39/41) Jan 13 I profiled it on Linux and stored [the trace.log
- Anonymouse (8/11) Jan 13 As a drive-by suggestion and I hope it doesn't derail anything,
- Renato (7/18) Jan 13 Thanks for the suggestion, this looks promising as I do have a
- Sergey (19/27) Jan 13 In the repo is hard to find the proper version.
- Anonymouse (6/9) Jan 14 I would strongly argue for profiling first instead of optimising
- Renato (74/112) Jan 14 Can you be more specific about which part of the Rust
- Jordan Wilson (24/30) Jan 14 Hello Renato,
- Renato (292/324) Jan 14 The characteristics of the dictionary impact the number of
- Sergey (5/10) Jan 14 I've added port from Rust in the PR comment. Can you please check
- Renato (8/18) Jan 15 As discussed on GitHub, the line-by-line port of the Rust code is
- H. S. Teoh (346/370) Jan 16 This problem piqued my interest, so yesterday and today I worked on it
- Siarhei Siamashka (26/31) Jan 16 You are not allowed to emit "1" as the first token in the output
- Renato (18/51) Jan 16 Exactly, this is one of the things that make this problem a bit
- H. S. Teoh (68/93) Jan 16 Ohhh now I get it. Initially I misunderstood that as saying that if the
- H. S. Teoh (359/361) Jan 16 [...]
- Renato (76/81) Jan 16 Great, I ran the benchmarks for you :)
- Siarhei Siamashka (5/8) Jan 16 That's dynamic programming with memoization. Basically caching
- H. S. Teoh (13/48) Jan 16 [...]
- Renato (9/11) Jan 16 That will make no difference for the `count` option which is
- H. S. Teoh (11/26) Jan 17 Of course it will. Passing the data directly to the callback that bumps
- H. S. Teoh (14/18) Jan 17 I can't. I spent half an hour trying to get ./benchmark.sh to run, but
- Renato (10/27) Jan 17 I've just updated the Rust version to the benchmark monitor could
- H. S. Teoh (7/8) Jan 17 [...]
- Siarhei Siamashka (66/72) Jan 16 It's a GC allocations fest. Things like this make it slow:
- H. S. Teoh (435/441) Jan 16 Indeed.
- Renato (8/10) Jan 16 That's just irrelevant when you're using a completely different
- Renato (8/29) Jan 16 I was under the impression that `[digits[0]]` would just use a
- evilrat (7/23) Jan 17 nah, you are allocating new array out of single digit while the
- H. S. Teoh (358/363) Jan 16 [...]
- H. S. Teoh (10/10) Jan 16 P.S. Compiling my program with `ldc -O2`, it runs so fast that I
I like to use a phone encoding problem to determine the strenghtness and weaknesses of programming languages because this problem is easy enough I can write solutions in any language in a few hours, but complex enough to exercise lots of interesting parts of the language. You can check [my initial blog post](https://renato.athaydes.com/posts/revisiting-prechelt-paper- omparing-languages) about this, which was an analysis or the original study by Prechelt about the productivity differences between programmers using different languages. This original problem had a flaw when used in modern computers as the programs can find solutions so quickly that most of the time in the benchmarks was being used to actually print solutions to stdout, so I modified the problem so that there's an option to either print the solutions, or just count the number of solutions - so the programs still need to do all the work, but are not required to print anything other than how many solutions were found. Anyway, I ported the Common Lisp solution to D because, like CL, D has built-in data structures like associative arrays and `BigInt` (at least it's in the stdlib)... I thought this would actually give D an edge! But it turns out D performs very badly for larger input sets. It starts quite well on smaller inputs, but scales very poorly. My initial Rust solution also performed very poorly, so I'm afraid the same is happening with my initial D solution, despite my best effort to write something "decent". You can find my D solution [in this Pull Request](https://github.com/renatoathaydes/prechelt-phone-number-encoding/pull/16). The solution is almost identical, to the extent possible, to the Common Lisp solution, which can be [found here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/lisp/main.lisp). The secret to high performance for the algorithm being used is having a very efficient `BigInt` implementation, and fast hashing function for the hash table (associative array in D, with `BigInt` as keys). Hence, I suppose D's hash function or `BigInt` are not that fast (Rust's default hash is also not great due to security concerns, but it's very easy to use a custom one which is much faster by changing a single line of code). Anyway, here's the current relative performance (the other languages are pretty heavily optimised, the Java solution uses a different algorithm so it's not directly comparable, [the Rust solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/rust/phone_ ncoder/src/main.rs) uses approx. the same algorithm as used in CL and D, but instead of `BigInt`, it uses a `Vec<u8>` as key as that turned out to be faster - I may try that technique in D as well - but notice that even using Rust's slower BigInt, it was orders of magnitude faster than D). ``` Benchmarking... Proc,Run,Memory(bytes),Time(ms) ===> java -Xms20M -Xmx100M -cp build/java Main java-Main,109895680,377 java-Main,179634176,1025 java-Main,167149568,1621 java-Main,180203520,2493 java-Main,96481280,6112 java-Main,95526912,7989 ===> sbcl --script src/lisp/main.fasl sbcl,31780864,74 sbcl,79437824,888 sbcl,79613952,3991 sbcl,80654336,7622 sbcl,80420864,18623 sbcl,83402752,29503 ===> ./rust ./rust,23257088,58 ./rust,23437312,260 ./rust,23433216,1077 ./rust,23416832,2017 ./rust,7106560,6757 ./rust,7110656,10165 ===> src/d/dencoder src/d/dencoder,38748160,223 src/d/dencoder,75800576,3154 src/d/dencoder,75788288,14905 src/d/dencoder,75751424,30271 ``` I had to abort D as it was taking too long. The above is with the `print` option (that's normally slow when stdout is not buffered, but I did buffer D's writes and it's still very slow). With the `count` option, which does not print anything except a number at the end, D took so long I couldn't even collect its results (I waited several minutes)... the other languages finished in much less than a minute: ``` Benchmarking... Proc,Run,Memory(bytes),Time(ms) ===> java -Xms20M -Xmx100M -cp build/java Main java-Main,124112896,7883 java-Main,107487232,9273 ===> sbcl --script src/lisp/main.fasl sbcl,82669568,25638 sbcl,83759104,33501 ===> ./rust ./rust,7061504,9488 ./rust,7127040,11441 ===> src/d/dencoder ^C (ldc-1.36.0) ``` I tried to profile the D code but the profiler seems to be broken on my OS (Mac): ``` ▶ dub build -b profile Starting Performing "profile" build using /Users/renato/dlang/ldc-1.36.0/bin/ldc2 for x86_64. Building prechelt ~master: building configuration [application] Linking dencoder (ldc-1.36.0) prechelt-phone-number-encoding/src/d dlang ✗ 9m ◒ ▶ cd ../.. (ldc-1.36.0) programming/projects/prechelt-phone-number-encoding dlang ✗ 9m ◒ ▶ src/d/dencoder [1] 17877 illegal hardware instruction src/d/dencoder (ldc-1.36.0) ``` It's a bit hard to do any optmisation while "blind". So, I decided to post it here to collect some hints about what to try and whether I hit some common performance bottlenecks in my current solution. Would appreciate anything you can suggest, as long as you respect the fact that I'm still learning D so you can't expect me to know everything about it.
Jan 13
On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:I tried to profile the D code but the profiler seems to be broken on my OS (Mac):I profiled it on Linux and stored [the trace.log file](https://gist.github.com/renatoathaydes/fd8752ed81b0cf792ed7ef5aa3d68acd) on a public Gist. I tried using the HTML viewer [recommended in the wiki](https://wiki.dlang.org/Development_tools) but that doesn't work... first try: ``` Running ../../../.dub/packages/d-profile-viewer/~master/d-profile-viewer/d-profile-viewer Corrupt trace.log (can't compute ticks per second), please re-profile and try again ``` Second try with a longer trace (the one I saved in the gist): ``` Running ../../../.dub/packages/d-profile-viewer/~master/d-profile-viewer/d-profile-viewer std.conv.ConvException /home/renato/dlang/ldc-1.36.0/bin/../impo t/std/conv.d(2533): Unexpected '-' when converting from type char[] to type ulong ---------------- ??:? [0x55c4be9bc99e] ??:? [0x55c4be9bc612] ??:? [0x55c4be9de81e] ??:? [0x55c4be9c4fbf] /home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:2533 [0x55c4be98ba2f] /home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:2002 [0x55c4be98b6fc] /home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:210 [0x55c4be9665ec] ../../../.dub/packages/d-profile-viewer/~master/d-profile-view r/source/app.d:1095 [0x55c4be965e21] ../../../.dub/packages/d-profile-viewer/~master/d-profile-view r/source/app.d:1138 [0x55c4be96698a] ??:? [0x55c4be9c4c9c] ``` Not a great profiling experience :). Anyone has a better suggestion to "parse" the trace file?
Jan 13
On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:[...] Not a great profiling experience :). Anyone has a better suggestion to "parse" the trace file?As a drive-by suggestion and I hope it doesn't derail anything, but if you have the opportunity to run it on linux, have you tried profiling with callgrind instead, with {Q,K}Cachegrind to visualise things? Your repositories probably have them. (callgrind is a part of valgrind.) The wiki only mentions callgrind in passing, but it has worked well for me. [(example)](https://i.imgur.com/WWZAwy3.png)
Jan 13
On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:Thanks for the suggestion, this looks promising as I do have a Linux laptop (just not my main one). I will have to try it... I thought that `BigInt` was to blame for the slowness (from what I could read from the trace logs), but after replacing that with basically a byte array key (see [commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649 e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp and very, very far from Java and Rust.[...] Not a great profiling experience :). Anyone has a better suggestion to "parse" the trace file?As a drive-by suggestion and I hope it doesn't derail anything, but if you have the opportunity to run it on linux, have you tried profiling with callgrind instead, with {Q,K}Cachegrind to visualise things? Your repositories probably have them. (callgrind is a part of valgrind.) The wiki only mentions callgrind in passing, but it has worked well for me. [(example)](https://i.imgur.com/WWZAwy3.png)
Jan 13
On Saturday, 13 January 2024 at 19:35:57 UTC, Renato wrote:On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:In the repo is hard to find the proper version. I've checked the Rust from master branch and it looks a bit different from D implementation.. I would suggest to rewrite in the same way as Rust implemented. Probably you would like to try: * do not use BigInt from std. It could be quite slow. Try to use GMP library from Dub instead * don't do "idup" every time * instead of byLine, try byLineCopy * instead of "arr ~= data" try to use Appender (https://dlang.org/library/std/array/appender.html) * also you could try to use splitter (https://dlang.org/library/std/algorithm/iteration/splitter.html) to lazily process each part of the data * isLastDigit function has many checks, but I think it could be implemented easier in a Rust way * also consider to use functions from Range (filter, map) as you use it in Rust, instead of using for loopsOn Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:I will have to try it... I thought that `BigInt` was to blame for the slowness (from what I could read from the trace logs), but after replacing that with basically a byte array key (see [commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649 e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp and very, very far from Java and Rust.[...]
Jan 13
On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:I would suggest to rewrite in the same way as Rust implemented. Probably you would like to try: [...]I would strongly argue for profiling first instead of optimising based on conjecture. If you profile you have solid evidence on what is actually slow. If you're very good at analysing D, well-educated hypotheses *may* be enough, until they suddenly aren't and you will have spent a lot of time on the wrong problem.
Jan 14
On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:On Saturday, 13 January 2024 at 19:35:57 UTC, Renato wrote:I explicitly linked to the Rust implementation in my question:On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:In the repo is hard to find the proper version. I've checked the Rust from master branch and it looks a bit different from D implementation..On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:I will have to try it... I thought that `BigInt` was to blame for the slowness (from what I could read from the trace logs), but after replacing that with basically a byte array key (see [commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649 e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp and very, very far from Java and Rust.[...]the [Rust solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/rust/phone_ ncoder/src/main.rs) uses approx. the same algorithm as used in CL and D, but instead of BigInt, it uses a Vec<u8> as keyCan you be more specific about which part of the Rust implementation is not roughly equivalent to the D implementation?? There are obvious differences in style and syntax, but I hope that it's mostly equivalent algorithm-wise. But to clarify: the branch where the fastest solutions are is called `fastest-implementations-print-or-count`. The D branches with my alternative implementations are all called `dlang-*`.I would suggest to rewrite in the same way as Rust implemented. Probably you would like to try: * do not use BigInt from std. It could be quite slow. Try to use GMP library from Dub insteadRight, but as I posted above, even using a byte-array just like in Rust, the performance was still very bad (but around 2x faster than using BigInt). Also, using GMP is always suggested to me, but I can't accept that because my goal is not to use C libraries to achieve the fastest speed (which I could do by using C or FFI in any other language), it's to use D (and the other languages) to solve my problem in an idiomatic way. I [ended up using `Int128`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/98dcbcf1c77d1ded3406c 03af9026e126df5b2d) (the problem description requires more than i64, but int128 is enough), which grealy improved performance over my D baseline AND on the byte-array solution.* don't do "idup" every time * instead of byLine, try byLineCopy`idup` is necessary because the strings are stored in a Map after the iteration ends. Anyway, I replaced that with `byLineCopy` and it became sligthtly slower.* instead of "arr ~= data" try to use Appender (https://dlang.org/library/std/array/appender.html)I was worried about `~=` allocating too much, though IIUC it shouldn't be a problem the way I used it... in any case, I [replaced it with `Appender`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/f1364d0897f1f37882f1d 9b92c16b84b1abdc31) and the performance was completely unchanged - which is good as I think it means I used `~=` correctly to avoid overallocation (or I messed up using `Appender` :D - pls have a look).* also you could try to use splitter (https://dlang.org/library/std/algorithm/iteration/splitter.html) to lazily process each part of the dataThis is not necessary because that would only affect the time spent loading the dictionary, which is the faster part of the problem... nearly all of the time is spent looking for solutions instead.* isLastDigit function has many checks, but I think it could be implemented easier in a Rust wayThe Rust solution uses sum types for that, but that had a very minor effect on the overall performance (though in Rust, that's "cleaner" to use anyway)... I know D does have SumType in the stdlib, but I thought it is not as "zero cost" as in Rust, and because both variants wrap a String, it's awkward to use that... so I instead tried using a struct like this: ```d struct WordOrDigit { string value; bool isDigit; } ``` You can check [the commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0f6b4ce83373c46b14b5bb40c53bb0e2d0905e66). This change made the code slightly slower.* also consider to use functions from Range (filter, map) as you use it in Rust, instead of using for loopsWhy would for loops be slower? Or you're just saying I should make the D code nicer? Anyway, thanks for the suggestions! On Sunday, 14 January 2024 at 08:33:24 UTC, Anonymouse wrote:On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:Totally agree. I will spend some time on my Linux machine to see if I can get more useful profiling data. For now, with the int128 change from BigInt, the code is about 4x faster than before, but on larger problems, it's still scaling very badly compared to other languages (this suggests there's still some "exponential growth" going on, which should not happen as the problem is not exponential). ``` Proc,Memory(bytes),Time(ms) ===> ./rust ./rust,23252992,59 ./rust,23420928,263 ./rust,23412736,1025 ./rust,7069696,8661 ===> src/d/dencoder src/d/dencoder,11268096,72 src/d/dencoder,23781376,938 src/d/dencoder,23818240,4443 src/d/dencoder,10723328,134703 ``` On the bright side: D is using almost as little memory as Rust, which is orders of magnitude better than the other, usual GC languages.I would suggest to rewrite in the same way as Rust implemented. Probably you would like to try: [...]I would strongly argue for profiling first instead of optimising based on conjecture. If you profile you have solid evidence on what is actually slow. If you're very good at analysing D, well-educated hypotheses *may* be enough, until they suddenly aren't and you will have spent a lot of time on the wrong problem.
Jan 14
On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:I like to use a phone encoding problem to determine the strenghtness and weaknesses of programming languages because this problem is easy enough I can write solutions in any language in a few hours, but complex enough to exercise lots of interesting parts of the language. [...]Hello Renato, This seems to be quite a lot of calls: ``` ======== Timer frequency unknown, Times are in Megaticks ======== Num Tree Func Per Calls Time Time Call 19204964 3761 3756 0 pure nothrow ref trusted immutable(char)[][] core.internal.array.appending._d_arrayappendcTX!(immutable(char)[][], immutable(char)[])._d_arrayappendcTX(scope return ref immutable(char)[][], ulong) 19204924 8957 3474 0 safe void dencoder.printTranslations(immutable(char)[][][dencoder.Key], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], immutable(char)[][]) ``` This is when using the `words-quarter.txt` input (the `dictionary.txt` input seems to finish much faster, although still slower than `java`/`rust`). I also used only 100 phone numbers as input. My final observation is that `words-quarter.txt` contains some 1-letter inputs, (for example, `i` or `m`)...this may result in a large number of encoding permutations, which may explain the high number of recursion calls? Jordan
Jan 14
On Sunday, 14 January 2024 at 10:02:58 UTC, Jordan Wilson wrote:On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:The characteristics of the dictionary impact the number of solutions greatly. I explored this in my blog post about [Common Lisp, Part 2](https://renato.athaydes.com/posts/revenge_of_lisp-part-2). The fact there's a ridiculous amount of function calls is why this problem can take minutes even without having to allocate much memory or print anything. ** I've managed to improve the D code enough that it is now faster than Common Lisp and the equivalent algorithm in Java.** It took some profiling to do that, though... thanks to Anonymouse for the suggestion to use Valgrind... with that, I was able to profile the code nicely (Valgrind works nicely with D, it doesn't even show mangled names!). Here's what I did. First: the solution using int128 was much faster than the BigInt solution, as I had already mentioned. But when profiling that, it was clear that for the problems with a very large number of solutions, GC became a problem: ``` -------------------------------------------------------------------------------- 23,044,096,944 (100.0%) PROGRAM TOTALS -------------------------------------------------------------------------------- Ir file:function -------------------------------------------------------------------------------- 7,079,878,197 (30.72%) ???:core.internal.gc.impl.conservative.gc.Gcx.mark!(false, true, true).mark(core.internal.gc.impl.conservative.gc.Gcx.ScanRang !(false).ScanRange) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 2,375,100,857 (10.31%) ???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], immutable(char)[][])'2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 1,971,210,820 ( 8.55%) ???:_aaInX [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 1,922,961,924 ( 8.34%) ???:_d_arraysetlengthT [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 1,298,747,622 ( 5.64%) ???:core.int128.mul(core.int128.Cent, core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 1,134,644,706 ( 4.92%) ???:core.internal.gc.bits.GCBits.setLocked(ulong) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 849,587,834 ( 3.69%) ???:core.internal.gc.impl.conservative.gc.Gcx.smallAlloc(ulong, ref ulong, uint, const(TypeInfo)) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 827,407,988 ( 3.59%) ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcp _avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6] 688,845,027 ( 2.99%) ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset avx2_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6] 575,415,884 ( 2.50%) ???:_DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZ QDd6memory8BlkInfo_ [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 562,146,592 ( 2.44%) ???:core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), core.internal.gc.impl.conservative.gc.mallocTime, core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo)) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] 526,067,586 ( 2.28%) ???:core.internal.spinlock.SpinLock.lock() shared [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128] ``` So, I [decided to use `std.container.Array`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/b927bc9cc4e33f9f6ba45 d8abc3bccbc17c7f1e) (I had tried Appender before but that didn't really work) to make sure no allocation was happening when adding/removing words from the `words` which are carried through the `printTranslations` calls (that's the hot path). As you can see in the commit, the code is just slightly less convenient... this made the code run considerably faster for the very long runs! Here's the Callgrind profiling data AFTER this change: ``` 6,547,316,944 (32.42%) ???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], std.container.array.Array!(immutable(char)[]).Array)'2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 5,229,076,596 (25.90%) ???:_aaInX [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 3,871,644,402 (19.17%) ???:core.int128.mul(core.int128.Cent, core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 677,533,800 ( 3.36%) ???:std.int128.Int128.toHash() const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 543,388,688 ( 2.69%) ???:std.int128.Int128.this(core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 543,388,688 ( 2.69%) ???:std.int128.Int128.this(long, long) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 542,027,045 ( 2.68%) ???:object.TypeInfo_Struct.getHash(scope const(void*)) const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 424,849,937 ( 2.10%) ???:object.TypeInfo_Struct.equals(in void, in void) const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] ``` No more GC! It seems that now, as expected, most time is spent calculating hashes for each possible solution (as it should be). I am not entirely sure what the `_aaInX` call is, but I believe that's the associative array's membership check? The D profiler seems to show similar data: ``` ======== Timer frequency unknown, Times are in Megaticks ======== Num Tree Func Per Calls Time Time Call 104001600 158 158 0 const pure nothrow nogc safe std.int128.Int128 std.int128.Int128.opBinary!("*").opBinary(std.int128.Int128) 402933936 63 63 0 const pure nothrow property nogc safe bool std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.RefCountedStore.isInitialized() 183744166 87 59 0 inout pure nothrow ref property nogc return safe inout(std.container.array.Array!(immutable(char)[]).Array.Payload) std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.refCountedPayload() 103784880 68 38 0 const pure nothrow nogc safe std.int128.Int128 std.int128.Int128.opBinary!("+", int).opBinary(const(int)) 103784880 99 31 0 pure nothrow ref nogc safe std.int128.Int128 std.int128.Int128.opOpAssign!("+", int).opOpAssign(int) 104001600 30 30 0 const pure nothrow nogc safe std.int128.Int128 std.int128.Int128.opBinary!("+").opBinary(std.int128.Int128) 61167377 67 29 0 pure nothrow nogc void std.container.array.Array!(immutable(char)[]).Array.__fieldDtor() 43444575 61 27 0 pure nothrow nogc safe void core.internal.lifetime.emplaceRef!(immutable(char)[], immutable(char)[], immutable(char)[]).emplaceRef(ref immutable(char)[], ref immutable(char)[]) 48427530 79 27 0 const pure nothrow property nogc safe bool std.container.array.Array!(immutable(char)[]).Array.empty() 61167377 38 27 0 pure nothrow nogc void std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.__dtor() 43444575 130 24 0 pure nothrow nogc ulong std.container.array.Array!(immutable(char)[]).Array.Payload.insertBack!(immutable(char)[]).insertBack(immutable(char)[]) 61167376 32 23 0 pure nothrow nogc safe void std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.__postblit() ``` Anyway, after this change, the code now scaled much better. But there's nothing much lest to be optimised... except the arithmetics (notice how the `("*").opBinary` call is near the top. I know how to make that faster using a similar strategy that I used in Rust (you can check my [journey to optimise the Rust code on my blog post](https://renato.athaydes.com/posts/how-to-write-fast-rust-code) about that - specifically, check the `Using packed bytes for more efficient storage` section)... knowing how the encoding works, I knew that multiplication is not really needed. So, instead of doing this: ```d n *= 10; n += c - '0'; ``` I could do this: ```d n <<= 4; n += c; ``` This changes the actual number, but that doesn't matter: the code is still unique for each number, which is all that we need. You can see [the full commit here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c). This improved the performance for sure, even if not by much. The profiling data after this arithmetic trick looks like this: Valgrind: ``` 4,821,217,799 (35.75%) ???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], std.container.array.Array!(immutable(char)[]).Array)'2 [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 4,241,404,850 (31.45%) ???:_aaInX [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 1,038,733,782 ( 7.70%) ???:core.int128.shl(core.int128.Cent, uint) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 575,372,270 ( 4.27%) ???:std.int128.Int128.toHash() const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 461,659,456 ( 3.42%) ???:std.int128.Int128.this(core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 460,297,816 ( 3.41%) ???:object.TypeInfo_Struct.getHash(scope const(void*)) const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 347,198,989 ( 2.57%) ???:object.TypeInfo_Struct.equals(in void, in void) const [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] 288,537,160 ( 2.14%) ???:core.int128.add(core.int128.Cent, core.int128.Cent) [/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder] ``` D Profiler: ``` ======== Timer frequency unknown, Times are in Megaticks ======== Num Tree Func Per Calls Time Time Call 29879388 44037 11267 0 void dencoder.printTranslations(immutable(char)[][][std.int128.Int128], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], std.container.array.Array!(immutable(char)[]).Array) 126827950 4306 3599 0 inout pure nothrow ref property nogc return safe inout(std.container.array.Array!(immutable(char)[]).Array.Payload) std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.refCountedPayload() 29879268 7293 3100 0 pure nothrow nogc ulong std.container.array.Array!(immutable(char)[]).Array.Payload.insertBack!(immutable(char)[]).insertBack(immutable(char)[]) 33534720 4896 2780 0 const pure nothrow property nogc safe bool std.container.array.Array!(immutable(char)[]).Array.empty() 29879268 11834 2479 0 pure nothrow nogc ulong std.container.array.Array!(immutable(char)[]).Array.insertBack!(immutable(char)[]).insertBack(immutable(char)[]) 29879268 8702 2279 0 pure nothrow nogc safe void std.container.array.Array!(immutable(char)[]).Array.removeBack() 282919518 2045 2045 0 const pure nothrow property nogc safe bool std.typecons.RefCounted!(std.container.array.Array!(immutable(cha )[]).Array.Payload, 0).RefCounted.RefCountedStore.isInitialized() 29879268 2534 1859 0 pure nothrow nogc safe void core.internal.lifetime.emplaceRef!(immutable(char)[], immutable(char)[], immutable(char)[]).emplaceRef(ref immutable(char)[], ref immutable(char)[]) 44511077 3264 1505 0 pure nothrow nogc void std.container.array.Array!(immutable(char)[]).Array.__fieldDtor() 44511076 2588 1254 0 pure nothrow nogc scope void std.container.array.Array!(immutable(char)[]).Array.__fieldPostblit() 49758574 1684 1243 0 const pure nothrow nogc safe std.int128.Int128 std.int128.Int128.opBinary!("+", char).opBinary(const(char)) 49758574 2907 1223 0 pure nothrow ref nogc safe std.int128.Int128 std.int128.Int128.opOpAssign!("+", immutable(char)).opOpAssign(ref immutable(char)) 49975294 1796 1208 0 pure nothrow ref nogc safe std.int128.Int128 std.int128.Int128.opOpAssign!("<<", int).opOpAssign(int) ``` As far as I can tell, the only two bottlenecks now bacame: * `std.typecons.RefCounted!(Array.Payload, 0).RefCounted.refCountedPayload()` * the `Int128` implementation! To fix the former, I would need to implement my own `Array`, I suppose... using ref-counting here seems unnecessary?? But that is a bit beyond what I am willing to do! The latter could probably be fixed by not using a numeric key at all, just bytes - but my previous attempt at doing that didn't really give better results. Or perhaps by overriding the hashing for Int128 (I didn't try that because I assume the default should be pretty optimal already)?? So, I ran out of options and am going to call it done. [Here's my final D solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/src/d/src/dencoder.d). The code is still readable (I think everyone can agree D is one of the most readable languages of all)! You can see how it performs in the last comparison run I did ([all data, including a nice chart, in this gist](https://gist.github.com/renatoathaydes/ab5c86b0ea59152693a7236c333ac334)): ``` Proc,Run,Memory(bytes),Time(ms) ===> java -Xms20M -Xmx100M -cp build/java Main java-Main,3190730752,441 java-Main,3194789888,1440 java-Main,3193692160,2316 java-Main,3194720256,3604 java-Main,3192963072,12861 java-Main,3261128704,31282 ===> sbcl --script src/lisp/main.fasl sbcl,1275994112,83 sbcl,1275998208,1396 sbcl,1275998208,5925 sbcl,1275998208,9752 sbcl,1275998208,64811 sbcl,1276006400,153799 ===> ./rust ./rust,23138304,56 ./rust,23138304,234 ./rust,23138304,1288 ./rust,23138304,2444 ./rust,9027584,15867 ./rust,9027584,36985 ===> src/d/dencoder src/d/dencoder,219041792,67 src/d/dencoder,229904384,1114 src/d/dencoder,229904384,4421 src/d/dencoder,229904384,9087 src/d/dencoder,219041792,51315 src/d/dencoder,219041792,120818 ``` Using `Int128` as a key instead of `BigInt` made the code much faster (around 4x faster). Using `Array` to avoid too many reallocations with primitive dynamic arrays made the code run much faster for very large numbers of calls (very little difference when running for less than 10s, but at least 2x faster at round 1min runs where the tiny overhead adds up). As you can see [in the chart I posted](https://gist.github.com/renatoathaydes/ab5c86b0ea5915 693a7236c333ac334), D's speed performance is close to that of Common Lisp for all runs, though between 10 and 20% faster. It's nowhere near Rust, unfortunately, with Rust being almost 4x faster (pls ignore the Java solution as that's using a Trie instead of the "numeric" solution - so it's not a fair comparison). I had expected to get very close to Rust, but that didn't happen... I just can't see in the profiling data what's causing D to fall so far behind! On the memory data: D uses much less memory than Java and Common Lisp, but still a lot higher than Rust. If anyone can find any flaw in my methodology or optmise my code so that it can still get a couple of times faster, approaching Rust's performance, I would greatly appreciate that! But for now, my understanding is that the most promising way to get there would be to write D in `betterC` style?!I like to use a phone encoding problem to determine the strenghtness and weaknesses of programming languages because this problem is easy enough I can write solutions in any language in a few hours, but complex enough to exercise lots of interesting parts of the language. [...]Hello Renato, This seems to be quite a lot of calls: ``` ======== Timer frequency unknown, Times are in Megaticks ======== Num Tree Func Per Calls Time Time Call 19204964 3761 3756 0 pure nothrow ref trusted immutable(char)[][] core.internal.array.appending._d_arrayappendcTX!(immutable(char)[][], immutable(char)[])._d_arrayappendcTX(scope return ref immutable(char)[][], ulong) 19204924 8957 3474 0 safe void dencoder.printTranslations(immutable(char)[][][dencoder.Key], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], immutable(char)[][]) ``` This is when using the `words-quarter.txt` input (the `dictionary.txt` input seems to finish much faster, although still slower than `java`/`rust`). I also used only 100 phone numbers as input. My final observation is that `words-quarter.txt` contains some 1-letter inputs, (for example, `i` or `m`)...this may result in a large number of encoding permutations, which may explain the high number of recursion calls? Jordan
Jan 14
On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:If anyone can find any flaw in my methodology or optmise my code so that it can still get a couple of times faster, approaching Rust's performance, I would greatly appreciate that! But for now, my understanding is that the most promising way to get there would be to write D in `betterC` style?!I've added port from Rust in the PR comment. Can you please check this solution? Most probably it need to be optimized with profiler. Just interesting how close-enough port will work.
Jan 14
On Monday, 15 January 2024 at 01:10:14 UTC, Sergey wrote:On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:As discussed on GitHub, the line-by-line port of the Rust code is 5x slower than [my latest solution using int128](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/sr /d/src/dencoder.d), which is itself 3 to 4x slower than the Rust implementation (at around the same order of magnitude as algorithm-equivalent Java and Common Lisp implementations, D is perhaps 15% faster). I did the best I could to make D run faster, but we hit a limit that's a bit hard to get past now. Happy to be given suggestions (see profiling information in previous messages), but I've run out of ideas myself.If anyone can find any flaw in my methodology or optmise my code so that it can still get a couple of times faster, approaching Rust's performance, I would greatly appreciate that! But for now, my understanding is that the most promising way to get there would be to write D in `betterC` style?!I've added port from Rust in the PR comment. Can you please check this solution? Most probably it need to be optimized with profiler. Just interesting how close-enough port will work.
Jan 15
On Mon, Jan 15, 2024 at 08:10:55PM +0000, Renato via Digitalmars-d-learn wrote:On Monday, 15 January 2024 at 01:10:14 UTC, Sergey wrote:This problem piqued my interest, so yesterday and today I worked on it and came up with my own solution (I did not look at existing solutions in order to prevent bias). I have not profiled it or anything, but the runtime seems quite promising. Here it is: ---------------------------------snip---------------------------------- /** * Encoding phone numbers according to a dictionary. */ import std; /** * Table of digit mappings. */ static immutable ubyte[dchar] digitOf; shared static this() { digitOf = [ 'E': 0, 'J': 1, 'N': 1, 'Q': 1, 'R': 2, 'W': 2, 'X': 2, 'D': 3, 'S': 3, 'Y': 3, 'F': 4, 'T': 4, 'A': 5, 'M': 5, 'C': 6, 'I': 6, 'V': 6, 'B': 7, 'K': 7, 'U': 7, 'L': 8, 'O': 8, 'P': 8, 'G': 9, 'H': 9, 'Z': 9, ]; } /** * Trie for storing dictionary words according to the phone number mapping. */ class Trie { Trie[10] edges; string[] words; private void insert(string word, string suffix) { const(ubyte)* dig; while (!suffix.empty && (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null) { suffix = suffix[1 .. $]; } if (suffix.empty) { words ~= word; return; } auto node = new Trie; auto idx = *dig; if (edges[idx] is null) { edges[idx] = new Trie; } edges[idx].insert(word, suffix[1 .. $]); } /** * Insert a word into the Trie. * * Characters that don't map to any digit are ignored in building the Trie. * However, the original form of the word will be retained as-is in the * leaf node. */ void insert(string word) { insert(word, word[]); } /** * Iterate over all words stored in this Trie. */ void foreachEntry(void delegate(string path, string word) cb) { void impl(Trie node, string path = "") { if (node is null) return; foreach (word; node.words) { cb(path, word); } foreach (i, child; node.edges) { impl(child, path ~ cast(char)('0' + i)); } } impl(this); } } /** * Loads the given dictionary into a Trie. */ Trie loadDictionary(R)(R lines) if (isInputRange!R & is(ElementType!R : const(char)[])) { Trie result = new Trie; foreach (line; lines) { result.insert(line.idup); } return result; } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto app = appender!(string[]); dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); }); assert(app.data == [ "10: je", "105513: jemand", "107: neu", "1550: Name", "253302: Wasser", "35: da", "38: so", "400: Fee", "4021: fern", "4034: Fest", "482: Tor", "4824: fort", "4824: Torf", "51: an", "562: mir", "562: Mix", "56202: Mixer", "78: Bo\"", "783: bo\"s", "7857: blau", "7884: Boot", "824: Ort", "83: o\"d" ]); } /** * Find all encodings of the given phoneNumber according to the given * dictionary, and write each encoding to the given sink. */ void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink) if (isOutputRange!(W, string)) { bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit) { if (node is null) return false; // Ignore non-digit characters in phone number while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9')) suffix = suffix[1 .. $]; if (suffix.empty) { // Found a match, print result foreach (word; node.words) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(only(word)))); } return !node.words.empty; } bool ret; foreach (word; node.words) { // Found a matching word, try to match the rest of the phone // number. if (impl(dict, suffix, path ~ word, true)) { allowDigit = false; ret = true; } } if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false)) { allowDigit = false; ret = true; } if (allowDigit) { // If we got here, it means that if we take the current node as the // next word choice, then the following digit will have no further // matches, and we may encode it as a single digit. auto nextSuffix = suffix[1 .. $]; if (nextSuffix.empty) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(suffix[0 .. 1].only))); ret = true; } else { if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false)) ret = true; } } return ret; } impl(dict, phoneNumber[], [], true); } /** * Encode the given input range of phone numbers according to the given * dictionary, writing the output to the given sink. */ void encodePhoneNumbers(R,W)(R input, Trie dict, W sink) if (isInputRange!R & is(ElementType!R : const(char)[]) && isOutputRange!(W, string)) { foreach (line; input) { findMatches(dict, line, sink); } } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto input = [ "112", "5624-82", "4824", "0721/608-4067", "10/783--5", "1078-913-5", "381482", "04824", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); //writefln("%-(%s\n%)", app.data); assert(app.data.sort.release == [ "04824: 0 Tor 4", "04824: 0 Torf", "04824: 0 fort", "10/783--5: je Bo\" da", "10/783--5: je bo\"s 5", "10/783--5: neu o\"d 5", "381482: so 1 Tor", "4824: Tor 4", "4824: Torf", "4824: fort", "5624-82: Mix Tor", "5624-82: mir Tor", ]); } /** * Program entry point. */ int main(string[] args) { File input = stdin; bool countOnly; auto info = getopt(args, "c|count", "Count solutions only", &countOnly, ); int showHelp() { stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]", args[0]); defaultGetoptPrinter("", info.options); return 1; } if (info.helpWanted || args.length < 2) return showHelp(); auto dictfile = args[1]; if (args.length > 2) input = File(args[2]); Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(input.byLine, dict, (string match) { count++; }); writefln("Number of solutions: %d", count); } else { encodePhoneNumbers(input.byLine, dict, (string match) { writeln(match); }); } return 0; } ---------------------------------snip---------------------------------- The underlying idea is to store the dictionary entries in a trie (radix tree) keyed by the prescribed digit mapping. Once this is done, encoding a phone number is as simple as following each digit down the trie and enumerating words along the path. This means that once the dictionary is encoded, we no longer need to do any digit mappings; the digits of the input phone number is enough to determine the path down the trie. Of course, pursuant to the original specifications of the problem, the original word forms are stored as-is along each node corresponding with the end of the word, so once a match is found the program simply enumerates the words found at that node. // Unfortunately there seems to be some discrepancy between the output I got and the prescribed output in your repository. For example, in your output the number 1556/0 does not have an encoding, but isn't "1 Mai 0" a valid encoding according to your dictionary and the original problem description? T -- Engineering almost by definition should break guidelines. -- Ali ÇehreliOn Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:As discussed on GitHub, the line-by-line port of the Rust code is 5x slower than [my latest solution using int128](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/src/d/src/dencoder.d), which is itself 3 to 4x slower than the Rust implementation (at around the same order of magnitude as algorithm-equivalent Java and Common Lisp implementations, D is perhaps 15% faster). I did the best I could to make D run faster, but we hit a limit that's a bit hard to get past now. Happy to be given suggestions (see profiling information in previous messages), but I've run out of ideas myself.If anyone can find any flaw in my methodology or optmise my code so that it can still get a couple of times faster, approaching Rust's performance, I would greatly appreciate that! But for now, my understanding is that the most promising way to get there would be to write D in `betterC` style?!I've added port from Rust in the PR comment. Can you please check this solution? Most probably it need to be optimized with profiler. Just interesting how close-enough port will work.
Jan 16
On Tuesday, 16 January 2024 at 15:50:35 UTC, H. S. Teoh wrote:Unfortunately there seems to be some discrepancy between the output I got and the prescribed output in your repository. For example, in your output the number 1556/0 does not have an encoding, but isn't "1 Mai 0" a valid encoding according to your dictionary and the original problem description?You are not allowed to emit "1" as the first token in the output as long as there are any dictionary word matches at that position. The relevant paragraph from the problem statement: ==snip== Encodings of phone numbers can consist of a single word or of multiple words separated by spaces. The encodings are built word by word from left to right. If and only if at a particular point no word at all from the dictionary can be inserted, a single digit from the phone number can be copied to the encoding instead. Two subsequent digits are never allowed, though. To put it differently: In a partial encoding that currently covers k digits, digit k+1 is encoded by itself if and only if, first, digit k was not encoded by a digit and, second, there is no word in the dictionary that can be used in the encoding starting at digit k+1. ==snip== I also spent a bit of time trying to figure out this nuance when implementing my solution. It doesn't make much sense visually (no back-to-back digits in the output either way), but that's how it is.
Jan 16
On Tuesday, 16 January 2024 at 16:56:04 UTC, Siarhei Siamashka wrote:On Tuesday, 16 January 2024 at 15:50:35 UTC, H. S. Teoh wrote:Exactly, this is one of the things that make this problem a bit annoying to solve :) "H. S. Teoh" you implemented the solution as a Trie!! Nice, that's also what I did when I "participated" in the study. Here's [my Trie solution in Java](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/java/Main.java). These are basically the two common approaches to the problem: a Trie or a numeric-based table. According to the study, people who use scripting languages almost always go with the numeric approach, while people coming from lower level languages tend to use a data structure like Trie (if they don't know Trie, they come up with something similar which is fascinating), which is harder to implement but more efficient in general. Can I ask you why didn't you use the [D stdlib Trie](https://dlang.org/phobos/std_uni.html#codepointTrie)? Not sure that would've worked, but did you consider that?Unfortunately there seems to be some discrepancy between the output I got and the prescribed output in your repository. For example, in your output the number 1556/0 does not have an encoding, but isn't "1 Mai 0" a valid encoding according to your dictionary and the original problem description?You are not allowed to emit "1" as the first token in the output as long as there are any dictionary word matches at that position. The relevant paragraph from the problem statement: ==snip== Encodings of phone numbers can consist of a single word or of multiple words separated by spaces. The encodings are built word by word from left to right. If and only if at a particular point no word at all from the dictionary can be inserted, a single digit from the phone number can be copied to the encoding instead. Two subsequent digits are never allowed, though. To put it differently: In a partial encoding that currently covers k digits, digit k+1 is encoded by itself if and only if, first, digit k was not encoded by a digit and, second, there is no word in the dictionary that can be used in the encoding starting at digit k+1. ==snip== I also spent a bit of time trying to figure out this nuance when implementing my solution. It doesn't make much sense visually (no back-to-back digits in the output either way), but that's how it is.
Jan 16
On Tue, Jan 16, 2024 at 06:54:56PM +0000, Renato via Digitalmars-d-learn wrote:On Tuesday, 16 January 2024 at 16:56:04 UTC, Siarhei Siamashka wrote:[...]Ohhh now I get it. Initially I misunderstood that as saying that if the rest of the phone number has at least one match, then a digit is not allowed. Now I see that what it's actually saying is that even if some random dictionary word matches at that position, even if it does not lead to any full matches, then a digit is excluded. [...]You are not allowed to emit "1" as the first token in the output as long as there are any dictionary word matches at that position. The relevant paragraph from the problem statement:It's a strange requirement, for sure, but I don't think it's annoying. It makes the problem more Interesting(tm). ;-) Anyway, I've fixed the problem, now my program produces the exact same output as Renato's repo. Code is posted below. Interestingly enough, the running time has now halved to about 0.9 seconds for 1 million phone numbers. I guess that's caused by the more stringent requirement excluding many more matching possibilities, effectively pruning away large parts of the search tree.I also spent a bit of time trying to figure out this nuance when implementing my solution. It doesn't make much sense visually (no back-to-back digits in the output either way), but that's how it is.Exactly, this is one of the things that make this problem a bit annoying to solve :)"H. S. Teoh" you implemented the solution as a Trie!! Nice, that's also what I did when I "participated" in the study. Here's [my Trie solution in Java](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/java/Main.java). These are basically the two common approaches to the problem: a Trie or a numeric-based table. According to the study, people who use scripting languages almost always go with the numeric approach, while people coming from lower level languages tend to use a data structure like Trie (if they don't know Trie, they come up with something similar which is fascinating), which is harder to implement but more efficient in general.Interesting. I guess my C/C++ background is showing. ;-) I'm not sure what exactly motivated me to go this route; I guess it was just my default preference of choosing the path of least work as far as the algorithm is concerned: I chose the algorithm that did the least amount of work needed to produce the right answer. Scanning through sections of the dictionary to find a match was therefore excluded; so my first thought was an AA. But then how much of the initial prefix to look up an in AA? Since it can't be known beforehand, I'd have to gradually lengthen the prefix to search for, which does a lot of repetitive work (we keep looking up the first few digits repeatedly each time we search for a longer prefix). Plus, multiple consecutive AA lookups is not cache-friendly. So my next thought was, what should I do such that I don't have to look at the initial digits anymore once I already processed it? This line of thought naturally led to a trie structure. Once I arrived at a trie structure, the next question was how exactly dictionary entries would be stored in it. Again, in the vein of doing the least amount of work I could get away with, I thought, if I stored words in the trie directly, with each edge encoding a letter, then during the search I'd have to repeatedly convert letters to the corresponding phone number digit and vice versa. So why not do this conversion beforehand, and store only phone digits in the trie? This also had the additional benefit of letting me effectively search multiple letters simultaneously, since multiple letters map to the same digit, so scanning a digit is equivalent to searching multiple letters at the same time. The output, of course, required the original form of the words -- so the obvious solution was to attach the original words as a list of words attached to the trie node representing the end of that word. Once this was all decided, the only remaining question was the search algorithm. This turned out to take the most time in solving this problem, due to the recursive nature of the search, I had to grapple with where and how to make the recursive calls, and how to propagate return values correctly. The initial implementation only found word matches, and did not allow the single digits. Happily, the recursive algorithm turned out to have enough structure to encode the single digit requirements as well, although it took a bit of trial and error to find the correct implementation.Can I ask you why didn't you use the [D stdlib Trie](https://dlang.org/phobos/std_uni.html#codepointTrie)? Not sure that would've worked, but did you consider that?Haha, I didn't even think of that. :-D I wouldn't have wanted to use it anyway, because it was optimized for single codepoint lookups, and wouldn't have matched what I wanted to do. Besides, I also understood it as an internal helper structure used by std.uni, so it isn't really designed as a trie implementation for general use. Also, since performance was an issue in your OP, I wanted to avoid off-the-bat any hidden costs that may be present in a pre-existing trie implementation that I did not fully understand. Maybe I'm just affected with NIH syndrome, but IME, when performance is important you usually want to write a custom data structure for your inner loop, one that's fine-tuned to your specific algorithm. Otherwise the impedance mismatch may give you suboptimal results. T -- Knowledge is that area of ignorance that we arrange and classify. -- Ambrose Bierce
Jan 16
On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]Anyway, I've fixed the problem, now my program produces the exact same output as Renato's repo. Code is posted below.[...] Oops, forgot to actually paste the code. Here it is: ------------------------------------snip------------------------------------ /** * Encoding phone numbers according to a dictionary. */ import std; /** * Table of digit mappings. */ static immutable ubyte[dchar] digitOf; shared static this() { digitOf = [ 'E': 0, 'J': 1, 'N': 1, 'Q': 1, 'R': 2, 'W': 2, 'X': 2, 'D': 3, 'S': 3, 'Y': 3, 'F': 4, 'T': 4, 'A': 5, 'M': 5, 'C': 6, 'I': 6, 'V': 6, 'B': 7, 'K': 7, 'U': 7, 'L': 8, 'O': 8, 'P': 8, 'G': 9, 'H': 9, 'Z': 9, ]; } /** * Trie for storing dictionary words according to the phone number mapping. */ class Trie { Trie[10] edges; string[] words; private void insert(string word, string suffix) { const(ubyte)* dig; while (!suffix.empty && (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null) { suffix = suffix[1 .. $]; } if (suffix.empty) { words ~= word; return; } auto node = new Trie; auto idx = *dig; if (edges[idx] is null) { edges[idx] = new Trie; } edges[idx].insert(word, suffix[1 .. $]); } /** * Insert a word into the Trie. * * Characters that don't map to any digit are ignored in building the Trie. * However, the original form of the word will be retained as-is in the * leaf node. */ void insert(string word) { insert(word, word[]); } /** * Iterate over all words stored in this Trie. */ void foreachEntry(void delegate(string path, string word) cb) { void impl(Trie node, string path = "") { if (node is null) return; foreach (word; node.words) { cb(path, word); } foreach (i, child; node.edges) { impl(child, path ~ cast(char)('0' + i)); } } impl(this); } } /** * Loads the given dictionary into a Trie. */ Trie loadDictionary(R)(R lines) if (isInputRange!R & is(ElementType!R : const(char)[])) { Trie result = new Trie; foreach (line; lines) { result.insert(line.idup); } return result; } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto app = appender!(string[]); dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); }); assert(app.data == [ "10: je", "105513: jemand", "107: neu", "1550: Name", "253302: Wasser", "35: da", "38: so", "400: Fee", "4021: fern", "4034: Fest", "482: Tor", "4824: fort", "4824: Torf", "51: an", "562: mir", "562: Mix", "56202: Mixer", "78: Bo\"", "783: bo\"s", "7857: blau", "7884: Boot", "824: Ort", "83: o\"d" ]); } /** * Find all encodings of the given phoneNumber according to the given * dictionary, and write each encoding to the given sink. */ void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink) if (isOutputRange!(W, string)) { bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit) { if (node is null) return false; // Ignore non-digit characters in phone number while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9')) suffix = suffix[1 .. $]; if (suffix.empty) { // Found a match, print result foreach (word; node.words) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(only(word)))); } return !node.words.empty; } bool ret; foreach (word; node.words) { // Found a matching word, try to match the rest of the phone // number. ret = true; if (impl(dict, suffix, path ~ word, true)) allowDigit = false; } if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false)) { allowDigit = false; ret = true; } if (allowDigit) { // If we got here, it means that if we take the current node as the // next word choice, then the following digit will have no further // matches, and we may encode it as a single digit. auto nextSuffix = suffix[1 .. $]; if (nextSuffix.empty) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(suffix[0 .. 1].only))); ret = true; } else { if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false)) ret = true; } } return ret; } // Trim trailing non-digits from phone number auto suffix = phoneNumber[]; while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9')) { suffix = suffix[0 .. $-1]; } impl(dict, suffix, [], true); } /** * Encode the given input range of phone numbers according to the given * dictionary, writing the output to the given sink. */ void encodePhoneNumbers(R,W)(R input, Trie dict, W sink) if (isInputRange!R & is(ElementType!R : const(char)[]) && isOutputRange!(W, string)) { foreach (line; input) { findMatches(dict, line, sink); } } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto input = [ "112", "5624-82", "4824", "0721/608-4067", "10/783--5", "1078-913-5", "381482", "04824", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); //writefln("\n%-(%s\n%)", app.data); assert(app.data.sort.release == [ "04824: 0 Tor 4", "04824: 0 Torf", "04824: 0 fort", "10/783--5: je Bo\" da", "10/783--5: je bo\"s 5", "10/783--5: neu o\"d 5", "381482: so 1 Tor", "4824: Tor 4", "4824: Torf", "4824: fort", "5624-82: Mix Tor", "5624-82: mir Tor", ]); } unittest { auto dict = loadDictionary(q"ENDDICT Bias ja Mai Reck Weib USA ENDDICT".splitLines); auto input = [ "/7-357653152/0677-", "/7-3576-", "/8-", "1556/0", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); //writefln("\n%-(%s\n%)", app.data); assert(app.data.sort.release == [ "/7-357653152/0677-: USA Bias ja Reck 7", "/7-357653152/0677-: USA Bias ja Weib 7", "/8-: 8", /* Note: 1556/0 should NOT encode as "1 Mai 0" because the initial "15" * matches "ja", thus excluding a digit in that position. */ ]); } /** * Program entry point. */ int main(string[] args) { File input = stdin; bool countOnly; auto info = getopt(args, "c|count", "Count solutions only", &countOnly, ); int showHelp() { stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]", args[0]); defaultGetoptPrinter("", info.options); return 1; } if (info.helpWanted || args.length < 2) return showHelp(); auto dictfile = args[1]; if (args.length > 2) input = File(args[2]); Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(input.byLine, dict, (string match) { count++; }); writefln("Number of solutions: %d", count); } else { encodePhoneNumbers(input.byLine, dict, (string match) { writeln(match); }); } return 0; } ------------------------------------snip------------------------------------ T -- The early bird gets the worm. Moral: ewww...
Jan 16
On Tuesday, 16 January 2024 at 20:34:48 UTC, H. S. Teoh wrote:On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]Great, I ran the benchmarks for you :) I had to change how you accept arguments, even though you did "the right thing" using `getopt`, the solutions should just take a `count` or `print` argument first... Anyway, here's your result: ``` ===> ./rust ./rust,24133632,25 ./rust,24739840,130 ./rust,24477696,536 ./rust,25247744,1064 ./rust,8175616,6148 ./rust,8306688,8315 ===> src/d/dencoder src/d/dencoder,46055424,43 src/d/dencoder,96337920,146 src/d/dencoder,102350848,542 src/d/dencoder,102268928,1032 src/d/dencoder,40206336,99936 ^C ``` It took too long with the `count` option, so I had to abort before the last run ended... there's probably some bug there, otherwise the Trie runs very fast, as I had expected. For the record (I already posted this on GitHub)... here's [my current fastest solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s c/d/src/dencoder.d) time using the same algorithm as Rust (after I fixed my solution that was using `int128`, which was not valid, as ssvb pointed out): ``` Proc,Run,Memory(bytes),Time(ms) ===> ./rust ./rust,23986176,24 ./rust,24346624,118 ./rust,24543232,525 ./rust,24346624,1027 ./rust,8011776,3830 ./rust,8486912,18403 ===> src/d/dencoder src/d/dencoder,13615104,30 src/d/dencoder,24723456,374 src/d/dencoder,24592384,1750 src/d/dencoder,24788992,3472 src/d/dencoder,11517952,10235 src/d/dencoder,11517952,51211 ``` Not great :( But ssvb came up with a really fast implementation, though totally different algorithm (so dont' take this as evidence of D being faster than Rust or anything like that): ``` Proc,Run,Memory(bytes),Time(ms) ===> ./rust ./rust,23986176,24 ./rust,24461312,135 ./rust,24494080,514 ./rust,24526848,981 ./rust,8175616,3647 ./rust,8011776,15404 ===> src/d/dencoder src/d/dencoder,16433152,30 src/d/dencoder,16613376,125 src/d/dencoder,16613376,468 src/d/dencoder,16613376,941 src/d/dencoder,5570560,12 src/d/dencoder,6701056,18 ``` I can't explain why it's so incredibly fast, specially for the `count` case. I tried using the same hashing function on my solution, but that didn't really help me! Pretty cool to see different solutions to the problem, but I'm still curious to know where the solution I wrote is being slow compared to Rust which is using identical, to my understanding, code! I'm sure people could write incredibly fast code to solve this problem, but what I am really curious about is what the code I wrote is doing wrong that causes it to run 4x slower than Rust despite doing "the same thing"...Anyway, I've fixed the problem, now my program produces the exact same output as Renato's repo. Code is posted below.[...]
Jan 16
On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:I can't explain why it's so incredibly fast, specially for the `count` case. I tried using the same hashing function on my solution, but that didn't really help me!That's dynamic programming with memoization. Basically caching the already calculated results (in the `dp` array) and avoiding recalculations when the recursive function revisits the same state again.
Jan 16
On Tue, Jan 16, 2024 at 09:15:19PM +0000, Renato via Digitalmars-d-learn wrote:On Tuesday, 16 January 2024 at 20:34:48 UTC, H. S. Teoh wrote:Oops, haha :-POn Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]Great, I ran the benchmarks for you :) I had to change how you accept arguments, even though you did "the right thing" using `getopt`, the solutions should just take a `count` or `print` argument first...Anyway, I've fixed the problem, now my program produces the exact same output as Renato's repo. Code is posted below.[...]Anyway, here's your result: ``` ===> ./rust ./rust,24133632,25 ./rust,24739840,130 ./rust,24477696,536 ./rust,25247744,1064 ./rust,8175616,6148 ./rust,8306688,8315 ===> src/d/dencoder src/d/dencoder,46055424,43 src/d/dencoder,96337920,146 src/d/dencoder,102350848,542 src/d/dencoder,102268928,1032 src/d/dencoder,40206336,99936 ^C ``` It took too long with the `count` option, so I had to abort before the last run ended... there's probably some bug there, otherwise the Trie runs very fast, as I had expected.[...] Do you have the problematic data file handy? I'd like to look into any potential bugs. Also, the profiler revealed that a lot of time was spent in the GC and in small allocations. The cause is in all likelihood the .format() call for each found match, and array append being used for the recursive calls. Getting rid of the .format ought to speed it up a bit. Will try that now... T -- If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
Jan 16
On Tuesday, 16 January 2024 at 22:13:55 UTC, H. S. Teoh wrote:used for the recursive calls. Getting rid of the .format ought to speed it up a bit. Will try that now...That will make no difference for the `count` option which is where your solution was very slow. To run the slow test manually use the `words_quarter.txt` dictionary (the phone numbers file doesn't matter much - it's all in the dictionary). But pls run the benchmarks yourself as I am not going to keep running it for you, and would be nice if you posted your solution on a Gist for example, pasting lots of code in the forum makes it difficult to follow.
Jan 16
On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote:On Tuesday, 16 January 2024 at 22:13:55 UTC, H. S. Teoh wrote:Of course it will. Passing the data directly to the callback that bumps a counter is faster than allocating a new string, formatting the data, and then passing it to the callback that bumps a counter. It may not look like much, but avoiding unnecessary GC allocations means the GC will have less work to do later when a collection is run, thus you save time over the long term.used for the recursive calls. Getting rid of the .format ought to speed it up a bit. Will try that now...That will make no difference for the `count` option which is where your solution was very slow.To run the slow test manually use the `words_quarter.txt` dictionary (the phone numbers file doesn't matter much - it's all in the dictionary). But pls run the benchmarks yourself as I am not going to keep running it for you, and would be nice if you posted your solution on a Gist for example, pasting lots of code in the forum makes it difficult to follow.I'll push the code to github. T -- "No, John. I want formats that are actually useful, rather than over-featured megaliths that address all questions by piling on ridiculous internal links in forms which are hideously over-complex." -- Simon St. Laurent on xml-dev
Jan 17
On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote: [...]But pls run the benchmarks yourself as I am not going to keep running it for you, and would be nice if you posted your solution on a Gist for example, pasting lots of code in the forum makes it difficult to follow.I can't. I spent half an hour trying to get ./benchmark.sh to run, but no matter what it could not compile benchmark_runner. It complains that my rustc is too old and some dependencies do not support it. I tried running the suggested cargo update command to pin the versions but none of them worked. Since I'm not a Rust user, I'm not feeling particularly motivated right now to spend any more time on this. Upgrading my rustc isn't really an option because that's the version currently in my distro and I really don't feel like spending more time to install a custom version of rustc just for this benchmark. T -- Today's society is one of specialization: as you grow, you learn more and more about less and less. Eventually, you know everything about nothing.
Jan 17
On Wednesday, 17 January 2024 at 16:30:08 UTC, H. S. Teoh wrote:On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote: [...]I've just updated the Rust version to the benchmark monitor could work on Linux (it only worked on Mac before) :D that's probably why your rustc didn't work, though as the project is still using edition2018 I would've thought even a very old compiler would have worked... anyway, if you ever find yourself actually using Rust, you should use `rustup` (https://rustup.rs/) which makes it trivial to update Rust. About the "count" option: I had assumed you didn't call format on the count option as it's never needed, there's nothing to print.But pls run the benchmarks yourself as I am not going to keep running it for you, and would be nice if you posted your solution on a Gist for example, pasting lots of code in the forum makes it difficult to follow.I can't. I spent half an hour trying to get ./benchmark.sh to run, but no matter what it could not compile benchmark_runner. It complains that my rustc is too old and some dependencies do not support it. I tried running the suggested cargo update command to pin the versions but none of them worked. Since I'm not a Rust user, I'm not feeling particularly motivated right now to spend any more time on this. Upgrading my rustc isn't really an option because that's the version currently in my distro and I really don't feel like spending more time to install a custom version of rustc just for this benchmark. T
Jan 17
On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]I'll push the code to github.[...] Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d T -- Why do conspiracy theories always come from the same people??
Jan 17
On Wednesday, 17 January 2024 at 16:54:00 UTC, H. S. Teoh wrote:On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]Ok, last time I'm running this for someone else :D ``` Proc,Run,Memory(bytes),Time(ms) ===> ./rust ./rust,23920640,30 ./rust,24018944,147 ./rust,24068096,592 ./rust,24150016,1187 ./rust,7766016,4972 ./rust,8011776,46101 ===> src/d/dencoder src/d/dencoder,44154880,42 src/d/dencoder,51347456,87 src/d/dencoder,51380224,273 src/d/dencoder,51462144,441 src/d/dencoder,18644992,4414 src/d/dencoder,18710528,43548 ``` Congratulations on beating Rust :D but remember: you're using a much more efficient algorithm! I must conclude that the Rust translation of the Trie algorithm would be much faster still, unfortunately (you may have noticed that I am on D's side here!).I'll push the code to github.[...] Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d T
Jan 18
On Wednesday, 17 January 2024 at 16:54:00 UTC, H. S. Teoh wrote:On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]BTW here's you main function so it can run on the benchmark: ```d int main(string[] args) { bool countOnly = args.length > 1 ? (() { final switch (args[1]) { case "count": return true; case "print": return false; } })() : false; auto dictfile = args.length > 2 ? args[2] : "tests/words.txt"; auto input = args.length > 3 ? args[3] : "tests/numbers.txt"; Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(File(input).byLine, dict, (phone, match) { count++; }); writefln("%d", count); } else { encodePhoneNumbers(File(input).byLine, dict, (phone, match) { writefln("%s: %s", phone, match); }); } return 0; } ```I'll push the code to github.[...] Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d T
Jan 18
On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:For the record (I already posted this on GitHub)... here's [my current fastest solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s c/d/src/dencoder.d) time using the same algorithm as Rust ...[...]... what I am really curious about is what the code I wrote is doing wrong that causes it to run 4x slower than Rust despite doing "the same thing"...It's a GC allocations fest. Things like this make it slow: ```diff { - string digit = [digits[0]]; + string digit = digits[0 .. 1]; words.insertBack(digit); ``` And at the top is the associative array lookup (when profiling the handling of the "phones_1_000_000_with_empty.txt" input file): ``` 36.85% dencoder dencoder [.] _aaInX 12.38% dencoder dencoder [.] void dencoder.printTranslations(immutable(char)[][][dencoder.Key], dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], std.container.array.Array!(immutable(char)[]).Array) 6.43% dencoder dencoder [.] nothrow nogc scope void core.internal.gc.impl.conservative.gc.Gcx.mark!(false, true, true).mark(core.internal.gc.impl.conservative.gc.Gcx.ScanRange!(false).ScanRange) 4.53% dencoder dencoder [.] pure safe void std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult.popFront() 4.08% dencoder dencoder [.] _d_array_slice_copy 2.43% dencoder dencoder [.] _d_newarrayU 2.21% dencoder dencoder [.] shared nothrow nogc trusted void core.internal.spinlock.SpinLock.lock() 2.00% dencoder dencoder [.] pure safe void std.array.Appender!(immutable(char)[]).Appender.put!(dchar).put(dchar) 1.91% dencoder dencoder [.] nothrow void* core.internal.gc.impl.conservative.gc.Gcx.smallAlloc(ulong, ref ulong, uint, const(TypeInfo)) 1.67% dencoder dencoder [.] _DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZSQDd6memory8BlkInfo_ 1.66% dencoder dencoder [.] pure nothrow nogc scope trusted ulong core.internal.gc.bits.GCBits.setLocked(ulong) 1.60% dencoder dencoder [.] const pure nothrow trusted ulong object.TypeInfo_Struct.getHash(scope const(void*)) 1.53% dencoder dencoder [.] nothrow safe void core.internal.util.array.enforceRawArraysConformable(const(char[]), const(ulong), const(void[]), const(void[]), const(bool)) 1.49% dencoder dencoder [.] nothrow void* core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), core.internal.gc.impl.conservative.gc.mallocTime, core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo)) 1.27% dencoder dencoder [.] pure nothrow safe void std.array.Appender!(immutable(char)[]).Appender.ensureAddable(ulong) 0.81% dencoder dencoder [.] pure property safe dchar std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult.front() 0.79% dencoder dencoder [.] nothrow safe void core.internal.util.array._enforceNoOverlap(const(char[]), ulong, ulong, const(ulong)) 0.74% dencoder dencoder [.] nothrow void core.internal.gc.impl.conservative.gc.Pool.setBits(ulong, uint) 0.73% dencoder dencoder [.] pure safe void std.array.Appender!(immutable(char)[]).Appender.put!(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult).put(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult) 0.70% dencoder dencoder [.] pure safe ulong std.range.primitives.walkLength!(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult).walkLength(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch r)[][]).__lambda12, immutable(char)[]).FilterResult) 0.60% dencoder dencoder [.] bool dencoder.lastItemIsDigit(std.container.array.Array!(immutable(char)[]).Array) 0.54% dencoder dencoder [.] _d_newarrayT [...] ```
Jan 16
On Tue, Jan 16, 2024 at 10:15:04PM +0000, Siarhei Siamashka via Digitalmars-d-learn wrote:On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:[...]Indeed. I have just completed 2 rounds of optimizations of my version of the code, and both times the profiler also showed the problem to be excessive allocations in the inner loop. So, I did the following optimizations: 1) Get rid of .format in the inner loop. Not only does .format cause a lot of allocations, it is also a known performance hog. So instead of constructing the output string in the search function, I changed it to take a delegate instead, and the delegate either counts the result or prints it directly (bypassing the construction of an intermediate string). This improved performance quite a bit for the count-only runs, but also wins some performance even when output is generated. Overall, however, this optimization only gave me some minor savings. 2) Changed the `path` parameter from string[] to string, since I didn't really need it to be an array of strings anyway. This in itself only improved performance marginally, barely noticeable, but it led to (3), which gave a huge performance boost. 3) Basically, in the earlier version of the code, the `path` parameter was appended to every time I recursed, and furthermore the same initial segment gets appended to many times with different trailers as the algorithm walks the trie. As a result, this triggers a lot of array reallocations to store the new strings. Most of these allocations are unnecessary, because we already know that the initial segment of the string will stay constant, only the tail end changes. Furthermore, we only ever have a single instance of .path at any point in time in the algorithm. So we could use a single buffer to hold all of these instances of .path, and simply return slices to it as we go along, overwriting the tail end each time we need to append something. This significantly cut down on the number of allocations, and along with (1) and (2), performance improved by about 3x (!). It didn't completely remove all allocations, but I'm reasonably happy with the performance now that I probably won't try to optimize it more unless it's still losing out to another language. ;-) (I'm especially curious to see if this beats the Rust version. :-P) Optimized version of the code: -----------------------------------snip------------------------------------ /** * Encoding phone numbers according to a dictionary. */ import std; /** * Table of digit mappings. */ static immutable ubyte[dchar] digitOf; shared static this() { digitOf = [ 'E': 0, 'J': 1, 'N': 1, 'Q': 1, 'R': 2, 'W': 2, 'X': 2, 'D': 3, 'S': 3, 'Y': 3, 'F': 4, 'T': 4, 'A': 5, 'M': 5, 'C': 6, 'I': 6, 'V': 6, 'B': 7, 'K': 7, 'U': 7, 'L': 8, 'O': 8, 'P': 8, 'G': 9, 'H': 9, 'Z': 9, ]; } /** * Trie for storing dictionary words according to the phone number mapping. */ class Trie { Trie[10] edges; string[] words; private void insert(string word, string suffix) { const(ubyte)* dig; while (!suffix.empty && (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null) { suffix = suffix[1 .. $]; } if (suffix.empty) { words ~= word; return; } auto node = new Trie; auto idx = *dig; if (edges[idx] is null) { edges[idx] = new Trie; } edges[idx].insert(word, suffix[1 .. $]); } /** * Insert a word into the Trie. * * Characters that don't map to any digit are ignored in building the Trie. * However, the original form of the word will be retained as-is in the * leaf node. */ void insert(string word) { insert(word, word[]); } /** * Iterate over all words stored in this Trie. */ void foreachEntry(void delegate(string path, string word) cb) { void impl(Trie node, string path = "") { if (node is null) return; foreach (word; node.words) { cb(path, word); } foreach (i, child; node.edges) { impl(child, path ~ cast(char)('0' + i)); } } impl(this); } } /** * Loads the given dictionary into a Trie. */ Trie loadDictionary(R)(R lines) if (isInputRange!R & is(ElementType!R : const(char)[])) { Trie result = new Trie; foreach (line; lines) { result.insert(line.idup); } return result; } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto app = appender!(string[]); dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); }); assert(app.data == [ "10: je", "105513: jemand", "107: neu", "1550: Name", "253302: Wasser", "35: da", "38: so", "400: Fee", "4021: fern", "4034: Fest", "482: Tor", "4824: fort", "4824: Torf", "51: an", "562: mir", "562: Mix", "56202: Mixer", "78: Bo\"", "783: bo\"s", "7857: blau", "7884: Boot", "824: Ort", "83: o\"d" ]); } alias MatchCallback = void delegate(const(char)[] phone, const(char)[] match); /** * Find all encodings of the given phoneNumber according to the given * dictionary, and write each encoding to the given sink. */ void findMatches(Trie dict, const(char)[] phoneNumber, MatchCallback cb) { /* * Optimization: use a common buffer for constructing the output string, * instead of using string append, which allocates many new strings. * * The `path` parameter passed to .impl is always a slice of .buffer, * either its current instance or a previous instance left over from a * buffer reallocation. As such, its contents are always an initial segment * of whatever's in .buffer, so appending is just a matter of writing to * the tail end of the buffer and returning a slice of the new buffer. * * The fact that .path higher up the call tree may be pointing to old * versions of .buffer is not a problem; contentwise they are always an * initial segment of the current .buffer, so any subsequent appends will * copy the new word to the right place in the current .buffer and return a * slice to it, and the contents will always be consistent. */ static char[] buffer; const(char)[] appendPath(const(char)[] path, const(char)[] word) { // Assumption: path is an initial segment of buffer (either its current // incarnation or a pre-reallocated initial copy). So we don't need to // copy the initial segment, just whatever needs to be appended. if (path.length == 0) { auto newlen = word.length; if (buffer.length < newlen) buffer.length = newlen; buffer[0 .. newlen] = word[]; return buffer[0 .. newlen]; } else { auto newlen = path.length + 1 + word.length; if (buffer.length < newlen) buffer.length = newlen; buffer[path.length] = ' '; buffer[path.length + 1 .. newlen] = word[]; return buffer[0 .. newlen]; } } bool impl(Trie node, const(char)[] suffix, const(char)[] path, bool allowDigit) { if (node is null) return false; // Ignore non-digit characters in phone number while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9')) suffix = suffix[1 .. $]; if (suffix.empty) { // Found a match, print result foreach (word; node.words) { cb(phoneNumber, appendPath(path, word)); } return !node.words.empty; } bool ret; foreach (word; node.words) { // Found a matching word, try to match the rest of the phone // number. ret = true; if (impl(dict, suffix, appendPath(path, word), true)) allowDigit = false; } if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false)) { allowDigit = false; ret = true; } if (allowDigit) { // If we got here, it means that if we take the current node as the // next word choice, then the following digit will have no further // matches, and we may encode it as a single digit. auto nextSuffix = suffix[1 .. $]; if (nextSuffix.empty) { cb(phoneNumber, appendPath(path, suffix[0 .. 1])); ret = true; } else { if (impl(dict, suffix[1 .. $], appendPath(path, suffix[0 .. 1]), false)) ret = true; } } return ret; } // Trim trailing non-digits from phone number auto suffix = phoneNumber[]; while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9')) { suffix = suffix[0 .. $-1]; } impl(dict, suffix, buffer[0 .. 0], true); } /** * Encode the given input range of phone numbers according to the given * dictionary, writing the output to the given sink. */ void encodePhoneNumbers(R)(R input, Trie dict, MatchCallback cb) if (isInputRange!R & is(ElementType!R : const(char)[])) { foreach (line; input) { findMatches(dict, line, cb); } } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto input = [ "112", "5624-82", "4824", "0721/608-4067", "10/783--5", "1078-913-5", "381482", "04824", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (phone, match) { app.put(format("%s: %s", phone, match)); }); //writefln("\n%-(%s\n%)", app.data); assert(app.data.sort.release == [ "04824: 0 Tor 4", "04824: 0 Torf", "04824: 0 fort", "10/783--5: je Bo\" da", "10/783--5: je bo\"s 5", "10/783--5: neu o\"d 5", "381482: so 1 Tor", "4824: Tor 4", "4824: Torf", "4824: fort", "5624-82: Mix Tor", "5624-82: mir Tor", ]); } unittest { auto dict = loadDictionary(q"ENDDICT Bias ja Mai Reck Weib USA ENDDICT".splitLines); auto input = [ "/7-357653152/0677-", "/7-3576-", "/8-", "1556/0", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (phone, match) { app.put(format("%s: %s", phone, match)); }); //writefln("\n%-(%s\n%)", app.data); assert(app.data.sort.release == [ "/7-357653152/0677-: USA Bias ja Reck 7", "/7-357653152/0677-: USA Bias ja Weib 7", "/8-: 8", /* Note: 1556/0 should NOT encode as "1 Mai 0" because the initial "15" * matches "ja", thus excluding a digit in that position. */ ]); } /** * Program entry point. */ int main(string[] args) { File input = stdin; bool countOnly; auto info = getopt(args, "c|count", "Count solutions only", &countOnly, ); int showHelp() { stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]", args[0]); defaultGetoptPrinter("", info.options); return 1; } if (info.helpWanted || args.length < 2) return showHelp(); auto dictfile = args[1]; if (args.length > 2) input = File(args[2]); Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(input.byLine, dict, (phone, match) { count++; }); writefln("Number of solutions: %d", count); } else { encodePhoneNumbers(input.byLine, dict, (phone, match) { writefln("%s: %s", phone, match); }); } return 0; } -----------------------------------snip------------------------------------ T -- People tell me that I'm paranoid, but they're just out to get me.... what I am really curious about is what the code I wrote is doing wrong that causes it to run 4x slower than Rust despite doing "the same thing"...It's a GC allocations fest.
Jan 16
On Tuesday, 16 January 2024 at 23:43:16 UTC, H. S. Teoh wrote:(I'm especially curious to see if this beats the Rust version. :-P)That's just irrelevant when you're using a completely different algorithm. My conclusion in the study was exactly that: much more important than the language is the algorithm. The language does matter, but much, much less. If you want to check your performance, you know you can run the `./benchmark.sh` yourself?
Jan 16
On Wednesday, 17 January 2024 at 07:11:02 UTC, Renato wrote:If you want to check your performance, you know you can run the `./benchmark.sh` yourself?Out of curiosity I've tried to manually run this on Windows and it seems that Java generator for these numbers files is "broken", the resulting count or print runs fine for both Java and D versions provided in your D branch, but fails with generated files. D version complains about bad utf8 encoding. I've opened the generated file in text editor and it is UTF-16 (little-endian with BOM). Tried with adoptium jdk 17 and 21 (former openjdk), but I guess it doesn't matter since UTF-16 is default on Windows.
Jan 17
On Wednesday, 17 January 2024 at 09:15:02 UTC, evilrat wrote:On Wednesday, 17 January 2024 at 07:11:02 UTC, Renato wrote:It's not Java writing the file, it's the bash script [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31): ``` java -cp "build/util" util.GeneratePhoneNumbers 1000 > phones_1000.txt ``` Java is just printing to stdout. I wasn't aware that piping like this would use the OS default encoding. Unfortunately I don't have Windows here, but try to change the encoding used by the bash script maybe?If you want to check your performance, you know you can run the `./benchmark.sh` yourself?Out of curiosity I've tried to manually run this on Windows and it seems that Java generator for these numbers files is "broken", the resulting count or print runs fine for both Java and D versions provided in your D branch, but fails with generated files. D version complains about bad utf8 encoding. I've opened the generated file in text editor and it is UTF-16 (little-endian with BOM). Tried with adoptium jdk 17 and 21 (former openjdk), but I guess it doesn't matter since UTF-16 is default on Windows.
Jan 17
On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:It's not Java writing the file, it's the bash script [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31): ``` java -cp "build/util" util.GeneratePhoneNumbers 1000 > phones_1000.txt ```Perhaps using this option when running Java will help: ``` java -DFile.Encoding=UTF-8 ... ```
Jan 17
On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:I've used powershell env var to set output to utf8, D version now works but java doesn't. ``` java -Xms20M -Xmx100M -cp build/java Main print words-quarter.txt phones_1000.txt Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of bounds for length 10 at Trie.completeSolution(Main.java:216) at Trie.forEachSolution(Main.java:192) at PhoneNumberEncoder.encode(Main.java:132) at Main.lambda$main$1(Main.java:38) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) at Main.main(Main.java:38) ```It's not Java writing the file, it's the bash script [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31): ``` java -cp "build/util" util.GeneratePhoneNumbers 1000 > phones_1000.txt ```Perhaps using this option when running Java will help: ``` java -DFile.Encoding=UTF-8 ... ```
Jan 17
On Wednesday, 17 January 2024 at 10:50:26 UTC, evilrat wrote:On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:This is this line: ``` var digit = chars[ index ] - 48; ``` That means the input file is still not ASCII (or UTF-8) as it should. Java is reading files with the ASCII encoding so it should've worked fine.On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:I've used powershell env var to set output to utf8, D version now works but java doesn't. ``` java -Xms20M -Xmx100M -cp build/java Main print words-quarter.txt phones_1000.txt Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of bounds for length 10 at Trie.completeSolution(Main.java:216) at Trie.forEachSolution(Main.java:192) at PhoneNumberEncoder.encode(Main.java:132) at Main.lambda$main$1(Main.java:38) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939) at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) at Main.main(Main.java:38) ```It's not Java writing the file, it's the bash script [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31): ``` java -cp "build/util" util.GeneratePhoneNumbers 1000 > phones_1000.txt ```Perhaps using this option when running Java will help: ``` java -DFile.Encoding=UTF-8 ... ```
Jan 17
On Wednesday, 17 January 2024 at 11:20:14 UTC, Renato wrote:That means the input file is still not ASCII (or UTF-8) as it should. Java is reading files with the ASCII encoding so it should've worked fine.It seems that it is only works with ASCII encoding though.
Jan 17
On Wednesday, 17 January 2024 at 11:56:19 UTC, evilrat wrote:On Wednesday, 17 January 2024 at 11:20:14 UTC, Renato wrote:Yes, that's according to the rules - only ASCII for everything.That means the input file is still not ASCII (or UTF-8) as it should. Java is reading files with the ASCII encoding so it should've worked fine.It seems that it is only works with ASCII encoding though.
Jan 17
On Tuesday, 16 January 2024 at 22:15:04 UTC, Siarhei Siamashka wrote:On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:I was under the impression that `[digits[0]]` would just use a stack allocation?? The profiler does not show any GC anymore, are you sure it's a "GC allocations fest"???For the record (I already posted this on GitHub)... here's [my current fastest solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s c/d/src/dencoder.d) time using the same algorithm as Rust ...[...]... what I am really curious about is what the code I wrote is doing wrong that causes it to run 4x slower than Rust despite doing "the same thing"...It's a GC allocations fest. Things like this make it slow: ```diff { - string digit = [digits[0]]; + string digit = digits[0 .. 1]; words.insertBack(digit); ```And at the top is the associative array lookup (when profiling the handling of the "phones_1_000_000_with_empty.txt" input file): ``` 36.85% dencoder dencoder [.] _aaInX 12.38% dencoder dencoder [.] voidWell, I know, but I did everything to make the hash "cheap" to compute so I still don't see a way to improve it.
Jan 16
On Wednesday, 17 January 2024 at 07:06:25 UTC, Renato wrote:On Tuesday, 16 January 2024 at 22:15:04 UTC, Siarhei Siamashka wrote:nah, you are allocating new array out of single digit while the latter is just takes a slice. there is 'scope' storage specifier for when you know your variable won't escape the scope to place it on stack (works for classes too), but I'm not sure if it will work with array. `scope string digit = [digits[0]];`On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote: It's a GC allocations fest. Things like this make it slow: ```diff { - string digit = [digits[0]]; + string digit = digits[0 .. 1]; words.insertBack(digit); ```I was under the impression that `[digits[0]]` would just use a stack allocation?? The profiler does not show any GC anymore, are you sure it's a "GC allocations fest"???
Jan 17
On Tue, Jan 16, 2024 at 07:50:35AM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]Unfortunately there seems to be some discrepancy between the output I got and the prescribed output in your repository. For example, in your output the number 1556/0 does not have an encoding, but isn't "1 Mai 0" a valid encoding according to your dictionary and the original problem description?[...] Also, found a bug in my program that misses some solutions when the phone number has trailing non-digits. Here's the updated code. It still finds extra encodings from the output in your repo, though. Maybe I misunderstood part of the requirements? ------------------------------snip-------------------------------------- /** * Encoding phone numbers according to a dictionary. */ import std; /** * Table of digit mappings. */ static immutable ubyte[dchar] digitOf; shared static this() { digitOf = [ 'E': 0, 'J': 1, 'N': 1, 'Q': 1, 'R': 2, 'W': 2, 'X': 2, 'D': 3, 'S': 3, 'Y': 3, 'F': 4, 'T': 4, 'A': 5, 'M': 5, 'C': 6, 'I': 6, 'V': 6, 'B': 7, 'K': 7, 'U': 7, 'L': 8, 'O': 8, 'P': 8, 'G': 9, 'H': 9, 'Z': 9, ]; } /** * Trie for storing dictionary words according to the phone number mapping. */ class Trie { Trie[10] edges; string[] words; private void insert(string word, string suffix) { const(ubyte)* dig; while (!suffix.empty && (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null) { suffix = suffix[1 .. $]; } if (suffix.empty) { words ~= word; return; } auto node = new Trie; auto idx = *dig; if (edges[idx] is null) { edges[idx] = new Trie; } edges[idx].insert(word, suffix[1 .. $]); } /** * Insert a word into the Trie. * * Characters that don't map to any digit are ignored in building the Trie. * However, the original form of the word will be retained as-is in the * leaf node. */ void insert(string word) { insert(word, word[]); } /** * Iterate over all words stored in this Trie. */ void foreachEntry(void delegate(string path, string word) cb) { void impl(Trie node, string path = "") { if (node is null) return; foreach (word; node.words) { cb(path, word); } foreach (i, child; node.edges) { impl(child, path ~ cast(char)('0' + i)); } } impl(this); } } /** * Loads the given dictionary into a Trie. */ Trie loadDictionary(R)(R lines) if (isInputRange!R & is(ElementType!R : const(char)[])) { Trie result = new Trie; foreach (line; lines) { result.insert(line.idup); } return result; } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto app = appender!(string[]); dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); }); assert(app.data == [ "10: je", "105513: jemand", "107: neu", "1550: Name", "253302: Wasser", "35: da", "38: so", "400: Fee", "4021: fern", "4034: Fest", "482: Tor", "4824: fort", "4824: Torf", "51: an", "562: mir", "562: Mix", "56202: Mixer", "78: Bo\"", "783: bo\"s", "7857: blau", "7884: Boot", "824: Ort", "83: o\"d" ]); } /** * Find all encodings of the given phoneNumber according to the given * dictionary, and write each encoding to the given sink. */ void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink) if (isOutputRange!(W, string)) { bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit) { if (node is null) return false; // Ignore non-digit characters in phone number while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9')) suffix = suffix[1 .. $]; if (suffix.empty) { // Found a match, print result foreach (word; node.words) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(only(word)))); } return !node.words.empty; } bool ret; foreach (word; node.words) { // Found a matching word, try to match the rest of the phone // number. if (impl(dict, suffix, path ~ word, true)) { allowDigit = false; ret = true; } } if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false)) { allowDigit = false; ret = true; } if (allowDigit) { // If we got here, it means that if we take the current node as the // next word choice, then the following digit will have no further // matches, and we may encode it as a single digit. auto nextSuffix = suffix[1 .. $]; if (nextSuffix.empty) { put(sink, format("%s: %-(%s %)", phoneNumber, path.chain(suffix[0 .. 1].only))); ret = true; } else { if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false)) ret = true; } } return ret; } // Trim trailing non-digits from phone number auto suffix = phoneNumber[]; while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9')) { suffix = suffix[0 .. $-1]; } impl(dict, suffix, [], true); } /** * Encode the given input range of phone numbers according to the given * dictionary, writing the output to the given sink. */ void encodePhoneNumbers(R,W)(R input, Trie dict, W sink) if (isInputRange!R & is(ElementType!R : const(char)[]) && isOutputRange!(W, string)) { foreach (line; input) { findMatches(dict, line, sink); } } /// unittest { auto dict = loadDictionary(q"ENDDICT an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser ENDDICT".splitLines); auto input = [ "112", "5624-82", "4824", "0721/608-4067", "10/783--5", "1078-913-5", "381482", "04824", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); //writefln("%-(%s\n%)", app.data); assert(app.data.sort.release == [ "04824: 0 Tor 4", "04824: 0 Torf", "04824: 0 fort", "10/783--5: je Bo\" da", "10/783--5: je bo\"s 5", "10/783--5: neu o\"d 5", "381482: so 1 Tor", "4824: Tor 4", "4824: Torf", "4824: fort", "5624-82: Mix Tor", "5624-82: mir Tor", ]); } unittest { auto dict = loadDictionary(q"ENDDICT Bias ja Reck Weib USA ENDDICT".splitLines); auto input = [ "/7-357653152/0677-", "/8-", ]; auto app = appender!(string[]); encodePhoneNumbers(input, dict, (string match) { app.put(match); }); assert(app.data.sort.release == [ "/7-357653152/0677-: USA Bias ja Reck 7", "/7-357653152/0677-: USA Bias ja Weib 7", "/8-: 8", ]); } /** * Program entry point. */ int main(string[] args) { File input = stdin; bool countOnly; auto info = getopt(args, "c|count", "Count solutions only", &countOnly, ); int showHelp() { stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]", args[0]); defaultGetoptPrinter("", info.options); return 1; } if (info.helpWanted || args.length < 2) return showHelp(); auto dictfile = args[1]; if (args.length > 2) input = File(args[2]); Trie dict = loadDictionary(File(dictfile).byLine); if (countOnly) { size_t count; encodePhoneNumbers(input.byLine, dict, (string match) { count++; }); writefln("Number of solutions: %d", count); } else { encodePhoneNumbers(input.byLine, dict, (string match) { writeln(match); }); } return 0; } ------------------------------snip-------------------------------------- T -- "You are a very disagreeable person." "NO."
Jan 16
P.S. Compiling my program with `ldc -O2`, it runs so fast that I couldn't measure any meaningful running time that's greater than startup overhead. So I wrote a helper program to generate random phone numbers up to 50 characters long, and found that it could encode 1 million phone numbers in 2.2 seconds (using the 75,000 entry dictionary from your repository). Counting vs. printing the results made no significant difference to this. T -- People tell me that I'm skeptical, but I don't believe them.
Jan 16