digitalmars.D.learn - Help optimize D solution to phone encoding problem: extremely slow

Renato (121/121) Jan 13 2024 I like to use a phone encoding problem to determine the

Renato (39/41) Jan 13 2024 I profiled it on Linux and stored [the trace.log

Anonymouse (8/11) Jan 13 2024 As a drive-by suggestion and I hope it doesn't derail anything,

Renato (7/18) Jan 13 2024 Thanks for the suggestion, this looks promising as I do have a

Sergey (19/27) Jan 13 2024 In the repo is hard to find the proper version.

Anonymouse (6/9) Jan 14 2024 I would strongly argue for profiling first instead of optimising
Renato (74/112) Jan 14 2024 Can you be more specific about which part of the Rust

Jordan Wilson (24/30) Jan 14 2024 Hello Renato,

Renato (292/324) Jan 14 2024 The characteristics of the dictionary impact the number of

Sergey (5/10) Jan 14 2024 I've added port from Rust in the PR comment. Can you please check

Renato (8/18) Jan 15 2024 As discussed on GitHub, the line-by-line port of the Rust code is

H. S. Teoh (346/370) Jan 16 2024 This problem piqued my interest, so yesterday and today I worked on it

Siarhei Siamashka (26/31) Jan 16 2024 You are not allowed to emit "1" as the first token in the output

Renato (18/51) Jan 16 2024 Exactly, this is one of the things that make this problem a bit

H. S. Teoh (68/93) Jan 16 2024 Ohhh now I get it. Initially I misunderstood that as saying that if the
H. S. Teoh (359/361) Jan 16 2024 [...]

Renato (76/81) Jan 16 2024 Great, I ran the benchmarks for you :)

Siarhei Siamashka (5/8) Jan 16 2024 That's dynamic programming with memoization. Basically caching
H. S. Teoh (13/48) Jan 16 2024 [...]

Renato (9/11) Jan 16 2024 That will make no difference for the `count` option which is

H. S. Teoh (11/26) Jan 17 2024 Of course it will. Passing the data directly to the callback that bumps
H. S. Teoh (14/18) Jan 17 2024 I can't. I spent half an hour trying to get ./benchmark.sh to run, but

Renato (10/27) Jan 17 2024 I've just updated the Rust version to the benchmark monitor could

H. S. Teoh (7/8) Jan 17 2024 [...]

Renato (23/30) Jan 18 2024 Ok, last time I'm running this for someone else :D
Renato (34/41) Jan 18 2024 BTW here's you main function so it can run on the benchmark:

Siarhei Siamashka (66/72) Jan 16 2024 It's a GC allocations fest. Things like this make it slow:

H. S. Teoh (435/441) Jan 16 2024 Indeed.

Renato (8/10) Jan 16 2024 That's just irrelevant when you're using a completely different

evilrat (11/13) Jan 17 2024 Out of curiosity I've tried to manually run this on Windows and

Renato (11/25) Jan 17 2024 It's not Java writing the file, it's the bash script

Renato (5/11) Jan 17 2024 Perhaps using this option when running Java will help:

evilrat (37/51) Jan 17 2024 I've used powershell env var to set output to utf8, D version now

Renato (8/61) Jan 17 2024 This is this line:

evilrat (2/5) Jan 17 2024 It seems that it is only works with ASCII encoding though.

Renato (2/8) Jan 17 2024 Yes, that's according to the rules - only ASCII for everything.

Renato (8/29) Jan 16 2024 I was under the impression that `[digits[0]]` would just use a

evilrat (7/23) Jan 17 2024 nah, you are allocating new array out of single digit while the

H. S. Teoh (358/363) Jan 16 2024 [...]
H. S. Teoh (10/10) Jan 16 2024 P.S. Compiling my program with `ldc -O2`, it runs so fast that I

Renato <renato athaydes.com> writes:

I like to use a phone encoding problem to determine the 
strenghtness and weaknesses of programming languages because this 
problem is easy enough I can write solutions in any language in a 
few hours, but complex enough to exercise lots of interesting 
parts of the language.

You can check [my initial blog 
post](https://renato.athaydes.com/posts/revisiting-prechelt-paper-
omparing-languages) about this, which was an analysis or the original study by
Prechelt about the productivity differences between programmers using different
languages.

This original problem had a flaw when used in modern computers as 
the programs can find solutions so quickly that most of the time 
in the benchmarks was being used to actually print solutions to 
stdout, so I modified the problem so that there's an option to 
either print the solutions, or just count the number of solutions 
- so the programs still need to do all the work, but are not 
required to print anything other than how many solutions were 
found.

Anyway, I ported the Common Lisp solution to D because, like CL, 
D has built-in data structures like associative arrays and 
`BigInt` (at least it's in the stdlib)... I thought this would 
actually give D an edge! But it turns out D performs very badly 
for larger input sets. It starts quite well on smaller inputs, 
but scales very poorly.

My initial Rust solution also performed very poorly, so I'm 
afraid the same is happening with my initial D solution, despite 
my best effort to write something "decent".

You can find my D solution [in this Pull 
Request](https://github.com/renatoathaydes/prechelt-phone-number-encoding/pull/16).

The solution is almost identical, to the extent possible, to the 
Common Lisp solution, which can be [found 
here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/lisp/main.lisp).

The secret to high performance for the algorithm being used is 
having a very efficient `BigInt` implementation, and fast hashing 
function for the hash table (associative array in D, with 
`BigInt` as keys). Hence, I suppose D's hash function or `BigInt` 
are not that fast (Rust's default hash is also not great due to 
security concerns, but it's very easy to use a custom one which 
is much faster by changing a single line of code).

Anyway, here's the current relative performance (the other 
languages are pretty heavily optimised, the Java solution uses a 
different algorithm so it's not directly comparable, [the Rust 
solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/rust/phone_
ncoder/src/main.rs) uses approx. the same algorithm as used in CL and D, but
instead of `BigInt`, it uses a `Vec<u8>` as key as that turned out to be faster
- I may try that technique in D as well - but notice that even using Rust's
slower BigInt, it was orders of magnitude faster than D).

```
Benchmarking...
Proc,Run,Memory(bytes),Time(ms)
===> java -Xms20M -Xmx100M -cp build/java Main
java-Main,109895680,377
java-Main,179634176,1025
java-Main,167149568,1621
java-Main,180203520,2493
java-Main,96481280,6112
java-Main,95526912,7989
===> sbcl --script src/lisp/main.fasl
sbcl,31780864,74
sbcl,79437824,888
sbcl,79613952,3991
sbcl,80654336,7622
sbcl,80420864,18623
sbcl,83402752,29503
===> ./rust
./rust,23257088,58
./rust,23437312,260
./rust,23433216,1077
./rust,23416832,2017
./rust,7106560,6757
./rust,7110656,10165
===> src/d/dencoder
src/d/dencoder,38748160,223
src/d/dencoder,75800576,3154
src/d/dencoder,75788288,14905
src/d/dencoder,75751424,30271
```

I had to abort D as it was taking too long.

The above is with the `print` option (that's normally slow when 
stdout is not buffered, but I did buffer D's writes and it's 
still very slow).

With the `count` option, which does not print anything except a 
number at the end, D took so long I couldn't even collect its 
results (I waited several minutes)... the other languages 
finished in much less than a minute:

```
Benchmarking...
Proc,Run,Memory(bytes),Time(ms)
===> java -Xms20M -Xmx100M -cp build/java Main
java-Main,124112896,7883
java-Main,107487232,9273
===> sbcl --script src/lisp/main.fasl
sbcl,82669568,25638
sbcl,83759104,33501
===> ./rust
./rust,7061504,9488
./rust,7127040,11441
===> src/d/dencoder
^C
(ldc-1.36.0)
```

I tried to profile the D code but the profiler seems to be broken 
on my OS (Mac):

```
▶ dub build -b profile
     Starting Performing "profile" build using 
/Users/renato/dlang/ldc-1.36.0/bin/ldc2 for x86_64.
     Building prechelt ~master: building configuration 
[application]
      Linking dencoder
(ldc-1.36.0)
prechelt-phone-number-encoding/src/d  dlang ✗                     
                                                             9m ◒
▶ cd ../..
(ldc-1.36.0)
programming/projects/prechelt-phone-number-encoding  dlang ✗      
                                                             9m ◒
▶ src/d/dencoder
[1]    17877 illegal hardware instruction  src/d/dencoder
(ldc-1.36.0)
```

It's a bit hard to do any optmisation while "blind".

So, I decided to post it here to collect some hints about what to 
try and whether I hit some common performance bottlenecks in my 
current solution.

Would appreciate anything you can suggest, as long as you respect 
the fact that I'm still learning D so you can't expect me to know 
everything about it.

Jan 13 2024

Renato <renato athaydes.com> writes:

On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:
 I tried to profile the D code but the profiler seems to be 
 broken on my OS (Mac):

I profiled it on Linux and stored [the trace.log 
file](https://gist.github.com/renatoathaydes/fd8752ed81b0cf792ed7ef5aa3d68acd)
on a public Gist.

I tried using the HTML viewer [recommended in the 
wiki](https://wiki.dlang.org/Development_tools) but that doesn't 
work... first try:

```
Running 
../../../.dub/packages/d-profile-viewer/~master/d-profile-viewer/d-profile-viewer
Corrupt trace.log (can't compute ticks per second), please 
re-profile and try again
```

Second try with a longer trace (the one I saved in the gist):

```
Running 
../../../.dub/packages/d-profile-viewer/~master/d-profile-viewer/d-profile-viewer
std.conv.ConvException /home/renato/dlang/ldc-1.36.0/bin/../impo
t/std/conv.d(2533): Unexpected '-' when converting from type char[] to type
ulong
----------------
??:?
  [0x55c4be9bc99e]
??:?
  [0x55c4be9bc612]
??:?
  [0x55c4be9de81e]
??:?
  [0x55c4be9c4fbf]
/home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:2533 
[0x55c4be98ba2f]
/home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:2002 
[0x55c4be98b6fc]
/home/renato/dlang/ldc-1.36.0/bin/../import/std/conv.d:210 
[0x55c4be9665ec]
../../../.dub/packages/d-profile-viewer/~master/d-profile-view
r/source/app.d:1095 [0x55c4be965e21]
../../../.dub/packages/d-profile-viewer/~master/d-profile-view
r/source/app.d:1138 [0x55c4be96698a]
??:?
  [0x55c4be9c4c9c]
```

Not a great profiling experience :). Anyone has a better 
suggestion to "parse" the trace file?

Jan 13 2024

Anonymouse <zorael gmail.com> writes:

On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:
 [...]
 Not a great profiling experience :). Anyone has a better 
 suggestion to "parse" the trace file?

As a drive-by suggestion and I hope it doesn't derail anything, 
but if you have the opportunity to run it on linux, have you 
tried profiling with callgrind instead, with {Q,K}Cachegrind to 
visualise things? Your repositories probably have them. 
(callgrind is a part of valgrind.)

The wiki only mentions callgrind in passing, but it has worked 
well for me. [(example)](https://i.imgur.com/WWZAwy3.png)

Jan 13 2024

Renato <renato athaydes.com> writes:

On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:
 On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:
 [...]
 Not a great profiling experience :). Anyone has a better 
 suggestion to "parse" the trace file?

 As a drive-by suggestion and I hope it doesn't derail anything, 
 but if you have the opportunity to run it on linux, have you 
 tried profiling with callgrind instead, with {Q,K}Cachegrind to 
 visualise things? Your repositories probably have them. 
 (callgrind is a part of valgrind.)

 The wiki only mentions callgrind in passing, but it has worked 
 well for me. [(example)](https://i.imgur.com/WWZAwy3.png)

Thanks for the suggestion, this looks promising as I do have a 
Linux laptop (just not my main one).

I will have to try it... I thought that `BigInt` was to blame for 
the slowness (from what I could read from the trace logs), but 
after replacing that with basically a byte array key (see [commit 
here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649
e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp
and very, very far from Java and Rust.

Jan 13 2024

Sergey <kornburn yandex.ru> writes:

On Saturday, 13 January 2024 at 19:35:57 UTC, Renato wrote:
 On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:
 On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:
 [...]


 I will have to try it... I thought that `BigInt` was to blame 
 for the slowness (from what I could read from the trace logs), 
 but after replacing that with basically a byte array key (see 
 [commit 
 here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649
e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp
and very, very far from Java and Rust.

In the repo is hard to find the proper version.
I've checked the Rust from master branch and it looks a bit 
different from D implementation..

I would suggest to rewrite in the same way as Rust implemented.
Probably you would like to try:
* do not use BigInt from std. It could be quite slow. Try to use 
GMP library from Dub instead
* don't do "idup" every time
* instead of byLine, try byLineCopy
* instead of "arr ~= data" try to use Appender 
(https://dlang.org/library/std/array/appender.html)
* also you could try to use splitter 
(https://dlang.org/library/std/algorithm/iteration/splitter.html) 
to lazily process each part of the data
* isLastDigit function has many checks, but I think it could be 
implemented easier in a Rust way
* also consider to use functions from Range (filter, map) as you 
use it in Rust, instead of using for loops

Jan 13 2024

Anonymouse <zorael gmail.com> writes:

On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:
 I would suggest to rewrite in the same way as Rust implemented.
 Probably you would like to try:
 [...]

I would strongly argue for profiling first instead of optimising 
based on conjecture. If you profile you have solid evidence on 
what is actually slow. If you're very good at analysing D, 
well-educated hypotheses *may* be enough, until they suddenly 
aren't and you will have spent a lot of time on the wrong problem.

Jan 14 2024

Renato <renato athaydes.com> writes:

On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:
On Saturday, 13 January 2024 at 19:35:57 UTC, Renato wrote:
On Saturday, 13 January 2024 at 17:00:58 UTC, Anonymouse wrote:
On Saturday, 13 January 2024 at 12:55:27 UTC, Renato wrote:
[...]

I will have to try it... I thought that `BigInt` was to blame
for the slowness (from what I could read from the trace logs),
but after replacing that with basically a byte array key (see
[commit
here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0e9025b9aacdcfef5a2649
e4cc82b9bc607fd6c)) it barely improved. It's still much slower than Common Lisp
and very, very far from Java and Rust.

In the repo is hard to find the proper version.
I've checked the Rust from master branch and it looks a bit
different from D implementation..

I explicitly linked to the Rust implementation in my question:

the [Rust
solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/rust/phone_
ncoder/src/main.rs) uses approx. the same algorithm as used in CL and D, but
instead of BigInt, it uses a Vec<u8> as key

Can you be more specific about which part of the Rust
implementation is not roughly equivalent to the D
implementation?? There are obvious differences in style and
syntax, but I hope that it's mostly equivalent algorithm-wise.

But to clarify: the branch where the fastest solutions are is
called `fastest-implementations-print-or-count`.

The D branches with my alternative implementations are all called
`dlang-*`.

I would suggest to rewrite in the same way as Rust implemented.
Probably you would like to try:
* do not use BigInt from std. It could be quite slow. Try to
use GMP library from Dub instead

Right, but as I posted above, even using a byte-array just like
in Rust, the performance was still very bad (but around 2x faster
than using BigInt).

Also, using GMP is always suggested to me, but I can't accept
that because my goal is not to use C libraries to achieve the
fastest speed (which I could do by using C or FFI in any other
language), it's to use D (and the other languages) to solve my
problem in an idiomatic way.

I [ended up using
`Int128`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/98dcbcf1c77d1ded3406c
03af9026e126df5b2d) (the problem description requires more than i64, but int128
is enough), which grealy improved performance over my D baseline AND on the
byte-array solution.

* don't do "idup" every time
* instead of byLine, try byLineCopy

`idup` is necessary because the strings are stored in a Map after
the iteration ends. Anyway, I replaced that with `byLineCopy` and
it became sligthtly slower.

* instead of "arr ~= data" try to use Appender
(https://dlang.org/library/std/array/appender.html)

I was worried about `~=` allocating too much, though IIUC it
shouldn't be a problem the way I used it... in any case, I
[replaced it with
`Appender`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/f1364d0897f1f37882f1d
9b92c16b84b1abdc31) and the performance was completely unchanged - which is
good as I think it means I used `~=` correctly to avoid overallocation (or I
messed up using `Appender` :D - pls have a look).

* also you could try to use splitter
(https://dlang.org/library/std/algorithm/iteration/splitter.html) to lazily
process each part of the data

This is not necessary because that would only affect the time
spent loading the dictionary, which is the faster part of the
problem... nearly all of the time is spent looking for solutions
instead.

* isLastDigit function has many checks, but I think it could be
implemented easier in a Rust way

The Rust solution uses sum types for that, but that had a very
minor effect on the overall performance (though in Rust, that's
"cleaner" to use anyway)... I know D does have SumType in the
stdlib, but I thought it is not as "zero cost" as in Rust, and
because both variants wrap a String, it's awkward to use that...
so I instead tried using a struct like this:

```d
struct WordOrDigit {
string value;
bool isDigit;
}
```

You can check [the commit
here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0f6b4ce83373c46b14b5bb40c53bb0e2d0905e66).

This change made the code slightly slower.

* also consider to use functions from Range (filter, map) as
you use it in Rust, instead of using for loops

Why would for loops be slower? Or you're just saying I should
make the D code nicer?

Anyway, thanks for the suggestions!

On Sunday, 14 January 2024 at 08:33:24 UTC, Anonymouse wrote:
On Saturday, 13 January 2024 at 23:20:32 UTC, Sergey wrote:
I would suggest to rewrite in the same way as Rust implemented.
Probably you would like to try:
[...]

I would strongly argue for profiling first instead of
optimising based on conjecture. If you profile you have solid
evidence on what is actually slow. If you're very good at
analysing D, well-educated hypotheses *may* be enough, until
they suddenly aren't and you will have spent a lot of time on
the wrong problem.

Totally agree. I will spend some time on my Linux machine to see
if I can get more useful profiling data.

For now, with the int128 change from BigInt, the code is about 4x
faster than before, but on larger problems, it's still scaling
very badly compared to other languages (this suggests there's
still some "exponential growth" going on, which should not happen
as the problem is not exponential).

```
Proc,Memory(bytes),Time(ms)
===> ./rust
./rust,23252992,59
./rust,23420928,263
./rust,23412736,1025
./rust,7069696,8661
===> src/d/dencoder
src/d/dencoder,11268096,72
src/d/dencoder,23781376,938
src/d/dencoder,23818240,4443
src/d/dencoder,10723328,134703
```

On the bright side: D is using almost as little memory as Rust,
which is orders of magnitude better than the other, usual GC
languages.

Jan 14 2024

Jordan Wilson <wilsonjord gmail.com> writes:

On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:
 I like to use a phone encoding problem to determine the 
 strenghtness and weaknesses of programming languages because 
 this problem is easy enough I can write solutions in any 
 language in a few hours, but complex enough to exercise lots of 
 interesting parts of the language.

 [...]

Hello Renato,

This seems to be quite a lot of calls:
```
======== Timer frequency unknown, Times are in Megaticks ========

   Num          Tree        Func        Per
   Calls        Time        Time        Call

19204964        3761        3756           0     pure nothrow ref 
 trusted immutable(char)[][] 
core.internal.array.appending._d_arrayappendcTX!(immutable(char)[][],
immutable(char)[])._d_arrayappendcTX(scope return ref immutable(char)[][],
ulong)

19204924        8957        3474           0      safe void 
dencoder.printTranslations(immutable(char)[][][dencoder.Key], 
dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], 
immutable(char)[][])
```

This is when using the `words-quarter.txt` input (the 
`dictionary.txt` input seems to finish much faster, although 
still slower than `java`/`rust`).

I also used only 100 phone numbers as input.

My final observation is that `words-quarter.txt` contains some 
1-letter inputs, (for example, `i` or `m`)...this may result in a 
large number of encoding permutations, which may explain the high 
number of recursion calls?

Jordan

Jan 14 2024

Renato <renato athaydes.com> writes:

On Sunday, 14 January 2024 at 10:02:58 UTC, Jordan Wilson wrote:
 On Saturday, 13 January 2024 at 11:03:42 UTC, Renato wrote:
 I like to use a phone encoding problem to determine the 
 strenghtness and weaknesses of programming languages because 
 this problem is easy enough I can write solutions in any 
 language in a few hours, but complex enough to exercise lots 
 of interesting parts of the language.

 [...]

 Hello Renato,

 This seems to be quite a lot of calls:
 ```
 ======== Timer frequency unknown, Times are in Megaticks 
 ========

   Num          Tree        Func        Per
   Calls        Time        Time        Call

 19204964        3761        3756           0     pure nothrow 
 ref  trusted immutable(char)[][] 
 core.internal.array.appending._d_arrayappendcTX!(immutable(char)[][],
immutable(char)[])._d_arrayappendcTX(scope return ref immutable(char)[][],
ulong)

 19204924        8957        3474           0      safe void 
 dencoder.printTranslations(immutable(char)[][][dencoder.Key], 
 dencoder.ISolutionHandler, immutable(char)[], 
 immutable(char)[], immutable(char)[][])
 ```

 This is when using the `words-quarter.txt` input (the 
 `dictionary.txt` input seems to finish much faster, although 
 still slower than `java`/`rust`).

 I also used only 100 phone numbers as input.

 My final observation is that `words-quarter.txt` contains some 
 1-letter inputs, (for example, `i` or `m`)...this may result in 
 a large number of encoding permutations, which may explain the 
 high number of recursion calls?

 Jordan

The characteristics of the dictionary impact the number of 
solutions greatly. I explored this in my blog post about [Common 
Lisp, Part 
2](https://renato.athaydes.com/posts/revenge_of_lisp-part-2).

The fact there's a ridiculous amount of function calls is why 
this problem can take minutes even without having to allocate 
much memory or print anything.

** I've managed to improve the D code enough that it is now 
faster than Common Lisp and the equivalent algorithm in Java.**

It took some profiling to do that, though... thanks to 
 Anonymouse for the suggestion to use Valgrind... with that, I 
was able to profile the code nicely (Valgrind works nicely with 
D, it doesn't even show mangled names!).

Here's what I did.

First: the solution using int128 was much faster than the BigInt 
solution, as I had already mentioned. But when profiling that, it 
was clear that for the problems with a very large number of 
solutions, GC became a problem:

```
--------------------------------------------------------------------------------
23,044,096,944 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                      file:function
--------------------------------------------------------------------------------
7,079,878,197 (30.72%)  
???:core.internal.gc.impl.conservative.gc.Gcx.mark!(false, true, 
true).mark(core.internal.gc.impl.conservative.gc.Gcx.ScanRang
!(false).ScanRange) [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
2,375,100,857 (10.31%)  
???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128],
dencoder.ISolutionHandler, immutable(char)[], immutable(char)[],
immutable(char)[][])'2
[/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
1,971,210,820 ( 8.55%)  ???:_aaInX 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
1,922,961,924 ( 8.34%)  ???:_d_arraysetlengthT 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
1,298,747,622 ( 5.64%)  ???:core.int128.mul(core.int128.Cent, 
core.int128.Cent) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
1,134,644,706 ( 4.92%)  
???:core.internal.gc.bits.GCBits.setLocked(ulong) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
   849,587,834 ( 3.69%)  
???:core.internal.gc.impl.conservative.gc.Gcx.smallAlloc(ulong, 
ref ulong, uint, const(TypeInfo)) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
   827,407,988 ( 3.59%)  
./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcp
_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6]
   688,845,027 ( 2.99%)  
./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset
avx2_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc.so.6]
   575,415,884 ( 2.50%)  
???:_DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZ
QDd6memory8BlkInfo_ [/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
   562,146,592 ( 2.44%)  
???:core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC
mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)),
core.internal.gc.impl.conservative.gc.mallocTime,
core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong,
const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo))
[/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
   526,067,586 ( 2.28%)  
???:core.internal.spinlock.SpinLock.lock() shared 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/dencoder-int128]
```

So, I [decided to use 
`std.container.Array`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/b927bc9cc4e33f9f6ba45
d8abc3bccbc17c7f1e) (I had tried Appender before but that didn't really work)
to make sure no allocation was happening when adding/removing words from the
`words` which are carried through the `printTranslations` calls (that's the hot
path).

As you can see in the commit, the code is just slightly less 
convenient...
this made the code run considerably faster for the very long runs!

Here's the Callgrind profiling data AFTER this change:

```
6,547,316,944 (32.42%)  
???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128],
dencoder.ISolutionHandler, immutable(char)[], immutable(char)[],
std.container.array.Array!(immutable(char)[]).Array)'2
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
5,229,076,596 (25.90%)  ???:_aaInX 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
3,871,644,402 (19.17%)  ???:core.int128.mul(core.int128.Cent, 
core.int128.Cent) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   677,533,800 ( 3.36%)  ???:std.int128.Int128.toHash() const 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   543,388,688 ( 2.69%)  
???:std.int128.Int128.this(core.int128.Cent) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   543,388,688 ( 2.69%)  ???:std.int128.Int128.this(long, long) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   542,027,045 ( 2.68%)  ???:object.TypeInfo_Struct.getHash(scope 
const(void*)) const 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   424,849,937 ( 2.10%)  ???:object.TypeInfo_Struct.equals(in 
void, in void) const 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
```

No more GC! It seems that now, as expected, most time is spent 
calculating hashes for each possible solution (as it should be). 
I am not entirely sure what the `_aaInX` call is, but I believe 
that's the associative array's membership check?

The D profiler seems to show similar data:

```
======== Timer frequency unknown, Times are in Megaticks ========

   Num          Tree        Func        Per
   Calls        Time        Time        Call

104001600         158         158           0     const pure 
nothrow  nogc  safe std.int128.Int128 
std.int128.Int128.opBinary!("*").opBinary(std.int128.Int128)
402933936          63          63           0     const pure 
nothrow  property  nogc  safe bool 
std.typecons.RefCounted!(std.container.array.Array!(immutable(cha
)[]).Array.Payload, 0).RefCounted.RefCountedStore.isInitialized()
183744166          87          59           0     inout pure 
nothrow ref  property  nogc return  safe 
inout(std.container.array.Array!(immutable(char)[]).Array.Payload)
std.typecons.RefCounted!(std.container.array.Array!(immutable(cha
)[]).Array.Payload, 0).RefCounted.refCountedPayload()
103784880          68          38           0     const pure 
nothrow  nogc  safe std.int128.Int128 
std.int128.Int128.opBinary!("+", int).opBinary(const(int))
103784880          99          31           0     pure nothrow 
ref  nogc  safe std.int128.Int128 
std.int128.Int128.opOpAssign!("+", int).opOpAssign(int)
104001600          30          30           0     const pure 
nothrow  nogc  safe std.int128.Int128 
std.int128.Int128.opBinary!("+").opBinary(std.int128.Int128)
61167377          67          29           0     pure nothrow 
 nogc void 
std.container.array.Array!(immutable(char)[]).Array.__fieldDtor()
43444575          61          27           0     pure nothrow 
 nogc  safe void 
core.internal.lifetime.emplaceRef!(immutable(char)[], 
immutable(char)[], immutable(char)[]).emplaceRef(ref 
immutable(char)[], ref immutable(char)[])
48427530          79          27           0     const pure 
nothrow  property  nogc  safe bool 
std.container.array.Array!(immutable(char)[]).Array.empty()
61167377          38          27           0     pure nothrow 
 nogc void 
std.typecons.RefCounted!(std.container.array.Array!(immutable(cha
)[]).Array.Payload, 0).RefCounted.__dtor()
43444575         130          24           0     pure nothrow 
 nogc ulong 
std.container.array.Array!(immutable(char)[]).Array.Payload.insertBack!(immutable(char)[]).insertBack(immutable(char)[])
61167376          32          23           0     pure nothrow 
 nogc  safe void 
std.typecons.RefCounted!(std.container.array.Array!(immutable(cha
)[]).Array.Payload, 0).RefCounted.__postblit()
```

Anyway, after this change, the code now scaled much better.

But there's nothing much lest to be optimised... except the 
arithmetics (notice how the `("*").opBinary` call is near the top.

I know how to make that faster using a similar strategy that I 
used in Rust (you can check my [journey to optimise the Rust code 
on my blog 
post](https://renato.athaydes.com/posts/how-to-write-fast-rust-code) about that
- specifically, check the `Using packed bytes for more efficient storage`
section)... knowing how the encoding works, I knew that multiplication is not
really needed. So, instead of doing this:

```d
n *= 10;
n += c - '0';
```

I could do this:

```d
n <<= 4;
n += c;
```

This changes the actual number, but that doesn't matter: the code 
is still unique for each number, which is all that we need.

You can see [the full commit 
here](https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c).

This improved the performance for sure, even if not by much.

The profiling data after this arithmetic trick looks like this:

Valgrind:

```
4,821,217,799 (35.75%)  
???:dencoder.printTranslations(immutable(char)[][][std.int128.Int128],
dencoder.ISolutionHandler, immutable(char)[], immutable(char)[],
std.container.array.Array!(immutable(char)[]).Array)'2
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
4,241,404,850 (31.45%)  ???:_aaInX 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
1,038,733,782 ( 7.70%)  ???:core.int128.shl(core.int128.Cent, 
uint) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   575,372,270 ( 4.27%)  ???:std.int128.Int128.toHash() const 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   461,659,456 ( 3.42%)  
???:std.int128.Int128.this(core.int128.Cent) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   460,297,816 ( 3.41%)  ???:object.TypeInfo_Struct.getHash(scope 
const(void*)) const 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   347,198,989 ( 2.57%)  ???:object.TypeInfo_Struct.equals(in 
void, in void) const 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
   288,537,160 ( 2.14%)  ???:core.int128.add(core.int128.Cent, 
core.int128.Cent) 
[/home/renato/programming/experiments/prechelt-phone-number-encoding/src/d/dencoder]
```

D Profiler:

```
======== Timer frequency unknown, Times are in Megaticks ========

   Num          Tree        Func        Per
   Calls        Time        Time        Call

29879388       44037       11267           0     void 
dencoder.printTranslations(immutable(char)[][][std.int128.Int128],
dencoder.ISolutionHandler, immutable(char)[], immutable(char)[],
std.container.array.Array!(immutable(char)[]).Array)
126827950        4306        3599           0     inout pure 
nothrow ref  property  nogc return  safe 
inout(std.container.array.Array!(immutable(char)[]).Array.Payload)
std.typecons.RefCounted!(std.container.array.Array!(immutable(cha
)[]).Array.Payload, 0).RefCounted.refCountedPayload()
29879268        7293        3100           0     pure nothrow 
 nogc ulong 
std.container.array.Array!(immutable(char)[]).Array.Payload.insertBack!(immutable(char)[]).insertBack(immutable(char)[])
33534720        4896        2780           0     const pure 
nothrow  property  nogc  safe bool 
std.container.array.Array!(immutable(char)[]).Array.empty()
29879268       11834        2479           0     pure nothrow 
 nogc ulong 
std.container.array.Array!(immutable(char)[]).Array.insertBack!(immutable(char)[]).insertBack(immutable(char)[])
29879268        8702        2279           0     pure nothrow 
 nogc  safe void 
std.container.array.Array!(immutable(char)[]).Array.removeBack()
282919518        2045        2045           0     const pure 
nothrow  property  nogc  safe bool 
std.typecons.RefCounted!(std.container.array.Array!(immutable(cha
)[]).Array.Payload, 0).RefCounted.RefCountedStore.isInitialized()
29879268        2534        1859           0     pure nothrow 
 nogc  safe void 
core.internal.lifetime.emplaceRef!(immutable(char)[], 
immutable(char)[], immutable(char)[]).emplaceRef(ref 
immutable(char)[], ref immutable(char)[])
44511077        3264        1505           0     pure nothrow 
 nogc void 
std.container.array.Array!(immutable(char)[]).Array.__fieldDtor()
44511076        2588        1254           0     pure nothrow 
 nogc scope void 
std.container.array.Array!(immutable(char)[]).Array.__fieldPostblit()
49758574        1684        1243           0     const pure 
nothrow  nogc  safe std.int128.Int128 
std.int128.Int128.opBinary!("+", char).opBinary(const(char))
49758574        2907        1223           0     pure nothrow ref 
 nogc  safe std.int128.Int128 std.int128.Int128.opOpAssign!("+", 
immutable(char)).opOpAssign(ref immutable(char))
49975294        1796        1208           0     pure nothrow ref 
 nogc  safe std.int128.Int128 std.int128.Int128.opOpAssign!("<<", 
int).opOpAssign(int)
```

As far as I can tell, the only two bottlenecks now bacame:

* `std.typecons.RefCounted!(Array.Payload, 
0).RefCounted.refCountedPayload()`
* the `Int128` implementation!

To fix the former, I would need to implement my own `Array`, I 
suppose... using ref-counting here seems unnecessary??

But that is a bit beyond what I am willing to do!

The latter could probably be fixed by not using a numeric key at 
all, just bytes - but my previous attempt at doing that didn't 
really give better results. Or perhaps by overriding the hashing 
for Int128 (I didn't try that because I assume the default should 
be pretty optimal already)??

So, I ran out of options and am going to call it done.

[Here's my final D 
solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/src/d/src/dencoder.d).

The code is still readable (I think everyone can agree D is one 
of the most readable languages of all)!

You can see how it performs in the last comparison run I did 
([all data, including a nice chart, in this 
gist](https://gist.github.com/renatoathaydes/ab5c86b0ea59152693a7236c333ac334)):

```
Proc,Run,Memory(bytes),Time(ms)
===> java -Xms20M -Xmx100M -cp build/java Main
java-Main,3190730752,441
java-Main,3194789888,1440
java-Main,3193692160,2316
java-Main,3194720256,3604
java-Main,3192963072,12861
java-Main,3261128704,31282
===> sbcl --script src/lisp/main.fasl
sbcl,1275994112,83
sbcl,1275998208,1396
sbcl,1275998208,5925
sbcl,1275998208,9752
sbcl,1275998208,64811
sbcl,1276006400,153799
===> ./rust
./rust,23138304,56
./rust,23138304,234
./rust,23138304,1288
./rust,23138304,2444
./rust,9027584,15867
./rust,9027584,36985
===> src/d/dencoder
src/d/dencoder,219041792,67
src/d/dencoder,229904384,1114
src/d/dencoder,229904384,4421
src/d/dencoder,229904384,9087
src/d/dencoder,219041792,51315
src/d/dencoder,219041792,120818
```



Using `Int128` as a key instead of `BigInt` made the code much 
faster (around 4x faster).

Using `Array` to avoid too many reallocations with primitive 
dynamic arrays made the code run much faster for very large 
numbers of calls (very little difference when running for less 
than 10s, but at least 2x faster at round 1min runs where the 
tiny overhead adds up).

As you can see [in the chart I 
posted](https://gist.github.com/renatoathaydes/ab5c86b0ea5915
693a7236c333ac334), D's speed performance is close to that of Common Lisp for
all runs, though between 10 and 20% faster. It's nowhere near Rust,
unfortunately, with Rust being almost 4x faster (pls ignore the Java solution
as that's using a Trie instead of the "numeric" solution - so it's not a fair
comparison). I had expected to get very close to Rust, but that didn't
happen... I just can't see in the profiling data what's causing D to fall so
far behind!

On the memory data: D uses much less memory than Java and Common 
Lisp, but still a lot higher than Rust.

If anyone can find any flaw in my methodology or optmise my code 
so that it can still get a couple of times faster, approaching 
Rust's performance, I would greatly appreciate that! But for now, 
my understanding is that the most promising way to get there 
would be to write D in `betterC` style?!

Jan 14 2024

Sergey <kornburn yandex.ru> writes:

On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:
 If anyone can find any flaw in my methodology or optmise my 
 code so that it can still get a couple of times faster, 
 approaching Rust's performance, I would greatly appreciate 
 that! But for now, my understanding is that the most promising 
 way to get there would be to write D in `betterC` style?!

I've added port from Rust in the PR comment. Can you please check 
this solution?
Most probably it need to be optimized with profiler. Just 
interesting how close-enough port will work.

Jan 14 2024

Renato <renato athaydes.com> writes:

On Monday, 15 January 2024 at 01:10:14 UTC, Sergey wrote:
 On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:
 If anyone can find any flaw in my methodology or optmise my 
 code so that it can still get a couple of times faster, 
 approaching Rust's performance, I would greatly appreciate 
 that! But for now, my understanding is that the most promising 
 way to get there would be to write D in `betterC` style?!

 I've added port from Rust in the PR comment. Can you please 
 check this solution?
 Most probably it need to be optimized with profiler. Just 
 interesting how close-enough port will work.

As discussed on GitHub, the line-by-line port of the Rust code is 
5x slower than [my latest solution using 
int128](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/sr
/d/src/dencoder.d), which is itself 3 to 4x slower than the Rust implementation
(at around the same order of magnitude as algorithm-equivalent Java and Common
Lisp implementations, D is perhaps 15% faster).

I did the best I could to make D run faster, but we hit a limit 
that's a bit hard to get past now. Happy to be given suggestions 
(see profiling information in previous messages), but I've run 
out of ideas myself.

Jan 15 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Mon, Jan 15, 2024 at 08:10:55PM +0000, Renato via Digitalmars-d-learn wrote:
 On Monday, 15 January 2024 at 01:10:14 UTC, Sergey wrote:
 On Sunday, 14 January 2024 at 17:11:27 UTC, Renato wrote:
 If anyone can find any flaw in my methodology or optmise my code so
 that it can still get a couple of times faster, approaching Rust's
 performance, I would greatly appreciate that! But for now, my
 understanding is that the most promising way to get there would be
 to write D in `betterC` style?!

 
 I've added port from Rust in the PR comment. Can you please check
 this solution?
 Most probably it need to be optimized with profiler. Just
 interesting how close-enough port will work.

 
 As discussed on GitHub, the line-by-line port of the Rust code is 5x
 slower than [my latest solution using
 int128](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/0cbfd41a072718bfb0c0d0af8bb7266471e7e94c/src/d/src/dencoder.d),
 which is itself 3 to 4x slower than the Rust implementation (at around
 the same order of magnitude as algorithm-equivalent Java and Common
 Lisp implementations, D is perhaps 15% faster).
 
 I did the best I could to make D run faster, but we hit a limit that's
 a bit hard to get past now. Happy to be given suggestions (see
 profiling information in previous messages), but I've run out of ideas
 myself.

This problem piqued my interest, so yesterday and today I worked on it
and came up with my own solution (I did not look at existing solutions
in order to prevent bias).  I have not profiled it or anything, but the
runtime seems quite promising.

Here it is:

---------------------------------snip----------------------------------
/**
 * Encoding phone numbers according to a dictionary.
 */
import std;

/**
 * Table of digit mappings.
 */
static immutable ubyte[dchar] digitOf;
shared static this()
{
    digitOf = [
        'E': 0,
        'J': 1, 'N': 1, 'Q': 1,
        'R': 2, 'W': 2, 'X': 2,
        'D': 3, 'S': 3, 'Y': 3,
        'F': 4, 'T': 4,
        'A': 5, 'M': 5,
        'C': 6, 'I': 6, 'V': 6,
        'B': 7, 'K': 7, 'U': 7,
        'L': 8, 'O': 8, 'P': 8,
        'G': 9, 'H': 9, 'Z': 9,
    ];
}

/**
 * Trie for storing dictionary words according to the phone number mapping.
 */
class Trie
{
    Trie[10] edges;
    string[] words;

    private void insert(string word, string suffix)
    {
        const(ubyte)* dig;
        while (!suffix.empty &&
               (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null)
        {
            suffix = suffix[1 .. $];
        }

        if (suffix.empty)
        {
            words ~= word;
            return;
        }

        auto node = new Trie;
        auto idx = *dig;
        if (edges[idx] is null)
        {
            edges[idx] = new Trie;
        }
        edges[idx].insert(word, suffix[1 .. $]);
    }

    /**
     * Insert a word into the Trie.
     *
     * Characters that don't map to any digit are ignored in building the Trie.
     * However, the original form of the word will be retained as-is in the
     * leaf node.
     */
    void insert(string word)
    {
        insert(word, word[]);
    }

    /**
     * Iterate over all words stored in this Trie.
     */
    void foreachEntry(void delegate(string path, string word) cb)
    {
        void impl(Trie node, string path = "")
        {
            if (node is null) return;
            foreach (word; node.words)
            {
                cb(path, word);
            }
            foreach (i, child; node.edges)
            {
                impl(child, path ~ cast(char)('0' + i));
            }
        }
        impl(this);
    }
}

/**
 * Loads the given dictionary into a Trie.
 */
Trie loadDictionary(R)(R lines)
    if (isInputRange!R & is(ElementType!R : const(char)[]))
{
    Trie result = new Trie;
    foreach (line; lines)
    {
        result.insert(line.idup);
    }
    return result;
}

///
unittest
{
    auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

    auto app = appender!(string[]);
    dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); });
    assert(app.data == [
        "10: je",
        "105513: jemand",
        "107: neu",
        "1550: Name",
        "253302: Wasser",
        "35: da",
        "38: so",
        "400: Fee",
        "4021: fern",
        "4034: Fest",
        "482: Tor",
        "4824: fort",
        "4824: Torf",
        "51: an",
        "562: mir",
        "562: Mix",
        "56202: Mixer",
        "78: Bo\"",
        "783: bo\"s",
        "7857: blau",
        "7884: Boot",
        "824: Ort",
        "83: o\"d"
    ]);
}

/**
 * Find all encodings of the given phoneNumber according to the given
 * dictionary, and write each encoding to the given sink.
 */
void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink)
    if (isOutputRange!(W, string))
{
    bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit)
    {
        if (node is null)
            return false;

        // Ignore non-digit characters in phone number
        while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9'))
            suffix = suffix[1 .. $];

        if (suffix.empty)
        {
            // Found a match, print result
            foreach (word; node.words)
            {
                put(sink, format("%s: %-(%s %)", phoneNumber,
                                 path.chain(only(word))));
            }
            return !node.words.empty;
        }

        bool ret;
        foreach (word; node.words)
        {
            // Found a matching word, try to match the rest of the phone
            // number.
            if (impl(dict, suffix, path ~ word, true))
            {
                allowDigit = false;
                ret = true;
            }
        }

        if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false))
        {
            allowDigit = false;
            ret = true;
        }

        if (allowDigit)
        {
            // If we got here, it means that if we take the current node as the
            // next word choice, then the following digit will have no further
            // matches, and we may encode it as a single digit.
            auto nextSuffix = suffix[1 .. $];
            if (nextSuffix.empty)
            {
                put(sink, format("%s: %-(%s %)", phoneNumber,
                                 path.chain(suffix[0 .. 1].only)));
                ret = true;
            }
            else
            {
                if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false))
                    ret = true;
            }
        }
        return ret;
    }

    impl(dict, phoneNumber[], [], true);
}

/**
 * Encode the given input range of phone numbers according to the given
 * dictionary, writing the output to the given sink.
 */
void encodePhoneNumbers(R,W)(R input, Trie dict, W sink)
    if (isInputRange!R & is(ElementType!R : const(char)[]) &&
        isOutputRange!(W, string))
{
    foreach (line; input)
    {
        findMatches(dict, line, sink);
    }
}

///
unittest
{
    auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

    auto input = [
        "112",
        "5624-82",
        "4824",
        "0721/608-4067",
        "10/783--5",
        "1078-913-5",
        "381482",
        "04824",
    ];

    auto app = appender!(string[]);
    encodePhoneNumbers(input, dict, (string match) { app.put(match); });

    //writefln("%-(%s\n%)", app.data);
    assert(app.data.sort.release == [
        "04824: 0 Tor 4",
        "04824: 0 Torf",
        "04824: 0 fort",
        "10/783--5: je Bo\" da",
        "10/783--5: je bo\"s 5",
        "10/783--5: neu o\"d 5",
        "381482: so 1 Tor",
        "4824: Tor 4",
        "4824: Torf",
        "4824: fort",
        "5624-82: Mix Tor",
        "5624-82: mir Tor",
    ]);
}

/**
 * Program entry point.
 */
int main(string[] args)
{
    File input = stdin;
    bool countOnly;

    auto info = getopt(args,
        "c|count", "Count solutions only", &countOnly,
    );

    int showHelp()
    {
        stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]",
                        args[0]);
        defaultGetoptPrinter("", info.options);
        return 1;
    }

    if (info.helpWanted || args.length < 2)
        return showHelp();

    auto dictfile = args[1];
    if (args.length > 2)
        input = File(args[2]);

    Trie dict = loadDictionary(File(dictfile).byLine);

    if (countOnly)
    {
        size_t count;
        encodePhoneNumbers(input.byLine, dict, (string match) { count++; });
        writefln("Number of solutions: %d", count);
    }
    else
    {
        encodePhoneNumbers(input.byLine, dict, (string match) {
            writeln(match);
        });
    }

    return 0;
}
---------------------------------snip----------------------------------

The underlying idea is to store the dictionary entries in a trie (radix
tree) keyed by the prescribed digit mapping. Once this is done, encoding
a phone number is as simple as following each digit down the trie and
enumerating words along the path.  This means that once the dictionary
is encoded, we no longer need to do any digit mappings; the digits of
the input phone number is enough to determine the path down the trie.

Of course, pursuant to the original specifications of the problem, the
original word forms are stored as-is along each node corresponding with
the end of the word, so once a match is found the program simply
enumerates the words found at that node.

//

Unfortunately there seems to be some discrepancy between the output I
got and the prescribed output in your repository. For example, in your
output the number 1556/0 does not have an encoding, but isn't "1 Mai 0"
a valid encoding according to your dictionary and the original problem
description?


T

-- 
Engineering almost by definition should break guidelines. -- Ali �ehreli

Jan 16 2024

Siarhei Siamashka <siarhei.siamashka gmail.com> writes:

On Tuesday, 16 January 2024 at 15:50:35 UTC, H. S. Teoh wrote:
 Unfortunately there seems to be some discrepancy between the 
 output I got and the prescribed output in your repository. For 
 example, in your output the number 1556/0 does not have an 
 encoding, but isn't "1 Mai 0" a valid encoding according to 
 your dictionary and the original problem description?

You are not allowed to emit "1" as the first token in the output 
as long as there are any dictionary word matches at that 
position. The relevant paragraph from the problem statement:

==snip==
Encodings of phone numbers can consist of a single word or of 
multiple
words separated by spaces. The encodings are built word by word 
from
left to right. If and only if at a particular point no word at 
all from
the dictionary can be inserted, a single digit from the phone 
number can
be copied to the encoding instead. Two subsequent digits are never
allowed, though. To put it differently: In a partial encoding that
currently covers k digits, digit k+1 is encoded by itself if and 
only if,
first, digit k was not encoded by a digit and, second, there is 
no word
in the dictionary that can be used in the encoding starting at 
digit k+1.
==snip==

I also spent a bit of time trying to figure out this nuance when 
implementing my solution. It doesn't make much sense visually (no 
back-to-back digits in the output either way), but that's how it 
is.

Jan 16 2024

Renato <renato athaydes.com> writes:

On Tuesday, 16 January 2024 at 16:56:04 UTC, Siarhei Siamashka
wrote:
On Tuesday, 16 January 2024 at 15:50:35 UTC, H. S. Teoh wrote:
Unfortunately there seems to be some discrepancy between the
output I got and the prescribed output in your repository. For
example, in your output the number 1556/0 does not have an
encoding, but isn't "1 Mai 0" a valid encoding according to
your dictionary and the original problem description?

You are not allowed to emit "1" as the first token in the
output as long as there are any dictionary word matches at that
position. The relevant paragraph from the problem statement:

==snip==
Encodings of phone numbers can consist of a single word or of
multiple
words separated by spaces. The encodings are built word by word
from
left to right. If and only if at a particular point no word at
all from
the dictionary can be inserted, a single digit from the phone
number can
be copied to the encoding instead. Two subsequent digits are
never
allowed, though. To put it differently: In a partial encoding
that
currently covers k digits, digit k+1 is encoded by itself if
and only if,
first, digit k was not encoded by a digit and, second, there is
no word
in the dictionary that can be used in the encoding starting at
digit k+1.
==snip==

I also spent a bit of time trying to figure out this nuance
when implementing my solution. It doesn't make much sense
visually (no back-to-back digits in the output either way), but
that's how it is.

Exactly, this is one of the things that make this problem a bit
annoying to solve :)

"H. S. Teoh" you implemented the solution as a Trie!! Nice,
that's also what I did when I "participated" in the study. Here's
[my Trie solution in
Java](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/java/Main.java).

These are basically the two common approaches to the problem: a
Trie or a numeric-based table. According to the study, people who
use scripting languages almost always go with the numeric
approach, while people coming from lower level languages tend to
use a data structure like Trie (if they don't know Trie, they
come up with something similar which is fascinating), which is
harder to implement but more efficient in general.

Can I ask you why didn't you use the [D stdlib
Trie](https://dlang.org/phobos/std_uni.html#codepointTrie)? Not
sure that would've worked, but did you consider that?

Jan 16 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Tue, Jan 16, 2024 at 06:54:56PM +0000, Renato via Digitalmars-d-learn wrote:
On Tuesday, 16 January 2024 at 16:56:04 UTC, Siarhei Siamashka wrote:

[...]
You are not allowed to emit "1" as the first token in the output as
long as there are any dictionary word matches at that position. The
relevant paragraph from the problem statement:

Ohhh now I get it. Initially I misunderstood that as saying that if the
rest of the phone number has at least one match, then a digit is not
allowed. Now I see that what it's actually saying is that even if some
random dictionary word matches at that position, even if it does not
lead to any full matches, then a digit is excluded.

[...]
I also spent a bit of time trying to figure out this nuance when
implementing my solution. It doesn't make much sense visually (no
back-to-back digits in the output either way), but that's how it is.

Exactly, this is one of the things that make this problem a bit
annoying to solve :)

It's a strange requirement, for sure, but I don't think it's annoying.
It makes the problem more Interesting(tm). ;-)

Anyway, I've fixed the problem, now my program produces the exact same
output as Renato's repo. Code is posted below. Interestingly enough, the
running time has now halved to about 0.9 seconds for 1 million phone
numbers. I guess that's caused by the more stringent requirement
excluding many more matching possibilities, effectively pruning away
large parts of the search tree.

"H. S. Teoh" you implemented the solution as a Trie!! Nice, that's
also what I did when I "participated" in the study. Here's [my Trie
solution in
Java](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/java/Main.java).

These are basically the two common approaches to the problem: a Trie
or a numeric-based table. According to the study, people who use
scripting languages almost always go with the numeric approach, while
people coming from lower level languages tend to use a data structure
like Trie (if they don't know Trie, they come up with something
similar which is fascinating), which is harder to implement but more
efficient in general.

Interesting. I guess my C/C++ background is showing. ;-)

I'm not sure what exactly motivated me to go this route; I guess it was
just my default preference of choosing the path of least work as far as
the algorithm is concerned: I chose the algorithm that did the least
amount of work needed to produce the right answer. Scanning through
sections of the dictionary to find a match was therefore excluded; so my
first thought was an AA. But then how much of the initial prefix to look
up an in AA? Since it can't be known beforehand, I'd have to gradually
lengthen the prefix to search for, which does a lot of repetitive work
(we keep looking up the first few digits repeatedly each time we search
for a longer prefix). Plus, multiple consecutive AA lookups is not
cache-friendly. So my next thought was, what should I do such that I
don't have to look at the initial digits anymore once I already
processed it? This line of thought naturally led to a trie structure.

Once I arrived at a trie structure, the next question was how exactly
dictionary entries would be stored in it. Again, in the vein of doing
the least amount of work I could get away with, I thought, if I stored
words in the trie directly, with each edge encoding a letter, then
during the search I'd have to repeatedly convert letters to the
corresponding phone number digit and vice versa. So why not do this
conversion beforehand, and store only phone digits in the trie? This
also had the additional benefit of letting me effectively search
multiple letters simultaneously, since multiple letters map to the same
digit, so scanning a digit is equivalent to searching multiple letters
at the same time. The output, of course, required the original form of
the words -- so the obvious solution was to attach the original words as
a list of words attached to the trie node representing the end of that
word.

Once this was all decided, the only remaining question was the search
algorithm. This turned out to take the most time in solving this
problem, due to the recursive nature of the search, I had to grapple
with where and how to make the recursive calls, and how to propagate
return values correctly. The initial implementation only found word
matches, and did not allow the single digits. Happily, the recursive
algorithm turned out to have enough structure to encode the single digit
requirements as well, although it took a bit of trial and error to find
the correct implementation.

Can I ask you why didn't you use the [D stdlib
Trie](https://dlang.org/phobos/std_uni.html#codepointTrie)? Not sure
that would've worked, but did you consider that?

Haha, I didn't even think of that. :-D I wouldn't have wanted to use it
anyway, because it was optimized for single codepoint lookups, and
wouldn't have matched what I wanted to do. Besides, I also understood it
as an internal helper structure used by std.uni, so it isn't really
designed as a trie implementation for general use.

Also, since performance was an issue in your OP, I wanted to avoid
off-the-bat any hidden costs that may be present in a pre-existing trie
implementation that I did not fully understand. Maybe I'm just affected
with NIH syndrome, but IME, when performance is important you usually
want to write a custom data structure for your inner loop, one that's
fine-tuned to your specific algorithm. Otherwise the impedance mismatch
may give you suboptimal results.

--
Knowledge is that area of ignorance that we arrange and classify. -- Ambrose
Bierce

Jan 16 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via Digitalmars-d-learn
wrote:
[...]
 Anyway, I've fixed the problem, now my program produces the exact same
 output as Renato's repo. Code is posted below.

[...]

Oops, forgot to actually paste the code. Here it is:

------------------------------------snip------------------------------------
/**
 * Encoding phone numbers according to a dictionary.
 */
import std;

/**
 * Table of digit mappings.
 */
static immutable ubyte[dchar] digitOf;
shared static this()
{
    digitOf = [
        'E': 0,
        'J': 1, 'N': 1, 'Q': 1,
        'R': 2, 'W': 2, 'X': 2,
        'D': 3, 'S': 3, 'Y': 3,
        'F': 4, 'T': 4,
        'A': 5, 'M': 5,
        'C': 6, 'I': 6, 'V': 6,
        'B': 7, 'K': 7, 'U': 7,
        'L': 8, 'O': 8, 'P': 8,
        'G': 9, 'H': 9, 'Z': 9,
    ];
}

/**
 * Trie for storing dictionary words according to the phone number mapping.
 */
class Trie
{
    Trie[10] edges;
    string[] words;

    private void insert(string word, string suffix)
    {
        const(ubyte)* dig;
        while (!suffix.empty &&
               (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null)
        {
            suffix = suffix[1 .. $];
        }

        if (suffix.empty)
        {
            words ~= word;
            return;
        }

        auto node = new Trie;
        auto idx = *dig;
        if (edges[idx] is null)
        {
            edges[idx] = new Trie;
        }
        edges[idx].insert(word, suffix[1 .. $]);
    }

    /**
     * Insert a word into the Trie.
     *
     * Characters that don't map to any digit are ignored in building the Trie.
     * However, the original form of the word will be retained as-is in the
     * leaf node.
     */
    void insert(string word)
    {
        insert(word, word[]);
    }

    /**
     * Iterate over all words stored in this Trie.
     */
    void foreachEntry(void delegate(string path, string word) cb)
    {
        void impl(Trie node, string path = "")
        {
            if (node is null) return;
            foreach (word; node.words)
            {
                cb(path, word);
            }
            foreach (i, child; node.edges)
            {
                impl(child, path ~ cast(char)('0' + i));
            }
        }
        impl(this);
    }
}

/**
 * Loads the given dictionary into a Trie.
 */
Trie loadDictionary(R)(R lines)
    if (isInputRange!R & is(ElementType!R : const(char)[]))
{
    Trie result = new Trie;
    foreach (line; lines)
    {
        result.insert(line.idup);
    }
    return result;
}

///
unittest
{
    auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

    auto app = appender!(string[]);
    dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); });
    assert(app.data == [
        "10: je",
        "105513: jemand",
        "107: neu",
        "1550: Name",
        "253302: Wasser",
        "35: da",
        "38: so",
        "400: Fee",
        "4021: fern",
        "4034: Fest",
        "482: Tor",
        "4824: fort",
        "4824: Torf",
        "51: an",
        "562: mir",
        "562: Mix",
        "56202: Mixer",
        "78: Bo\"",
        "783: bo\"s",
        "7857: blau",
        "7884: Boot",
        "824: Ort",
        "83: o\"d"
    ]);
}

/**
 * Find all encodings of the given phoneNumber according to the given
 * dictionary, and write each encoding to the given sink.
 */
void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink)
    if (isOutputRange!(W, string))
{
    bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit)
    {
        if (node is null)
            return false;

        // Ignore non-digit characters in phone number
        while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9'))
            suffix = suffix[1 .. $];

        if (suffix.empty)
        {
            // Found a match, print result
            foreach (word; node.words)
            {
                put(sink, format("%s: %-(%s %)", phoneNumber,
                                 path.chain(only(word))));
            }
            return !node.words.empty;
        }

        bool ret;
        foreach (word; node.words)
        {
            // Found a matching word, try to match the rest of the phone
            // number.
            ret = true;
            if (impl(dict, suffix, path ~ word, true))
                allowDigit = false;
        }

        if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false))
        {
            allowDigit = false;
            ret = true;
        }

        if (allowDigit)
        {
            // If we got here, it means that if we take the current node as the
            // next word choice, then the following digit will have no further
            // matches, and we may encode it as a single digit.
            auto nextSuffix = suffix[1 .. $];
            if (nextSuffix.empty)
            {
                put(sink, format("%s: %-(%s %)", phoneNumber,
                                 path.chain(suffix[0 .. 1].only)));
                ret = true;
            }
            else
            {
                if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false))
                    ret = true;
            }
        }
        return ret;
    }

    // Trim trailing non-digits from phone number
    auto suffix = phoneNumber[];
    while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9'))
    {
        suffix = suffix[0 .. $-1];
    }

    impl(dict, suffix, [], true);
}

/**
 * Encode the given input range of phone numbers according to the given
 * dictionary, writing the output to the given sink.
 */
void encodePhoneNumbers(R,W)(R input, Trie dict, W sink)
    if (isInputRange!R & is(ElementType!R : const(char)[]) &&
        isOutputRange!(W, string))
{
    foreach (line; input)
    {
        findMatches(dict, line, sink);
    }
}

///
unittest
{
    auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

    auto input = [
        "112",
        "5624-82",
        "4824",
        "0721/608-4067",
        "10/783--5",
        "1078-913-5",
        "381482",
        "04824",
    ];

    auto app = appender!(string[]);
    encodePhoneNumbers(input, dict, (string match) { app.put(match); });

    //writefln("\n%-(%s\n%)", app.data);
    assert(app.data.sort.release == [
        "04824: 0 Tor 4",
        "04824: 0 Torf",
        "04824: 0 fort",
        "10/783--5: je Bo\" da",
        "10/783--5: je bo\"s 5",
        "10/783--5: neu o\"d 5",
        "381482: so 1 Tor",
        "4824: Tor 4",
        "4824: Torf",
        "4824: fort",
        "5624-82: Mix Tor",
        "5624-82: mir Tor",
    ]);
}

unittest
{
    auto dict = loadDictionary(q"ENDDICT
Bias
ja
Mai
Reck
Weib
USA
ENDDICT".splitLines);

    auto input = [
        "/7-357653152/0677-",
        "/7-3576-",
        "/8-",
        "1556/0",
    ];

    auto app = appender!(string[]);
    encodePhoneNumbers(input, dict, (string match) { app.put(match); });

    //writefln("\n%-(%s\n%)", app.data);
    assert(app.data.sort.release == [
        "/7-357653152/0677-: USA Bias ja Reck 7",
        "/7-357653152/0677-: USA Bias ja Weib 7",
        "/8-: 8",

        /* Note: 1556/0 should NOT encode as "1 Mai 0" because the initial "15"
         * matches "ja", thus excluding a digit in that position. */
    ]);
}

/**
 * Program entry point.
 */
int main(string[] args)
{
    File input = stdin;
    bool countOnly;

    auto info = getopt(args,
        "c|count", "Count solutions only", &countOnly,
    );

    int showHelp()
    {
        stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]",
                        args[0]);
        defaultGetoptPrinter("", info.options);
        return 1;
    }

    if (info.helpWanted || args.length < 2)
        return showHelp();

    auto dictfile = args[1];
    if (args.length > 2)
        input = File(args[2]);

    Trie dict = loadDictionary(File(dictfile).byLine);

    if (countOnly)
    {
        size_t count;
        encodePhoneNumbers(input.byLine, dict, (string match) { count++; });
        writefln("Number of solutions: %d", count);
    }
    else
    {
        encodePhoneNumbers(input.byLine, dict, (string match) {
            writeln(match);
        });
    }

    return 0;
}
------------------------------------snip------------------------------------


T

-- 
The early bird gets the worm. Moral: ewww...

Jan 16 2024

Renato <renato athaydes.com> writes:

On Tuesday, 16 January 2024 at 20:34:48 UTC, H. S. Teoh wrote:
 On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via 
 Digitalmars-d-learn wrote: [...]
 Anyway, I've fixed the problem, now my program produces the 
 exact same output as Renato's repo. Code is posted below.

 [...]

Great, I ran the benchmarks for you :)

I had to change how you accept arguments, even though you did 
"the right thing" using `getopt`, the solutions should just take 
a `count` or `print` argument first...

Anyway, here's your result:

```
===> ./rust
./rust,24133632,25
./rust,24739840,130
./rust,24477696,536
./rust,25247744,1064
./rust,8175616,6148
./rust,8306688,8315
===> src/d/dencoder
src/d/dencoder,46055424,43
src/d/dencoder,96337920,146
src/d/dencoder,102350848,542
src/d/dencoder,102268928,1032
src/d/dencoder,40206336,99936
^C
```

It took too long with the `count` option, so I had to abort 
before the last run ended... there's probably some bug there, 
otherwise the Trie runs very fast, as I had expected.

For the record (I already posted this on GitHub)... here's [my 
current fastest 
solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s
c/d/src/dencoder.d) time using the same algorithm as Rust (after I fixed my
solution that was using `int128`, which was not valid, as  ssvb pointed out):

```
Proc,Run,Memory(bytes),Time(ms)
===> ./rust
./rust,23986176,24
./rust,24346624,118
./rust,24543232,525
./rust,24346624,1027
./rust,8011776,3830
./rust,8486912,18403
===> src/d/dencoder
src/d/dencoder,13615104,30
src/d/dencoder,24723456,374
src/d/dencoder,24592384,1750
src/d/dencoder,24788992,3472
src/d/dencoder,11517952,10235
src/d/dencoder,11517952,51211
```

Not great :(

But  ssvb came up with a really fast implementation, though 
totally different algorithm (so dont' take this as evidence of D 
being faster than Rust or anything like that):

```
Proc,Run,Memory(bytes),Time(ms)
===> ./rust
./rust,23986176,24
./rust,24461312,135
./rust,24494080,514
./rust,24526848,981
./rust,8175616,3647
./rust,8011776,15404
===> src/d/dencoder
src/d/dencoder,16433152,30
src/d/dencoder,16613376,125
src/d/dencoder,16613376,468
src/d/dencoder,16613376,941
src/d/dencoder,5570560,12
src/d/dencoder,6701056,18
```

I can't explain why it's so incredibly fast, specially for the 
`count` case. I tried using the same hashing function on my 
solution, but that didn't really help me!

Pretty cool to see different solutions to the problem, but I'm 
still curious to know where the solution I wrote is being slow 
compared to Rust which is using identical, to my understanding, 
code! I'm sure people could write incredibly fast code to solve 
this problem, but what I am really curious about is what the code 
I wrote is doing wrong that causes it to run 4x slower than Rust 
despite doing "the same thing"...

Jan 16 2024

Siarhei Siamashka <siarhei.siamashka gmail.com> writes:

On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:
 I can't explain why it's so incredibly fast, specially for the 
 `count` case. I tried using the same hashing function on my 
 solution, but that didn't really help me!

That's dynamic programming with memoization. Basically caching 
the already calculated results (in the `dp` array) and avoiding 
recalculations when the recursive function revisits the same 
state again.

Jan 16 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Tue, Jan 16, 2024 at 09:15:19PM +0000, Renato via Digitalmars-d-learn wrote:
 On Tuesday, 16 January 2024 at 20:34:48 UTC, H. S. Teoh wrote:
 On Tue, Jan 16, 2024 at 12:28:49PM -0800, H. S. Teoh via
 Digitalmars-d-learn wrote: [...]
 Anyway, I've fixed the problem, now my program produces the exact
 same output as Renato's repo. Code is posted below.

 [...]
 

 
 Great, I ran the benchmarks for you :)
 
 I had to change how you accept arguments, even though you did "the
 right thing" using `getopt`, the solutions should just take a `count`
 or `print` argument first...

Oops, haha :-P


 Anyway, here's your result:
 
 ```
 ===> ./rust
 ./rust,24133632,25
 ./rust,24739840,130
 ./rust,24477696,536
 ./rust,25247744,1064
 ./rust,8175616,6148
 ./rust,8306688,8315
 ===> src/d/dencoder
 src/d/dencoder,46055424,43
 src/d/dencoder,96337920,146
 src/d/dencoder,102350848,542
 src/d/dencoder,102268928,1032
 src/d/dencoder,40206336,99936
 ^C
 ```
 
 It took too long with the `count` option, so I had to abort before the
 last run ended... there's probably some bug there, otherwise the Trie
 runs very fast, as I had expected.

[...]

Do you have the problematic data file handy?  I'd like to look into any
potential bugs.

Also, the profiler revealed that a lot of time was spent in the GC and
in small allocations.  The cause is in all likelihood the .format() call
for each found match, and array append being used for the recursive
calls. Getting rid of the .format ought to speed it up a bit. Will try
that now...


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. --
Christopher

Jan 16 2024

Renato <renato athaydes.com> writes:

On Tuesday, 16 January 2024 at 22:13:55 UTC, H. S. Teoh wrote:
 used for the recursive calls. Getting rid of the .format ought 
 to speed it up a bit. Will try that now...

That will make no difference for the `count` option which is 
where your solution was very slow. To run the slow test manually 
use the `words_quarter.txt` dictionary (the phone numbers file 
doesn't matter much - it's all in the dictionary).

But pls run the benchmarks yourself as I am not going to keep 
running it for you, and would be nice if you posted your solution 
on a Gist for example, pasting lots of code in the forum makes it 
difficult to follow.

Jan 16 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote:
 On Tuesday, 16 January 2024 at 22:13:55 UTC, H. S. Teoh wrote:
 used for the recursive calls. Getting rid of the .format ought to
 speed it up a bit. Will try that now...
 

 
 That will make no difference for the `count` option which is where
 your solution was very slow.

Of course it will. Passing the data directly to the callback that bumps
a counter is faster than allocating a new string, formatting the data,
and then passing it to the callback that bumps a counter.  It may not
look like much, but avoiding unnecessary GC allocations means the GC
will have less work to do later when a collection is run, thus you save
time over the long term.


 To run the slow test manually use the `words_quarter.txt` dictionary
 (the phone numbers file doesn't matter much - it's all in the
 dictionary).
 
 But pls run the benchmarks yourself as I am not going to keep running
 it for you, and would be nice if you posted your solution on a Gist
 for example, pasting lots of code in the forum makes it difficult to
 follow.

I'll push the code to github.


T

-- 
"No, John.  I want formats that are actually useful, rather than over-featured
megaliths that address all questions by piling on ridiculous internal links in
forms which are hideously over-complex." -- Simon St. Laurent on xml-dev

Jan 17 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via Digitalmars-d-learn wrote:
[...]
 But pls run the benchmarks yourself as I am not going to keep running
 it for you, and would be nice if you posted your solution on a Gist
 for example, pasting lots of code in the forum makes it difficult to
 follow.

I can't. I spent half an hour trying to get ./benchmark.sh to run, but
no matter what it could not compile benchmark_runner. It complains that
my rustc is too old and some dependencies do not support it. I tried
running the suggested cargo update command to pin the versions but none
of them worked.  Since I'm not a Rust user, I'm not feeling particularly
motivated right now to spend any more time on this.  Upgrading my rustc
isn't really an option because that's the version currently in my distro
and I really don't feel like spending more time to install a custom
version of rustc just for this benchmark.


T

-- 
Today's society is one of specialization: as you grow, you learn more and more
about less and less. Eventually, you know everything about nothing.

Jan 17 2024

Renato <renato athaydes.com> writes:

On Wednesday, 17 January 2024 at 16:30:08 UTC, H. S. Teoh wrote:
 On Wed, Jan 17, 2024 at 07:19:39AM +0000, Renato via 
 Digitalmars-d-learn wrote: [...]
 But pls run the benchmarks yourself as I am not going to keep 
 running it for you, and would be nice if you posted your 
 solution on a Gist for example, pasting lots of code in the 
 forum makes it difficult to follow.

 I can't. I spent half an hour trying to get ./benchmark.sh to 
 run, but no matter what it could not compile benchmark_runner. 
 It complains that my rustc is too old and some dependencies do 
 not support it. I tried running the suggested cargo update 
 command to pin the versions but none of them worked.  Since I'm 
 not a Rust user, I'm not feeling particularly motivated right 
 now to spend any more time on this.  Upgrading my rustc isn't 
 really an option because that's the version currently in my 
 distro and I really don't feel like spending more time to 
 install a custom version of rustc just for this benchmark.


 T

I've just updated the Rust version to the benchmark monitor could 
work on Linux (it only worked on Mac before) :D that's probably 
why your rustc didn't work, though as the project is still using 
edition2018 I would've thought even a very old compiler would 
have worked... anyway, if you ever find yourself actually using 
Rust, you should use `rustup` (https://rustup.rs/) which makes it 
trivial to update Rust.

About the "count" option: I had assumed you didn't call format on 
the count option as it's never needed, there's nothing to print.

Jan 17 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via Digitalmars-d-learn
wrote:
[...]
 I'll push the code to github.

[...]

Here: https://github.com/quickfur/prechelt/blob/master/encode_phone.d


T

-- 
Why do conspiracy theories always come from the same people??

Jan 17 2024

Renato <renato athaydes.com> writes:

On Wednesday, 17 January 2024 at 16:54:00 UTC, H. S. Teoh wrote:
 On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via 
 Digitalmars-d-learn wrote: [...]
 I'll push the code to github.

 [...]

 Here: 
 https://github.com/quickfur/prechelt/blob/master/encode_phone.d


 T

Ok, last time I'm running this for someone else :D

```
Proc,Run,Memory(bytes),Time(ms)
===> ./rust
./rust,23920640,30
./rust,24018944,147
./rust,24068096,592
./rust,24150016,1187
./rust,7766016,4972
./rust,8011776,46101
===> src/d/dencoder
src/d/dencoder,44154880,42
src/d/dencoder,51347456,87
src/d/dencoder,51380224,273
src/d/dencoder,51462144,441
src/d/dencoder,18644992,4414
src/d/dencoder,18710528,43548
```

Congratulations on beating Rust :D but remember: you're using a 
much more efficient algorithm! I must conclude that the Rust 
translation of the Trie algorithm would be much faster still, 
unfortunately (you may have noticed that I am on D's side here!).

Jan 18 2024

Renato <renato athaydes.com> writes:

On Wednesday, 17 January 2024 at 16:54:00 UTC, H. S. Teoh wrote:
 On Wed, Jan 17, 2024 at 07:57:02AM -0800, H. S. Teoh via 
 Digitalmars-d-learn wrote: [...]
 I'll push the code to github.

 [...]

 Here: 
 https://github.com/quickfur/prechelt/blob/master/encode_phone.d


 T

BTW here's you main function so it can run on the benchmark:

```d
int main(string[] args)
{
     bool countOnly = args.length > 1 ? (() {
         final switch (args[1])
         {
         case "count":
             return true;
         case "print":
             return false;
         }
     })() : false;

     auto dictfile = args.length > 2 ? args[2] : "tests/words.txt";
     auto input = args.length > 3 ? args[3] : "tests/numbers.txt";

     Trie dict = loadDictionary(File(dictfile).byLine);

     if (countOnly)
     {
         size_t count;
         encodePhoneNumbers(File(input).byLine, dict, (phone, 
match) { count++; });
         writefln("%d", count);
     }
     else
     {
         encodePhoneNumbers(File(input).byLine, dict, (phone, 
match) {
             writefln("%s: %s", phone, match);
         });
     }

     return 0;
}
```

Jan 18 2024

Siarhei Siamashka <siarhei.siamashka gmail.com> writes:

On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:
 For the record (I already posted this on GitHub)... here's [my 
 current fastest 
 solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s
c/d/src/dencoder.d) time using the same algorithm as Rust ...

[...]

 ... what I am really curious about is what the code I wrote is 
 doing wrong that causes it to run 4x slower than Rust despite 
 doing "the same thing"...

It's a GC allocations fest. Things like this make it slow:

```diff
      {
-        string digit = [digits[0]];
+        string digit = digits[0 .. 1];
          words.insertBack(digit);
```

And at the top is the associative array lookup (when profiling 
the handling of the "phones_1_000_000_with_empty.txt" input file):

```
     36.85%  dencoder  dencoder              [.] _aaInX
     12.38%  dencoder  dencoder              [.] void 
dencoder.printTranslations(immutable(char)[][][dencoder.Key], 
dencoder.ISolutionHandler, immutable(char)[], immutable(char)[], 
std.container.array.Array!(immutable(char)[]).Array)
      6.43%  dencoder  dencoder              [.] nothrow  nogc 
scope void core.internal.gc.impl.conservative.gc.Gcx.mark!(false, 
true, 
true).mark(core.internal.gc.impl.conservative.gc.Gcx.ScanRange!(false).ScanRange)
      4.53%  dencoder  dencoder              [.] pure  safe void 
std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch
r)[][]).__lambda12, immutable(char)[]).FilterResult.popFront()
      4.08%  dencoder  dencoder              [.] 
_d_array_slice_copy
      2.43%  dencoder  dencoder              [.] _d_newarrayU
      2.21%  dencoder  dencoder              [.] shared nothrow 
 nogc  trusted void core.internal.spinlock.SpinLock.lock()
      2.00%  dencoder  dencoder              [.] pure  safe void 
std.array.Appender!(immutable(char)[]).Appender.put!(dchar).put(dchar)
      1.91%  dencoder  dencoder              [.] nothrow void* 
core.internal.gc.impl.conservative.gc.Gcx.smallAlloc(ulong, ref 
ulong, uint, const(TypeInfo))
      1.67%  dencoder  dencoder              [.] 
_DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZSQDd6memory8BlkInfo_
      1.66%  dencoder  dencoder              [.] pure nothrow 
 nogc scope  trusted ulong 
core.internal.gc.bits.GCBits.setLocked(ulong)
      1.60%  dencoder  dencoder              [.] const pure 
nothrow  trusted ulong object.TypeInfo_Struct.getHash(scope 
const(void*))
      1.53%  dencoder  dencoder              [.] nothrow  safe 
void 
core.internal.util.array.enforceRawArraysConformable(const(char[]),
const(ulong), const(void[]), const(void[]), const(bool))
      1.49%  dencoder  dencoder              [.] nothrow void* 
core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC
mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)),
core.internal.gc.impl.conservative.gc.mallocTime,
core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong,
const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo))
      1.27%  dencoder  dencoder              [.] pure nothrow 
 safe void 
std.array.Appender!(immutable(char)[]).Appender.ensureAddable(ulong)
      0.81%  dencoder  dencoder              [.] pure  property 
 safe dchar 
std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch
r)[][]).__lambda12, immutable(char)[]).FilterResult.front()
      0.79%  dencoder  dencoder              [.] nothrow  safe 
void core.internal.util.array._enforceNoOverlap(const(char[]), 
ulong, ulong, const(ulong))
      0.74%  dencoder  dencoder              [.] nothrow void 
core.internal.gc.impl.conservative.gc.Pool.setBits(ulong, uint)
      0.73%  dencoder  dencoder              [.] pure  safe void 
std.array.Appender!(immutable(char)[]).Appender.put!(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch
r)[][]).__lambda12, immutable(char)[]).FilterResult).put(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch
r)[][]).__lambda12, immutable(char)[]).FilterResult)
      0.70%  dencoder  dencoder              [.] pure  safe ulong 
std.range.primitives.walkLength!(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch
r)[][]).__lambda12, immutable(char)[]).FilterResult).walkLength(std.algorithm.iteration.FilterResult!(dencoder.main(immutable(ch
r)[][]).__lambda12, immutable(char)[]).FilterResult)
      0.60%  dencoder  dencoder              [.] bool 
dencoder.lastItemIsDigit(std.container.array.Array!(immutable(char)[]).Array)
      0.54%  dencoder  dencoder              [.] _d_newarrayT
[...]
```

Jan 16 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Tue, Jan 16, 2024 at 10:15:04PM +0000, Siarhei Siamashka via
Digitalmars-d-learn wrote:
 On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:

[...]
 ... what I am really curious about is what the code I wrote is doing
 wrong that causes it to run 4x slower than Rust despite doing "the
 same thing"...

 
 It's a GC allocations fest.

Indeed.

I have just completed 2 rounds of optimizations of my version of the
code, and both times the profiler also showed the problem to be
excessive allocations in the inner loop.  So, I did the following
optimizations:

1) Get rid of .format in the inner loop. Not only does .format cause a
lot of allocations, it is also a known performance hog.  So instead
of constructing the output string in the search function, I changed it
to take a delegate instead, and the delegate either counts the result or
prints it directly (bypassing the construction of an intermediate
string).  This improved performance quite a bit for the count-only runs,
but also wins some performance even when output is generated.  Overall,
however, this optimization only gave me some minor savings.

2) Changed the `path` parameter from string[] to string, since I didn't
really need it to be an array of strings anyway. This in itself only
improved performance marginally, barely noticeable, but it led to (3),
which gave a huge performance boost.

3) Basically, in the earlier version of the code, the `path` parameter
was appended to every time I recursed, and furthermore the same initial
segment gets appended to many times with different trailers as the
algorithm walks the trie. As a result, this triggers a lot of array
reallocations to store the new strings.  Most of these allocations are
unnecessary, because we already know that the initial segment of the
string will stay constant, only the tail end changes. Furthermore, we
only ever have a single instance of .path at any point in time in the
algorithm.  So we could use a single buffer to hold all of these
instances of .path, and simply return slices to it as we go along,
overwriting the tail end each time we need to append something.

This significantly cut down on the number of allocations, and along
with (1) and (2), performance improved by about 3x (!).  It didn't
completely remove all allocations, but I'm reasonably happy with the
performance now that I probably won't try to optimize it more unless
it's still losing out to another language. ;-)

(I'm especially curious to see if this beats the Rust version. :-P)

Optimized version of the code:

-----------------------------------snip------------------------------------
/**
 * Encoding phone numbers according to a dictionary.
 */
import std;

/**
 * Table of digit mappings.
 */
static immutable ubyte[dchar] digitOf;
shared static this()
{
    digitOf = [
        'E': 0,
        'J': 1, 'N': 1, 'Q': 1,
        'R': 2, 'W': 2, 'X': 2,
        'D': 3, 'S': 3, 'Y': 3,
        'F': 4, 'T': 4,
        'A': 5, 'M': 5,
        'C': 6, 'I': 6, 'V': 6,
        'B': 7, 'K': 7, 'U': 7,
        'L': 8, 'O': 8, 'P': 8,
        'G': 9, 'H': 9, 'Z': 9,
    ];
}

/**
 * Trie for storing dictionary words according to the phone number mapping.
 */
class Trie
{
    Trie[10] edges;
    string[] words;

    private void insert(string word, string suffix)
    {
        const(ubyte)* dig;
        while (!suffix.empty &&
               (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null)
        {
            suffix = suffix[1 .. $];
        }

        if (suffix.empty)
        {
            words ~= word;
            return;
        }

        auto node = new Trie;
        auto idx = *dig;
        if (edges[idx] is null)
        {
            edges[idx] = new Trie;
        }
        edges[idx].insert(word, suffix[1 .. $]);
    }

    /**
     * Insert a word into the Trie.
     *
     * Characters that don't map to any digit are ignored in building the Trie.
     * However, the original form of the word will be retained as-is in the
     * leaf node.
     */
    void insert(string word)
    {
        insert(word, word[]);
    }

    /**
     * Iterate over all words stored in this Trie.
     */
    void foreachEntry(void delegate(string path, string word) cb)
    {
        void impl(Trie node, string path = "")
        {
            if (node is null) return;
            foreach (word; node.words)
            {
                cb(path, word);
            }
            foreach (i, child; node.edges)
            {
                impl(child, path ~ cast(char)('0' + i));
            }
        }
        impl(this);
    }
}

/**
 * Loads the given dictionary into a Trie.
 */
Trie loadDictionary(R)(R lines)
    if (isInputRange!R & is(ElementType!R : const(char)[]))
{
    Trie result = new Trie;
    foreach (line; lines)
    {
        result.insert(line.idup);
    }
    return result;
}

///
unittest
{
    auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

    auto app = appender!(string[]);
    dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); });
    assert(app.data == [
        "10: je",
        "105513: jemand",
        "107: neu",
        "1550: Name",
        "253302: Wasser",
        "35: da",
        "38: so",
        "400: Fee",
        "4021: fern",
        "4034: Fest",
        "482: Tor",
        "4824: fort",
        "4824: Torf",
        "51: an",
        "562: mir",
        "562: Mix",
        "56202: Mixer",
        "78: Bo\"",
        "783: bo\"s",
        "7857: blau",
        "7884: Boot",
        "824: Ort",
        "83: o\"d"
    ]);
}

alias MatchCallback = void delegate(const(char)[] phone, const(char)[] match);

/**
 * Find all encodings of the given phoneNumber according to the given
 * dictionary, and write each encoding to the given sink.
 */
void findMatches(Trie dict, const(char)[] phoneNumber, MatchCallback cb)
{
    /*
     * Optimization: use a common buffer for constructing the output string,
     * instead of using string append, which allocates many new strings.
     *
     * The `path` parameter passed to .impl is always a slice of .buffer,
     * either its current instance or a previous instance left over from a
     * buffer reallocation. As such, its contents are always an initial segment
     * of whatever's in .buffer, so appending is just a matter of writing to
     * the tail end of the buffer and returning a slice of the new buffer.
     *
     * The fact that .path higher up the call tree may be pointing to old
     * versions of .buffer is not a problem; contentwise they are always an
     * initial segment of the current .buffer, so any subsequent appends will
     * copy the new word to the right place in the current .buffer and return a
     * slice to it, and the contents will always be consistent.
     */
    static char[] buffer;
    const(char)[] appendPath(const(char)[] path, const(char)[] word)
    {
        // Assumption: path is an initial segment of buffer (either its current
        // incarnation or a pre-reallocated initial copy). So we don't need to
        // copy the initial segment, just whatever needs to be appended.
        if (path.length == 0)
        {
            auto newlen = word.length;
            if (buffer.length < newlen)
                buffer.length = newlen;
            buffer[0 .. newlen] = word[];
            return buffer[0 .. newlen];
        }
        else
        {
            auto newlen = path.length + 1 + word.length;
            if (buffer.length < newlen)
                buffer.length = newlen;
            buffer[path.length] = ' ';
            buffer[path.length + 1 .. newlen] = word[];
            return buffer[0 .. newlen];
        }
    }

    bool impl(Trie node, const(char)[] suffix, const(char)[] path,
              bool allowDigit)
    {
        if (node is null)
            return false;

        // Ignore non-digit characters in phone number
        while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9'))
            suffix = suffix[1 .. $];

        if (suffix.empty)
        {
            // Found a match, print result
            foreach (word; node.words)
            {
                cb(phoneNumber, appendPath(path, word));
            }
            return !node.words.empty;
        }

        bool ret;
        foreach (word; node.words)
        {
            // Found a matching word, try to match the rest of the phone
            // number.
            ret = true;
            if (impl(dict, suffix, appendPath(path, word), true))
                allowDigit = false;
        }

        if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false))
        {
            allowDigit = false;
            ret = true;
        }

        if (allowDigit)
        {
            // If we got here, it means that if we take the current node as the
            // next word choice, then the following digit will have no further
            // matches, and we may encode it as a single digit.
            auto nextSuffix = suffix[1 .. $];
            if (nextSuffix.empty)
            {
                cb(phoneNumber, appendPath(path, suffix[0 .. 1]));
                ret = true;
            }
            else
            {
                if (impl(dict, suffix[1 .. $],
                         appendPath(path, suffix[0 .. 1]), false))
                    ret = true;
            }
        }
        return ret;
    }

    // Trim trailing non-digits from phone number
    auto suffix = phoneNumber[];
    while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9'))
    {
        suffix = suffix[0 .. $-1];
    }

    impl(dict, suffix, buffer[0 .. 0], true);
}

/**
 * Encode the given input range of phone numbers according to the given
 * dictionary, writing the output to the given sink.
 */
void encodePhoneNumbers(R)(R input, Trie dict, MatchCallback cb)
    if (isInputRange!R & is(ElementType!R : const(char)[]))
{
    foreach (line; input)
    {
        findMatches(dict, line, cb);
    }
}

///
unittest
{
    auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

    auto input = [
        "112",
        "5624-82",
        "4824",
        "0721/608-4067",
        "10/783--5",
        "1078-913-5",
        "381482",
        "04824",
    ];

    auto app = appender!(string[]);
    encodePhoneNumbers(input, dict, (phone, match) {
        app.put(format("%s: %s", phone, match));
    });

    //writefln("\n%-(%s\n%)", app.data);
    assert(app.data.sort.release == [
        "04824: 0 Tor 4",
        "04824: 0 Torf",
        "04824: 0 fort",
        "10/783--5: je Bo\" da",
        "10/783--5: je bo\"s 5",
        "10/783--5: neu o\"d 5",
        "381482: so 1 Tor",
        "4824: Tor 4",
        "4824: Torf",
        "4824: fort",
        "5624-82: Mix Tor",
        "5624-82: mir Tor",
    ]);
}

unittest
{
    auto dict = loadDictionary(q"ENDDICT
Bias
ja
Mai
Reck
Weib
USA
ENDDICT".splitLines);

    auto input = [
        "/7-357653152/0677-",
        "/7-3576-",
        "/8-",
        "1556/0",
    ];

    auto app = appender!(string[]);
    encodePhoneNumbers(input, dict, (phone, match) {
        app.put(format("%s: %s", phone, match));
    });

    //writefln("\n%-(%s\n%)", app.data);
    assert(app.data.sort.release == [
        "/7-357653152/0677-: USA Bias ja Reck 7",
        "/7-357653152/0677-: USA Bias ja Weib 7",
        "/8-: 8",

        /* Note: 1556/0 should NOT encode as "1 Mai 0" because the initial "15"
         * matches "ja", thus excluding a digit in that position. */
    ]);
}

/**
 * Program entry point.
 */
int main(string[] args)
{
    File input = stdin;
    bool countOnly;

    auto info = getopt(args,
        "c|count", "Count solutions only", &countOnly,
    );

    int showHelp()
    {
        stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]",
                        args[0]);
        defaultGetoptPrinter("", info.options);
        return 1;
    }

    if (info.helpWanted || args.length < 2)
        return showHelp();

    auto dictfile = args[1];
    if (args.length > 2)
        input = File(args[2]);

    Trie dict = loadDictionary(File(dictfile).byLine);

    if (countOnly)
    {
        size_t count;
        encodePhoneNumbers(input.byLine, dict, (phone, match) { count++; });
        writefln("Number of solutions: %d", count);
    }
    else
    {
        encodePhoneNumbers(input.byLine, dict, (phone, match) {
            writefln("%s: %s", phone, match);
        });
    }

    return 0;
}
-----------------------------------snip------------------------------------


T

-- 
People tell me that I'm paranoid, but they're just out to get me.

Jan 16 2024

Renato <renato athaydes.com> writes:

On Tuesday, 16 January 2024 at 23:43:16 UTC, H. S. Teoh wrote:
 (I'm especially curious to see if this beats the Rust version. 
 :-P)

That's just irrelevant when you're using a completely different 
algorithm.

My conclusion in the study was exactly that: much more important 
than the language is the algorithm. The language does matter, but 
much, much less.

If you want to check your performance, you know you can run the 
`./benchmark.sh` yourself?

Jan 16 2024

evilrat <evilrat666 gmail.com> writes:

On Wednesday, 17 January 2024 at 07:11:02 UTC, Renato wrote:
 If you want to check your performance, you know you can run the 
 `./benchmark.sh` yourself?

Out of curiosity I've tried to manually run this on Windows and 
it seems that Java generator for these numbers files is "broken", 
the resulting count or print runs fine for both Java and D 
versions provided in your D branch, but fails with generated 
files.

D version complains about bad utf8 encoding.
I've opened the generated file in text editor and it is UTF-16 
(little-endian with BOM).

Tried with adoptium jdk 17 and 21 (former openjdk), but I guess 
it doesn't matter since UTF-16 is default on Windows.

Jan 17 2024

Renato <renato athaydes.com> writes:

On Wednesday, 17 January 2024 at 09:15:02 UTC, evilrat wrote:
 On Wednesday, 17 January 2024 at 07:11:02 UTC, Renato wrote:
 If you want to check your performance, you know you can run 
 the `./benchmark.sh` yourself?

 Out of curiosity I've tried to manually run this on Windows and 
 it seems that Java generator for these numbers files is 
 "broken", the resulting count or print runs fine for both Java 
 and D versions provided in your D branch, but fails with 
 generated files.

 D version complains about bad utf8 encoding.
 I've opened the generated file in text editor and it is UTF-16 
 (little-endian with BOM).

 Tried with adoptium jdk 17 and 21 (former openjdk), but I guess 
 it doesn't matter since UTF-16 is default on Windows.

It's not Java writing the file, it's the bash script 
[`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31):

```
java -cp "build/util" util.GeneratePhoneNumbers 1000 > 
phones_1000.txt
```

Java is just printing to stdout. I wasn't aware that piping like 
this would use the OS default encoding. Unfortunately I don't 
have Windows here, but try to change the encoding used by the 
bash script maybe?

Jan 17 2024

Renato <renato athaydes.com> writes:

On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
 It's not Java writing the file, it's the bash script 
 [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31):

 ```
 java -cp "build/util" util.GeneratePhoneNumbers 1000 > 
 phones_1000.txt
 ```

Perhaps using this option when running Java will help:

```
java -DFile.Encoding=UTF-8 ...
```

Jan 17 2024

evilrat <evilrat666 gmail.com> writes:

On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:
 On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
 It's not Java writing the file, it's the bash script 
 [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31):

 ```
 java -cp "build/util" util.GeneratePhoneNumbers 1000 > 
 phones_1000.txt
 ```

 Perhaps using this option when running Java will help:

 ```
 java -DFile.Encoding=UTF-8 ...
 ```

I've used powershell env var to set output to utf8, D version now 
works but java doesn't.

```
java -Xms20M -Xmx100M -cp build/java Main print words-quarter.txt 
phones_1000.txt
Exception in thread "main" 
java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of 
bounds for length 10
         at Trie.completeSolution(Main.java:216)
         at Trie.forEachSolution(Main.java:192)
         at PhoneNumberEncoder.encode(Main.java:132)
         at Main.lambda$main$1(Main.java:38)
         at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
         at 
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
         at 
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
         at 
java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
         at 
java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
         at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
         at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
         at 
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
         at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
         at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
         at 
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
         at Main.main(Main.java:38)
```

Jan 17 2024

Renato <renato athaydes.com> writes:

On Wednesday, 17 January 2024 at 10:50:26 UTC, evilrat wrote:
 On Wednesday, 17 January 2024 at 10:43:22 UTC, Renato wrote:
 On Wednesday, 17 January 2024 at 10:24:31 UTC, Renato wrote:
 It's not Java writing the file, it's the bash script 
 [`benchmark.sh`](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/master/benchmark.sh#L31):

 ```
 java -cp "build/util" util.GeneratePhoneNumbers 1000 > 
 phones_1000.txt
 ```

 Perhaps using this option when running Java will help:

 ```
 java -DFile.Encoding=UTF-8 ...
 ```

 I've used powershell env var to set output to utf8, D version 
 now works but java doesn't.

 ```
 java -Xms20M -Xmx100M -cp build/java Main print 
 words-quarter.txt phones_1000.txt
 Exception in thread "main" 
 java.lang.ArrayIndexOutOfBoundsException: Index 65485 out of 
 bounds for length 10
         at Trie.completeSolution(Main.java:216)
         at Trie.forEachSolution(Main.java:192)
         at PhoneNumberEncoder.encode(Main.java:132)
         at Main.lambda$main$1(Main.java:38)
         at 
 java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
         at 
 java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
         at 
 java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
         at 
 java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
         at 
 java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
         at 
 java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
         at 
 java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
         at 
 java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
         at 
 java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
         at 
 java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
         at 
 java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
         at Main.main(Main.java:38)
 ```

This is this line:

```
var digit = chars[ index ] - 48;
```

That means the input file is still not ASCII (or UTF-8) as it 
should. Java is reading files with the ASCII encoding so it 
should've worked fine.

Jan 17 2024

evilrat <evilrat666 gmail.com> writes:

On Wednesday, 17 January 2024 at 11:20:14 UTC, Renato wrote:
 That means the input file is still not ASCII (or UTF-8) as it 
 should. Java is reading files with the ASCII encoding so it 
 should've worked fine.

It seems that it is only works with ASCII encoding though.

Jan 17 2024

Renato <renato athaydes.com> writes:

On Wednesday, 17 January 2024 at 11:56:19 UTC, evilrat wrote:
 On Wednesday, 17 January 2024 at 11:20:14 UTC, Renato wrote:
 That means the input file is still not ASCII (or UTF-8) as it 
 should. Java is reading files with the ASCII encoding so it 
 should've worked fine.

 It seems that it is only works with ASCII encoding though.

Yes, that's according to the rules - only ASCII for everything.

Jan 17 2024

Renato <renato athaydes.com> writes:

On Tuesday, 16 January 2024 at 22:15:04 UTC, Siarhei Siamashka 
wrote:
 On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:
 For the record (I already posted this on GitHub)... here's [my 
 current fastest 
 solution](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/dlang-key-hash-incremental/s
c/d/src/dencoder.d) time using the same algorithm as Rust ...

 [...]

 ... what I am really curious about is what the code I wrote is 
 doing wrong that causes it to run 4x slower than Rust despite 
 doing "the same thing"...

 It's a GC allocations fest. Things like this make it slow:

 ```diff
      {
 -        string digit = [digits[0]];
 +        string digit = digits[0 .. 1];
          words.insertBack(digit);
 ```

I was under the impression that `[digits[0]]` would just use a 
stack allocation??

The profiler does not show any GC anymore, are you sure it's a 
"GC allocations fest"???


 And at the top is the associative array lookup (when profiling 
 the handling of the "phones_1_000_000_with_empty.txt" input 
 file):

 ```
     36.85%  dencoder  dencoder              [.] _aaInX
     12.38%  dencoder  dencoder              [.] void


Well, I know, but I did everything to make the hash "cheap" to 
compute so I still don't see a way to improve it.

Jan 16 2024

evilrat <evilrat666 gmail.com> writes:

On Wednesday, 17 January 2024 at 07:06:25 UTC, Renato wrote:
 On Tuesday, 16 January 2024 at 22:15:04 UTC, Siarhei Siamashka 
 wrote:
 On Tuesday, 16 January 2024 at 21:15:19 UTC, Renato wrote:

 It's a GC allocations fest. Things like this make it slow:

 ```diff
      {
 -        string digit = [digits[0]];
 +        string digit = digits[0 .. 1];
          words.insertBack(digit);
 ```

 I was under the impression that `[digits[0]]` would just use a 
 stack allocation??

 The profiler does not show any GC anymore, are you sure it's a 
 "GC allocations fest"???

nah, you are allocating new array out of single digit while the 
latter is just takes a slice.

there is 'scope' storage specifier for when you know your 
variable won't escape the scope to place it on stack (works for 
classes too), but I'm not sure if it will work with array.

`scope string digit = [digits[0]];`

Jan 17 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Tue, Jan 16, 2024 at 07:50:35AM -0800, H. S. Teoh via Digitalmars-d-learn
wrote:
[...]
 Unfortunately there seems to be some discrepancy between the output I
 got and the prescribed output in your repository. For example, in your
 output the number 1556/0 does not have an encoding, but isn't "1 Mai 0"
 a valid encoding according to your dictionary and the original problem
 description?

[...]

Also, found a bug in my program that misses some solutions when the
phone number has trailing non-digits. Here's the updated code.  It still
finds extra encodings from the output in your repo, though.  Maybe I
misunderstood part of the requirements?


------------------------------snip--------------------------------------
/**
 * Encoding phone numbers according to a dictionary.
 */
import std;

/**
 * Table of digit mappings.
 */
static immutable ubyte[dchar] digitOf;
shared static this()
{
    digitOf = [
        'E': 0,
        'J': 1, 'N': 1, 'Q': 1,
        'R': 2, 'W': 2, 'X': 2,
        'D': 3, 'S': 3, 'Y': 3,
        'F': 4, 'T': 4,
        'A': 5, 'M': 5,
        'C': 6, 'I': 6, 'V': 6,
        'B': 7, 'K': 7, 'U': 7,
        'L': 8, 'O': 8, 'P': 8,
        'G': 9, 'H': 9, 'Z': 9,
    ];
}

/**
 * Trie for storing dictionary words according to the phone number mapping.
 */
class Trie
{
    Trie[10] edges;
    string[] words;

    private void insert(string word, string suffix)
    {
        const(ubyte)* dig;
        while (!suffix.empty &&
               (dig = std.ascii.toUpper(suffix[0]) in digitOf) is null)
        {
            suffix = suffix[1 .. $];
        }

        if (suffix.empty)
        {
            words ~= word;
            return;
        }

        auto node = new Trie;
        auto idx = *dig;
        if (edges[idx] is null)
        {
            edges[idx] = new Trie;
        }
        edges[idx].insert(word, suffix[1 .. $]);
    }

    /**
     * Insert a word into the Trie.
     *
     * Characters that don't map to any digit are ignored in building the Trie.
     * However, the original form of the word will be retained as-is in the
     * leaf node.
     */
    void insert(string word)
    {
        insert(word, word[]);
    }

    /**
     * Iterate over all words stored in this Trie.
     */
    void foreachEntry(void delegate(string path, string word) cb)
    {
        void impl(Trie node, string path = "")
        {
            if (node is null) return;
            foreach (word; node.words)
            {
                cb(path, word);
            }
            foreach (i, child; node.edges)
            {
                impl(child, path ~ cast(char)('0' + i));
            }
        }
        impl(this);
    }
}

/**
 * Loads the given dictionary into a Trie.
 */
Trie loadDictionary(R)(R lines)
    if (isInputRange!R & is(ElementType!R : const(char)[]))
{
    Trie result = new Trie;
    foreach (line; lines)
    {
        result.insert(line.idup);
    }
    return result;
}

///
unittest
{
    auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

    auto app = appender!(string[]);
    dict.foreachEntry((path, word) { app ~= format("%s: %s", path, word); });
    assert(app.data == [
        "10: je",
        "105513: jemand",
        "107: neu",
        "1550: Name",
        "253302: Wasser",
        "35: da",
        "38: so",
        "400: Fee",
        "4021: fern",
        "4034: Fest",
        "482: Tor",
        "4824: fort",
        "4824: Torf",
        "51: an",
        "562: mir",
        "562: Mix",
        "56202: Mixer",
        "78: Bo\"",
        "783: bo\"s",
        "7857: blau",
        "7884: Boot",
        "824: Ort",
        "83: o\"d"
    ]);
}

/**
 * Find all encodings of the given phoneNumber according to the given
 * dictionary, and write each encoding to the given sink.
 */
void findMatches(W)(Trie dict, const(char)[] phoneNumber, W sink)
    if (isOutputRange!(W, string))
{
    bool impl(Trie node, const(char)[] suffix, string[] path, bool allowDigit)
    {
        if (node is null)
            return false;

        // Ignore non-digit characters in phone number
        while (!suffix.empty && (suffix[0] < '0' || suffix[0] > '9'))
            suffix = suffix[1 .. $];

        if (suffix.empty)
        {
            // Found a match, print result
            foreach (word; node.words)
            {
                put(sink, format("%s: %-(%s %)", phoneNumber,
                                 path.chain(only(word))));
            }
            return !node.words.empty;
        }

        bool ret;
        foreach (word; node.words)
        {
            // Found a matching word, try to match the rest of the phone
            // number.
            if (impl(dict, suffix, path ~ word, true))
            {
                allowDigit = false;
                ret = true;
            }
        }

        if (impl(node.edges[suffix[0] - '0'], suffix[1 .. $], path, false))
        {
            allowDigit = false;
            ret = true;
        }

        if (allowDigit)
        {
            // If we got here, it means that if we take the current node as the
            // next word choice, then the following digit will have no further
            // matches, and we may encode it as a single digit.
            auto nextSuffix = suffix[1 .. $];
            if (nextSuffix.empty)
            {
                put(sink, format("%s: %-(%s %)", phoneNumber,
                                 path.chain(suffix[0 .. 1].only)));
                ret = true;
            }
            else
            {
                if (impl(dict, suffix[1 .. $], path ~ [ suffix[0] ], false))
                    ret = true;
            }
        }
        return ret;
    }

    // Trim trailing non-digits from phone number
    auto suffix = phoneNumber[];
    while (!suffix.empty && (suffix[$-1] < '0' || suffix[$-1] > '9'))
    {
        suffix = suffix[0 .. $-1];
    }

    impl(dict, suffix, [], true);
}

/**
 * Encode the given input range of phone numbers according to the given
 * dictionary, writing the output to the given sink.
 */
void encodePhoneNumbers(R,W)(R input, Trie dict, W sink)
    if (isInputRange!R & is(ElementType!R : const(char)[]) &&
        isOutputRange!(W, string))
{
    foreach (line; input)
    {
        findMatches(dict, line, sink);
    }
}

///
unittest
{
    auto dict = loadDictionary(q"ENDDICT
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
ENDDICT".splitLines);

    auto input = [
        "112",
        "5624-82",
        "4824",
        "0721/608-4067",
        "10/783--5",
        "1078-913-5",
        "381482",
        "04824",
    ];

    auto app = appender!(string[]);
    encodePhoneNumbers(input, dict, (string match) { app.put(match); });

    //writefln("%-(%s\n%)", app.data);
    assert(app.data.sort.release == [
        "04824: 0 Tor 4",
        "04824: 0 Torf",
        "04824: 0 fort",
        "10/783--5: je Bo\" da",
        "10/783--5: je bo\"s 5",
        "10/783--5: neu o\"d 5",
        "381482: so 1 Tor",
        "4824: Tor 4",
        "4824: Torf",
        "4824: fort",
        "5624-82: Mix Tor",
        "5624-82: mir Tor",
    ]);
}

unittest
{
    auto dict = loadDictionary(q"ENDDICT
Bias
ja
Reck
Weib
USA
ENDDICT".splitLines);

    auto input = [
        "/7-357653152/0677-",
        "/8-",
    ];

    auto app = appender!(string[]);
    encodePhoneNumbers(input, dict, (string match) { app.put(match); });

    assert(app.data.sort.release == [
        "/7-357653152/0677-: USA Bias ja Reck 7",
        "/7-357653152/0677-: USA Bias ja Weib 7",
        "/8-: 8",
    ]);
}

/**
 * Program entry point.
 */
int main(string[] args)
{
    File input = stdin;
    bool countOnly;

    auto info = getopt(args,
        "c|count", "Count solutions only", &countOnly,
    );

    int showHelp()
    {
        stderr.writefln("Usage: %s [options] <dictfile> [<inputfile>]",
                        args[0]);
        defaultGetoptPrinter("", info.options);
        return 1;
    }

    if (info.helpWanted || args.length < 2)
        return showHelp();

    auto dictfile = args[1];
    if (args.length > 2)
        input = File(args[2]);

    Trie dict = loadDictionary(File(dictfile).byLine);

    if (countOnly)
    {
        size_t count;
        encodePhoneNumbers(input.byLine, dict, (string match) { count++; });
        writefln("Number of solutions: %d", count);
    }
    else
    {
        encodePhoneNumbers(input.byLine, dict, (string match) {
            writeln(match);
        });
    }

    return 0;
}
------------------------------snip--------------------------------------


T

-- 
"You are a very disagreeable person." "NO."

Jan 16 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

P.S. Compiling my program with `ldc -O2`, it runs so fast that I
couldn't measure any meaningful running time that's greater than startup
overhead.  So I wrote a helper program to generate random phone numbers
up to 50 characters long, and found that it could encode 1 million phone
numbers in 2.2 seconds (using the 75,000 entry dictionary from your
repository).  Counting vs. printing the results made no significant
difference to this.


T

-- 
People tell me that I'm skeptical, but I don't believe them.

Jan 16 2024

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Help optimize D solution to phone encoding problem: extremely slow