
digitalmars.D - Template wizardry and its cost

Bastiaan Veelo <Bastiaan Veelo.net> writes:
Two years ago [[1]](https://forum.dlang.org/post/mailman.2526.1585832475.31109.digitalmars-d@puremagic.com), [[2]](https://forum.dlang.org/post/mailman.4770.1596218284.31109.digitalmars-d-announce@puremagic.com), H. S. Teoh presented an ingenious proof of concept for a gettext-like system that automatically extracts translatable strings for i18n purposes:

```d
	class Language { ... }
	Language curLang = ...;

	version(extractStrings) {
		private int[string] translatableStrings;
		string[] getTranslatableStrings() {
			return translatableStrings.keys;
		}
	}

	string gettext(string str)() {
		version(extractStrings) {
			static struct StrInjector {
				static this() {
					translatableStrings[str]++;
				}
			}
		}
		return curLang.translate(str);
	}

	...
	auto myFunc() {
		...
		writeln(gettext!"Some translatable message");
		...
	}
```

 The gettext function uses a static struct to inject a static 
 ctor into the program that inserts all translatable strings 
 into a global AA. Then, when compiled with 
 -version=extractStrings, this will expose the function 
 getTranslatableStrings that returns a list of all translatable 
 strings.  Voila! No need for a separate utility to parse source 
 code to discover translatable strings; this does it for you 
 automatically. :-)

 It could be made more fancy, of course, like having a function 
 that parses the current l10n files and doing a diff between 
 strings that got deleted / added / changed, and generating a 
 report to inform the translator which strings need to be 
 updated.  This is guaranteed to be 100% reliable since the 
 extracted strings are obtained directly from actual calls to 
 gettext, rather than a 3rd party parser that may choke over 
 uncommon syntax / unexpected formatting.

 D is just *this* awesome.

I find this marvellous, and played with it over the weekend. It works brilliantly. I got it to integrate neatly in Dub projects, and made it compatible with GNU gettext so that existing translation services and editors can be used ([Poedit](https://poedit.net/) is awesome, thank you [mofile](https://code.dlang.org/packages/mofile)).

But I don't think I'll go through with it this way. My problem is the template instantiation for every individual translatable string literal. I'd like to think the consequences are insignificant, but in large code bases I fear they won't be. And the issue is that only for `version(extractStrings)` the string needs to be a template argument, otherwise you'd want it to be an ordinary function argument. Maybe this is possible to achieve with string mixins, but probably not without getting much more verbose.

I am ready to be amazed with more wizardry, or to be convinced not to worry because inlining or something (it doesn't inline). Until then, I am thinking an external tool based on `libdparse` or dmd-as-a-library is probably the better approach; however awesome D is :-)

-- Bastiaan.
Jun 20 2022
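
For readers who want to run the quoted proof of concept, here is a minimal sketch. The `translate` function is a hypothetical stand-in for the elided `Language` class, and the `version(extractStrings)` guard is dropped so that extraction is always active; both are assumptions made for illustration and are not part of the original post.

```d
import std.algorithm.sorting : sort;
import std.stdio : writeln;

// Hypothetical stand-in for `curLang.translate`: returns the input unchanged.
string translate(string s) { return s; }

private int[string] translatableStrings;

// Returns the registered strings in sorted order.
string[] getTranslatableStrings()
{
    auto keys = translatableStrings.keys;
    sort(keys);
    return keys;
}

string gettext(string str)()
{
    // One nested struct per instantiation; its static constructor
    // registers the literal before main() runs.
    static struct StrInjector
    {
        static this()
        {
            translatableStrings[str]++;
        }
    }
    return translate(str);
}

void main()
{
    writeln(gettext!"Hello");
    writeln(gettext!"Goodbye");
    writeln(gettext!"Hello"); // reuses the instantiation for "Hello"
    writeln(getTranslatableStrings());
}
```

Each distinct literal produces its own instantiation of `gettext`, which is the cost this post worries about; repeated uses of the same literal share one instantiation.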
Adam D Ruppe <destructionator gmail.com> writes:
On Monday, 20 June 2022 at 08:37:13 UTC, Bastiaan Veelo wrote:
 It works brilliantly.
 [...]
 but in large code bases I fear they won't be.
*fear*

so what you're saying is you have no evidence there is an actual problem here, but have literally fallen prey to FUD.

In theory, generating hundreds of thousands of functions can indeed be a problem, even if they are small (though the biggest problems in dmd come when an individual function is large, more so than many small functions), but how many unique user-visible strings do you have, even in a large project?
Jun 20 2022
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Monday, 20 June 2022 at 11:16:29 UTC, Adam D Ruppe wrote:
 how many unique user-visible strings do you have, even in a 
 large project?
In total, we currently have 18997 individual translated strings. These are spread over 45 executables (unevenly). -- Bastiaan.
Jun 20 2022
Adam D Ruppe <destructionator gmail.com> writes:
On Monday, 20 June 2022 at 13:52:12 UTC, Bastiaan Veelo wrote:
 In total, we currently have 18997 individual translated 
 strings. These are spread over 45 executables (unevenly).
That's nothing. Consider this test:

    static foreach(i; 0 .. 20000)
        mixin("string a", i, " = gettext!(i.stringof);");
    void main() {}

    $ /usr/bin/time dmd templatespam.d
    0.32user 0.07system 0:00.40elapsed 98%CPU (0avgtext+0avgdata 135648maxresident)k
    0inputs+1640outputs (0major+40899minor)pagefaults 0swaps

About 3x the memory and such of a basic hello world but as you can see, 0.3s and 135 MB ram is nothing to worry about.

What about 200,000 strings?

    $ /usr/bin/time dmd templatespam.d
    Command terminated by signal 11
    1.93user 0.40system 0:02.33elapsed 99%CPU (0avgtext+0avgdata 1415580maxresident)k
    0inputs+0outputs (0major+353926minor)pagefaults 0swaps

Now it is adding up, 2s and 1.4 GB build time/ram. Which is still inside the realm of acceptable cost, and it seems unlikely that you'd ever have 200,000 user visible strings in a single build unit anyway, well more than 10x what you have in your actual application right now.

Please note that adding -version=extractStrings has no significant impact on these numbers.

And this is with zero effort to optimize it.
Jun 20 2022
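
The test above leaves out the definition of `gettext`. A self-contained variant that can be saved as `templatespam.d` might look like the sketch below. The trivial `gettext` body and the way the literals are spelled inside the mixin are assumptions made so that the file compiles on its own; timings on other machines and compiler versions will differ from the figures quoted.

```d
// templatespam.d: stress test with one template instance per string.

// Simplified stand-in: returns the string untranslated, so the call can
// be evaluated at compile time for the initializers below.
string gettext(string str)()
{
    return str;
}

enum count = 20_000; // raise to 200_000 to repeat the larger run

// Declares `string a0 = gettext!"0";` through
// `string a19999 = gettext!"19999";` at module scope.
static foreach (i; 0 .. count)
    mixin("string a", i, " = gettext!\"", i, "\";");

void main() {}
```

It can be compiled and timed the same way as in the post, with `/usr/bin/time dmd templatespam.d`.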
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Monday, 20 June 2022 at 14:08:30 UTC, Adam D Ruppe wrote:
 About 3x the memory and such of a basic hello world but as you 
 can see, 0.3s and 135 MB ram is nothing to worry about.
Thanks for bringing my heart rate down :-) I was looking at all the generated functions in the assembly, and Steven's trick eliminates those.

-- Bastiaan.
Jun 20 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/20/22 4:37 AM, Bastiaan Veelo wrote:

 But I don't think I'll go through with it this way. My problem is the 
 template instantiation for every individual translatable string literal. 
 I'd like to think the consequences are insignificant, but in large code 
 bases I fear they won't be. And the issue is that only for 
 `version(extractStrings)` the string needs to be a template argument, 
 otherwise you'd want it to be an ordinary function argument. Maybe this 
 is possible to achieve with string mixins, but probably not without 
 getting much more verbose.
 
 I am ready to be amazed with more wizardry, or to be convinced not to 
 worry because inlining or something (it doesn't inline). Until then, I 
 am thinking an external tool based on `libdparse` or dmd-as-a-library is 
 probably the better approach; however awesome D is :-)

Let me dust off my wand ;)

```d
struct TranslatedString {
    private string _str;
    string get() {
        return curLang.translate(_str);
    }
    alias get this;
}
template gettext(string str) {
    version(extractStrings) {
        shared static this() {
            ++translatableStrings.require(str); // use require here, even though the ++ works without it.
        }
    }
    enum gettext = TranslatedString(str);
}
```

What does this do? It *still* generates the template, but the key difference is that the `TranslatedString` type is not a template. An enum only exists in the compiler, it's as if you pasted the resulting code at the call site. So it should not take up any space, maybe 2 words for the string reference. But only one `TypeInfo` (if that's even needed, I'm not sure), and a minor CTFE-call for the construction.

It *will* take up space in the symbol table, but that goes away once compilation is done.

But in general, one should not be afraid of writing templates in D. I think there may be some room for improvement for performance with compiler hints, or improvements without them.

-Steve
Jun 20 2022
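
To try the enum-based variant on its own, a small harness can be put around it. In this sketch `translate` is a hypothetical stand-in for `curLang.translate`, and `translatableStrings` is declared locally; neither definition comes from the post.

```d
import std.stdio : writeln;

// Hypothetical stand-in for `curLang.translate`: brackets mark translated text.
string translate(string s) { return "[" ~ s ~ "]"; }

int[string] translatableStrings;

struct TranslatedString
{
    private string _str;
    string get() { return translate(_str); }
    alias get this; // lets a TranslatedString be used where a string is expected
}

template gettext(string str)
{
    version (extractStrings)
    {
        // Only compiled in with -version=extractStrings.
        shared static this()
        {
            ++translatableStrings.require(str);
        }
    }
    enum gettext = TranslatedString(str);
}

void main()
{
    // The implicit conversion through `alias this` calls `get` at run time.
    string s = gettext!"Hello";
    writeln(s);

    // Every instantiation yields a value of the same, non-template type.
    static assert(is(typeof(gettext!"Hello") == typeof(gettext!"Goodbye")));
    static assert(is(typeof(gettext!"Hello") == TranslatedString));
}
```

The two `static assert`s illustrate the point made above: the template is still instantiated per string, but no per-string type or function is produced.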
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Monday, 20 June 2022 at 12:59:28 UTC, Steven Schveighoffer 
wrote:
 Let me dust off my wand ;)

 ```d
 struct TranslatedString {
     private string _str;
     string get() {
         return curLang.translate(_str);
     }
     alias get this;
 }
 template gettext(string str) {
     version(extractStrings) {
         shared static this() {
             ++translatableStrings.require(str); // use require here, even though the ++ works without it.
         }
     }
     enum gettext = TranslatedString(str);
 }
 ```
Wow. Man, this is interesting. My hat is off.
 What does this do? It *still* generates the template, but the 
 key difference is that the `TranslatedString` type is not a 
 template. An enum only exists in the compiler, it's as if you 
 pasted the resulting code at the call site. So it should not 
 take up any space, maybe 2 words for the string reference. But 
 only one `TypeInfo` (if that's even needed, I'm not sure), and 
 a minor CTFE-call for the construction.

 It *will* take up space in the symbol table, but that goes away 
 once compilation is done.
You have put a big grin on my face, I like your potion! -- Bastiaan.
Jun 20 2022
Bastiaan Veelo <Bastiaan Veelo.net> writes:
As you and Adam pointed out, this may not be worth the trouble. 
But just to see if I can, I tried to extend your trick to a 
function taking an argument. I didn't find a way without using a 
delegate, and that gives a deprecation warning for escaping a 
reference. Can it be fixed? The code below leaves out the string 
extraction version, but is otherwise complete and can be pasted 
into [run.dlang.io](https://run.dlang.io/).

-- Bastiaan.

```d
--- app.d
import gettext;
import std.stdio;

void main()
{
     foreach (n; 0 .. 3)
         writeln(tr!("one goose.", "%d geese.")(n));
}

--- gettext.d
import std;

private @safe struct TranslatableString
{
     string _str;
     string get()
     {
         return gettext(_str);
     }
     alias get this;
}
private @safe struct TranslatableStringPlural
{
     string _str, _strpl;
     string callFormat(int n)
     {
         auto fmt = ngettext(_str, _strpl, n);
         if (countFormatSpecifiers(fmt) == 0)
             // Hack to prevent orphan format arguments if "%d" is replaced by "one" in the singular form:
             return () @trusted { return fromStringz(&(format(fmt~"\0%s", n)[0])); }();
         return format(fmt, n);
     }
     string delegate(int) get()
     {
         return &callFormat;
     }
     alias get this;
}
template tr(string singular, string plural = null)
{
     static if (plural == null)
         enum tr = TranslatableString(singular);
     else
         enum tr = TranslatableStringPlural(singular, plural);
}

@safe: private:
int countFormatSpecifiers(string fmt) pure
{
     int count = 0;
     auto f = FormatSpec!char(fmt);
     if (!__ctfe)
     {
         while (f.writeUpToNextSpec(nullSink))
             count++;
     } else {
         auto a = appender!string; // std.range.nullSink does not work at CT.
         while (f.writeUpToNextSpec(a))
             count++;
     }
     return count;
}
// Translation happens here:
string gettext(string str)
{
     return str;
}
string ngettext(string singular, string plural, int n)
{
     return n == 1 ? singular : plural;
}
```
Jun 20 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/20/22 7:01 PM, Bastiaan Veelo wrote:
 As you and Adam pointed out, this may not be worth the trouble. But just 
 to see if I can, I tried to extend your trick to a function taking an 
 argument. I didn't find a way without using a delegate, and that gives a 
 deprecation warning for escaping a reference. Can it be fixed? The code 
 below leaves out the string extraction version, but is otherwise 
 complete and can be pasted into [run.dlang.io](https://run.dlang.io/).

structs can be functors:

```d
private @safe struct TranslatableStringPlural
{
    string _str, _strpl;
    this(string s1, string s2) { // this is unfortunately necessary
        _str = s1;
        _strpl = s2;
    }
    string opCall(int n)
    {
        auto fmt = ngettext(_str, _strpl, n);
        if (countFormatSpecifiers(fmt) == 0)
            // Hack to prevent orphan format arguments if "%d" is replaced by "one" in the singular form:
            return () @trusted { return fromStringz(&(format(fmt~"\0%s", n)[0])); }();
        return format(fmt, n);
    }
}
```

I will say, I find this line... reprehensible ;)

```d
return () @trusted { return fromStringz(&(format(fmt~"\0%s", n)[0])); }();
```

Actually, the whole function (and the format specifier counter, etc) is very obtuse. How about:

```d
private @safe struct Strplusarg
{
    this(string s) {
        fmt = s;
        auto fs = countFormatSpecifiers(fmt);
        assert(fs == 0 || fs == 1, "Invalid number of specifiers"); // bonus sanity check
        hasArg = fs == 1;
    }
    string fmt;
    bool hasArg;
}

private @safe struct TranslatableStringPlural
{
    Strplusarg _str, _strpl;
    this(string s1, string s2) { // this is unfortunately necessary
        _str = s1;
        _strpl = s2;
    }
    string opCall(int n)
    {
        auto f = n == 1 ? _str : _strpl;
        return f.hasArg ? format(f.fmt, n) : f.fmt;
    }
}
```

And we can fix your countFormatSpecifiers function so it doesn't have the __ctfe branch:

```d
@safe: private:
int countFormatSpecifiers(string fmt) pure
{
    static void ns(const(char)[] arr) {} // the simplest output range
    auto nullSink = &ns;
    int count = 0;
    auto f = FormatSpec!char(fmt);
    while (f.writeUpToNextSpec(nullSink))
        count++;
    return count;
}
```

But.... I don't see any actual translation happening in the plural/singular form? Is that expected? If it's supposed to happen in ngettext, that translation surely has to be done inside the opCall, and if you can vary the parameter count based on language, then so does the count for the argument specifiers.

In any case, lots to ingest and figure out how it fits your needs.

-Steve
Jun 20 2022
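
The functor idea can be tried in isolation with the sketch below. It is a reduced illustration rather than the code from this post: the `Plural` name is made up, translation is skipped, and the singular form is returned as-is. The explicit constructor is kept because, as far as the language rules go, defining `opCall` on a struct makes the `Plural(a, b)` struct-literal syntax unavailable; treat that explanation as an assumption, since the post only says the constructor is necessary.

```d
import std.format : format;
import std.stdio : writeln;

struct Plural // hypothetical name, not from the thread
{
    string singular, plural;

    // Explicit constructor, so that `Plural(a, b)` still constructs a value.
    this(string s, string p)
    {
        singular = s;
        plural = p;
    }

    // Makes a value of this type callable like a function.
    string opCall(int n)
    {
        return n == 1 ? singular : format(plural, n);
    }
}

template tr(string singular, string plural)
{
    // Built at compile time; the same struct type for every instantiation.
    enum tr = Plural(singular, plural);
}

void main()
{
    foreach (n; 0 .. 3)
        writeln(tr!("one goose.", "%d geese.")(n));
}
```

Because the call goes through `opCall` on a plain struct value, no delegate is created and nothing escapes a reference.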
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Tuesday, 21 June 2022 at 02:35:28 UTC, Steven Schveighoffer 
wrote:
 structs can be functors:
That was one of the things I tried, but I missed this bit:
 ```d
     this(string s1, string s2) { // this is unfortunately necessary
         _str = s1;
         _strpl = s2;
     }
 ```
[...]
 I will say, I find this line... reprehensible ;)
I left my dirty laundry out in the hopes it would trigger you ;-) Thanks for the rinse, it looks great!
 But.... I don't see any actual translation happening in the 
 plural/singular form?
That's because the counting needs to be done on the translated string. Fixed below. Thanks Steve, looks like I'll be releasing my first Dub package soonish.

-- Bastiaan.

```d
--- app.d
import gettext;
import std.stdio;

void main()
{
    foreach (n; 0 .. 3)
        writeln(tr!("one goose.", "%d geese.")(n));
}

--- gettext.d
import std;

private @safe struct TranslatableString
{
    string _str;
    string get()
    {
        return gettext(_str);
    }
    alias get this;
}
private @safe struct Strplusarg
{
    this(string s) {
        fmt = s;
        auto fs = countFormatSpecifiers(fmt);
        assert(fs == 0 || fs == 1, "Invalid number of specifiers"); // bonus sanity check
        hasArg = fs == 1;
    }
    string fmt;
    bool hasArg;
}
private @safe struct TranslatableStringPlural
{
    string _str, _strpl;
    this(string s1, string s2) { // this is unfortunately necessary
        _str = s1;
        _strpl = s2;
    }
    string opCall(int n)
    {
        auto f = Strplusarg(ngettext(_str, _strpl, n));
        return f.hasArg ? format(f.fmt, n) : f.fmt;
    }
}
template tr(string singular, string plural = null)
{
    static if (plural == null)
        enum tr = TranslatableString(singular);
    else
        enum tr = TranslatableStringPlural(singular, plural);
}

@safe: private:
int countFormatSpecifiers(string fmt) pure
{
    static void ns(const(char)[] arr) {} // the simplest output range
    auto nullSink = &ns;
    int count = 0;
    auto f = FormatSpec!char(fmt);
    while (f.writeUpToNextSpec(nullSink))
        count++;
    return count;
}
// Translation happens here:
string gettext(string str)
{
    return str;
}
string ngettext(string singular, string plural, int n)
{
    return n == 1 ? singular : plural;
}
```

0 geese.
one goose.
2 geese.
Jun 21 2022
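
The reason the specifiers have to be counted on the translated string can be shown in a few lines: `std.format.format` throws when an argument is left over, so a singular form that spells out "one" must not be passed through `format` together with `n`. The sketch below reuses the counting approach from the thread; `applyCount` is a hypothetical helper introduced only for this illustration.

```d
import std.exception : assertThrown;
import std.format : format, FormatException, FormatSpec;

// Counts format specifiers by writing the literal parts to a sink that
// discards them, as done in the thread.
int countFormatSpecifiers(string fmt)
{
    static void ns(const(char)[] arr) {} // discards its input
    auto nullSink = &ns;
    int count = 0;
    auto f = FormatSpec!char(fmt);
    while (f.writeUpToNextSpec(nullSink))
        count++;
    return count;
}

// Hypothetical helper: formats only when the translated text has a specifier.
string applyCount(string translated, int n)
{
    return countFormatSpecifiers(translated) == 1
        ? format(translated, n)
        : translated;
}

void main()
{
    // A leftover argument is an error for std.format.format.
    assertThrown!FormatException(format("one goose.", 1));

    assert(applyCount("one goose.", 1) == "one goose.");
    assert(applyCount("%d geese.", 2) == "2 geese.");
}
```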
WebFreak001 <d.forum webfreak.org> writes:
On Tuesday, 21 June 2022 at 09:06:40 UTC, Bastiaan Veelo wrote:
 [...]

 0 geese.
 one goose.
 2 geese.
side note: plurals don't work this way in most other languages :) Mozilla had a custom pluralization thing in their translation framework once which I ported: https://github.com/Pure-D/serve-d/blob/c93530e5b235378a6c465205fd6dade0521a2804/source/served/utils/translate.d#L248 now it seems they moved away from that though and use the Unicode CLDR Project.
Jun 21 2022
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Tuesday, 21 June 2022 at 12:34:24 UTC, WebFreak001 wrote:
 On Tuesday, 21 June 2022 at 09:06:40 UTC, Bastiaan Veelo wrote:
 [...]

 0 geese.
 one goose.
 2 geese.
side note: plurals don't work this way in most other languages :)
I am using the [GNU gettext format](https://www.gnu.org/software/gettext/manual/html_node/Translating-plural-forms.html) for translation tables, which supports [formulae for the various language categories](https://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/Plural-forms.html). We should be good in this department. Example:

Я считаю 1 яблоко.
Я считаю 3 яблока.
Я считаю 5 яблок.
Я считаю 7 яблок.

-- Bastiaan.
Jun 21 2022
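
The Russian example follows the three-form rule that the GNU gettext manual lists for that language. Below is a minimal sketch of the rule, with made-up function names and a hard-coded table standing in for a real message catalog; it assumes non-negative counts.

```d
import std.format : format;
import std.stdio : writeln;

// Index of the plural form for Russian (nplurals=3), following the
// formula given in the GNU gettext manual. Assumes n >= 0.
size_t russianPluralIndex(int n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;
    return 2;
}

// Hypothetical stand-in for a catalog lookup: picks one of three forms.
string countApples(int n)
{
    static immutable string[3] forms = [
        "Я считаю %d яблоко.",
        "Я считаю %d яблока.",
        "Я считаю %d яблок.",
    ];
    string fmt = forms[russianPluralIndex(n)];
    return format(fmt, n);
}

void main()
{
    foreach (n; [1, 3, 5, 7])
        writeln(countApples(n));
}
```

A real catalog carries the formula in its header, so the selection logic is data rather than code; the hard-coded function here only makes the rule visible.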