digitalmars.D - Suggestion for a D project/std module: templated format

Miles <_______ _______.____> writes:
Since I started programming in C, I've found printf() to be wrong. The
format string, which is usually static, is parsed at run-time. GCC uses
some hints to issue warnings when you misuse printf() arguments, but
that only compensates for the inability to do this check at run-time. D has
similar issues, but at least the varargs implementation of D allows good
type-checking and argument counting at run-time.

Now, with the new mixin features of D, I think that it would be possible
to make a templated format function, that parses and does all
type-checking/argument counting at compile-time, and generates the most
efficient, perhaps inline, string formatter.

	char[] format(char[] F)(...) { ... }

User code would call it like, for example:

	format("There were %d %s, totaling $%8.2f.")
		(n, plural(n, "item", "items"), total);

It should be equivalent to calling a function exactly like:

	char[] format_XXX(int a, char[] b, real c) {
		return "There were " ~ toString(a) ~ " " ~ b
			~ ", totaling $" ~ ftoa(c, 8, 2) ~ ".";
	}

This means that the arguments are checked at compile-time, no string
parsing is done at run-time, and efficient "inlineable" code is
generated. Of course, l10n-aware programs would have to keep using
standard format functions, since the format strings are loaded at
run-time according to user settings.

I would love to do this, but I have severe time restrictions for the
next few months while I work on an unrelated project. So I'm leaving the
idea here in case someone is looking for something fun/useful to do with D,
or as a suggestion for Walter or the Tango guys to add such a function
template to the standard library or Tango.

I foresee some possible issues with argument checking, and some
complications in generating function parameters from a template string
argument. Perhaps this should be possible:

	char[] gen_format_decl(char[] format) {
		...
	}

	template format(char[] F) {
		mixin(gen_format_decl(F));
	}
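
A minimal sketch of how that generator could look, assuming present-day D (D2, with `string` instead of the `char[]` of 2007). The names genFormatDecl, tformat and impl are hypothetical, the mapping %d -> long, %s -> string, %f -> double is an assumption made for the example, and widths and precisions are not handled.

```d
import std.conv : to;

/// Hypothetical generator, run at compile time (CTFE): builds the source of a
/// function `impl` whose parameter list is derived from the specifiers in `f`.
string genFormatDecl(string f)
{
    string params;
    string expr = `""`;
    size_t n;      // number of parameters generated so far
    size_t start;  // start of the pending literal chunk
    for (size_t i = 0; i + 1 < f.length; ++i)
    {
        if (f[i] != '%')
            continue;
        string type;
        switch (f[i + 1])
        {
            case 'd': type = "long"; break;
            case 's': type = "string"; break;
            case 'f': type = "double"; break;
            default: assert(0, "unsupported format specifier");
        }
        string name = "a" ~ to!string(n++);
        params ~= (params.length ? ", " : "") ~ type ~ " " ~ name;
        // Literal text is referenced as a slice of F, so no escaping is needed.
        expr ~= " ~ F[" ~ to!string(start) ~ " .. " ~ to!string(i)
              ~ "] ~ to!string(" ~ name ~ ")";
        start = i + 2;
        ++i; // skip the specifier character
    }
    expr ~= " ~ F[" ~ to!string(start) ~ " .. $]";
    return "string impl(" ~ params ~ ") { return " ~ expr ~ "; }";
}

template tformat(string F)
{
    mixin(genFormatDecl(F));   // declares impl(...) for this format string
    alias tformat = impl;
}

void main()
{
    auto s = tformat!"There were %d %s."(3, "items");
    assert(s == "There were 3 items.");
    // A wrong argument type or count is rejected at compile time.
    static assert(!__traits(compiles, tformat!"%d"("oops")));
    static assert(!__traits(compiles, tformat!"%d"(1, 2)));
}
```

Because the generated function has an ordinary, fully typed parameter list, argument checking falls out of normal overload resolution; no run-time parsing of the format string remains.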

I leave it up to you.

Best regards,
Miles.
Feb 20 2007
"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Miles" <_______ _______.____> wrote in message 
news:erfh29$jq7$1 digitalmars.com...

 Best regards,
 Miles.

You mean like std.metastrings.Format? ;)
Feb 20 2007
"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
news:erft0e$1eoe$1 digitalmars.com...
 "Miles" <_______ _______.____> wrote in message 
 news:erfh29$jq7$1 digitalmars.com...

 Best regards,
 Miles.

You mean like std.metastrings.Format? ;)

Although I guess it only handles constants..
Feb 20 2007
Miles <_______ _______.____> writes:
Jarrett Billingsley wrote:
 You mean like std.metastrings.Format?  ;) 

Definitely not. Like you figured, std.metastrings.Format doesn't generate any run-time code. I want something that parses the format string at compile-time, and just converts and concatenates the arguments at run-time. Also, std.metastrings.Format only supports %s, so it does no type-checking or argument counting. If you provide more arguments than the format string asks for, they are just concatenated, as the original format does. This lets bugs pass unnoticed by the programmer. Definitely not what I want.
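
A minimal sketch of that behaviour, assuming present-day D (D2). The names cformat and splitOnPlaceholders are hypothetical, only %s and %% are understood, and the check covers the argument count only (type checking per specifier is left out to keep the example short).

```d
import std.conv : to;

/// Splits a format string on "%s" (CTFE-friendly); "%%" yields a literal '%'.
string[] splitOnPlaceholders(string f)
{
    string[] parts = [""];
    for (size_t i = 0; i < f.length; ++i)
    {
        if (f[i] == '%' && i + 1 < f.length && f[i + 1] == 's')
        {
            parts ~= "";          // start the next literal chunk
            ++i;
        }
        else if (f[i] == '%' && i + 1 < f.length && f[i + 1] == '%')
        {
            parts[$ - 1] ~= '%';
            ++i;
        }
        else
            parts[$ - 1] ~= f[i];
    }
    return parts;
}

string cformat(string F, Args...)(Args args)
{
    // The format string is parsed here, once, at compile time.
    enum parts = splitOnPlaceholders(F);
    static assert(parts.length == Args.length + 1,
        "number of arguments does not match the format string");
    string result = parts[0];
    foreach (i, arg; args)        // unrolled over the argument tuple
    {
        result ~= to!string(arg);
        result ~= parts[i + 1];
    }
    return result;
}

void main()
{
    assert(cformat!"Hi, my name is %s and I'm %s years old"("Ann", 30)
        == "Hi, my name is Ann and I'm 30 years old");
    // Too many or too few arguments no longer compile.
    static assert(!__traits(compiles, cformat!"%s"(1, 2)));
    static assert(!__traits(compiles, cformat!"%s and %s"(1)));
}
```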
Feb 20 2007
Derek Parnell <derek nomail.afraid.org> writes:
On Tue, 20 Feb 2007 21:30:17 -0200, Miles wrote:

 Jarrett Billingsley wrote:
 You mean like std.metastrings.Format?  ;) 

Definitely not. Like you figured, std.metastrings.Format doesn't generate any run-time code. I want something that parses the format string at compile-time, and just converts and concatenates the arguments at run-time. Also, std.metastrings.Format only supports %s, so it does no type-checking or argument counting. If you provide more arguments than the format string asks for, they are just concatenated, as the original format does. This lets bugs pass unnoticed by the programmer. Definitely not what I want.

For what it's worth, I tend to just use '%s' in writefln() calls, and almost never use any of the other format codes, regardless of the data type being supplied in the arguments. e.g.

	char[] theName;
	int theAge;

	std.stdio.writefln("Hi, my name is %s and I'm %s years old",
	                   theName, theAge);

Thus I've accepted the chance of formatting bugs in exchange for ease of coding. It is no different to saying either ...

	std.stdio.writefln("Hi, my name is %s and I'm %s years old",
	                   toString(theName), toString(theAge) );

or

	std.stdio.write("Hi, my name is ", theName, " and I'm ", theAge, " years old");

I'm not saying your idea is bad, just that it's not going to be universally wanted.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
21/02/2007 10:42:57 AM
Feb 20 2007
Miles <_______ _______.____> writes:
Derek Parnell wrote:
 For what it's worth, I tend to just use '%s' in writefln() calls, and almost
 never use any of the other format codes, regardless of the data type being
 supplied in the arguments.

This means that you have a little more run-time code being executed to type-id each argument and call the appropriate toString(). I.e.:
     char[] theName;
     int theAge;
 
     std.stdio.writefln("Hi, my name is %s and I'm %s years old",
                           theName, theAge);

In this case, format will have to type-identify both arguments at run-time every time this statement is executed, and the results will always be char[] for the second argument and int for the third; they won't change. This is a little bit of useless computation. The performance impact is small, but it exists, and it bothers me (not just the type identification, but also the format string parsing).
     std.stdio.writefln("Hi, my name is %s and I'm %s years old",
                           toString(theName), toString(theAge) );

What if you want leading zeros and a forced '+' sign on the integer? Or want to format a floating-point number in scientific notation with 2 decimal places, padded into a column 12 characters wide...
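
For reference, a short sketch of those two cases with printf-style specifiers, assuming present-day D (D2) and the format function from std.format. The values 42 and 12345.678 are made up for the example.

```d
import std.format : format;

void main()
{
    // '+' forces a sign, '0' pads with leading zeros, 6 is the total width.
    assert(format("%+06d", 42) == "+00042");

    // Scientific notation with 2 decimal places, right-aligned
    // in a column 12 characters wide.
    auto s = format("%12.2e", 12345.678);
    assert(s.length == 12);
}
```

Neither result can be produced with a bare %s, which is why the full set of specifiers still matters for a compile-time formatter.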
 I'm not saying your idea is bad, just that it's not going to be universally
 wanted.

Yeah, sure. It is just how I think format should have worked since the old C days: with compile-time type checking, argument counting, and no useless computation (neither type identification nor string parsing) at run-time. Now that D has powerful mixins in addition to templates, this makes complete sense.
Feb 20 2007
renoX <renosky free.fr> writes:
Miles Wrote:

 Derek Parnell wrote:
 For what it's worth, I tend to just use '%s' in writefln() calls, and almost
 never use any of the other format codes, regardless of the data type being
 supplied in the arguments.


Yes, %s and the like are useless except when you need a special format like %08d.

In Ruby, they provide an escape syntax, 'puts "test #{variable}"', which embeds a value with the default format and needs no format specifier. IMHO in D we could use a similar syntax:

	writef("text %{variable}");
	writef("test %08d{variable2}");

I think embedding %{} in the strings helps readability a lot.

Below is a sample implementation (it could be used as a basis for a "templated format" if wanted). It lacks support for %08d{}, but this could be added.

renoX

	import std.stdio;

	// Compile-time search: index of the first B in A, or -1 if absent.
	template FindChar(char[] A, char B) {
		static if (A.length == 0) {
			const int FindChar = -1;
		} else static if (A[0] == B) {
			const int FindChar = 0;
		} else static if (-1 == FindChar!(A[1..$], B)) {
			const int FindChar = -1;
		} else {
			const int FindChar = 1 + FindChar!(A[1..$], B);
		}
	}

	// Translates one format string: each %{name} becomes %s plus a
	// separate argument, and the closing quote of the literal is emitted.
	template FmtString(char[] F, A...) {
		static if (F.length == 0)
			static if (A.length)
				const char[] FmtString = "\"," ~ Fmt!(A);
			else
				const char[] FmtString = "\"";
		else static if (F.length == 1)
			static if (A.length)
				const char[] FmtString = F[0] ~ "\"," ~ Fmt!(A);
			else
				const char[] FmtString = F[0] ~ "\"";
		else static if (F[0..2] == "%{") {
			// get the variable name between %{ and }
			static if (FindChar!(F[2..$], '}') <= 0)
				static assert(0, "format %{ incorrect in '" ~ F ~ "'");
			const char[] FmtString = "%s\"," ~ F[2..2+FindChar!(F[2..$],'}')] ~ ",\""
				~ FmtString!(F[3+FindChar!(F[2..$],'}')..$], A);
		}
		else static if (F[0..2] == "%%")
			const char[] FmtString = "%%" ~ FmtString!(F[2..$], A);
		// else static if (F[0] == '%')
		//	static assert(0, "unrecognized Fmt '%" ~ F[1] ~ "'");
		else
			const char[] FmtString = F[0] ~ FmtString!(F[1..$], A);
	}

	// Builds the comma-separated argument list; string arguments are
	// treated as format strings, anything else is passed by name.
	template Fmt(A...) {
		static if (A.length == 0)
			const char[] Fmt = "";
		else static if (is(typeof(A[0]) : char[]))
			const char[] Fmt = "\"" ~ FmtString!(A[0], A[1..$]);
		else static if (A.length == 1)
			const char[] Fmt = A[0].stringof;
		else
			const char[] Fmt = A[0].stringof ~ "," ~ Fmt!(A[1..$]);
	}

	// Produces the writef(...) call as a string, ready for mixin().
	template Putf(A...)
	{
		const char[] Putf = "writef(" ~ Fmt!(A) ~ ");";
		// static assert(0, Putf);
	}

	int main()
	{
		int x = 10;
		int y = 20;
		writef("BEFORE'"); mixin(Putf!("foo",x)); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!("foo",x,y)); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!(x,"foo")); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!(x)); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!("foo\n")); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!("test avant %%{x} apres")); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!("test avant %{x} apres")); writef("'AFTER\n");
		char[] text="should see the % in %{x}";
		writef("BEFORE'"); mixin(Putf!("test avant %{text} apres")); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!("test avant %{text} apres %{x} encore apres")); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!("test avant", "apres\n")); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!("test avant %{text}"," apres %{x} encore apres")); writef("'AFTER\n");
		writef("BEFORE'"); mixin(Putf!("test avant %{text}"," apres %{x} encore apres ",x)); writef("'AFTER\n");
		return 0;
	}
Feb 21 2007
Don Clugston <dac nospam.com.au> writes:
renoX wrote:

Yes, %s and the like are useless except when you need a special format like %08d. In Ruby, they provide an escape syntax, 'puts "test #{variable}"', which embeds a value with the default format and needs no format specifier. IMHO in D we could use a similar syntax: writef("text %{variable}"); and writef("test %08d{variable2}"); I think embedding %{} in the strings helps readability a lot.

IMHO, to be really useful, it would need to be possible to embed expressions in the string (not just variable names). This is tantalizingly close to being possible with string mixins.
Feb 23 2007
renoX <renosky free.fr> writes:
Don Clugston wrote:

IMHO, to be really useful, it would need to be possible to embed expressions in the string (not just variable names). This is tantalizingly close to being possible with string mixins.

Not 'tantalizingly close to being possible': it works! I've posted the implementation in another thread, 'Improvement on format strings, take two.', and indeed:

	mixin(Putf!("text: %d{x+y+1}"));

does what's expected. Of course, a format string containing a %.{} format must be a const char[].

I had to add a limitation in order to escape '{': at first I planned to use '\{', which would have allowed %{x} to work, but dmd reports 'invalid escape', so '%{' is the escape for '{' and you have to write %d{x}.

The only annoying part is the need to write the mixin keyword. Too bad there isn't some special syntax: $putf!(..) would be acceptable, for example, or even better putf!!(..); mixin(putf!(..)) looks ugly.

renoX

PS: while I won't do the 'templated format' suggestion, IMHO my implementation of putf would be a good start for anyone who wants to do it: it has template code for matching format strings.
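
A minimal sketch of the same idea as a compile-time function, assuming present-day D (D2) rather than the recursive templates of the posted implementation. The name interp is hypothetical, and only the plain %{expr} form is handled here (no %d{expr}, and no way to write a literal '%{' other than '%%{').

```d
import std.format : format;

/// Hypothetical helper, evaluated via CTFE when used inside mixin():
/// turns "a %{x+1} b" into the source text  format("a %s b", x+1).
string interp(string f)
{
    string fmt;   // format string with each %{...} replaced by %s
    string args;  // comma-prefixed argument expressions
    for (size_t i = 0; i < f.length; ++i)
    {
        immutable bool more = i + 1 < f.length;
        if (f[i] == '%' && more && f[i + 1] == '%')
        {
            fmt ~= "%%";          // keep an escaped percent sign as is
            ++i;
        }
        else if (f[i] == '%' && more && f[i + 1] == '{')
        {
            size_t j = i + 2;
            while (j < f.length && f[j] != '}')
                ++j;
            assert(j < f.length, "unterminated %{ in format string");
            fmt ~= "%s";
            args ~= ", " ~ f[i + 2 .. j];
            i = j;                // continue after the closing brace
        }
        else
        {
            // Escape characters that would end or corrupt the generated literal.
            if (f[i] == '"' || f[i] == '\\')
                fmt ~= '\\';
            fmt ~= f[i];
        }
    }
    return "format(\"" ~ fmt ~ "\"" ~ args ~ ")";
}

void main()
{
    int x = 10, y = 20;
    // The expression is mixed in at the call site, so it sees x and y.
    assert(mixin(interp("sum: %{x+y+1}")) == "sum: 31");
    assert(mixin(interp("100%% of %{x}")) == "100% of 10");
    static assert(interp("a %{x} b") == `format("a %s b", x)`);
}
```

The mixin keyword is still required at the call site, because only code mixed in there can refer to local variables.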
Feb 23 2007
"Andrei Alexandrescu (See Website For Email)" <SeeWebsiteForEmail erdani.org> writes:
renoX wrote:

Not 'tantalizingly close to being possible': it works! I've posted the implementation in another thread, 'Improvement on format strings, take two.', and indeed: mixin(Putf!("text: %d{x+y+1}")); does what's expected. Of course, a format string containing a %.{} format must be a const char[]. I had to add a limitation in order to escape '{': at first I planned to use '\{', which would have allowed %{x} to work, but dmd reports 'invalid escape', so '%{' is the escape for '{' and you have to write %d{x}. The only annoying part is the need to write the mixin keyword. Too bad there isn't some special syntax: $putf!(..) would be acceptable, for example, or even better putf!!(..); mixin(putf!(..)) looks ugly.

That is on the radar. Ideally, if something expands to an expression, it should be implementable with the function call syntax.

Andrei
Feb 23 2007
Miles <_______ _______.____> writes:
renoX wrote:
 Below is a sample implementation (could be used as a basis for
 "templated format" if wanted), it is lacking the possibility to have
 %08d{} but this could be added..

Nice! :-)
Feb 23 2007
renoX <renosky free.fr> writes:
Miles wrote:
 renoX wrote:
 Below is a sample implementation (could be used as a basis for
 "templated format" if wanted), it is lacking the possibility to have
 %08d{} but this could be added..

Nice! :-)

There's a better implementation under 'Improvement on format strings, take two': it works with things like %08d{x+y} (and any legal format string between %..{ ), and has no problem with non-const char[] (it just leaves them alone).

But there are remaining problems:
- you need to call it like: mixin(putf!(...)); the mixin keyword doesn't look nice (I don't understand why it's needed: Reiner Pope, in 'Mixin demo: associative array initializers', manages to hide it).
- when there is a problem at compile time, the line given for the error is inside the template processor, and I haven't found any way to show the line where the call is made. For a developer, this makes an error in the format string hard to debug.

And lastly the implementation could probably be improved, but that's normal: I'm a beginner in D.

renoX
Feb 24 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
renoX wrote:
 But there are remaining problems:
 - you need to call it like: mixin(putf!(...));
 the mixin keyword doesn't look nice
 (I don't understand why it's needed: Reiner Pope, in 'Mixin
 demo: associative array initializers', manages to hide it)

I just answered this question in your thread "Re: "Hiding" the mixin keyword?" in d.D.learn. For the people who don't want to go look it up:
-----
Looks like he hid the mixin() in a templated function call, using property syntax to remove the trailing (). I'm pretty sure that means he can't support variables in the initializers, nor expressions containing variables.
It would seem these are quite important for a (runtime) string formatter though. ;)
-----
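
A minimal sketch of that limitation, assuming present-day D (D2). The template name constFmt and the constant n are hypothetical and are not taken from Reiner Pope's code; the point is only that a mixin hidden inside a template is evaluated in the template's scope, not the caller's.

```d
// The mixin is hidden inside the template, so the call site needs no `mixin`.
template constFmt(string code)
{
    // Evaluated at compile time, in the scope where the template is declared.
    enum constFmt = mixin(code);
}

enum n = 3;  // module-level compile-time constant, visible to the template

void main()
{
    // Fine: the expression only uses names the template itself can see.
    static assert(constFmt!"n + 1" == 4);

    int x = 10;
    // Rejected: x is a local run-time variable of main, which the
    // template cannot see.
    static assert(!__traits(compiles, constFmt!"x + 1"));
    assert(x == 10);
}
```

A formatter that must read local variables therefore still needs the mixin at the call site.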
Feb 24 2007