digitalmars.D - Wanted: Format character for source code literal


Berni44 <someone somemail.com> writes:

I plan to add an extension to `std.format`, namely a new format 
character with the meaning of producing a source code literal. Or 
more formally, the following snippet should work for every type 
this extension will support:

```
enum a = <something>;
enum b = mixin(format!"%S"(a));

static assert(a == b && is(typeof(a) == typeof(b)));

```
(Please note, that even for floats `a == b` should hold for all 
values, but NaNs; I plan to use RYU for this.)
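
For illustration only (a sketch, not part of the original post): the round-trip property can be tried today with plain `%s`, which is enough for an `int` but loses the type of a `long`. The names `a` to `d` are arbitrary, and the snippet assumes that `format` with `%s` on integers is usable at compile time.

```d
import std.format : format;

// Plain `%s` already satisfies the round trip for an int ...
enum int a = 42;
enum b = mixin(format("%s", a));
static assert(a == b && is(typeof(a) == typeof(b)));

// ... but not for a long: "42" is read back as an int literal,
// so the value matches while the type does not.
enum long c = 42;
enum d = mixin(format("%s", c));
static assert(c == d && !is(typeof(c) == typeof(d)));
```

A dedicated format character would presumably have to emit something like `42L` in the second case.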

The big question is now, which character to use. I thought of 
`%S` like source code literal. Andrei suggested `%D` like D 
literal. Both ideas have the disadvantage of using uppercase 
letters, which would break the convention of uppercase letters 
meaning that the output uses uppercase instead of lowercase (e.g. 
1E10 instead of 1e10).

A first idea of a lowercase literal might be `%l` but this might 
easily be confused with `%I` and `%1` (both don't exist); and 
also `l` is used in C's `printf` for `long` which we luckily 
don't need here. Anyway I fear confusion.

What do you think? Which letter would be best?

Apr 30 2021

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. 
 Or more formally, the following snippet should work for every 
 type this extension will support:

 [...]

Actually S is already used. The only specifier that is "free" 
would be D. But, what's so bad with using L? 🤔

Apr 30 2021

Berni44 <someone somemail.com> writes:

On Friday, 30 April 2021 at 07:20:11 UTC, Imperatorn wrote:
 Actually S is already used.

Huh, where? Not in `std.format`...

Apr 30 2021

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Friday, 30 April 2021 at 08:02:44 UTC, Berni44 wrote:
 On Friday, 30 April 2021 at 07:20:11 UTC, Imperatorn wrote:
 Actually S is already used.

 Huh, where? Not in `std.format`...

No I meant for [w]printf

https://docs.microsoft.com/en-us/cpp/c-runtime-library/format-specification-syntax-printf-and-wprintf-functions

Apr 30 2021

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. 
 Or more formally, the following snippet should work for every 
 type this extension will support:

 [...]

%m in reference to mixin ?

Apr 30 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 4/30/21 4:10 AM, Patrick Schluter wrote:
 On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. Or more 
 formally, the following snippet should work for every type this 
 extension will support:

 [...]

 
 %m in reference to mixin ?

I like this.

I also could be OK with capital D. I know it normally stands for make 
uppercase, but decimal numbers have no letters in them.

-Steve

Apr 30 2021

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Apr 30, 2021 at 07:10:38AM +0000, Berni44 via Digitalmars-d wrote:
 I plan to add an extension to `std.format`, namely a new format
 character with the meaning of producing a source code literal. Or more
 formally, the following snippet should work for every type this
 extension will support:
 
 ```
 enum a = <something>;
 enum b = mixin(format!"%S"(a));
 
 static assert(a == b && is(typeof(a) == typeof(b)));
 ```

[...]

What's the scope of this feature? Can <something> be, say, a code
literal like a lambda? Can std.format even support printing the function
body of a lambda in a way that can be parsed by mixin()?  How far do we
intend to go with this?  Or does this only apply to POD values?

I can imagine there'd be problems if you have, say, a class from a
different module with private members, possibly with nested private
classes, so you couldn't actually reconstruct the class instance from a
string alone.

Unless the scope is significantly constrained, I see the potential for
this feature devolving into a big time-sink that really only caters to a
niche use-case. I'd be happy to be proven wrong, though.


T

-- 
"You know, maybe we don't *need* enemies." "Yeah, best friends are about all I
can take." -- Calvin & Hobbes

Apr 30 2021

Berni44 <someone somemail.com> writes:

On Friday, 30 April 2021 at 21:11:55 UTC, H. S. Teoh wrote:
 What's the scope of this feature? Can <something> be, say, a 
 code literal like a lambda? Can std.format even support 
 printing the function body of a lambda in a way that can be 
 parsed by mixin()?  How far do we intend to go with this?  Or 
 does this only apply to POD values?

 I can imagine there'd be problems if you have, say, a class 
 from a different module with private members, possibly with 
 nested private classes, so you couldn't actually reconstruct 
 the class instance from a string alone.

 Unless the scope is significantly constrained, I see the 
 potential for this feature devolving into a big time-sink that 
 really only caters to a niche use-case. I'd be happy to be 
 proven wrong, though.

Of course, there are limits to this. I intend to implement it for 
bools, integers, floats, characters, strings and enums. I think 
it will also be possible to implement it for (associative) 
arrays, as long as the elements can be implemented like this. 
Structs, classes, interfaces and unions should implement it in 
their `toString` functions. I wouldn't go so far as to provide 
generic code for them. With that we'll indeed run into problems. 
(Maybe with the exception of structs with default constructor.) I 
haven't made up my mind about pointers yet.

And well functions, delegates and lambdas? Functions aren't 
supported by `std.format` at all and the implementation of 
delegates is broken and useless and should be removed (in my 
opinion).

May 01 2021

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. 
 Or more formally, the following snippet should work for every 
 type this extension will support:

 [...]

If this would be unique to mixins, then %m, if not, L(iteral) or 
D(literal).

May 01 2021

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal.

Would `%q` work? In reference to [token 
strings](https://dlang.org/spec/lex.html#token_strings):

```
enum a = <something>;
enum b = mixin(format!"%q"(a));
enum c = mixin(q{<something>});

static assert(a == b && is(typeof(a) == typeof(b)));
static assert(c == b && is(typeof(c) == typeof(b)));
```

— Bastiaan.

May 01 2021

Q. Schroll <qs.il.paperinik gmail.com> writes:

On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. 
 Or more formally, the following snippet should work for every 
 type this extension will support:

 ```D
 enum a = <something>;
 enum b = mixin(format!"%S"(a));

 static assert(a == b && is(typeof(a) == typeof(b)));
 ```

 (Please note, that even for floats `a == b` should hold for all 
 values, but NaNs; I plan to use RYU for this.)

 The big question is now, which character to use. I thought of 
 `%S` like source code literal. Andrei suggested `%D` like D 
 literal. Both ideas have the disadvantage of using uppercase 
 letters, which would break the convention of uppercase letters 
 meaning that the output uses uppercase instead of lowercase 
 (e.g. 1E10 instead of 1e10).

 A first idea of a lowercase literal might be `%l` but this 
 might easily be confused with `%I` and `%1` (both don't exist); 
 and also `l` is used in C's `printf` for `long` which we 
 luckily don't need here. Anyway I fear confusion.

 What do you think? Which letter would be best?

Please don't do this. Format characters can be customized. Any 
character you'd introduce for it either wouldn't work for some 
types or break those types' formatting. Why not introduce a new 
function like `dlangLiteral` that takes the value and returns a 
string? It can be used in `format` quite easily (like 
`format("pre %s post", dlangLiteral(<something>));`) and is 
explicit and not a special case at all.

May 03 2021

Berni44 <someone somemail.com> writes:

On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:
 Please don't do this. Format characters can be customized. Any 
 character you'd introduce for it either wouldn't work for some 
 types or break those types' formatting.

I think, this doesn't hurt: The call of a `toString` has 
precedence for compound types like structs and classes (and in 
most of these cases it won't be possible to add a generic literal 
at all, see my post above). So, if you use some of the predefined 
qualifiers, the customized version will always be used, even if 
it has a completely different meaning. (Admittedly it might cause 
some confusion, if the customized versions are not well 
documented.)

 Why not introduce a new function like `dlangLiteral` that takes 
 the value and returns a string? It can be used in `format` 
 quite easily (like `format("pre %s post", 
 dlangLiteral(<something>));`) and is explicit and not a special 
 case at all.

In my opinion, the main idea behind these formatting routines is 
to have a simple and short way of formatting output. We could 
use your idea for every other format character too, like: 
`format("%s = %s", character('𝜋'), 
scientificFloatingPoint(3.14))`. We don't do that, because it's 
more convenient to write `format("%c = %e", '𝜋', 3.14)`.

Furthermore, there are all the parameters, that can be applied: 
`format("%-3c = %+.4e", '𝜋', 3.14)` is a simple way to change the 
formatting. Without it it would become something like `format("%s 
= %s", character!(true, false, false, false, false, false, 
3)('𝜋'), scientificFloatingPoint!(false, true, false, false, 
false, false, FormatSpec.UNSPECIFIED, 4)(3.14))`.
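
As a small illustration of the point about parameters (a sketch, not from the original post), flags, width and precision combine with the format character inside the format string itself, so no wrapper functions are needed:

```d
import std.format : format;

unittest
{
    // Default scientific notation uses six digits after the point.
    assert(format("%e", 3.14) == "3.140000e+00");

    // A forced sign and a precision of four, chosen in the format string.
    assert(format("%+.4e", 3.14) == "+3.1400e+00");

    // Width with and without the left-justify flag.
    assert(format("%5d|%-5d|", 42, 42) == "   42|42   |");
}
```

The block can be run with `dmd -unittest -main`.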

Another problem arises when it is used with arrays, ranges and the 
like, e.g. you can do something like `format("val = [%(%D,\n      
  %)];", my_array);` to get an output with each value on a 
separate line. Without this literal you would at least need to 
map `my_array` using `dlangLiteral` and in generic code this 
might even cause more trouble.
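
To make the array case concrete, here is a hedged sketch. `dlangLiteral` below is a hypothetical stand-in for the proposed function (nothing like it exists in Phobos), and it only handles `long`. The array has to be mapped first, and the compound specifier then needs the `-` flag so that the resulting strings are not quoted a second time.

```d
import std.algorithm : map;
import std.format : format;

// Hypothetical helper in the spirit of the proposed `dlangLiteral`.
// Only `long` is covered in this sketch.
string dlangLiteral(long value)
{
    return format("%sL", value);
}

unittest
{
    long[] myArray = [1, 2, 3];

    // `%s` inside `%( %)` drops the L suffix.
    assert(format("val = [%(%s, %)];", myArray) == "val = [1, 2, 3];");

    // Mapping first keeps the suffix, but requires `%-(` so that the
    // mapped strings are written without quotes and escapes.
    assert(format("val = [%-(%s, %)];", myArray.map!dlangLiteral)
        == "val = [1L, 2L, 3L];");
}
```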

May 04 2021

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Tuesday, 4 May 2021 at 07:54:19 UTC, Berni44 wrote:
 On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:
 [...]

 I think, this doesn't hurt: The call of a `toString` has 
 precedence for compound types like structs and classes (and in 
 most of these cases it won't be possible to add a generic 
 literal at all, see my post above). So, if you use some of the 
 predefined qualifiers, the customized version will always be 
 used, even if it has a completely different meaning. 
 (Admittedly it might cause some confusion, if the customized 
 versions are not well documented.)

 [...]

Just go with %m if you want lowercase, otherwise you would have 
to use uppercase specifiers.

May 04 2021

Q. Schroll <qs.il.paperinik gmail.com> writes:

On Tuesday, 4 May 2021 at 07:54:19 UTC, Berni44 wrote:
 On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:
 Please don't do this. Format characters can be customized. Any 
 character you'd introduce for it either wouldn't work for some 
 types or break those types' formatting.

 I think, this doesn't hurt: The call of a `toString` has 
 precedence for compound types like structs and classes (and in 
 most of these cases it won't be possible to add a generic 
 literal at all, see my post above). So, if you use some of the 
 predefined qualifiers, the customized version will always be 
 used, even if it has a completely different meaning. 
 (Admittedly it might cause some confusion, if the customized 
 versions are not well documented.)

What you wrote in parentheses is _exactly_ the problem I have 
with this. Generic code cannot use it because user defined types 
regularly hook the format. If you use `%D` format, but the type 
does not support it (say std.typecons.Tuple), it will throw a 
FormatException.
So you're stuck between a rock and a hard place: Give `%D` 
preference over custom format specifiers rendering those that use 
`%D` invalid or let `%D` do its custom stuff if *potentially* 
supported rendering `%D` useless in generic code where most of 
its use-cases would lie.

 Why not introduce a new function like `dlangLiteral` that 
 takes the value and returns a string? It can be used in 
 `format` quite easily (like `format("pre %s post", 
 dlangLiteral(<something>));`) and is explicit and not a 
 special case at all.

 In my opinion, the main idea behind these formatting routines 
 is to have a simple and short way of formatting output. We 
 could use your idea for every other format character too, like: 
 `format("%s = %s", character('𝜋'), 
 scientificFloatingPoint(3.14))`. We don't do that, because it's 
 more convenient to write `format("%c = %e", '𝜋', 3.14)`.

Yes, you could. But you could use format specifiers like `%-3.8f` 
*without losses* to get to the same result. And that's *the* 
difference between introducing a format specifier character that 
should have generic meaning and introducing, well, anything else. 
There was no problem introducing separators like `%,3d` and 
neither would there be a problem introducing `%y` for `int` or 
`double` (whatever it does), or, for a concrete example, `%S` for 
`bool` to return `TRUE` instead of `true`.

The problem is introducing *generic* format specifier 
*characters*.

 Another problem arises when it is used with arrays, ranges and the 
 like, e.g. you can do something like `format("val = [%(%D,\n
  %)];", my_array);` to get an output with each value on a 
 separate line. Without this literal you would at least need to 
 map `my_array` using `dlangLiteral` and in generic code this 
 might even cause more trouble.

If you want that, you need to allow something that's currently 
illegal. As a comparison, `%.*f` could be introduced if `*` 
precision weren't already a thing (compare with `%,3d`) because 
in any reasonable implementation, `*` instead of precision would 
be an error. What we could do is special casing `%$` to mean what 
you want. Currently, no matter what type you're formatting, `%$` 
is an error in `FormatSpec`. You can give it semantics, no 
problem, including one that ignores custom formatting. Even 
better, `%$` looks like it's a special case and not some 
odd-but-legal custom specifier.

Changing the meaning of `%D` begs for trouble.

May 04 2021

Berni44 <someone somemail.com> writes:

On Tuesday, 4 May 2021 at 18:02:50 UTC, Q. Schroll wrote:
 So you're stuck between a rock and a hard place: Give `%D` 
 preference over custom format specifiers rendering those that 
 use `%D` invalid or let `%D` do its custom stuff if 
 *potentially* supported rendering `%D` useless in generic code 
 where most of its use-cases would lie.

I fear, I can't follow you. Seems like I don't get your point. 
Maybe you can give an example?

 In my opinion, the main idea behind these formatting routines 
 is to have a simple and short way of formatting output. We 
 could use your idea for every other format character too, 
 like: `format("%s = %s", character('𝜋'), 
 scientificFloatingPoint(3.14))`. We don't do that, because 
 it's more convenient to write `format("%c = %e", '𝜋', 3.14)`.

 Yes, you could. But you could use format specifiers like 
 `%-3.8f` *without losses* to get to the same result.

??? Again I'm stuck. What does `%-3.8f` have to do with what I 
wrote above?

 And that's *the* difference between introducing a format 
 specifier character that should have generic meaning and 
 introducing, well, anything else. There was no problem 
 introducing separators like `%,3d` and neither would there be a 
 problem introducing `%y` for `int` or `double` (whatever it 
 does), or, for a concrete example, `%S` for `bool` to return 
 `TRUE` instead of `true`.

 The problem is introducing *generic* format specifier 
 *characters*.

What is the difference between "generic" (which as far as I 
understand you oppose) and adding `%D` for bool, integers, 
floats, characters, strings, arrays and AAs (which you seem to 
be OK with, and which is what I plan to do)?

 What we could do is special casing `%$` to mean what you want. 
 Currently, no matter what type you're formatting, `%$` is an 
 error in `FormatSpec`. You can give it semantics, no problem, 
 including one that ignores custom formatting. Even better, `%$` 
 looks like it's a special case and not some odd-but-legal 
 custom specifier.

Using `$` would cause real troubles, because it's already used 
for positional arguments. What would `format("%1$d", 'a');` be 
supposed to produce? `'a'd` or `97`?

 Changing the meaning of `%D` begs for trouble.

`%D` has currently no meaning, so we cannot change it; we can 
just add it.


I hope, we can figure this out somehow - I sense, that you've got 
an important point, but I don't understand it. Seems like we are 
talking past each other.

May 05 2021

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 5 May 2021 at 08:46:05 UTC, Berni44 wrote:
 What is the difference between "generic" (which as far as I 
 understand you oppose) and adding `%D` for bool, integers, 
 floats, characters, strings, arrays and AAs (which you seem to 
 be OK with, and which is what I plan to do)?

[...]
 `%D` has currently no meaning, so we cannot change it; we can 
 just add it.

`%D` *does* currently have a meaning, though. It means "custom 
format specifier."

Here's the scenario that could potentially lead to trouble:

1. Some existing library uses `%D` as a custom format specifier 
in their `toString` methods, with a meaning other than "format as 
D source code."

2. `%D` is added to `std.format` with the meaning "format as D 
source code," and a default implementation for types that do not 
have custom `toString` methods.

3. A new library is written that takes advantage of (2) and uses 
`%D` in generic code to format arbitrary values for the purpose 
of code generation.

4. Someone uses the library from (1) and the library from (3) in 
the same project, and library (3) ends up producing garbage, 
because library (1)'s `%D` doesn't work the way library (3) 
expects it to.

The "correct" place to fix this is in library (1), but doing so 
would require a breaking change. In practice, this means that 
libraries like the one in (3) will never be able to completely 
rely on the new standard for `%D`, and will always have to 
include some kind of workaround in case they are used with types 
like the ones in library (1).
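
The scenario can be sketched in a few lines. Everything here is hypothetical: `Date` stands in for a type from library (1) that already uses `%D` for its own purpose, and `toSource` stands in for generic code from library (3) that assumes `%D` always yields a D literal.

```d
import std.format : format, FormatSpec;

// Stand-in for library (1): `%D` means "day first", not "D literal".
struct Date
{
    int year, month, day;

    void toString(scope void delegate(const(char)[]) sink,
                  scope const ref FormatSpec!char fmt) const
    {
        if (fmt.spec == 'D')
            sink(format("%02d.%02d.%04d", day, month, year));
        else
            sink(format("%04d-%02d-%02d", year, month, day));
    }
}

// Stand-in for library (3): generic code relying on `%D`.
string toSource(T)(T value)
{
    return format("%D", value);
}

unittest
{
    // The result is not D source code that reconstructs the Date.
    assert(toSource(Date(2021, 5, 5)) == "05.05.2021");
}
```

The call succeeds without any error, which is what makes the mismatch hard to detect from inside the generic code.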

May 05 2021

Berni44 <someone somemail.com> writes:

On Wednesday, 5 May 2021 at 17:02:42 UTC, Paul Backus wrote:
 Here's the scenario that could potentially lead to trouble:

 1. Some existing library uses `%D` as a custom format specifier 
 in their `toString` methods, with a meaning other than "format 
 as D source code."

 2. `%D` is added to `std.format` with the meaning "format as D 
 source code," and a default implementation for types that do 
 not have custom `toString` methods.

 3. A new library is written that takes advantage of (2) and 
 uses `%D` in generic code to format arbitrary values for the 
 purpose of code generation.

 4. Someone uses the library from (1) and the library from (3) 
 in the same project, and library (3) ends up producing garbage, 
 because library (1)'s `%D` doesn't work the way library (3) 
 expects it to.

First of all: Thanks for clarifying. I think, I understand the 
problem now.

 The "correct" place to fix this is in library (1), but doing so 
 would require a breaking change. In practice, this means that 
 libraries like the one in (3) will never be able to completely 
 rely on the new standard for `%D`, and will always have to 
 include some kind of workaround in case they are used with 
 types like the ones in library (1).

In my opinion, the error is in (3): The new library assumes, that 
`%D` can be used with every type (and will always have the 
meaning "D literal"), which in my opinion is wrong:

It does not even hold for established characters. Take `%b`, for 
example: for bools, integers, characters, and enums whose base 
type is one of the first three, it currently means "format as 
unsigned binary number". It currently cannot be used for anything 
else that `std.format` is responsible for.

But of course it can be used in any custom type (be it one from 
Phobos, an external library, or whatever). And no one will stop 
anyone from using it in a completely different way, e.g. as a 
bitmap of the type or whatever.

So in my opinion in the above scenario the library in (3) should 
clearly state in its docs, that it can only be used with code 
that uses `%D` in the sense of being a "D literal". And the 
library in (1) should clearly state in its docs, what `%D` means, 
if it has a meaning. And with that it should be clear that you 
cannot use (1) and (3) together in one project, at least not 
without adding some glue.

Now I think, I can go back to this:

 `%D` has currently no meaning, so we cannot change it; we can 
 just add it.

 `%D` *does* currently have a meaning, though. It means "custom 
 format specifier."

But doesn't that apply to *every* format specifier?

May 06 2021

Q. Schroll <qs.il.paperinik gmail.com> writes:



On Wednesday, 5 May 2021 at 08:46:05 UTC, Berni44 wrote:
 On Tuesday, 4 May 2021 at 18:02:50 UTC, Q. Schroll wrote:
 So you're stuck between a rock and a hard place: Give `%D` 
 preference over custom format specifiers rendering those that 
 use `%D` invalid or let `%D` do its custom stuff if 
 *potentially* supported rendering `%D` useless in generic code 
 where most of its use-cases would lie.

 I fear, I can't follow you. Seems like I don't get your point. 
 Maybe you can give an example?

I'm speaking of aggregate types (structs, classes, etc.) that 
implement `toString` that takes a `FormatSpec` parameter 
alongside the sink to describe the format according to which it 
should be formatted. An example is `std.typecons.Tuple` which 
apart from `%s` accepts `%(...%)` and `%(...%|...%)`. If you try 
to format it with `%D`, it throws a `FormatException`. But like 
any aggregate type, it could start accepting `%D` tomorrow.

The new `format` implementation could do three things when 
encountering `%D` for formatting an object of a type with custom 
formatting:
1. Because it accepts custom formatting, use it, even if it fails 
(throws `FormatException`).
2. Because it accepts custom formatting `try` it. If it fails 
(i.e. throws `FormatException`), fall back to non-custom `%D` 
behavior. (If it succeeds, use the successful result.)
3. Ignore the custom formatting because `%D` is special.

None of these solutions is great.
1. means `%D` cannot be relied upon in generic code, i.e. where 
the type of what you're formatting isn't up to you but someone 
else. _Relied upon_ means in the way you intend `%D` to be used: 
A compiler-readable representation of the object.
2. It could fail in other ways. (Still the best.)
3. Breaks code, at least theoretically. Also, even if today no 
one actually uses `%D`, it might be the perfect match for a 
future aggregate type, but you blocked it.
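
A rough sketch of options 1 and 2, under the assumption (stated above) that `Tuple` rejects `%D` with a `FormatException`. The wrapper `tryLiteral` is hypothetical, and its `%s` fallback is only a placeholder, since no non-custom `%D` behavior exists yet.

```d
import std.exception : assertThrown;
import std.format : format, FormatException;
import std.typecons : tuple;

// Option 2 as a hypothetical wrapper: try the type's own `%D` first
// and fall back if the type rejects it.
string tryLiteral(T)(T value)
{
    try
    {
        return format("%D", value);
    }
    catch (FormatException)
    {
        // Placeholder for the non-custom `%D` behavior.
        return format("%s", value);
    }
}

unittest
{
    // Option 1: the custom formatting is used and fails.
    assertThrown!FormatException(format("%D", tuple(1, 2)));

    // Option 2: the wrapper recovers from the exception.
    assert(tryLiteral(tuple(1, 2)) == "Tuple!(int, int)(1, 2)");
}
```

Note that the wrapper cannot tell a type that rejects `%D` from one that accepts it with a different meaning, which is the "could fail in other ways" caveat.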

 In my opinion, the main idea behind these formatting routines 
 is to have a simple and short way of formatting output. We 
 could use your idea for every other format character too, 
 like: `format("%s = %s", character('𝜋'), 
 scientificFloatingPoint(3.14))`. We don't do that, because 
 it's more convenient to write `format("%c = %e", '𝜋', 3.14)`.

 Yes, you could. But you could use format specifiers like 
 `%-3.8f` *without losses* to get to the same result.

 ??? Again I'm stuck. What does `%-3.8f` have to do with what I 
 wrote above?

Er, you started with scientific notation stuff. My point is that 
introducing _new constructs_ in the format specification, such as 
width and precision, would not be an issue if they weren't there 
already, but introducing a format specification _character_ with 
special meaning is.

 And that's *the* difference between introducing a format 
 specifier character that should have generic meaning and 
 introducing, well, anything else. There was no problem 
 introducing separators like `%,3d` and neither would there be 
 a problem introducing `%y` for `int` or `double` (whatever it 
 does), or, for a concrete example, `%S` for `bool` to return 
 `TRUE` instead of `true`.

 The problem is introducing *generic* format specifier 
 *characters*.

 What is the difference between "generic" (which as far as I 
 understand you oppose) and adding `%D` for bool, integers, 
 floats, characters, strings, arrays and AAs (which you seem to 
 be OK with, and which is what I plan to do)?

Because `%D` for `bool`, integers (note that according to Walter, 
`bool` is an integer type), `floats`, arrays, and AAs is nothing 
different from `%s`. The only part where you'd need something 
different than `%s` is characters, strings. That would be handy 
to have, I must admit. [You can mimic it using arrays 
tho](https://run.dlang.io/is/vPOnNx):
```D
import std.format : format;

void main()
{
    auto str = format("prefix %s %(%s%) %s postfix", "before", ["a\nbc"], "after");
    assert(str == `prefix before "a\nbc" after postfix`); // quoted and escaped
}
```

And it's almost perfect! It works for character types, numeric 
types, arrays, and AAs, too. Only for user-defined types, you 
have no control, because it does what the user-defined `toString` 
implementation defines `%s` to do. In fact, `%s` might not even 
work with a user-defined type! It could throw an exception (a 
`FormatException` if it's reasonable).

The only thing it doesn't do is respect `wstring` and 
`dstring` literals. I cannot really estimate whether that would 
be a problem, but I guess for the most part, it wouldn't.

 What we could do is special casing `%$` to mean what you want. 
 Currently, no matter what type you're formatting, `%$` is an 
 error in `FormatSpec`. You can give it semantics, no problem, 
 including one that ignores custom formatting. Even better, 
 `%$` looks like it's a special case and not some odd-but-legal 
 custom specifier.

 Using `$` would cause real troubles, because it's already used 
 for positional arguments. What would `format("%1$d", 'a');` be 
 supposed to produce? `'a'd` or `97`?

The `$` only has that meaning if it's preceded by a number. 
`%`*N*`$`*…c* has a meaning for *N* a number and *c* a character 
possibly preceded by other formatting stuff. But `%$` is 
undefined in the sense that it is an error to use it.
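
For reference, a short sketch of how the existing positional syntax behaves: the number before `$` selects the argument, and a format character still follows.

```d
import std.format : format;

unittest
{
    // `N$` picks the N-th argument; `s` is the format character.
    assert(format("%2$s %1$s", "a", "b") == "b a");

    // A character formatted with `%d` yields its numeric value.
    assert(format("%1$d", 'a') == "97");
}
```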

 Changing the meaning of `%D` begs for trouble.

 `%D` has currently no meaning, so we cannot change it; we can 
 just add it.

`%D` *potentially* has a meaning for existing (or future) 
user-defined types. On the other hand, `%$` has not, because it's 
not up to a user-defined type to define its meaning but to 
`format` (`FormatSpec` to be precise) because currently, 
`FormatSpec` does not support `%$` to begin with.

 I hope, we can figure this out somehow - I sense, that you've 
 got an important point, but I don't understand it. Seems like 
 we are talking past each other.

I guess you thought primarily about the built-in types while I 
primarily thought about user-defined types. I'm happy to clarify.



Now, let's talk about the implementation. It's far easier to talk 
about that in terms of a function. Let's call it `unMixin` 
because the goal is that `mixin(unMixin(obj))` results in `obj` 
or a copy of `obj`. On the other hand, we cannot expect 
`unMixin(mixin(str))` to return `str` because `str` could contain 
unnecessary information and even if it doesn't, it can contain 
context-dependent information that `unMixin` cannot generally 
retrieve.

Simplest example: If `unMixin(1)` returns `"1"`, we're good for 
`1`. If it returns `"cast(int) 1"`, we're also good.

May 05 2021

Q. Schroll <qs.il.paperinik gmail.com> writes:

On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:


 Now, let's talk about the implementation. It's far easier to 
 talk about that in terms of a function. Let's call it `unMixin` 
 because the goal is that `mixin(unMixin(obj))` results in `obj` 
 or a copy of `obj`. On the other hand, we cannot expect 
 `unMixin(mixin(str))` to return `str` because `str` could 
 contain unnecessary information and even if it doesn't, it can 
 contain context-dependent information that `unMixin` cannot 
 generally retrieve.

 Simplest example: If `unMixin(1)` returns `"1"`, we're good for 
 `1`. If it returns `"cast(int) 1"`, we're also good.

I've done some experiments and the results are mixed.

The easiest by far is `typeof(null)`. For [scalar 
types](https://dlang.org/library/std/traits/is_scalar_type.html) 
and strings, the aforementioned `%(%s%)` can be used.

Pointers and slices aren't too hard either.

For structs without a constructor, `unMixin` is actually easy; if 
it has a constructor, the object cannot be described by a 
constructor call since who knows what the constructor does and 
maybe there isn't even a simple constructor call that will result 
in the given object. It can be done, but it's ugly and hacky.
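
To show why the constructor-less case is the easy one, here is a minimal sketch. It is not the code from the gist linked below: this `unMixin` is a hypothetical reimplementation that only covers `int` and structs without constructors, and `Point` and `Line` are made-up example types.

```d
import std.format : format;

// Hypothetical sketch: build a struct literal from the fields.
string unMixin(T)(T value)
{
    static if (is(T == struct))
    {
        string result = T.stringof ~ "(";
        foreach (i, field; value.tupleof)
        {
            if (i > 0)
                result ~= ", ";
            result ~= unMixin(field);
        }
        return result ~ ")";
    }
    else static if (is(T == int))
        return format("%s", value);
    else
        static assert(false, "not covered by this sketch: " ~ T.stringof);
}

struct Point { int x, y; }
struct Line { Point from, to; }

enum line = Line(Point(1, 2), Point(3, 4));
static assert(unMixin(line) == "Line(Point(1, 2), Point(3, 4))");
static assert(mixin(unMixin(line)) == line);
```

As soon as a struct declares a constructor, `T(field1, field2, ...)` is no longer guaranteed to reproduce the object, which is the difficulty described above.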

Because unions can have sub-structs and stuff, I gave up on them.

I have not too much experience with D's classes, but from my 
estimation, it cannot be done. It looks like you need `typeid` at 
compile-time (at CTFE to be precise) which isn't available.

My take on it so far: 
https://run.dlang.io/gist/c98ef765cb8921595d5e41fc11c89ca7?args=-unittest%20-main

May 05 2021

Berni44 <someone somemail.com> writes:

On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:
 I guess you thought primarily about the built-in types while I 
 primarily thought about user-defined types. I'm happy to 
 clarify.

Yes, thank you. That already helped a lot, although I fear we 
still don't agree on most of the points with regard to the 
content... :-s

 The new `format` implementation could do three things when 
 encountering `%D` for formatting an object of a type with 
 custom formatting:

For me, this seems to be the wrong way to think about it. 
`format` doesn't encounter specifiers, but objects (in the wider 
sense). And in case of structs, classes and so on it delegates 
the handling of formatting to them, without even looking at the 
specifier (with the exception of `%s` which sometimes plays a 
special role). It's then up to that struct or class to define the 
meaning of `%D` for that specific struct or class.

 note that according to Walter, `bool` is an integer type

Yeah, but `std.format` handles them in a special 
`formatValueImpl`, that's why I treat them separately.

 Because `%D` for `bool`, integers ([...]), `floats`, arrays, 
 and AAs is nothing different from `%s`.

That's not true: bytes need a cast, longs need a trailing 'L' (as 
do reals), floating point numbers are truncated with `%s` and 
don't provide the correct value, and so on. There are a lot of 
subtle differences, and that's why I think it would be a good 
thing to have this new format character.
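
A small sketch of two of these differences, using only `std.format.format` and `std.conv.to` (the variable name `third` is just for this example):

```d
import std.conv : to;
import std.format : format;

unittest
{
    // `%s` drops the type: a long and a byte print like an int literal.
    assert(format("%s", 1L) == "1");
    assert(format("%s", cast(byte) 1) == "1");
    static assert(is(typeof(mixin("1")) == int));
    static assert(is(typeof(mixin("1L")) == long));

    // `%s` prints doubles with six significant digits, so the text
    // no longer identifies the original value.
    immutable double third = 1.0 / 3.0;
    assert(format("%s", third) == "0.333333");
    assert(format("%s", third).to!double != third);
}
```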

 The only part where you'd need something different than `%s` is 
 characters, strings. That would be handy to have, I must admit. 
 You can mimic it using arrays tho

That was actually the starting point for me that led me to a 
desire for having `%D`: `%s` for arrays tries to mimic the 
intended result of `%D` (but fails at several places to do so 
correctly) and therefore treats characters and strings special. 
This led to the abuse of the `-` flag (in `%-(...%)`), which now 
causes a lot of problems. I thought long about how this could be 
fixed: with `%D` available, a smoother transition would be 
possible, because people using `%s` inside of `%(...%)` could 
just replace it with `%D` to get the current result, and that 
would eventually make it possible to give `%s` (and the `-` flag) 
its correct meaning back. (Of course this still needs deprecation 
cycles and maybe a preview switch or something similar; it's still 
not easy.)

 And it's almost perfect! It works for character types, numeric 
 types, arrays, and AAs, too.

As I wrote above: That might look so at first sight, but it isn't 
the case.

 The `$` only has that meaning if it's preceded by a number. 
 `%`*N*`$`*…c* has a meaning for *N* a number and *c* a 
 character possibly preceded by other formatting stuff. But `%$` 
 is undefined in the sense that it is an error to use it.

But people will start to use it with width and other parameters 
and will report issues. Let alone that it will complicate the 
format spec parser significantly and thus might even introduce 
more bugs. I'm sorry, but with `%$` you'll be opening Pandora's 
box.

 Now, let's talk about the implementation.

Sorry, but as long as we do not even agree on the goal, this is 
not useful.

May 06 2021

Q. Schroll <qs.il.paperinik gmail.com> writes:

On Thursday, 6 May 2021 at 08:49:16 UTC, Berni44 wrote:
 On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:
 The new `format` implementation could do three things when 
 encountering `%D` for formatting an object of a type with 
 custom formatting:

 For me, this seems to be the wrong way to think about it. 
 `format` doesn't encounter specifiers, but objects (in the 
 wider sense). And in case of structs, classes and so on it 
 delegates the handling of formatting to them, without even 
 looking at the specifier (with the exception of `%s` which 
 sometimes plays a special role).

The role of `%s` is special, but not too special either. It just 
gives a best effort result where other formats would just fail. 
The task to return a string representation that can be 
interpreted back is nothing to be delegated to a user-defined 
routine.

 It's then up to that struct or class to define the meaning of 
 `%D` for that specific struct or class.

This makes `%D` unreliable for meta-programming. And this is 
_the_ problem I have with this, because creating a 
compiler-readable string from an object is a meta-programming 
tool. I have no idea _what else_ you'd even do with it.

Here's the showstopper: Adding a `toString` that accepts format 
specifiers becomes a potentially breaking change as it will 
change the meaning of `%D` silently.

 Because `%D` for `bool`, integers ([...]), `floats`, arrays, 
 and AAs is nothing different from `%s`.

 That's not true: bytes need a cast, longs a trailing 'L',

It depends on what you want to do with it. If you want the 
immediate type of the literal to be what you plugged in, then yes. 
If being equal suffices, `"1"` and `"true"` are the same.

 like reals, floating point numbers are truncated with `%s` and 
 don't provide the correct value

_That,_ on the other hand, _is_ a problem. I don't know how big 
that problem practically is because `real` cannot even be 
formatted at CTFE and `double` and `float` aren't that common of 
things at compile-time. I guess the only sane result for floating 
point values is `%a` with sufficient digits anyways and that is 
largely apart from `%s` even if you add a gigantic precision.
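
As a sketch of the `%a` idea (the helper name `hexLiteral` is invented for this example, and NaN and infinity are not handled): a hexadecimal representation carries the mantissa bits exactly, so parsing it back should give the same `double`.

```d
import std.conv : to;
import std.format : format;

// Hypothetical helper: renders a finite double in hexadecimal
// floating point notation, which D also accepts as a literal.
string hexLiteral(double x)
{
    return format("%a", x);
}

unittest
{
    immutable double third = 1.0 / 3.0;
    // Unlike `%s`, the hexadecimal form survives a round trip.
    assert(hexLiteral(third).to!double == third);
}
```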

It's a breaking change fixing `%s` for floating point values in 
the sense that the representation consists of enough decimals to 
accurately represent the number.

 and so on. There are a lot of subtle differences

The problem of strings and chars is obvious, the case for exact 
types is, too. Floating point types didn't cross my mind, but 
please elaborate, what else is it? I'm honestly interested.

If `%(%s%)` does not give you proper char or string, I'd consider 
it a bug.

 and that's why I think it would be a good thing to have this 
 new format character.

I agree with you that a new format is necessary to achieve this 
if done with a format character to begin with. I do question 
whether format characters are the right approach. To me, this 
looks more like a code generation tool than value formatting.

 The only part where you'd need something different than `%s` 
 is characters, strings. That would be handy to have, I must 
 admit. You can mimic it using arrays tho

 That was actually the starting point for me that led me to a 
 desire for having `%D`: `%s` for arrays tries to mimic the 
 intended result of `%D` (but fails at several places to do so 
 correctly) and therefore treats characters and strings special. 
 This led to the abuse of the `-` flag (in `%-(...%)`), which 
 now causes a lot of problems. I thought long about how this 
 could be fixed: with `%D` available, a smoother transition 
 would be possible, because people using `%s` inside of 
 `%(...%)` could just replace it with `%D` to get the current 
 result, and that would eventually make it possible to give `%s` 
 (and the `-` flag) its correct meaning back. (Of course this 
 still needs deprecation cycles and maybe a preview switch or 
 something similar; it's still not easy.)

The `%-(...%)` is a hack, but it can be questioned whether 
removing it is even worth the trouble. It just breaks things. The 
minus has otherwise no meaning for arrays. It's just weird.

 And it's almost perfect! It works for character types, numeric 
 types, arrays, and AAs, too.

 As I wrote above: That might look so at first sight, but it 
 isn't the case.

Right. I was a little enthusiastic about it.

 The `$` only has that meaning if it's preceded by a number. 
 `%`*N*`$`*…c* has a meaning for *N* a number and *c* a 
 character possibly preceded by other formatting stuff. But 
 `%$` is undefined in the sense that it is an error to use it.

 But people will start to use it with width and other parameters 
 and will report issues. Let alone that it will complicate the 
 format spec parser significantly and thus might even introduce 
 more bugs. I'm sorry, but with `%$` you'll be opening Pandora's 
 box.

It requires a single check: Is the `%` character followed by `$`? 
The whole point of `%$` would be that it is not customizable. You 
cannot add any specification. If something comes before `$`, it 
isn't `%$`, and if something comes behind, it's not part of the 
format specifier, but just text.
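
To make the "single check" concrete, here is a sketch of a scanner for the proposed marker. Note that `%$` does not exist in `std.format`, and the function name `findLiteralMarkers` is made up for this example.

```d
// Hypothetical: returns the indices at which a non-customizable "%$"
// marker starts. "%%" is skipped as an escaped percent sign, and
// positional specifiers such as "%1$s" are not matched because
// something stands between '%' and '$'.
size_t[] findLiteralMarkers(string fmt)
{
    size_t[] hits;
    for (size_t i = 0; i + 1 < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (fmt[i + 1] == '%')
        {
            ++i; // skip the second '%' of "%%"
            continue;
        }
        if (fmt[i + 1] == '$')
        {
            hits ~= i;
            ++i; // whatever follows the '$' is plain text
        }
    }
    return hits;
}

unittest
{
    assert(findLiteralMarkers("a %$ b %% %1$s %$x") == [2, 15]);
}
```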

---

I've been thinking about this a little. What is your goal? Maybe 
we're talking at cross purposes. I guess you want a format 
specifier that formats any _built-in_ type in a way that 
represents the object precisely. In a sense, you want a good `%s` 
and not a not-really-the-best-effort `%s`. My understanding was 
you want to represent objects as strings in a way that can be 
used by the compiler to reconstruct the object, and for what else 
than meta-programming would one do that? It's in a sense trivial 
for built-in types because it's a finite set of types.

Thinking about it, you can easily wrap objects in a struct and 
make it do The Right Thing™. It doesn't complicate the `format` 
implementation.

May 07 2021

Berni44 <someone somemail.com> writes:

On Friday, 7 May 2021 at 23:51:13 UTC, Q. Schroll wrote:
 looking at the specifier (with the exception of `%s` which 
 sometimes plays a special role).

 The role of `%s` is special, but not too special either. It 
 just gives a best effort result where other formats would just 
 fail.

That's not what I meant. What I meant was that some custom 
`toString` versions are only called when `%s` is used (all those 
that do not take the format string or a `FormatSpec` as a 
parameter).
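
A sketch of that difference (the struct names `Plain` and `SpecAware` are invented for this example; the writer-based `toString` signature follows the one shown in the `std.format` documentation):

```d
import std.exception : assertThrown;
import std.format : format, FormatException, FormatSpec;
import std.range.primitives : isOutputRange, put;

// Plain toString: only usable with `%s`.
struct Plain
{
    string toString() const { return "Plain"; }
}

// Spec-aware toString: receives whatever specifier was used.
struct SpecAware
{
    void toString(W)(ref W writer, scope const ref FormatSpec!char fmt) const
    if (isOutputRange!(W, char))
    {
        put(writer, "spec=");
        put(writer, fmt.spec);
    }
}

unittest
{
    assert(format("%s", Plain()) == "Plain");
    // Any other specifier is rejected for a plain toString.
    assertThrown!FormatException(format("%d", Plain()));
    // The spec-aware version decides for itself what a specifier means.
    assert(format("%s", SpecAware()) == "spec=s");
    assert(format("%d", SpecAware()) == "spec=d");
}
```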

 The task to return a string representation that can be 
 interpreted back is nothing to be delegated to a user-defined 
 routine.

On the contrary, it's the only place where this can be done. The 
routines in `std.format` cannot know what these objects need in 
order to be constructed. Maybe it is not even possible at all.

 _That,_ on the other hand, _is_ a problem. I don't know how big 
 that problem practically is because `real` cannot even be 
 formatted at CTFE

It can. I wrote that code. It has been part of master for about 
two weeks.

 I guess the only sane result for floating point values is `%a` 
 with sufficient digits anyways

That's one reason why I want to add `%D`: for floating point 
values I'd like to implement RYU (or something similar), which 
guarantees to emit a value that produces exactly the same result 
when read back in.

 It's a breaking change fixing `%s` for floating point values in 
 the sense that the representation consists of enough decimals 
 to accurately represent the number.

And that's the reason why I want to add `%D` rather than change 
`%s`.

 If `%(%s%)` does not give you proper char or string, I'd 
 consider it a bug.

Please define "proper char or string" first.

 I agree with you that a new format is necessary to achieve this 
 if done with a format character to begin with. I do question 
 whether format characters are the right approach. To me, this 
 looks more like a code generation tool than value formatting.

Sounds like you are thinking about a serialization tool or 
something like that. That's not what I plan. I am only thinking 
about formatting values.

 The `%-(...%)` is a hack, but it can be questioned whether 
 removing it is even worth the trouble.

If I were the only one to decide, I would remove it immediately, 
because in my eyes it is a bug. And yes, it's worth the trouble.

 It just breaks things.

That's why I haven't filed a PR yet. But I'm looking forward to a 
possibility to change this. And again here `%D` will help.

 The minus has otherwise no meaning for arrays.

It has: left justification instead of right justification. It is 
just not implemented, or implemented with bugs.
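
A short sketch of the two meanings the `-` flag currently has, using only `std.format.format`:

```d
import std.format : format;

unittest
{
    // For scalars, '-' selects left justification within the width.
    assert(format("%5d|", 42) == "   42|");
    assert(format("%-5d|", 42) == "42   |");

    // Inside a compound specifier, '-' instead turns off the quoting
    // of strings and characters.
    assert(format("%(%s, %)", ["a", "b"]) == `"a", "b"`);
    assert(format("%-(%s, %)", ["a", "b"]) == "a, b");
}
```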

 It requires a single check: Is the `%` character followed by 
 `$`? The whole point of `%$` would be that it is not 
 customizable.

I want to have it customizable. For example, I'd like to have 
output that aligns vertically.

 I've been thinking about this a little. What is your goal?
 Maybe we're talking at cross purposes. I guess you want a 
 format specifier that formats any _built-in_ type in a way that 
 represents the object precisely. In a sense, you want a good 
 `%s` and not a not-really-the-best-effort `%s`.

Yes, you could put it that way. I don't know the whole history of 
`%s`, but I think its first meaning was "string", and this was 
later misunderstood as producing something similar to a literal. 
This makes `%s` a mix. `%D` would take one of these meanings away 
from `%s`, giving it back its original meaning.

 My understanding was you want to represent objects as strings 
 in a way that can be used by the compiler to reconstruct the 
 object,

Well, that's what a source code literal is supposed to be, isn't 
it? But of course it is not limited to this use. People might use 
it to automatically generate asserts for unittests. Or to compare 
the output of different runs, where you can be sure that the 
differences are not due to rounding effects or such things, but 
are real differences. Or whatever they want to do with it.

 Thinking about it, you can easily wrap objects in a struct and 
 make it do The Right Thing™. It doesn't complicate the `format` 
 implementation.

Of course there are always workarounds. With that argument you 
can question every function in Phobos...

May 08 2021
