digitalmars.D - Wanted: Format character for source code literal

Berni44 <someone somemail.com> writes:
I plan to add an extension to `std.format`, namely a new format 
character with the meaning of producing a source code literal. Or 
more formally, the following snippet should work for every type 
this extension will support:

```
enum a = <something>;
enum b = mixin(format!"%S"(a));

static assert(a == b && is(typeof(a) == typeof(b)));

```
(Please note that even for floats `a == b` should hold for all 
values except NaNs; I plan to use Ryu for this.)
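
As a minimal sketch of the round trip described above, using only the existing `%s` specifier (the proposed character does not exist yet): for `int` and `bool` the text that `%s` produces already happens to be a literal of the same type, so the property holds for these two types.

```d
import std.format : format;

// `%s` on an int yields "42", which the compiler reads back as an int.
enum a = 42;
enum b = mixin(format!"%s"(a));
static assert(a == b && is(typeof(a) == typeof(b)));

// `%s` on a bool yields "true" or "false", which are bool literals.
enum c = true;
enum d = mixin(format!"%s"(c));
static assert(c == d && is(typeof(c) == typeof(d)));
```

Other types do not survive this with `%s`, which is the gap the new character is meant to close.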

The big question is now, which character to use. I thought of 
`%S` like source code literal. Andrei suggested `%D` like D 
literal. Both ideas have the disadvantage of using uppercase 
letters, which would break the convention that an uppercase 
specifier means the output uses uppercase instead of lowercase 
(i.e. 1E10 instead of 1e10).

A first idea of a lowercase literal might be `%l` but this might 
easily be confused with `%I` and `%1` (both don't exist); and 
also `l` is used in C's `printf` for `long` which we luckily 
don't need here. Anyway I fear confusion.

What do you think? Which letter would be best?
Apr 30
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. 
 Or more formally, the following snippet should work for every 
 type this extension will support:

 [...]
Actually S is already used. The only specifier that is "free" would be D. But, what's so bad with using L? 🤔
Apr 30
Berni44 <someone somemail.com> writes:
On Friday, 30 April 2021 at 07:20:11 UTC, Imperatorn wrote:
 Actually S is already used.
Huh, where? Not in `std.format`...
Apr 30
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 30 April 2021 at 08:02:44 UTC, Berni44 wrote:
 On Friday, 30 April 2021 at 07:20:11 UTC, Imperatorn wrote:
 Actually S is already used.
 Huh, where? Not in `std.format`...
No, I meant for [w]printf: https://docs.microsoft.com/en-us/cpp/c-runtime-library/format-specification-syntax-printf-and-wprintf-functions
Apr 30
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. 
 Or more formally, the following snippet should work for every 
 type this extension will support:

 [...]
%m in reference to mixin ?
Apr 30
Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/30/21 4:10 AM, Patrick Schluter wrote:
 On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. Or more 
 formally, the following snippet should work for every type this 
 extension will support:

 [...]
 %m in reference to mixin ?
I like this. I also could be OK with capital D. I know it normally stands for "make uppercase", but decimal numbers have no letters in them.

-Steve
Apr 30
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 30, 2021 at 07:10:38AM +0000, Berni44 via Digitalmars-d wrote:
 I plan to add an extension to `std.format`, namely a new format
 character with the meaning of producing a source code literal. Or more
 formally, the following snippet should work for every type this
 extension will support:
 
 ```
 enum a = <something>;
 enum b = mixin(format!"%S"(a));
 
 static assert(a == b && is(typeof(a) == typeof(b)));
 ```
[...]

What's the scope of this feature? Can <something> be, say, a code literal like a lambda? Can std.format even support printing the function body of a lambda in a way that can be parsed by mixin()? How far do we intend to go with this? Or does this only apply to POD values?

I can imagine there'd be problems if you have, say, a class from a different module with private members, possibly with nested private classes, so you couldn't actually reconstruct the class instance from a string alone.

Unless the scope is significantly constrained, I see the potential for this feature devolving into a big time-sink that really only caters to a niche use-case. I'd be happy to be proven wrong, though.

T

-- 
"You know, maybe we don't *need* enemies." "Yeah, best friends are about all I can take." -- Calvin & Hobbes
Apr 30
Berni44 <someone somemail.com> writes:
On Friday, 30 April 2021 at 21:11:55 UTC, H. S. Teoh wrote:
 What's the scope of this feature? Can \<something\> be, say, a 
 code literal like a lambda? Can std.format even support 
 printing the function body of a lambda in a way that can be 
 parsed by mixin()?  How far do we intend to go with this?  Or 
 does this only apply to POD values?

 I can imagine there'd be problems if you have, say, a class 
 from a different module with private members, possibly with 
 nested private classes, so you couldn't actually reconstruct 
 the class instance from a string alone.

 Unless the scope is significantly constrained, I see the 
 potential for this feature devolving into a big time-sink that 
 really only caters to a niche use-case. I'd be happy to be 
 proven wrong, though.
Of course, there are limits to this. I intend to implement it for bools, integers, floats, characters, strings and enums. I think it will also be possible to implement it for (associative) arrays, as long as the elements can be handled this way.

Structs, classes, interfaces and unions should implement it in their `toString` functions. I wouldn't go so far as to provide generic code for them; with that we would indeed run into problems. (Maybe with the exception of structs with a default constructor.)

I haven't made up my mind about pointers yet. And functions, delegates and lambdas? Functions aren't supported by `std.format` at all, and the implementation of delegates is broken and useless and should be removed (in my opinion).
May 01
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. 
 Or more formally, the following snippet should work for every 
 type this extension will support:

 [...]
If this is unique to mixins, then %m; if not, L(iteral) or D(literal).
May 01
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal.
Would `%q` work? In reference to [token strings](https://dlang.org/spec/lex.html#token_strings):

```
enum a = <something>;
enum b = mixin(format!"%q"(a));
enum c = mixin(q{<something>});

static assert(a == b && is(typeof(a) == typeof(b)));
static assert(c == b && is(typeof(c) == typeof(b)));
```

-- Bastiaan.
May 01
Q. Schroll <qs.il.paperinik gmail.com> writes:
On Friday, 30 April 2021 at 07:10:38 UTC, Berni44 wrote:
 I plan to add an extension to `std.format`, namely a new format 
 character with the meaning of producing a source code literal. 
 Or more formally, the following snippet should work for every 
 type this extension will support:

 ```D
 enum a = <something>;
 enum b = mixin(format!"%S"(a));

 static assert(a == b && is(typeof(a) == typeof(b)));
 ```

 (Please note that even for floats `a == b` should hold for all 
 values except NaNs; I plan to use Ryu for this.)

 The big question is now, which character to use. I thought of 
 `%S` like source code literal. Andrei suggested `%D` like D 
 literal. Both ideas have the disadvantage of using uppercase 
 letters, which would break the convention that an uppercase 
 specifier means the output uses uppercase instead of lowercase 
 (i.e. 1E10 instead of 1e10).

 A first idea of a lowercase literal might be `%l` but this 
 might easily be confused with `%I` and `%1` (both don't exist); 
 and also `l` is used in C's `printf` for `long` which we 
 luckily don't need here. Anyway I fear confusion.

 What do you think? Which letter would be best?
Please don't do this. Format characters can be customized. Any character you'd introduce for it either wouldn't work for some types or break those types' formatting.

Why not introduce a new function like `dlangLiteral` that takes the value and returns a string? It can be used in `format` quite easily (like `format("pre %s post", dlangLiteral(<something>));`) and is explicit and not a special case at all.
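
To make the function-based alternative concrete, here is a rough sketch. Only the name `dlangLiteral` comes from the post; the body is an assumption for illustration and covers just three types.

```d
import std.format : format;

// Hypothetical helper: turns a value into D source text that reads back
// with the same type. Only bool, int and long are sketched here.
string dlangLiteral(T)(T value)
{
    static if (is(T == bool))
        return value ? "true" : "false";
    else static if (is(T == int))
        return format("%d", value);
    else static if (is(T == long))
        return format("%dL", value);   // the suffix keeps the type long
    else
        static assert(false, "no literal sketch for " ~ T.stringof);
}

enum long a = 7;
enum b = mixin(dlangLiteral(a));
static assert(a == b && is(typeof(a) == typeof(b)));

// Used as an ordinary `%s` argument, as suggested in the post:
enum text = format("pre %s post", dlangLiteral(a));
static assert(text == "pre 7L post");
```

Because the helper is an ordinary function, no format character has to be reserved, at the price of a longer call at each use.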
May 03
Berni44 <someone somemail.com> writes:
On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:
 Please don't do this. Format characters can be customized. Any 
 character you'd introduce for it either wouldn't work for some 
 types or break those types' formatting.
I think, this doesn't hurt: The call of a `toString` has precedence for compound types like structs and classes (and in most of these cases it won't be possible to add a generic literal at all, see my post above). So, if you use some of the predefined qualifiers, the customized version will always be used, even if it has a completely different meaning. (Admittedly it might cause some confusion, if the customized versions are not well documented.)
 Why not introduce a new function like `dlangLiteral` that takes 
 the value and returns a string? It can be used in `format` 
 quite easily (like `format("pre %s post", 
 dlangLiteral(<something>));`) and is explicit and not a special 
 case at all.
In my opinion, the main idea behind these formatting routines is to have a simple and short way of formatting output. We could use your idea for every other format character too, like: `format("%s = %s", character('𝜋'), scientificFloatingPoint(3.14))`. We don't do that, because it's more convenient to write `format("%c = %e", '𝜋', 3.14)`.

Furthermore, there are all the parameters that can be applied: `format("%-3c = %+.4e", '𝜋', 3.14)` is a simple way to change the formatting. Without it, it would become something like `format("%s = %s", character!(true, false, false, false, false, false, 3)('𝜋'), scientificFloatingPoint!(false, true, false, false, false, false, FormatSpec.UNSPECIFIED, 4)(3.14))`.

Another problem arises with arrays, ranges and the like: e.g. you can do something like `format("val = [%(%D,\n %)];", my_array);` to get an output with each value on a separate line. Without this format character you would at least need to map `my_array` using `dlangLiteral`, and in generic code this might cause even more trouble.
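
The array case can be tried today with an existing specifier. In this small sketch `%d` stands in for the proposed `%D`, and the array contents are made up for illustration.

```d
import std.format : format;

// `%(` ... `%)` applies the inner specifier to every element; the text that
// follows the inner specifier (here ",\n       ") is used as the separator.
enum values = [1, 2, 3];
enum text = format("val = [%(%d,\n       %)];", values);
static assert(text == "val = [1,\n       2,\n       3];");
```

With a helper function instead of a format character, the array would first have to be mapped element by element before it could be passed to `format`.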
May 04
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 4 May 2021 at 07:54:19 UTC, Berni44 wrote:
 On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:
 [...]
I think, this doesn't hurt: The call of a `toString` has precedence for compound types like structs and classes (and in most of these cases it won't be possible to add a generic literal at all, see my post above). So, if you use some of the predefined qualifiers, the customized version will always be used, even if it has a completely different meaning. (Admittedly it might cause some confusion, if the customized versions are not well documented.) [...]
Just go with %m if you want lowercase, otherwise you would have to use uppercase specifiers.
May 04
Q. Schroll <qs.il.paperinik gmail.com> writes:
On Tuesday, 4 May 2021 at 07:54:19 UTC, Berni44 wrote:
 On Monday, 3 May 2021 at 23:08:55 UTC, Q. Schroll wrote:
 Please don't do this. Format characters can be customized. Any 
 character you'd introduce for it either wouldn't work for some 
 types or break those types' formatting.
I think, this doesn't hurt: The call of a `toString` has precedence for compound types like structs and classes (and in most of these cases it won't be possible to add a generic literal at all, see my post above). So, if you use some of the predefined qualifiers, the customized version will always be used, even if it has a completely different meaning. (Admittedly it might cause some confusion, if the customized versions are not well documented.)
What you wrote in parentheses is _exactly_ the problem I have with this. Generic code cannot use it because user defined types regularly hook the format. If you use `%D` format, but the type does not support it (say std.typecons.Tuple), it will throw a FormatException. So you're stuck between a rock and a hard place: Give `%D` preference over custom format specifiers rendering those that use `%D` invalid or let `%D` do its custom stuff if *potentially* supported rendering `%D` useless in generic code where most of its use-cases would lie.
 Why not introduce a new function like `dlangLiteral` that 
 takes the value and returns a string? It can be used in 
 `format` quite easily (like `format("pre %s post", 
 dlangLiteral(<something>));`) and is explicit and not a 
 special case at all.
 In my opinion, the main idea behind these formatting routines is to have a simple and short way of formatting output. We could use your idea for every other format character too, like: `format("%s = %s", character('𝜋'), scientificFloatingPoint(3.14))`. We don't do that, because it's more convenient to write `format("%c = %e", '𝜋', 3.14)`.
Yes, you could. But you could use format specifiers like `%-3.8f` *without losses* to get to the same result. And that's *the* difference between introducing a format specifier character that should have generic meaning and introducing, well, anything else. There was no problem introducing separators like `%,3d` and neither would there be a problem introducing `%y` for `int` or `double` (whatever it does), or, for a concrete example, `%S` for `bool` to return `TRUE` instead of `true`. The problem is introducing *generic* format specifier *characters*.
 An other problem will be, when used with arrays, ranges and the 
 like, e.g. you can do something like `format("val = [%(%D,\n
  %)];", my_array);` to get an output with each value on a 
 separate line. Without this literal you would at least need to 
 map `my_array` using `dlangLiteral` and in generic code this 
 might even cause more trouble.
If you want that, you need to allow something that's currently illegal. As a comparison, `%.*f` could be introduced if `*` precision weren't already a thing (compare with `%,3d`) because in any reasonable implementation, `*` instead of precision would be an error. What we could do is special casing `%$` to mean what you want. Currently, no matter what type you're formatting, `%$` is an error in `FormatSpec`. You can give it semantics, no problem, including one that ignores custom formatting. Even better, `%$` looks like it's a special case and not some odd-but-legal custom specifier. Changing the meaning of `%D` begs for trouble.
May 04
Berni44 <someone somemail.com> writes:
On Tuesday, 4 May 2021 at 18:02:50 UTC, Q. Schroll wrote:
 So you're stuck between a rock and a hard place: Give `%D` 
 preference over custom format specifiers rendering those that 
 use `%D` invalid or let `%D` do its custom stuff if 
 *potentially* supported rendering `%D` useless in generic code 
 where most of its use-cases would lie.
I fear, I can't follow you. Seems like I don't get your point. Maybe you can give an example?
 In my opinion, the main idea behind these formatting routines 
 is to have a simple and short way of formatting output. We 
 could use your idea for every other format character too, 
 like: `format("%s = %s", character('𝜋'), 
 scientificFloatingPoint(3.14))`. We don't do that, because 
 it's more convenient to write `format("%c = %e", '𝜋', 3.14)`.
 Yes, you could. But you could use format specifiers like `%-3.8f` *without losses* to get to the same result.
??? Again I'm stuck. What does `%-3.8f` have to do with what I wrote above?
 And that's *the* difference between introducing a format 
 specifier character that should have generic meaning and 
 introducing, well, anything else. There was no problem 
 introducing separators like `%,3d` and neither would there be a 
 problem introducing `%y` for `int` or `double` (whatever it 
 does), or, for a concrete example, `%S` for `bool` to return 
 `TRUE` instead of `true`.

 The problem is introducing *generic* format specifier 
 *characters*.
What is the difference between "generic" (which as far as I understand you oppose) and adding `%D` for bool, integers, floats, characters, strings, arrays and AAs (which you sound as being OK with, and which is, what I plan to do)?
 What we could do is special casing `%$` to mean what you want. 
 Currently, no matter what type you're formatting, `%$` is an 
 error in `FormatSpec`. You can give it semantics, no problem, 
 including one that ignores custom formatting. Even better, `%$` 
 looks like it's a special case and not some odd-but-legal 
 custom specifier.
Using `$` would cause real troubles, because it's already used for positional arguments. What would `format("%1$d", 'a');` be supposed to produce? `'a'd` or `97`?
 Changing the meaning of `%D` begs for trouble.
`%D` has currently no meaning, so we cannot change it; we can just add it.

I hope we can figure this out somehow. I sense that you've got an important point, but I don't understand it. Seems like we are talking past each other.
May 05
Paul Backus <snarwin gmail.com> writes:
On Wednesday, 5 May 2021 at 08:46:05 UTC, Berni44 wrote:
 What is the difference between "generic" (which as far as I 
 understand you oppose) and adding `%D` for bool, integers, 
 floats, characters, strings, arrays and AAs (which you sound as 
 being OK with, and which is, what I plan to do)?
[...]
 `%D` has currently no meaning, so we cannot change it; we can 
 just add it.
`%D` *does* currently have a meaning, though. It means "custom format specifier."

Here's the scenario that could potentially lead to trouble:

1. Some existing library uses `%D` as a custom format specifier in their `toString` methods, with a meaning other than "format as D source code."

2. `%D` is added to `std.format` with the meaning "format as D source code," and a default implementation for types that do not have custom `toString` methods.

3. A new library is written that takes advantage of (2) and uses `%D` in generic code to format arbitrary values for the purpose of code generation.

4. Someone uses the library from (1) and the library from (3) in the same project, and library (3) ends up producing garbage, because library (1)'s `%D` doesn't work the way library (3) expects it to.

The "correct" place to fix this is in library (1), but doing so would require a breaking change. In practice, this means that libraries like the one in (3) will never be able to completely rely on the new standard for `%D`, and will always have to include some kind of workaround in case they are used with types like the ones in library (1).
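
Step (1) of the scenario can be sketched as follows. The type `Celsius` and the meaning it gives to `%D` are invented for illustration; the `toString` signature is one of the forms `std.format` recognizes for custom formatting.

```d
import std.conv : to;
import std.format : format, FormatSpec;

// Hypothetical type standing in for "library (1)": its `toString` already
// gives `%D` a private meaning unrelated to D source code.
struct Celsius
{
    int value;

    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char spec) const
    {
        sink(value.to!string);
        if (spec.spec == 'D')       // custom meaning: append the unit
            sink(" degrees");
    }
}

// What generic code in "library (3)" would receive when it asks for `%D`:
enum text = format("%D", Celsius(21));
static assert(text == "21 degrees"); // not a D expression for a Celsius
```

Passing `text` to `mixin` in generated code would fail to compile, or worse, compile to something unintended for other custom meanings.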
May 05
Berni44 <someone somemail.com> writes:
On Wednesday, 5 May 2021 at 17:02:42 UTC, Paul Backus wrote:
 Here's the scenario that could potentially lead to trouble:

 1. Some existing library uses `%D` as a custom format specifier 
 in their `toString` methods, with a meaning other than "format 
 as D source code."

 2. `%D` is added to `std.format` with the meaning "format as D 
 source code," and a default implementation for types that do 
 not have custom `toString` methods.

 3. A new library is written that takes advantage of (2) and 
 uses `%D` in generic code to format arbitrary values for the 
 purpose of code generation.

 4. Someone uses the library from (1) and the library from (3) 
 in the same project, and library (3) ends up producing garbage, 
 because library (1)'s `%D` doesn't work the way library (3) 
 expects it to.
First of all: Thanks for clarifying. I think, I understand the problem now.
 The "correct" place to fix this is in library (1), but doing so 
 would require a breaking change. In practice, this means that 
 libraries like the one in (3) will never be able to completely 
 rely on the new standard for `%D`, and will always have to 
 include some kind of workaround in case they are used with 
 types like the ones in library (1).
In my opinion, the error is in (3): The new library assumes that `%D` can be used with every type (and will always have the meaning "D literal"), which in my opinion is wrong. It does not even hold for established characters. Take `%b`, for example: for bools, integers, characters, and enums whose base type is one of the first three, it currently means "format as unsigned binary number". It currently cannot be used for anything else that `std.format` is responsible for. But of course it can be used in any custom type (be it one from Phobos or an external library or whatever). And no one will stop anyone from using it in a completely different way, e.g. as a bitmap of the type or whatever.

So in my opinion, in the above scenario the library in (3) should clearly state in its docs that it can only be used with code that uses `%D` in the sense of "D literal". And the library in (1) should clearly state in its docs what `%D` means, if it has a meaning. With that it should be clear that you cannot use (1) and (3) together in one project, at least not without adding some glue.

Now I think I can go back to this:
 `%D` has currently no meaning, so we cannot change it; we can 
 just add it.
 `%D` *does* currently have a meaning, though. It means "custom format specifier."
But doesn't that apply to *every* format specifier?
May 06
Q. Schroll <qs.il.paperinik gmail.com> writes:
On Wednesday, 5 May 2021 at 08:46:05 UTC, Berni44 wrote:
 On Tuesday, 4 May 2021 at 18:02:50 UTC, Q. Schroll wrote:
 So you're stuck between a rock and a hard place: Give `%D` 
 preference over custom format specifiers rendering those that 
 use `%D` invalid or let `%D` do its custom stuff if 
 *potentially* supported rendering `%D` useless in generic code 
 where most of its use-cases would lie.
I fear, I can't follow you. Seems like I don't get your point. Maybe you can give an example?
I'm speaking of aggregate types (structs, classes, etc.) that implement `toString` that takes a `FormatSpec` parameter alongside the sink to describe the format according to which it should be formatted. An example is `std.typecons.Tuple` which apart from `%s` accepts `%(...%)` and `%(...%|...%)`. If you try to format it with `%D`, it throws a `FormatException`. But like any aggregate type, it could start accepting `%D` tomorrow.

The new `format` implementation could do three things when encountering `%D` for formatting an object of a type with custom formatting:

1. Because it accepts custom formatting, use it, even if it fails (throws `FormatException`).
2. Because it accepts custom formatting, `try` it. If it fails (i.e. throws `FormatException`), fall back to non-custom `%D` behavior. (If it succeeds, use the successful result.)
3. Ignore the custom formatting because `%D` is special.

None of these solutions is great.

1. means `%D` cannot be relied upon in generic code, i.e. where the type of what you're formatting isn't up to you but someone else. _Relied upon_ means in the way you intend `%D` to be used: A compiler-readable representation of the object.
2. It could fail in other ways. (Still the best.)
3. Breaks code, at least theoretically. Also, even if today no one actually uses `%D`, it might be the perfect match for a future aggregate type, but you blocked it.
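
Option 2 could look roughly like the sketch below. The helper name and the `fallback` parameter are hypothetical; `fallback` stands in for whatever generic literal text a real implementation would compute. Since `%D` is not implemented, built-in types currently reject it, which exercises the fallback path.

```d
import std.format : format, FormatException;

// Try the (possibly custom) `%D` first; if formatting is rejected with a
// FormatException, use the generic result instead.
string customThenFallback(T)(T value, string fallback)
{
    try
        return format("%D", value);
    catch (FormatException)
        return fallback;
}

// An int has no `%D` today, so the fallback text is returned.
static assert(customThenFallback(1, "1") == "1");
```

The catch clause cannot tell "this type does not know `%D`" apart from any other formatting error, which is one way this option "could fail in other ways".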
 In my opinion, the main idea behind these formatting routines 
 is to have a simple and short way of formatting output. We 
 could use your idea for every other format character too, 
 like: `format("%s = %s", character('𝜋'), 
 scientificFloatingPoint(3.14))`. We don't do that, because 
 it's more convenient to write `format("%c = %e", '𝜋', 3.14)`.
 Yes, you could. But you could use format specifiers like `%-3.8f` *without losses* to get to the same result.
 ??? Again I'm stuck. What does `%-3.8f` have to do with what I wrote above?
Er, you started with scientific notation stuff. My point is that introducing _new constructs_ in the format specification, such as width and precision, would not be an issue if they weren't there already, but introducing a format specification _character_ with special meaning is.
 And that's *the* difference between introducing a format 
 specifier character that should have generic meaning and 
 introducing, well, anything else. There was no problem 
 introducing separators like `%,3d` and neither would there be 
 a problem introducing `%y` for `int` or `double` (whatever it 
 does), or, for a concrete example, `%S` for `bool` to return 
 `TRUE` instead of `true`.

 The problem is introducing *generic* format specifier 
 *characters*.
What is the difference between "generic" (which as far as I understand you oppose) and adding `%D` for bool, integers, floats, characters, strings, arrays and AAs (which you sound as being OK with, and which is, what I plan to do)?
Because `%D` for `bool`, integers (note that according to Walter, `bool` is an integer type), `floats`, arrays, and AAs is nothing different from `%s`. The only part where you'd need something different than `%s` is characters, strings. That would be handy to have, I must admit. [You can mimic it using arrays tho](https://run.dlang.io/is/vPOnNx):

```D
auto str = format("prefix %s %(%s%) %s postfix", "before", [ "a\nbc" ], "after");
assert(str == `prefix before "a\nbc" after postfix`);
```

And it's almost perfect! It works for character types, numeric types, arrays, and AAs, too. Only for user-defined types, you have no control, because it does what the user-defined `toString` implementation defines `%s` to do. In fact, `%s` might not even work with a user-defined type! It could throw an exception (a `FormatException` if it's reasonable).

The only thing it doesn't do is respecting `wstring` and `dstring` literals. I cannot really estimate if that would be a problem, but I guess for the most part, it wouldn't.
 What we could do is special casing `%$` to mean what you want. 
 Currently, no matter what type you're formatting, `%$` is an 
 error in `FormatSpec`. You can give it semantics, no problem, 
 including one that ignores custom formatting. Even better, 
 `%$` looks like it's a special case and not some odd-but-legal 
 custom specifier.
Using `$` would cause real troubles, because it's already used for positional arguments. What would `format("%1$d", 'a');` be supposed to produce? `'a'd` or `97`?
The `$` only has that meaning if it's preceded by a number. `%`*N*`$`*…c* has a meaning for *N* a number and *c* a character possibly preceded by other formatting stuff. But `%$` is undefined in the sense that it is an error to use it.
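
For reference, a short sketch of the existing positional syntax that `%$` would sit next to. The argument values are arbitrary.

```d
import std.format : format;

// `N$` selects the N-th argument (1-based), so arguments can be reordered
// in the format string without touching the argument list.
enum text = format("%2$s, %1$s!", "world", "hello");
static assert(text == "hello, world!");

// The call from the question above: with the current meaning of `$`,
// a positional `%d` on a character formats its code point.
static assert(format("%1$d", 'a') == "97");
```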
 Changing the meaning of `%D` begs for trouble.
`%D` has currently no meaning, so we cannot change it; we can just add it.
`%D` *potentially* has a meaning for existing (or future) user-defined types. On the other hand, `%$` has not, because it's not up to a user-defined type to define its meaning but to `format` (`FormatSpec` to be precise) because currently, `FormatSpec` does not support `%$` to begin with.
 I hope, we can figure this out somehow - I sense, that you've 
 got an important point, but I don't understand it. Seems like 
 we are talking past each other.
I guess you thought primarily about the built-in types while I primarily thought about user-defined types. I'm happy to clarify.

Now, let's talk about the implementation. It's far easier to talk about that in terms of a function. Let's call it `unMixin` because the goal is that `mixin(unMixin(obj))` results in `obj` or a copy of `obj`. On the other hand, we cannot expect `unMixin(mixin(str))` to return `str` because `str` could contain unnecessary information and even if it doesn't, it can contain context-dependent information that `unMixin` cannot generally retrieve.

Simplest example: If `unMixin(1)` returns `"1"`, we're good for `1`. If it returns `"cast(int) 1"`, we're also good.
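
A minimal sketch of the `cast` variant, restricted to integral types. Only the name `unMixin` comes from the post; the body is an assumption, and extreme values such as `long.min` are not handled.

```d
import std.format : format;
import std.traits : isIntegral;

// Prefix the value with an explicit cast so the type survives the round
// trip even for types that have no literal suffix of their own.
string unMixin(T)(T value) if (isIntegral!T)
{
    return format("cast(%s) %s", T.stringof, value);
}

enum byte a = -5;
enum b = mixin(unMixin(a));      // mixes in "cast(byte) -5"
static assert(a == b && is(typeof(a) == typeof(b)));
```

The generated text is longer than a hand-written literal, which matches the remark that `unMixin(mixin(str))` need not give back `str`.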
May 05
Q. Schroll <qs.il.paperinik gmail.com> writes:
On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:


 Now, let's talk about the implementation. It's far easier to 
 talk about that in terms of a function. Let's call it `unMixin` 
 because the goal is that `mixin(unMixin(obj))` results in `obj` 
 or a copy of `obj`. On the other hand, we cannot expect 
 `unMixin(mixin(str))` to return `str` because `str` could 
 contain unnecessary information and even if it doesn't, it can 
 contain context-dependent information that `unMixin` cannot 
 generally retrieve.

 Simplest example: If `unMixin(1)` returns `"1"`, we're good for 
 `1`. If it returns `"cast(int) 1"`, we're also good.
I've done some experiments and the results are mixed. The easiest by far is `typeof(null)`. For [scalar types](https://dlang.org/library/std/traits/is_scalar_type.html) and strings, the aforementioned `%(%s%)` can be used. Pointers and slices aren't too hard either.

For structs without a constructor, `unMixin` is actually easy; if it has a constructor, the object cannot be described by a constructor call since who knows what the constructor does and maybe there isn't even a simple constructor call that will result in the given object. It can be done, but it's ugly and hacky. Because unions can have sub-structs and stuff, I gave up on them.

I have not too much experience with D's classes, but from my estimation, it cannot be done. It looks like you need `typeid` at compile-time (at CTFE to be precise) which isn't available.

My take on it so far: https://run.dlang.io/gist/c98ef765cb8921595d5e41fc11c89ca7?args=-unittest%20-main
May 05
Berni44 <someone somemail.com> writes:
On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:
 I guess you thought primarily about the built-in types while I 
 primarily thought about user-defined types. I'm happy to 
 clarify.
Yes, thank you. That already helped a lot, although I fear, we still don't agree on most of the points with regard to the content... :-s
 The new `format` implementation could do three things when 
 encountering `%D` for formatting an object of a type with 
 custom formatting:
For me, this seems to be the wrong way to think about it. `format` doesn't encounter specifiers, but objects (in the wider sense). And in case of structs, classes and so on it delegates the handling of formatting to them, without even looking at the specifier (with the exception of `%s` which sometimes plays a special role). It's then up to that struct or class to define the meaning of `%D` for that specific struct or class.
 note that according to Walter, `bool` is an integer type
Yeah, but `std.format` handles them in a special `formatValueImpl`, that's why I treat them separately.
 Because `%D` for `bool`, integers ([...]), `floats`, arrays, 
 and AAs is nothing different from `%s`.
That's not true: bytes need a cast; longs need a trailing `L`, as do reals; floating point numbers are truncated by `%s` and don't reproduce the correct value; and so on. There are a lot of subtle differences and that's why I think it would be a good thing to have this new format character.
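
Two of these differences can be checked with the current `%s`. The following is a small sketch; the helper name `survivesRoundTrip` is made up for illustration.

```d
import std.conv : to;
import std.format : format;

// `%s` drops the type of a long: the text "5" reads back as an int.
enum long big = 5;
enum readBack = mixin(format("%s", big));
static assert(is(typeof(readBack) == int));

// `%s` prints a double with six significant digits, so reading the text
// back does not always give the original value.
bool survivesRoundTrip(double x)
{
    return format("%s", x).to!double == x;
}

unittest
{
    assert(survivesRoundTrip(0.5));        // "0.5" is exact
    assert(!survivesRoundTrip(1.0 / 3));   // "0.333333" is not 1.0 / 3
}
```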
 The only part where you'd need something different than `%s` is 
 characters, strings. That would be handy to have, I must admit. 
 You can mimic it using arrays tho
That was actually the starting point that led me to want `%D`: `%s` for arrays tries to mimic the intended result of `%D` (but fails at several places to do so correctly) and therefore treats characters and strings specially. This led to the abuse of the `-`-flag (in `"%-(...%)`), which now causes a lot of problems. I thought long about how this could be fixed: With `%D` available, a smoother transition would be possible, because people using `%s` inside of `%(...%)` could just replace it with `%D` to get the current result, and that would eventually make it possible to give `%s` (and the `-`-flag) its correct meaning back. (Of course this still needs deprecation cycles and maybe a preview switch or something similar; it's still not easy.)
 And it's almost perfect! It works for character types, numeric 
 types, arrays, and AAs, too.
As I wrote above: That might look so at first sight, but it isn't the case.
 The `$` only has that meaning if it's preceded by a number. 
 `%`*N*`$`*…c* has a meaning for *N* a number and *c* a 
 character possibly preceded by other formatting stuff. But `%$` 
 is undefined in the sense that it is an error to use it.
But people will start to use it with width and other parameters and will report issues. Let alone that it will complicate the format spec parser significantly and thus might even introduce more bugs. I'm sorry, but with `%$` you'll be opening Pandora's box.
 Now, let's talk about the implementation.
Sorry, but as long as we do not even agree on the goal, this is not useful.
May 06
parent reply Q. Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 6 May 2021 at 08:49:16 UTC, Berni44 wrote:
 On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:
 The new `format` implementation could do three things when 
 encountering `%D` for formatting an object of a type with 
 custom formatting:
For me, this seems to be the wrong way to think about it. `format` doesn't encounter specifiers, but objects (in the wider sense). And in case of structs, classes and so on it delegates the handling of formatting to them, without even looking at the specifier (with the exception of `%s` which sometimes plays a special role).
The role of `%s` is special, but not too special either. It just gives a best effort result where other formats would just fail. The task to return a string representation that can be interpreted back is nothing to be delegated to a user-defined routine.
 It's then up to that struct or class to define the meaning of 
 `%D` for that specific struct or class.
This makes `%D` unreliable for meta-programming. And this is _the_ problem I have with this, because creating a compiler-readable string from an object is a meta-programming tool. I have no idea _what else_ you'd even do with it. Here's the showstopper: Adding a `toString` that accepts format specifiers becomes a potentially breaking change as it will change the meaning of `%D` silently.
 Because `%D` for `bool`, integers ([...]), `floats`, arrays, 
 and AAs is nothing different from `%s`.
That's not true: bytes need a cast, longs a trailing 'L',
It depends what you want to do with it. If you want the immediate type of the literal to be what you plugged in, then yes. If being equal suffices, `"1"` and `"true"` are the same.
 like reals, floating point numbers are truncated with `%s` and 
 don't provide the correct value
_That,_ on the other hand, _is_ a problem. I don't know how big that problem practically is because `real` cannot even be formatted at CTFE and `double` and `float` aren't that common of things at compile-time. I guess the only sane result for floating point values is `%a` with sufficient digits anyways and that is largely apart from `%s` even if you add a gigantic precision. It's a breaking change fixing `%s` for floating point values in the sense that the representation consists of enough decimals to accurately represent the number.
 and so on. There are a lot of subtle differences
The problem of strings and chars is obvious, the case for exact types is, too. Floating point types didn't cross my mind, but please elaborate, what else is it? I'm honestly interested. If `%(%s%)` does not give you proper char or string, I'd consider it a bug.
 and that's why I think it would be a good thing to have this 
 new format character.
I agree with you that a new format is necessary to achieve this if done with a format character to begin with. I do question whether format characters are the right approach. To me, this looks more like a code generation tool than value formatting.
 The only part where you'd need something different than `%s` 
 is characters, strings. That would be handy to have, I must 
 admit. You can mimic it using arrays tho
That was actually the starting point that led me to want `%D`: `%s` for arrays tries to mimic the intended result of `%D` (but fails to do so correctly in several places) and therefore treats characters and strings specially. This led to the abuse of the `-` flag (in `"%-(...%)`), which now causes a lot of problems. I thought for a long time about how this could be fixed: with `%D` available, a smoother transition would be possible, because people using `%s` inside of `%(...%)` could just replace it with `%D` to get the current result, and that would eventually make it possible to give `%s` (and the `-` flag) its correct meaning back. (Of course this still needs deprecation cycles and maybe a preview switch or whatever else - it's still not easy.)
The `%-(...%)` is a hack, but it can be questioned whether removing it is even worth the trouble. It just breaks things. The minus has otherwise no meaning for arrays. It's just weird.
 And it's almost perfect! It works for character types, numeric 
 types, arrays, and AAs, too.
As I wrote above: That might look so at first sight, but it isn't the case.
Right. I was a little enthusiastic about it.
 The `$` only has that meaning if it's preceded by a number. 
 `%`*N*`$`*…c* has a meaning for *N* a number and *c* a 
 character possibly preceded by other formatting stuff. But 
 `%$` is undefined in the sense that it is an error to use it.
But people will start to use it with width and other parameters and will report issues. Let alone that it will complicate the format spec parser significantly and thus might even introduce more bugs. I'm sorry, but with `%$` you'll be opening Pandora's box.
It requires a single check: Is the `%` character followed by `$`? The whole point of `%$` would be that it is not customizable. You cannot add any specification. If something comes before `$`, it isn't `%$`, and if something comes behind, it's not part of the format specifier, but just text.
---
I've been thinking about this a little. What is your goal? Maybe we're talking at cross purposes. I guess you want a format specifier that formats any _built-in_ type in a way that represents the object precisely. In a sense, you want a good `%s` and not a not-really-the-best-effort `%s`. My understanding was you want to represent objects as strings in a way that can be used by the compiler to reconstruct the object, and for what else than meta-programming would one do that? It's in a sense trivial for built-in types because it's a finite set of types.
Thinking about it, you can easily wrap objects in a struct and make it do The Right Thing™. It doesn't complicate the `format` implementation.
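A rough sketch of what such a wrapper could look like. `Literal` and `literal` are hypothetical names made up for this illustration, nothing like them exists in Phobos, and only three types are covered. Edge cases such as `long.min` are not handled:

```d
import std.format : format, formattedWrite;

// Hypothetical wrapper: formatting it with %s emits a D source literal.
struct Literal(T)
{
    T value;

    void toString(W)(ref W sink) const
    {
        static if (is(T == long))
            sink.formattedWrite("%sL", value);
        else static if (is(T == byte))
            sink.formattedWrite("cast(byte) (%s)", value);
        else static if (is(T == string))
            // A one-element array makes %( %) quote the string.
            sink.formattedWrite("%(%s%)", [value]);
        else
            static assert(0, "no literal form implemented for " ~ T.stringof);
    }
}

// Hypothetical helper that infers T from its argument.
auto literal(T)(T value) { return Literal!T(value); }

void main()
{
    assert(format("%s", literal(42L)) == "42L");
    assert(format("%s", literal(cast(byte) -1)) == "cast(byte) (-1)");
    assert(format("%s", literal("hi")) == `"hi"`);

    // The emitted text has the original type when mixed back in.
    static assert(is(typeof(mixin("42L")) == long));
    static assert(is(typeof(mixin("cast(byte) (-1)")) == byte));
}
```

The trade-off is visible at the call site: every argument has to be wrapped by hand, whereas a format character would apply to the value directly.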
May 07
parent Berni44 <someone somemail.com> writes:
On Friday, 7 May 2021 at 23:51:13 UTC, Q. Schroll wrote:
 looking at the specifier (with the exception of `%s` which 
 sometimes plays a special role).
The role of `%s` is special, but not too special either. It just gives a best effort result where other formats would just fail.
That's not what I meant. What I meant was that some custom `toString` versions are only called when `%s` is used (all those that do not take the format string or a `FormatSpec` as a parameter).
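To illustrate with two made-up structs (`Plain` and `Aware` are hypothetical names): a `toString` without a `FormatSpec` parameter is only reachable through `%s`, and any other specifier is rejected with a `FormatException`, while a `toString` that takes the `FormatSpec` gets to decide what each format character means.

```d
import std.exception : assertThrown;
import std.format : format, formatValue, FormatException, FormatSpec;
import std.range.primitives : isOutputRange, put;

// Hypothetical type with a plain toString: usable with %s only.
struct Plain
{
    int v;
    string toString() const { return format("Plain(%s)", v); }
}

// Hypothetical type whose toString receives the FormatSpec and
// forwards it to the wrapped integer.
struct Aware
{
    int v;

    void toString(W)(ref W sink, scope const ref FormatSpec!char spec) const
    if (isOutputRange!(W, char))
    {
        put(sink, "Aware(");
        formatValue(sink, v, spec);
        put(sink, ")");
    }
}

void main()
{
    assert(format("%s", Plain(255)) == "Plain(255)");
    // Any specifier other than %s is refused for Plain.
    assertThrown!FormatException(format("%x", Plain(255)));

    assert(format("%s", Aware(255)) == "Aware(255)");
    assert(format("%x", Aware(255)) == "Aware(ff)");
}
```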
 The task to return a string representation that can be 
 interpreted back is nothing to be delegated to a user-defined 
 routine.
On the contrary, it's the only place where this can be done. The routines in `std.format` cannot know what these objects need in order to be constructed. Maybe it is not even possible at all.
 _That,_ on the other hand, _is_ a problem. I don't know how big 
 that problem practically is because `real` cannot even be 
 formatted at CTFE
It can. I wrote that code. It has been part of master for about two weeks.
 I guess the only sane result for floating point values is `%a` 
 with sufficient digits anyways
That's one reason why I want to add `%D`: for floating point values I'd like to implement RYU (or something similar), which guarantees to emit a value that produces exactly the same result when read back in.
 It's a breaking change fixing `%s` for floating point values in 
 the sense that the representation consists of enough decimals 
 to accurately represent the number.
And that's the reason why I want to add `%D` rather than change `%s`.
 If `%(%s%)` does not give you proper char or string, I'd 
 consider it a bug.
Please define "proper char or string" first.
 I agree with you that a new format is necessary to achieve this 
 if done with a format character to begin with. I do question 
 whether format characters are the right approach. To me, this 
 looks more like a code generation tool than value formatting.
Sounds like you are thinking about a serialization tool or something like that. That's not what I plan. I'm only thinking about formatting values.
 The `%-(...%)` is a hack, but it can be questioned whether 
 removing it is even worth the trouble.
If I were the only one to decide, I would remove it immediately, because in my eyes it is a bug. And yes, it's worth the trouble.
 It just breaks things.
That's why I haven't filed a PR yet. But I'm looking forward to a possibility to change this. And again here `%D` will help.
 The minus has otherwise no meaning for arrays.
It has: left justification instead of right justification. It is just not implemented, or only implemented in a buggy way.
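For comparison, this is what the `-` flag does for scalars and strings today, shown as a small sketch with a field width of 5:

```d
import std.format : format;

void main()
{
    // Default: right-justified within the field width.
    assert(format("[%5d]", 42) == "[   42]");

    // With '-': left-justified within the field width.
    assert(format("[%-5d]", 42) == "[42   ]");
    assert(format("[%-5s]", "ab") == "[ab   ]");
}
```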
 It requires a single check: Is the `%` character followed by 
 `$`? The whole point of `%$` would be that it is not 
 customizable.
I want to have it customizable. For example, I'd like to have output that aligns vertically.
 I've been thinking about this a little. What is your goal?
 Maybe we're talking at cross purposes. I guess you want a 
 format specifier that formats any _built-in_ type in a way that 
 represents the object precisely. In a sense, you want a good 
 `%s` and not a not-really-the-best-effort `%s`.
Yes, you could say that. I don't know the whole history of `%s`, but I think its first meaning was "string", and this was later misunderstood as producing something similar to a literal. This makes `%s` a mix. `%D` would take one of these meanings away from `%s`, giving it back its original meaning.
 My understanding was you want to represent objects as strings 
 in a way that can be used by the compiler to reconstruct the 
 object,
Well, that's what a source code literal is supposed to be, isn't it? But of course it is not limited to this use. People might use it to automatically generate asserts for unittests. Or to compare the output of different runs, where you can be sure that the differences are not due to rounding effects or such things, but are real differences. Or whatever they want to do with it.
 Thinking about it, you can easily wrap objects in a struct and 
 make it do The Right Thing™. It doesn't complicate the `format` 
 implementation.
Of course there are always workarounds. With that argument you can question every function in Phobos...
May 08