digitalmars.D - Escaping control in formatting

reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
I had never used the excellent new range formatting syntax by Kenji Hara 
until now, and I ran into difficulties: "%(%(%c%), %)" is the format I 
need most often for a string array, and it is neither obvious nor 
elegant. It turns out that "%c" disables character escaping. What the 
hell? Why? Not obvious at all.

So I think it would be good to add an 'Escaping' part after 'Precision' 
in the format specification grammar:

Escaping:
   empty
   !-
   !+
   !'
   !"
   !?'
   !?"
   !?!

Escaping affects formatting depending on the specifier as follows.

Escaping    Semantics
   !-      disable escaping; for a range, also disable the [,]
   !+      enable escaping using single quotes for chars and double
           quotes for strings
   !'      enable escaping using single quotes
   !"      enable escaping using double quotes
   !?'     like !' but without adding the quotes, or the [,] for a range
   !?"     like !" but without adding the quotes, or the [,] for a range
   !?!     enable escaping; both single and double quotes are escaped,
           without adding any quotes, or the [,] for a range

Escaping is enabled by default only for associative arrays, ranges (not 
strings), user-defined types, and all of their sub-elements.

I'd like to remove "%c"'s ability to magically disable escaping, which 
looks possible as long as that behavior remains undocumented.

Look at the example:
---
import std.stdio;

void main() {
     // The writeln lines print by hand what the proposed specifier
     // would produce; there is no way to express them today.
     writeln("    char");
     char c = '\'';
     writefln("unescaped: %s."    ,   c  );
     writefln(`escaped+': %(%s%).`, [ c ]); // proposal: %!+s or %!'s
     writefln(`escaped+": %(%s%).`, [[c]]); // proposal: %!"s
     writeln (`  escaped: \'.`);            // proposal: %!?'s
     writeln();
     writeln("    string");
     string s = "a\tb";
     writefln("unescaped: %s."    ,  s );
     writefln(`escaped+": %(%s%).`, [s]); // proposal: %!+s or %!"s
     writeln (`  escaped: a\tb.`);        // proposal: %!?"s
     writeln();
     writeln("    strings");
     string[] ss = ["a\tb", "cd"];
     writefln("unescaped: %(%(%c%)%).", ss); // proposal: %!-s
     writefln(`escaped+": %(%s%).`    , ss);
     writeln (`  escaped: a\tbcd.`);         // proposal: %!?"s
}
---

If this is accepted, I volunteer to try to implement it. If not, 
escaping should at least be documented (and don't forget about "%c"'s 
magic!).

Any thoughts?

P.S.
If it has already been discussed, please give me a link.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Apr 23 2012
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 23.04.2012 16:36, Denis Shelomovskij wrote:
 I've never used new excellent range formatting syntax by Kenji Hara
 until now. And I've met with difficulties, because "%(%(%c%), %)" is the
 most common format for string array for me and it neither obvious nor
 elegant. It occurs that "%c" disables character escaping. What the hell?
 Why? Not obvious at all.

Does %(%s, %) not work?

[snip]

-- 
Dmitry Olshansky
Apr 23 2012
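
To make the difference between the two format strings concrete, here is a minimal sketch. The helper names `joinQuoted` and `joinRaw` are made up for illustration and are not part of Phobos; the only library call assumed is `std.format.format`.

```d
import std.format : format;

// Hypothetical helper: each element goes through %s inside %( ... %),
// so every string is quoted and escaped.
string joinQuoted(string[] items)
{
    return format("%(%s, %)", items);
}

// Hypothetical helper: the inner %(%c%) writes each character as is,
// so no quotes or escape sequences are added.
string joinRaw(string[] items)
{
    return format("%(%(%c%), %)", items);
}

void main()
{
    auto ss = ["a\tb", "cd"];
    // The backquoted literal holds a backslash followed by 't'.
    assert(joinQuoted(ss) == `"a\tb", "cd"`);
    // The double-quoted literal holds a real tab character.
    assert(joinRaw(ss) == "a\tb, cd");
}
```

So "%(%s, %)" does work, but it produces the quoted and escaped form, which is what the original post is trying to avoid.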
prev sibling next sibling parent reply kenji hara <k.hara.pg gmail.com> writes:
2012-04-23 21:36 Denis Shelomovskij <verylonglogin.reg gmail.com>:
 I've never used new excellent range formatting syntax by Kenji Hara until
 now. And I've met with difficulties, because "%(%(%c%), %)" is the most
 common format for string array for me and it neither obvious nor elegant. It
 occurs that "%c" disables character escaping. What the hell? Why? Not
 obvious at all.

 [snip]

 Any thoughts?

Please give us use cases. I cannot imagine why you would want to change or remove the quotation marks but keep the contents escaped.

 P.S.
 If it has already been discussed, please give me a link.

As far as I know, it has not been discussed yet.

Kenji Hara
Apr 23 2012
parent reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
23.04.2012 18:54, kenji hara wrote:
 Please give us use cases. I cannot imagine why you want to
 change/remove quotations but keep escaped contents.

Sorry, I should mention that !' and !" are optional and aren't commonly
used, and all !?* are very optional and are here just for completeness
(IMHO).

An example is generating a complicated string for C/C++:
---
myCppFile.writefln(`tmp = "%!?"s, and %!?"s, and even %!?"s";`, str1, str2, str3)
---

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Apr 23 2012
parent reply Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
23.04.2012 21:15, kenji hara wrote:
 During my improvements of std.format module, I have decided a design.
 If you format some values with a format specifier, you should unformat
 the output with same format specifier.

 Example:
     import std.format, std.array;
     auto aa = [1:"hello", 2:"world"];
     auto writer = appender!string();
     formattedWrite(writer, "%s", aa);
     aa = null;
     auto output = writer.data;
     formattedRead(output, "%s", &aa);  // same format specifier
     assert(aa == [1:"hello", 2:"world"]);

 More details:
 https://github.com/D-Programming-Language/phobos/blob/master/std/format.d#L3264

 I call this "reflective formatting", and it supports simple text based
 serialization and de-serialization. Automatic quotation/escaping for
 nested elements is necessary for the feature.

 But your proposal will break this design very easy, and it is
 impossible to unformat the outputs reflectively.
 For these reasons, your suggestion is hard to accept.

 Kenji Hara

Is there some misunderstanding? Reflective formatting is good! But it
isn't always what you want. It is needed mostly for debugging. Debugging
is only one of the two uses of formatting; the other is simply writing
something somewhere. There are already some non-reflective constructs
(like "%(%(%c%), %)" for a range and "X%sY%sZ" for strings), and I just
propose adding more convenient ones, because every second time I use
formatting, I use it for writing (I mean, not for debugging).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Apr 23 2012
parent Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
23.04.2012 21:49, Denis Shelomovskij wrote:
 [snip]

I completely forgot: %!+s in my proposal exists exactly for reflective
formatting (e.g. "X%!+sY%!+sZ" is reflective for strings).

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Apr 23 2012
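
The point about "X%sY%sZ" not being reflective for strings can be checked with a small sketch. The helpers `plain` and `quoted` are hypothetical names used only here. Since %!+s does not exist, `quoted` approximates it by wrapping each string in a one-element array so that "%(%s%)" quotes it; that substitution is an assumption made for illustration, not the proposed implementation.

```d
import std.format : format;

// Hypothetical helper: plain %s writes the strings without quotes.
string plain(string a, string b)
{
    return format("X%sY%sZ", a, b);
}

// Hypothetical helper: a one-element array makes %(%s%) quote and
// escape the string, standing in for the proposed %!+s.
string quoted(string a, string b)
{
    return format("X%(%s%)Y%(%s%)Z", [a], [b]);
}

void main()
{
    // Two different argument pairs collapse to the same text,
    // so the output cannot be read back unambiguously.
    assert(plain("aY", "b") == "XaYYbZ");
    assert(plain("a", "Yb") == "XaYYbZ");

    // With quoting, the boundaries between the arguments stay visible.
    assert(quoted("aY", "b") == `X"aY"Y"b"Z`);
    assert(quoted("a", "Yb") == `X"a"Y"Yb"Z`);
}
```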
prev sibling next sibling parent kenji hara <k.hara.pg gmail.com> writes:
2012-04-24 1:14 Denis Shelomovskij <verylonglogin.reg gmail.com>:
 23.04.2012 18:54, kenji hara wrote:

 Please give us use cases. I cannot imagine why you want to
 change/remove quotations but keep escaped contents.

 Sorry, I should mention that !' and !" are optional and aren't commonly
 used, and all !?* are very optional and are here just for completeness
 (IMHO).

 An example is generating a complicated string for C/C++:
 ---
 myCppFile.writefln(`tmp = "%!?"s, and %!?"s, and even %!?"s";`, str1, str2, str3)
 ---

 -- 
 Денис В. Шеломовский
 Denis V. Shelomovskij

While improving the std.format module, I settled on a design principle:
if you format a value with a format specifier, you should be able to
unformat the output with the same format specifier.

Example:

    import std.format, std.array;

    void main()
    {
        auto aa = [1:"hello", 2:"world"];
        auto writer = appender!string();
        formattedWrite(writer, "%s", aa);   // serialize the AA as text
        aa = null;
        auto output = writer.data;
        formattedRead(output, "%s", &aa);   // same format specifier
        assert(aa == [1:"hello", 2:"world"]);
    }

More details:
https://github.com/D-Programming-Language/phobos/blob/master/std/format.d#L3264

I call this "reflective formatting", and it supports simple text-based
serialization and de-serialization. Automatic quotation/escaping of
nested elements is necessary for this feature.

Your proposal would break this design very easily, because the output
could no longer be unformatted reflectively.
For these reasons, your suggestion is hard to accept.

Kenji Hara
Apr 23 2012
prev sibling next sibling parent kenji hara <k.hara.pg gmail.com> writes:
2012-04-24 2:49 Denis Shelomovskij <verylonglogin.reg gmail.com>:
 [snip]

 Is there some misunderstanding? Reflective formatting is good! But it
 isn't what you always want. It is needed mostly for debug purposes. But
 debugging is one of two usings of formatting, the second one is just
 writing something somewhere. There are already some non-reflective
 constructs (like "%(%(%c%), %)" for a range and "X%sY%sZ" for strings)
 and I just propose adding more comfortable ones because every second
 time I use formatting I use it for writing (I mean not for debugging).

 -- 
 Денис В. Шеломовский
 Denis V. Shelomovskij

My concern is that the proposal is quite complicated and not very useful
for general use cases.
You can emulate such formatting as follows:

    import std.array, std.format, std.stdio;
    import std.range, std.uni;

    void main()
    {
        auto strs = ["It's", "\"world\""];
        {
            // emulation of !?"
            auto w = appender!string();
            foreach (s; strs)
                formatStrWithEscape(w, s, '"');
            writeln(w.data);
        }
        {
            // emulation of !?'
            auto w = appender!string();
            foreach (s; strs)
                formatStrWithEscape(w, s, '\'');
            writeln(w.data);
        }
    }

    // Writes str surrounded by quote, escaping each character as needed.
    void formatStrWithEscape(W)(W writer, string str, char quote)
    {
        writer.put(quote);
        foreach (dchar c; str)
            formatChar(writer, c, quote);
        writer.put(quote);
    }

    // copy from std.format
    void formatChar(Writer)(Writer w, in dchar c, in char quote)
    {
        if (std.uni.isGraphical(c))
        {
            // Printable: only the active quote and the backslash need escaping.
            if (c == quote || c == '\\')
            {
                put(w, '\\');
                put(w, c);
            }
            else
                put(w, c);
        }
        else if (c <= 0xFF)
        {
            put(w, '\\');
            switch (c)
            {
                case '\a':  put(w, 'a');  break;
                case '\b':  put(w, 'b');  break;
                case '\f':  put(w, 'f');  break;
                case '\n':  put(w, 'n');  break;
                case '\r':  put(w, 'r');  break;
                case '\t':  put(w, 't');  break;
                case '\v':  put(w, 'v');  break;
                default:
                    formattedWrite(w, "x%02X", cast(uint)c);
            }
        }
        else if (c <= 0xFFFF)
            formattedWrite(w, "\\u%04X", cast(uint)c);
        else
            formattedWrite(w, "\\U%08X", cast(uint)c);
    }

I can agree to making private functions in std.format, e.g. formatChar,
public but undocumented, but I cannot agree to adding such a complicated
rule to the supported format specifiers.

Kenji Hara
Apr 23 2012
prev sibling parent "Denis Shelomovskij" <verylonglogin.reg gmail.com> writes:
On Tuesday, 24 April 2012 at 04:55:34 UTC, kenji hara wrote:
 My concern is that the proposal is much complicated and less useful
 for general use cases.
 You can emulate such formatting like follows:

IMHO, adding %!+s and %!-s alone and removing %c's magic would only
simplify formatting for the user. It was hard (for me) to understand the
current escaping rules because they are undocumented, and they look
dissonant (to me) because escaping is a part of formatting, yet the user
is unable to control it except through the magical %c.

I agree that !', !", and !?* aren't commonly needed, as I have already
written. Personally I don't need them at all. But this is a common
pattern for me: `xformat("My pets: %(%!-s, %)", petsAsStrings)`. And
"My pets: %(%(%c%), %)" is so complicated, dissonant, and non-general
(it will not work if I give it pets as int[], for example) that I never
use it. I use `.joiner(", ")` instead, and every time I do, I think that
something is really wrong with array formatting in Phobos.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
Apr 24 2012
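
For reference, the workaround described in the last post can be sketched as follows. `listPets` is a hypothetical helper name, and it uses `std.array.join` (the eager counterpart of `joiner`) so that the result is a plain string. The last two assertions rely on something that is not part of this thread: current Phobos documentation describes a '-' flag on "%(" that turns off element escaping. Treat that part as an assumption about the Phobos version in use, not as the outcome of this discussion.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner;
import std.array : join;
import std.format : format;

// Hypothetical helper: the workaround from the post above, written
// with the eager std.array.join so the result is a string.
string listPets(string[] pets)
{
    return "My pets: " ~ pets.join(", ");
}

void main()
{
    auto pets = ["cat", "dog"];
    assert(listPets(pets) == "My pets: cat, dog");

    // joiner is lazy but yields the same characters.
    assert(pets.joiner(", ").equal("cat, dog"));

    // The %c idiom gives the same text, but only for string elements.
    assert(format("My pets: %(%(%c%), %)", pets) == "My pets: cat, dog");

    // Assumption: a Phobos release that supports the '-' flag on "%(".
    assert(format("My pets: %-(%s, %)", pets) == "My pets: cat, dog");
    assert(format("%-(%s, %)", [1, 2, 3]) == "1, 2, 3");
}
```

Unlike the "%(%(%c%), %)" idiom, both the join-based version and the flag-based version are independent of the element type as long as the elements can be written with %s.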