
digitalmars.D.bugs - [Issue 9621] New: std.conv.parseEscape fails on octals and named

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9621

           Summary: std.conv.parseEscape fails on octals and named
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: monarchdodra gmail.com


--- Comment #0 from monarchdodra gmail.com 2013-03-01 02:27:46 PST ---
D allows this:

void main()
{
  string s1 = "\&amp;";
  string s2 = "\141";

  assert(s1 == "&");
  assert(s2 == "a");
}

But std.conv.parse rejects both (parseEscape does not support them).
//----
import std.conv : parse;
import std.stdio : writeln;

void main()
{
  string s1 =
  `[
    "\&amp;",
    "\141"
  ]`;
  writeln(parse!(string[])(s1));
}
//----
Can't parse string: Unknown escape character &
Can't parse string: Unknown escape character 1
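
For illustration only, here is a minimal sketch of what decoding the octal case could look like. The helper name decodeOctalEscape is hypothetical and is not part of Phobos; it assumes the backslash has already been consumed and that, as in the D lexer, one to three octal digits follow.

```d
import std.exception : enforce;

/// Hypothetical helper (not part of Phobos): decodes the digits of an octal
/// escape such as the "141" in "\141" and consumes them from the input.
dchar decodeOctalEscape(ref string s)
{
    uint value = 0;
    size_t n = 0;
    // Accept at most three octal digits, stopping at the first non-octal one.
    while (n < 3 && n < s.length && s[n] >= '0' && s[n] <= '7')
    {
        value = value * 8 + (s[n] - '0');
        ++n;
    }
    enforce(n > 0, "expected at least one octal digit");
    // Assumption: mirror the lexer, which rejects octal escapes above \377.
    enforce(value <= 0xFF, "octal escape is larger than \\377");
    s = s[n .. $];
    return cast(dchar) value;
}

void main()
{
    string rest = `141"`;
    assert(decodeOctalEscape(rest) == 'a'); // 1*64 + 4*8 + 1 = 97
    assert(rest == `"`);                    // only the digits were consumed
}
```

Named entities are a different matter, since they need a lookup table rather than arithmetic.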

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 01 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9621


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com


--- Comment #1 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-03-01
02:59:43 PST ---
Is it documented anywhere that std.conv.parse should follow D lexer conventions
on parsing??

If not, I guess we shouldn't pretend it does and pull the whole freaking table
of HTML4/5 entities into *every* program that uses parse to read a couple of
ints.

Mar 01 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9621



--- Comment #2 from monarchdodra gmail.com 2013-03-01 03:27:10 PST ---
(In reply to comment #1)
> Is it documented anywhere that std.conv.parse should follow D lexer conventions
> on parsing??

Well, it's kind of implied, isn't it? Why would parse follow a convention other than D's? No, it's not documented, but I do remember Jonathan (I think it was him) specifically saying somewhere in the threads that the idea was for it to parse pretty much anything that's valid D.

> If not I guess we shouldn't pretend it does and pull the whole freaking table
> of HTML4/5 entities in *every* program that uses parse to read a couple of
> ints.

I disagree, because the function *is* named parse, and it is capable of parsing a string and returning the parsed object (in this case a string). If "\&quot;" is a valid D string, then I'd expect parse not to choke on it. As long as the user is parsing a string to an int, then no, he shouldn't need the table; but if the parse outcome is a string, there is no excuse not to do it right.

Shouldn't the fact that the table would only ever be used in a template function (parse) mean the compiler can know whether or not to link with said table? Or would importing std.conv immediately link the table into the final executable?
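
To make the question concrete, here is a rough sketch of a template that only mentions an entity lookup when the target type is a string. Both readValue and lookupEntity are hypothetical names, and the four-entry switch merely stands in for a full HTML entity table. Whether an unreferenced table actually stays out of the final executable depends on the compiler and linker, so this shows the source-level structure only.

```d
import std.conv : to;
import std.traits : isSomeString;

/// Hypothetical stand-in for a full HTML entity table.
dchar lookupEntity(string name)
{
    switch (name)
    {
        case "amp":  return '&';
        case "quot": return '"';
        case "lt":   return '<';
        case "gt":   return '>';
        default:     throw new Exception("unknown entity: " ~ name);
    }
}

/// Hypothetical parse-like function: converts s to T, and for string
/// targets additionally decodes a single `\&name;` escape.
T readValue(T)(string s)
{
    static if (isSomeString!T)
    {
        // Only this branch refers to lookupEntity, so instantiations for
        // non-string targets contain no reference to it.
        if (s.length > 3 && s[0 .. 2] == `\&` && s[$ - 1] == ';')
            return to!T(lookupEntity(s[2 .. $ - 1]));
        return to!T(s);
    }
    else
    {
        return to!T(s);
    }
}

void main()
{
    assert(readValue!int("42") == 42);
    assert(readValue!string(`\&amp;`) == "&");
}
```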
Mar 01 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9621



--- Comment #3 from monarchdodra gmail.com 2013-03-01 03:30:39 PST ---
(In reply to comment #1)
> If not I guess we shouldn't pretend it does and pull the whole freaking table
> of HTML4/5 entities in *every* program that uses parse to read a couple of
> ints.

How does std.uni do it? I mean, if I want to know whether a Unicode character is white space, does that mean I'll have to pull in the entire Unicode tables for isUpper etc.?

I'm not trying to justify by comparison, but trying to see how other modules deal with this "problem".
Mar 01 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9621



--- Comment #4 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-03-01
04:12:34 PST ---
(In reply to comment #3)
> (In reply to comment #1)
> > If not I guess we shouldn't pretend it does and pull the whole freaking table
> > of HTML4/5 entities in *every* program that uses parse to read a couple of
> > ints.
>
> How does std.uni does it?

That's why I'm increasingly against adding tables that are hidden behind an opaque interface. I feel uneasy about it. That's why I exposed all I could about tables & predefined sets in std.uni. For instance, any set is usable for more than std.uni's own purposes. I also made a tremendous effort not to include tables unless user code needs them, and I will seek new ways to avoid it.

Having a dead HTML5 entity table buried beneath an innocent-looking function is NOT good enough. If we do it, there HAS to be a way to tap into the HTML entities, so that people wouldn't have to include the VERY SAME table twice should they need full access to HTML5 entities.

> I mean, in the case I want to know if unicode character is white, does it mean
> I'll have to pull the entire unicode tables for isUpper etc. etc. etc.

That's something I'm going to change. Technically there is no reason to pull in these tables. Also, in the case of parse the cost to benefit is far greater, since if you use isXXX you surely need the table, period. In the case of parse you may easily never hit an escape sequence, or not even mean to unescape it in your data, but you'd pay all the same.

> I'm not trying to justify by comparison, but trying to see how other modules
> work with this "problem".

I thought std.conv.parse's goal was closer to C's sscanf. In other words, that it's the backbone behind formattedRead, readf, etc. If the goal is to parse whatever D strings are, I fail to see the use case, as e.g. std.d.lexer would 100% likely use its own tricks to process escapes etc. to be more efficient.
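
As a small illustration of that sscanf-like role, the sketch below reads "a couple of ints" with formattedRead. The helper name readPair is hypothetical; no escape handling or entity table is involved in this use.

```d
import std.format : formattedRead;

/// Hypothetical helper: reads two space-separated ints, sscanf-style.
int[2] readPair(string input)
{
    int a, b;
    // formattedRead consumes from `input` and stores through the pointers.
    formattedRead(input, "%s %s", &a, &b);
    return [a, b];
}

void main()
{
    assert(readPair("3 4") == [3, 4]);
}
```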
Mar 01 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9621



--- Comment #5 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-03-01
04:13:40 PST ---
> Something I'm going to change. Technically there is no reason to pull these
> tables. Also in case of parse the cost to benefit is far

I meant lower, obviously.

> since if you
> use isXXX you surely need the table, period. In case of parse you may easily
> never hit escape sequence or even mean to unescape it in your data but you'd
> pay all the same.
Mar 01 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9621



--- Comment #6 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-03-01
04:33:15 PST ---
(In reply to comment #5)
> > Something I'm going to change. Technically there is no reason to pull these
> > tables. Also in case of parse the cost to benefit is far
>
> I've meant lower, obviously.

Looks like I'm on a streak... for std.conv.parse it's a *higher* cost-to-benefit ratio after all. Sorry for the confusion.
Mar 01 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9621



--- Comment #7 from monarchdodra gmail.com 2013-03-01 04:50:56 PST ---
(In reply to comment #4)
> I thought std.conv.parse goal was closer to sscanf of C. In other words that
> it's a backbone behind the formattedRead, readf etc.

I guess the whole discussion then boils down to "what should/does formattedRead accept?" Given that it is "higher order" and capable of parsing arrays of stuff, what happens when it parses a string that represents an array of strings? I mean, imagine this program:

string s1 = ...
string[] s2;
formattedRead(s1, "%s", &s2);

The question is: what are legal s1 values?

s1 = `["a", "b"]`;          => ["a", "b"]
s1 = `["a", "b", ]`;        => ["a", "b"] (1)
s1 = `["ab", ['a', 'b']]`   => ["ab", "ab"]
s1 = `["\t", "\n"]`;        => ["\t", "\n"]
s1 = `["\0"]`;              => ["\0"] (2)
s1 = `["\141"]`;            => ["a"]
s1 = `["\x61"]`;            => ["a"]
s1 = `["\u0061"]`;          => ["a"]
s1 = `["\U00000061"]`;      => ["a"]
s1 = `["\&amp;"]`;          => ["&"] (3)

(1) //Not currently supported
(2) //Not currently supported
(3) //Not currently supported

Unless formattedRead can document what it can (should) and doesn't support, we'll just run around in circles...
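
One way to pin down the current behavior before documenting it is to probe each candidate input. The sketch below does that with std.conv.parse. The helper name accepts is hypothetical, and no expected output is given because the result for each input depends on the Phobos version in use.

```d
import std.conv : parse;
import std.stdio : writefln;

/// Hypothetical helper: reports whether parse!(string[]) accepts `input`.
bool accepts(string input)
{
    try
    {
        parse!(string[])(input);
        return true;
    }
    catch (Exception)
    {
        // parse signals unsupported input by throwing.
        return false;
    }
}

void main()
{
    // Candidate inputs taken from the list above.
    foreach (input; [`["a", "b"]`, `["a", "b", ]`, `["\t", "\n"]`,
                     `["\0"]`, `["\141"]`, `["\x61"]`, `["\&amp;"]`])
    {
        writefln("%-16s => %s", input, accepts(input) ? "accepted" : "rejected");
    }
}
```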
Mar 01 2013