digitalmars.D.bugs - [Issue 9471] New: std.xml uses DbC to validate XML syntax

http://d.puremagic.com/issues/show_bug.cgi?id=9471

           Summary: std.xml uses DbC to validate XML syntax
           Product: D
           Version: D2
          Platform: All
               URL: http://dlang.org/phobos/std_xml.html
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: smjg iname.com


--- Comment #0 from Stewart Gordon <smjg iname.com> 2013-02-07 15:15:11 PST ---
Take a look at the std.xml documentation:

class Document

this(string s);
    Constructs a Document by parsing XML text.

    This function creates a complete DOM (Document Object Model) tree.

    The input to this function MUST be valid XML. This is enforced by
DocumentParser's in contract.


The DocumentParser constructor's doc states something similar.

This is absurd.  XML data normally comes from a file or an external process,
and validating data that originates outside the program doesn't belong in an
in contract.  It is part of what an XML parser has to do in its normal
operation.  Instead, validity of the XML has become a precondition for being
allowed to pass it to the XML parser in the first place.

This leads to a need to call std.xml.check on the contents of an XML file
before constructing a Document.

check(xml);
Document data = new Document(xml);
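
For reference, a fuller sketch of this workaround, assuming a Phobos that still
ships std.xml.  loadDocument is a made-up helper name for illustration, not
part of the library, and returning null on failure is just one possible
choice.

```d
import std.stdio : writeln;
import std.xml : check, CheckException, Document;

/// Hypothetical helper: returns the DOM, or null if check rejects the text.
Document loadDocument(string xml)
{
    try
    {
        check(xml);               // extra pass whose only job is validation
    }
    catch (CheckException e)
    {
        writeln("rejected XML: ", e.toString());
        return null;
    }
    return new Document(xml);     // parses the same text again to build the DOM
}

void main()
{
    auto good = loadDocument("<?xml version=\"1.0\"?><root><item>1</item></root>");
    auto bad = loadDocument("<root><item>1</root>");   // mismatched closing tag
    writeln("good parsed: ", good !is null);
    writeln("bad rejected: ", bad is null);
}
```

The try/catch around check is where the app ends up doing the parser's error
handling for it.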

As well as having to do manually what should be done automatically as part of
the process, it means that (in a development build) the XML is parsed three
times:
- through the call to check in the app code
- through the call to check in DocumentParser's constructor's in contract
- through the Document constructor as it is actually building the DOM.

This shouldn't be necessary.  Validity should be checked automatically while
parsing the XML to build the DOM.  This would mean that the XML is parsed only
once, which is much more efficient as well as being a first step towards
enabling the XML to be read from a stream and parsed on the fly.

And it should throw a normal exception if it fails, not an assertion failure.
I haven't taken the time to figure out what actually happens when malformed
XML is passed in under a release build, but I don't suppose that it fails
gracefully in the general case.
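
To illustrate the difference between the two failure modes, here is a toy
sketch.  It does not use std.xml at all: acceptContract, acceptChecked and
outcome are hypothetical names, and the "starts with <" test is only a
stand-in for real validation.  It uses the newer in/do contract syntax (older
compilers spell do as body).

```d
import core.exception : AssertError;

/// Stand-in for an API that validates only in its in contract.
/// When contracts are compiled out (-release), nothing checks the input.
void acceptContract(string xml)
in
{
    assert(xml.length != 0 && xml[0] == '<', "not XML");
}
do
{
    // real work would go here, trusting the input
}

/// Stand-in for an API that validates as part of its normal operation.
void acceptChecked(string xml)
{
    if (xml.length == 0 || xml[0] != '<')
        throw new Exception("not XML");
}

/// Reports how bad input surfaces to the caller.
string outcome(void function(string) api, string xml)
{
    try
    {
        api(xml);
        return "accepted";
    }
    catch (Exception)
    {
        return "exception";           // ordinary, recoverable error path
    }
    catch (AssertError)
    {
        return "assertion failure";   // an Error, missed by catch (Exception)
    }
}

void main()
{
    import std.stdio : writeln;
    writeln("checked:  ", outcome(&acceptChecked, "oops"));
    writeln("contract: ", outcome(&acceptContract, "oops"));
}
```

What the contract line prints depends on whether contracts are compiled in,
which is exactly the problem: the checked version behaves the same either way.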

The call to check needs to be removed from the in contract, and the XML parser
code improved so that all cases of invalidity throw an exception.
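
As a rough sketch of the idea (not a patch to std.xml), a single pass can
build its result and throw on the first problem it meets.  Everything here is
made up for illustration: MalformedXMLException and parseElementNames are
hypothetical, and the toy only understands <name>, </name> and text.

```d
import std.string : indexOf;

/// Hypothetical exception type for this sketch; not part of std.xml.
class MalformedXMLException : Exception
{
    this(string msg) { super(msg); }
}

/// Toy single-pass parser: collects element names in document order,
/// validating nesting as it goes.
string[] parseElementNames(string xml)
{
    string[] open;   // stack of currently open elements
    string[] seen;   // result, built during the same pass
    string rest = xml;
    while (true)
    {
        immutable lt = rest.indexOf('<');
        if (lt < 0)
            break;                       // only text remains
        rest = rest[lt + 1 .. $];
        immutable gt = rest.indexOf('>');
        if (gt < 0)
            throw new MalformedXMLException("unterminated tag");
        string name = rest[0 .. gt];
        rest = rest[gt + 1 .. $];
        if (name.length == 0)
            throw new MalformedXMLException("empty tag name");
        if (name[0] == '/')
        {
            // closing tag must match the innermost open element
            if (open.length == 0 || open[$ - 1] != name[1 .. $])
                throw new MalformedXMLException(
                    "unexpected closing tag </" ~ name[1 .. $] ~ ">");
            open = open[0 .. $ - 1];
        }
        else
        {
            open ~= name;
            seen ~= name;
        }
    }
    if (open.length != 0)
        throw new MalformedXMLException("unclosed element <" ~ open[$ - 1] ~ ">");
    return seen;
}

void main()
{
    import std.stdio : writeln;
    writeln(parseElementNames("<a><b>text</b></a>"));
    try
    {
        parseElementNames("<a><b>text</a>");
    }
    catch (MalformedXMLException e)
    {
        writeln("malformed: ", e.msg);
    }
}
```

The caller needs no separate check call, and bad input is reported the same
way in development and release builds.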

Feb 07 2013