digitalmars.D.bugs - [Issue 7519] New: std.xml cannot manage single quoted attribute values

reply d-bugmail puremagic.com writes:

           Summary: std.xml cannot manage single quoted attribute values
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: michaelrynn optusnet.com.au

07:29:15 PST ---
Search for std.xml on google, and you will get a "top answer: Don't use
Nevertheless, I've put some candidates for an XML tool set for D up on the
review list.

Using my experience in building an xml parser the hard way for a while,
yesterday, I took a look again at the old std.xml, and remembered my first
efforts to understand it, and how I got lost and gave up trying to make a few
changes on it.

Now I have backported a few efficiencies and a bug fix or two, to make a
std.xml1 from std.xml (my own project is currently labelled std.xml2).
XML probably belongs in a separate library, given its proper code size.

But I just made a first different new version by editing your Phobos toy xml.

It's nearly 50% faster in a release build, due to a number of obvious
optimizations. The main Element parse loop was given a more reasonable
arrangement, and is more efficient with some custom munchers.

On my now more educated code review, I found and fixed a most amazing bug, that
current std.xml does not support single quoted attribute values. This almost
certainly proves no one is using it.

I haven't yet started to replace its monomaniacal error-checking debug
code, which slows execution of the debug version down to a snail's pace. I put
a version tag in, to suggest throwing away those crazy category arrays in the
Element class.

std.xml1 is still a toy parser.  I have added it to my std.xmlp project, to
see how far a tiny Phobos parser can go, because I've certainly got more code
invested in other versions.   Still, there are some interesting approaches in
it.  So I did a day's coding work and replaced some of its toy pieces, from my
experience of what works better. Further improvements without radical change
will be more difficult.  I think I might just also change it to use my versions
of common code that I borrowed and developed, originally from std.encode and

std.xml1 has an added module dependency of my own creation, currently called
alt.zstring. This includes an Array!(T), which is meant to be an efficient
array struct that also tracks its capacity. It's nice to have this in a class,
so it's always available on callback.   I know there are Appender thingos in
std.array, but I wanted my own hand-tuned, hard-tested version, used in the
std.xmlp xml tools. I suppose I should look hard at std.array for a
substitute, or for ways to improve this one. But the Array!T can be easily
removed or substituted in std.xml1. I threw it in to find out its
limitations and what needs improving.
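
For comparison, here is a minimal sketch of the std.array route mentioned
above. It is not the Array!(T) from alt.zstring, whose interface is not shown
in this report; the function name collectSquares is made up for illustration.
It only shows that Appender can reserve capacity up front and then append
without reallocating on each put.

```d
import std.array : appender;

/// Illustrative only: collect the squares of 0 .. n, reserving capacity
/// first so that the appends below do not need to grow the buffer.
int[] collectSquares(size_t n)
{
    auto buf = appender!(int[])();
    buf.reserve(n);                 // ask for room for n elements up front
    foreach (i; 0 .. n)
        buf.put(cast(int)(i * i));  // append one element
    return buf.data;                // slice over the accumulated elements
}
```

Whether this is fast enough to stand in for a hand-tuned array struct would
have to be measured on real input.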

This means that it's not now a drop-in replacement for std.xml.
But I think I could do one in a few days, given encouragement, and some access

Here is the URL of d2-xml project: 

It's now on the DigMars review list, to attract attention and comment.

and view code at

Here is the original offending attribute parsing code in class Tag, constructor

                string key = munch(s,"^="~whitespace);
                string val = decode(munch(s,"^\""), DecodeMode.LOOSE);
                attr[key] = val;

Note that only double quotes are expected, by reqc.

Here is part of my replacement. It uses some new munchers, each a simple loop
that switches on the character, then finds and slices, in place of the generic
pattern muncher. The decode function, which looks for entities, was changed as
well.

                string key = munchAttribute(s);
                if (s.length == 0)
                    ...
                dchar quoteMe = s[0];
                if ((quoteMe != '\'') && (quoteMe != '\"'))
                    ...
                s = s[1..$];
                string val = decode(munchTillNext(s,quoteMe), ...);
                if (s.length < 1 || s[0] != quoteMe)
                    ...
                s = s[1..$];
                attr[key] = val;
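
A self-contained sketch of the same idea may help readers who do not have the
std.xml1 source to hand. It is not the code from std.xml1: the bodies of
munchTillNext and munchQuotedValue below are assumptions written for
illustration, errors are reported with plain enforce, and entity decoding is
left out. It accepts either quote character and requires the closing quote to
match the opening one.

```d
import std.exception : enforce;

/// Illustrative muncher: slice off everything before the next `stop`.
/// Afterwards `s` starts at `stop`, or is empty if `stop` never occurs.
string munchTillNext(ref string s, char stop)
{
    size_t i = 0;
    while (i < s.length && s[i] != stop)
        ++i;
    string head = s[0 .. i];
    s = s[i .. $];
    return head;
}

/// Illustrative: read a value quoted with ' or " from the front of `s`
/// and return the text between the quotes (no entity decoding).
string munchQuotedValue(ref string s)
{
    enforce(s.length > 0, "expected a quoted attribute value");
    immutable char quote = s[0];
    enforce(quote == '\'' || quote == '"', "expected ' or \" to open the value");
    s = s[1 .. $];                          // skip the opening quote
    string val = munchTillNext(s, quote);   // value runs up to the matching quote
    enforce(s.length > 0 && s[0] == quote, "unterminated attribute value");
    s = s[1 .. $];                          // skip the closing quote
    return val;
}
```

Because the stop character is whichever quote opened the value, a double quote
may appear inside a single-quoted value and vice versa.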

Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 16 2012
parent d-bugmail puremagic.com writes:

Richard Webb <webby beardmouse.org.uk> changed:

           What    |Removed                     |Added
                 CC|                            |webby beardmouse.org.uk

PST ---
For what it's worth, a simple

    string s = cast(string)std.file.read(filePath);
    auto doc = new Document(s);

using the xml file attached to https://bugs.launchpad.net/d2-xml/+bug/933594

takes ~18.6 seconds using std.xml and ~16.4 seconds using std.xml1.
The parse time drops substantially if the GC is disabled while the Document is
being created.
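
One way to try that, as a rough sketch: the wrapper name withGCDisabled is
made up here, while GC.disable and GC.enable are the core.memory calls.
Collections stay suspended while the delegate runs and are re-enabled
afterwards, even if it throws.

```d
import core.memory : GC;

/// Illustrative helper: run `work` with GC collections suspended.
void withGCDisabled(scope void delegate() work)
{
    GC.disable();               // suspend collections
    scope (exit) GC.enable();   // re-enable on any exit path
    work();
}
```

With the snippet above, that would be used as
withGCDisabled({ doc = new Document(s); }); with doc declared beforehand.
Memory use rises while collections are off, so this suits a one-off parse
more than a long-running process.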

Feb 17 2012