digitalmars.D.bugs - [Issue 7519] New: std.xml cannot manage single quoted attribute values
- d-bugmail puremagic.com (94/94) Feb 16 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7519
- d-bugmail puremagic.com (16/16) Feb 17 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7519
http://d.puremagic.com/issues/show_bug.cgi?id=7519

           Summary: std.xml cannot manage single quoted attribute values
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: michaelrynn optusnet.com.au

--- Comment #0 from Michael Rynn <michaelrynn optusnet.com.au> 2012-02-16 07:29:15 PST ---

Search for std.xml on Google, and you will get a "top answer: Don't use std.xml". Nevertheless, I've put up some candidates for an XML tool set for D on the review list.

Having spent a while building an XML parser the hard way, yesterday I took another look at the old std.xml, and remembered my first efforts to understand it, and how I got lost and gave up trying to make a few changes to it. Now I have backported a few efficiencies and a bug fix or two, to make a std.xml1 from std.xml (my own project is currently labelled std.xml2). XML probably belongs in a separate library, given its proper code size. But I just made a first different new version by editing your Phobos toy xml.

It's nearly 50% faster on the release compile, due to a number of obvious optimizations. The main Element parse loop was given a more reasonable arrangement, and is more efficient with some custom munchers.

On my now more educated code review, I found and fixed a most amazing bug: the current std.xml does not support single quoted attribute values. This almost certainly proves no one is using it.

I haven't yet started to replace its monomaniacal error checking debug code, which slows debug version execution down to a snail's pace. I put a version tag in, to suggest throwing away those crazy category arrays in the Element class.

std.xml1 is a toy parser still. I have added it to my std.xmlp project, to see how far a tiny Phobos parser can go, because I've certainly got more code invested in other versions. Still, there are some interesting approaches in it.
So I did a day's coding work and replaced some of its toy pieces, from my experience of what works better. Further improvements without radical change will be more difficult. I think I might just also change it to use my versions of common code that I borrowed and developed, originally from std.encode and std.xml.

std.xml1 has an added module dependency of my own creation, currently called alt.zstring. This includes an Array!(T), which is meant to be an efficient array struct that also tracks its capacity. It's nice to have this in a class, so it's always available on callback. I know there are Appender thingos in std.array, but I wanted my own hand tuned, hard tested version, used in the std.xmlp xml tools. I suppose I should look hard at std.array for a substitute, or for ways to improve this one. But the Array!T can be easily removed or substituted in std.xml1. I know I throw it in to find out its limitations, requiring improvement.

This means that it's not now a drop in replacement for std.xml. But I think I could do one in a few days, given encouragement, and some access please.

Here is the URL of the d2-xml project:
https://launchpad.net/d2-xml

It's now on the DigMars review list, for attracting attention and comment. View the code at
http://bazaar.launchpad.net/~michael-rynn-500/d2-xml/d2-xml-dev/view/head:/std/xml1.d

Here is the original offending attribute parsing code, in the constructor of class Tag:

    string key = munch(s,"^="~whitespace);
    munch(s,whitespace);
    reqc(s,'=');
    munch(s,whitespace);
    reqc(s,'"');
    string val = decode(munch(s,"^\""), DecodeMode.LOOSE);
    reqc(s,'"');
    munch(s,whitespace);
    attr[key] = val;

Note only double quotes are expected, by reqc.

Here is some of my replacement, referring to some new munchers, which are simple loop, switch on char, find and slice routines that replace the generic pattern muncher. The decode function, which looks for entities, was changed as well.

    string key = munchAttribute(s);
    eatWhiteSpace(s);
    reqc(s,'=');
    eatWhiteSpace(s);
    if (s.length == 0)
        badParseEnd();
    dchar quoteMe = s[0];
    if ((quoteMe != '\'') && (quoteMe != '\"'))
        badAttributeQuote(quoteMe);
    s = s[1..$];
    string val = decode(munchTillNext(s,quoteMe), DecodeMode.LOOSE);
    if (s.length < 1 || s[0] != quoteMe)
        badAttributeQuote(s[0]);
    s = s[1..$];
    eatWhiteSpace(s);
    attr[key] = val;

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
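
For illustration only, the quote handling described in the comment above can be reduced to a standalone sketch. The name takeQuotedValue is hypothetical and is not part of std.xml or std.xml1. The sketch skips entity decoding, reports errors through std.exception.enforce rather than the parser's own error functions, and checks for end of input before it looks at the closing quote.

```d
import std.exception : enforce;

/// Hypothetical helper (not from std.xml or std.xml1): consumes one quoted
/// attribute value from the front of `s` and returns the text between the
/// quotes. Either ' or " may open the value; the same character must close it.
string takeQuotedValue(ref string s)
{
    enforce(s.length > 0, "unexpected end of input");

    // Remember which quote character opened the value.
    immutable char quote = s[0];
    enforce(quote == '"' || quote == '\'',
            "attribute value must start with a single or double quote");
    s = s[1 .. $];

    // Scan forward to the matching closing quote.
    size_t end = 0;
    while (end < s.length && s[end] != quote)
        ++end;
    enforce(end < s.length, "unterminated attribute value");

    string value = s[0 .. end];
    s = s[end + 1 .. $]; // drop the value and its closing quote
    return value;
}
```

With this shape, a double quote inside a single quoted value (or the reverse) is kept as ordinary text, because only the opening quote character terminates the scan.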
Feb 16 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7519

Richard Webb <webby beardmouse.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |webby beardmouse.org.uk

--- Comment #1 from Richard Webb <webby beardmouse.org.uk> 2012-02-17 08:09:00 PST ---

For what it's worth, a simple

    string s = cast(string)std.file.read(filePath);
    auto doc = new Document(s);

using the xml file attached to https://bugs.launchpad.net/d2-xml/+bug/933594 takes ~18.6 seconds using std.xml and ~16.4 seconds using std.xml1.

The parse time drops substantially if the GC is disabled while the Document is being created.
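
One way to try the GC observation is sketched below. The helper name timeWithGcDisabled is hypothetical, and the sketch assumes the std.datetime.stopwatch module of current Phobos rather than whatever timing code was used for the figures above. The delegate stands in for the Document construction. GC.disable only suppresses collections in the normal case; the runtime may still collect if it runs out of memory.

```d
import core.memory : GC;
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Hypothetical helper: runs `work` with garbage collection disabled and
/// returns the elapsed wall-clock time.
Duration timeWithGcDisabled(scope void delegate() work)
{
    GC.disable();
    scope (exit) GC.enable(); // re-enable even if `work` throws

    auto sw = StopWatch(AutoStart.yes);
    work();
    return sw.peek();
}

// Possible use, assuming `s` holds the file contents as in the comment above:
//     auto elapsed = timeWithGcDisabled({ auto doc = new Document(s); });
```

Running the same delegate once with and once without the helper gives a rough comparison, though memory use will grow while collections are suppressed.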
Feb 17 2012