
digitalmars.D.announce - std.xml2 candidate

Michael Rynn <michaelrynn optusnet.com.au> writes:
Availability of an updated XML parser for D2,
	organised very presumptuously as std.xml2

Downloadable with SVN. 

svn co http://svn.dsource.org/projects/xmlp/trunk/std
(release 20).

The checkout contains:
A conventional DOM of linked nodes -- std.xmlp.linkdom.
A core parser which emits parsed items -- std.xmlp.coreparse.
A validating parser, including DOCTYPE validation -- std.xmlp.domparse.

Performance seems not too bad.  There are more lines of code,
but it does the same work as std.xml in about 65% of the time.

The well-formedness check is done during the parse, so there is no need
for a separate check.

It takes string inputs or file inputs in various encodings.

The DOMErrorHandler DOM interface is included
	in the validating parser for the linkdom.

The parsers and DOM have a straightforward interface.

There is also a very nearly compatible version of the DOM used in std.xml
-- std.xmlp.arraydom.
The arraydom DocumentParser is also faster than the one in std.xml,
	 as it uses std.xmlp.coreparse.

It's not complete or final, nor has it been much reviewed.

The layout and interfaces seem to be OK.
I expect it's already more useful than std.xml.

Michael Rynn.
Dec 11 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/11/10 7:23 AM, Michael Rynn wrote:
 Availability of Updated xml parser for D2,
 	organised very presumptively as std.xml2

Great! Do you plan to submit this to Phobos?

Andrei
Dec 11 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/11/10 7:15 PM, Andrei Alexandrescu wrote:
 On 12/11/10 7:23 AM, Michael Rynn wrote:
 Availability of Updated xml parser for D2,
 organised very presumptively as std.xml2

 Great! Do you plan to submit this to Phobos?

One more thing - with XML parsers, I think Tango has definitely set the performance bar where it belongs. Any proposal for Phobos would need to meet it.

Andrei
Dec 11 2010
Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:
Andrei Alexandrescu wrote:

 On 12/11/10 7:15 PM, Andrei Alexandrescu wrote:
 On 12/11/10 7:23 AM, Michael Rynn wrote:
 Availability of Updated xml parser for D2,
 organised very presumptively as std.xml2

 Great! Do you plan to submit this to Phobos?

 One more thing - with XML parsers, I think Tango has definitely set the performance bar where it belongs. Any proposal for Phobos would need to meet it. Andrei

That is considerable. A quick benchmark suggests that a lot of work is needed. If you take into account that tango's xml parser does less validation and that it is up to par with the fastest C++ parsers out there, I suggest lowering the bar a little bit at first. For example, outperforming libxml2.
Dec 12 2010
Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:
so wrote:

 If you take into account that tango's xml parser does less validation and
 that it is up to par with the fastest C++ parsers out there, I suggest
 lowering the bar a little bit at first. For example, outperforming
 libxml2.

 There is no reason a D code should perform worse than C++ if you are not using some high level constructs. When it comes to strings/slicing/template, you might actually get performance boost comparing to C++. The C++ parser mentioned here (RapidXML) depends heavily on these.

I know, and Tango's parser is proof of that. But it can take a lot of work to get to that level. Right now we have an XML library that a lot of people don't want to use, that has bugs, and that performs 60 times worse than Tango's. Imho it's better to include it if performance is merely acceptable and see if it is possible to improve from there on.
Dec 12 2010
so <so so.do> writes:
 If you take into account that tango's xml parser does less validation and
 that it is up to par with the fastest C++ parsers out there, I suggest
 lowering the bar a little bit at first. For example, outperforming  
 libxml2.

There is no reason D code should perform worse than C++ if you are not using some high-level constructs. When it comes to strings, slicing and templates, you might actually get a performance boost compared to C++. The C++ parser mentioned here (RapidXML) depends heavily on these.
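
To make the slicing point concrete, here is a minimal D sketch of a parser-style helper that hands back a slice of its input instead of a copy. The name `firstTagName` is hypothetical and is not taken from Tango, RapidXML or std.xmlp; it only illustrates the general technique.

```d
import std.string : indexOf;

/// Hypothetical helper: returns the name of the first start tag as a
/// slice of `xml`. No characters are copied or allocated.
string firstTagName(string xml)
{
    auto open = xml.indexOf('<');
    if (open < 0)
        return null;                 // no tag found

    auto rest = xml[open + 1 .. $];  // slice just past the '<'
    size_t end = 0;
    while (end < rest.length
           && rest[end] != '>' && rest[end] != ' ' && rest[end] != '/')
        ++end;

    return rest[0 .. end];           // still a view into `xml`
}

void main()
{
    string doc = "<root attr='1'>text</root>";
    auto name = firstTagName(doc);

    assert(name == "root");
    // The slice points into the original buffer rather than at a copy.
    assert(name.ptr == doc.ptr + 1);
}
```

Because the result shares memory with the input, a parser built this way can emit element names and text without per-token allocations, as long as the input buffer outlives the slices.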
Dec 12 2010
so <so so.do> writes:
 Imho it's better to include it if performance is merely acceptable and see
 if it is possible to improve from there on.

On that I absolutely agree. People have this misconception that D should perform worse than said languages, so I had to state the obvious :)
Dec 12 2010
Eric Desbiens <olace.mail gmail.com> writes:
Hello,

It's great to see interest in replacing std.xml. I am also working on a
replacement for std.xml, so maybe we can collaborate on this and not duplicate
effort. We should choose one of our codebases and develop a strong alternative
from there.

I propose my codebase for the following 2 reasons:

1. It performs better and scales better with file size. Here's a quick
benchmark for DOM parsing on my computer. I don't know how well it performs
compared to Tango.

=== XMLP ===
XMLP 1Mb  Parsing time: 0.548 s
XMLP 11Mb Parsing time: 29.570 s
=== My Alternative* ===
Alt 1Mb  Parsing time: 0.134 s
Alt 11Mb Parsing time: 1.225 s

*This is using an XML 1.1 compliant parser.
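
As a rough check of the scaling claim, the sketch below divides the growth in parsing time by the growth in input size, using only the four timings quoted above. The function name `scalingFactor` is made up for this illustration. A result near 1 suggests roughly linear scaling; with these figures XMLP comes out at about 4.9 and the alternative at about 0.83.

```d
import std.stdio : writefln;

/// Hypothetical helper: ratio of time growth to input-size growth
/// between a small and a large run. About 1 means roughly linear.
double scalingFactor(double smallSecs, double largeSecs,
                     double smallSize, double largeSize)
{
    return (largeSecs / smallSecs) / (largeSize / smallSize);
}

void main()
{
    // Timings copied from the benchmark above (1Mb and 11Mb inputs).
    immutable xmlp = scalingFactor(0.548, 29.570, 1.0, 11.0);
    immutable alt  = scalingFactor(0.134, 1.225, 1.0, 11.0);

    writefln("XMLP scaling factor: %.2f", xmlp);
    writefln("Alt  scaling factor: %.2f", alt);
}
```

Two data points per parser are not enough to fit a complexity curve, so this only indicates that one parser's time grows much faster than its input does.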

2. It is more flexible.
All parsers are templated, and you can choose the degree of conformance,
whether namespaces are used, the type of entity decoding, and whether document
fragments are supported. It also parses any type of range whose element type
is some sort of character.
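
For readers unfamiliar with the idiom, accepting "any range whose element type is some sort of character" is usually expressed in D with a template constraint. The sketch below shows only the general pattern; `countOpenAngles` is a hypothetical function and is not taken from exp/xml.d.

```d
import std.algorithm.iteration : filter;
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;

/// Hypothetical helper: counts '<' characters in any input range whose
/// elements are char, wchar or dchar.
size_t countOpenAngles(R)(R input)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t n = 0;
    foreach (c; input)
        if (c == '<')
            ++n;
    return n;
}

void main()
{
    assert(countOpenAngles("<a><b/></a>") == 3);   // string
    assert(countOpenAngles("<a/>"w) == 1);         // wstring
    assert(countOpenAngles("<a/>"d) == 1);         // dstring
    // A lazy, non-array range of characters is accepted as well.
    assert(countOpenAngles("< a />".filter!(c => c != ' ')) == 1);
}
```

The constraint rejects non-character ranges at compile time, so one parser body can serve strings in different encodings as well as lazily produced input.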

Your library is more complete, though. It supports a SAX-like interface, has a
validating parser and tries to be compatible with std.xml (which I'm not sure
is needed). It also normalizes attributes, which mine does not. On compliance,
I think the 2 libraries are on the same level.

Feel free to talk about your code, show where it is better than mine, and say
whether you think it would be better to build on your code instead of mine.
Probably a mix of both libraries will make a better base. I think that if we
collaborate on this, we will make a great library.

Code can be downloaded from : https://github.com/olace/experimental

check exp/xml.d
Dec 12 2010