digitalmars.D.announce - std.xml2 candidate

Michael Rynn <michaelrynn optusnet.com.au> writes:

Availability of Updated xml parser for D2,   
	organised very presumptively as std.xml2

Downloadable with SVN. 

svn co http://svn.dsource.org/projects/xmlp/trunk/std
(release 20).

This includes:
a conventional DOM of linked nodes -- std.xmlp.linkdom,
a core parser which emits parsed items -- std.xmlp.coreparse,
a validating parser including DOCTYPE validation -- std.xmlp.domparse.

Performance seems not too bad.  There are more lines of code,
but it does the same work as std.xml in about 65% of the time.

The well-formedness check is done during the parse, so there is no need for
a separate check.
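
As a rough illustration of checking well-formedness during the parse rather than in a second pass, here is a minimal sketch. It is written in Python for brevity and is not xmlp's code or API: the parse function, the TAG pattern, and the (kind, value) item tuples are all hypothetical, and only bare tags such as <a>, </a> and <a/> are handled (no attributes, comments, CDATA, or entities).

```python
import re

# Hypothetical, simplified tag pattern: matches <a>, </a> and <a/> only.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>")

def parse(text):
    """Yield (kind, value) items and check tag nesting in the same pass."""
    open_tags = []  # stack of element names that are currently open
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            # Character data between the previous tag and this one.
            yield ("text", text[pos:m.start()])
        closing, name, self_closing = m.groups()
        if closing:
            # An end tag must match the most recently opened element.
            if not open_tags or open_tags[-1] != name:
                raise ValueError(f"unexpected </{name}> at offset {m.start()}")
            open_tags.pop()
            yield ("end", name)
        else:
            yield ("start", name)
            if self_closing:
                yield ("end", name)
            else:
                open_tags.append(name)
        pos = m.end()
    if pos < len(text):
        yield ("text", text[pos:])
    if open_tags:
        raise ValueError(f"unclosed <{open_tags[-1]}>")
```

Because the stack is updated as each item is emitted, a mismatched end tag raises ValueError as soon as it is reached, and an unclosed element is reported at the end of the input, so the consumer never needs a separate checking pass.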

It takes string inputs or file inputs in various encodings.

The DOMErrorHandler DOM interface is included
in the validating parser for the linkdom.

The parsers and DOM have a straightforward interface.

There is also a very nearly compatible version of the DOM used in std.xml,
std.xmlp.arraydom.
The arraydom DocumentParser is also faster than the one in std.xml,
as it uses std.xmlp.coreparse.

It's not complete or final, nor much reviewed.

The layout and interfaces seem to be OK.
I expect it's already more useful than std.xml.

Michael Rynn.

Dec 11 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/11/10 7:23 AM, Michael Rynn wrote:
 Availability of Updated xml parser for D2,
 	organised very presumptively as std.xml2

[snip]

Great! Do you plan to submit this to Phobos?

Andrei

Dec 11 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/11/10 7:15 PM, Andrei Alexandrescu wrote:
 On 12/11/10 7:23 AM, Michael Rynn wrote:
 Availability of Updated xml parser for D2,
 organised very presumptively as std.xml2

 [snip]

 Great! Do you plan to submit this to Phobos?

One more thing - with XML parsers, I think Tango has definitely set the 
performance bar where it belongs. Any proposal for Phobos would need to 
meet it.

Andrei

Dec 11 2010

Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:

Andrei Alexandrescu wrote:

 On 12/11/10 7:15 PM, Andrei Alexandrescu wrote:
 On 12/11/10 7:23 AM, Michael Rynn wrote:
 Availability of Updated xml parser for D2,
 organised very presumptively as std.xml2

 [snip]

 Great! Do you plan to submit this to Phobos?

 
 One more thing - with XML parsers, I think Tango has definitely set the
 performance bar where it belongs. Any proposal for Phobos would need to
 meet it.
 
 Andrei

That is considerable. A quick benchmark suggests that a lot of work is 
needed. 

If you take into account that tango's xml parser does less validation and 
that it is up to par with the fastest C++ parsers out there, I suggest 
lowering the bar a little bit at first. For example, outperforming libxml2.

Dec 12 2010

so <so so.do> writes:

 If you take into account that tango's xml parser does less validation and
 that it is up to par with the fastest C++ parsers out there, I suggest
 lowering the bar a little bit at first. For example, outperforming  
 libxml2.

There is no reason D code should perform worse than C++ if you are not
using some high-level constructs.
When it comes to strings/slicing/templates, you might actually get a
performance boost compared to C++.
The C++ parser mentioned here (RapidXML) depends heavily on these.

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Dec 12 2010

Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:

so wrote:

 If you take into account that tango's xml parser does less validation and
 that it is up to par with the fastest C++ parsers out there, I suggest
 lowering the bar a little bit at first. For example, outperforming
 libxml2.

 
 There is no reason a D code should perform worse than C++ if you are not
 using some high level constructs.
 When it comes to strings/slicing/template, you might actually get
 performance boost comparing to C++.
 The C++ parser mentioned here (RapidXML) depends heavily on these.
 

I know, and Tango's parser is proof of that. But it can take a lot of work
getting to that level. Right now we have an XML library that a lot of people
don't want to use, that has bugs, and that performs 60 times worse than Tango's.

Imho it's better to include it if performance is merely acceptable and see 
if it is possible to improve from there on.

Dec 12 2010

so <so so.do> writes:

 Imho it's better to include it if performance is merely acceptable and see
 if it is possible to improve from there on.

On that I absolutely agree.
People have this misconception that D should perform worse than said
languages, so I had to state the obvious :)

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Dec 12 2010

Eric Desbiens <olace.mail gmail.com> writes:

Hello,

It's great to see interest in replacing std.xml. I am also working on a
replacement for std.xml; maybe we can collaborate on this and not duplicate
effort. We should choose one of our codebases and develop a strong
alternative from there.

I propose my codebase for the following 2 reasons:

1. It performs better and scales better with file size. Here's a quick benchmark
for DOM parsing on my computer. I don't know how well it performs compared to
Tango.

=== XMLP ===
XMLP 1Mb  Parsing time: 0.548 s
XMLP 11Mb Parsing time: 29.570 s
=== My Alternative* ===
Alt 1Mb  Parsing time: 0.134 s
Alt 11Mb Parsing time: 1.225 s

*This is using an XML 1.1 compliant parser.

2. It is more flexible.
All parsers are templated, and you can choose the degree of conformance, whether
namespaces are used, and the type of entity decoding; parsing document fragments
is also supported. It also parses any type of range whose element type is some
sort of character.
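
To make the scaling claim in point 1 concrete, the short sketch below works out the ratios from the four timings quoted above. It is only arithmetic on those figures, written in Python for convenience; the names timings, growth, and speedup are made up for this illustration, and it assumes the two test files really differ in size by about a factor of 11, as their labels suggest.

```python
# Timings quoted in the benchmark above, in seconds, keyed by file size in Mb.
timings = {
    "XMLP": {1: 0.548, 11: 29.570},
    "Alt": {1: 0.134, 11: 1.225},
}

def growth(name):
    """How many times longer the 11 Mb parse took than the 1 Mb parse."""
    return timings[name][11] / timings[name][1]

def speedup(size_mb):
    """How many times faster Alt was than XMLP at a given file size."""
    return timings["XMLP"][size_mb] / timings["Alt"][size_mb]

for name in timings:
    print(f"{name}: the 11 Mb file took {growth(name):.1f}x as long as the 1 Mb file")
for size in (1, 11):
    print(f"{size} Mb: Alt is {speedup(size):.1f}x faster than XMLP")
```

On these numbers the alternative parser's time grows by about 9x for roughly 11x the input, which is close to linear, while XMLP's grows by about 54x. The gap between the two therefore widens from about 4x on the small file to about 24x on the large one.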

Your library is more complete, though. It supports a SAX-like interface, has a
validating parser, and tries to be compatible with std.xml (which I'm not sure is
needed). It also normalizes attributes, which mine does not. On compliance, I
think the two libraries are on the same level.

Feel free to talk about your code, show where it is better than mine, and say if
you think it would be better to build on your code instead of mine. Probably a mix
of both libraries will make a better base. I think that if we collaborate on this,
we will make a great library.

Code can be downloaded from: https://github.com/olace/experimental

Check exp/xml.d.

Dec 12 2010
