
digitalmars.D - Re: High performance XML parser

Tomek Sowiński Wrote:

 One way is the slicing approach mentioned on this NG, notably used by
RapidXML. I already contacted Marcin (the author) to ensure that using
solutions inspired by his lib is OK with him; it is. But I don't think I'll go
this way. One reason is, surprisingly, performance. RapidXML cannot start
parsing until the entire document is loaded and ready as a random-access
string. Then it's blazingly fast but the time for I/O has already elapsed.
Besides, as Marcin himself said, we need a 100% W3C-compliant implementation
and RapidXML isn't one.
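
For readers unfamiliar with the slicing approach, here is a minimal sketch in Python. It is not RapidXML's code: the helper `slice_tag_names` and its regular expression are hypothetical, and it ignores comments, CDATA, and processing instructions. It only shows the idea that the parser keeps offsets into the fully loaded document rather than copies of its pieces, which is also why the whole document must be in memory first.

```python
import re

# Hypothetical helper for illustration only. Matches the name of a start tag
# (or empty-element tag); end tags begin with "</" and are skipped.
TAG_NAME = re.compile(r"<([A-Za-z_][\w.\-]*)")

def slice_tag_names(doc):
    """Return (start, end) offsets into doc for every start-tag name.

    Nothing is copied out of doc; each name is described only by where it
    sits in the original buffer, which must therefore be fully loaded.
    """
    return [m.span(1) for m in TAG_NAME.finditer(doc)]

doc = "<root><item id='1'>text</item><item/></root>"
spans = slice_tag_names(doc)

# Names are materialized only when the caller actually asks for them.
names = [doc[start:end] for start, end in spans]
```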
 
 I think a much more fertile approach is to operate on a forward range, perhaps
assuming buffered input. That way I can start parsing as soon as the first
buffer gets filled. Not to mention that the end result will use much less
memory. Plenty of the XML data stream is indents, spaces, and markup -- there's
no reason to copy all this into memory.
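
The buffered, forward-only idea above could be sketched roughly as follows. This is Python rather than D, and `read_chunks` and `stream_tokens` are hypothetical names, not part of any proposed library. The sketch assumes that `>` never appears inside attribute values, comments, or CDATA, so it is nowhere near W3C-compliant; it only shows tokens being emitted as soon as a buffer arrives, with whitespace-only runs dropped instead of stored.

```python
import io

def read_chunks(stream, size=4096):
    """Yield successive buffers from a text stream until it is exhausted."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk

def stream_tokens(chunks):
    """Yield ('tag', s) and ('text', s) tokens as soon as each is complete.

    Only the unconsumed tail of the input is kept in memory. Whitespace-only
    text between tags (indentation, newlines) is discarded.
    """
    pending = ""
    for chunk in chunks:
        pending += chunk
        while pending:
            if pending[0] == "<":
                end = pending.find(">")
                if end == -1:
                    break  # tag is split across buffers, wait for more input
                yield ("tag", pending[:end + 1])
                pending = pending[end + 1:]
            else:
                end = pending.find("<")
                if end == -1:
                    break  # text may continue in the next buffer
                text = pending[:end]
                if text.strip():
                    yield ("text", text)
                pending = pending[end:]
    if pending.startswith("<"):
        raise ValueError("unterminated tag at end of input")
    if pending.strip():
        yield ("text", pending)

doc = "<a>\n  <b>hi</b>\n</a>"
# A tiny buffer size forces tags to straddle buffer boundaries.
tokens = list(stream_tokens(read_chunks(io.StringIO(doc), size=3)))
```

Parsing starts after the first 3-character buffer here, and the indentation between `<a>` and `<b>` never reaches the caller.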

Did you measure how much memory is wasted by markup?
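
One rough way to get such a number, sketched in Python. `markup_breakdown` is a hypothetical helper and it counts characters, not bytes. It treats anything between `<` and `>` as markup, so comments or CDATA containing `>` would be miscounted. Run it over representative documents before drawing conclusions.

```python
import re

# Either a "<...>" run (treated as markup) or a run of non-"<" characters.
TOKEN = re.compile(r"<[^>]*>|[^<]+")

def markup_breakdown(doc):
    """Count characters spent on markup, whitespace, and actual text."""
    counts = {"markup": 0, "whitespace": 0, "text": 0}
    for match in TOKEN.finditer(doc):
        token = match.group()
        if token.startswith("<"):
            counts["markup"] += len(token)
        else:
            # Leading/trailing whitespace around text counts as whitespace.
            stripped = token.strip()
            counts["text"] += len(stripped)
            counts["whitespace"] += len(token) - len(stripped)
    return counts

sample = "<a>\n  <b>hi</b>\n</a>"
result = markup_breakdown(sample)
share = result["markup"] / len(sample)  # fraction of characters that is markup
```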
Feb 07 2011