
digitalmars.D - Std.xml is twice as slow on windows vs Linux. std.xml2 is pushing

Std.xml may have a druntime or compile issue on Windows.

CSV-formatted performance results, paste into a spreadsheet for better 

"platform+compile","input","parse","slice parse","parse+dom","slice
,"Percentage to 0.1",,,,,

The figure in the last column shows std.xml to be twice as slow on 
Windows. All programs were compiled in release mode.  Each row gives the 
execution times from a single running process, with each test run 
hundreds of times in succession and averaged. (sxml.d)
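
To make the measurement method concrete, here is a rough sketch of that kind of timing loop. It is written in Python purely for illustration and is not the code in sxml.d; the names average_seconds and slowdown are made up for this example.

```python
import time

def average_seconds(test, repeats=100):
    """Run `test` back to back in the same process and return the mean wall time.

    Hypothetical helper: `repeats` stands in for the "100s of times" above.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        test()
    return (time.perf_counter() - start) / repeats

def slowdown(seconds, baseline_seconds):
    """Ratio of one averaged time to another, e.g. 2.0 means twice as slow."""
    return seconds / baseline_seconds
```

Averaging many runs inside one process smooths out start-up cost and timer noise, which matters when the individual parse times are short.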

Input -  The document is passed through the buffer and filter only, to 
examine every Unicode character in the document as a dchar.
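
As an illustration of what the input test exercises, the sketch below reads a byte stream in fixed-size chunks and visits every decoded code point, which is roughly what a dchar is in D. It is Python, not the actual buffer and filter code; count_code_points, the chunk size and the UTF-8 default are assumptions for the example.

```python
import codecs
import io

def count_code_points(stream, chunk_size=4096, encoding="utf-8"):
    """Read `stream` in chunks and visit every decoded Unicode code point."""
    # The incremental decoder keeps partial multi-byte sequences between
    # chunks, so a character split across a buffer boundary is still decoded.
    decoder = codecs.getincrementaldecoder(encoding)()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for _ch in decoder.decode(chunk):
            total += 1  # a real filter would inspect the character here
    decoder.decode(b"", final=True)  # raises if the input ended mid-character
    return total
```

For example, count_code_points(io.BytesIO(data)) walks a whole in-memory document without building any parse items, so it gives a floor for how fast any of the parsers can go.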

Parse -  Core parser throughput, returning a copy of each XML node's 
content, tag name, and attribute name/value pairs. No data structures 
are created from the parse items returned.

Slice-Parse  -  Parser throughput using string aliases (slices) of the 
in-memory document, with far fewer reallocations (std.xmlp.sliceparse). 
No data structures are created from the parse items returned.

Parse+DOM -  Creates a DOM using immutable string duplicates of document 
content. Tag names and attribute names are aliased.

Slice+DOM -  DOM using the same aliased strings of the in-memory document.
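
The difference between the copying and slicing variants can be sketched as follows. This is Python rather than D, so memoryview stands in for a D string slice; the function names are hypothetical and are not the std.xmlp API.

```python
def tag_name_copy(doc: bytes, start: int, end: int) -> bytes:
    """Copying variant: the result owns its own bytes."""
    return bytes(doc[start:end])

def tag_name_slice(doc: bytes, start: int, end: int) -> memoryview:
    """Slicing variant: the result aliases the document, nothing is allocated
    for the characters themselves."""
    return memoryview(doc)[start:end]
```

The slice is cheaper to produce, but it holds a reference to the whole document (its .obj attribute is the original bytes object), whereas the copy can outlive the document.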

Anomaly for std.xml

Std.xml is actually pretty nifty on both Ubuntu compiles (dmd and gdc).
It is about twice as slow with Windows dmd, whereas the other 
implementation tests are quite similar between Ubuntu and Windows.  
On code inspection, Std.xml does seem to slice the original in-memory 
string. Building its array DOM model might be inefficient compared to 
the linked DOM model of the other DOM tests. I am not sure what the 
slowdown on Windows is due to. Any ideas please?

As for being ready to submit these varied replacements for std.xml, I can 
say that I hope it's getting close.  The actual XML parsers and DOM are 
complete as far as I know, in that I am only making minor changes as I 
work on the XPath 1.0 expression compile and execution and some XSLT, 
which are not yet test suite ready.  The validating parser is 100% XML 
test suite compliant.  I do not know what the submission process is.

The issue with a parser that references the original in-memory image is 
that when things get complicated, with different source encodings, the 
line end substitutions required for compliance, and character and 
standard entity replacements in content and attributes, much more of the 
original source ends up being replaced, so the speed advantage 
diminishes.  Also, the entire document stays in memory until the last 
reference is garbage collectable.  It's all on dsource.org/xml/trunk/
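
A small sketch of why the advantage shrinks: content with no entity references can be returned as a slice of the document, but as soon as a replacement is needed a new buffer has to be built. This is illustrative Python, not the dsource code; text_content is a hypothetical name, only the five standard XML entities are handled, and error reporting is left out.

```python
STANDARD_ENTITIES = {b"lt": b"<", b"gt": b">", b"amp": b"&",
                     b"apos": b"'", b"quot": b'"'}

def text_content(doc: bytes, start: int, end: int):
    """Return doc[start:end] with standard entities replaced.

    Returns a memoryview aliasing `doc` when nothing needs replacing,
    otherwise a freshly allocated bytes object.
    """
    if doc.find(b"&", start, end) == -1:
        return memoryview(doc)[start:end]  # fast path: pure slice
    out = bytearray()
    pos = start
    while pos < end:
        amp = doc.find(b"&", pos, end)
        if amp == -1:
            out += doc[pos:end]
            break
        semi = doc.find(b";", amp, end)
        name = doc[amp + 1:semi] if semi != -1 else None
        if name in STANDARD_ENTITIES:
            out += doc[pos:amp]
            out += STANDARD_ENTITIES[name]
            pos = semi + 1
        else:
            # Unknown or malformed reference: keep the "&" literally here;
            # a conforming parser would report an error instead.
            out += doc[pos:amp + 1]
            pos = amp + 1
    return bytes(out)
```

With a document like b"<a>plain</a><b>x &lt; y</b>", the first text node comes back as a slice and the second as a new allocation, so the more replacements a document needs, the closer the slicing parser gets to the copying one.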

Michael Rynn.
Feb 12 2011