digitalmars.D.learn - Recommendations on parsing XML via an InputRange

reply Chris Piker <chris hoopjump.com> writes:
Hi D

I just finished a ~1K line project using `dxml` as the XML reader 
for my data streams.  It works well in my test examples using 
memory mapped files, but like an impulse shopper I didn't notice 
that dxml requires `ForwardRange` objects.  That's unfortunate, 
because my next enhancement was to start parsing streams as they 
come in from stdin. (doh!)
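
For illustration (not part of the original post), the mismatch can be checked at compile time. This is only a sketch: it assumes stdin is wrapped with Phobos `byChunk` and `joiner`, and the alias name `StdinBytes` is made up for the example.

```d
import std.algorithm.iteration : joiner;
import std.range.primitives : isForwardRange, isInputRange;
import std.stdio : stdin;

// Hypothetical alias: a flat byte stream built from 4 KiB chunks of stdin.
alias StdinBytes = typeof(stdin.byChunk(4096).joiner);

// The stream can be iterated once...
static assert(isInputRange!StdinBytes);
// ...but it has no save(), so it does not qualify as a forward range.
static assert(!isForwardRange!StdinBytes);
// An in-memory string is a forward range, which is why the
// memory-mapped test files worked.
static assert(isForwardRange!string);

void main() {}
```

The checks run entirely at compile time, so the program does nothing when executed.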

So I've learned my lesson and will RTFM closer next time, but now 
I'm casting about for a solution.  Two ideas, either:

1. Find a different StAX-ish parser that works with `InputRange` 
(and buffers internally a bit if needed), or

2. Find a way to represent standard input as a ForwardRange 
without saving the whole stream in memory. (iopipe?)

Dxml is very nice, as I have small sections of the stream that I 
parse into a DOM, but the majority of the items are handled and 
discarded element by element.

Any recommendations?
Sep 13 2021
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/13/21 10:43 PM, Chris Piker wrote:
 1. Find a different StAX-ish parser that works with `InputRange` (and 
 buffers internally a bit if needed), or
 
 2. Find a way to represent standard input as a ForwardRange without 
 saving the whole stream in memory. (iopipe?)

Iopipe is no better than an input range unless you plan to read the 
whole stream into a buffer.

A forward range is required because dxml uses saved ranges to refer to 
previous data. This requires the whole thing to be stored in memory.

I've thought of building an xml parser on top of iopipe, and I probably 
will some day (maybe a port of dxml). The iopipejson library does not 
require the whole thing to be in memory, and has some facilities to pin 
parsed data to jump back. I imagine something like that is doable for 
xml, but probably just storing current element ancestry while parsing 
(probably off to the side in another stack-like thing).

-Steve
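
The "element ancestry in a stack" idea can be sketched roughly as follows. This is not an iopipe-based parser and not anything from the thread: it runs dxml over an in-memory string purely to show how little state the stack needs. The helper name `textPaths` and the sample document are made up for the example, and dxml is assumed to be available as a dependency.

```d
// Assumes the dxml package is available (for example as a dub dependency).
import dxml.parser : EntityType, parseXML;
import std.array : join;
import std.stdio : writeln;

/// Hypothetical helper: returns "path: text" for every text node,
/// keeping only the names of the currently open elements as state.
string[] textPaths(string xml)
{
    string[] ancestry;   // stack of open element names
    string[] found;

    foreach (entity; parseXML(xml))
    {
        switch (entity.type)
        {
            case EntityType.elementStart:
                // A streaming parser would have to copy the name here,
                // because its buffer may be reused later on.
                ancestry ~= entity.name;
                break;
            case EntityType.elementEnd:
                ancestry = ancestry[0 .. $ - 1];
                break;
            case EntityType.text:
                found ~= ancestry.join("/") ~ ": " ~ entity.text;
                break;
            default:
                break;   // empty elements, comments, etc. are ignored here
        }
    }
    return found;
}

void main()
{
    foreach (line; textPaths("<stream><packet><x>1</x></packet><packet/></stream>"))
        writeln(line);
}
```

Only the names of the open elements are retained, so memory use grows with nesting depth rather than with the length of the stream.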
Sep 14 2021