digitalmars.D.announce - dxml 0.2.0 released

Jonathan M Davis (23/23) Feb 11 2018 dxml 0.2.0 has now been released.

Aravinda VK (50/76) Feb 11 2018 Awesome. Just tried it now as below and it works. Thanks for this
Chris (3/9) Feb 12 2018 Will this replace `std.xml` one day?

rikki cattermole (3/14) Feb 12 2018 As long as DTD support is essentially non-existent, my vote will always

Chris (5/20) Feb 12 2018 How hard would it be to add DTD support? One could take dxml and

rikki cattermole (10/30) Feb 12 2018 From what I read in the other thread, it would require a complete

Jonathan M Davis (42/53) Feb 12 2018 Maybe. That depends on community feedback and ultimately on the Phobos

Chris (21/42) Feb 12 2018 I thought the same when I glanced over std.xml. There's no DTD

rikki cattermole (10/54) Feb 12 2018 https://github.com/dlang-community/experimental.xml

Adam D. Ruppe (11/15) Feb 12 2018 About 5 years ago (I think, I actually have the link on my other

rikki cattermole (23/37) Feb 12 2018 It depends.

Jonathan M Davis (38/43) Feb 12 2018 That literally cannot be done. dxml returns slices (or takeExactly's) of...

rikki cattermole (10/18) Feb 12 2018 We are definitely not better off with just std.xml currently.

H. S. Teoh (29/35) Feb 12 2018 And thus Phobos continues to let the perfect be the enemy of the good,

rikki cattermole (14/57) Feb 12 2018 In other places it was said that it wasn't possible to build it on top
Nick Sabalausky (Abscissa) (11/25) Feb 12 2018 +Several billion.

Russel Winder (12/17) Feb 13 2018 The problem is that std.xml needs removing to make it clear there is

Adam D. Ruppe (4/6) Feb 12 2018 I wrote one 8 years ago... though mine is more focused on HTML
bachmeier (6/11) Feb 12 2018 Can't you simply give it a name other than std.xml that indicates

bachmeier (2/13) Feb 12 2018 Hit send too fast. std.xml.base would be reasonable.

Jonathan M Davis (14/29) Feb 12 2018 I have no interest in bikeshedding the name right now or even really arg...

H. S. Teoh (32/43) Feb 12 2018 Actually, thinking about this, I'm wondering if a combination of

rikki cattermole (4/12) Feb 12 2018 dxml 7.5k LOC

Chris (9/12) Feb 12 2018 How could it possibly make the situation any worse than it is

Jacob Carlborg (6/8) Feb 12 2018 I'm using std.xml in a new project right now. It's a really small

Chris (9/18) Feb 12 2018 A few lines of code that could be replaced easily once something

Jacob Carlborg (5/7) Feb 12 2018 Fairly easy because it's so small. I'm actually using the SAX interface

Nick Sabalausky (Abscissa) (3/8) Feb 12 2018 4.5k LOC == "a lot worse"?

Jonathan M Davis (22/29) Feb 12 2018 There is sometimes a tendency for folks to think that something having a...

Nick Sabalausky (Abscissa) (10/19) Feb 12 2018 Yea, totally. Another example: mysql-native used to be one (!!) source

Kagamin (3/11) Feb 13 2018 And it's like 2k LOC of code and 5.5k LOC of tests and docs.

Jonathan M Davis (65/88) Feb 12 2018 The core problem is that entity references get replaced with more XML th...

Kagamin (4/16) Feb 13 2018 Standard entities like & have the same problem, so the same

Jonathan M Davis (27/43) Feb 13 2018 That depends on what exactly an entity reference can contain. If it can ...

Patrick Schluter (31/82) Feb 13 2018 There's also the issue that entity references open a whole can of

Jonathan M Davis (48/52) Feb 13 2018 Well, if dxml just passes the entity references along unparsed beyond

Patrick Schluter (7/29) Feb 14 2018 Yikes! In any case, even if I had to implement a parser I would

Jonathan M Davis (18/50) Feb 14 2018 Well, since folks other than me are going to use this parser, and it's e...

H. S. Teoh (93/135) Feb 13 2018 This made me go to the W3C spec (https://www.w3.org/TR/xml/) to figure

Chris (8/32) Feb 14 2018 Thanks for the analysis. I'd say you're right. It makes no sense

H. S. Teoh (36/61) Feb 13 2018 AFAICT, section 4.3.2 in the spec (probably the one you're referring to)

Kagamin (9/12) Feb 14 2018 The parser now returns raw text, entity replacement can be done

Jonathan M Davis (13/15) Feb 14 2018 It's very difficult in general to write a parser that isn't at least a

rikki cattermole (34/52) Feb 14 2018 See lines:

Adrian Matoga (5/18) Feb 14 2018 `temp = input.save` is exactly what you want here, which means

rikki cattermole (2/22) Feb 14 2018 Ah I must be thinking of ranges that support indexing.

Jonathan M Davis (5/27) Feb 14 2018 Random access ranges are also forward ranges and would require a call to

rikki cattermole (2/33) Feb 14 2018 Luckily in my code I can forget that ;)

Jonathan M Davis (20/60) Feb 14 2018 wrote:

jmh530 (3/6) Feb 15 2018 That sounds like an interesting topic for a blog post.

Jonathan M Davis (38/41) Feb 13 2018 Well, there are plenty of folks who talk like XML is a pile of steaming ...

nkm1 (13/20) Aug 30 2018 Bump!

H. S. Teoh (15/31) Sep 13 2018 +1. I vote for adding dxml to Phobos.

H. S. Teoh (49/60) Feb 12 2018 [...]

Chris (14/20) Feb 13 2018 In this vein, if a new version of std.xml didn't offer pure and

Jonathan M Davis (31/58) Feb 12 2018 Which was my point. The API as-is doesn't work with DTD support for thos...
Jonathan M Davis (22/38) Feb 13 2018 XML 1.0 does not require the section - which is the main reas...

Johannes Loher (12/14) Feb 12 2018 Thank you very much for your efforts, I really appreciate it, as

Jonathan M Davis (9/23) Feb 12 2018 Thanks. When you do use it, please give feedback - particularly if you f...

Jesse Phillips (9/14) Feb 23 2018 This is absolutely awesome. It is a little low level (compared to

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

dxml 0.2.0 has now been released.

I really wasn't planning on releasing anything this quickly after announcing
dxml, but when I went to start working on DOM support, it turned out to be
surprisingly quick and easy to implement. So, dxml now has basic DOM
support.

As part of that, it became clear that dxml.parser.stax should be renamed to
dxml.parser, since it's really the only parser (DOM support involves just
providing a way to hold the results of the parser, not any actual parsing,
and that's clear from the API rather than being an implementation detail),
and it makes for a shorter import path. So, I figured that I should do a
release sooner rather than later to reduce how many folks the rename ends up
affecting.

For this release, dxml.parser.stax is now an empty, deprecated module that
publicly imports dxml.parser, but it will be removed in 0.3.0, whenever that
is released. So, the few folks who grabbed the initial release won't end up
with immediate code breakage if they upgrade.

One nice side effect of how I implemented DOM support is that it's trivial
to get the DOM for a portion of an XML document rather than the entire
thing, since it will produce a DOMEntity from any point in an EntityRange.

Documentation: http://jmdavisprog.com/docs/dxml/0.2.0/
Github: https://github.com/jmdavis/dxml/tree/v0.2.0
Dub: http://code.dlang.org/packages/dxml

- Jonathan M Davis

Feb 11 2018

Aravinda VK <mail aravindavk.in> writes:

On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis 
wrote:
 dxml 0.2.0 has now been released.

 I really wasn't planning on releasing anything this quickly 
 after announcing dxml, but when I went to start working on DOM 
 support, it turned out to be surprisingly quick and easy to 
 implement. So, dxml now has basic DOM support.

 As part of that, it became clear that dxml.parser.stax should 
 be renamed to dxml.parser, since it's really the only parser 
 (DOM support involves just providing a way to hold the results 
 of the parser, not any actual parsing, and that's clear from 
 the API rather than being an implementation detail), and it 
 makes for a shorter import path. So, I figured that I should do 
 a release sooner rather than later to reduce how many folks the 
 rename ends up affecting.

 For this release, dxml.parser.stax is now an empty, deprecated 
 module that publicly imports dxml.parser, but it will be 
 removed in 0.3.0, whenever that is released. So, the few folks 
 who grabbed the initial release won't end up with immediate 
 code breakage if they upgrade.

 One nice side effect of how I implemented DOM support is that 
 it's trivial to get the DOM for a portion of an XML document 
 rather than the entire thing, since it will produce a DOMEntity 
 from any point in an EntityRange.

 Documentation: http://jmdavisprog.com/docs/dxml/0.2.0/
 Github: https://github.com/jmdavis/dxml/tree/v0.2.0
 Dub: http://code.dlang.org/packages/dxml

 - Jonathan M Davis

Awesome. Just tried it now as below and it works. Thanks for this 
library

import std.stdio;

import dxml.dom;

struct Record
{
     string name;
     string email;
}


Record[] parseRecords(string xml)
{
     Record[] records;
     auto d = parseDOM!simpleXML(xml);
     auto root = d.children[0];

     foreach(record; root.children)
     {
         auto rec = Record();
         foreach(ele; record.children)
         {
             if (ele.name == "name")
                 rec.name = ele.children[0].text;
             if (ele.name == "email")
                 rec.email = ele.children[0].text;
         }
         records ~= rec;
     }

     return records;
}

void main()
{
     auto xml = "<root>\n" ~
         "    <record>\n" ~
         "        <name>N1</name>\n" ~
         "        <email>E1</email>\n" ~
         "    </record>\n" ~
         "    <record>\n" ~
         "        <name>N2</name>\n" ~
         "        <email>E2</email>\n" ~
         "    </record>\n" ~
         "    <record>\n" ~
         "        <email>E3</email>\n" ~
         "        <name>N3</name>\n" ~
         "    </record>\n" ~
         "<!--no comment -->\n" ~
         "</root>";
     auto records = parseRecords(xml);
     writeln(records);
}

Feb 11 2018

Chris <wendlec tcd.ie> writes:

On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis 
wrote:
 dxml 0.2.0 has now been released.

 I really wasn't planning on releasing anything this quickly 
 after announcing dxml, but when I went to start working on DOM 
 support, it turned out to be surprisingly quick and easy to 
 implement. So, dxml now has basic DOM support.

 [...]

Will this replace `std.xml` one day?

Feb 12 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/02/2018 12:38 PM, Chris wrote:
 On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis wrote:
 dxml 0.2.0 has now been released.

 I really wasn't planning on releasing anything this quickly after 
 announcing dxml, but when I went to start working on DOM support, it 
 turned out to be surprisingly quick and easy to implement. So, dxml 
 now has basic DOM support.

 [...]

 
 Will this replace `std.xml` one day?

As long as DTD support is essentially non-existent, my vote will always 
be no.

Feb 12 2018

Chris <wendlec tcd.ie> writes:

On Monday, 12 February 2018 at 12:49:30 UTC, rikki cattermole 
wrote:
 On 12/02/2018 12:38 PM, Chris wrote:
 On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis 
 wrote:
 dxml 0.2.0 has now been released.

 I really wasn't planning on releasing anything this quickly 
 after announcing dxml, but when I went to start working on 
 DOM support, it turned out to be surprisingly quick and easy 
 to implement. So, dxml now has basic DOM support.

 [...]

 
 Will this replace `std.xml` one day?

 As long as DTD support is essentially non-existent, my vote 
 will always be no.

How hard would it be to add DTD support? One could take dxml and 
extend it in order to include it in Phobos. I haven't used 
`std.xml` for years now. It is essentially dead and unusable atm.

Feb 12 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/02/2018 1:51 PM, Chris wrote:
 On Monday, 12 February 2018 at 12:49:30 UTC, rikki cattermole wrote:
 On 12/02/2018 12:38 PM, Chris wrote:
 On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis wrote:
 dxml 0.2.0 has now been released.

 I really wasn't planning on releasing anything this quickly after 
 announcing dxml, but when I went to start working on DOM support, it 
 turned out to be surprisingly quick and easy to implement. So, dxml 
 now has basic DOM support.

 [...]

 Will this replace `std.xml` one day?

 As long as DTD support is essentially non-existent, my vote will 
 always be no.

 
 How hard would it be to add DTD support? One could take dxml and extend 
 it in order to include it in Phobos. I haven't used `std.xml` for years 
 now. It is essentially dead and unusable atm.

From what I read in the other thread, it would require a complete 
redesign and a major performance hit.

I don't care what J.M.D. puts in his own library. We just can't 
advertise having an 'XML' library when we outright ignore a large 
portion of the specification (one that is fairly important to real-world 
adoption, IMO) for no other reason than the personal opinions of the author.

Now, if you want a subset as the 'default', with full support including 
DTD as an opt-in where the only difference is how you initialize the 
parser, I'd be happy, and so would our end users in the future.

Feb 12 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, February 12, 2018 12:38:51 Chris via Digitalmars-d-announce 
wrote:
 On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis

 wrote:
 dxml 0.2.0 has now been released.

 I really wasn't planning on releasing anything this quickly
 after announcing dxml, but when I went to start working on DOM
 support, it turned out to be surprisingly quick and easy to
 implement. So, dxml now has basic DOM support.

 [...]

 Will this replace `std.xml` one day?

Maybe. That depends on community feedback and ultimately on the Phobos
review process. Assuming that there's support for putting it through the
Phobos review process, then once I feel that it's complete enough and had
enough use to make it clear that I didn't miss something critical, then I'll
submit it for review.

What little feedback there has been thus far has been positive, but it would
be nice to get it battle-tested a bit, and there is still functionality that
I need to add.

Given that std.xml needs to be replaced, I think that it would be good if
dxml were able to do that, but that depends heavily on what others think of
what I've done and what they think Phobos' xml solution should look like.
But the way things are going, if dxml doesn't replace std.xml, I
don't know that anything ever will. XML parsers are one of those things that
everyone seems to want and no one seems to want to work on.

However, if folks as a whole think that Phobos' xml parser needs to support
the DTD section to be acceptable, then dxml won't replace std.xml, because
dxml is not going to implement DTD support. DTD support fundamentally does
not fit in with dxml's design. Someone would basically have to write an
entirely new parser to be able to handle it (some of dxml's internals could
be reused, but they'd also have to be refactored a fair bit, and a ton of
extra stuff would have to be added). Such a parser could theoretically
coexist with dxml's parser, since each would provide its own advantages, but
I have no plans to implement an XML parser to handle the DTD section. It's
simply not worth my time or effort, and this project has already taken way
more time and effort than I anticipated.

However, std.xml does not support the DTD section, and glancing over it, it
doesn't look like it even handles skipping the DTD section properly (it
doesn't handle the fact that '>' can appear within quoted sections within
the DTD). So, dxml is not worse than std.xml in that regard, and we wouldn't
lose any functionality by having dxml replace std.xml. It just wouldn't
necessarily do as much as some folks might like.
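
To make the point about '>' inside quoted sections concrete, here is a minimal sketch of skipping a DOCTYPE declaration without interpreting it. It is not taken from dxml or std.xml; `skipDoctype` is a hypothetical helper, and it deliberately ignores comments and processing instructions inside the internal subset, so treat it as an illustration rather than a complete implementation.

```d
import std.algorithm.searching : startsWith;

/// Hypothetical helper: returns the index just past the '>' that ends the
/// DOCTYPE declaration at the start of `xml`, or -1 if none is found.
/// Simplification: comments and PIs inside the internal subset are not
/// treated specially.
ptrdiff_t skipDoctype(string xml)
{
    if (!xml.startsWith("<!DOCTYPE"))
        return -1;

    char quote = 0; // active quote character, or 0 when outside a literal
    int depth = 0;  // greater than 0 while inside the [...] internal subset

    foreach (i, char c; xml)
    {
        if (quote != 0)
        {
            if (c == quote)
                quote = 0; // literal ends; any '>' inside it was ignored
        }
        else if (c == '"' || c == '\'')
            quote = c;
        else if (c == '[')
            ++depth;
        else if (c == ']')
            --depth;
        else if (c == '>' && depth == 0)
            return cast(ptrdiff_t)(i + 1);
    }
    return -1;
}

void main()
{
    // The '>' inside "a>b" and the one closing <!ENTITY ...> must not
    // be mistaken for the end of the DOCTYPE declaration.
    auto xml = `<!DOCTYPE r [<!ENTITY e "a>b">]><r/>`;
    auto end = skipDoctype(xml);
    assert(xml[end .. $] == "<r/>");
}
```

A scan that simply stops at the first '>' would end inside the quoted entity value in this input, which is the kind of mistake described above.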

My guess is that DTD support won't be a deal breaker given that std.xml
doesn't support it, that std.xml has needed to be replaced for years now,
and that no one else is working on replacing it, but I don't know.
Disagreements over what should be done with std.json's replacement have meant
that it has never been replaced even though significant work was done
towards replacing it, so unfortunately, there's already precedent for a
module not being replaced with something better due to disagreements over
what the replacement would ideally be. So, I don't know.

- Jonathan M Davis

Feb 12 2018

Chris <wendlec tcd.ie> writes:

On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis 
wrote:
 On Monday, February 12, 2018 12:38:51 Chris via 
 Digitalmars-d-announce wrote:
 On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis


 However, std.xml does not support the DTD section, and glancing 
 over it, it doesn't look like it even handles skipping the DTD 
 section properly (it doesn't handle the fact that '>' can 
 appear within quoted sections within the DTD). So, dxml is not 
 worse than std.xml in that regard, and we wouldn't lose any 
 functionality by having dxml replace std.xml. It just wouldn't 
 necessarily do as much as some folks might like.

I thought the same when I glanced over std.xml. There's no DTD 
support there either and I don't think it would be a deal breaker 
for most users.

 My guess is that DTD support won't be a deal breaker given that 
 std.xml doesn't support it, that std.xml has needed to be 
 replaced for years now, and that no one else is working on 
 replacing it, but I don't know. Disagreements over what should 
 be done with std.json's replacement have meant that it has never 
 been replaced even though significant work was done towards 
 replacing it, so unfortunately, there's already precedent for 
 a module not being replaced with something better due to 
 disagreements over what the replacement would ideally be. So, I 
 don't know.

 - Jonathan M Davis

Wasn't there a replacement module that never got past the initial 
review steps? Some GSoC thing or so. But I wonder if that module 
would be up to the latest D standards.

While one may argue that DTD support is important, I would rather 
have something fast and simple like dxml that covers, say, 90% of 
the cases than nothing. It doesn't make sense to me that we 
should accept the current situation, only because of some 
bikeshedding that concerns 10% of the use cases. After all, it's 
only a module not a fundamental decision that concerns the 
direction D will take in the future. I think stuff like that can 
seriously turn off potential users. A lot of useful things begin 
with one person deciding to give it a go. vibe.d, dub, DScanner 
and DlangUI, for example. If the creators had started 
bikeshedding before writing the first line of code, there would 
still be a flamewar about the best way to go about it - and 
nothing would have happened.

Feb 12 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/02/2018 2:45 PM, Chris wrote:
 On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis wrote:
 On Monday, February 12, 2018 12:38:51 Chris via Digitalmars-d-announce 
 wrote:
 On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis


 
 However, std.xml does not support the DTD section, and glancing over 
 it, it doesn't look like it even handles skipping the DTD section 
 properly (it doesn't handle the fact that '>' can appear within quoted 
 sections within the DTD). So, dxml is not worse than std.xml in that 
 regard, and we wouldn't lose any functionality by having dxml replace 
 std.xml. It just wouldn't necessarily do as much as some folks might 
 like.

 
 I thought the same when I glanced over std.xml. There's no DTD support 
 there either and I don't think it would be a deal breaker for most users.
 
 My guess is that DTD support won't be a deal breaker given that 
 std.xml doesn't support it, that std.xml has needed to be replaced for 
 years now, and that no one else is working on replacing it, but I 
 don't know. Disagreements over what should be done with std.json's 
 replacement have meant that it has never been replaced even though 
 significant work was done towards replacing it, so unfortunately, 
 there's already precedent for a module not being replaced with 
 something better due to disagreements over what the replacement would 
 ideally be. So, I don't know.

 - Jonathan M Davis

 
 Wasn't there a replacement module that never got past the initial review 
 steps? Some GSoC thing or so. But I wonder if that module would be up to 
 the latest D standards.

https://github.com/dlang-community/experimental.xml

Code isn't great, and not complete yet.
Author has just disappeared sadly.

 While one may argue that DTD support is important, I would rather have 
 something fast and simple like dxml that covers, say, 90% of the cases 
 than nothing. It doesn't make sense to me that we should accept the 
 current situation, only because of some bikeshedding that concerns 10% 
 of the use cases. After all, it's only a module not a fundamental 
 decision that concerns the direction D will take in the future. I think 
 stuff like that can seriously turn off potential users. A lot of useful 
 things begin with one person deciding to give it a go. vibe.d, dub, 
 DScanner and DlangUI, for example. If the creators had started 
 bikeshedding before writing the first line of code, there would still be 
 a flamewar about the best way to go about it - and nothing would have 
 happened.

Everything you have mentioned is not in Phobos. Just because something 
is 'good enough' does not make it 'good enough' for Phobos. In the words 
of Andrei "Good enough is not good enough", we need to aim higher to 
show what we actually can do.

Personally I find J.M.D.'s arguments quite reasonable for a third-party 
library, since yes it does cover 90% of the use cases.

Feb 12 2018

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 12 February 2018 at 14:54:48 UTC, rikki cattermole 
wrote:
 Just because something is 'good enough' does not make it 'good 
 enough' for Phobos. In the words of Andrei "Good enough is not 
 good enough", we need to aim higher to show what we actually 
 can do.

About 5 years ago (I think, I actually have the link on my other 
computer but it is 2,000 miles away right now), Andrei said 
something along the lines of "without the review process, we get 
junk like std.json".

Ironically, that same review process may be why we still have 
such "junk". (actually personally, I don't hate std.json).

If std.xml is really so bad and has been for so long, surely we 
ought to take an opportunity to change that, even if the change 
isn't perfect.

Feb 12 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/02/2018 3:08 PM, Adam D. Ruppe wrote:
 On Monday, 12 February 2018 at 14:54:48 UTC, rikki cattermole wrote:
 Just because something is 'good enough' does not make it 'good enough' 
 for Phobos. In the words of Andrei "Good enough is not good enough", 
 we need to aim higher to show what we actually can do.

 
 About 5 years ago (I think, I actually have the link on my other 
 computer but it is 2,000 miles away right now), Andrei said something 
 along the lines of "without the review process, we get junk like std.json".
 
 Ironically, that same review process may be why we still have such 
 "junk". (actually personally, I don't hate std.json).
 
 If std.xml is really so bad and has been for so long, surely we ought to 
 take an opportunity to change that, even if the change isn't perfect.

It depends.

The implementation does not need to be perfect or full fledged to go 
into experimental.

But if, at the start of the review process, it is already well known that 
the public API would require a complete change to accommodate the 
intended goal, that is unacceptable.

Take std.experimental.allocator as an example. It is currently going 
through a massive API change, but when it first got PR'd, did we know 
that we should be RC'ing allocators? No, of course not; otherwise we'd 
have done it.

At this point in time, I cannot in good faith say that dxml represents 
the XML specification in full for the D community. Unfortunately, this is 
not about bikeshedding.

It is one thing to bikeshed features, but when the scope does not match the 
intended goal, we have got to be careful about what goes into Phobos.

All J.M.D. has to do to change this, is make the API match the spec (as 
close as possible, without writing another parser) and separate out the 
implementation into a different and very clear module (probably a sub 
package) which states clearly that it is a subset with the full grammar 
listed that it supports.

That way everybody is clear and we can later on get a full 
implementation as part of taking it out of experimental :)

Feb 12 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, February 12, 2018 15:26:24 rikki cattermole via Digitalmars-d-
announce wrote:
 All J.M.D. has to do to change this, is make the API match the spec (as
 close as possible, without writing another parser) and separate out the
 implementation into a different and very clear module (probably a sub
 package) which states clearly that it is a subset with the full grammar
 listed that it supports.

That literally cannot be done. dxml returns slices (or takeExactly's) of the
original input. For it to do otherwise would harm performance and usability,
but in order to implement full DTD support, it's impossible to return slices
of the original input in the general case, because you have to be able to
mutate the data whenever entity references get involved. If the API were
entirely string-based, then whether the implementation returned slices or
newly allocated strings could be an implementation detail, but as soon as
you're dealing with arbitrary ranges of characters, that doesn't work. At
that point, you're forced to either return strings for everything (which
means allocating for any ranges that aren't strings) or to return a lazy
range of characters and thus can't return the original type. And that means
that if you pass it a string, you're stuck with a lazy range out the other
end instead of a string, and to get a string again, you have to allocate,
whereas with what I have now, the parser does almost no allocations, and as
long as the input type supports slicing, you get exactly the same type out
the other end, which is a huge usability improvement IMHO.
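
The "same type out the other end" behaviour can be sketched with plain Phobos ranges. This is not dxml's implementation; `firstN` is a made-up helper that only mirrors the slice-or-takeExactly choice described above.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.range : hasSlicing, takeExactly;
import std.traits : isSomeString;

/// Hypothetical helper: returns the first n elements of the input without
/// allocating. Sliceable inputs come back as their own type; anything else
/// comes back wrapped by takeExactly.
auto firstN(R)(R input, size_t n)
{
    static if (isSomeString!R || hasSlicing!R)
        return input[0 .. n];        // a slice of the original input
    else
        return input.takeExactly(n); // lazy wrapper, still no allocation
}

void main()
{
    // String in, string out.
    auto s = firstN("<root/>", 5);
    static assert(is(typeof(s) == string));
    assert(s == "<root");

    // A filtered range cannot be sliced, so a takeExactly wrapper comes back.
    auto lazyInput = "<root/>".filter!(c => c != ' ');
    auto t = firstN(lazyInput, 5);
    static assert(!is(typeof(t) == string));
    assert(t.equal("<root"));
}
```

Neither branch allocates, and neither can substitute different text for what the input contains, which is why entity replacement does not fit this kind of API.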

So, you can't have DTD support with the kind of API that dxml has, and
changing the API to something that could work with DTD support would harm
the parser for all of the cases where DTD support is unnecessary.
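
As a rough illustration of why replacing references rules out slices, consider the sketch below. It uses the predefined &amp; reference only because it is easy to show; the same reasoning applies to any reference whose replacement text differs from the characters in the input. `rawText` and `decodeAmp` are hypothetical helpers, not part of dxml.

```d
import std.array : replace;

/// Hypothetical helper: the text between two indices, as a slice of the input.
string rawText(string xml, size_t start, size_t end)
{
    return xml[start .. end];
}

/// Hypothetical helper: replaces the &amp; reference with the character it
/// stands for. Because the result differs from the input, it cannot be a slice.
string decodeAmp(string raw)
{
    return raw.replace("&amp;", "&");
}

void main()
{
    string xml = "<name>Tom &amp; Jerry</name>";

    auto raw = rawText(xml, 6, 21);
    assert(raw == "Tom &amp; Jerry");
    assert(raw.ptr == xml.ptr + 6);  // still points into the original buffer

    auto decoded = decodeAmp(raw);
    assert(decoded == "Tom & Jerry");
    assert(decoded.ptr != raw.ptr);  // a new buffer had to be allocated
}
```

With a string-only API, that allocation could be hidden as an implementation detail; with arbitrary character ranges, the return type itself would have to change.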

Even if I were going to implement full DTD support, I would do it with
another parser, not change the parser that dxml already has. And if dxml
ends up in Phobos with the parser that it has, that doesn't prevent another
parser from being added for the DTD case later if someone actually decides
to put in the time and effort to do it. Either way, for any XML document
that doesn't need DTD support, the way that dxml does things is more
efficient and user-friendly than one that had DTD support would be, much as
that obviously doesn't cut it for those documents that do need DTD support.

In any case, I'm going to finish implementing dxml without any kind of DTD
support and then see how things go as far as the Phobos review process goes.
If dxml gets rejected, because the majority of folks think that we're better
off with std.xml (or no xml parser at all in Phobos) than one that doesn't
have DTD support, then oh well. That sucks, but anyone who wants dxml can
then use it as a 3rd party library. I think that the D community would be
worse off because of that, but it's not ultimately my decision to make, and
either way, I have the parser that I need.

- Jonathan M Davis

Feb 12 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/02/2018 3:50 PM, Jonathan M Davis wrote:
 In any case, I'm going to finish implementing dxml without any kind of DTD
 support and then see how things go as far as the Phobos review process goes.
 If dxml gets rejected, because the majority of folks think that we're better
 off with std.xml (or no xml parser at all in Phobos) than one that doesn't
 have DTD support, then oh well. That sucks, but anyone who wants dxml can
 then use it as a 3rd party library. I think that the D community would be
 worse off because of that, but it's not ultimately my decision to make, and
 either way, I have the parser that I need.

We are definitely not better off with just std.xml currently.

The problem comes from the word 'currently'. By going into Phobos, even if 
experimental, it's going to be around for a while in some form or 
another. So we need to invest a decent amount of time into not creating 
more problems for new users who expect the world and don't get it.

If somebody (say a student?) were to write up a proper API and use dxml 
as a basis for a simpler parser, now that could be a worthwhile project 
and definitely could go into Phobos.

I may even consider doing it at some point in the future.

Feb 12 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Feb 12, 2018 at 02:54:48PM +0000, rikki cattermole via
Digitalmars-d-announce wrote:
[...]
 Everything you have mentioned is not in Phobos. Just because something
 is 'good enough' does not make it 'good enough' for Phobos. In the
 words of Andrei "Good enough is not good enough", we need to aim
 higher to show what we actually can do.

And thus Phobos continues to let the perfect be the enemy of the good,
and 10 years later std.xml will still be around, and we will still be
arguing over how to replace it.


 Personally I find J.M.D.'s arguments quite reasonable for a third-party
 library, since yes it does cover 90% of the use cases.

As I have just said in another post, dxml itself does not need to be
changed to implement DTD support.  It's perfectly possible to write a
wrapper on top of it that *does* implement DTD support.  In fact, I dare
say it might be possible to lazily switch from a thin wrapper over dxml
to full DTD mode, so that end users don't even need to care about the
difference if they don't care to.

As far as API is concerned, it could be as simple as something like:

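	// Sketch only: dxmlParse and dtdWrapper are placeholder names.
	// Flag and Yes come from std.typecons.
	auto parseXml(Flag!"dtdSupport" dtdSupport = Yes.dtdSupport, R)(R input)
		if (...)
	{
		static if (dtdSupport)
			return dtdWrapper(dxmlParse(input));
		else
			return dxmlParse(input);
	}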

Then just note in the documentation that turning off DTD support would
provide extra features X, Y, and Z (speed, slices, whatever). Then let
the user choose.

Seriously, I would have thought something like this would be obvious to
programmers of the calibre found on these forums.  I'm a little
astonished that this would even be such a point of contention in the
first place, since the solution is so simple.


T

-- 
Many open minds should be closed for repairs. -- K5 user

Feb 12 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/02/2018 10:02 PM, H. S. Teoh wrote:
 On Mon, Feb 12, 2018 at 02:54:48PM +0000, rikki cattermole via
Digitalmars-d-announce wrote:
 [...]
 Everything you have mentioned is not in Phobos. Just because something
 is 'good enough' does not make it 'good enough' for Phobos. In the
 words of Andrei "Good enough is not good enough", we need to aim
 higher to show what we actually can do.

 
 And thus Phobos continues to let the perfect be the enemy of the good,
 and 10 years later std.xml will still be around, and we will still be
 arguing over how to replace it.
 
 
 Personally I find J.M.D.'s arguments quite reasonable for a third-party
 library, since yes it does cover 90% of the use cases.

 
 As I have just said in another post, dxml itself does not need to be
 changed to implement DTD support.  It's perfectly possible to write a
 wrapper on top of it that *does* implement DTD support.  In fact, I dare
 say it might be possible to lazily switch from a thin wrapper over dxml
 to full DTD mode, so that end users don't even need to care about the
 difference if they don't care to.
 
 As far as API is concerned, it could be as simple as something like:
 
 	auto parseXml(R, DtdSupport = dtdSupport.true)(R input) if (...)
 	{
 		static if (DtdSupport)
 			return dtdWrapper(dxmlParse(input));
 		else
 			return dxmlParse(input);
 	}
 
 Then just note in the documentation that turning off DTD support would
 provide extra features X, Y, and Z (speed, slices, whatever). Then let
 the user choose.
 
 Seriously, I would have thought something like this would be obvious to
 programmers of the calibre found on these forums.  I'm a little
 astonished that this would even be such a point of contention in the
 first place, since the solution is so simple.
 
 
 T

In other places it was said that it wasn't possible to build it on top 
of it.

But yes, I would expect an entry point like the one you described, and it 
is something that I mentioned :)

std.experimental.xml:
	- interfaces.d: interface Element {...}
	- entry.d: auto parseXML(...)(...) {...}
	- impl_subset:
		- dom.d
		etc.
	- impl_full:
		- entry.d
		etc.

Feb 12 2018

"Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:

On 02/12/2018 05:02 PM, H. S. Teoh wrote:
 On Mon, Feb 12, 2018 at 02:54:48PM +0000, rikki cattermole via
Digitalmars-d-announce wrote:
 [...]
 Everything you have mentioned is not in Phobos. Just because something
 is 'good enough' does not make it 'good enough' for Phobos. In the
 words of Andrei "Good enough is not good enough", we need to aim
 higher to show what we actually can do.

 
 And thus Phobos continues to let the perfect be the enemy of the good,
 and 10 years later std.xml will still be around, and we will still be
 arguing over how to replace it.

+Several billion.

Like the improved assert messages we would've had many years ago. They 
were implemented, done and ready to go, but were instead thrown 
away because...(and here's the real kicker, considering current D 
climate)...because it was a fully in-library solution instead of a new 
compiler feature. Go figure ::eyeroll::

 Seriously, I would have thought something like this would be obvious to
 programmers of the calibre found on these forums.  I'm a little
 astonished that this would even be such a point of contention in the
 first place, since the solution is so simple.

I would've expected so too, if it weren't that one of the top favorite 
activities 'round these parts is nitpicking reasonable ideas to death 
for stupid reasons. And, generally letting the perfect be the enemy of 
the good.

Feb 12 2018

Russel Winder <russel winder.org.uk> writes:

On Mon, 2018-02-12 at 14:54 +0000, rikki cattermole via
Digitalmars-d-announce wrote:
 […]
 
 Personally I find J.M.D. arguments quite reasonable for a third-party
 library, since yes it does cover 90% of the use cases.

The problem is that std.xml needs removing to make it clear there is
no good XML package in Phobos. Then people will go looking in the Dub
repository.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk

Feb 13 2018

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis 
wrote:
 XML parsers are one of those things that everyone seems to want 
 and no one seems to want to work on.

I wrote one 8 years ago... though mine is more focused on HTML 
parsing, and the XML aspect is just a side effect!

Feb 12 2018

bachmeier <no spam.net> writes:

On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis 
wrote:

 However, if folks as a whole think that Phobos' xml parser 
 needs to support the DTD section to be acceptable, then dxml 
 won't replace std.xml, because dxml is not going to implement 
 DTD support. DTD support fundamentally does not fit in with 
 dxml's design.

Can't you simply give it a name other than std.xml that indicates 
it doesn't do everything related to xml? It doesn't make sense to 
not put it into Phobos because of the name, and that should be an 
easy problem to solve.

Feb 12 2018

bachmeier <no spam.net> writes:

On Monday, 12 February 2018 at 15:43:59 UTC, bachmeier wrote:
 On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis 
 wrote:

 However, if folks as a whole think that Phobos' xml parser 
 needs to support the DTD section to be acceptable, then dxml 
 won't replace std.xml, because dxml is not going to implement 
 DTD support. DTD support fundamentally does not fit in with 
 dxml's design.

 Can't you simply give it a name other than std.xml that 
 indicates it doesn't do everything related to xml? It doesn't 
 make sense to not put it into Phobos because of the name, and 
 that should be an easy problem to solve.

Hit send too fast. std.xml.base would be reasonable.

Feb 12 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, February 12, 2018 15:45:50 bachmeier via Digitalmars-d-announce 
wrote:
 On Monday, 12 February 2018 at 15:43:59 UTC, bachmeier wrote:
 On Monday, 12 February 2018 at 14:04:38 UTC, Jonathan M Davis

 wrote:
 However, if folks as a whole think that Phobos' xml parser
 needs to support the DTD section to be acceptable, then dxml
 won't replace std.xml, because dxml is not going to implement
 DTD support. DTD support fundamentally does not fit in with
 dxml's design.

 Can't you simply give it a name other than std.xml that
 indicates it doesn't do everything related to xml? It doesn't
 make sense to not put it into Phobos because of the name, and
 that should be an easy problem to solve.

 Hit send too fast. std.xml.base would be reasonable.

I have no interest in bikeshedding the name right now or even really arguing
about Phobos inclusion (I've already said more in this thread about that
than I probably should have). That can be left up to the review process,
which already tends to be nasty enough that it wouldn't surprise me at all
if dxml doesn't get accepted. The only reason that I have any plans to try
for Phobos inclusion with dxml is because std.xml needs to be replaced. If
Phobos didn't have an XML parser already, I don't expect that I'd bother,
since I don't think that it's all that important that a standard library
have an XML parser. I just think that it's important that it not have a
bad one. In general, I think that XML is the sort of thing that's perfectly
fine as a 3rd party solution.

- Jonathan M Davis

Feb 12 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Feb 12, 2018 at 07:04:38AM -0700, Jonathan M Davis via
Digitalmars-d-announce wrote:
[...]
 However, if folks as a whole think that Phobos' xml parser needs to
 support the DTD section to be acceptable, then dxml won't replace
 std.xml, because dxml is not going to implement DTD support. DTD
 support fundamentally does not fit in with dxml's design.

Actually, thinking about this, I'm wondering if a combination of
preprocessing and/or postprocessing might make it possible to implement
DTD support without needing to rewrite the guts of dxml. AIUI, dxml does
parse the DTD section correctly, i.e., as an XML directive, but only
doesn't look into its internal details. So one way to implement DTD
support might be:

- Write an auxiliary parser that's basically a wrapper around dxml,
  forwarding XML events to the caller, except:
- If a DTD event is encountered, eagerly parse it, store DTD
  declarations internally for future reference.
- If there's a DTD that has been seen, perform on-the-fly validation as
  XML events are forwarded.
- In PCDATA sections, if there are entity references to the DTD, expand
  them, possibly inserting more XML events into the stream based on
  what's defined in the DTD. (This may need to reuse some dxml internals
  to parse XML snippets that might be contained in an entity definition,
  for example.)
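
The entity-expansion step in that list can be sketched as follows. This is a simplified illustration under stated assumptions: the declarations are assumed to have already been collected into an associative array, the replacement text is treated as plain text (no markup inside it is parsed, and no validation is done), and `expandEntities` is a hypothetical helper that uses nothing from dxml:

```d
import std.array : appender;
import std.string : indexOf;

/// Expands references such as "&name;" using the declarations in `entities`.
/// Unknown references are copied through unchanged.
/// Throws if expansion nests deeper than `maxDepth` levels.
string expandEntities(string text, const string[string] entities,
                      size_t maxDepth = 8)
{
    auto result = appender!string();
    while (text.length)
    {
        immutable amp = text.indexOf('&');
        if (amp < 0)
        {
            result.put(text);           // no more references
            break;
        }
        result.put(text[0 .. amp]);     // copy the text before the '&'
        text = text[amp .. $];
        immutable semi = text.indexOf(';');
        if (semi < 0)
        {
            result.put(text);           // unterminated '&': copy as-is
            break;
        }
        immutable name = text[1 .. semi];
        if (auto value = name in entities)
        {
            if (maxDepth == 0)
                throw new Exception("entity expansion nested too deeply");
            // The replacement text may itself contain references.
            result.put(expandEntities(*value, entities, maxDepth - 1));
        }
        else
            result.put(text[0 .. semi + 1]);
        text = text[semi + 1 .. $];
    }
    return result.data;
}

unittest
{
    auto entities = ["who": "&greeting;, world", "greeting": "hello"];
    assert(expandEntities("say: &who;!", entities) == "say: hello, world!");
}
```

The depth limit only guards against self-referencing declarations; it says nothing about how large a legitimate expansion may grow.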


[...]
 However, std.xml does not support the DTD section, and glancing over
 it, it doesn't look like it even handles skipping the DTD section
 properly (it doesn't handle the fact that '>' can appear within quoted
 sections within the DTD). So, dxml is not worse than std.xml in that
 regard, and we wouldn't lose any functionality by having dxml replace
 std.xml. It just wouldn't necessarily do as much as some folks might
 like.

[...]

If std.xml currently does not support DTDs, then I say dxml is
definitely a Phobos candidate.  At the very least, it does not make the
current situation worse.  Rejecting dxml because it doesn't support DTDs
is basically letting the perfect be the enemy of the good, which is
something this community has been plagued with for far too long.  What's
worse: a std.dxml that doesn't support DTDs, or a std.xml with
fundamental problems that continue to plague us for the next decade
while nobody else steps up to implement a suitable replacement?


T

-- 
Ph.D. = Permanent head Damage

Feb 12 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 12/02/2018 3:59 PM, H. S. Teoh wrote:
 If std.xml currently does not support DTDs, then I say dxml is
 definitely a Phobos candidate.  At the very least, it does not make the
 current situation worse.  Rejecting dxml because it doesn't support DTDs
 is basically letting the perfect be the enemy of the good, which is
 something this community has been plagued with for far too long.  What's
 worse: a std.dxml that doesn't support DTDs, or a std.xml with
 fundamental problems that continue to plague us for the next decade
 while nobody else steps up to implement a suitable replacement?

dxml 7.5k LOC
std.xml 3k LOC

dxml would make the situation a lot worse.

Feb 12 2018

Chris <wendlec tcd.ie> writes:

On Monday, 12 February 2018 at 16:15:54 UTC, rikki cattermole 
wrote:

 dxml 7.5k LOC
 std.xml 3k LOC

 dxml would make the situation a lot worse.

How could it possibly make the situation any worse than it is 
now? Atm, nobody will ever use std.xml, because it is 
sub-standard and has no future.

As others have already mentioned: a DTD parser can still be added 
at a later point. It's like not moving into a newly built house, 
because the winter garden is not yet finished (and you live in 
Florida :)

Feb 12 2018

Jacob Carlborg <doob me.com> writes:

On 2018-02-12 17:49, Chris wrote:

 How could it possibly make the situation any worse than it is now? Atm,
 nobody will ever use std.xml, because it is sub-standard and has no future.

I'm using std.xml in a new project right now. It's a really small 
private project that just needs to extract some data from an XML 
document. I started it a couple of days before dxml was announced.

-- 
/Jacob Carlborg

Feb 12 2018

Chris <wendlec tcd.ie> writes:

On Monday, 12 February 2018 at 19:47:09 UTC, Jacob Carlborg wrote:
 On 2018-02-12 17:49, Chris wrote:

 How could it possibly make the situation any worse than it is 
 now? Atm,
 nobody will ever use std.xml, because it is sub-standard and 
 has no future.

 I'm using std.xml in a new project right now. It's a really 
 small private project that just needs to extract some data from 
 an XML document. I started it a couple of days before dxml was 
 announced.

A few lines of code that could be replaced easily once something 
better is available? But who will start an important commercial 
project with std.xml when it says in red letters:

"Warning: This module is considered out-dated and not up to 
Phobos' current standards. It will remain until we have a 
suitable replacement, but be aware that it will not remain long 
term."

I for my part wouldn't and I'm glad there's dxml now.

Feb 12 2018

Jacob Carlborg <doob me.com> writes:

On 2018-02-12 21:19, Chris wrote:

 A few lines of code that could be replaced easily once something better 
 is available?

Fairly easy because it's so small. I'm actually using the SAX interface 
from std.xml and it quite nicely fits my needs.

-- 
/Jacob Carlborg

Feb 12 2018

"Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:

On 02/12/2018 11:15 AM, rikki cattermole wrote:
 
 dxml 7.5k LOC
 std.xml 3k LOC
 
 dxml would make the situation a lot worse.

4.5k LOC == "a lot worse"?

Uuuuhhh...WAT?

Feb 12 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, February 12, 2018 21:53:21 Nick Sabalausky via
Digitalmars-d-announce wrote:
 On 02/12/2018 11:15 AM, rikki cattermole wrote:
 dxml 7.5k LOC
 std.xml 3k LOC

 dxml would make the situation a lot worse.

 4.5k LOC == "a lot worse"?

 Uuuuhhh...WAT?

There is sometimes a tendency for folks to think that something having a lot
of lines of code is bad, and there can be some truth to that. If something
can be done in a simpler way, it tends to be shorter and easier to maintain,
but shorter isn't always better, and simpler isn't always better -
especially if that complexity is needed to get the job done. So, LOC tells
you something, but what it really tells you is up for debate.

And actually, well-written D code is going to have a much higher line count
in general because of stuff like documentation and unit tests being in the
source file. In this case, while std.xml does seem to have a fair bit of
documentation, it has very little in the way of unit tests, whereas dxml has
fairly thorough unit tests - maybe not quite as extreme as std.datetime, but
I do tend to be thorough with unit tests.

Andrei used to complain periodically about how large std.datetime was,
thinking that it was way too much code, and then someone actually went to
the effort of stripping out all of the comments and unit tests and whatnot
to count the actual lines of code in the implementation, and it was a _way_
smaller number than the lines in the file (IIRC, it might have even been
something like only 10% of the file, if that). That's what happens when you
write documentation and unit tests that are thorough.

- Jonathan M Davis

Feb 12 2018

"Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:

On 02/12/2018 10:49 PM, Jonathan M Davis wrote:
 
 Andrei used to complain periodically about how large std.datetime was,
 thinking that it was way too much code, and then someone actually went to
 the effort of stripping out all of the comments and unit tests and whatnot
 to count the actual lines of code in the implementation, and it was a _way_
 smaller number than the lines in the file (IIRC, it might have even been
 something like only 10% of the file, if that). That's what happens when you
 write documentation and unit tests that are thorough.
 

Yea, totally. Another example: mysql-native used to be one (!!) source 
file. It was maybe a bit on the large side for a single module, but it 
was still workable. In the last several years, that library has grown 
many times its old size. But now, I'd say that easily the majority of 
lines are either comments or tests. The *actual* implementation and API 
isn't really all that much more LOC than it used to be. The original 
one-module version, by contrast, was less documented and had...I don't 
think it even had a single test (IIRC, the 
now-old-and-probably-bitrotted "app.d" wasn't even there.)

Feb 12 2018

Kagamin <spam here.lot> writes:

On Tuesday, 13 February 2018 at 02:53:21 UTC, Nick Sabalausky 
(Abscissa) wrote:
 On 02/12/2018 11:15 AM, rikki cattermole wrote:
 
 dxml 7.5k LOC
 std.xml 3k LOC
 
 dxml would make the situation a lot worse.

 4.5k LOC == "a lot worse"?

 Uuuuhhh...WAT?

And it's like 2k LOC of code and 5.5k LOC of tests and docs.

Feb 13 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, February 12, 2018 07:59:24 H. S. Teoh via Digitalmars-d-announce 
wrote:
 On Mon, Feb 12, 2018 at 07:04:38AM -0700, Jonathan M Davis via
 Digitalmars-d-announce wrote: [...]

 However, if folks as a whole think that Phobos' xml parser needs to
 support the DTD section to be acceptable, then dxml won't replace
 std.xml, because dxml is not going to implement DTD support. DTD
 support fundamentally does not fit in with dxml's design.

 Actually, thinking about this, I'm wondering if a combination of
 preprocessing and/or postprocessing might make it possible to implement
 DTD support without needing to rewrite the guts of dxml. AIUI, dxml does
 parse the DTD section correctly, i.e., as an XML directive, but only
 doesn't look into its internal details. So one way to implement DTD
 support might be:

 - Write an auxiliary parser that's basically a wrapper around dxml,
   forwarding XML events to the caller, except:
 - If a DTD event is encountered, eagerly parse it, store DTD
   declarations internally for future reference.
 - If there's a DTD that has been seen, perform on-the-fly validation as
   XML events are forwarded.
 - In PCDATA sections, if there are entity references to the DTD, expand
   them, possibly inserting more XML events into the stream based on
   what's defined in the DTD. (This may need to reuse some dxml internals
   to parse XML snippets that might be contained in an entity definition,
   for example.)

The core problem is that entity references get replaced with more XML that
needs to be parsed. So, they can't simply be passed on for post-processing.
As I understand it, they have to be replaced while the parsing is going on.
And that means that you can't do something like return slices of the
original input that don't bother with the entity references and then have a
separate parser take that and process it further to deal with the entity
references. The first parser has to deal with them, and that means not
returning slices of the original input unless you're dealing purely with
strings and are willing to allocate new strings in the cases where the data
needs to be mutated because of an entity reference.

If we were going to stick to strings and only strings, it would be quite
possible to define the API in a way that it may or may not do DTD
processing, but that doesn't work with arbitrary ranges of characters, not
unless you give up on returning slices of the original input, and that means
harming the performance and usability for the common case in order to
support DTDs.
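
The trade-off for plain strings can be illustrated with a small sketch. It uses only a string and `std.array.replace`, not dxml, and `isSliceOf` is a hypothetical helper written for this example:

```d
import std.array : replace;

/// True if `part` lies entirely within the memory occupied by `whole`.
bool isSliceOf(const(char)[] part, const(char)[] whole)
{
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}

unittest
{
    string input = "<a>fish &amp; chips</a>";

    // Text returned as a slice shares memory with the input: no allocation.
    string text = input[3 .. $ - 4];
    assert(text == "fish &amp; chips");
    assert(isSliceOf(text, input));

    // Replacing a reference changes the data, so a new string is allocated.
    string decoded = text.replace("&amp;", "&");
    assert(decoded == "fish & chips");
    assert(!isSliceOf(decoded, input));
}
```

With a string the parser can at least fall back to allocating a new string; with an arbitrary range of characters there is no equivalent way to hand back a "slice" whose contents differ from the input.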

Also, anything that has the concept of "events" would be drastically
different from what dxml does. dxml is completely range-based. It has no
callbacks or anything of the sort, and having anything like that would
complicate it considerably.

There are lots of interesting things that could be done to try and deal with
the DTD section, but they fundamentally don't work with returning slices of
the original input unless you're only using strings.

In any case, I refuse to change dxml so that it has DTD support, and I
refuse to change it so that it doesn't return slices of the original input.
If I were to do so, it would make the parser worse for any use case I care
about and require a lot of time and effort on my part that I'm not willing
to spend. So, if that makes it so that dxml is never included in Phobos,
then so be it.

Folks are free to decide to support dxml for inclusion when the time comes
and free to vote it as unacceptable. Personally, I think that dxml's
approach is ideal for XML that doesn't use entity references, and I'd much
rather use that kind of parser regardless of whether it's in the standard
library or not. I think that the D community would be far better off with
std.xml being replaced by dxml, but whatever happens happens. I'd be just as
fine with a decision to remove std.xml and not include dxml. I'm less fine
with std.xml being left in Phobos and dxml being rejected, because std.xml
has been recognized as bad, and it sure doesn't look like anyone else is
going to write a replacement any time soon. I also think that dxml's
approach is better for the common case than anything that supported DTDs
would be, so I think that having dxml's solution in Phobos would be better
for the community even if Phobos also had a solution that supported DTDs,
but at this point, it looks like the options are going to be

1. std.xml stays and continues to suck.
2. std.xml gets ripped out and dxml replaces it.
3. std.xml gets ripped out and we have no xml solution in Phobos.

But as it stands, it doesn't seem likely that any XML solution that supports
DTDs will end up in Phobos any time soon, if ever, because
AFAIK, only three people have put in any real effort towards replacing
std.xml since 2010 or whenever it was that we decided it needed to be
replaced. The first two people both disappeared into oblivion without ever
finishing, and here I am with a working StAX parser (now with DOM support)
and an XML writer in the works - and given how involved I am with D, I think
that it's pretty unlikely that I'm disappearing anywhere short of getting
hit by a bus or whatnot. So, at least I've actually put in the time and
effort towards a solution and made it available, and it will almost
certainly be an essentially complete solution by the time that dconf rolls
around if not well before.

So, I do expect that the question of Phobos inclusion will ultimately be a
question of whether std.xml _ever_ gets replaced, but regardless, at least
there is a solution, and it will continue to be available as a 3rd party
library even if it never makes it into Phobos.

- Jonathan M Davis

Feb 12 2018

Kagamin <spam here.lot> writes:

On Monday, 12 February 2018 at 16:50:16 UTC, Jonathan M Davis 
wrote:
 The core problem is that entity references get replaced with 
 more XML that needs to be parsed. So, they can't simply be 
 passed on for post-processing. As I understand it, they have to 
 be replaced while the parsing is going on. And that means that 
 you can't do something like return slices of the original input 
 that don't bother with the entity references and then have a 
 separate parser take that and process it further to deal with 
 the entity references. The first parser has to deal with them, 
 and that means not returning slices of the original input 
 unless you're dealing purely with strings and are willing to 
 allocate new strings in the cases where the data needs to be 
 mutated because of an entity reference.

Standard entities like &amp; have the same problem, so the same 
solution should work too.

Feb 13 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, February 13, 2018 15:22:32 Kagamin via Digitalmars-d-announce 
wrote:
 On Monday, 12 February 2018 at 16:50:16 UTC, Jonathan M Davis

 wrote:
 The core problem is that entity references get replaced with
 more XML that needs to be parsed. So, they can't simply be
 passed on for post-processing. As I understand it, they have to
 be replaced while the parsing is going on. And that means that
 you can't do something like return slices of the original input
 that don't bother with the entity references and then have a
 separate parser take that and process it further to deal with
 the entity references. The first parser has to deal with them,
 and that means not returning slices of the original input
 unless you're dealing purely with strings and are willing to
 allocate new strings in the cases where the data needs to be
 mutated because of an entity reference.

 Standard entities like &amp; have the same problem, so the same
 solution should work too.

That depends on what exactly an entity reference can contain. If it can do
something like put a start tag in there, and then it has to be terminated by
the document putting an end tag in there or another entity reference
containing an end tag, then it can't be handled after the fact like &amp;
can be, since &amp; is just replaced by text. If an entity reference can't
contain a start tag without a matching end tag, then sure. But I find the
XML spec to be surprisingly hard to understand with regards to entity
references. It's not clear to me where it's even legal to put them or not,
let alone what you're allowed to put in them exactly. And I can't even
really trust the XML grammar as long as entity references are involved,
because the grammar in the spec is the grammar _after_ entity references
have all been replaced, which I was quite dismayed to figure out.

If it's 100% sure that entity references can be treated as just text and
that you can't end up with stuff like start tags or end tags being inserted
and messing with the parsing such that they all have to be replaced for the
XML to be correctly parsed, then I have no problem passing entity references
along, and a higher level parser could try to do something with them, but
it's not clear to me at all that an XML document with entity references is
correct enough to be parsed while not replacing the entity references with
whatever XML markup they contain. I had originally passed them along with
the idea that a higher level parser could do something with them, but I
decided that I couldn't do that if you could do something like drop a start
tag in there and change the meaning of the stuff that needs to be parsed
that isn't directly in the entity reference.

- Jonathan M Davis

Feb 13 2018

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Tuesday, 13 February 2018 at 20:10:59 UTC, Jonathan M Davis 
wrote:
 On Tuesday, February 13, 2018 15:22:32 Kagamin via 
 Digitalmars-d-announce wrote:
 On Monday, 12 February 2018 at 16:50:16 UTC, Jonathan M Davis

 wrote:
 The core problem is that entity references get replaced with 
 more XML that needs to be parsed. So, they can't simply be 
 passed on for post-processing. As I understand it, they have 
 to be replaced while the parsing is going on. And that means 
 that you can't do something like return slices of the 
 original input that don't bother with the entity references 
 and then have a separate parser take that and process it 
 further to deal with the entity references. The first parser 
 has to deal with them, and that means not returning slices 
 of the original input unless you're dealing purely with 
 strings and are willing to allocate new strings in the cases 
 where the data needs to be mutated because of an entity 
 reference.

 Standard entities like &amp; have the same problem, so the 
 same solution should work too.

 That depends on what exactly an entity reference can contain. 
 If it can do something like put a start tag in there, and then 
 it has to be terminated by the document putting an end tag in 
 there or another entity reference containing an end tag, then 
 it can't be handled after the fact like &amp; can be, since 
 &amp; is just replaced by text. If an entity reference can't 
 contain a start tag without a matching end tag, then sure. But 
 I find the XML spec to be surprisingly hard to understand with 
 regards to entity references. It's not clear to me where it's 
 even legal to put them or not, let alone what you're allowed to 
 put in them exactly. And I can't even really trust the XML 
 grammar as long as entity references are involved, because the 
 grammar in the spec is the grammar _after_ entity references 
 have all been replaced, which I was quite dismayed to figure 
 out.

 If it's 100% sure that entity references can be treated as just 
 text and that you can't end up with stuff like start tags or 
 end tags being inserted and messing with the parsing such that 
 they all have to be replaced for the XML to be correctly 
 parsed, then I have no problem passing entity references along, 
 and a higher level parser could try to do something with them, 
 but it's not clear to me at all that an XML document with 
 entity references is correct enough to be parsed while not 
 replacing the entity references with whatever XML markup they 
 contain. I had originally passed them along with the idea that 
 a higher level parser could do something with them, but I 
 decided that I couldn't do that if you could do something like 
 drop a start tag in there and change the meaning of the stuff 
 that needs to be parsed that isn't directly in the entity 
 reference.

There's also the issue that entity references open a whole can of 
worms concerning security. It is quite possible to have an 
exponentially growing entity replacement that can take down any 
parser.

<!DOCTYPE root [
  <!ELEMENT root ANY>
  <!ENTITY LOL "LOL">
  <!ENTITY LOL1 
"&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;">
  <!ENTITY LOL2 
"&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;">
  <!ENTITY LOL3 
"&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;">
  <!ENTITY LOL4 
"&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;">
  <!ENTITY LOL5 
"&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;">
  <!ENTITY LOL6 
"&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;">
  <!ENTITY LOL7 
"&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;">
  <!ENTITY LOL8 
"&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;">
  <!ENTITY LOL9 
"&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;">
]>
<root>&LOL9;</root>

Hope you have enough memory (this expands to 1 000 000 000 LOLs, 
i.e. 3 000 000 000 characters)
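
The size of that expansion can be worked out without building the string. A small sketch of the arithmetic, where `expandedLength` is a hypothetical helper: each of the nine levels LOL1 to LOL9 references the previous entity ten times, so the count multiplies by ten per level.

```d
/// Size of the fully expanded top-level entity, computed without
/// materializing it: `baseLength` multiplied by `fanOut` once per level.
ulong expandedLength(ulong baseLength, uint fanOut, uint levels)
{
    ulong length = baseLength;
    foreach (_; 0 .. levels)
        length *= fanOut;
    return length;
}

unittest
{
    // Nine levels (LOL1 .. LOL9), ten references per level.
    assert(expandedLength(1, 10, 9) == 1_000_000_000UL); // copies of "LOL"
    assert(expandedLength(3, 10, 9) == 3_000_000_000UL); // characters
}
```

So a document of well under a kilobyte asks the parser for roughly 3 GB of text if every reference is expanded eagerly.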

Feb 13 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, February 13, 2018 21:18:12 Patrick Schluter via
Digitalmars-d-announce wrote:
 There's also the issue that entity references open a whole can of
 worms concerning security. It is quite possible to have an
 exponentially growing entity replacement that can take down any
 parser.

Well, if dxml just passes the entity references along unparsed beyond
validating that the entity reference itself contains valid characters (e.g.
it's not something like &.; or & by itself), then dxml would still not be
replacing the entity references with anything. Any security or performance
problems associated with entity references would be left up to whatever
parser parsed the DTD section and then used dxml to parse the rest of the
XML and replaced the entity references in dxml's parsing results with
whatever they were.
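
A sketch of the kind of shape check described above, deliberately simplified: it accepts only ASCII names, whereas the XML spec allows a much wider set of name characters, and `isEntityReference` is a hypothetical helper, not part of dxml:

```d
import std.ascii : isAlpha, isAlphaNum;

/// Simplified check that `s` has the shape "&Name;" (ASCII names only).
bool isEntityReference(const(char)[] s)
{
    if (s.length < 3 || s[0] != '&' || s[$ - 1] != ';')
        return false;
    auto name = s[1 .. $ - 1];
    // A name must start with a letter, '_' or ':'.
    if (!(isAlpha(name[0]) || name[0] == '_' || name[0] == ':'))
        return false;
    // The remaining characters may also be digits, '-' or '.'.
    foreach (char c; name[1 .. $])
    {
        if (!(isAlphaNum(c) || c == '_' || c == ':' || c == '-' || c == '.'))
            return false;
    }
    return true;
}

unittest
{
    assert(isEntityReference("&foo;"));
    assert(!isEntityReference("&.;"));
    assert(!isEntityReference("&"));
}
```

A check like this validates the reference itself and nothing more; what the reference expands to is left entirely to whatever layer has parsed the DTD.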

The big problem is how the entity references affect the parsing. If start
tags can be dropped in and affect the parsing (and it's still not clear to
me from the spec whether that's legal - there is a section talking about
being nested properly which might indicate that that's not legal, but it's
not very specific or clear), and if it's legal to do something like use an
entity reference for a tag name - e.g. <&foo;>, then that's a serious
problem. And problems like that are the main reason why I completely dropped
any attempt to do anything with the DTD section.

If entity references are only legal in the text between start and end tags
and between the quotes of attribute values, and whatever they're replaced
with cannot actually affect anything else in the XML document (i.e. it can't
just be a start or end tag or anything like that - it has to be fully
parseable on its own and not affect the parsing of the document itself),
then passing them along should be fine.

Basically, if I can change dxml so that in the places where it currently
allows one of the standard entity references to be, it then also allows
other entity references but passes them along without replacing them instead
of throwing an XMLParsingException, and that works without having documents
be screwed up due to missing start tags or something, then passing them
along should be fine. But if entity references allow arbitrary enough chunks
of XML, that doesn't work. It also doesn't work if entity references are
allowed in places other than the text between start and end tags or within
attribute values. And it's not clear to me at all what is legal in an entity
reference or where exactly they're legal. The spec talks about the grammar
being the grammar _after_ all of the references have been replaced, which
makes the grammar rather untrustworthy, and I find the spec very hard to
understand in general.

Regardless, there's no risk of dxml's parser ever being changed to actually
replace entity references. That doesn't work with returning slices of the
original input, and it really doesn't work with a parser that's just
supposed to take a range of characters and parse it. To fully handle all of
the DTD stuff means actually reading files from disk or from the internet -
which of course is where the security problems come in, but it also means
that you're not just dealing with a parser anymore. In principle, dxml's
parser should be pure (though some implementation details make it so that it isn't
right now), whereas an XML parser that fully handles the DTD section could
never be pure.

- Jonathan M Davis

Feb 13 2018

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Tuesday, 13 February 2018 at 22:00:59 UTC, Jonathan M Davis 
wrote:
 On Tuesday, February 13, 2018 21:18:12 Patrick Schluter via 
 Digitalmars-d-announce wrote:
 [...]

 Well, if dxml just passes the entity references along unparsed 
 beyond validating that the entity reference itself contains 
 valid characters (e.g. it's not something like &.; or & by 
 itself), then dxml would still not be replacing the entity 
 references with anything. Any security or performance problems 
 associated with entity references would be left up to whatever 
 parser parsed the DTD section and then used dxml to parse the 
 rest of the XML and replaced the entity references in dxml's 
 parsing results with whatever they were.

 The big problem is how the entity references affect the 
 parsing. If start tags can be dropped in and affect the parsing 
 (and it's still not clear to me from the spec whether that's 
 legal - there is a section talking about being nested properly 
 which might indicate that that's not legal, but it's not very 
 specific or clear), and if it's legal to do something like use 
 an entity reference for a tag name - e.g. <&foo;>, then that's 
 a serious problem. And problems like that are the main reason 
 why I completely dropped any attempt to do anything with the 
 DTD section.

Yikes! In any case, even if I had to implement a parser I would 
tend to not implement this "feature" as it sounds quite 
unreasonable. Only if a real need (i.e. one in the real world, 
not one that could be contrived out of the specs) arises would I 
then potentially implement the real deal.

Feb 14 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, February 14, 2018 10:03:45 Patrick Schluter via Digitalmars-d-
announce wrote:
 On Tuesday, 13 February 2018 at 22:00:59 UTC, Jonathan M Davis wrote:
 On Tuesday, February 13, 2018 21:18:12 Patrick Schluter via
 Digitalmars-d-announce wrote:
 [...]

 Well, if dxml just passes the entity references along unparsed
 beyond validating that the entity reference itself contains
 valid characters (e.g. it's not something like &.; or & by
 itself), then dxml would still not be replacing the entity
 references with anything. Any security or performance problems
 associated with entity references would be left up to whatever
 parser parsed the DTD section and then used dxml to parse the
 rest of the XML and replaced the entity references in dxml's
 parsing results with whatever they were.

 The big problem is how the entity references affect the
 parsing. If start tags can be dropped in and affect the parsing
 (and it's still not clear to me from the spec whether that's
 legal - there is a section talking about being nested properly
 which might indicate that that's not legal, but it's not very
 specific or clear), and if it's legal to do something like use
 an entity reference for a tag name - e.g. <&foo;>, then that's
 a serious problem. And problems like that are the main reason
 why I completely dropped any attempt to do anything with the
 DTD section.

 Yikes! In any case, even if I had to implement a parser I would
 tend to not implement this "feature" as it sounds quite
 unreasonable. Only if a real need (i.e. one in the real world,
 not one that could be contrived out of the specs) arises would I
 then potentially implement the real deal.

Well, since folks other than me are going to use this parser, and it's even
potentially going to end up in D's standard library, it needs to at least be
good enough to not let through invalid XML or incorrectly interpret any XML.
It can potentially not support portions of the spec as long as it does so in
a clear and clean manner, but it's going to have to correctly handle
anything that it does handle.

For better or worse, I'm the sort of person who prefers to completely
implement a spec when I'm implementing one, but in this case, it wasn't
really reasonable. Fortunately however, from the perspective of implementing
something that's useful for me personally, the DTD section is completely
unnecessary. From that perspective, processing instructions and CDATA
sections are also unnecessary, since I'd never do anything with them, but I
don't think that it would be reasonable to skip those, so they're
implemented. And it's not like they're hard to implement support for, unlike
the DTD section.

- Jonathan M Davis

Feb 14 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Feb 13, 2018 at 09:18:12PM +0000, Patrick Schluter via
Digitalmars-d-announce wrote:
 On Tuesday, 13 February 2018 at 20:10:59 UTC, Jonathan M Davis wrote:

[...]
 If it's 100% sure that entity references can be treated as just text
 and that you can't end up with stuff like start tags or end tags
 being inserted and messing with the parsing such that they all have
 to be replaced for the XML to be correctly parsed, then I have no
 problem passing entity references along, and a higher level parser
 could try to do something with them, but it's not clear to me at all
 that an XML document with entity references is correct enough to be
 parsed while not replacing the entity references with whatever XML
 markup they contain. I had originally passed them along with the
 idea that a higher level parser could do something with them, but I
 decided that I couldn't do that if you could do something like drop
 a start tag in there and change the meaning of the stuff that needs
 to be parsed that isn't directly in the entity reference.


This made me go to the W3C spec (https://www.w3.org/TR/xml/) to figure
out what exactly is/isn't defined.  I discovered to my chagrin that XML
entities are a huge rabbit hole with extremely pathological behaviour
that makes it almost impossible to implement in any way that's even
remotely efficient.

Here's a page with examples of how nasty it can get:

	http://www.floriankaeferboeck.at/XML/Comparison.html

Here's an example given in the W3C spec itself:

	<?xml version='1.0'?>
	<!DOCTYPE test [
	<!ELEMENT test (#PCDATA) >
	<!ENTITY % xx '&#37;zz;'>
	<!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
	%xx;
	]>
	<test>This sample shows a &tricky; method.</test>

A correct XML parser is supposed to produce the following text as the
body of the <test>...</test> tag (the grammatical error is intentional):

	This sample shows a error-prone method.


Fortunately, there's a glimmer of hope on the horizon: in section 4.3.2
of the spec (https://www.w3.org/TR/xml/#wf-entities), it is explicitly
stated:

	A consequence of well-formedness in general entities is that the
	logical and physical structures in an XML document are properly
	nested; no start-tag, end-tag, empty-element tag, element,
	comment, processing instruction, character reference, or entity
	reference can begin in one entity and end in another.

Meaning, if I understand it correctly, that you can't have a start tag
in &entity1; and its corresponding end tag in &entity2;, and then have
your document contain "&entity1; &entity2;".  This is because the body
of the entity can only contain text or entire tags (the production
"content" in the spec); an entity that contains an open tag without an
end tag (or vice versa) does not match this rule and is thus illegal.

So this means that we *can* use dxml as a backend to drive a
DTD-supporting XML parser implementation.  The wrapper / higher-level
parser would scan the slices returned by dxml for entity references, and
substitute them accordingly, which may involve handing the body of the
entity to another instance of dxml to parse any tags that may be nested
in there.
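
To make the layering concrete, here is a minimal sketch of the substitution
step in D. It assumes the text has already been obtained as a string slice
from the lower-level parser; `expandEntities` and the entity table are
hypothetical names for illustration, not part of dxml's API, and the sketch
neither recurses into replacement text nor parses any tags it contains.

```d
import std.array : appender;
import std.string : indexOf;

/// Hypothetical helper (not part of dxml): replaces &name; references in
/// `text` using `table`, leaving unknown references untouched.
string expandEntities(string text, string[string] table)
{
    auto result = appender!string();
    size_t pos = 0;
    while (pos < text.length)
    {
        auto amp = text.indexOf('&', pos);
        if (amp < 0)
            break;                              // no more references
        auto semi = text.indexOf(';', amp);
        if (semi < 0)
            break;                              // unterminated '&': copy the rest as-is
        result.put(text[pos .. amp]);           // literal text before the reference
        auto name = text[amp + 1 .. semi];
        if (auto replacement = name in table)
            result.put(*replacement);           // known entity: substitute
        else
            result.put(text[amp .. semi + 1]);  // unknown entity: pass it along
        pos = semi + 1;
    }
    result.put(text[pos .. $]);
    return result.data;
}
```

For example, `expandEntities("a &foo; b", ["foo": "bar"])` yields `"a bar b"`,
while references missing from the table are passed through unchanged so that
a later layer can resolve or reject them.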

The nastiness involving partially-formed entity references (as seen in
the above examples) apparently only applies inside the DOCTYPE
declaration, so AIUI this can be handled by the higher-level parser as
part of replacing inline entities with their replacement text.

(The higher-level parser has a pretty tall order to fill, though,
because entities can refer to remote resources via URI, meaning that an
innocuous-looking 5-line XML file can potentially expand to terabytes of
XML tags downloaded from who knows how many external resources
recursively. Not to mention a bunch of security issues like described
below.)


 There's also the issue that entity references open a whole can of
 worms concerning security. It's quite possible to have an exponentially
 growing entity replacement that can take down any parser.
 
 <!DOCTYPE root [
  <!ELEMENT root ANY>
  <!ENTITY LOL "LOL">
  <!ENTITY LOL1 "&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;">
  <!ENTITY LOL2
 "&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;">
  <!ENTITY LOL3
 "&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;">
  <!ENTITY LOL4
 "&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;">
  <!ENTITY LOL5
 "&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;">
  <!ENTITY LOL6
 "&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;">
  <!ENTITY LOL7
 "&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;">
  <!ENTITY LOL8
 "&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;">
  <!ENTITY LOL9
 "&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;">
 ]>
 <root>&LOL9;</root>
 
 Hope you have enough memory (this expands to 1 000 000 000 LOLs, i.e. 3 000 000 000 characters)

[...]

Yeah, after reading through relevant portions of the spec, I have to say
that full DTD support is a HUGE can of worms.  I tip my hat in
advance to the brave soul (or poor fool :-P) who would attempt to
implement the spec in full. :-D

There are ways to deal with exponential entity growth, e.g., if the
expansion was carried out lazily.  But it's still a DOS vulnerability if
the software then spins practically forever trying to traverse the huge
range of stuff being churned out.
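
One possible guard, sketched under my own assumptions rather than taken from
any existing parser, is to measure the expanded size before expanding
anything and to refuse documents that exceed a limit. `expandedLength`, the
`cache` parameter, and the nesting cap of 64 are all hypothetical choices for
illustration.

```d
import std.exception : enforce;
import std.string : indexOf;

/// Hypothetical helper (not part of dxml): computes how many code units
/// `text` would occupy after full entity expansion, without building the
/// expanded string. Lengths of already-measured entities are kept in
/// `cache`, so each declaration is scanned only once.
ulong expandedLength(string text, string[string] entities,
                     ref ulong[string] cache, ulong limit, uint depth = 0)
{
    enforce(depth < 64, "entity nesting too deep (possibly recursive)");
    ulong total = 0;
    size_t pos = 0;
    while (pos < text.length)
    {
        auto amp = text.indexOf('&', pos);
        if (amp < 0)
            break;
        auto semi = text.indexOf(';', amp);
        if (semi < 0)
            break;
        total += amp - pos;                 // literal text before the reference
        auto name = text[amp + 1 .. semi];
        if (auto known = name in cache)
            total += *known;                // measured earlier
        else if (auto replacement = name in entities)
        {
            auto len = expandedLength(*replacement, entities, cache,
                                      limit, depth + 1);
            cache[name] = len;
            total += len;
        }
        else
            total += semi + 1 - amp;        // unknown reference stays as-is
        enforce(total <= limit, "entity expansion exceeds the configured limit");
        pos = semi + 1;
    }
    total += text.length - pos;             // trailing literal text
    enforce(total <= limit, "entity expansion exceeds the configured limit");
    return total;
}
```

Fed the ten LOL declarations quoted above, this reports 3 000 000 000 code
units for `&LOL9;` after scanning each declaration once, and with a limit of,
say, one megabyte it throws instead. It only bounds memory use; external
entities and the other problems are untouched by it.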

Not to mention that having embedded external references is itself a
security issue, particularly since the partial entity formation thing can
be used to obfuscate the real URI of a referenced entity, so you could
potentially trick a remote XML parser to download stuff from
questionable sources.  It could be used as a covert surveillance method,
for example, or a malware delivery vector, if combined with an
exploitable bug in the parser code.  Or it could be used to read
sensitive files (e.g., if an entity references file:///etc/passwd or
some such system file).  Ick.

Ironically, the general advice I found online w.r.t XML vulnerabilities
is "don't allow DTDs", "don't expand entities", "don't resolve
externals", etc..  There also aren't many XML parsers out there that
fully support all the features called for in the spec.  IOW, this
basically amounts to "just use dxml and forget about everything else".
:-D

Now of course, there *are* valid use cases for DTDs... but a naïve
implementation of the spec is only going to end in tears.  My current
inclination is, just merge dxml into Phobos, then whoever dares
implement DTD support can do so on top of dxml, and shoulder their own
responsibility for vulnerabilities or whatever.  (I mean, seriously,
just for the sake of being able to say "my XML is validated" we have to
implement network access, local filesystem access, a security framework,
and what amounts to a sandbox to control pathological behaviour like
exponentially recursive entities?  And all of this, just to handle rare
corner cases?  That's completely ridiculous.  It's an obvious design
smell to me.  The only thing missing from this poisonous mix is Turing
completeness, which would have made XML a hackers' heaven.  Oh wait, on
further googling, I see that XSLT *is* Turing complete.  Great, just
great.   Now I know why I've always had this gut feeling that
*something* is off about the whole XML mania.)


T

-- 
English is useful because it is a mess. Since English is a mess, it maps well
onto the problem space, which is also a mess, which we call reality. Similarly,
Perl was designed to be a mess, though in the nicest of all possible ways. --
Larry Wall

Feb 13 2018

Chris <wendlec tcd.ie> writes:

On Tuesday, 13 February 2018 at 22:13:36 UTC, H. S. Teoh wrote:

 Ironically, the general advice I found online w.r.t XML 
 vulnerabilities is "don't allow DTDs", "don't expand entities", 
 "don't resolve externals", etc..  There also aren't many XML 
 parsers out there that fully support all the features called 
 for in the spec.  IOW, this basically amounts to "just use dxml 
 and forget about everything else". :-D

 Now of course, there *are* valid use cases for DTDs... but a 
 naïve implementation of the spec is only going to end in tears.
  My current inclination is, just merge dxml into Phobos, then 
 whoever dares implement DTD support can do so on top of dxml, 
 and shoulder their own responsibility for vulnerabilities or 
 whatever.  (I mean, seriously, just for the sake of being able 
 to say "my XML is validated" we have to implement network 
 access, local filesystem access, a security framework, and what 
 amounts to a sandbox to control pathological behaviour like 
 exponentially recursive entities?  And all of this, just to 
 handle rare corner cases?  That's completely ridiculous.  It's 
 an obvious design smell to me.  The only thing missing from 
 this poisonous mix is Turing completeness, which would have 
 made XML hackers' heaven.  Oh wait, on further googling, I see 
 that XSLT *is* Turing complete.  Great, just great.   Now I 
 know why I've always had this gut feeling that *something* is 
 off about the whole XML mania.)


 T

Thanks for the analysis. I'd say you're right. It makes no sense 
to keep dxml from becoming std.xml's successor only because it 
doesn't support DTDs. Also, as I said before, if we had DTD 
support in std.xml, people would complain about the lack of 
efficiency, and the discussions about interpreting the specs 
correctly and implementing them 100%, along with complaints about 
the lack of security, would just never end.

Feb 14 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Feb 13, 2018 at 03:00:59PM -0700, Jonathan M Davis via
Digitalmars-d-announce wrote:
[...]
 The big problem is how the entity references affect the parsing. If
 start tags can be dropped in and affect the parsing (and it's still
 not clear to me from the spec whether that's legal - there is a
 section talking about being nested properly which might indicate that
 that's not legal, but it's not very specific or clear), and if it's
 legal to do something like use an entity reference for a tag name -
 e.g. <&foo;>, then that's a serious problem. And problems like that
 are the main reason why I completely dropped any attempt to do
 anything with the DTD section.

AFAICT, section 4.3.2 in the spec (probably the one you're referring to)
seems to be saying that you can't do that:

	A consequence of well-formedness in general entities is that the
	logical and physical structures in an XML document are properly
	nested; no start-tag, end-tag, empty-element tag, element,
	comment, processing instruction, character reference, or entity
	reference can begin in one entity and end in another.


 If entity references are only legal in the text between start and end
 tags and between the quotes of attribute values, and whatever they're
 replaced with cannot actually affect anything else in the XML document
 (i.e. it can't just be a start or end tag or anything like that - it
 has to be fully parseable on its own and not affect the parsing of
 the document itself), then passing them along should be fine.

That's the approach I'm thinking of.


[...]
 Regardless, there's no risk of dxml's parser ever being changed to
 actually replace entity references. That doesn't work with returning
 slices of the original input, and it really doesn't work with a parser
 that's just supposed to take a range of characters and parse it. To
 fully handle all of the DTD stuff means actually reading files from
 disk or from the internet - which of course is where the security
 problems come in, but it also means that you're not just dealing with
 a parser anymore. In principle, dxml's parser should be pure (though
 some implementation details make it so that it isn't right now), whereas an
 XML parser that fully handles the DTD section could never be pure.

[...]

Given the insane complexities of DTD that I'm only slowly beginning to
grasp from actually reading the spec, I'm quickly adopting the opinion
that dxml should remain as-is, and any DTD implementation should be
layered on top.  The only potential changes that might be needed are:

- provide a way to parse XML snippets that don't have a <?xml ...>
  declaration, so that a DTD implementation could, for example, hand an
  entity body over to dxml to extract any tags that may be nested in
  there (and if my reading of section 4.3.2 is correct, all such tags
  must always be closed inside the entity body, so there should be no
  errors produced).

- provide some way of hooking into non-default entities so that
  DTD-defined entities can be expanded by the DTD implementation.  This
  could be as simple as leaving such entities untouched in the returned
  range, or inventing a special EntityType representing such entities (with
  a slice of the input containing the entity name) so that the DTD
  implementation can insert the replacement text. 

Everything else should be handled by the DTD layer, e.g., parsing the
DOCTYPE section (which is itself pretty pathological, given the actual
examples in the W3C spec to this effect), expanding entities, looking up
external entities, limiting recursive entity expansion, implementing a
security model, etc..


T

-- 
Why do conspiracy theories always come from the same people??

Feb 13 2018

Kagamin <spam here.lot> writes:

On Tuesday, 13 February 2018 at 22:29:27 UTC, H. S. Teoh wrote:
 - provide some way of hooking into non-default entities so that
   DTD-defined entities can be expanded by the DTD 
 implementation.

The parser now returns raw text, so entity replacement can be done 
by a DTD processor without any modification of the API. So it's good 
for experimental if there's incentive to maintain it, but it's 
purely a PR problem: there's nothing wrong with having XML support in 
the dub registry and std.xml in Phobos; if Phobos is OK with it, it 
can stay as is.
It looks like EntityRange requires a forward range. Is that OK for a 
parser?

Feb 14 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, February 14, 2018 10:14:44 Kagamin via Digitalmars-d-announce 
wrote:
 It looks like EntityRange requires forward range, is it ok for a
 parser?

It's very difficult in general to write a parser that isn't at least a
forward range, because without that, you're stuck at only one character of
look ahead unless you play a lot of games with putting data from the input
range in a buffer so that you can keep it around to look at it again after
you've looked farther ahead.

Honestly, pure input ranges are borderline useless for a _lot_ of cases.
It's generally only the cases where you only care about operating on each
element individually irrespective of what's going on with other elements in
the range that pure input ranges are really useable, and parsing definitely
doesn't fall into that camp.

- Jonathan M Davis

Feb 14 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 14/02/2018 10:32 AM, Jonathan M Davis wrote:
 On Wednesday, February 14, 2018 10:14:44 Kagamin via Digitalmars-d-announce
 wrote:
 It looks like EntityRange requires forward range, is it ok for a
 parser?

 
 It's very difficult in general to write a parser that isn't at least a
 forward range, because without that, you're stuck at only one character of
 look ahead unless you play a lot of games with putting data from the input
 range in a buffer so that you can keep it around to look at it again after
 you've looked farther ahead.
 
 Honestly, pure input ranges are borderline useless for a _lot_ of cases.
 It's generally only the cases where you only care about operating on each
 element individually irrespective of what's going on with other elements in
 the range that pure input ranges are really useable, and parsing definitely
 doesn't fall into that camp.
 
 - Jonathan M Davis

See lines:
- Input!IR temp = input;
- input = temp;

            bool commentLine() {
		Input!IR temp = input;

		if (!temp.empty && temp.front.c == '/') {
			temp.popFront;
			if (!temp.empty && temp.front.c == '/')
				temp.popFront;
			else
				return false;
		} else
			return false;

		if (!temp.empty) {
			size_t endOffset = temp.front.location.fileOffset;

			while(temp.front.location.lineOffset != 0) {
				endOffset = temp.front.location.fileOffset;
				temp.popFront;

				if (temp.empty) {
					endOffset++;
					break;
				}
			}
			
			current.type = Token.Type.Comment_Line;
			current.location = input.front.location;
			current.location.length = endOffset - input.front.location.fileOffset;
			
			input = temp;
			return true;
		} else
			return false;
	}

Feb 14 2018

Adrian Matoga <dlang.spam matoga.info> writes:

On Wednesday, 14 February 2018 at 10:57:26 UTC, rikki cattermole 
wrote:
 See lines:
 - Input!IR temp = input;
 - input = temp;

            bool commentLine() {
 		Input!IR temp = input;

 (...)
 		if (!temp.empty) {
 (...)		
 			input = temp;
 			return true;
 		} else
 			return false;
 	}

`temp = input.save` is exactly what you want here, which means 
forward range is required. Your example won't work for range 
objects with reference semantics.
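
A small sketch of the difference, using a made-up class-based range
(`CharRange` and `skipCommentStart` are hypothetical and are not taken from
the code quoted above):

```d
import std.range.primitives;

/// Hypothetical reference-type range over a string's code units.
final class CharRange
{
    private string data;
    this(string data) { this.data = data; }
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property CharRange save() { return new CharRange(data); }
}

static assert(isForwardRange!CharRange);

/// Consumes a leading "//" only if both slashes are present.
bool skipCommentStart(R)(ref R input)
    if (isForwardRange!R)
{
    auto temp = input.save;      // independent copy, even for class ranges
    foreach (_; 0 .. 2)
    {
        if (temp.empty || temp.front != '/')
            return false;        // input is left untouched
        temp.popFront();
    }
    input = temp;                // commit the lookahead
    return true;
}
```

With `auto temp = input;` instead of `input.save`, a `CharRange` holding
"/x" would lose its first character even though the function returns false,
because `temp` and `input` would refer to the same object.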

Feb 14 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 14/02/2018 2:02 PM, Adrian Matoga wrote:
 On Wednesday, 14 February 2018 at 10:57:26 UTC, rikki cattermole wrote:
 See lines:
 - Input!IR temp = input;
 - input = temp;

            bool commentLine() {
         Input!IR temp = input;

 (...)
         if (!temp.empty) {
 (...)
             input = temp;
             return true;
         } else
             return false;
     }

 
 `temp = input.save` is exactly what you want here, which means forward 
 range is required. Your example won't work for range objects with 
 reference semantics.

Ah I must be thinking of ranges that support indexing.

Feb 14 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Wednesday, February 14, 2018 14:09:21 rikki cattermole via Digitalmars-d-
announce wrote:
 On 14/02/2018 2:02 PM, Adrian Matoga wrote:
 On Wednesday, 14 February 2018 at 10:57:26 UTC, rikki cattermole wrote:
 See lines:
 - Input!IR temp = input;
 - input = temp;

            bool commentLine() {
         Input!IR temp = input;

 (...)
         if (!temp.empty) {
 (...)
             input = temp;
             return true;
         } else
             return false;
     }

 `temp = input.save` is exactly what you want here, which means forward
 range is required. Your example won't work for range objects with
 reference semantics.

 Ah I must be thinking of ranges that support indexing.

Random access ranges are also forward ranges and would require a call to
save here.

- Jonathan M Davis

Feb 14 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 14/02/2018 5:13 PM, Jonathan M Davis wrote:
 On Wednesday, February 14, 2018 14:09:21 rikki cattermole via Digitalmars-d-
 announce wrote:
 On 14/02/2018 2:02 PM, Adrian Matoga wrote:
 On Wednesday, 14 February 2018 at 10:57:26 UTC, rikki cattermole wrote:
 See lines:
 - Input!IR temp = input;
 - input = temp;

             bool commentLine() {
          Input!IR temp = input;

 (...)
          if (!temp.empty) {
 (...)
              input = temp;
              return true;
          } else
              return false;
      }

 `temp = input.save` is exactly what you want here, which means forward
 range is required. Your example won't work for range objects with
 reference semantics.

 Ah I must be thinking of ranges that support indexing.

 
 Random access ranges are also forward ranges and would require a call to
 save here.
 
 - Jonathan M Davis
 

Luckily in my code I can forget that ;)

Feb 14 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, February 15, 2018 01:55:28 rikki cattermole via Digitalmars-d-
announce wrote:
 On 14/02/2018 5:13 PM, Jonathan M Davis wrote:
 On Wednesday, February 14, 2018 14:09:21 rikki cattermole via
 Digitalmars-d-announce wrote:
 On 14/02/2018 2:02 PM, Adrian Matoga wrote:
 On Wednesday, 14 February 2018 at 10:57:26 UTC, rikki cattermole wrote:
 See lines:
 - Input!IR temp = input;
 - input = temp;

             bool commentLine() {
          Input!IR temp = input;

 (...)
          if (!temp.empty) {
 (...)
              input = temp;
              return true;
          } else
              return false;
      }

 `temp = input.save` is exactly what you want here, which means forward
 range is required. Your example won't work for range objects with
 reference semantics.

 Ah I must be thinking of ranges that support indexing.

 Random access ranges are also forward ranges and would require a call to
 save here.

 - Jonathan M Davis

 Luckily in my code I can forget that ;)

LOL. That's actually part of what makes writing range-based libraries so
much harder to get right than simply using ranges in your program. When a
piece of code is used with only a few types of ranges (or even only one type
of range, as is often the case), then it's generally not very hard to write
code that works just fine, but as soon as you have to worry about arbitrary
ranges, you get all kinds of nonsense that you have to worry about in order
to make sure that the code works correctly for any range that's passed to
it. save is the classic example of something that a lot of range-based code
gets wrong, because for most ranges, it really doesn't matter, but for those
ranges where it does, a single missed call to save results in code that
doesn't work properly. To get it right, you basically have to call save
every time you pass a range to a range-based function that is not supposed
to consume the range, and folks rarely get that right. Certainly, pretty
much any range-based code that doesn't have unit tests which include
reference-type ranges is going to be wrong for reference-type ranges. Even
Phobos has had quite a few issues with that historically.
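
As a sketch of the kind of test I mean, Phobos' `inputRangeObject` wraps an
array in a class, which gives you a reference-type forward range for free.
`countElements` is a hypothetical helper made up for this example:

```d
import std.range : inputRangeObject;
import std.range.primitives;

/// Hypothetical helper: counts the elements of a forward range without
/// consuming the caller's copy. Dropping the `.save` would still pass for
/// arrays but fail for reference-type ranges.
size_t countElements(R)(R range)
    if (isForwardRange!R)
{
    size_t n;
    for (auto r = range.save; !r.empty; r.popFront())
        ++n;
    return n;
}
```

In a unit test, `auto r = inputRangeObject([1, 2, 3]);` followed by
`assert(countElements(r) == 3);` and `assert(r.front == 1);` checks both the
result and that `r` was not consumed.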

- Jonathan M Davis

Feb 14 2018

jmh530 <john.michael.hall gmail.com> writes:

On Thursday, 15 February 2018 at 02:40:03 UTC, Jonathan M Davis 
wrote:
 LOL. That's actually part of what makes writing range-based 
 libraries so much harder to get right than simply using ranges 
 in your program. [snip]

That sounds like an interesting topic for a blog post.

Feb 15 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, February 13, 2018 14:13:36 H. S. Teoh via Digitalmars-d-announce 
wrote:
 Great, just
 great.   Now I know why I've always had this gut feeling that
 *something* is off about the whole XML mania.)

Well, there are plenty of folks who talk like XML is a pile of steaming muck
that should never be used (and then usually talk about how great JSON is). I
think that basic XML is actually pretty okay - basically the subset that
dxml supports, though if I were designing XML I'd take it a bit further.

Personally, I'd make XML documents completely recursive - meaning that the
top level is the same as any deeper level, so you could have as many element
tags at the top level as you want and as much text as you want, whereas XML
requires a root element and only allows stuff like processing instructions,
comments, and the DOCTYPE stuff outside of the root element.

I'd get rid of the <?xml...?> and <!DOCTYPE...> declarations as well as
processing instructions, and I'd probably get rid of the CDATA section in
favor of escaping characters with backslashes like you typically do in
strings (or in JSON), and related to that, I'd get rid of the predefined
entity references, making stuff like & legal. I also might get rid of empty
element tags because they're annoying to deal with when parsing, but they do
reduce the verbosity of the document such that they might be worth keeping.
It's also tempting to get rid of the tag name on end tags, which would
actually make parsing much easier, but having them helps the legibility of
XML documents, and it's a bit like semicolons in D in the sense that they
can help ensure that error messages refer to the right thing rather than
something later in the document, so I don't know. I'd also allow all Unicode
characters instead of disallowing a number of them, since it won't really
matter for most documents, and then the parser doesn't need to care about
them when validating.

So, basically, you end up with start tags, end tags, and comments, with
start tags optionally having attributes. Backslashes would then be used for
escaping stuff, and you end up with something pretty dead simple.

However, as you're finding out when reading through the XML spec, the folks
who created XML didn't think like that at all, and were clearly coming from
a _very_ different point of view as to what an XML document was for and
should contain. But as you might imagine, given my take on what XML should
have been, finding out in detail what XML actually _is_ was pretty
horrifying.

I started dxml with the intention of fully implementing all aspects of the
spec but ultimately decided that it simply wasn't worth it.

- Jonathan M Davis

Feb 13 2018

nkm1 <t4nk074 openmailbox.org> writes:

On Monday, 12 February 2018 at 16:50:16 UTC, Jonathan M Davis 
wrote:
 Folks are free to decide to support dxml for inclusion when the 
 time comes and free to vote it as unacceptable. Personally, I 
 think that dxml's approach is ideal for XML that doesn't use 
 entity references, and I'd much rather use that kind of parser 
 regardless of whether it's in the standard library or not. I 
 think that the D community would be far better off with std.xml 
 being replaced by dxml, but whatever happens happens.

Bump!
I'm using dxml now, and it's a very good library. So I thought 
"it should be in Phobos instead of std.xml" and searched the 
newsgroup. Sorry for necroposting. Anyway, what I wanted to say 
is just take an example from Perl and call it std.xml.simple. 
Then people would know what to expect from it and would use it 
(because everyone likes simple). That would also leave a way to 
include std.xml.full (or some such) at some indefinite point in 
the future. Which is, in practice, probably never - and that's 
fine, because who needs DTD? screw it...
Anyway, thanks for the library, Jonathan.

Aug 30 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Aug 30, 2018 at 07:26:28PM +0000, nkm1 via Digitalmars-d-announce wrote:
 On Monday, 12 February 2018 at 16:50:16 UTC, Jonathan M Davis wrote:
 Folks are free to decide to support dxml for inclusion when the time
 comes and free to vote it as unacceptable. Personally, I think that
 dxml's approach is ideal for XML that doesn't use entity references,
 and I'd much rather use that kind of parser regardless of whether
 it's in the standard library or not. I think that the D community
 would be far better off with std.xml being replaced by dxml, but
 whatever happens happens.


+1.  I vote for adding dxml to Phobos.


[...]
 I'm using dxml now, and it's a very good library. So I thought "it
 should be in Phobos instead of std.xml" and searched the newsgroup.
 Sorry for necroposting. Anyway, what I wanted to say is just take an
 example from Perl and call it std.xml.simple. Then people would know
 what to expect from it and would use it (because everyone likes
 simple). That would also leave a way to include std.xml.full (or some
 such) at some indefinite point in the future. Which is, in practice,
 probably never - and that's fine, because who needs DTD? screw it...

[...]

That's a good idea, actually.  That will stop people who expect full
DTD support from complaining that it's not supported by the standard
library.

I vote for adding dxml to Phobos as std.xml.simple.  We can either leave
std.xml as-is, or deprecate it and work on std.xml.full (or
std.xml.complex, or whatever).  The current state of std.xml gives a
poor impression to anyone coming to D the first time and wanting to work
with XML, and having std.xml.simple would be a big plus.


T

-- 
This is not a sentence.

Sep 13 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Feb 12, 2018 at 09:50:16AM -0700, Jonathan M Davis via
Digitalmars-d-announce wrote:
[...]
 The core problem is that entity references get replaced with more XML
 that needs to be parsed. So, they can't simply be passed on for
 post-processing.  As I understand it, they have to be replaced while
 the parsing is going on.  And that means that you can't do something
 like return slices of the original input that don't bother with the
 entity references and then have a separate parser take that and
 process it further to deal with the entity references. The first
 parser has to deal with them, and that means not returning slices of
 the original input unless you're dealing purely with strings and are
 willing to allocate new strings in the cases where the data needs to
 be mutated because of an entity reference.

[...]

I think you missed my point.

What I'm trying to say is, given the current functionality of dxml, one
*can* build an XML interface that implements DTD support.

Of course, some concessions obviously have to be made, such as needing
to allocate memory (I don't see how else one could keep a dictionary of
DTD rules / entity declarations otherwise, for example), or not being
able to return only slices of the input anymore.  For example, entity
support pretty much means plain slices are no longer an option, because
you have to perform substitution of entity definitions, so you'll have
to either wrap it in some kind of lazy range that chains the entity
definition to the surrounding text, or you'll have to use strings or
something else.  Which means you'll need to have memory allocation /
slower parsing / whatever, but that's the price of DTD support.

But again, the point is, basic XML parsing (without DTD support) doesn't
*need* to pay this price. What's currently in dxml doesn't need to
change. DTD support can be implemented in a submodule / separate module
that wraps around dxml and builds DTD support on top of it.

Put another way, we can implement DTD support *on top of* dxml this way:
- Parse the XML using dxml as an initial step (this can be done lazily,
  or semi-lazily, as needed).
- As an intermediate step, parse the DTD section, construct whatever
  internal state is needed to handle DTD rules, a dictionary of entity
  references, etc.
- Filter the output of dxml to insert whatever extra behaviour is needed
  to implement DTD support before handing it to the calling code, e.g.,
  expand entity references, or implement validation and throw an
  exception if validation fails, etc.

*We don't need to change dxml's current API at all.*
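
A rough sketch of the third step (the filter that expands entity references), in plain D with no dxml dependency. Everything here is hypothetical: `expandEntities` and the `table` dictionary are illustrative names, not part of dxml, and the sketch only does text-level substitution. It does not re-parse markup inside the replacement text and does not expand references nested inside replacement text.

```d
import std.array : appender;
import std.string : indexOf;

/// Expands entity references of the form &name; found in `text`.
/// `table` is a hypothetical dictionary built from the DTD section.
/// Names missing from the table (for example the predefined &amp;)
/// are passed through untouched.
string expandEntities(string text, string[string] table)
{
    auto result = appender!string();
    while (true)
    {
        immutable amp = text.indexOf('&');
        if (amp == -1)
            break;
        immutable semi = text[amp .. $].indexOf(';');
        if (semi == -1)
            break;

        // Copy everything before the reference unchanged.
        result.put(text[0 .. amp]);

        immutable name = text[amp + 1 .. amp + semi];
        if (auto replacement = name in table)
            result.put(*replacement);                // declared entity: substitute
        else
            result.put(text[amp .. amp + semi + 1]); // unknown: leave as-is

        text = text[amp + semi + 1 .. $];
    }
    result.put(text); // remainder without any complete reference
    return result.data;
}
```

Note that the result is a newly allocated string whenever a substitution happens, which is exactly the cost discussed above: the wrapper can no longer hand back plain slices of the input.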

At the most, I anticipate that the only potential change needed is to
expose an interface to parse XML fragments (i.e., not a complete XML
document that contains an outer <xml> tag, but just some PCDATA that may
contain entities or tags) so that the DTD support wrapper can use it to
expand entities and insert any tags that may appear inside the entity
definition.

The DTD wrapper doesn't guarantee (and doesn't need to!) to return
slices of the input like dxml does. I don't see that as a problem, since
I can't see how anyone would be able to implement full DTD support with
only slices, even independently from the way dxml is implemented right
now.

We can even design the DTD support wrapper to start with being just a
thin wrapper around dxml, and lazily switch to full DTD mode only if a
DTD section is encountered.  Then user code that doesn't care to use
dxml's raw API won't even need to care about the difference.


T

-- 
Curiosity kills the cat. Moral: don't be the cat.

Feb 12 2018

Chris <wendlec tcd.ie> writes:

On Monday, 12 February 2018 at 21:51:56 UTC, H. S. Teoh wrote:
[...]
 We can even design the DTD support wrapper to start with being 
 just a thin wrapper around dxml, and lazily switch to full DTD 
 mode only if a DTD section is encountered.  Then user code that 
 doesn't care to use dxml's raw API won't even need to care 
 about the difference.


 T

In this vein, if a new version of std.xml didn't offer pure and 
fast parsing like dxml, but included DTD by default, people would 
complain that that was the real deal breaker (too slow, man!). 
Remember `autodecode`? Right.

DTD inclusion should only be available on demand. Imagine you 
want to implement a library project where ebooks (say classics) 
are catalogued and presented in an ebook reader on the web (or in 
an app on your smartphone). The whole DTD 
thing would probably be done at the cataloguing stage, but once 
the books are in the library most users will probably just want 
to go through them page by page or search for quotes etc. - and 
for that you'd need a fast tool like dxml with no overhead.

Feb 13 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, February 12, 2018 13:51:56 H. S. Teoh via Digitalmars-d-announce 
wrote:
 For example, entity
 support pretty much means plain slices are no longer an option, because
 you have to perform substitution of entity definitions, so you'll have
 to either wrap it in some kind of lazy range that chains the entity
 definition to the surrounding text, or you'll have to use strings or
 something else.  Which means you'll need to have memory allocation /
 slower parsing / whatever, but that's the price of DTD support.

Which was my point. The API as-is doesn't work with DTD support for those
very reasons.

 But again, the point is, basic XML parsing (without DTD support) doesn't
 *need* to pay this price. What's currently in dxml doesn't need to
 change. DTD support can be implemented in a submodule / separate module
 that wraps around dxml and builds DTD support on top of it.

 Put another way, we can implement DTD support *on top of* dxml this way:
 - Parse the XML using dxml as an initial step (this can be done lazily,
   or semi-lazily, as needed).
 - As an intermediate step, parse the DTD section, construct whatever
   internal state is needed to handle DTD rules, a dictionary of entity
   references, etc.
 - Filter the output of dxml to insert whatever extra behaviour is needed
   to implement DTD support before handing it to the calling code, e.g.,
   expand entity references, or implement validation and throw an
   exception if validation fails, etc.

 *We don't need to change dxml's current API at all.*

I don't think that this works, because the entity references insert new XML
and thus affect the parsing. And as such, you can't simply pass through the
entity references to be processed by another parser. They need to be handled
by the core parser, otherwise it's going to give incorrect results, not just
results that need further parsing. I'm sure that dxml's internals could be
refactored so that they could be shared with another parser that did that,
but unless I'm misunderstanding how entity references work, you can't use
what's there now as-is and build another parser on top of it. The entity
reference replacement needs to happen in the core parser.
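
A toy illustration of why that is, in plain D with no dxml dependency. The names `countStartTags`, `expandNote`, `doc` and `noteText` are made up for this sketch, and the tag counter is deliberately naive (it is not an XML parser). The point is only that replacement text containing markup changes the element structure that a parser would report.

```d
import std.array : replace;

/// Hypothetical document that references a DTD-declared entity.
enum doc = "<root>&note;</root>";

/// Hypothetical replacement text for &note;, containing markup of its own.
enum noteText = "<p>see <b>this</b></p>";

/// Very naive count of element start tags: every '<' not followed by '/'.
/// Good enough for the toy input above; not a real XML parser.
size_t countStartTags(string xml)
{
    size_t n;
    foreach (i, c; xml)
        if (c == '<' && i + 1 < xml.length && xml[i + 1] != '/')
            ++n;
    return n;
}

/// Substitutes the replacement text for the entity reference.
string expandNote(string xml)
{
    return xml.replace("&note;", noteText);
}
```

Before expansion the counter sees one start tag (`root`); after expansion it sees three (`root`, `p`, `b`). A parser that treats `&note;` as opaque text therefore reports a different tree from one that expands it during parsing.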

 The DTD wrapper doesn't guarantee (and doesn't need to!) to return
 slices of the input like dxml does. I don't see that as a problem, since
 I can't see how anyone would be able to implement full DTD support with
 only slices, even independently from the way dxml is implemented right
 now.

Yeah, if I were writing a parser that handled the DTD section, I wouldn't
make it deal with slices of the input like dxml does unless I decided to make
it always return string, in which case, you could get slices of the original
input for strings but no other range types - it's either that or using a
lazy range, which would be worse if you passed strings but better for other
range types. And that's the main reason that I gave up on having dxml handle
the DTD section. I consider that approach unacceptable. One of the key goals
for dxml was that it would be providing slices of the input and not lazy
ranges or allocating new strings.

In any case, unless I misunderstand how entity references work, that would
have to be its own parser and not simply a wrapper around dxml because of
how the entity references affect the parsing. If I'm wrong, then great,
someone else can come along later and add some sort of DTD parser on top of
dxml, and if I'm right, well, then anyone who wants to do anything like that
is going to need to write a new parser, but that can then coexist alongside
dxml's parser just fine. Either way, I like dxml's approach and don't want
to compromise what it's doing in an attempt to fully deal with DTDs.

- Jonathan M Davis

Feb 12 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, February 13, 2018 14:29:27 H. S. Teoh via Digitalmars-d-announce 
wrote:
 Given the insane complexities of DTD that I'm only slowly beginning to
 grasp from actually reading the spec, I'm quickly adopting the opinion
 that dxml should remain as-is, and any DTD implementation should be
 layered on top.  The only potential changes that might be needed are:

 - provide a way to parse XML snippets that don't have a <?xml ...>
   declaration, so that a DTD implementation could, for example, hand an
   entity body over to dxml to extract any tags that may be nested in
   there (and if my reading of section 4.3.2 is correct, all such tags
   must always be closed inside the entity body, so there should be no
   errors produced).

XML 1.0 does not require the <?xml...?> section - which is the main reason
why dxml implements XML 1.0 and not 1.1. When working on one of my projects
with std_experimental_xml, I had to keep adding the <?xml...?> declaration
to the start of XML snippets in all of my tests which had to deal with
sections of an XML document, and it was _really_ annoying.

dxml does require that what it's given be a valid XML 1.0 document, which
means that you have to have exactly one root element in what it's passed,
which does limit which kinds of XML snippets you can pass it, but it will work
for a lot of XML snippets as-is.

 - provide some way of hooking into non-default entities so that
   DTD-defined entities can be expanded by the DTD implementation.  This
   could be as simple as leaving such entities untouched in the returned
   range, or invent a special EntityType representing such entities (with
   a slice of the input containing the entity name) so that the DTD
   implementation can insert the replacement text.

After having actually implemented full parsing for the entire DTD section
before figuring out that references could be inserted in it just about
anywhere and that the grammar in the spec is only the grammar _after_ all of
the replacements were made (when I figured that out was when I gave up on
DTD support), I would strongly argue in favor of simply passing along entity
references as-is and leaving any and all such processing to a DTD-enabled
parser. Originally, the Config had options like SkipDTD and SkipProlog, and
I even provided a way to get at the information in the <?xml...?>
declaration if you wanted it, but all of that just wasn't worth the extra
complexity.

- Jonathan M Davis

Feb 13 2018

Johannes Loher <johannes.loher fg4f.de> writes:

On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis 
wrote:
 dxml 0.2.0 has now been released.
 [...]

Thank you very much for your efforts, I really appreciate it, as 
I have been looking for a decent xml library for quite some time.

Whether or not this is a candidate for inclusion into Phobos is 
certainly up for debate, but as you already mentioned several 
times, this thread is hardly the right place for that.

So instead I'd like to emphasize how much I appreciate you 
working on this and I am sure I am not the only one. The absence 
of a usable, high-quality XML library is/was a big problem for D 
in my opinion and it is great to see that this is finally being 
worked on :)

Feb 12 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, February 12, 2018 21:26:45 Johannes Loher via
Digitalmars-d-announce wrote:
 On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis
 wrote:
 dxml 0.2.0 has now been released.
 [...]

 Thank you very much for your efforts, I really appreciate it, as
 I have been looking for a decent xml library for quite some time.

 Whether or not this is a candidate for inclusion into Phobos is
 certainly up for debate, but as you already mentioned several
 times, this thread is hardly the right place for that.

 So instead I'd like to emphasize how much I appreciate you
 working on this and I am sure I am not the only one. The absence
 of a usable, high-quality XML library is/was a big problem for D
 in my opinion and it is great to see that this is finally being
 worked on :)

Thanks. When you do use it, please give feedback - particularly if you find
any problems or pain points. I definitely think that the API is solid
overall, but that doesn't mean that I got it completely right, and even with
all of the tests that I have, I could have missed something and ended up
with a bug in the parser. I'm reasonably confident in the code quality, but
that doesn't mean that I didn't miss anything.

- Jonathan M Davis

Feb 12 2018

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Monday, 12 February 2018 at 05:36:51 UTC, Jonathan M Davis 
wrote:
 dxml 0.2.0 has now been released.
 Documentation: http://jmdavisprog.com/docs/dxml/0.2.0/
 Github: https://github.com/jmdavis/dxml/tree/v0.2.0
 Dub: http://code.dlang.org/packages/dxml

 - Jonathan M Davis

This is absolutely awesome. It is a little low level (compared to 
SAX) so there is more to deal with, but having this provide a 
flat range makes the ordering of elements so much clearer. 
If I need to handle nesting then I can build that out, 
but if I don't I can just fly by the seat of my pants and grab 
the elements I want.
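
A minimal sketch of "building out" nesting on top of a flat event sequence, in plain D. The `Kind` and `Event` types and the `textAt` function are hypothetical stand-ins for illustration, not dxml's actual types; the input is assumed to be well-formed, with every start event matched by an end event.

```d
/// Hypothetical flat event kinds, loosely modelled on a start/end/text stream.
enum Kind { start, end, text }

/// Hypothetical flat event: an element name for start/end, content for text.
struct Event
{
    Kind kind;
    string value;
}

/// Collects the text found directly inside elements at the given path,
/// e.g. ["root", "item"], by tracking the open elements on a stack.
string[] textAt(Event[] events, string[] path)
{
    string[] stack; // names of the currently open elements
    string[] found;
    foreach (ev; events)
    {
        final switch (ev.kind)
        {
            case Kind.start:
                stack ~= ev.value;
                break;
            case Kind.end:
                stack = stack[0 .. $ - 1]; // assumes well-formed input
                break;
            case Kind.text:
                if (stack == path)
                    found ~= ev.value;
                break;
        }
    }
    return found;
}
```

If nesting doesn't matter, the stack can be dropped entirely and the events filtered by name alone, which is the "grab the elements I want" case.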

This will definitely be my go-to for XML parsing.

Feb 23 2018
