digitalmars.D - [due diligence] std.xml

Justin Johansson <no spam.com> writes:
This module should be removed altogether from Phobos forthwith.

The code was obviously submitted and accepted without peer
review, either that or the peers were idiots as well.

It would be better to say that Phobos does not have an
XML library yet, and to seek submissions, rather than
maintain this piece of codswallop in the latest distribution.

Let's not even talk of deprecation.  Any D user currently
using std.xml is completely misguided.

Justin
Oct 19 2010
so <so so.do> writes:
Man, you're sure getting on my nerves. On your last strike I lost it and went
berserk in one of your threads. Now this... You know a human being wrote
it; have some respect until you come up with something better.

Thanks!

On Tue, 19 Oct 2010 16:06:31 +0300, Justin Johansson <no spam.com> wrote:

 This module should be removed altogether from Phobos forthwith.

 The code was obviously submitted and accepted without peer
 review, either that or the peers were idiots as well.

 It would be better to say that Phobos does not have an
 XML library yet, and to seek submissions, rather than
 maintain this piece of codswallop in the latest distribution.

 Let's not even talk of deprecation.  Any D user currently
 using std.xml is completely misguided.

 Justin

Oct 19 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 19 Oct 2010 09:36:59 -0400, so <so so.do> wrote:

 Man, you sure getting on my nerves, on your last strike i lost it and  
 gone berserk in one of your threads. Now this... You know a human being  
 wrote it, have some respect until you come up with something better.

 Thanks!

I agree with the sentiment that you should respect someone else's hard work. Having said that, I agree std.xml should be removed until something replaces it. Fixing bugs in it makes no sense since 1) the author no longer is around and 2) I think it has serious design flaws. Removing it has been discussed on the phobos list before.

-Steve
Oct 19 2010
Justin Johansson <no spam.com> writes:
On 20/10/2010 12:44 AM, Steven Schveighoffer wrote:
 I agree with the sentiment that you should respect someone else's hard
 work. Having said that, I agree std.xml should be removed until
 something replaces it. Fixing bugs in it makes no sense since 1) the
 author no longer is around and 2) I think it has serious design flaws.
 Removing it has been discussed on the phobos list before.

 -Steve

I'm sorry and regret the impoliteness of my post. Please all accept my apology for any offense caused by my careless remarks. -Justin
Oct 19 2010
Emil Madsen <sovende gmail.com> writes:

I agree with so: be polite, though the code might not be as good as one would
wish.

2010/10/19 so <so so.do>

 Man, you sure getting on my nerves, on your last strike i lost it and gone
 berserk in one of your threads. Now this... You know a human being wrote it,
 have some respect until you come up with something better.

 Thanks!


--
// Yours sincerely
// Emil 'Skeen' Madsen
Oct 19 2010
"Yao G." <yao.gomez spam.gmail.com> writes:
On Tue, 19 Oct 2010 08:06:31 -0500, Justin Johansson <no spam.com> wrote:

 This module should be removed altogether from Phobos forthwith.

 The code was obviously submitted and accepted without peer
 review, either that or the peers were idiots as well.

I don't think that Walter or Andrei would take kindly to being called that, as they are some of the "peers" who review Phobos submissions. :p

And yes, I agree that std.xml is not really good.

P.S. Are you drunk?

--
Yao G.
Oct 19 2010
Daniel Gibson <metalcaedes gmail.com> writes:
Yao G. wrote:
 P.S. Are you drunk?

I already wondered about that when he posted that strange joke about fucking the D type system in a canoe.
Oct 19 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/19/10 8:06 CDT, Justin Johansson wrote:
 This module should be removed altogether from Phobos forthwith.

 The code was obviously submitted and accepted without peer
 review, either that or the peers were idiots as well.

 It would be better to say that Phobos does not have an
 XML library yet, and to seek submissions, rather than
 maintain this piece of codswallop in the latest distribution.

 Let's not even talk of deprecation. Any D user currently
 using std.xml is completely misguided.

 Justin

I haven't worked with XML all that much. Please make me understand the matter better - is std.xml's speed the only concern, or is the module generally obtuse to work with?

Thanks,

Andrei
Oct 19 2010
Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 I haven't worked with XML all that much. Please make me understand the 
 matter better - is std.xml's speed the only concern, or is the module 
 generally obtuse to work with?

It needs a rewrite.
Oct 19 2010
"Denis Koroskin" <2korden gmail.com> writes:
On Tue, 19 Oct 2010 22:47:56 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I haven't worked with XML all that much. Please make me understand the
 matter better - is std.xml's speed the only concern, or is the module
 generally obtuse to work with?

 Thanks,

 Andrei

I use it, but design is bad and performance is awful.
Oct 19 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/19/10 14:30 CDT, Denis Koroskin wrote:
 I use it, but design is bad and performance is awful.

More detail about the design please? I browsed through the code and the main issue seems to be heavy reliance on granular delegates to do pretty much anything. Would fixing that improve usability? (It would most likely improve performance.)

Andrei
Oct 19 2010
Jacob Carlborg <doob me.com> writes:
On 2010-10-19 21:37, Andrei Alexandrescu wrote:
 More detail about the design please? I browsed through the code and the
 main issue seems to be heavy reliance on granular delegates to do pretty
 much anything. Would fixing that improve usability? (It would most
 likely improve performance.)

 Andrei

It has a kind of annoying API:

* Attributes are handled as an associative array instead of as classes like the rest of the nodes
* You cannot create an empty Document; you must either create one from existing XML data or create one with a root element
* There is no way to access the parent of a node
* No XPath

These are a few I came up with for now.

--
/Jacob Carlborg
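Jacob's point about parent access is easy to picture with a tree API that also omits parent pointers. The sketch below uses Python's standard `xml.etree.ElementTree` purely as an analogy (it is not std.xml's API): ElementTree elements know their children but not their parent, and the usual workaround is to build a child-to-parent map once after parsing. The helper name `build_parent_map` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    """Map every element under `root` to its parent element.

    ElementTree nodes only link downward, so this walks the tree once and
    records each child -> parent edge. The root itself has no entry.
    """
    return {child: parent for parent in root.iter() for child in parent}


doc = ET.fromstring("<library><shelf id='a'><book>D</book></shelf></library>")
parents = build_parent_map(doc)

book = doc.find("shelf/book")
print(parents[book].get("id"))     # a
print(parents[parents[book]].tag)  # library
```

The map goes stale as soon as the tree is modified, which is one reason a built-in parent link, as asked for above, is more convenient.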
Oct 19 2010
div0 <div0 sourceforge.net> writes:
On 19/10/2010 20:37, Andrei Alexandrescu wrote:
 More detail about the design please? I browsed through the code and the
 main issue seems to be heavy reliance on granular delegates to do pretty
 much anything. Would fixing that improve usability? (It would most
 likely improve performance.)

 Andrei

Well one obvious problem is you have to read the document into memory first, which clearly isn't good enough for large documents.

Secondly, it doesn't handle XML namespaces properly. Namespaces are critically important for parsing most XML documents in practice.

Otherwise, it should be a heavily template-based design, sort of like boost::spirit, so you can process tags/attributes as they are parsed for maximum performance.

If we have XML in the std library, it's really got to be a 100% standards-conformant implementation; otherwise it's a waste of space. I spent a couple of weeks writing an XML parser, but I skipped the more obtuse bits. Doing it properly is a large chunk of work and it's dull as f*ck.

--
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
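To make the first two points concrete, here is a small sketch of event-driven parsing with namespace-resolved names, using Python's standard `xml.etree.ElementTree.iterparse` as a stand-in (nothing here is from Phobos). Each element is handled when its end tag arrives and is then cleared, and names come back in `{namespace-uri}local-name` form, so matching does not depend on which prefix a document happened to use. The Atom namespace and the `count_titles` helper are only illustrative.

```python
import io
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
TITLE = f"{{{ATOM_NS}}}title"  # ElementTree's "{namespace-uri}local-name" form


def count_titles(source):
    """Count Atom <title> elements while streaming through `source`."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Tags arrive already namespace-resolved, so a prefixed <a:title>
        # and a default-namespace <title> both compare equal to TITLE.
        if elem.tag == TITLE:
            count += 1
        # This element is finished: drop its children, text and attributes
        # so the document's content is not accumulated in memory. (The
        # emptied element itself stays attached to its parent here.)
        elem.clear()
    return count


feed = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:a="http://www.w3.org/2005/Atom">
  <entry><title>one</title></entry>
  <entry><a:title>two</a:title></entry>
  <entry><title xmlns="urn:example:other">not Atom</title></entry>
</feed>"""

print(count_titles(io.BytesIO(feed)))  # 2
```

This simple version still leaves empty element shells attached to their parents; a production streaming parser would prune those too, or avoid building elements at all, which is closer to the template-driven design described above.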
Oct 19 2010
sybrandy <sybrandy gmail.com> writes:
 Well one obvious problem is you have to read the document into memory
 first, which clearly isn't good enough for large documents.

I think that depends on the type of XML library we create. A SAX library doesn't require the whole document to be in memory; however, a DOM library typically does since, from what I can tell, it creates a tree-like in-memory representation. If you don't read it into memory, I'm not really sure how you would be able to, for example, write XPath queries that access some random nodes that are not grouped together in a relatively efficient manner. I say relatively because, yes, the memory layout can be very scattered; however, it's still better than having to perform random access from disk.

I guess one question we need to ask is what do we expect from this library? Do we want a full DOM implementation or is a SAX parser good enough? Or do we need something in between? In PHP or Perl, perhaps both, I saw a library where an XML document was essentially transformed into nested associative arrays. It made it very easy to read data from the XML, however I don't know how much of the official standards it complied with.

The current std.xml looks like it tries to be both a DOM library and a SAX library. Personally, I'd rather break them up into two libraries, though it may make sense for the DOM library to leverage the SAX library to build up its objects.

IMHO, I love a good SAX parser. I've used them in the past and I think they work great, so having one in D would be ideal, especially in those situations where the XML file is essentially read-only. Do we need a DOM parser? I honestly don't know. Personally, I'd be happy with the associative array approach as it's simple. I don't need to learn a new API just to navigate through XML. Yes, I know there are advantages to using the DOM and XPath, which I also like, but for the most part, I don't need either.

Of course, I personally would love to just let XML die and use better data formats, but that's an unrealistic dream :)

Casey
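The "nested associative arrays" approach can be sketched in a few lines. This is a hypothetical `to_dict` helper written on top of Python's `xml.etree.ElementTree`, not the PHP or Perl library mentioned above; the `@name` and `#text` key conventions are assumptions borrowed from common XML-to-dictionary converters.

```python
import xml.etree.ElementTree as ET


def to_dict(elem):
    """Convert an element into nested dicts, lists and strings.

    Attributes become "@name" keys, element text becomes "#text", and
    repeated child tags are gathered into a list. A leaf element with only
    text collapses to that string. Text between child elements (tails) is
    dropped; comments never get here because ElementTree's default parser
    discards them.
    """
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = to_dict(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            node[child.tag] = [node[child.tag], value]
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text
        node["#text"] = text
    return node


config = ET.fromstring(
    "<config version='2'>"
    "<server>alpha</server><server>beta</server><port>8080</port>"
    "</config>"
)
cfg = to_dict(config)
print(cfg["@version"], cfg["server"], cfg["port"])  # 2 ['alpha', 'beta'] 8080
```

The convenience has a cost that bears on the standards question: the shape of the result depends on the data (one `<server>` gives a string, two give a list), tail text is lost, and namespaced tags come through as raw `{uri}name` keys.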
Oct 19 2010
div0 <div0 sourceforge.net> writes:
On 19/10/2010 21:43, sybrandy wrote:
 Well one obvious problem is you have to read the document into memory
 first, which clearly isn't good enough for large documents.


<snip>

Nobody said anything about DOM, and the current std.xml doesn't in any way support a DOM implementation. That's a whole 'nother ball game. For my money, supporting DOM is not and should not be a goal for Phobos; DOM is irrelevant for the basics of XML data handling.
 Of course, I personally would love to just let XML die and use better
 data formats, but that's an unrealistic dream :)

 Casey

Damn straight. XML is a fugly crap format. But too many people have invested too much in it, and these days it is pretty much the de facto standard for a lot of data interchange and a lot of 'file formats', so ignoring it isn't really an option.

.NET gives good support for XML out of the box, and if Phobos is going to have XML, it should do at least as well as the .NET implementation or it's not going to be realistically usable for serious use. I don't think we need to go as far as XSLT, but XML 1.1 and Namespaces are a must.

--
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
Oct 19 2010
Michel Fortin <michel.fortin michelf.com> writes:
On 2010-10-19 16:43:04 -0400, sybrandy <sybrandy gmail.com> said:

 I guess one question we need to ask is what do we expect from this 
 library?  Do we want a full DOM implementation or is a SAX parser good 
 enough?  Or do we need something in between?  In PHP or Perl, perhaps 
 both, I saw a library where an XML document was essentially transformed 
 into nested associative arrays.  It made it very easy to read data from 
 the XML, however I don't know how much of the official standards it 
 complied with.

Many people have different needs for XML; it's hard to come up with something that pleases everyone. I might have the solution to that, however: a template that makes it easy to implement any kind of parser.

I made two XML modules a little while ago. The first is a tokenizer template that can work either as a pull-parser or a callback-parser, or even a mix of both, and is reentrant (you can invoke the tokenizer inside a callback to parse new tokens). The implementation has been written based on the XML spec, so I'm confident that the parser is pretty much standard. In regard to the standard, the tokenizer lacks support for DTD internal subsets and user-defined character entities, and leaves some well-formedness checks (like checking that tag names match) to the upper layers, where those checks should be less costly.

The second module is a basic tree model based on the tokenizer. It doesn't try to be DOM-conformant, but it shows how the tokenizer can be used and implements the higher-level well-formedness checks (matching tag names). Building a SAX parser on top of the tokenizer would be a piece of cake too.

It might be incomplete, but this code works: it's already in production in a small program (script?) of mine. I don't really have the time to work on it at the moment, but if anyone wants to take it and improve upon it, then it could probably become Phobos's XML parser. One thing that should be done is to make the tokenizer accept ranges, something I started a couple of months ago but never finished.

Here's the (slightly outdated) documentation. If someone wants to proceed I'll extract the code from the rest of my code and release it under the boost license.

http://michelf.com/docs/d/mfr/xmltok.html
http://michelf.com/docs/d/mfr/xml.html

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
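The idea that one tokenizer can serve both pull-style and callback-style use can be illustrated with Python's `xml.etree.ElementTree.XMLPullParser`. This is an analogy only: the actual tokenizer API is documented at the links in the post and is not reproduced here, and the helper names `pull_events` and `parse_with_callbacks` are hypothetical. The callback layer is just a loop over the pull layer, which is why offering both from a single tokenizer is cheap.

```python
import xml.etree.ElementTree as ET


def pull_events(chunks):
    """Pull-style: yield ("start" | "end", tag) pairs as byte chunks are fed in.

    The caller drives iteration and may stop at any point.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            yield event, elem.tag
    parser.close()  # signal end of input, then drain any remaining events
    for event, elem in parser.read_events():
        yield event, elem.tag


def parse_with_callbacks(chunks, on_start, on_end):
    """Callback-style: the same event stream, pushed into user functions."""
    for event, tag in pull_events(chunks):
        if event == "start":
            on_start(tag)
        else:
            on_end(tag)


# Pull-style: stop as soon as the first element closes.
first_closed = next(tag for event, tag in pull_events([b"<a><b/>", b"</a>"]) if event == "end")
print(first_closed)  # b

# Callback-style: record every event in order.
log = []
parse_with_callbacks(
    [b"<a><b/>", b"<c><d/></c></a>"],
    on_start=lambda tag: log.append("+" + tag),
    on_end=lambda tag: log.append("-" + tag),
)
print(log)  # ['+a', '+b', '-b', '+c', '+d', '-d', '-c', '-a']
```

Re-entrancy (invoking the tokenizer again from inside a callback) is not shown; that part is specific to the design described in the post.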
Oct 19 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/19/10 17:16 CDT, Michel Fortin wrote:
 Here's the (slightly outdated) documentation. If someone wants to
 proceed I'll extract the code from the rest of my code and release it
 under the boost license.

 http://michelf.com/docs/d/mfr/xmltok.html
 http://michelf.com/docs/d/mfr/xml.html

Looks like a simple and clean API. We should be able to adopt the code into Phobos if an owner and champion is found.

Andrei
Oct 19 2010
Michel Fortin <michel.fortin michelf.com> writes:
On 2010-10-19 19:28:18 -0400, "Simen kjaeraas" <simen.kjaras gmail.com> said:

 I'd love to give this a spin.

Great. I'll post that code tomorrow.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 19 2010
Michel Fortin <michel.fortin michelf.com> writes:
On 2010-10-19 21:31:04 -0400, Michel Fortin <michel.fortin michelf.com> said:

 Great. I'll post that code tomorrow.

Not yet tomorrow, but it's ready. Have fun.

<http://michelf.com/docs/d/mfr-xml-2010-10-19.zip>

I've included some notes in the archive about what each module does and what's missing. Feel free to ask if you have questions.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 19 2010
"Simen kjaeraas" <simen.kjaras gmail.com> writes:
Michel Fortin <michel.fortin michelf.com> wrote:

 If someone wants to proceed I'll extract the code from the rest of my  
 code and release it under the boost license.

I'd love to give this a spin.

--
Simen
Oct 19 2010
"Simen kjaeraas" <simen.kjaras gmail.com> writes:
Michel Fortin <michel.fortin michelf.com> wrote:

 Not yet tomorrow, but it's ready. Have fun.

 <http://michelf.com/docs/d/mfr-xml-2010-10-19.zip>

 I've included some notes in the archive about what each module does and
 what's missing. Feel free to ask if you have questions.

Thank you. I'll have a look-see.

--
Simen
Oct 20 2010
Michael Rynn <michaelrynn optusnet.com.au> writes:
On Tue, 19 Oct 2010 21:53:56 +0200, Jacob Carlborg wrote:

 It has a kind of annoying API:

 * Attributes are handled as an associative array instead of as classes like the rest of the nodes
 * You cannot create an empty Document; you must either create one from existing XML data or create one with a root element
 * There is no way to access the parent of a node
 * No XPath

There is an XML parser and document structure that follows the DOM 2-3 interfaces on DSource: dsource.org/projects/xmlp. It uses InputRanges to manage the parsing (sort of layered). It handles DTDs. It has DOM level 2/3 interfaces (really, so much of this XML DOM interface seems directly taken from the Java design). The DOMString type can be aliased as string or wstring. It handles a lot of XML corner cases, and validates against the XML test suite. It can read multiple source files in different encodings, and external DTDs. The first version also handled namespaces, but I haven't checked or updated namespaces in the current version, because no-one seems interested, and so I drifted off and did other things.

The resulting parser and code seem too big and messy for Phobos. I imagine people just want something that reads in already-validated XML, quick and dirty. I suspect no one would have the patience to work out where to start using this, although there are some example test and validation programs.

There's also a short XPath runtime analyser based on it, and a make tool that uses it, along with XML config files and variables, to build D programs and run commands.

---
Michael.
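For readers who haven't used the W3C DOM interfaces mentioned here, Python's standard `xml.dom.minidom` follows the same style closely enough to give the flavour: namespace-aware lookups such as `getAttributeNS`, and navigation through `parentNode` and `firstChild`. This is not xmlp's API; the `list_links` function and the XLink sample document are illustrative assumptions.

```python
from xml.dom import minidom

XLINK_NS = "http://www.w3.org/1999/xlink"


def list_links(xml_text):
    """Return (parent tag, xlink:href, text) for every <ref> element."""
    doc = minidom.parseString(xml_text)
    links = []
    for ref in doc.getElementsByTagName("ref"):
        # DOM Level 2 style: look the attribute up by namespace URI plus
        # local name, so the prefix chosen in the document does not matter.
        href = ref.getAttributeNS(XLINK_NS, "href")
        child = ref.firstChild
        text = child.data if child is not None and child.nodeType == child.TEXT_NODE else ""
        links.append((ref.parentNode.tagName, href, text))
    doc.unlink()  # break the parent <-> child reference cycles minidom creates
    return links


sample = (
    '<doc xmlns:xl="http://www.w3.org/1999/xlink">'
    '<ref xl:href="a.xml">A</ref><ref xl:href="b.xml">B</ref>'
    "</doc>"
)
print(list_links(sample))  # [('doc', 'a.xml', 'A'), ('doc', 'b.xml', 'B')]
```

The verbosity of these calls is part of what makes a full DOM layer feel heavy compared with the lighter tree or event APIs discussed earlier in the thread.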
Oct 26 2010