digitalmars.D - Status of std.xml (D2/Phobos)

Justin Johansson (6/6) Jun 27 2010 May I ask is anybody working on redeveloping std.xml in the D2/Phobos

Lutger (2/10) Jun 27 2010 Interested, very much so. I think many people are.
Simen kjaeraas (4/6) Jun 27 2010 Absolutely. It is a necessity.
Justin Johansson (21/30) Jun 27 2010 Lutger said: Interested, very much so. I think many people are.

Adam Ruppe (6/6) Jun 27 2010 I'm not terribly interested in it because I already wrote my own

Justin Johansson (18/25) Jun 27 2010 Thanks Adam for replying. I'm happy to take onboard contra-views

Adam Ruppe (7/10) Jun 27 2010 Yes, it is very simple, but so is all the XML I've ever actually

Justin Johansson (24/36) Jun 27 2010 Yeah, I understand where you are coming from; sometimes all you

Ellery Newcomer (3/23) Jun 27 2010 For the sake of us uninformed spectators, could you give a little taste

Justin Johansson (31/62) Jun 28 2010 Writing an XML parser in itself is pretty much basic CS101 stuff. The

Ellery Newcomer (3/14) Jun 29 2010 Sounds ominous. Like you'd need a serious team if you actually wanted to...

Justin Johansson (8/25) Jun 29 2010 Yep, a serious team armed with a serious programming language that

Andrei Alexandrescu (23/38) Jun 27 2010 Clearly std.xml can't stay the way it is. I'm even thinking of removing

Justin Johansson (6/52) Jun 27 2010 Thanks Andrei et. al. I'll get back to the topic after some sleep
Sean Kelly (2/21) Jun 27 2010 I'd like to cast a vote for a SAX-style parser. A DOM parser can be bui...
Justin Johansson (22/50) Jun 29 2010 Others in this thread have suggested a preemptive removal of the current

Lutger (10/32) Jun 27 2010 Well I dare only speak for myself. But perhaps I do can better:

Justin Johansson (10/46) Jun 27 2010 Yes, understand.

Lutger (7/14) Jun 27 2010 Don't know really, sorry. From a quick glance though, I would say to sta...

Jesse Phillips (7/13) Jun 27 2010 XMLP_101327.html#N101327

Nick Sabalausky (7/27) Jun 27 2010 I'm interested. I've been thinking of porting some of my stuff from D1/T...

Jacob Carlborg (6/12) Jun 27 2010 I would very much have XML support in Phobos, I think all standard
jpf (7/16) Jun 27 2010 I would also like to have a better std.xml. I still have an old D1/Tango
Yao G. (10/17) Jun 27 2010 I did a simple implementation of a pull parser, using this API as

Steven Schveighoffer (22/29) Jun 28 2010 Did you look at Tango's code in question, or look at their documentation...

Alix Pexton (35/65) Jun 28 2010 I've not looked at any of the D XML offerings (shame on me?) but I've

Steven Schveighoffer (14/24) Jun 28 2010 DOM is usually built on top of SAX, so start with the lowest common

Alix Pexton (6/10) Jun 29 2010 I've been thinking about it, and while I believe you when you say that

Michel Fortin (24/37) Jun 29 2010 It is closer to the metal, but there's a catch...

Alix Pexton (18/52) Jun 29 2010 My understanding was that SAX _doesn't_ check those things either and

BLS (6/16) Jun 28 2010 Hi Steve,

Bernard Helyer (3/12) Jun 27 2010 std.xml needs to be replaced, but I personally don't much care as kxml
Joe Hildebrand (6/10) Jun 28 2010 Agree, with one additional requirement: the ability to throw random chun...
Michel Fortin (21/27) Jun 28 2010 I have made my own parser, comprised of a tokenizer and a mini DOM layer...

Andrei Alexandrescu (6/36) Jun 28 2010 I think a tokenizer should be a higher-order range that is fed an input

Michel Fortin (24/35) Jun 28 2010 And I've implemented a tokenizer range just like you describe on top of

lurker (3/3) Jun 28 2010 I'm very interested.

Justin Johansson <no spam.com> writes:

May I ask is anybody working on redeveloping std.xml in the D2/Phobos 
library?  (Currently it looks like it needs to be started over from scratch)

Also what is the level of interest from library users for decent XML 
support in D2/Phobos?

Cheers
Justin Johansson

Jun 27 2010

Lutger <lutger.blijdestijn gmail.com> writes:

Justin Johansson wrote:

 May I ask is anybody working on redeveloping std.xml in the D2/Phobos
 library?  (Currently it looks like it needs to be started over from scratch)
 
 Also what is the level of interest from library users for decent XML
 support in D2/Phobos?
 
 Cheers
 Justin Johansson

Interested, very much so. I think many people are.

Jun 27 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

Justin Johansson <no spam.com> wrote:
 Also what is the level of interest from library users for decent XML  
 support in D2/Phobos?

Absolutely. It is a necessity.

-- 
Simen

Jun 27 2010

Justin Johansson <no spam.com> writes:

Justin Johansson wrote:
 May I ask is anybody working on redeveloping std.xml in the D2/Phobos 
 library?  (Currently it looks like it needs to be started over from 
 scratch)
 
 Also what is the level of interest from library users for decent XML 
 support in D2/Phobos?
 
 Cheers
 Justin Johansson

Lutger said: Interested, very much so. I think many people are.

Simen said: Absolutely. It is a necessity.

Thanks for fast replies from Lutger and Simen.

Being an XML/W3C addict myself I well concur with Simen and Lutger's 
sentiments.

However, Lutger simply saying that he "thinks" *many* people are
interested is not good enough for me.

Would the *many* people also please add a voice along with Simen et al.
to inspire me to contribute to the D2 XML effort (of course under a
Walter-endorsed style of licence).

Brief about me:
I believe I possess the skills and experience in XML/XSLT & other
W3C stuff together with 20+ years as a C++ developer to contribute
good, peer-reviewable work;
It's just that I need inspiration to put this task into action.
The only other thing is that any offer on my part is conditional upon
obtaining a "sabbatical break", hopefully in the next month or so
to be able to put in the time to make it happen.

Cheers
Justin Johansson

Jun 27 2010

Adam Ruppe <destructionator gmail.com> writes:

I'm not terribly interested in it because I already wrote my own
replacement: http://arsdnet.net/dcode/dom.d

Mine is biased toward HTML, doing what I personally find useful, or
mimicking what JavaScript in the browser would do instead of following
the standard, but if there's anything in there that is useful to
others, you're free to take it.

Jun 27 2010

Justin Johansson <no spam.com> writes:

Adam Ruppe wrote:
 I'm not terribly interested in it because I already wrote my own
 replacement: http://arsdnet.net/dcode/dom.d
 
 Mine is biased toward HTML, doing what I personally find useful, or
 mimicing what javascript in the browser would do instead of following
 the standard, but if there's anything in there that is useful to
 others, you're free to take it.

Thanks Adam for replying.  I'm happy to take onboard contra-views
such as yours as well.  Naturally there is no point in putting in an
effort where there is no interest at large.

Still, I'll wait for more replies on this ng before making any
decision whether or not to commit myself to a new "D2 XML" effort.

btw. I feel it fair to add conjecture that a DOM implementation
is pretty basic stuff and that a complete XML ecosystem is much
larger than just this (i.e. an in-memory DOM).  There are all
sorts of abstractions (Andrei read ranges) and modeling that
would form part of what I believe would be a major work, and
one possibly even bigger than what one person like myself could
ever hope to achieve.

Of course, the mammoth effort by Michael Kay in producing the
Saxon (Java-based) XSLT processor is a feat that few others
will ever overshadow.

Cheers
Justin Johansson

Jun 27 2010

Adam Ruppe <destructionator gmail.com> writes:

On 6/27/10, Justin Johansson <no spam.com> wrote:
 btw. I feel it fair to add conjecture that a DOM implementation
 is pretty basic stuff and that a complete XML ecosystem it much
 larger than just this (i.e. an in-memory DOM).

Yes, it is very simple, but so is all the XML I've ever actually
encountered. I've seen ugly, convoluted HTML and I've seen name/value
pairs in verbose XML format, but very very little in the middle.
(Heck, I just used std.string.indexOf("<tagname") for quite a while.)

This is probably due to my observation bias, with all my XML
experience coming from working with web services.
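
As a rough illustration of the indexOf approach mentioned above, here is a minimal sketch, assuming flat name/value-style XML with no attributes, nesting of the same tag, CDATA, or entities. The helper name `textOf` is hypothetical and not taken from any library in this thread; only `std.string.indexOf` is a real Phobos call.

```d
import std.string : indexOf;

/// Returns the text between the first <tag> and its </tag>,
/// or null if either is missing. Hypothetical helper for illustration.
string textOf(string xml, string tag)
{
    string open = "<" ~ tag ~ ">";
    string close = "</" ~ tag ~ ">";

    auto start = xml.indexOf(open);      // -1 when not found
    if (start < 0) return null;
    start += open.length;                // skip past the opening tag

    auto len = xml[start .. $].indexOf(close);
    if (len < 0) return null;
    return xml[start .. start + len];
}

void main()
{
    auto doc = "<reply><status>ok</status><id>42</id></reply>";
    assert(textOf(doc, "status") == "ok");
    assert(textOf(doc, "id") == "42");
    assert(textOf(doc, "missing") is null);
}
```

This is enough for simple web-service replies, which is the use case described here, but it is not an XML parser in any conforming sense.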

Jun 27 2010

Justin Johansson <no spam.com> writes:

Adam Ruppe wrote:
 On 6/27/10, Justin Johansson <no spam.com> wrote:
 btw. I feel it fair to add conjecture that a DOM implementation
 is pretty basic stuff and that a complete XML ecosystem it much
 larger than just this (i.e. an in-memory DOM).

 
 Yes, it is very simple, but so is all the XML I've ever actually
 encountered. I've seen ugly, convoluted HTML and I've seen name/value
 pairs in verbose XML format, but very very little in the middle.
 (Heck, I just used std.string.indexOf("<tagname") for quite a while.)
 
 This is probably due to my observation bias, with all my XML
 experience coming from working with web services.

Yeah, I understand where you are coming from; sometimes all you
need is some simple DOM stuff which you can hack out yourself in
a few hours.

OTOH, there are some really significant W3C specs that you may
or may not be aware of and these are really difficult to implement
in regular imperative languages like C/C++ and Java.  Java,
owing to its large following I guess, has had the most
success in implementing these specs.

IMHO, the two most fundamental and significant W3C specs that
D libraries could well address are as follows.  These form a
large amount of the (formal) XML ecosystem.

XML Schema Part 2: Datatypes Second Edition
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/

and

XQuery 1.0 and XPath 2.0 Data Model (XDM)
http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/

I can tell you for sure that XPath 2.0, which is the basis
for XSLT 2.0 and XQuery 1.0, is truly a challenge to implement
in languages like C++ and Java.  Others have succeeded with
implementations in languages like Eiffel.  I would hope though,
that D2 would be up to the task (or is that wishful thinking?).

Cheers
Justin Johansson

Jun 27 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 06/27/2010 10:16 AM, Justin Johansson wrote:
 OTOH, there are some really significant W3C specs that you may
 or may not be aware of and these are really difficult to implement
 in regular imperative languages like C/C++ and Java. Java,
 being all that is the following of Java I guess, has had the most
 success in implementing these specs.

 IMHO, the two most fundamental and significant W3C specs that
 D libraries could well address are as follows. These form a
 large amount of the (formal) XML ecosystem.

 XML Schema Part 2: Datatypes Second Edition
 http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/

 and

 XQuery 1.0 and XPath 2.0 Data Model (XDM)
 http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/

 I can tell you for sure that XPath 2.0, which is the basis
 for XSLT 2.0 and XQuery 1.0, is truly a challenge to implement
 in languages like C++ and Java. Others have succeeded with
 implementations in languages like Eiffel. I would hope though,
 that D2 would be up to the task (is that is wishful thinking?).

 Cheers
 Justin Johansson

For the sake of us uninformed spectators, could you give a little taste 
of the challenges to which you refer?

Jun 27 2010

Justin Johansson <no spam.com> writes:

Ellery Newcomer wrote:
 On 06/27/2010 10:16 AM, Justin Johansson wrote:
 OTOH, there are some really significant W3C specs that you may
 or may not be aware of and these are really difficult to implement
 in regular imperative languages like C/C++ and Java. Java,
 being all that is the following of Java I guess, has had the most
 success in implementing these specs.

 IMHO, the two most fundamental and significant W3C specs that
 D libraries could well address are as follows. These form a
 large amount of the (formal) XML ecosystem.

 XML Schema Part 2: Datatypes Second Edition
 http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/

 and

 XQuery 1.0 and XPath 2.0 Data Model (XDM)
 http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/

 I can tell you for sure that XPath 2.0, which is the basis
 for XSLT 2.0 and XQuery 1.0, is truly a challenge to implement
 in languages like C++ and Java. Others have succeeded with
 implementations in languages like Eiffel. I would hope though,
 that D2 would be up to the task (is that is wishful thinking?).

 Cheers
 Justin Johansson

 
 For the sake of us uninformed spectators, could you give a little taste 
 of the challenges to which you refer?

Writing an XML parser in itself is pretty much basic CS101 stuff.  The
tough challenges come with implementing the other W3C specs in the XML
ecosystem, such as XSchema and XPath 2.0, because these are such
humongous and complex beasts.

An XSchema implementation forms the basis for writing an XML content
validator and that's a pretty important tool to have for a lot of
XML processing.

An XPath 2.0 implementation forms the core of XSLT 2.0 and XQuery which
are XML transformation languages.  Again these are very useful tools.

The most successful implementations of XSchema and XPath 2.0 are written
in Java.  This is probably mostly due to the widespread popularity of
Java and there being very many open source volunteers to do the grunt work.

If you look at any of the Java sources for these XML projects, you will
be astounded just how big they are, like the Saxon Java XSLT processor
by Michael Kay for example*.  Of course you will be secretly thinking to
yourself that the size of these works would be considerably smaller if they
were written in D :-)

(*Michael Kay has spent the last ten years working on it.)

In the C++ world of Qt, there is the Qt XmlPatterns library which
implements XPath 2.0 which is also quite sizable and currently
incomplete (implementing only about 70% of the W3C spec) and there
are a whole bunch of (former TrollTech?) people at Nokia working on it,
again demonstrating that implementing these W3C specs is no simple feat.

If you are really interested, try downloading a copy of the Qt source
from Nokia and take a look at the C++ code in the XmlPatterns library.
 From that you will surely get more than just a taste of the challenges,
you will get a whole mouthful!  :-)

http://qt.nokia.com/downloads/

Cheers
Justin Johansson

Jun 28 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 06/28/2010 08:13 AM, Justin Johansson wrote:
 If you look at any of the Java sources for these XML projects, you will
 be astounded just how big they are, like the Saxon Java XSLT processor
 by Michael Kay for example*. Of course you will be secretly thinking to
 yourself that the size these works would be considerably smaller if they
 were written in D :-)

 (*Michael Kay has spent the last ten years working on it.)

 In the C++ world of Qt, there is the Qt XmlPatterns library which
 implements XPath 2.0 which is also quite sizable and currently
 incomplete (implementing only about 70% of the W3C spec) and there
 are a whole bunch of (former TrollTech?) people at Nokia working on it,
 again demonstrating that implementing these W3C specs is no simple feat.

Sounds ominous. Like you'd need a serious team if you actually wanted to 
do any of this stuff.

Jun 29 2010

Justin Johansson <no spam.com> writes:

Ellery Newcomer wrote:
 On 06/28/2010 08:13 AM, Justin Johansson wrote:
 If you look at any of the Java sources for these XML projects, you will
 be astounded just how big they are, like the Saxon Java XSLT processor
 by Michael Kay for example*. Of course you will be secretly thinking to
 yourself that the size these works would be considerably smaller if they
 were written in D :-)

 (*Michael Kay has spent the last ten years working on it.)

 In the C++ world of Qt, there is the Qt XmlPatterns library which
 implements XPath 2.0 which is also quite sizable and currently
 incomplete (implementing only about 70% of the W3C spec) and there
 are a whole bunch of (former TrollTech?) people at Nokia working on it,
 again demonstrating that implementing these W3C specs is no simple feat.

 
 Sounds ominous. Like you'd need a serious team if you actually wanted to 
 do any of this stuff.


Yep, a serious team armed with a serious programming language that
is capable of realizing an event horizon to explode the singularity
that is the black hole of the W3C specs and ultimately creating
works of sheer beauty et ordo ab chao (and order out of chaos).

D2? :-)

Cheers
Justin Johansson

Jun 29 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Justin Johansson wrote:
 Adam Ruppe wrote:
 I'm not terribly interested in it because I already wrote my own
 replacement: http://arsdnet.net/dcode/dom.d

 Mine is biased toward HTML, doing what I personally find useful, or
 mimicing what javascript in the browser would do instead of following
 the standard, but if there's anything in there that is useful to
 others, you're free to take it.

 
 Thanks Adam for replying.  I'm happy to take onboard contra-views
 such as yours as well.  Naturally it is no point in putting in an
 effort wherein there is no interest at large.
 
 Still, I'll wait for more replies on this ng before making any
 decision whether or not to commit myself to a new "D2 XML" effort.

Clearly std.xml can't stay the way it is. I'm even thinking of removing 
it preemptively in wait for another implementation.

If you want to work on something you enjoy, it seems like std.xml is a 
good choice. If you want to work on the most important item, 
probably networking would come ahead. We badly need http and ftp 
streaming libraries. I'm thinking libcurl would be a good choice as a 
backend (not interface). For D integration, it would be great to 
integrate networking with std.stdio.File - e.g. creating 
File("http://xyz.org") would just connect to the thing and allow 
streaming, ranges, everything. Adam Ruppe has a lower-level networking 
protocol that also hooks into std.stdio.File, which would be very 
important to have too.

But then it's often better to work on what you like, so don't look for a 
landslide vote. Ford didn't work on a faster horse etc. Some things that 
would be good to have in an xml library:

- should work with input ranges (not only strings)

- use aliases as lambdas if needed (std.xml's use of lambdas is nice, 
just very slow)

- define templates for char, wchar, and dchar and then define one 
working with ranges of ubyte that dispatches depending on the encoding 
tag found.
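
The first two wishes in the list above (input ranges rather than only strings, and aliases used as lambdas) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `scan` and its two callbacks are hypothetical, it handles no attributes, comments, CDATA, or entities, and it does not attempt the encoding dispatch over `ubyte` ranges from the third item.

```d
import std.range.primitives; // empty, front, popFront, isInputRange, ElementType
import std.traits : isSomeChar;

/// Walks any input range of characters once, calling onText for each run of
/// character data and onTag with the raw contents of each <...>.
/// The callbacks are alias parameters, so they can be inlined by the compiler.
void scan(alias onTag, alias onText, R)(R input)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    string buf;
    bool inTag = false;
    for (; !input.empty; input.popFront())
    {
        auto c = input.front;
        if (!inTag && c == '<')
        {
            if (buf.length) onText(buf);
            buf = null;
            inTag = true;
        }
        else if (inTag && c == '>')
        {
            onTag(buf);
            buf = null;
            inTag = false;
        }
        else
            buf ~= c; // appending a wider character re-encodes it as UTF-8
    }
    if (!inTag && buf.length) onText(buf);
}

void main()
{
    import std.algorithm.iteration : joiner;

    string[] tags, texts;
    // joiner stitches separate chunks into one input range of characters,
    // standing in for data that does not arrive as a single string.
    auto input = ["<a>he", "llo</a><b", "/>"].joiner;
    scan!((string t) { tags ~= t; }, (string s) { texts ~= s; })(input);
    assert(tags == ["a", "/a", "b/"]);
    assert(texts == ["hello"]);
}
```

Because `scan` only uses `empty`, `front`, and `popFront`, the same code accepts a plain `string`, a `wstring`, or a lazily produced range.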


Andrei

Jun 27 2010

Justin Johansson <no spam.com> writes:

Andrei Alexandrescu wrote:
 Justin Johansson wrote:
 Adam Ruppe wrote:
 I'm not terribly interested in it because I already wrote my own
 replacement: http://arsdnet.net/dcode/dom.d

 Mine is biased toward HTML, doing what I personally find useful, or
 mimicing what javascript in the browser would do instead of following
 the standard, but if there's anything in there that is useful to
 others, you're free to take it.

 Thanks Adam for replying.  I'm happy to take onboard contra-views
 such as yours as well.  Naturally it is no point in putting in an
 effort wherein there is no interest at large.

 Still, I'll wait for more replies on this ng before making any
 decision whether or not to commit myself to a new "D2 XML" effort.

 
 Clearly std.xml can't stay the way it is. I'm even thinking of removing 
 it preemptively in wait for another implementation.
 
 If you want to work on something you enjoy, it seems like std.xml is a 
 good choice. If you want to work on the top most important item, 
 probably networking would come ahead. We badly need http and ftp 
 streaming libraries. I'm thinking libcurl would be a good choice as a 
 backend (not interface). For D integration, it would be great to 
 integrate networking with std.stdio.File - e.g. creating 
 File("http://xyz.org") would just connect to the thing and allow 
 streaming, ranges, everything. Adam Ruppe has a lower-level networking 
 protocol that also hooks into std.stdio.File, which would be very 
 important to have too.
 
 But then it's often better to work on what you like, so don't look for a 
 landslide vote. Ford didn't work on a faster horse etc. Some things that 
 would be good to have in an xml library:
 
 - should work with input ranges (not only strings)
 
 - use aliases as lambdas if needed (std.xml's use of lambdas is nice, 
 just very slow)
 
 - define templates for char, wchar, and dchar and then define one 
 working with ranges of ubyte that dispatches depending on the encoding 
 tag found.
 
 
 Andrei

Thanks Andrei et al.  I'll get back to the topic after some sleep
and another day at the office tomorrow; it's way after the witching
hour now in my neck of the woods.

Cheers,
Justin

Jun 27 2010

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu Wrote:

 Justin Johansson wrote:
 Adam Ruppe wrote:
 I'm not terribly interested in it because I already wrote my own
 replacement: http://arsdnet.net/dcode/dom.d

 Mine is biased toward HTML, doing what I personally find useful, or
 mimicing what javascript in the browser would do instead of following
 the standard, but if there's anything in there that is useful to
 others, you're free to take it.

 
 Thanks Adam for replying.  I'm happy to take onboard contra-views
 such as yours as well.  Naturally it is no point in putting in an
 effort wherein there is no interest at large.
 
 Still, I'll wait for more replies on this ng before making any
 decision whether or not to commit myself to a new "D2 XML" effort.

 
 Clearly std.xml can't stay the way it is. I'm even thinking of removing 
 it preemptively in wait for another implementation.

I'd like to cast a vote for a SAX-style parser.  A DOM parser can be built on
top of it, and frankly, a SAX parser is the only kind I'd ever use.  I'm either
working with large streams where building a tree is impractical, or performance
is enough of an issue that again, building a tree is impractical.  I have
similar feelings about the JSON parser despite it being a pretty solid
implementation otherwise.  I'd contribute one if I could, but I did one for
work and it just isn't worth the administrative hassle.
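
To make the "DOM on top of SAX" point concrete, here is a minimal sketch, assuming a SAX-style parser that reports three kinds of events. All names (`SaxHandler`, `Node`, `DomBuilder`) are hypothetical and not from std.xml, Tango, or any parser mentioned in this thread; the parser itself is simulated by calling the handler by hand.

```d
/// Events a SAX-style parser would push to its client (assumed shape).
interface SaxHandler
{
    void startElement(string name);
    void endElement(string name);
    void characters(string text);
}

/// A very small tree node: either an element (name set) or text (text set).
class Node
{
    string name;
    string text;
    Node[] children;
    this(string name, string text = null) { this.name = name; this.text = text; }
}

/// Builds a tree purely from the event stream, keeping a stack of open elements.
final class DomBuilder : SaxHandler
{
    Node root;
    private Node[] stack;

    void startElement(string name)
    {
        auto n = new Node(name);
        if (stack.length) stack[$ - 1].children ~= n;
        else root = n;
        stack ~= n;
    }

    void endElement(string name)
    {
        assert(stack.length && stack[$ - 1].name == name, "mismatched end tag");
        stack = stack[0 .. $ - 1];
    }

    void characters(string text)
    {
        if (stack.length) stack[$ - 1].children ~= new Node(null, text);
    }
}

void main()
{
    auto builder = new DomBuilder;
    SaxHandler h = builder;

    // Stand-in for a real parser reading <doc><p>hello</p></doc>.
    h.startElement("doc");
    h.startElement("p");
    h.characters("hello");
    h.endElement("p");
    h.endElement("doc");

    assert(builder.root.name == "doc");
    assert(builder.root.children[0].name == "p");
    assert(builder.root.children[0].children[0].text == "hello");
}
```

A client that cannot afford the tree simply implements `SaxHandler` directly and never allocates a `Node`, which is the trade-off argued for here.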

Jun 27 2010

Justin Johansson <no spam.com> writes:

Andrei Alexandrescu wrote:
 Justin Johansson wrote:
 Clearly std.xml can't stay the way it is. I'm even thinking of removing 
 it preemptively in wait for another implementation.

Others in this thread have suggested a preemptive removal of the current
std.xml incarnation also.  Please add my vote to this in agreement that
it *must* go.  Its current state is well beyond absolutely shocking and
only serves to bring D into disrepute.  It would be much better to say,
"sorry, D does not have a standard XML library yet"*** and ask for help
rather than leaving things as they are.

***Reminds me of an old saying, "it's better to keep your mouth shut
and appear to be an idiot than to open it and remove all doubt".
( Of course, I do confess to opening mine once or twice too often ;-) )
Translated to std.xml, this means better not to have it at all rather
than have what we currently have.


 If you want to work on something you enjoy, it seems like std.xml is a 
 good choice. If you want to work on the top most important item, 
 probably networking would come ahead. We badly need http and ftp 
 streaming libraries. I'm thinking libcurl would be a good choice as a 
 backend (not interface). For D integration, it would be great to 
 integrate networking with std.stdio.File - e.g. creating 
 File("http://xyz.org") would just connect to the thing and allow 
 streaming, ranges, everything. Adam Ruppe has a lower-level networking 
 protocol that also hooks into std.stdio.File, which would be very 
 important to have too.

Sure I do enjoy XML ecosystem stuff and by that I mean well beyond
just simple parsing.  OTOH, streaming libraries for http and ftp
are an absolute necessity to underpin industrial strength all things
both Unicode and XML so I can see why you put this at the top of the
wish list.  Robust streaming should have support not only for all
popular network protocols but content (character) encodings as well.
Since this thread has promoted a lot of ideas about std.xml, I think
we would do well to start a new thread on streaming.

Cheers
Justin Johansson


 But then it's often better to work on what you like, so don't look for a 
 landslide vote. Ford didn't work on a faster horse etc. Some things that 
 would be good to have in an xml library:
 
 - should work with input ranges (not only strings)
 
 - use aliases as lambdas if needed (std.xml's use of lambdas is nice, 
 just very slow)
 
 - define templates for char, wchar, and dchar and then define one 
 working with ranges of ubyte that dispatches depending on the encoding 
 tag found.
 
 
 Andrei

Jun 29 2010

Lutger <lutger.blijdestijn gmail.com> writes:

Justin Johansson wrote:

 Justin Johansson wrote:
 May I ask is anybody working on redeveloping std.xml in the D2/Phobos
 library?  (Currently it looks like it needs to be started over from
 scratch)
 
 Also what is the level of interest from library users for decent XML
 support in D2/Phobos?
 
 Cheers
 Justin Johansson

 
 Lutger said: Interested, very much so. I think many people are.
 
 Simen said: Absolutely. It is a necessity.
 
 Thanks for fast replies from Lutger and Simen.
 
 Being an XML/W3C addict myself I well concur with Simen and Lutger's
 sentiments.
 
 However, Lutger simply saying that he "thinks" *many* people are
 interested is not good enough for me.

Well I dare only speak for myself. But perhaps I can do better:

Time and again complaints about std.xml surface, so that is one indicator. See 
this google query: 
http://www.google.nl/search?q=site%3Awww.digitalmars.com%2Fpnews+std.xml

And this proposal to replace std.xml with kxml: 
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=109646

Another point to consider is that dsource.org alone contains at least 5 xml 
projects (that I can see), some long abandoned but some seem to be active.
There is also this code from Adam Ruppe: http://arsdnet.net/dcode/dom.d

Jun 27 2010

Justin Johansson <no spam.com> writes:

Lutger wrote:
Justin Johansson wrote:

Justin Johansson wrote:
May I ask is anybody working on redeveloping std.xml in the D2/Phobos
library? (Currently it looks like it needs to be started over from
scratch)

Also what is the level of interest from library users for decent XML
support in D2/Phobos?

Cheers
Justin Johansson

Lutger said: Interested, very much so. I think many people are.

Simen said: Absolutely. It is a necessity.

Thanks for fast replies from Lutger and Simen.

Being an XML/W3C addict myself I well concur with Simen and Lutger's
sentiments.

However, Lutger simply saying that he "thinks" *many* people are
interested is not good enough for me.

Well I dare only speak for myself. But perhaps I do can better:

Time and again complaints about std.xml surface, so that is one indicator. See
this google query:
http://www.google.nl/search?q=site%3Awww.digitalmars.com%2Fpnews+std.xml

And this proposal to replace std.xml with kxml:
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=109646

Another point to consider is that dsource.org alone contains at least 5 xml
projects (that I can see), some long abandoned but some seem to be active.
There is also this code from Adam Ruppe: http://arsdnet.net/dcode/dom.d

Yes, understand.

One wonders how many people have been put off D (and perhaps not to
return) given such a large volume of hits under that URL, namely
http://www.google.nl/search?q=site%3Awww.digitalmars.com%2Fpnews+std.xml

On your other point about the 5+ XML projects on dsource.org, in your
opinion, which of these do you think have the most promise, or at least
a good grounding from which to start over?

Naturally I don't expect you to waste your time looking into these 5+
projects; just if you happen to know.

Jun 27 2010

Lutger <lutger.blijdestijn gmail.com> writes:

Justin Johansson wrote:
...
 
 On your other point about the 5+ XML projects on dsource.org, in your
 opinion, which of these do you think have the most promise, or at least
 a good grounding from which to start over?
 
 Naturally I don't expect you to waste your time looking into these 5+
 project; just if you happen to know.

Don't know really, sorry. From a quick glance though, I would say to start 
looking into xmlp and Adam Ruppe's code. xmlp even has conformance tests, 
perhaps you can work together with the author? 

http://www.dsource.org/projects/xmlp
http://www.digitalmars.com/d/archives/digitalmars/D/XMLP_101327.html#N101327

Jun 27 2010

Jesse Phillips <jessekphillips+D gmail.com> writes:

On Sun, 27 Jun 2010 16:55:56 +0200, Lutger wrote:

 Don't know really, sorry. From a quick glance though, I would say to
 start looking into xmlp and Adam Ruppe's code. xmlp even has conformance
 tests, perhaps you can work together with the author?
 
 http://www.dsource.org/projects/xmlp
 http://www.digitalmars.com/d/archives/digitalmars/D/XMLP_101327.html#N101327

I needed a simple library for parsing XML and started using xmlp since I 
had too many workarounds in std.xml.

I emailed the author a while back about some possible changes (it
complained about namespaces when reading, I believe, a docx file, so I
disabled the check), but I haven't heard back from him.

Jun 27 2010

"Nick Sabalausky" <a a.a> writes:

"Justin Johansson" <no spam.com> wrote in message 
news:i07jpn$tt$1 digitalmars.com...
 Justin Johansson wrote:
 May I ask is anybody working on redeveloping std.xml in the D2/Phobos 
 library?  (Currently it looks like it needs to be started over from 
 scratch)

 Also what is the level of interest from library users for decent XML 
 support in D2/Phobos?

 Cheers
 Justin Johansson

 Lutger said: Interested, very much so. I think many people are.

 Simen said: Absolutely. It is a necessity.

 Thanks for fast replies from Lutger and Simen.

 Being an XML/W3C addict myself I well concur with Simen and Lutger's 
 sentiments.

 However, Lutger simply saying that he "thinks" *many* people are
 interested is not good enough for me.

 Would the *many* people also please add a voice along with Simen et. al.
 to inspire me to contribute to the D2 XML effort (of course under a
 Walter-endorsed style of licence).

I'm interested. I've been thinking of porting some of my stuff from D1/Tango 
to D2/Phobos (*not* because of political reasons or anything against Tango 
or any Tango team member), so anything that I'm using from Tango that 
doesn't have a good Phobos equivalent would be a roadblock. XML (reading) is 
one of those things.

Jun 27 2010

Jacob Carlborg <doob me.com> writes:

On 2010-06-27 12:34, Justin Johansson wrote:
 May I ask is anybody working on redeveloping std.xml in the D2/Phobos
 library? (Currently it looks like it needs to be started over from scratch)

 Also what is the level of interest from library users for decent XML
 support in D2/Phobos?

 Cheers
 Justin Johansson

I would very much like to have XML support in Phobos; I think all standard 
libraries should have that. I'm currently working on porting the XML 
archive of my serialization library to D2, so I need XML support in Phobos.

-- 
/Jacob Carlborg

Jun 27 2010

jpf <spam example.com> writes:

On 27.06.2010 12:34, Justin Johansson wrote:
 May I ask is anybody working on redeveloping std.xml in the D2/Phobos
 library?  (Currently it looks like it needs to be started over from
 scratch)
 
 Also what is the level of interest from library users for decent XML
 support in D2/Phobos?
 
 Cheers
 Justin Johansson

I would also like to have a better std.xml. I still have an old D1/Tango
project I wanted to port to D2/Phobos but as long as there is no good
XML library for D2 (and totally unrelated: a stable network api) that
wouldn't make sense.


-- 
Johannes Pfau

Jun 27 2010

"Yao G." <nospamyao gmail.com> writes:

I did a simple implementation of a pull parser, using this API as  
reference: http://xmlpull.org/

But I used an iterator similar to the one used by Steve (from dcollections)  
to parse the doc. It turns out that Tango did something similar first  
(using iterator to parse the document), and seeing the debacle caused by  
the Date module, I think it would be a bad idea to release it.


Yao G.


On Sun, 27 Jun 2010 05:34:30 -0500, Justin Johansson <no spam.com> wrote:

 May I ask is anybody working on redeveloping std.xml in the D2/Phobos  
 library?  (Currently it looks like it needs to be started over from  
 scratch)

 Also what is the level of interest from library users for decent XML  
 support in D2/Phobos?

 Cheers
 Justin Johansson


-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

Jun 27 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 27 Jun 2010 14:56:21 -0400, Yao G. <nospamyao gmail.com> wrote:

 I did a simple implementation of a pull parser, using this API as  
 reference: http://xmlpull.org/

 But I used an iterator similar to the one used by Steve (from  
 dcollections) to parse the doc. It turns out that Tango did something  
 similar first (using iterator to parse the document), and seeing the  
 debacle caused by the Date module, I think it would be a bad idea to  
 release it.

Did you look at Tango's code in question, or look at their documentation?   
If not, then you are fine.

I think any implementation is going to have to at least try to use ranges  
or show why they are not a good idea for xml, since Andrei is set on using  
ranges for everything.

BTW, I've not used std.xml or tango's xml, but I agree that an xml library  
is a very important part of today's standard libraries.  Having xml in the  
standard allows for so much usage of it in many other places  
(serialization comes to mind immediately).  If std.xml is bad (which I've  
heard from several independent people), then throw it out and make  
something new.

I myself have tried to think of how xml can be done with ranges, but I  
believe one of the key elements is it has to parse xml without loading the  
entire document to be efficient enough for some applications.  A DOM style  
parser which presents a range interface is probably fine, but a lazy  
interface would be the best.  Since XML is a tree style, you need a range  
which allows moving down the tree.  You almost need a stacking range which  
can move down the tree and also to the next sibling element.  Ideally, the  
library should do as much as possible without allocating anything but  
buffer space to read data.
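
A minimal sketch of the "stacking" idea, written in Python rather than D purely for brevity. It assumes the standard library's xml.etree.ElementTree.iterparse as the lazy source; the helper name walk is hypothetical. The stack holds only the names of the currently open elements, so the depth is known at every step without building the whole tree first.

```python
import io
import xml.etree.ElementTree as ET

def walk(source):
    """Lazily yield (depth, tag) for each element as its start tag is read."""
    stack = []  # names of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            yield len(stack), elem.tag
            stack.append(elem.tag)
        else:
            stack.pop()
            elem.clear()  # release text and children of a finished element

doc = io.BytesIO(b"<a><b><c/></b><d/></a>")
visited = list(walk(doc))
```

A D version would presumably expose the same information through a range whose front carries the current depth, but that design is not settled in this thread.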

-Steve

Jun 28 2010

Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:

On 28/06/2010 13:04, Steven Schveighoffer wrote:
 On Sun, 27 Jun 2010 14:56:21 -0400, Yao G. <nospamyao gmail.com> wrote:

 I did a simple implementation of a pull parser, using this API as
 reference: http://xmlpull.org/

 But I used an iterator similar to the one used by Steve (from
 dcollections) to parse the doc. It turns out that Tango did something
 similar first (using iterator to parse the document), and seeing the
 debacle caused by the Date module, I think it would be a bad idea to
 release it.

 Did you look at Tango's code in question, or look at their
 documentation? If not, then you are fine.

 I think any implementation is going to have to at least try to use
 ranges or show why they are not a good idea for xml, since Andrei is set
 on using ranges for everything.

 BTW, I've not used std.xml or tango's xml, but I agree that an xml
 library is a very important part of today's standard libraries. Having
 xml in the standard allows for so much usage of it in many other places
 (serialization comes to mind immediately). If std.xml is bad (which I've
 heard from several independent people), then throw it out and make
 something new.

 I myself have tried to think of how xml can be done with ranges, but I
 believe one of the key elements is it has to parse xml without loading
 the entire document to be efficient enough for some applications. A DOM
 style parser which presents a range interface is probably fine, but a
 lazy interface would be the best. Since XML is a tree style, you need a
 range which allows moving down the tree. You almost need a stacking
 range which can move down the tree and also to the next sibling element.
 Ideally, the library should do as much as possible without allocating
 anything but buffer space to read data.

 -Steve

I've not looked at any of the D XML offerings (shame on me?) but I've 
been having a bit of a look at the types of API that are available in 
other languages, and there seems to be 3...

Event based a la SAX

Stream based a la StAX

Tree based a la "the" DOM

The simple conclusion that I have drawn is that there is no 
one-size-fits-all solution, and that it would therefore be a mistake to 
put all effort into supporting only one. (However, ranges do seem to 
match up quite nicely with the way that the Stream based APIs operate.)

It would seem to me most logical to consider the many varied use-cases 
and build a core API upon which all 3 types of XML processor can be 
built (or at least specify a core set of types to be used by all 3), 
rather than focus on implementing one particular style. Interoperability 
of all 3 styles would then be possible and perhaps facilitate the later 
implementation of higher abstractions (such as XPath and XQuery).
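
As a rough illustration of a shared core, the sketch below (Python, not D) feeds one token stream to both an event-style consumer and a tree builder. Everything here is hypothetical: the toy regex handles only bare tags and text (no attributes, comments or entities), and the names tokens, push and build_tree are made up for the example.

```python
import re

# Toy grammar: <name>, </name>, <name/> and character data only.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")

def tokens(text):
    """Core layer: yield ("start"|"end"|"text", value) tuples."""
    for m in TOKEN.finditer(text):
        close, name, selfclose, chars = m.groups()
        if chars is not None:
            yield ("text", chars)
        elif close:
            yield ("end", name)
        else:
            yield ("start", name)
            if selfclose:
                yield ("end", name)

def push(token_stream, handlers):
    """Event (SAX-like) style: dispatch each token to a callback."""
    for kind, value in token_stream:
        handlers[kind](value)

def build_tree(token_stream):
    """Tree (DOM-like) style: assemble nested dicts from the same tokens."""
    root = {"name": None, "children": []}
    stack = [root]
    for kind, value in token_stream:
        if kind == "start":
            node = {"name": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "end":
            stack.pop()
        else:
            stack[-1]["children"].append(value)
    return root["children"][0]

doc = "<a><b>hi</b><c/></a>"
names = []
push(tokens(doc), {"start": names.append,
                   "end": lambda n: None,
                   "text": lambda t: None})
tree = build_tree(tokens(doc))
```

The stream-based (StAX-like) style is simply iterating over tokens(doc) directly.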

I think it is also important to remember that there are at least 4 
different stages to processing XML (reading, validating, mutating, 
writing) and that many programming tasks allow one or more of these 
aspects to be ignored. This can mean that one programmer is blinded to 
the requirements of another in a different domain because the ways in 
which they work with XML either overlap only partially or not at all.

I've never used anything like SAX myself, though I have used the DOM 
quite a lot, and spent most of the time wishing it worked a bit more 
like StAX (even though I hadn't heard of StAX at the time ^^).

Whatever is done for D, it should allow programmers to work with XML in 
a way that is familiar to them and compatible with what others do. 
Memory should be used conservatively, and reprocessing (parsing the same 
portion of a document multiple times) should be minimised.

Most importantly, the implementation should be D-ey, rather than the 
abstraction used in any other language's most favoured solution, 
shoehorned into a D-shaped box.

A...
(whose 2 cents are worth no more or no less than anyone else's.)

Jun 28 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 28 Jun 2010 09:59:45 -0400, Alix Pexton  
<alix.DOT.pexton gmail.dot.com> wrote:

 I've never used anything like SAX myself, though I have used the DOM  
 quite a lot, and spent most of the time wishing it worked a bit more  
 like StAX (even though I hadn't heard of StAX at the time ^^).

DOM is usually built on top of SAX, so start with the lowest common  
denominator.

 Whatever is done for D, it should allow programmers to work with XML in  
 a way that is familiar to them and compatible with what others do.  
 Memory should be used conservatively, and reprocessing (parsing the same  
 portion of a document multiple times) should be minimised.

Parsing multiple times should be minimized, but more important than that,  
allocations should be minimal.  Nothing kills a good parsing/input  
algorithm's performance in D more than overuse of the GC.  Tango goes as far as  
having you pass in stack buffers to avoid even allocating buffers (not  
sure about its xml lib, but knowing the rest of the lib, probably), I  
don't think std.xml has to go that far.

 Most importantly, the implementation should be D-ey, rather than the  
 abstraction used in any other language's most favoured solution,  
 shoehorned into a D-shaped box.

Yes, I don't think the phobos solution needs to mimic exactly the API of  
SAX or DOM, the author should be free to use D idioms.  But starting with  
a common proven design is probably a good idea.

-Steve

Jun 28 2010

Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:

On 28/06/2010 15:11, Steven Schveighoffer wrote:

 Yes, I don't think the phobos solution needs to mimic exactly the API of
 SAX or DOM, the author should be free to use D idioms. But starting with
 a common proven design is probably a good idea.

 -Steve

I've been thinking about it, and while I believe you when you say that 
SAX can be used to build the DOM, I'm not convinced that SAX is the 
lowest common abstraction.

Michel Fortin's Tokenizer/Range seems much closer to the metal to me.

A...

Jun 29 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-06-29 04:41:50 -0400, Alix Pexton <alix.DOT.pexton gmail.DOT.com> said:

 On 28/06/2010 15:11, Steven Schveighoffer wrote:
 
 Yes, I don't think the phobos solution needs to mimic exactly the API of
 SAX or DOM, the author should be free to use D idioms. But starting with
 a common proven design is probably a good idea.
 
 -Steve

 
 I've been thinking about it, and while I believe you when you say that 
 SAX can be used to build the DOM, I'm not convinced that SAX is the 
 lowest common abstraction.
 
 Michel Fortin's Tokenizer/Range seems much closer to the metal to me.

It is closer to the metal, but there's a catch...

One issue with SAX is that you must allocate an array of strings to 
pass the attributes of an element, which is probably going to need a 
dynamic allocation at some point. A lower-level abstraction such as 
mine (or Tango's pull-parser) just returns each attribute as a separate 
token as it parses them.

The downside of the tokenizer interface is that it only checks for a 
subset of well-formedness; for instance, it doesn't check that tags 
balance each other correctly or that there are no two attributes with 
the same name. It's just a "tokenizer" after all, it can't be described 
as a conformant XML parser by itself. The upper layer parser needs to 
check for these things. My mini DOM built on this tokenizer does these 
checks when using the tokenizer, and it's more efficient to do them 
there because that's where the context information is kept, which is 
why the tokenizer doesn't do them.

Implementing SAX on top of my tokenizer consists mostly of ensuring 
proper tag balancing, checking for duplicate attributes, and collecting 
attributes in an array (or another kind of list) you can then give to 
the openElement SAX callback.
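
A minimal sketch of such a layer, in Python rather than D and not taken from the actual mfr.xmltok code. The token shapes ("open", "attr", "open_end", "close") and the function name sax_events are assumptions made up for the example; the point is only where the three jobs (balancing, duplicate check, attribute collection) sit.

```python
def sax_events(tokens):
    """Turn low-level tokens into checked ("start"|"end", ...) events."""
    open_tags = []   # stack used for tag balancing
    pending = None   # (name, attrs) of the start tag being assembled
    for tok in tokens:
        kind = tok[0]
        if kind == "open":
            pending = (tok[1], {})
        elif kind == "attr":
            name, value = tok[1], tok[2]
            if name in pending[1]:
                raise ValueError("duplicate attribute: " + name)
            pending[1][name] = value      # collect attributes for the callback
        elif kind == "open_end":
            open_tags.append(pending[0])
            yield ("start", pending[0], pending[1])
            pending = None
        elif kind == "close":
            if not open_tags or open_tags.pop() != tok[1]:
                raise ValueError("unbalanced close tag: " + tok[1])
            yield ("end", tok[1])
    if open_tags:
        raise ValueError("unclosed element: " + open_tags[-1])

good = [("open", "a"), ("attr", "id", "1"), ("open_end",),
        ("open", "b"), ("open_end",), ("close", "b"), ("close", "a")]
events = list(sax_events(good))
```

Note that the attribute dictionary is the dynamic allocation mentioned above; the token stream itself needs none.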

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 29 2010

Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:

On 29/06/2010 13:27, Michel Fortin wrote:
 On 2010-06-29 04:41:50 -0400, Alix Pexton
 <alix.DOT.pexton gmail.DOT.com> said:

 On 28/06/2010 15:11, Steven Schveighoffer wrote:

 Yes, I don't think the phobos solution needs to mimic exactly the API of
 SAX or DOM, the author should be free to use D idioms. But starting with
 a common proven design is probably a good idea.

 -Steve

 I've been thinking about it, and while I believe you when you say that
 SAX can be used to build the DOM, I'm not convinced that SAX is the
 lowest common abstraction.

 Michel Fortin's Tokenizer/Range seems much closer to the metal to me.

 It is closer to the metal, but there's a catch...

 One issue with SAX is that you must allocate an array of strings to pass
 the attributes of an element, which is probably going to need a dynamic
 allocation at some point. A lower-level abstraction such as mine (or
 Tango's pull-parser) just returns each attribute as a separate token as
 it parses them.

 The downside of the tokenizer interface is that it only checks for a
 subset of well-formedness; for instance, it doesn't check that tags balance
 each other correctly or that there are no two attributes with the same
 name. It's just a "tokenizer" after all, it can't be described as a
 conformant XML parser by itself. The upper layer parser needs to check
 for these things. My mini DOM built on this tokenizer does these checks
 when using the tokenizer, and it's more efficient to do them there
 because that's where the context information is kept, which is why the
 tokenizer doesn't do them.

 Implementing SAX on top of my tokenizer consists mostly of ensuring
 proper tag balancing, checking for duplicate attributes, and collecting
 attributes in an array (or another kind of list) you can then give to
 the openElement SAX callback.

My understanding was that SAX _doesn't_ check those things either and 
that it was up to the code responding to the events to tackle 
wellformedness. After all, if SAX handled wellformedness, there would be 
no need for it to pass an argument to closeElement to state what element 
was being closed.
SAX has its place though, when it comes to doing a single pass filter on 
a stream of XML that can be assumed to be wellformed, its simplicity is 
admittedly hard to beat. In other applications, however, there is much 
room for improvement. SAXplus, with a built in element memoisation, an 
element stack and a used id list sounds quite useful to me, as long as 
they remain optional of course.

Admittedly, my initial disappointment when looking into SAX means that 
it is something that I have not followed for some time.

Hmn, I suddenly just got nostalgic for the days when XML was all shiny 
and new and everyone was writing their own APIs or butchering old 
SGML/HTML tech. Makes me want to go look at my old code ^^

A...

Jun 29 2010

BLS <windevguy hotmail.de> writes:

On 28/06/2010 14:04, Steven Schveighoffer wrote:
 I myself have tried to think of how xml can be done with ranges, but I
 believe one of the key elements is it has to parse xml without loading
 the entire document to be efficient enough for some applications.  A DOM
 style parser which presents a range interface is probably fine, but a
 lazy interface would be the best.  Since XML is a tree style, you need a
 range which allows moving down the tree.  You almost need a stacking
 range which can move down the tree and also to the next sibling
 element.  Ideally, the library should do as much as possible without
 allocating anything but buffer space to read data.

 -Steve

Hi Steve,
Philippe Sigaud has written a very interesting lib. called dranges.
http://www.dsource.org/projects/dranges/wiki

I think treerange.d and graphrange.d are an excellent source of inspiration.
Bjoern

Jun 28 2010

Bernard Helyer <b.helyer gmail.com> writes:

On Sun, 27 Jun 2010 20:04:30 +0930, Justin Johansson wrote:

 May I ask is anybody working on redeveloping std.xml in the D2/Phobos
 library?  (Currently it looks like it needs to be started over from
 scratch)
 
 Also what is the level of interest from library users for decent XML
 support in D2/Phobos?
 
 Cheers
 Justin Johansson

std.xml needs to be replaced, but I personally don't much care as kxml 
fits my needs nicely:  http://opticron.no-ip.org/svn/branches/kxml

Jun 27 2010

Joe Hildebrand <joe.hildebrand webex.com> writes:

On 6/27/10 8:37 PM, "Sean Kelly" <sean invisibleduck.org> wrote:

 I'd like to cast a vote for a SAX-style parser.  A DOM parser can be built on
 top of it, and frankly, a SAX parser is the only kind I'd ever use.  I'm either
 working with large streams where building a tree is impractical, or
 performance is enough of an issue that again, building a tree is impractical.

Agree, with one additional requirement: the ability to throw random chunks
of bytes at the parser whenever I like, along the lines of Expat or XP.
This is a must for dealing with a stream of XML like XMPP.
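
For readers who have not used that style, here is a small sketch using Python's standard xml.parsers.expat binding to Expat (Python rather than D, purely to have something runnable). The chunk boundaries are chosen arbitrarily and fall in the middle of tag names; the outer element is deliberately never closed, as in a long-lived XMPP stream.

```python
import xml.parsers.expat

seen = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: seen.append(name)

# Feed whatever bytes happen to arrive; False means "more data may follow".
for chunk in (b"<stream><mess", b"age to='a'><bo", b"dy>hi</body></message>"):
    parser.Parse(chunk, False)
```

The handler fires as soon as each start tag is complete, regardless of how the bytes were split.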

-- 
Joe Hildebrand

Jun 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-06-27 07:04:30 -0400, Justin Johansson <no spam.com> said:

 May I ask is anybody working on redeveloping std.xml in the D2/Phobos 
 library?  (Currently it looks like it needs to be started over from 
 scratch)
 
 Also what is the level of interest from library users for decent XML 
 support in D2/Phobos?

I have made my own parser, comprised of a tokenizer and a mini DOM layer.

I'm not sure how to qualify the tokenizer: it's mainly based on 
callbacks like an event parser, but a callback can decide to stop the 
parsing process and return to the original caller of the tokenizer 
(which can later restart parsing), it can choose to continue parsing 
the next token, or to recursively continue to run the parser using a 
different set of callbacks. From there it's trivial to efficiently 
implement a pull parser or a SAX parser, but the way callbacks can 
recursively call the tokenizer allows greater flexibility than those 
two models.
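
The control flow can be sketched roughly as follows. This is Python, not D, and it is not the mfr.xmltok API: the Tokenizer class, the STOP/CONTINUE signals and the pre-split token list are all hypothetical stand-ins.

```python
STOP, CONTINUE = "stop", "continue"

class Tokenizer:
    def __init__(self, tokens):
        self._it = iter(tokens)   # shared position in the input

    def run(self, callback):
        """Feed tokens to callback until it returns STOP or input ends."""
        for tok in self._it:
            if callback(self, tok) == STOP:
                return False      # paused; calling run() again resumes here
        return True               # input exhausted

items, rest = [], []

def in_list(tokenizer, tok):
    if tok == "</list>":
        return STOP               # hand control back to the outer callback
    items.append(tok)
    return CONTINUE

def top(tokenizer, tok):
    if tok == "<list>":
        tokenizer.run(in_list)    # recurse with a different callback
    else:
        rest.append(tok)
    return CONTINUE

t = Tokenizer(["head", "<list>", "a", "b", "</list>", "tail"])
finished = t.run(top)
```

A pull parser falls out of the same driver by using a callback that records one token and returns STOP every time.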

The mini DOM I've made is based on this tokenizer, but is quite 
ordinary in comparison.

Here's the generated documentation:

http://michelf.com/docs/d/mfr/xmltok.html
http://michelf.com/docs/d/mfr/xml.html

I'm slowly revamping it to use ranges instead of strings.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2010-06-27 07:04:30 -0400, Justin Johansson <no spam.com> said:
 
 May I ask is anybody working on redeveloping std.xml in the D2/Phobos 
 library?  (Currently it looks like it needs to be started over from 
 scratch)

 Also what is the level of interest from library users for decent XML 
 support in D2/Phobos?

 
 I have made my own parser, comprised of a tokenizer and a mini DOM layer.
 
 I'm not sure how to qualify the tokenizer: it's mainly based on 
 callbacks like an event parser, but a callback can decide to stop the 
 parsing process and return to the original caller of the tokenizer 
 (which can later restart parsing), it can choose to continue parsing the 
 next token, or to recursively continue to run the parser using a 
 different set of callbacks. From there it's trivial to efficiently 
 implement a pull parser or a SAX parser, but the way callbacks can 
 recursively call the tokenizer allows greater flexibility than those two 
 models.
 
 The mini DOM I've made is based on this tokenizer, but is quite ordinary 
 in comparison.
 
 Here's the generated documentation:
 
 http://michelf.com/docs/d/mfr/xmltok.html
 http://michelf.com/docs/d/mfr/xml.html
 
 I'm slowly revamping it to use ranges instead of strings.

I think a tokenizer should be a higher-order range that is fed an input 
range of ubyte, char, wchar, or dchar (so that would be a type 
parameter) and is itself a range of Tokens that include the token type, 
token value etc.
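
A rough sketch of that shape, using a Python generator in place of a D range. The Token type, its two kinds and the function name tokenize are hypothetical, and the grammar is reduced to "text between angle brackets is a tag".

```python
from typing import Iterable, Iterator, NamedTuple

class Token(NamedTuple):
    kind: str    # "tag" or "text"
    value: str

def tokenize(chars: Iterable[str]) -> Iterator[Token]:
    """Consume any iterable of characters and lazily yield Tokens."""
    buf = []
    in_tag = False
    for ch in chars:
        if ch == "<" and not in_tag:
            if buf:
                yield Token("text", "".join(buf))
            buf, in_tag = [], True
        elif ch == ">" and in_tag:
            yield Token("tag", "".join(buf))
            buf, in_tag = [], False
        else:
            buf.append(ch)
    if buf and not in_tag:
        yield Token("text", "".join(buf))

# Works on a plain string or on any other character source.
result = list(tokenize(ch for ch in "<a>hi</a>"))
```

Because the input is only ever read one character at a time, every token value has to be copied into a fresh string here.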

Andrei

Jun 28 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-06-28 14:27:13 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Here's the generated documentation:
 
 http://michelf.com/docs/d/mfr/xmltok.html
 http://michelf.com/docs/d/mfr/xml.html
 
 I'm slowly revamping it to use ranges instead of strings.

 
 I think a tokenizer should be a higher-order range that is fed an input 
 range of ubyte, char, wchar, or dchar (so that would be a type 
 parameter) and is itself a range of Tokens that include the token type, 
 token value etc.

And I've implemented a tokenizer range just like you describe on top of 
my tokenizer function. Look at the documentation for 
mfr.xmltok.XMLForwardRange. (I should probably rename it to 
XMLTokenRange.)

Personally, I prefer to use the callback approach which automatically 
calls the right function according to the token type. But what's nice 
about my tokenizer is that you can do both callbacks and pull-style 
tokenization (the latter can be wrapped in a range), and mix these 
approaches together as needed.

What is missing is taking arbitrary ranges as input (it deals with 
strings currently). Strings are like the optimized case for 
tokenization because you don't have to dynamically allocate anything: 
referencing the original string is enough when making substrings. With 
arbitrary ranges you have to copy the text and tag names to a string 
one character at a time, which is less efficient. I don't want to write 
two separate parsers for this, so I'm trying to abstract things at the 
right level to maximize code reuse while keeping performance optimized 
for the string-as-input case, but how to do that is not so obvious.
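
To make the string-as-input advantage concrete, here is a small Python sketch (not D). Python's own slices copy, so memoryview stands in for D's array slices; the function name tag_names is hypothetical and the scanning is deliberately naive.

```python
def tag_names(data: bytes):
    """Yield the contents of each <...> as a zero-copy view into data."""
    view = memoryview(data)
    pos = 0
    while True:
        start = data.find(b"<", pos)
        if start < 0:
            return
        end = data.find(b">", start)
        if end < 0:
            return
        yield view[start + 1:end]   # references data, copies nothing
        pos = end + 1

data = b"<root><item></item></root>"
names = list(tag_names(data))
```

Each yielded view still points at the original buffer, which is the property that is lost once the input is an arbitrary one-character-at-a-time range.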

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jun 28 2010

lurker <lurker mailinator.com> writes:

I'm very interested.

Tango's XML code was very good and damn fast. Maybe license issues can be worked
out for that part at least?

Jun 28 2010
