
digitalmars.D - Writing XML

reply Tomek Sowiński <just ask.me> writes:
While I'm circling the problem of parsing, I took a quick look at writing, so as not to get stuck in analysis paralysis. Writing XML is pretty independent from parsing and an order of magnitude easier to solve. It was perfect to get myself coding.

These are the guidelines I followed:

 * Memory minimalism: don't force allocating an intermediate node structure just to push a few tags down the wire.

 * Composability: operating on an arbitrary string output range.

 * Robustness: tags should not be left open, even if the routine producing the tag interior throws.

 * Simplicity of syntax: resembling real XML if possible.

 * Space efficiency / readability: can write tightly (without indents and newlines) for faster network transfer and, having an easy means for temporary tight writing, for better readability.

 * Ease of use:
   - automatic to!string of non-string values,
   - automatic string escaping according to the XML standard,
   - handle nulls: close the tags short (<tag/>), don't write attributes with null values at all.

 * anything else?
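
A minimal sketch of how three of the guidelines above (output-range composability, robustness when the content routine throws, and null handling with escaping) could be met in D. This is not the writer from the post: `escapeXml`, `leaf` and `tag` are hypothetical names invented for the illustration, and an `Appender!string` stands in for the arbitrary output range.

```d
import std.array : appender;
import std.range.primitives : put;

// Hypothetical helper: replaces the five characters for which XML
// predefines entities.
string escapeXml(string s)
{
    auto buf = appender!string();
    foreach (char c; s)
    {
        switch (c)
        {
            case '<':  buf.put("&lt;");   break;
            case '>':  buf.put("&gt;");   break;
            case '&':  buf.put("&amp;");  break;
            case '"':  buf.put("&quot;"); break;
            case '\'': buf.put("&apos;"); break;
            default:   buf.put(c);        break;
        }
    }
    return buf.data;
}

// Leaf element: text is escaped; a null string yields a short tag.
void leaf(Sink)(ref Sink sink, string name, string text)
{
    if (text is null)
    {
        put(sink, "<" ~ name ~ "/>");
        return;
    }
    put(sink, "<" ~ name ~ ">");
    put(sink, escapeXml(text));
    put(sink, "</" ~ name ~ ">");
}

// Container element: scope(exit) writes the closing tag even if
// `content` throws.
void tag(Sink)(ref Sink sink, string name, void delegate() content)
{
    put(sink, "<" ~ name ~ ">");
    scope (exit) put(sink, "</" ~ name ~ ">");
    content();
}

void main()
{
    auto sink = appender!string();
    string noMiddleName = null;

    try
    {
        tag(sink, "authorName", {
            leaf(sink, "first", "Tom & Jerry");
            leaf(sink, "middle", noMiddleName);
            throw new Exception("producer failed");
        });
    }
    catch (Exception e)
    {
        // Swallowed here only so the written text can be inspected.
    }

    // The closing tag was still written despite the exception.
    assert(sink.data ==
        "<authorName><first>Tom &amp; Jerry</first><middle/></authorName>");
}
```

Nothing here allocates a node tree: each call pushes text straight into the sink, which is the memory-minimalism point in its simplest form.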


The new writer meets pretty much all of the above. Here's an example to get a feel for it:

auto books = [
    Book([Name("Grębosz", "Jerzy")], "Pasja C++", 1999),
    Book([Name("Navin", "Robert", "N.")], "Mathemetics of Derivatives", 2007),
    Book([Name("Tokarczuk", "Olga")], "Podróż ludzi Księgi", 1996),
    Book([Name("Graham", "Ronald", "L."),
         Name("Knuth", "Donald", "E."),
         Name("Patashnik", "Oren")], "Matematyka Konkretna", 2008)
];

auto outputRange = ... ;
auto xml = xmlWriter(outputRange);

xml.comment(books.length, " favorite books of mine.");
foreach (book; books) {
    xml.book("year", book.year, {
         foreach (author; book.authors) {
             xml.tight.authorName({
                 xml.first(author.first);
                 xml.middle(author.middle);
                 xml.last(author.last);
             });
         }
         xml.tight.title(book.title);
    });
}

--------------------------------- program output ---------------------------------

<!-- 4 favorite books of mine. -->
<book year="1999">
  <authorName><first>Jerzy</first><middle/><last>Grębosz</last></authorName>
  <title>Pasja C++</title>
</book>
<book year="2007">
  <authorName><first>Robert</first><middle>N.</middle><last>Navin</last></authorName>
  <title>Mathemetics of Derivatives</title>
</book>
<book year="1996">
  <authorName><first>Olga</first><middle/><last>Tokarczuk</last></authorName>
  <title>Podróż ludzi Księgi</title>
</book>
<book year="2008">
  <authorName><first>Ronald</first><middle>L.</middle><last>Graham</last></authorName>
  <authorName><first>Donald</first><middle>E.</middle><last>Knuth</last></authorName>
  <authorName><first>Oren</first><middle/><last>Patashnik</last></authorName>
  <title>Matematyka Konkretna</title>
</book>


Questions and comments?

--
Tomek
Feb 06 2011
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-02-06 15:43, Tomek Sowiński wrote:
 While I'm circling the problem of parsing, I took a quick look at writing not
to get stuck in analysis-paralysis. Writing XML is pretty independent from
parsing and an order of magnitude easier to solve. It was perfect to get myself
coding.

 These are the guidelines I followed:

   * Memory minimalism: don't force allocating an intermediate node structure
just to push a few tags down the wire.

   * Composability: operating on an arbitrary string output range.

   * Robustness: tags should not be left open, even if the routine producing
tag interior throws.

   * Simplicity of syntax: resembling real XML if possible.

   * Space efficiency / readability: can write tightly (without indents and
newlines) for faster network transfer and, having easy an means for temporary
tight writing, for better readability.

   * Ease of use:
     - automatic to!string of non-string values,
     - automatic string escaping according to XML standard,
     - handle nulls: close the tags short (<tag/>), don't write attributes with
null values at all.

   * anything else?


 The new writer meets pretty much all of the above. Here's an example to get a
feel of it:

 auto books = [
      Book([Name("Grębosz", "Jerzy")], "Pasja C++", 1999),
      Book([Name("Navin", "Robert", "N.")], "Mathemetics of Derivatives", 2007),
      Book([Name("Tokarczuk", "Olga")], "Podróż ludzi Księgi", 1996),
      Book([Name("Graham", "Ronald", "L."),
           Name("Knuth", "Donald", "E."),
           Name("Patashnik", "Oren")], "Matematyka Konkretna", 2008)
 ];

 auto outputRange = ... ;
 auto xml = xmlWriter(outputRange);

 xml.comment(books.length, " favorite books of mine.");
 foreach (book; books) {
      xml.book("year", book.year, {
           foreach (author; book.authors) {
               xml.tight.authorName({
                   xml.first(author.first);
                   xml.middle(author.middle);
                   xml.last(author.last);
               });
           }
           xml.tight.title(book.title);
      });
 }

 --------------------------------- program output
---------------------------------

 <!-- 4 favorite books of mine. -->
 <book year="1999">
    <authorName><first>Jerzy</first><middle/><last>Grębosz</last></authorName>
    <title>Pasja C++</title>
 </book>
 <book year="2007">
    <authorName><first>Robert</first><middle>N.</middle><last>Navin</last></authorName>
    <title>Mathemetics of Derivatives</title>
 </book>
 <book year="1996">
    <authorName><first>Olga</first><middle/><last>Tokarczuk</last></authorName>
    <title>Podróż ludzi Księgi</title>
 </book>
 <book year="2008">
    <authorName><first>Ronald</first><middle>L.</middle><last>Graham</last></authorName>
    <authorName><first>Donald</first><middle>E.</middle><last>Knuth</last></authorName>
    <authorName><first>Oren</first><middle/><last>Patashnik</last></authorName>
    <title>Matematyka Konkretna</title>
 </book>


 Questions and comments?

This seems to be like the Ruby "builder" library, which is a library I like. This is a perfect candidate for the syntax sugar I've proposed for passing delegates to a function after the parameter list.

--
/Jacob Carlborg
Feb 06 2011
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/6/11 9:43 AM, Tomek Sowiński wrote:
 While I'm circling the problem of parsing, I took a quick look at writing not
to get stuck in analysis-paralysis.

That's great. I won't be able to add much because I haven't worked with XML so I don't know what people need. The example looks nice and clean.

Generally, laziness may be a tactic you could use to help with memory use. A good example is split() vs. splitter(). The "er" version offers one element at a time, thus never forcing an allocation. The split() version must do all the work upfront and also allocate a structure for depositing the output.

Andrei
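
To make the split() vs. splitter() contrast concrete, here is a small sketch using the Phobos functions `std.array.split` and `std.algorithm.splitter`. The sample string is only illustrative.

```d
import std.algorithm : equal, splitter;
import std.array : split;

void main()
{
    string line = "Ronald L. Graham";

    // Eager: does all the work upfront and allocates a string[]
    // holding every piece.
    string[] all = line.split(" ");
    assert(all == ["Ronald", "L.", "Graham"]);

    // Lazy: a range that yields one slice of `line` at a time;
    // no array of results is built.
    auto pieces = line.splitter(" ");
    assert(pieces.front == "Ronald");
    pieces.popFront();
    assert(pieces.front == "L.");

    // Both approaches produce the same elements.
    assert(line.splitter(" ").equal(all));
}
```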
Feb 06 2011
prev sibling next sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
Tomek Sowiński wrote:
 auto xml = xmlWriter(outputRange);
 
 xml.comment(books.length, " favorite books of mine.");
 foreach (book; books) {
     xml.book("year", book.year, {
          foreach (author; book.authors) {
              xml.tight.authorName({
                  xml.first(author.first);
                  xml.middle(author.middle);
                  xml.last(author.last);
              });
          }
          xml.tight.title(book.title);
     });
 }

This looks nice and compact. Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function?

How do you write a tag named "tight"? Or a tag calculated at runtime?

Something more conventional would be

    xml.tag("book", attr("year", book.year), { ...

but I'm not sure whether pairing the attribute name and value adds readability or mere noise.

Rainer
Feb 06 2011
parent Christopher Nicholson-Sauls <ibisbasenji gmail.com> writes:
On 02/06/11 18:18, Tomek Sowiński wrote:
 Rainer Schuetze wrote:
 
 This looks nice and compact Using opDispatch to specify the tag (I guess 
 that is what you are using to create a tag "book" by calling xml.book()) 
 feels like misusing opDispatch, though. Does it add readability in 
 contrast to passing the tag as a string to some function?

 How do you write a tag named "tight"? Or a tag calculated at runtime?

 xml.tag("tight", attributes..., { make content });

 That's the base implementation. opDispatch is just syntax sugar over it.

Might I suggest changing the sugar to have a suffix? I.e., instead of xml.book(...) as sugar for xml.tag("book", ...), make it xml.bookTag(...) (or something similar). That is very easy to check for using an if condition, and in the event that some XML application actually has a "tag" tag... well, xml.tagTag() might look funny, but at least it'd work. It could also support "_tag" as an alternate suffix for those with such sensibilities: xml.book_tag(...).

--
Chris N-S
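
A rough sketch of what the suffix idea could look like, assuming a template constraint on opDispatch does the "if condition" check. The `Writer` struct and its `buffer` field are hypothetical stand-ins for the real writer and its output range, and content handling is reduced to a plain string.

```d
struct Writer
{
    string buffer; // stands in for the output range

    // Base implementation: works for any name, including runtime ones.
    void tag(string name, string content)
    {
        buffer ~= "<" ~ name ~ ">" ~ content ~ "</" ~ name ~ ">";
    }

    // Sugar: only names ending in "Tag" are forwarded to tag();
    // any other unknown member name is a compile error.
    void opDispatch(string name)(string content)
        if (name.length > 3 && name[$ - 3 .. $] == "Tag")
    {
        tag(name[0 .. $ - 3], content);
    }
}

void main()
{
    Writer xml;
    xml.bookTag("Pasja C++");   // forwarded as tag("book", ...)
    xml.tagTag("x");            // a tag literally named "tag"
    xml.tag("tight", "y");      // base call, no sugar involved

    // Without the suffix, the constraint rejects the name.
    static assert(!__traits(compiles, xml.book("z")));

    assert(xml.buffer ==
        "<book>Pasja C++</book><tag>x</tag><tight>y</tight>");
}
```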
Feb 08 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday 06 February 2011 13:59:19 Rainer Schuetze wrote:
 Tomek Sowiński wrote:
 auto xml = xmlWriter(outputRange);

 xml.comment(books.length, " favorite books of mine.");
 foreach (book; books) {
     xml.book("year", book.year, {
          foreach (author; book.authors) {
              xml.tight.authorName({
                  xml.first(author.first);
                  xml.middle(author.middle);
                  xml.last(author.last);
              });
          }
          xml.tight.title(book.title);
     });
 }

 This looks nice and compact Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function?

 How do you write a tag named "tight"? Or a tag calculated at runtime?

 Something more conventional would be

     xml.tag("book", attr("year", book.year), { ...

 but I'm not sure that pairing the attribute name and value adds readability or mere noise.

Actually, using opDispatch in that manner would become a big problem once you tried to write an XML tag with the same name as any function that the xml object already has. It really doesn't sound like a good idea and really doesn't provide much benefit - if any - as far as I can see. It's so simple to just take the tag name as a string that I see no reason to do otherwise.

- Jonathan M Davis
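
The clash described above can be shown with a small sketch. The `Writer` struct is hypothetical; its `tight` member only loosely mirrors `xml.tight` from the example. In D, opDispatch is consulted only when no real member of that name exists, so a tag named like a member cannot go through the sugar.

```d
struct Writer
{
    string buffer;   // stands in for the output range
    bool tightMode;

    // An ordinary member function of the writer.
    void tight() { tightMode = true; }

    void tag(string name, string content)
    {
        buffer ~= "<" ~ name ~ ">" ~ content ~ "</" ~ name ~ ">";
    }

    // Sugar: any unknown member name becomes a tag name.
    void opDispatch(string name)(string content)
    {
        tag(name, content);
    }
}

void main()
{
    Writer xml;

    // No member is called `title`, so opDispatch handles it.
    xml.title("Pasja C++");

    // `tight` resolves to the real member, which takes no
    // arguments, so the sugar is never tried for this name.
    static assert(!__traits(compiles, xml.tight("x")));

    // The string-based call still works for such names.
    xml.tag("tight", "x");

    assert(xml.buffer == "<title>Pasja C++</title><tight>x</tight>");
}
```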
Feb 06 2011
prev sibling next sibling parent spir <denis.spir gmail.com> writes:
On 02/06/2011 10:59 PM, Rainer Schuetze wrote:
 This looks nice and compact Using opDispatch to specify the tag (I guess that
 is what you are using to create a tag "book" by calling xml.book()) feels like
 misusing opDispatch, though. Does it add readability in contrast to passing the
 tag as a string to some function?

About readability, I for one had to think really hard ;-)
 How do you write a tag named "tight"? Or a tag calculated at runtime?

Call opDispatch directly ;-)
 Something more conventional would be

      xml.tag("book", attr("year", book.year), { ...

Would prefer that by far (even if it is a few chars more verbose: who cares? The code here is actually a data description, by definition done once and for all).
 but I'm not sure that pairing the attribute name and value adds readability or
 mere noise.

This raises the same famous issue (repeatedly pointed out) as Lua's tables: since they are both objects and collections (and both arrays and AAs, by the way), there is no way to tell apart attributes (members) from elements (collection data) when needed. t.count and t["count"] could /both/ mean the attribute 'count' or the element whose key is "count". Too bad. This is for me issue #1 in Lua's design (#2 being precisely the non-distinction of arrays and AAs, which prevents development of good builtin functionality for each, because of conflicting requirements).

Denis
--
_________________
vita es estrany
spir.wikidot.com
Feb 06 2011
prev sibling next sibling parent reply spir <denis.spir gmail.com> writes:
On 02/06/2011 03:43 PM, Tomek Sowiński wrote:
 While I'm circling the problem of parsing, I took a quick look at writing not
to get stuck in analysis-paralysis. Writing XML is pretty independent from
parsing and an order of magnitude easier to solve. It was perfect to get myself
coding.

 These are the guidelines I followed:

   * Memory minimalism: don't force allocating an intermediate node structure
just to push a few tags down the wire.

   * Composability: operating on an arbitrary string output range.

   * Robustness: tags should not be left open, even if the routine producing
tag interior throws.

   * Simplicity of syntax: resembling real XML if possible.

   * Space efficiency / readability: can write tightly (without indents and
newlines) for faster network transfer and, having easy an means for temporary
tight writing, for better readability.

   * Ease of use:
     - automatic to!string of non-string values,
     - automatic string escaping according to XML standard,
     - handle nulls: close the tags short (<tag/>), don't write attributes with
null values at all.

   * anything else?


 The new writer meets pretty much all of the above. Here's an example to get a
feel of it:

 auto books = [
      Book([Name("Grębosz", "Jerzy")], "Pasja C++", 1999),
      Book([Name("Navin", "Robert", "N.")], "Mathemetics of Derivatives", 2007),
      Book([Name("Tokarczuk", "Olga")], "Podróż ludzi Księgi", 1996),
      Book([Name("Graham", "Ronald", "L."),
           Name("Knuth", "Donald", "E."),
           Name("Patashnik", "Oren")], "Matematyka Konkretna", 2008)
 ];

 auto outputRange = ... ;
 auto xml = xmlWriter(outputRange);

 xml.comment(books.length, " favorite books of mine.");
 foreach (book; books) {
      xml.book("year", book.year, {
           foreach (author; book.authors) {
               xml.tight.authorName({
                   xml.first(author.first);
                   xml.middle(author.middle);
                   xml.last(author.last);
               });
           }
           xml.tight.title(book.title);
      });
 }

 --------------------------------- program output
---------------------------------

 <!-- 4 favorite books of mine. -->
 <book year="1999">
    <authorName><first>Jerzy</first><middle/><last>Grębosz</last></authorName>
    <title>Pasja C++</title>
 </book>
 <book year="2007">
    <authorName><first>Robert</first><middle>N.</middle><last>Navin</last></authorName>
    <title>Mathemetics of Derivatives</title>
 </book>
 <book year="1996">
    <authorName><first>Olga</first><middle/><last>Tokarczuk</last></authorName>
    <title>Podróż ludzi Księgi</title>
 </book>
 <book year="2008">
    <authorName><first>Ronald</first><middle>L.</middle><last>Graham</last></authorName>
    <authorName><first>Donald</first><middle>E.</middle><last>Knuth</last></authorName>
    <authorName><first>Oren</first><middle/><last>Patashnik</last></authorName>
    <title>Matematyka Konkretna</title>
 </book>


 Questions and comments?

When does one need to write by hand, in source, structured data needing to be serialised into XML (or any other format)? In my (admittedly very limited) experience, such data are always outputs of some processing (if only reading from another file).

denis
--
_________________
vita es estrany
spir.wikidot.com
Feb 06 2011
next sibling parent Ary Manzana <ary esperanto.org.ar> writes:
On 2/6/11 8:32 PM, spir wrote:
 When does one need to write by hand, in source, structured data needing
 to be serialised into XML (or any other format)? In my (admittedly very
 limited), such data always are outputs of some processing (if only
 reading from other file).

 denis

If you have a website API that exposes its data via XML, you would like to generate it like that. What do you mean "outputs of some processing"? As far as I know, the code that Tomek showed is "some processing". :-P
Feb 08 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 02/08/2011 07:44 PM, Ary Manzana wrote:
 On 2/6/11 8:32 PM, spir wrote:
 When does one need to write by hand, in source, structured data needing
 to be serialised into XML (or any other format)? In my (admittedly very
 limited), such data always are outputs of some processing (if only
 reading from other file).

 denis

If you have a website API that exposes its data via XML, you would like to generate it like that. What do you mean "outputs of some processing"? As far as I know, the code that Tomek showed is "some processing". :-P

No, in his example, the data were hardcoded as plain constants in source code.

Denis
--
_________________
vita es estrany
spir.wikidot.com
Feb 08 2011
prev sibling next sibling parent Tomek Sowiński <just ask.me> writes:
Rainer Schuetze wrote:

 This looks nice and compact Using opDispatch to specify the tag (I guess
 that is what you are using to create a tag "book" by calling xml.book())
 feels like misusing opDispatch, though. Does it add readability in
 contrast to passing the tag as a string to some function?

 How do you write a tag named "tight"? Or a tag calculated at runtime?

xml.tag("tight", attributes..., { make content });

That's the base implementation. opDispatch is just syntax sugar over it.

 Something more conventional would be

 	xml.tag("book", attr("year", book.year), { ...

 but I'm not sure that pairing the attribute name and value adds
 readability or mere noise.

Putting name and value without a wrapper tuple is just sugar. Having some sort of structure representing an attribute is inevitable as we come to namespaces. In the end it should accept any range of (namespace-)name-value tuples as attributes.

--
Tomek
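
One way to picture "any range of name-value tuples as attributes" is the sketch below. It is only an assumption about the shape such an interface might take: `Attr` and `openTag` are hypothetical names, namespaces are left out, and attribute values are written without escaping to keep the example short.

```d
import std.array : appender;
import std.range.primitives : ElementType, isInputRange, put;
import std.typecons : Tuple;

// Hypothetical attribute structure: a named name/value pair.
alias Attr = Tuple!(string, "name", string, "value");

// Writes an opening tag; accepts any input range whose elements
// are attributes. Attributes with a null value are skipped.
void openTag(Sink, Attrs)(ref Sink sink, string name, Attrs attrs)
    if (isInputRange!Attrs && is(ElementType!Attrs : Attr))
{
    put(sink, "<" ~ name);
    foreach (a; attrs)
    {
        if (a.value is null)
            continue;
        put(sink, " " ~ a.name ~ "=\"" ~ a.value ~ "\"");
    }
    put(sink, ">");
}

void main()
{
    auto sink = appender!string();
    string noIsbn = null;

    openTag(sink, "book", [Attr("year", "1999"), Attr("isbn", noIsbn)]);

    // The null-valued attribute was not written at all.
    assert(sink.data == "<book year=\"1999\">");
}
```

Because the constraint only asks for an input range, a lazily computed sequence of attributes would be accepted as well as an array.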
Feb 06 2011
prev sibling next sibling parent Russel Winder <russel russel.org.uk> writes:

I am coming in half way through a thread, apologies if I am saying
something that has already been said or is not relevant.

On Sun, 2011-02-06 at 22:59 +0100, Rainer Schuetze wrote:
 Tomek Sowiński wrote:
 auto xml = xmlWriter(outputRange);

 xml.comment(books.length, " favorite books of mine.");
 foreach (book; books) {
     xml.book("year", book.year, {
          foreach (author; book.authors) {
              xml.tight.authorName({
                  xml.first(author.first);
                  xml.middle(author.middle);
                  xml.last(author.last);
              });
          }
          xml.tight.title(book.title);
     });
 }


This looks to be heading down the road Groovy trod 6 years ago with the MarkupBuilder, indeed the whole builders concept. Validation that Groovy's builders framework is a good idea is that Ruby took up the idea wholesale and Python is starting to as well. It seems the idea may fly in D as well, even though it is a very different form of meta-object protocol (MOP).

Basically Groovy (and Ruby, and Python) allow you to get rid of the xml. in the above code, and it makes the function calls and closures work much better as a DSL for describing the markup. This relies on a MOP of course, since it relies on the function dispatch being redefinable.

 This looks nice and compact Using opDispatch to specify the tag (I guess
 that is what you are using to create a tag "book" by calling xml.book())
 feels like misusing opDispatch, though. Does it add readability in
 contrast to passing the tag as a string to some function?

Experience from Groovy, Ruby and Python is a strong yes. Having the tag name as the name of the function with attributes as keyword parameters, string content as an unnamed parameter and nested tag content in a closure leads to beautiful HTML, XHTML, XML, . . . production. Well-formedness is guaranteed, though you can still generate invalid documents.

Groovy's MarkupBuilder makes for very nice computation of webpages. Here is a real example of generating a part of my website:

    def writer = new StringWriter ( )
    ( new MarkupBuilder ( writer ) ).html {
      head {
        'meta' ( 'http-equiv' : 'Content-Type' , content : 'text/html; charset=UTF-8' )
        title ( 'Dr Russel Winder &mdash; A Short Biography' )
        link ( rel : 'stylesheet' , href : 'style.css' , type : 'text/css' )
      }
      body {
        div ( id : 'main' )
        h1 ( 'Concertant Articles by Russel Winder' )
        ul {
          evaluate ( new File ( 'concertantArticles.groovy' ) ).each { item -
            [ . . . ]
            home' + concertantWebpagesSourceDirectory + '/Articles/' + item[0] ) ).text ) }, ${item[2][0..9]}, " ) {
              a ( href : "http://www.concertant.com/Articles/${item[0]}" , item[1] )
            }
          }
        }
        h1 ( 'Articles about SC08 for Concertant by Russel Winder' )
        ul {
          evaluate ( new File ( 'concertantSupercomputing2008Articles.groovy' ) ).each { item ->
            li ( "${ extractPageTitle ( ( new File ( System.properties.'user.home' + concertantWebpagesSourceDirectory + '/Supercomputing2008Articles/' + item[0] ) ).text ) }, ${item[2][0..9]}, " ) {
              a ( href : "http://www.concertant.com/Supercomputing2008Articles/${item[0]}", item[1] )
            }
          }
        }
      }
    }

 How do you write a tag named "tight"? Or a tag calculated at runtime?

 Something more conventional would be

 	xml.tag("book", attr("year", book.year), { ...

 but I'm not sure that pairing the attribute name and value adds
 readability or mere noise.

I don't see this being anything like as useful as what Groovy et al. already have via the MarkupBuilder. "Conventional" is not really the way to go for this; you need the full DSL approach. At least in my opinion, which I think is becoming the norm in the dynamic programming language arena.

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel russel.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Feb 07 2011
prev sibling next sibling parent Russel Winder <russel russel.org.uk> writes:

A couple of other random thoughts regarding XML:

1.  Groovy has XmlSlurper, which is a surprisingly fast way of reading
XML and processing it as needed.  It was developed for fast
SAX-underneath, document-based but not-W3C-DOM processing of
multi-gigabyte XML documents.
http://groovy.codehaus.org/api/groovy/util/XmlSlurper.html

2.  Python seems to be going down the lxml route.  lxml is a Python
binding to libxml2 and libxslt.  http://codespeak.net/lxml/

--
Russel.
Feb 07 2011
prev sibling parent Russel Winder <russel russel.org.uk> writes:

On Wed, 2011-02-09 at 00:16 -0600, Christopher Nicholson-Sauls wrote:
[ . . . ]
 xml.book(...) as sugar for xml.tag("book",...) make it xml.bookTag(...)

Using xml.bookTag would ruin it for me. I'd use something that allowed just book; xml.book is already not enough of a DSL.

--
Russel.
Feb 08 2011