digitalmars.D.learn - How to simply parse and print the XML with dxml?

tastyminerals <tastyminerals gmail.com> writes:
Maybe I missed something obvious in the docs but how can I just 
parse the XML and print its content?

```
import dxml.parser;

auto xml = parseXML!simpleXML(layout);
xml.map!(e => e.text).join.writeln;
```

throws 
`core.exception.AssertError ../../../.dub/packages/dxml-0.4.3/dxml/source/
xml/parser.d(1457): text cannot be called with elementStart`.
Sep 09 2021
Ferhat Kurtulmuş <aferust gmail.com> writes:
On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals 
wrote:
 [...]
I am not very experienced with it, but I once used it to read Glade files [1]. I used dxml.dom. Hope it helps.

1: https://github.com/aferust/makegtkdclass/blob/master/source/gladeparser.d#L43
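For anyone who wants to try the dxml.dom route, here is a minimal sketch of printing all text with it. The `collectText` helper and the sample XML are made up for illustration, and the dub version spec is an assumption; only `parseDOM`, `children`, `type` and `text` come from dxml itself.

```d
/++ dub.sdl:
dependency "dxml" version="~>0.4.0"
+/
import dxml.dom;
import dxml.parser : EntityType, simpleXML;
import std.stdio : writeln;
import std.string : strip;

// Hypothetical helper, not part of dxml: walks the tree depth-first
// and appends every non-blank text node to `found`.
void collectText(E)(E entity, ref string[] found)
{
    if (entity.type == EntityType.text)
    {
        auto t = entity.text.strip;
        if (t.length)
            found ~= t;
    }
    else if (entity.type == EntityType.elementStart)
    {
        // `children` may only be called on elementStart entities.
        foreach (child; entity.children)
            collectText(child, found);
    }
}

void main()
{
    // Sample input, assumed for illustration.
    auto dom = parseDOM!simpleXML("<root><foo>some text</foo><bar/> more text</root>");
    string[] found;
    collectText(dom, found);
    foreach (t; found)
        writeln(t);
}
```

Unlike dxml.parser, this builds the whole tree in memory first, which is usually fine for small files.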
Sep 09 2021
Adam D Ruppe <destructionator gmail.com> writes:
On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals 
wrote:
 Maybe I missed something obvious in the docs but how can I just 
 parse the XML and print its content?
idk how to use dxml but my dom.d makes these things trivial

http://arsd-official.dpldocs.info/arsd.dom.html
https://github.com/adamdruppe/arsd/blob/master/dom.d
https://code.dlang.org/packages/arsd-official%3Adom

if you're familiar with javascript you'll find a lot of similarities with my api there. for strict xml mode you just use `new XmlDocument` instead of `new Document`
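A rough sketch of what that could look like for the "just print the text" case. The sample XML and the wildcard version spec are assumptions for illustration; check the arsd.dom docs linked above for the exact behavior of `innerText`.

```d
/++ dub.sdl:
dependency "arsd-official:dom" version="*"
+/
import arsd.dom;
import std.stdio : writeln;

void main()
{
    // XmlDocument parses in strict XML mode; the input is made up.
    auto document = new XmlDocument("<root><foo>some text</foo><bar/> more text</root>");

    // innerText gathers the text content beneath the root element.
    writeln(document.root.innerText);
}
```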
Sep 09 2021
jfondren <julian.fondren gmail.com> writes:
On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals 
wrote:
 [...]
dxml.parser is a streaming XML parser. The documentation at http://jmdavisprog.com/docs/dxml/0.4.0/dxml_parser.html has a link to more information about this at the top, behind 'StAX'. Thus, when you're mapping over `xml`, you're not getting `<a>some text</a>` at a time, but `<a>`, `some text`, and `</a>` separately, as they're parsed. The `<a>` there is an `elementStart` which lacks a `text`, hence the error.

Here's a script:

```d
/++ dub.sdl:
dependency "dxml" version="0.4.0"
stringImportPaths "."
+/
import dxml.parser;
import std;

enum text = import(__FILE__)
    .splitLines
    .find("__EOF__")
    .drop(1)
    .join("\n");

void main() {
    foreach (entity; parseXML!simpleXML(text)) {
        if (entity.type == EntityType.text)
            writeln(entity.text.strip);
    }
}
__EOF__
<!-- comment -->
<root>
    <foo>some text<whatever/></foo>
    <bar/>
    <baz></baz>
    more text
</root>
```

that runs with this output:

```
some text
more text
```
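To see the separate entities for yourself, here is a small sketch that prints the type of each one as it is parsed. It also shows the original one-liner with a `filter` in front of the `map`, so that `text` is only ever called on text entities. The `allText` helper name and the version spec are my own assumptions, not part of dxml.

```d
/++ dub.sdl:
dependency "dxml" version="~>0.4.0"
+/
import dxml.parser;
import std.algorithm : filter, map;
import std.array : join;
import std.stdio : writeln;

// Hypothetical helper: concatenates the content of every text entity.
string allText(string xml)
{
    return parseXML!simpleXML(xml)
        .filter!(e => e.type == EntityType.text) // skip elementStart/elementEnd
        .map!(e => e.text)
        .join;
}

void main()
{
    enum xml = "<a>some text</a>";

    // Each tag and each run of text arrives as its own entity.
    foreach (entity; parseXML!simpleXML(xml))
    {
        if (entity.type == EntityType.text)
            writeln(entity.type, ": ", entity.text);
        else if (entity.type == EntityType.elementStart
              || entity.type == EntityType.elementEnd)
            writeln(entity.type, ": ", entity.name);
    }

    writeln(allText(xml));
}
```

Which accessors are valid depends on the entity type: `name` works on start and end tags, `text` on text entities, so checking `type` first avoids the assertion failure.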
Sep 09 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 9 September 2021 at 18:40:53 UTC, jfondren wrote:
 [...]
That's a nice trick you did there
Sep 09 2021
jfondren <julian.fondren gmail.com> writes:
On Thursday, 9 September 2021 at 23:29:56 UTC, Imperatorn wrote:
 That's a nice trick you did there
Something in the quoted text? Or, if you mean the self-string-importing script that uses the content after `__EOF__`, yeah, that's in imitation of Perl's `__DATA__`: https://perldoc.perl.org/perldata#Special-Literals
Sep 09 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 9 September 2021 at 23:42:42 UTC, jfondren wrote:
 Or, if you mean the self-string-importing script that uses the content after `__EOF__`, yeah, that's in imitation of Perl's `__DATA__`
Yeah, the import thing
Sep 10 2021
tastyminerals <tastyminerals gmail.com> writes:
On Thursday, 9 September 2021 at 18:40:53 UTC, jfondren wrote:
 [...]
OK, that makes sense now. Thank you. As for dxml, I believe a small quick-start example would be very beneficial for newcomers, especially people like me who are not aware of the different XML parser types and just need to extract text from an XML file.
Sep 10 2021
Mike Parker <aldacron gmail.com> writes:
On Friday, 10 September 2021 at 07:50:29 UTC, tastyminerals wrote:

 As for the dxml, I believe adding a small quick start example 
 would be very beneficial for the newcomers. Especially, ppl 
 like me who are not aware of the XML parser types and just need 
 to extract text from an XML file.
Submit a request: https://github.com/jmdavis/dxml/issues

I don't know how active Jonathan is these days, but it won't get implemented at all if no one requests it.
Sep 10 2021