digitalmars.D.learn - Parsing with dxml

Joel <joelcnz gmail.com> writes:
I can only parse one row successfully. I tried increasing the 
popFronts, till it said I'd gone off the end.

Running ./app
core.exception.AssertError@../../../../.dub/packages/dxml-0.4.1/dxml/source/dxml/parser.d(1457): text cannot be called with elementEnd
----------------
??:? _d_assert_msg [0x104b3981a]
../../JMiscLib/source/jmisc/base.d:161 pure @property @safe immutable(char)[] dxml.parser.EntityRange!(dxml.parser.Config(1, 1, 1, 1), immutable(char)[]).EntityRange.Entity.text() [0x104b2297b]
source/app.d:26 _Dmain [0x104aeb46e]
Program exited with code 1

```
<?xml version="1.0"?>

<resultset statement="SELECT * FROM bible.t_asv
" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <row>
	<field name="id">01001001</field>
	<field name="b">1</field>
	<field name="c">1</field>
	<field name="v">1</field>
	<field name="t">In the beginning God created the heavens and the 
earth.</field>
   </row>

   <row>
	<field name="id">01001002</field>
	<field name="b">1</field>
	<field name="c">1</field>
	<field name="v">2</field>
	<field name="t">And the earth was waste and void; and darkness 
was upon the face of the deep: and the Spirit of God moved upon 
the face of the waters.</field>
   </row>

```

```d
void main() {
     import std.stdio;
     import std.file : readText;
     import dxml.parser;
     import std.conv : to;

     struct Verse {
         string id;
         int b, c, v;
         string t;
     }

     auto range = parseXML!simpleXML(readText("xmltest.xml"));

     // simpleXML skips comments

     void pops(int c) {
         foreach(_; 0 .. c)
             range.popFront();
     }
     pops(3);

     Verse[] vers;
     foreach(_; 0 .. 2) {
         Verse ver;
         ver.id = range.front.text;
         pops(3);
         ver.b = range.front.text.to!int;
         pops(3);
         ver.c = range.front.text.to!int;
         pops(3);
         ver.v = range.front.text.to!int;
         pops(3);
         ver.t = range.front.text;

         with(ver)
             vers ~= Verse(id,b,c,v,t);

         pops(2);
     }
     foreach(verse; vers) with(verse)
         writeln(id, " Book: ", b, " ", c, ":", v, " -> ", t);
}
```
Nov 17 2019
Joel <joelcnz gmail.com> writes:
On Monday, 18 November 2019 at 06:44:43 UTC, Joel wrote:
         with(ver)
             vers ~= Verse(id,b,c,v,t);
Or, vers ~= ver;
Nov 18 2019
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, November 17, 2019 11:44:43 PM MST Joel via Digitalmars-d-learn 
wrote:
 [...]
You need to be checking the type of the entity before you call either name or text on it, because not all entities have a name, and not all entities have text - e.g. <field name="id"> is an EntityType.elementStart, so it has a name (which is "field"), but it doesn't have text, whereas the 01001001 between the <field name="id"> and </field> tags has no name but does have text, because it's an EntityType.text. If you call name or text without verifying the type first, then you're almost certainly going to get an assertion failure at some point (assuming that you don't compile with -release anyway), since you're bound to end up with an entity that you don't expect at some point (either because you were wrong about where you were in the document, or because the document didn't match the layout that was expected).

Per the assertion's message, you managed to call text on an EntityType.elementEnd, and per the stack trace, text was called on this line

    ver.id = range.front.text;

If I add

    if(range.front.type == EntityType.elementEnd)
    {
        writeln(range.front.name);
        writeln(range.front.pos);
    }

right above that, I get

    row
    TextPos(11, 4)

indicating that the end tag was </row> and that it was on line 11, 4 code units in (and since this is ASCII, that would be 4 characters). So, you managed to parse all of the <field>***</field> lines but didn't correctly deal with the end of that section.

If I add

    writeln(range.front);

right before

    pops(2);

then I get:

    Entity(text, TextPos(10, 25), , Text!(ByCodeUnitImpl)(In the beginning God created the heavens and the earth., TextPos(10, 25)))

So, prior to popping twice, it's on the text between <field name="t"> and </field>, which looks like it's what you intended. If you look at the XML after that, it should be clear why you're in the wrong place afterwards.

Since at that point, range.front is on the EntityType.text between <field name="t"> and </field>, popping once makes it so that range.front is </field>.
And popping a second time makes range.front </row>, which is where the range is when it then tries to call text at the top of the loop. Presumably, you want it to be on the EntityType.text in

    <field name="id">01001002</field>

To get there from </row>, you'd have to pop once to get to <row>, a second time to get to <field>, and a third time to get to 01001002. So, if you had

    pops(5);

instead of

    pops(2);

the range would be at the correct place at the top of the loop - though it would then be the wrong number of times to pop the second time around. With the text as provided, it would throw an XMLParsingException when it reached the end of the loop the second time, because the XML document doesn't have the matching </resultset> tag, and with that fixed, you end up with an assertion failure, because popFront was called on an empty range (since there aren't that many entities left in the range at that point):

    core.exception.AssertError@../../.dub/packages/dxml-0.4.0/dxml/source/dxml/parser.d(1746): It's illegal to call popFront() on an empty EntityRange.

So, you'd need to adjust the end of the loop so that it only pops what it needs to pop on the second loop. If you don't care about any data after that point, you could just make it not pop on the last iteration, or what would probably be better would be to write the loop so that it expects to start on <row>, and it will exit the loop if it's instead on an end tag (since that would indicate the end of that section, and in this case, it would mean that it was on the last entity in the document).

Regardless, if you're actually looking to parse a document like this in production code instead of in something that's just thrown together to get something done, you'd actually need to be checking the EntityType of each element to make sure that it was what was expected so that you can provide an error to the user when the document is malformed.
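As a rough illustration of the loop shape described above (start each iteration on <row>, stop on an end tag, and check the EntityType before touching name or text), here is a minimal sketch. It assumes the document has its closing </resultset> tag; parseVerses and the error messages are made up for this example, and only parseXML, simpleXML, EntityType, and the front/popFront/name/text/attributes members come from dxml.

```d
import dxml.parser;
import std.conv : to;
import std.exception : enforce;
import std.stdio : writeln;

struct Verse
{
    string id;
    int b, c, v;
    string t;
}

// Hypothetical helper (not part of dxml): reads every <row> in <resultset>.
Verse[] parseVerses(string xml)
{
    auto range = parseXML!simpleXML(xml);

    enforce(range.front.type == EntityType.elementStart &&
            range.front.name == "resultset", "expected <resultset>");
    range.popFront();

    Verse[] verses;
    // Each iteration starts on <row>; an end tag here is </resultset>.
    while (range.front.type == EntityType.elementStart)
    {
        enforce(range.front.name == "row", "expected <row>");
        range.popFront();

        Verse ver;
        // Each iteration starts on <field name="...">; an end tag is </row>.
        while (range.front.type == EntityType.elementStart)
        {
            enforce(range.front.name == "field", "expected <field>");
            string fieldName;
            foreach (attr; range.front.attributes)
                if (attr.name == "name")
                    fieldName = attr.value;
            range.popFront();

            // An empty <field></field> produces no text entity at all.
            string value;
            if (range.front.type == EntityType.text)
            {
                value = range.front.text;
                range.popFront();
            }
            enforce(range.front.type == EntityType.elementEnd,
                    "expected </field>");
            range.popFront();

            switch (fieldName)
            {
                case "id": ver.id = value; break;
                case "b":  ver.b = value.to!int; break;
                case "c":  ver.c = value.to!int; break;
                case "v":  ver.v = value.to!int; break;
                case "t":  ver.t = value; break;
                default:   break; // ignore fields this sketch does not know
            }
        }
        enforce(range.front.type == EntityType.elementEnd, "expected </row>");
        range.popFront(); // now on the next <row> or on </resultset>
        verses ~= ver;
    }
    return verses;
}

void main()
{
    auto xml = `<?xml version="1.0"?>
<resultset>
   <row>
      <field name="id">01001001</field>
      <field name="b">1</field>
      <field name="c">1</field>
      <field name="v">1</field>
      <field name="t">In the beginning God created the heavens and the earth.</field>
   </row>
</resultset>`;
    foreach (verse; parseVerses(xml))
        writeln(verse.id, " Book: ", verse.b, " ", verse.c, ":", verse.v,
                " -> ", verse.t);
}
```

Because the loops key off the entity type rather than a fixed pop count, the number of rows and the order of the fields no longer matter.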
dxml expects that you will only ever call a property of an EntityRange.Entity which is valid for that EntityType, and it asserts that it's not called on the wrong type. So, if you don't check the EntityType, unless you can guarantee that the XML document is as expected, you're going to get assertion failures when not compiling with -release, and you'll get weird results when the assertions are compiled out with -release.

On an unrelated note, std.range.primitives.popFrontN (or std.range.popFrontN, since std.range publicly imports std.range.primitives) does what your pops function does - and it does it more efficiently for ranges which have slicing (which dxml's EntityRange doesn't, but either way, you can just use the function from Phobos instead of writing your own).

- Jonathan M Davis
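For reference, a small sketch of popFrontN standing in for the hand-written pops helper. The XML string is invented for the example. popFrontN returns how many elements it actually popped and stops early if the range runs out, so it never calls popFront on an empty range.

```d
import dxml.parser;
import std.range : popFrontN;

void main()
{
    // Made-up document; front starts on <root>.
    auto range = parseXML!simpleXML("<root><a>1</a><b>2</b></root>");

    // Pop <root> and <a>, leaving front on the text "1".
    auto popped = range.popFrontN(2);
    assert(popped == 2);
    assert(range.front.type == EntityType.text);
    assert(range.front.text == "1");

    // Asking for more than is left just empties the range.
    range.popFrontN(100);
    assert(range.empty);
}
```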
Nov 18 2019
Joel <joelcnz gmail.com> writes:
On Tuesday, 19 November 2019 at 02:45:29 UTC, Jonathan M Davis 
wrote:
 On Sunday, November 17, 2019 11:44:43 PM MST Joel via 
 Digitalmars-d-learn wrote:
 [...]
You need to be checking the type of the entity before you call either name or text on it, because not all entities have a name, and not all entities have text - e.g. <field name="id"> is an EntityType.elementStart, so it has a name (which is "field"), but it doesn't have text, whereas the 01001001 between the <field name="id"> and </field> tags has no name but does have text, because it's an EntityType.text. If you call name or text without verifying the type first, then you're almost certainly going to get an assertion failure at some point (assuming that you don't compile with -release anyway), since you're bound to end up with an entity that you don't expect at some point (either because you were wrong about where you were in the document, or because the document didn't match the layout that was expected). [...]
Thanks for taking the time to reply. I have had another XML Bible version text in the past [1]. It had a different format, and Adam Ruppe helped me by writing code that worked (with just one tweak). I think I want another example that I can just paste into my program, using the same structs as the last XML version (see link).

[1] https://forum.dlang.org/thread/j7ljs5$24r2$1@digitalmars.com
Nov 18 2019
Joel <joelcnz gmail.com> writes:
On Tuesday, 19 November 2019 at 04:43:31 UTC, Joel wrote:
 On Tuesday, 19 November 2019 at 02:45:29 UTC, Jonathan M Davis 
 wrote:
 [...]
Correction: I meant classes (not structs).
Nov 18 2019
Kagamin <spam here.lot> writes:
On Monday, 18 November 2019 at 06:44:43 UTC, Joel wrote:
 ```
 <?xml version="1.0"?>

 <resultset statement="SELECT * FROM bible.t_asv
 " xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   </row>

 ```
You're missing a closing tag.
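A quick way to see this kind of problem is to walk the whole range and catch dxml's XMLParsingException, which carries the position of the failure. This is only a sketch: parsesCleanly is a made-up helper name, and the two test strings are invented.

```d
import dxml.parser;
import std.stdio : writeln;

// Hypothetical helper (not part of dxml): true if dxml can walk the
// whole document without throwing.
bool parsesCleanly(string xml)
{
    try
    {
        auto range = parseXML!simpleXML(xml);
        while (!range.empty)
            range.popFront();
        return true;
    }
    catch (XMLParsingException e)
    {
        writeln("invalid XML at line ", e.pos.line, ", column ", e.pos.col,
                ": ", e.msg);
        return false;
    }
}

void main()
{
    // No </resultset>, like the sample in the original post.
    assert(!parsesCleanly("<resultset><row></row>"));
    assert(parsesCleanly("<resultset><row></row></resultset>"));
}
```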
Nov 19 2019
Joel <joelcnz gmail.com> writes:
On Tuesday, 19 November 2019 at 14:20:39 UTC, Kagamin wrote:
 On Monday, 18 November 2019 at 06:44:43 UTC, Joel wrote:
 ```
 <?xml version="1.0"?>

 <resultset statement="SELECT * FROM bible.t_asv
 " xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   </row>

 ```
You're missing a closing tag.
I can store the ASV Bible in an array (I check for the last book, chapter, and verse number instead of a closing tag). But I haven't figured out how to get it into the class setup I've got.
Nov 19 2019
Joel <joelcnz gmail.com> writes:
On Wednesday, 20 November 2019 at 00:07:53 UTC, Joel wrote:
 [...]
Ok, got it working, though I didn't use any XML tools: I just split the XML file into lines and went from there. I used my trace function in a mixin to trace what was happening (it's simple code I reuse in my programs). It shows the variable and its value without having to write the variable twice.

```d
g_bible = new Bible;

int b, c, v;
size_t j; // index into the flat verses array
break0: do {
    // Start a new book.
    b = verses[j].b;
    g_bible.m_books ~= new Book(bookNames[b-1]);
    version(asvtrace) mixin(trace("g_bible.m_books[$-1].m_bookTitle"));
    do {
        // Start a new chapter in the current book.
        c = verses[j].c;
        g_bible.m_books[$-1].m_chapters ~= new Chapter(c.to!string);
        version(asvtrace) mixin(trace("j g_bible.m_books[$-1].m_chapters[$-1].m_chapterTitle".split));
        do {
            // Append the verse to the current chapter.
            v = verses[j].v;
            g_bible.m_books[$-1].m_chapters[$-1].m_verses ~= new Verse(v.to!string);
            g_bible.m_books[$-1].m_chapters[$-1].m_verses[$-1].verse = verses[j].t;
            version(asvtrace) mixin(trace(("j g_bible.m_books[$-1].m_chapters[$-1].m_verses[$-1].m_verseTitle" ~
                " g_bible.m_books[$-1].m_chapters[$-1].m_verses[$-1].verse").split));
            j += 1;
            if (j == verses.length)
                break break0; // ran out of verses: leave all three loops
        } while(verses[j].v != 1); // verse 1 marks the start of the next chapter
    } while(verses[j+1].c != 1); // chapter 1 marks the start of the next book
} while(true);
```
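For comparison, the same grouping can be done in a single pass by starting a new book or chapter whenever the number changes. This is only a sketch: VerseRec, Book, Chapter, VerseEntry, and groupVerses are stand-ins invented for the example, not the classes from the program above, and it assumes the verses arrive sorted by book, chapter, and verse.

```d
import std.conv : to;

// Flat record as read from the file (hypothetical name).
struct VerseRec { int b, c, v; string t; }

// Hypothetical stand-ins for the real Book/Chapter/Verse classes.
class VerseEntry
{
    string title, text;
    this(string title, string text) { this.title = title; this.text = text; }
}

class Chapter
{
    string title;
    VerseEntry[] verses;
    this(string title) { this.title = title; }
}

class Book
{
    string title;
    Chapter[] chapters;
    this(string title) { this.title = title; }
}

// Groups a sorted flat list of verses into books and chapters.
Book[] groupVerses(VerseRec[] recs, string[] bookNames)
{
    Book[] books;
    int lastBook = -1, lastChapter = -1; // -1: nothing seen yet
    foreach (rec; recs)
    {
        if (rec.b != lastBook)
        {
            books ~= new Book(bookNames[rec.b - 1]);
            lastBook = rec.b;
            lastChapter = -1; // a new book always starts a new chapter
        }
        if (rec.c != lastChapter)
        {
            books[$ - 1].chapters ~= new Chapter(rec.c.to!string);
            lastChapter = rec.c;
        }
        books[$ - 1].chapters[$ - 1].verses ~=
            new VerseEntry(rec.v.to!string, rec.t);
    }
    return books;
}

void main()
{
    auto recs = [
        VerseRec(1, 1, 1, "a"), VerseRec(1, 1, 2, "b"),
        VerseRec(1, 2, 1, "c"), VerseRec(2, 1, 1, "d"),
    ];
    auto books = groupVerses(recs, ["Genesis", "Exodus"]);
    assert(books.length == 2);
    assert(books[0].chapters.length == 2);
    assert(books[0].chapters[0].verses.length == 2);
    assert(books[1].title == "Exodus");
}
```

Tracking the last seen book and chapter avoids indexing past the current verse, so there is no need for a labeled break at the end of the array.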
Nov 20 2019