digitalmars.D.learn - XML D2.x parsing &

Jesse Phillips <jessekphillips+d gmail.com> writes:
According to the documentation, having &amp; in a tag will be turned to &.

http://digitalmars.com/d/2.0/phobos/std_xml.html#text

I observe that this is not the case. And if an attribute contains &amp;, it is
turned into &amp;amp;. What is the best way to get the same output for both?
The code that follows outputs:

Attr: What &amp;amp; Up
Elem: What &amp; Up



*testfile.xml:*

<?xml version="1.0" encoding="utf-8"?>
<Tests>
	<Test thing="What &amp; Up">What &amp; Up</Test>
</Tests>


*test.d:*

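import std.file;   // needed for std.file.read
import std.stdio;
import std.xml;

void main() {
	auto file = "testfile.xml";

	// Read the whole file into memory as a string.
	auto s = cast(string)std.file.read(file);

	auto xml = new DocumentParser(s);

	// Print the attribute value when <Test> opens.
	xml.onStartTag["Test"] = (ElementParser xml) {
		writeln("Attr: ", xml.tag.attr["thing"]);
	};

	// Print the element text when </Test> closes.
	xml.onEndTag["Test"] = (in Element e) {
		writeln("Elem: ", e.text);
	};
	xml.parse();
}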
Jul 20 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Clearly std.xml is buggy. Correct behaviour would be

Attr: What & Up
Elem: What & Up

The best place for bug reports is http://d.puremagic.com/issues/

Stewart.
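
Until the library is fixed, one possible workaround is to decode the entities by hand after parsing. The following is only a minimal sketch: `decodeEntities` is a hypothetical helper, not part of std.xml, it handles only the five predefined XML entities, and the two input strings are simply the values shown in the reported output above.

```d
import std.array : replace;
import std.stdio : writeln;

/// Hypothetical helper (not part of std.xml): decodes the five predefined
/// XML entities. "&amp;" is handled last so that "&amp;lt;" becomes "&lt;"
/// rather than "<".
string decodeEntities(string s)
{
    return s.replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&apos;", "'")
            .replace("&amp;", "&");
}

void main()
{
    // Element text as reported: still carries one level of encoding.
    writeln("Elem: ", decodeEntities("What &amp; Up"));

    // Attribute value as reported: encoded twice, so decode twice.
    writeln("Attr: ", decodeEntities(decodeEntities("What &amp;amp; Up")));
}
```

With the reported values, both lines come out as "What & Up". Decoding the attribute twice is tied to the buggy behaviour, so it would have to be dropped once the attribute is returned correctly.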
Jul 21 2009
Jesse Phillips <jessekphillips gmail.com> writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3200
http://d.puremagic.com/issues/show_bug.cgi?id=3201
Jul 21 2009
Brad Roberts <braddr puremagic.com> writes:
The xml parsing code in D2 could use some love and care. It was originally
written by Janice who seems to have dropped off the face of the planet. It's
little more than a first draft with serious performance problems and several
important bugs.

Anyone want to volunteer to invest some time in improving it?

Later,
Brad
Jul 21 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
I don't mean to shoot down the idea, but Tango already has three XML parsers which are, like, the fastest. Ever.

http://dotnot.org/blog/archives/2008/03/04/xml-benchmarks-updated-graphs/

I'm just saying, it'd seem like pointless duplication of effort with such parsers _already available_. If it could be relicensed, I'd say that's the best route.
Jul 21 2009
Brad Roberts <braddr puremagic.com> writes:
Relicensed and separable from the rest of Tango. It's been way too long since I looked at that code in Tango to recall any of its details. Basically I agree with you on this one. :)
Jul 21 2009
Ary Borenszweig <ary esperanto.org.ar> writes:
Can't Phobos just disappear? :-( Like... it must be the first standard library in the world that's developed by 3-5 people.
Jul 22 2009