
digitalmars.D - std.xml and Adam D Ruppe's dom module

reply Alvaro <alvaroDotSegura gmail.com> writes:
The current std.xml needs a replacement (I think it has already been 
decided so), something with a familiar DOM API with facilities to 
manipulate XML documents.

I needed to do a quick XML file transform and std.xml was of little use. 
Then I found this Adam D. Ruppe's library:

https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

Its "dom" module has that sort of JavaScript-like DOM manipulation code. 
It has getElementsByTagName(), getElementById(), firstChild, nodeValue, 
innerText (read/write), toString, etc. Easy to use and performant. The 
XML processing I needed to do took minutes with it!
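
To give a feel for that style, here is a minimal sketch (assumptions: that the module is importable as arsd.dom and that a Document can be constructed directly from an XML string; the method names themselves are the ones listed above):

```d
// Sketch only - the import path and Document constructor are assumptions.
import arsd.dom;

void main() {
    auto document = new Document("<root><item id=\"x\">hi</item></root>");

    auto item = document.getElementById("x"); // lookup by id, like in a browser
    item.innerText = "hello";                 // innerText is read/write

    foreach (e; document.getElementsByTagName("item")) {
        // each e is an Element; firstChild, nodeValue, etc. hang off it
    }

    auto xml = document.toString();           // serialize the whole tree back out
}
```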

I thus suggest, if licensing allows (?) and no better candidate exists, 
basing a newer std.xml module on that code as a starting point. Well, 
after cleaning up and fixing anything necessary. For instance, that 
module was designed for web server programming and mainly targets the 
HTML DOM. It has code to deal with CSS styles and HTML-specific elements 
and their special handling (table, a, form, audio, input, ...). A lot of 
that can be left out for XML.
Feb 06 2012
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, February 07, 2012 00:15:37 Alvaro wrote:
 The current std.xml needs a replacement (I think it has already been
 decided so), something with a familiar DOM API with facilities to
 manipulate XML documents.

 I needed to do a quick XML file transform and std.xml was of little use.
 Then I found this Adam D. Ruppe's library:

 https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

 Its "dom" module has that sort of JavaScript-like DOM manipulation code.
 It has getElementsByTagName(), getElementById(), firstChild, nodeValue,
 innerText (read/write), toString, etc. Easy to use and performant. The
 XML processing I needed to do took minutes with it!

 I thus suggest, if licensing allows (?) and no better candidate exists,
 basing a newer std.xml module on that code as a starting point. Well,
 after cleaning up and fixing anything necessary. For instance, that
 module was designed for web server programming and mainly targets the
 HTML DOM. It has code to deal with CSS styles and HTML-specific elements
 and their special handling (table, a, form, audio, input, ...). A lot of
 that can be left out for XML.
Tomek Sowiński was working on a replacement which was intended to become the new std.xml, but he hasn't posted since June, so I have no idea where that stands. And I believe that xmlp is intended as a possible candidate as well.

The main issue is that we need someone to design it, implement it, and push it through the review process. Just suggesting it is not enough. Someone needs to champion the new module. And no one has gone all the way with that yet.

Also, two of the major requirements for an improved std.xml are that it needs to have a range-based API, and it needs to be fast. I don't know how Adam's stacks up against that. Tango's XML parser has pretty much set the bar on speed ( http://dotnot.org/blog/archives/2008/03/12/why-is-dtango-so-fast-at-parsing-xml/ ), and while we may not reach that (especially with the initial implementation), we want a design which is going to be fast and has the potential of reaching Tango-like speeds (whether that's currently possible or not with a range-based interface is probably highly dependent on dmd's current optimization capabilities - inlining in particular).

So, if Adam wants to work on getting his XML module into Phobos (which I question, since if he really wanted to, I would have expected him to do it already), or if someone else wants to work on it (and the license allows it), then it may be possible to get it into Phobos. But someone needs to put all of that time and effort in.

- Jonathan M Davis
Feb 06 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-02-08 02:44, Jonathan M Davis wrote:
 On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
 On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis

 wrote:
 Also, two of the major requirements for an improved std.xml are
 that it needs to have a range-based API, and it needs to be
 fast.

What does range based API mean in this context? I do offer a couple of ranges over the tree, but it really isn't the main thing there. Check out Element.tree() for the main one.

But, if you mean taking a range for input, no, it doesn't do that. I've been thinking about rewriting the parse function (if you look at it, you'll probably hate it too!). But what I have works and is tested on a variety of input, including garbage that was a pain to get working right, so I'm in no rush to change it.
 Tango's XML parser has pretty much set the bar on speed

Yeah, I'm pretty sure Tango whips me hard on speed. I spent some time in the profiler a month or two ago and got a significant speedup on the datasets I use (HTML files), but I'm sure there's a whole lot more that could be done. The biggest thing is that I don't think you could use my parse function as a stream.

Ideally, std.xml would operate on ranges of dchar (but obviously be optimized for strings, since there are lots of optimizations that can be done with string processing - at least as far as Unicode goes), and it would return a range of some kind. The result would probably be a document type of some kind which provided a range of its top-level nodes (or maybe just the root node), each of which then provided ranges over its sub-nodes, etc. At least, that's the kind of thing that I would expect.

Other calls on the document and nodes may not be range-based at all (e.g. XPath should probably be supported, and that doesn't necessarily involve ranges). The best way to handle it all would probably depend on the implementation. I haven't implemented a full-blown XML parser, so I don't know what the best way to go about it would be, but ideally, you'd be able to process the nodes as a range.

- Jonathan M Davis

I think there should be a pull or SAX parser at the lowest level, and then an XML document module on top of that parser.

--
/Jacob Carlborg
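
That layering can be sketched in a few lines: a lazy range of tokens at the bottom, which a DOM builder or SAX callbacks could then consume. Everything below is illustrative only (no attributes, entities, or error handling), not code from any actual proposal:

```d
// Toy pull layer: a lazy input range of tokens over the raw input.
import std.string : indexOf;

enum TokenKind { open, close, text }

struct Token { TokenKind kind; string data; }

struct XmlTokens {
    private string input;
    private Token front_;
    private bool empty_;

    this(string s) { input = s; popFront(); }

    @property bool empty() const { return empty_; }
    @property Token front() const { return front_; }

    void popFront() {
        if (input.length == 0) { empty_ = true; return; }
        if (input[0] == '<') {
            auto end = input.indexOf('>');      // toy: assumes well-formed input
            auto inside = input[1 .. end];
            input = input[end + 1 .. $];
            front_ = (inside.length && inside[0] == '/')
                ? Token(TokenKind.close, inside[1 .. $])
                : Token(TokenKind.open, inside);
        } else {
            auto end = input.indexOf('<');
            if (end == -1) end = cast(ptrdiff_t) input.length;
            front_ = Token(TokenKind.text, input[0 .. end]);
            input = input[end .. $];
        }
    }
}

void main() {
    import std.algorithm.comparison : equal;

    // A DOM layer would consume this token range to build its tree.
    auto toks = XmlTokens("<a>hi</a>");
    assert(equal(toks, [
        Token(TokenKind.open, "a"),
        Token(TokenKind.text, "hi"),
        Token(TokenKind.close, "a"),
    ]));
}
```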
Feb 07 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis 
wrote:
 Also, two of the major requirements for an improved std.xml are 
 that it needs to have a range-based API, and it needs to be 
 fast.

What does range based API mean in this context? I do offer a couple ranges over the tree, but it really isn't the main thing there. Check out Element.tree() for the main one. But, if you mean taking a range for input, no, doesn't do that. I've been thinking about rewriting the parse function (if you look at it, you'll probably hate it too!). But, what I have works and is tested on a variety of input, including garbage that was a pain to get working right, so I'm in no rush to change it.
 Tango's XML parser has pretty much set the bar on speed

Yeah, I'm pretty sure Tango whips me hard on speed. I spent some time in the profiler a month or two ago and got a significant speedup over the datasets I use (html files), but I'm sure there's a whole lot more that could be done. The biggest thing is I don't think you could use my parse function as a stream.
Feb 06 2012
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Monday, 6 February 2012 at 23:15:50 UTC, Alvaro wrote:
 Its "dom" module has that sort of Javascript-like DOM 
 manipulation code

I'm biased (obviously), but I've never seen a more convenient way to work with xml. I like my convenience functions a lot.
 I thus suggest, if licensing allows (?

It is Boost licensed, so you can use it (I put the license at the bottom of my files). I don't know if it is phobos material, but if there's demand, maybe I can make that happen.
 A lot of that can be left out in XML.

Right, though you can ignore it too. I sometimes use it to work with other kinds of XML (web APIs, sending and receiving), RSS, and others. It works well enough for me.
Feb 06 2012
parent reply bls <bizprac orange.fr> writes:
As you know, documentation is a weak point of the web stuff. It looks 
very interesting, so a real-world sample app would be nice.

I would like to see a sample RIA, MVC-wise, using (say) dojo dijit as 
the view layer in conjunction with the D web stuff (model/controller).

I think at the moment your library is not made for the masses, but it 
would nevertheless be interesting to see how someone can glue the 
backend (web stuff) and the frontend (dojo/dijit) together.
Thanks for reading.
Feb 07 2012
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-02-08 02:33, James Miller wrote:
 As somebody that frequently laments the lack of documentation in
 general (I use Silverstripe at work, in which the documentation is
 patchy at best) I work hard on my documentation.

 Adam's stuff is very good, I plan to take a look at it and "borrow"
 some code for a web framework I'm working on. But I am also
 open-sourcing modules that I am using as I go along. It would be cool
 if we ended up with a set of independent modules that worked well
 together but had few dependencies.

 I guess the point is whether Adam is ok with the community extending
 his work separately, since
 "misc-stuff-including-D-programming-language-web-stuff" isn't exactly
 a catchy name :P.

 It would be unfortunate if Adam's work wasn't built upon, and that
 work was needlessly replicated, then tightly integrated into some
 other project, rather than being something that everybody could use.

Maybe Adam's code can be used as a base for implementing a library like Rack in D.

http://rack.rubyforge.org/

--
/Jacob Carlborg
Feb 07 2012
parent reply Jacob Carlborg <doob me.com> writes:
On 2012-02-08 15:51, Adam D. Ruppe wrote:
 On Wednesday, 8 February 2012 at 07:37:23 UTC, Jacob Carlborg wrote:
 Maybe Adam's code can be used as a base of implementing a library like
 Rack in D.

 http://rack.rubyforge.org/

That looks like it does the same job as cgi.d. cgi.d actually offers a uniform interface across various web servers and integration methods. If you always talk through the Cgi class, and use the GenericMain mixin, you can run the same program with:

1) CGI, tested on Apache and IIS (including implementations for methods that don't work natively on one or the other)
2) FastCGI (using the C library)
3) HTTP itself (something I expanded this last weekend and still want to make better)

Sometimes I think I should rename it to reflect this, but meh, misc-stuff-including blah blah shows how good I am at names!
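
As a sketch of what that uniform interface looks like in practice (the handler shape follows cgi.d's GenericMain convention as described here; treat the details as approximate):

```d
// Sketch: one handler, deployable as plain CGI, FastCGI, or an embedded
// HTTP server - the mode is chosen at build time, not in this code.
import arsd.cgi;

void handler(Cgi cgi) {
    // The Cgi class abstracts over server specifics; members like
    // requestUri are normalized instead of raw environment variables.
    cgi.write("Hello from the same code under CGI, FastCGI, or HTTP!");
}

mixin GenericMain!handler;
```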

It seems Rack supports additional interfaces besides CGI. But I think we could take this one step further. I'm not entirely sure what APIs Rack provides, but Rails has a couple of methods to normalize the environment variables. For example, ENV["REQUEST_URI"] returns different values on different servers. Rails provides a method, "request_uri", on the request object that returns the same value on all servers.

I don't know if cgi.d already has support for something similar.

--
/Jacob Carlborg
Feb 09 2012
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-02-09 15:56, Adam D. Ruppe wrote:
 On Thursday, 9 February 2012 at 08:26:25 UTC, Jacob Carlborg wrote:
 For example, ENV["REQUEST_URI"] returns differently on different
 servers. Rails provides a method, "request_uri" on the request object
 that will return the same value on all different servers.

 I don't know if CGI already has support for something similar.

Yeah, in cgi.d, you use Cgi.requestUri, which is an immutable string, instead of using the environment variable directly.

    requestUri = getenv("REQUEST_URI");

    // Because IIS doesn't pass requestUri, we simulate it here if it's empty.
    if(requestUri.length == 0) {
        // IIS sometimes includes the script name as part of the path info -
        // we don't want that
        if(pathInfo.length >= scriptName.length &&
           (pathInfo[0 .. scriptName.length] == scriptName))
            pathInfo = pathInfo[scriptName.length .. $];

        requestUri = scriptName ~ pathInfo ~
            (queryString.length ? ("?" ~ queryString) : "");

        // FIXME: this works for apache and iis... but what about others?
    }

That's in the cgi constructor. Somewhat ugly code, but I figure it's better to have ugly code in the library than incompatibilities in the user program! The http constructor creates these variables from the raw headers.

Here's the ddoc: http://arsdnet.net/web.d/cgi.html

If you search for "requestHeaders", you'll see all the stuff following. If you use those class members instead of direct environment variables, you'll get max compatibility.

Cool, you already thought of all of this it seems. -- /Jacob Carlborg
Feb 09 2012
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/9/12 6:56 AM, Adam D. Ruppe wrote:
 Here's the ddoc:
 http://arsdnet.net/web.d/cgi.html

Cue the choir: "Please submit to Phobos". Andrei
Feb 09 2012
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-02-08 03:29, Adam D. Ruppe wrote:
 Here's more ddocs.

 http://arsdnet.net/web.d/web.html
 http://arsdnet.net/web.d/dom.html


 Not terribly useful, I'll admit. The Javascript
 discussion at the bottom of the first link might be
 good to read though.

 The dom.html there is mostly just (incomplete) method
 listing. I didn't write most of it up at all.

 When I started doing dom.d, I was going to be strictly
 a clone of the browser implementation, so some of the
 comments still say things like "extension", but I went
 my own direction a long time ago.

I think a much better API than the one browsers provide can be created for operating on the DOM, especially in D.

--
/Jacob Carlborg
Feb 07 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-02-08 13:11, Jose Armando Garcia wrote:
 On Wed, Feb 8, 2012 at 5:41 AM, Jacob Carlborg<doob me.com>  wrote:
 On 2012-02-08 03:29, Adam D. Ruppe wrote:
 Here's more ddocs.

 http://arsdnet.net/web.d/web.html
 http://arsdnet.net/web.d/dom.html


 Not terribly useful, I'll admit. The Javascript
 discussion at the bottom of the first link might be
 good to read though.

 The dom.html there is mostly just (incomplete) method
 listing. I didn't write most of it up at all.

 When I started doing dom.d, I was going to be strictly
 a clone of the browser implementation, so some of the
 comments still say things like "extension", but I went
 my own direction a long time ago.

I think a much better API than the one browsers provide can be created for operating on the DOM, especially in D.

I know very little about HTML programming, but Dart did just that. It is my understanding that they abandoned JS's DOM and created their own API: http://api.dartlang.org/html.html

It seems so. I haven't looked over the docs but it's good that someone is at least trying. -- /Jacob Carlborg
Feb 08 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Tuesday, 7 February 2012 at 19:27:46 UTC, bls wrote:
 You know it, web stuff documentation is a weak point.

Yeah, I know.
 looks very interesting ...  so a real world sample app would be 
 nice..

The closest I have is a little blog-like thing that I started and haven't worked on since.

http://arsdnet.net/blog/my-source
 I would like to see a sample RIA -  M- V-C  wise, using (say 
 using dojo dijit as View layer ) in conjunction with the D web 
 stuff . (Model- Controler)

I haven't used one of those toolkits, but plain javascript is easy to do like this. I'm writing a little browser game on the weekends. When I finish it, I might be able to show it to you. (I'll have to replace the copyrighted images in there right now though.)

Basically you just call the server functions in your event handlers. Want to replace a div with content from the other side?

D
===
import arsd.web;

class Server : ApiProvider {
    string hello(string name) { return "hello " ~ name; }
}

mixin FancyMain!Server;
===

HTML/Javascript
===
<script src="app/functions.js"></script>
<div id="holder"></div>
<button onclick="Server.hello(prompt('Name?')).useToReplace('holder');">Click me</button>
===

And when you click that button, the string D returns will be put inside as text. That's the basics of it.

I'm taking this to an extreme with this: http://arsdnet.net:8080/

(that's my custom D web server, so if it doesn't respond, I probably closed the program. Will leave it open for a while though.)

Click on the link and the body. You should get some text added. Take a look at the javascript though: http://arsdnet.net:8080/sse.js

===
document.body.addEventListener("click", function(event) {
    var response = Test.cancelableEvent(
        event.type,
        event.target.getAttribute("data-node-index")
    ).getSync();
    [snip]
===

It uses synchronous ajax on the click handler to forward the event to the server, which then passes it through the server side dom.

D:
===
/* the library implements EventResponse cancelableEvent(...),
   which dispatches the event on the server, so you can write: */

// body is a keyword in D, hence mainBody instead.
document.mainBody.addEventListener("click", (Event ev) {
    ev.target.appendText("got click!");
    ev.preventDefault();
});
===

sync ajax requests suck because they block the whole web app waiting for a response. But writing server side event handlers is a fun toy.

I do write real applications with web.d, but they are almost all proprietary, so little toys are all I really have to show right now.
Feb 07 2012
prev sibling next sibling parent James Miller <james aatch.net> writes:
 On Tuesday, 7 February 2012 at 19:27:46 UTC, bls wrote:
  You know it, web stuff documentation is a weak point.

 Yeah, I know.

 [snip]

 I do write real applications with web.d, but they are almost
 all proprietary, so little toys is all I really have to show
 right now.

As somebody that frequently laments the lack of documentation in general (I use SilverStripe at work, in which the documentation is patchy at best), I work hard on my documentation.

Adam's stuff is very good. I plan to take a look at it and "borrow" some code for a web framework I'm working on. But I am also open-sourcing modules that I am using as I go along. It would be cool if we ended up with a set of independent modules that worked well together but had few dependencies.

I guess the point is whether Adam is ok with the community extending his work separately, since "misc-stuff-including-D-programming-language-web-stuff" isn't exactly a catchy name :P.

It would be unfortunate if Adam's work wasn't built upon, and that work was needlessly replicated, then tightly integrated into some other project, rather than being something that everybody could use.
Feb 07 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
 On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis
 
 wrote:
 Also, two of the major requirements for an improved std.xml are
 that it needs to have a range-based API, and it needs to be
 fast.

What does range based API mean in this context? I do offer a couple of ranges over the tree, but it really isn't the main thing there. Check out Element.tree() for the main one.

But, if you mean taking a range for input, no, it doesn't do that. I've been thinking about rewriting the parse function (if you look at it, you'll probably hate it too!). But what I have works and is tested on a variety of input, including garbage that was a pain to get working right, so I'm in no rush to change it.
 Tango's XML parser has pretty much set the bar on speed

Yeah, I'm pretty sure Tango whips me hard on speed. I spent some time in the profiler a month or two ago and got a significant speedup on the datasets I use (HTML files), but I'm sure there's a whole lot more that could be done. The biggest thing is that I don't think you could use my parse function as a stream.

Ideally, std.xml would operate on ranges of dchar (but obviously be optimized for strings, since there are lots of optimizations that can be done with string processing - at least as far as Unicode goes), and it would return a range of some kind. The result would probably be a document type of some kind which provided a range of its top-level nodes (or maybe just the root node), each of which then provided ranges over its sub-nodes, etc. At least, that's the kind of thing that I would expect.

Other calls on the document and nodes may not be range-based at all (e.g. XPath should probably be supported, and that doesn't necessarily involve ranges). The best way to handle it all would probably depend on the implementation. I haven't implemented a full-blown XML parser, so I don't know what the best way to go about it would be, but ideally, you'd be able to process the nodes as a range.

- Jonathan M Davis
Feb 07 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 8 February 2012 at 01:33:51 UTC, James Miller wrote:
 As somebody that frequently laments the lack of documentation

I'm a bit of a hypocrite here; I'll complain until the cows come home about /other/ people's crappy documentation... then turn around and do a bad job at it myself.

I tried to do cgi.d:

overview: http://arsdnet.net/web.d/cgi.d.html
ddoc: http://arsdnet.net/web.d/cgi.html

But I still have a lot to do on web.d documentation. The best source there is probably random newsgroup posts (better than the comments in the code!), and I've changed things more than once too.
 Adam's stuff is very good

Thanks!
 It would be cool if we ended up with a set of independent
 modules that worked well together but had few dependencies.

Yeah, I think that would be great. If you want to modify my files, you can always fork it, but the best way is probably to do a pull request on GitHub. I try to reply to those quickly, and we can try to keep one copy there to avoid duplication of effort.

Then, add-on modules outside the scope of the code can be done by anyone. I'd put links to stuff in the documentation, so it is easy to find extended functionality.
Feb 07 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
Here's more ddocs.

http://arsdnet.net/web.d/web.html
http://arsdnet.net/web.d/dom.html


Not terribly useful, I'll admit. The Javascript
discussion at the bottom of the first link might be
good to read though.

The dom.html there is mostly just (incomplete) method
listing. I didn't write most of it up at all.

When I started doing dom.d, I was going to be strictly
a clone of the browser implementation, so some of the
comments still say things like "extension", but I went
my own direction a long time ago.

I don't know... I think most of the dom is self-explanatory
if you're already experienced with JavaScript anyway, though.


BTW, here's something I wrote up on another site as an
example of how awesome D is:




examples:
[code="D ROX"]

// don't need a document to create elements
auto div = Element.make("div", "Hello <there>"); // convenience params do innerText
assert(div.innerText == "Hello <there>"); // text set
assert(div.innerHTML == "Hello &lt;there&gt;"); // html properly encoded mindlessly

// getting and setting attributes works w/ property syntax
// of course they are always properly encoded for charset and html stuffs
div.id = "whoa";
assert(div.id == "whoa");
div.customAttribute = div.id ~ "works";
assert(div.customAttribute == "whoaworks");

// i also support the dataset sugar over attributes

div.dataset.myData = "cool";
assert(div.getAttribute("data-my-data") == "cool");
assert(div.dataset.myData == "cool");

// as well as with the style..

div.style = "top: 10px;"; // works with strings just like any other attribute

div.style.top = "15px"; // or with property syntax like in javascript

assert(div.style.top == "15px"); // reading works no matter how you set it
assert(div.style == "top: 15px;"); // or you can read it as a string


// i have convenience methods too

div.innerText = "sweet"; // worx, w/ proper encoding

// calls Element.make and appendChild in one go.
// can easily set text and/or a tag specific attribute
auto a = div.addChild("a", "Click me!", "link.html"); // tagName, value, value2, dependent on tag

a.addClass("cool").addClass("beans"); // convenience methods (chainable btw) for the class attribute
assert(a.hasClass("cool") && a.hasClass("beans") && !a.hasClass("coolbeans"));
assert(a.className == "cool beans");

// subclasses rock too, especially with automatic checked casts
Link link = div.requireSelector!Link("a"); // css selector syntax supported
// alternatively:
link = cast(Link) a; // but if a isn't actually a Link, this can be null

// easy working with the link url
link.setValue("cool", "param");
assert(link.href == "link.html?cool=param");

// arsd.html also includes functions to convert links into POST forms btw

// the Form subclass rox too

auto form = cast(Form) Element.make("form");
form.method = "POST";
// convenience functions from the browsers are here but who needs them when the dom rox?
form.innerHTML = "<select name=\"cool\"><option>Whoa</option><option>WTF</option></select>";

// similarly to the Link class, you can set values with ease
form.setValue("cool", "WTF"); // even works on non-text form elements
form.setValue("sweet", "whoa"); // can also implicitly create a hidden element to carry a value (can lead to mistakes but so useful)


// and the Table class is pretty sweet

auto table = cast(Table) Element.make("table");
table.caption = "Cool"; // sets the caption element, not an attribute

// variadic templates rok
table.appendRow("sweet", "whoa", 10, Element.make("a")); // adds to the <tbody> a <tr> with appropriate children
[/code]


some people luv jquery and think it is the best thing ever

those people have never used my dom library


Speaking of jquery, what about collections of elements?

Well D has this thing called foreach which does that. But, just
to prove I could, I wrote a couple dozen lines of D to do this:

==
document["p a"].addClass("mutated").appendText("all links in 
paragraphs get this text and that class!");
==



Operator overloading template craziness!

But i'm pretty meh on that since foreach is better anyway.



foreach rox

d rox
Feb 07 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Tue, 07 Feb 2012 20:44:08 -0500
schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:

 On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
 On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis
 
 wrote:
 Also, two of the major requirements for an improved std.xml are
 that it needs to have a range-based API, and it needs to be
 fast.

What does range based API mean in this context? I do offer a couple of ranges over the tree, but it really isn't the main thing there. Check out Element.tree() for the main one.

But, if you mean taking a range for input, no, it doesn't do that. I've been thinking about rewriting the parse function (if you look at it, you'll probably hate it too!). But what I have works and is tested on a variety of input, including garbage that was a pain to get working right, so I'm in no rush to change it.
 Tango's XML parser has pretty much set the bar on speed

Yeah, I'm pretty sure Tango whips me hard on speed. I spent some time in the profiler a month or two ago and got a significant speedup on the datasets I use (HTML files), but I'm sure there's a whole lot more that could be done. The biggest thing is that I don't think you could use my parse function as a stream.

Ideally, std.xml would operate on ranges of dchar (but obviously be optimized for strings, since there are lots of optimizations that can be done with string processing - at least as far as Unicode goes), and it would return a range of some kind. The result would probably be a document type of some kind which provided a range of its top-level nodes (or maybe just the root node), each of which then provided ranges over its sub-nodes, etc. At least, that's the kind of thing that I would expect.

Other calls on the document and nodes may not be range-based at all (e.g. XPath should probably be supported, and that doesn't necessarily involve ranges). The best way to handle it all would probably depend on the implementation. I haven't implemented a full-blown XML parser, so I don't know what the best way to go about it would be, but ideally, you'd be able to process the nodes as a range.

- Jonathan M Davis

Using ranges of dchar directly can be horribly inefficient in some cases; you'll need at least some kind of buffered dchar range. Some std.json replacement code tried to use only dchar ranges and had to reassemble strings character by character using Appender. That sucks especially if you're only interested in a small part of the data and don't care about the rest.

So for pull/SAX parsers: use buffering, and return strings (better: w/d/char[]) as slices into that buffer. If the user needs to keep a string, he can still copy it. (String decoding should also be done on-demand only.)
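
A tiny illustration of that slicing idea (a hypothetical helper, not code from any actual parser): the returned name is a view into the caller's buffer, so nothing is copied or decoded until the user asks for it.

```d
// Toy example: return a slice into the buffer rather than building a
// new string with Appender. The caller copies (.idup) only if needed.
const(char)[] readName(const(char)[] buf, ref size_t pos) {
    auto start = pos;
    while (pos < buf.length && buf[pos] != ' ' && buf[pos] != '>')
        pos++;
    return buf[start .. pos]; // zero-copy view into buf
}

void main() {
    string buf = "<root attr=\"1\">";
    size_t pos = 1;                  // just past '<'
    auto name = readName(buf, pos);
    assert(name == "root");
    assert(name.ptr is buf.ptr + 1); // same memory, no allocation
}
```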
Feb 08 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, February 08, 2012 09:12:57 Johannes Pfau wrote:
 Using ranges of dchar directly can be horribly inefficient in some
 cases, you'll need at least some kind off buffered dchar range. Some
 std.json replacement code tried to use only dchar ranges and had to
 reassemble strings character by character using Appender. That sucks
 especially if you're only interested in a small part of the data and
 don't care about the rest.
 So for pull/sax parsers: Use buffering, return strings(better:
 w/d/char[]) as slices to that buffer. If the user needs to keep a
 string, he can still copy it. (String decoding should also be done
 on-demand only).

That's why you accept ranges of dchar but specialize the code for strings. Then you can use any dchar range with it that you want but can get the extra efficiency of using strings if you want to do that. - Jonathan M Davis
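The specialize-for-strings approach can be sketched with a `static if` on the input type; `skipWs` here is a hypothetical stand-in for real parser internals:

```d
import std.range.primitives : empty, front, popFront;
import std.traits : isSomeString;

// Sketch of "accept any range of dchar, specialize for strings".
R skipWs(R)(R input)
{
    static if (isSomeString!R)
    {
        // string fast path: index by code unit, no UTF decoding needed
        size_t i = 0;
        while (i < input.length && (input[i] == ' ' || input[i] == '\t'))
            ++i;
        return input[i .. $];
    }
    else
    {
        // generic path: works with any input range of dchar
        while (!input.empty && (input.front == ' ' || input.front == '\t'))
            input.popFront();
        return input;
    }
}

unittest
{
    assert(skipWs("  <root/>") == "<root/>");
    assert(skipWs("\t x") == "x");
}
```

The caller sees one function; strings silently take the slicing branch while every other dchar range takes the generic one.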
Feb 08 2012
prev sibling next sibling parent Jose Armando Garcia <jsancio gmail.com> writes:
On Wed, Feb 8, 2012 at 5:41 AM, Jacob Carlborg <doob me.com> wrote:
 On 2012-02-08 03:29, Adam D. Ruppe wrote:
 Here's more ddocs.

 http://arsdnet.net/web.d/web.html
 http://arsdnet.net/web.d/dom.html


 Not terribly useful, I'll admit. The Javascript
 discussion at the bottom of the first link might be
 good to read though.

 The dom.html there is mostly just (incomplete) method
 listing. I didn't write most of it up at all.

 When I started doing dom.d, I was going to be strictly
 a clone of the browser implementation, so some of the
 comments still say things like "extension", but I went
 my own direction a long time ago.

I think a much better API, than the one browsers provide, can be created for operating on the DOM, especially in D.

I know very little about html programming but dart did just that. It is my understanding that they abandoned JS's DOM and created their own API: http://api.dartlang.org/html.html
 --
 /Jacob Carlborg

Feb 08 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 8 February 2012 at 07:37:23 UTC, Jacob Carlborg 
wrote:
 Maybe Adam's code can be used as a base of implementing a 
 library like Rack in D.

 http://rack.rubyforge.org/

That looks like it does the same job as cgi.d. cgi.d actually offers a uniform interface across various web servers and integration methods. If you always talk through the Cgi class, and use the GenericMain mixin, you can run the same program with:

1) cgi, tested on Apache and IIS (including implementations for methods that don't work on one or the other natively)
2) fast cgi (using the C library)
3) HTTP itself (something I expanded this last weekend and still want to make better)

Sometimes I think I should rename it, to reflect this, but meh, misc-stuff-including blah blah shows how good I am at names!
Feb 08 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 8 February 2012 at 07:41:54 UTC, Jacob Carlborg 
wrote:
 I think a much better API, than the one browsers provide, can 
 be created for operating on the DOM, especially in D.

I'd say I've proven that! dom.d is very, very nice IMO.
Feb 08 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 8 February 2012 at 12:11:40 UTC, Jose Armando 
Garcia wrote:
 is my understanding that they abandon JS's DOM and created 
 their own API: http://api.dartlang.org/html.html

That actually looks very similar to what the browsers do now, which is a good thing to me - you don't have to learn new stuff to sit down and get started. But, looking through the Element interface: http://api.dartlang.org/html/Element.html you can see it is very familiar to the regular browser API. It offers some of the IE extensions (which rock btw; I implemented them too. outerHTML, innerText, etc.) It doesn't actually go as far as I do in expanding the api though.
Feb 08 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 8 February 2012 at 08:12:57 UTC, Johannes Pfau 
wrote:
 Use buffering, return strings(better:
 w/d/char[]) as slices to that buffer. If the user needs to keep 
 a string, he can still copy it. (String decoding should also be 
 done on-demand only).

The way Document.parse works now in my code is with slices. I think the best way to speed mine up is to untangle the mess of recursive nested functions.

Last time I attacked dom.d with the profiler, I found a lot of time was spent on string decoding, which looked like this:

foreach(c; str) {
    if(isEntity)
        value ~= decoded(value);
    else
        value ~= c;
}

basically. This reallocation was slow... but I got a huge speedup, not by skipping decoding, but by scanning it first:

bool decode = false;
foreach(c; str) {
    if(c == '&') {
        decode = true;
        break;
    }
}

if(!decode)
    return str;

// still uses the old decoder, which is the fastest I could find;
// ~= actually did better than appender in my tests!

But, quickly scanning the string and skipping the decode loop if there are no entities about IIRC tripled the parse speed. Right now, if I comment the decode call out entirely, there's very little difference in speed on the data I've tried, so I think decoding like this works well.
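A self-contained version of that scan-first trick might look like the sketch below; it decodes only &amp;amp; to keep things short (the real decoder handles the full entity table), and the function name is made up for illustration:

```d
import std.string : indexOf;

// Scan-first entity decoding: if the text contains no '&', return the
// input slice unchanged and skip the allocating decode loop entirely.
string decodeHtmlText(string str)
{
    // fast path: no '&' means no entities, nothing to allocate
    if (str.indexOf('&') == -1)
        return str;

    // slow path: rebuild the string, expanding entities
    string value;
    for (size_t i = 0; i < str.length; ++i)
    {
        if (str[i] == '&' && str.length - i >= 5 && str[i .. i + 5] == "&amp;")
        {
            value ~= '&';
            i += 4; // skip past "amp;"
        }
        else
            value ~= str[i];
    }
    return value;
}

unittest
{
    assert(decodeHtmlText("no entities here") == "no entities here");
    assert(decodeHtmlText("a &amp; b") == "a & b");
}
```

Since most text nodes in typical documents contain no entities, the fast path dominates, which matches the reported speedup.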
Feb 08 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Wed, 08 Feb 2012 00:29:55 -0800
schrieb Jonathan M Davis <jmdavisProg gmx.com>:

 On Wednesday, February 08, 2012 09:12:57 Johannes Pfau wrote:
 Using ranges of dchar directly can be horribly inefficient in some
 cases, you'll need at least some kind off buffered dchar range. Some
 std.json replacement code tried to use only dchar ranges and had to
 reassemble strings character by character using Appender. That sucks
 especially if you're only interested in a small part of the data and
 don't care about the rest.
 So for pull/sax parsers: Use buffering, return strings(better:
 w/d/char[]) as slices to that buffer. If the user needs to keep a
 string, he can still copy it. (String decoding should also be done
 on-demand only).

That's why you accept ranges of dchar but specialize the code for strings. Then you can use any dchar range with it that you want but can get the extra efficiency of using strings if you want to do that. - Jonathan M Davis

But specializing for strings is not enough: you could stream XML over the network and want to parse it on the fly (think of XMPP/Jabber), or you could read huge XML files which you do not want to load completely into RAM. Data is read into buffers anyway, so the parser should be able to deal with that. (Although a buffer of w/d/chars could be considered to be a string, the parser would then need to handle incomplete input.)
Feb 08 2012
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 08 Feb 2012 02:12:57 -0600, Johannes Pfau <nospam example.com> wrote:
 Am Tue, 07 Feb 2012 20:44:08 -0500
 schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:
 On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
 On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis



 Using ranges of dchar directly can be horribly inefficient in some
 cases, you'll need at least some kind off buffered dchar range. Some
 std.json replacement code tried to use only dchar ranges and had to
 reassemble strings character by character using Appender. That sucks
 especially if you're only interested in a small part of the data and
 don't care about the rest.
 So for pull/sax parsers: Use buffering, return strings(better:
 w/d/char[]) as slices to that buffer. If the user needs to keep a
 string, he can still copy it. (String decoding should also be done
 on-demand only).

Speaking as the one proposing said Json replacement, I'd like to point out that JSON strings != UTF strings: manual conversion is required some of the time. And I use appender as a dynamic buffer in exactly the manner you suggest. There's even an option to use a string cache to minimize total memory usage. (Hmm... that functionality should probably be re-factored out and made into its own utility) That said, I do end up doing a bunch of useless encodes and decodes, so I'm going to special case those away and add slicing support for strings. wstrings and dstring will still need to be converted as currently Json values only accept strings and therefore also Json tokens only support strings. As a potential user of the sax/pull interface would you prefer the extra clutter of special side channels for zero-copy wstrings and dstrings?
Feb 08 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Wed, 08 Feb 2012 20:49:48 -0600
schrieb "Robert Jacques" <sandford jhu.edu>:

 On Wed, 08 Feb 2012 02:12:57 -0600, Johannes Pfau
 <nospam example.com> wrote:
 Am Tue, 07 Feb 2012 20:44:08 -0500
 schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:
 On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
 On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis



 Using ranges of dchar directly can be horribly inefficient in some
 cases, you'll need at least some kind off buffered dchar range. Some
 std.json replacement code tried to use only dchar ranges and had to
 reassemble strings character by character using Appender. That sucks
 especially if you're only interested in a small part of the data and
 don't care about the rest.
 So for pull/sax parsers: Use buffering, return strings(better:
 w/d/char[]) as slices to that buffer. If the user needs to keep a
 string, he can still copy it. (String decoding should also be done
 on-demand only).

Speaking as the one proposing said Json replacement, I'd like to point out that JSON strings != UTF strings: manual conversion is required some of the time. And I use appender as a dynamic buffer in exactly the manner you suggest. There's even an option to use a string cache to minimize total memory usage. (Hmm... that functionality should probably be re-factored out and made into its own utility) That said, I do end up doing a bunch of useless encodes and decodes, so I'm going to special case those away and add slicing support for strings. wstrings and dstring will still need to be converted as currently Json values only accept strings and therefore also Json tokens only support strings. As a potential user of the sax/pull interface would you prefer the extra clutter of special side channels for zero-copy wstrings and dstrings?

Regarding wstrings and dstrings: well, JSON seems to be UTF-8 in almost all cases, so it's not that important. But I think it should be possible to use templates to implement identical parsers for d/w/strings.

Regarding the use of Appender: long text ahead ;-)

I think pull parsers should really be as fast as possible and low-level. For easy to use highlevel stuff there's always DOM, and a safe, high-level serialization API should be implemented based on the PullParser as well. The serialization API would read only the requested data, skipping the rest:

----------------
struct Data
{
    string link;
}
auto data = unserialize!Data(json);
----------------

So in the PullParser we should avoid memory allocation whenever possible, I think we can even avoid it completely:

I think dchar ranges are just the wrong input type for parsers; parsers should use buffered ranges or streams (which would be basically the same). We could use a generic BufferedRange with real dchar-ranges then. This BufferedRange could use a static buffer, so there's no need to allocate anything.

The pull parser should return slices to the original string (if the input is a string) or slices to the Range/Stream's buffer. Of course, such a slice is only valid till the pull parser is called again. The slice also wouldn't be decoded yet. And a sliced string could only be as long as the buffer, but I don't think this is an issue; a 512KB buffer can already store 524288 characters.

If the user wants to keep a string, he should really do decodeJSONString(data).idup. There's a little more opportunity for optimization: as long as a decoded JSON string is always smaller than the encoded one (I don't know if it is), we could have a decodeJSONString function which overwrites the original buffer --> no memory allocation. If that's not the case, decodeJSONString has to allocate iff the decoded string is different.

So we need a function which always returns the decoded string as a safe-to-keep copy, and a function which returns the decoded string as a slice if the decoded string is the same as the original.

An example:

string json =
{
    "link":"http://www.google.com",
    "useless_data":"lorem ipsum",
    "more":{
        "not interested":"yes"
    }
}

now I'm only interested in the link. It should be possible to parse that with zero memory allocations:

auto parser = Parser(json);
parser.popFront();
while(!parser.empty)
{
    if(parser.front.type == KEY
        && tempDecodeJSON(parser.front.value) == "link")
    {
        parser.popFront();
        assert(!parser.empty && parser.front.type == VALUE);
        return decodeJSON(parser.front.value); //Should return a slice
    }
    //Skip everything else;
    parser.popFront();
}

tempDecodeJSON returns a decoded string, which (usually) isn't safe to store (it can/should be a slice to the internal buffer; here it's a slice to the original string, so it could be stored, but there's no guarantee). In this case, the call to tempDecodeJSON could even be left out, as we only search for "link" which doesn't need encoding.
Feb 09 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Wed, 08 Feb 2012 20:49:48 -0600
schrieb "Robert Jacques" <sandford jhu.edu>:
 
 Speaking as the one proposing said Json replacement, I'd like to
 point out that JSON strings != UTF strings: manual conversion is
 required some of the time. And I use appender as a dynamic buffer in
 exactly the manner you suggest. There's even an option to use a
 string cache to minimize total memory usage. (Hmm... that
 functionality should probably be re-factored out and made into its
 own utility) That said, I do end up doing a bunch of useless encodes
 and decodes, so I'm going to special case those away and add slicing
 support for strings. wstrings and dstring will still need to be
 converted as currently Json values only accept strings and therefore
 also Json tokens only support strings. As a potential user of the
 sax/pull interface would you prefer the extra clutter of special side
 channels for zero-copy wstrings and dstrings?

BTW: Do you know DYAML? https://github.com/kiith-sa/D-YAML I think it has a pretty nice DOM implementation which doesn't require any changes to phobos. As YAML is a superset of JSON, adapting it for std.json shouldn't be too hard. The code is boost licensed and well documented. I think std.json would have better chances of being merged into phobos if it didn't rely on changes to std.variant.
Feb 09 2012
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Thu, 09 Feb 2012 05:13:52 -0600, Johannes Pfau <nospam example.com> wrote:
 Am Wed, 08 Feb 2012 20:49:48 -0600
 schrieb "Robert Jacques" <sandford jhu.edu>:
 Speaking as the one proposing said Json replacement, I'd like to
 point out that JSON strings != UTF strings: manual conversion is
 required some of the time. And I use appender as a dynamic buffer in
 exactly the manner you suggest. There's even an option to use a
 string cache to minimize total memory usage. (Hmm... that
 functionality should probably be re-factored out and made into its
 own utility) That said, I do end up doing a bunch of useless encodes
 and decodes, so I'm going to special case those away and add slicing
 support for strings. wstrings and dstring will still need to be
 converted as currently Json values only accept strings and therefore
 also Json tokens only support strings. As a potential user of the
 sax/pull interface would you prefer the extra clutter of special side
 channels for zero-copy wstrings and dstrings?

BTW: Do you know DYAML? https://github.com/kiith-sa/D-YAML I think it has a pretty nice DOM implementation which doesn't require any changes to phobos. As YAML is a superset of JSON, adapting it for std.json shouldn't be too hard. The code is boost licensed and well documented. I think std.json would have better chances of being merged into phobos if it didn't rely on changes to std.variant.

I know about D-YAML, but haven't taken a deep look at it; it was developed long after I wrote my own JSON library. I did look into YAML before deciding to use JSON for my application; I just didn't need the extra features and implementing them would've taken extra dev time. As for reliance on changes to std.variant, this was a change *suggested* by Andrei. And while it is the slower route to go, I believe it is the correct software engineering choice; prior to the change I was implementing my own typed union (i.e. I poorly reinvented std.variant) Actually, most of my initial work on Variant was to make its API just as good as my home-rolled JSON type. Furthermore, a quick check of the YAML code-base seems to indicate that underneath the hood, Variant is being used. I'm actually a little curious about what prevented YAML from being expressed using std.variant directly and if those limitations can be removed. * The other thing slowing both std.variant and std.json down is my thesis writing :)
Feb 09 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 9 February 2012 at 08:26:25 UTC, Jacob Carlborg 
wrote:
 For example, ENV["REQUEST_URI"] returns differently on 
 different servers. Rails provides a method, "request_uri" on 
 the request object that will return the same value on all 
 different servers.

 I don't know if CGI already has support for something similar.

Yeah, in cgi.d, you use Cgi.requestUri, which is an immutable string, instead of using the environment variable directly.

requestUri = getenv("REQUEST_URI");

// Because IIS doesn't pass requestUri, we simulate it here if it's empty.
if(requestUri.length == 0) {
    // IIS sometimes includes the script name as part of the path info - we don't want that
    if(pathInfo.length >= scriptName.length && (pathInfo[0 .. scriptName.length] == scriptName))
        pathInfo = pathInfo[scriptName.length .. $];

    requestUri = scriptName ~ pathInfo ~ (queryString.length ? ("?" ~ queryString) : "");
    // FIXME: this works for apache and iis... but what about others?
}

That's in the cgi constructor. Somewhat ugly code, but I figure better to have ugly code in the library than incompatibilities in the user program! The http constructor creates these variables from the raw headers.

Here's the ddoc: http://arsdnet.net/web.d/cgi.html

If you search for "requestHeaders", you'll see all the stuff following. If you use those class members instead of direct environment variables, you'll get max compatibility.
Feb 09 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
For XML, template the parser on char type so transcoding is unnecessary. Since JSON is UTF-8 I'd use char there, and at least for the event parser don't proactively decode strings--let the user do this. In fact, don't proactively decode anything. Give me the option of getting a number via its string representation directly from the input buffer. Roughly, JSON events should be:

Enter object
Object key
Int value (as string)
Float value (as string)
Null
True
False
Etc.
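That event set could be sketched as a D enum plus a token carrying an undecoded slice; all names here are illustrative, not from any actual library:

```d
// Hypothetical event set for a non-decoding JSON pull parser: values
// arrive as raw slices of the input buffer, decoded only on request.
enum JsonEvent
{
    objectStart,
    objectEnd,
    arrayStart,
    arrayEnd,
    key,
    intValue,    // as string, e.g. "42"
    floatValue,  // as string, e.g. "3.14"
    stringValue, // still escaped, undecoded
    nullValue,
    trueValue,
    falseValue,
}

struct JsonToken
{
    JsonEvent event;
    const(char)[] text; // raw slice; convert/decode only if needed
}

unittest
{
    import std.conv : to;

    // the number 42 arrives as its string representation;
    // converting to an int is the caller's choice
    auto tok = JsonToken(JsonEvent.intValue, "42");
    assert(tok.text.to!int == 42);
}
```

A caller that only wants one field can skip every other token without ever paying for decoding or conversion.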

On Feb 8, 2012, at 6:49 PM, "Robert Jacques" <sandford jhu.edu> wrote:

On Feb 8, 2012, at 6:49 PM, "Robert Jacques" <sandford jhu.edu> wrote:

 On Wed, 08 Feb 2012 02:12:57 -0600, Johannes Pfau <nospam example.com> wrote:

 Am Tue, 07 Feb 2012 20:44:08 -0500
 schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:
 On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
 On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis



 Using ranges of dchar directly can be horribly inefficient in some
 cases, you'll need at least some kind off buffered dchar range. Some
 std.json replacement code tried to use only dchar ranges and had to
 reassemble strings character by character using Appender. That sucks
 especially if you're only interested in a small part of the data and
 don't care about the rest.
 So for pull/sax parsers: Use buffering, return strings(better:
 w/d/char[]) as slices to that buffer. If the user needs to keep a
 string, he can still copy it. (String decoding should also be done
 on-demand only).

 Speaking as the one proposing said Json replacement, I'd like to point out that JSON strings != UTF strings: manual conversion is required some of the time. And I use appender as a dynamic buffer in exactly the manner you suggest. There's even an option to use a string cache to minimize total memory usage. (Hmm... that functionality should probably be re-factored out and made into its own utility) That said, I do end up doing a bunch of useless encodes and decodes, so I'm going to special case those away and add slicing support for strings. wstrings and dstring will still need to be converted as currently Json values only accept strings and therefore also Json tokens only support strings. As a potential user of the sax/pull interface would you prefer the extra clutter of special side channels for zero-copy wstrings and dstrings?
Feb 09 2012
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
This. And decoded JSON strings are always smaller than encoded strings--JSON uses escaping to encode non UTF-8 stuff, so in the case where someone sends a surrogate pair (legal in JSON) it's encoded as \u0000\u0000. In short, it's absolutely possible to create a pull parser that never allocates, even for decoding. As proof, I've done it before. :-p
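The claim holds because every JSON escape form is at least as long as the UTF-8 bytes it decodes to, which is what makes an in-place, zero-allocation decode safe. A quick spot-check in D:

```d
// Spot-check: each escape's source length >= its decoded UTF-8 length,
// so in-place decoding can never overrun the input buffer.
unittest
{
    assert("\\n".length >= "\n".length);                    // 2 >= 1
    assert("\\u00e9".length >= "\u00e9".length);            // 6 >= 2 in UTF-8
    assert("\\ud83d\\ude00".length >= "\U0001F600".length); // 12 >= 4 in UTF-8
}
```

The surrogate-pair case is the tightest squeeze for short escapes, and even there the encoded form is three times longer than the decoded bytes.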

On Feb 9, 2012, at 3:07 AM, Johannes Pfau <nospam example.com> wrote:

 Am Wed, 08 Feb 2012 20:49:48 -0600
 schrieb "Robert Jacques" <sandford jhu.edu>:
 On Wed, 08 Feb 2012 02:12:57 -0600, Johannes Pfau
 <nospam example.com> wrote:
 Am Tue, 07 Feb 2012 20:44:08 -0500
 schrieb "Jonathan M Davis" <jmdavisProg gmx.com>:
 On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
 On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis



 Using ranges of dchar directly can be horribly inefficient in some
 cases, you'll need at least some kind off buffered dchar range. Some
 std.json replacement code tried to use only dchar ranges and had to
 reassemble strings character by character using Appender. That sucks
 especially if you're only interested in a small part of the data and
 don't care about the rest.
 So for pull/sax parsers: Use buffering, return strings(better:
 w/d/char[]) as slices to that buffer. If the user needs to keep a
 string, he can still copy it. (String decoding should also be done
 on-demand only).

Speaking as the one proposing said Json replacement, I'd like to point out that JSON strings !=3D UTF strings: manual conversion is required some of the time. And I use appender as a dynamic buffer in exactly the manner you suggest. There's even an option to use a string cache to minimize total memory usage. (Hmm... that functionality should probably be re-factored out and made into its own utility) That said, I do end up doing a bunch of useless encodes and decodes, so I'm going to special case those away and add slicing support for strings. wstrings and dstring will still need to be converted as currently Json values only accept strings and therefore also Json tokens only support strings. As a potential user of the sax/pull interface would you prefer the extra clutter of special side channels for zero-copy wstrings and dstrings?

 Regarding wstrings and dstrings: well, JSON seems to be UTF-8 in almost
 all cases, so it's not that important. But I think it should be possible
 to use templates to implement identical parsers for d/w/strings.

 Regarding the use of Appender: long text ahead ;-)

 I think pull parsers should really be as fast as possible and
 low-level. For easy to use highlevel stuff there's always DOM, and a
 safe, high-level serialization API should be implemented based on the
 PullParser as well. The serialization API would read only the requested
 data, skipping the rest:
 ----------------
 struct Data
 {
     string link;
 }
 auto data = unserialize!Data(json);
 ----------------

 So in the PullParser we should avoid memory allocation whenever
 possible, I think we can even avoid it completely:

 I think dchar ranges are just the wrong input type for parsers; parsers
 should use buffered ranges or streams (which would be basically the
 same). We could use a generic BufferedRange with real dchar-ranges
 then. This BufferedRange could use a static buffer, so there's no need
 to allocate anything.

 The pull parser should return slices to the original string (if the
 input is a string) or slices to the Range/Stream's buffer. Of course,
 such a slice is only valid till the pull parser is called again. The
 slice also wouldn't be decoded yet. And a sliced string could only be
 as long as the buffer, but I don't think this is an issue; a 512KB
 buffer can already store 524288 characters.

 If the user wants to keep a string, he should really do
 decodeJSONString(data).idup. There's a little more opportunity for
 optimization: as long as a decoded JSON string is always smaller than
 the encoded one (I don't know if it is), we could have a
 decodeJSONString function which overwrites the original buffer --> no
 memory allocation. If that's not the case, decodeJSONString has to
 allocate iff the decoded string is different.

 So we need a function which always returns the decoded string as a
 safe-to-keep copy, and a function which returns the decoded string as a
 slice if the decoded string is the same as the original.

 An example:

 string json =
 {
     "link":"http://www.google.com",
     "useless_data":"lorem ipsum",
     "more":{
         "not interested":"yes"
     }
 }

 now I'm only interested in the link. It should be possible to parse
 that with zero memory allocations:

 auto parser = Parser(json);
 parser.popFront();
 while(!parser.empty)
 {
     if(parser.front.type == KEY
         && tempDecodeJSON(parser.front.value) == "link")
     {
         parser.popFront();
         assert(!parser.empty && parser.front.type == VALUE);
         return decodeJSON(parser.front.value); //Should return a slice
     }
     //Skip everything else;
     parser.popFront();
 }

 tempDecodeJSON returns a decoded string, which (usually) isn't safe to
 store (it can/should be a slice to the internal buffer; here it's a
 slice to the original string, so it could be stored, but there's no
 guarantee). In this case, the call to tempDecodeJSON could even be left
 out, as we only search for "link" which doesn't need encoding.

Feb 09 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Thu, 09 Feb 2012 08:18:15 -0600
schrieb "Robert Jacques" <sandford jhu.edu>:

 On Thu, 09 Feb 2012 05:13:52 -0600, Johannes Pfau
 <nospam example.com> wrote:
 Am Wed, 08 Feb 2012 20:49:48 -0600
 schrieb "Robert Jacques" <sandford jhu.edu>:
 Speaking as the one proposing said Json replacement, I'd like to
 point out that JSON strings != UTF strings: manual conversion is
 required some of the time. And I use appender as a dynamic buffer
 in exactly the manner you suggest. There's even an option to use a
 string cache to minimize total memory usage. (Hmm... that
 functionality should probably be re-factored out and made into its
 own utility) That said, I do end up doing a bunch of useless
 encodes and decodes, so I'm going to special case those away and
 add slicing support for strings. wstrings and dstring will still
 need to be converted as currently Json values only accept strings
 and therefore also Json tokens only support strings. As a
 potential user of the sax/pull interface would you prefer the
 extra clutter of special side channels for zero-copy wstrings and
 dstrings?

BTW: Do you know DYAML? https://github.com/kiith-sa/D-YAML I think it has a pretty nice DOM implementation which doesn't require any changes to phobos. As YAML is a superset of JSON, adapting it for std.json shouldn't be too hard. The code is boost licensed and well documented. I think std.json would have better chances of being merged into phobos if it didn't rely on changes to std.variant.

I know about D-YAML, but haven't taken a deep look at it; it was developed long after I wrote my own JSON library.

I know, I didn't mean to criticize. I just thought DYAML could give some useful inspiration for the DOM api.
 I did look into
 YAML before deciding to use JSON for my application; I just didn't
 need the extra features and implementing them would've taken extra
 dev time.

Sure, I was only referring to DYAML cause the DOM is very similar. Just remove some features and it would suit JSON very well. One problem is that DYAML uses some older YAML version which isn't 100% compatible with JSON, so it can't be used as a JSON parser. There's also no way to tell it to generate only JSON compatible output (and AFAIK that's a design decision and not simply a missing feature)
 
 As for reliance on changes to std.variant, this was a change
 *suggested* by Andrei.

didn't like some of those changes.
 And while it is the slower route to go, I
 believe it is the correct software engineering choice; prior to the
 change I was implementing my own typed union (i.e. I poorly
 reinvented std.variant) Actually, most of my initial work on Variant
 was to make its API just as good as my home-rolled JSON type.
 Furthermore, a quick check of the YAML code-base seems to indicate
 that underneath the hood, Variant is being used. I'm actually a
 little curious about what prevented YAML from being expressed using
 std.variant directly and if those limitations can be removed.

I guess the custom Node type was only added to support additional methods(isScalar, isSequence, isMapping, add, remove, removeAt) and I'm not sure if those are supported on Variant (length, foreach, opIndex, opIndexAssign), but IIRC those are supported in your new std.variant.
 
 * The other thing slowing both std.variant and std.json down is my
 thesis writing :)

Feb 09 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 9 February 2012 at 17:36:01 UTC, Andrei Alexandrescu 
wrote:
 Cue the choir: "Please submit to Phobos".

Perhaps when I finish the URL struct in there. (It takes a url and breaks it down into parts you can edit, and can do rebasing. Currently, the handling of the Location: header is technically wrong - the http spec says it is supposed to be an absolute url, but I don't enforce that. Now, in cgi mode, it doesn't matter, since the web server fixes it up for us. But, in http mode... well, it still doesn't matter since the browsers can all figure it out, but I'd like to do the right thing anyway.). I might change the http constructor and/or add one that takes a std.socket socket cuz that would be cool. But I just don't want to submit it when I still might be making some big changes in the near future. BTW, I spent a little time reorganizing and documenting dom.d a bit more. http://arsdnet.net/web.d/dom.html Still not great docs, but if you come from javascript, I think it is pretty self-explanatory anyway.
Feb 09 2012
prev sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Tuesday, 7 February 2012 at 20:00:26 UTC, Adam D. Ruppe wrote:
 I'm taking this to an extreme with this:

 http://arsdnet.net:8080/

hehehe, I played with this a little bit more tonight.

http://arsdnet.net/dcode/sse/

needs the bleeding edge dom.d from my github. https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

Here's the code, not very long. http://arsdnet.net/dcode/sse/test.d

The best part is this:

document.mainBody.addEventListener("click",
    (Element thislol, Event event) {
        event.target.style.color = "red";
        event.target.appendText(" clicked! ");
        event.preventDefault();
    });

An html onclick handler written in D!

Now, like I said before, probably not usable for real work. What this does is for each user session, it creates a server side DOM object. Using observers on the DOM, it listens for changes and forwards them to Javascript. You use the D api to change your document, and it sends them down. I've only implemented a couple mutation events, but they go a long way - appendChild and setAttribute - as they are the building blocks for many of the functions.

On the client side, the javascript listens for events and forwards them to D. To sync the elements on both sides, I added a special feature to dom.d to put an attribute there that is usable on both sides. The Makefile in there shows the -version needed to enable it.

Since it is a server side document btw, you can refresh the browser and keep the same document. It could quite plausibly gracefully degrade!

But, yeah, lots of fun. D rox.
Feb 09 2012