digitalmars.D.learn - Class for fetching a web page and parsing it into a DOM
- breezes (2/2) Dec 15 2011 Is there a class that can fetch a web page from the internet? And is std...
- Adam D. Ruppe (33/36) Dec 15 2011 You might want to use my dom.d
- Nick Sabalausky (4/38) Dec 15 2011 Yup, I can confirm Adam's tools are great for this. At the moment, std.x...
Is there a class that can fetch a web page from the internet? And is std.xml the right module for parsing it into a DOM tree?
Dec 15 2011
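For the fetching half on its own, note that Phobos later gained std.net.curl, which postdates this thread. A minimal sketch, assuming a D toolchain with the curl shared library available (as Adam notes below, you need libcurl installed either way):

====
import std.net.curl : get;
import std.stdio;

void main() {
    // get() performs an HTTP GET and returns the response body as a char[]
    auto html = get("http://digitalmars.com/");
    writeln(html.length, " characters fetched");
}
====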
On Thursday, 15 December 2011 at 09:55:22 UTC, breezes wrote:
> Is there a class that can fetch a web page from the internet? And is std.xml the right module for parsing it into a DOM tree?

You might want to use my dom.d:
https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

Grab dom.d, characterencodings.d, and curl.d. Here's an example program:

====
import arsd.dom;
import arsd.curl;

import std.stdio;

void main() {
    auto document = new Document();
    document.parseGarbage(curl("http://digitalmars.com/"));
    writeln(document.querySelector("p"));
}
====

Compile like this:

dmd yourfile.d dom.d characterencodings.d curl.d

You'll need the curl C library from an outside source. If you're on Linux, it is probably already installed. If you're on Windows, check the Internet.

// this downloads a file from the web and returns a string
curl(url);

// this builds a DOM tree out of html. It's called parseGarbage because
// it tries to figure out really bad html - so it works on a lot of web
// sites.
document.parseGarbage(string);

// my dom.d includes a lot of functions you might know from
// JavaScript, like getElementById, getElementsByTagName, and the
// get-element-by-CSS-selector functions
document.querySelector("p") // get the first paragraph

And then, finally, the writeln puts out the html of an element.
Dec 15 2011
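Since dom.d mirrors the browser DOM, data extraction follows the same patterns as JavaScript. A short sketch building on Adam's example; getAttribute and innerText are assumed here to behave like their JavaScript namesakes:

====
import arsd.dom;
import arsd.curl;
import std.stdio;

void main() {
    auto document = new Document();
    document.parseGarbage(curl("http://digitalmars.com/"));

    // walk every <a> element and print its target and link text,
    // much as you would in a browser's console
    foreach (a; document.getElementsByTagName("a"))
        writeln(a.getAttribute("href"), "\t", a.innerText);
}
====

The same loop works with a CSS selector instead of a bare tag name, if dom.d's querySelectorAll is available in your copy.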
"Adam D. Ruppe" <destructionator gmail.com> wrote in message news:nlccexskkftzaapfdnti dfeed.kimsufi.thecybershadow.net...On Thursday, 15 December 2011 at 09:55:22 UTC, breezes wrote:Yup, I can confirm Adam's tools are great for this. At the moment, std.xml is known to have problems and is currently undergoing a rewrite.Is there a class that can fetch a web page from the internet? And is std.xml the right module for parsing it into a DOM tree?You might want to use my dom.d https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff Grab dom.d, characterencodings.d, and curl.d. Here's an example program: ==== import arsd.dom; import arsd.curl; import std.stdio; void main() { auto document = new Document(); document.parseGarbage(curl("http://digitalmars.com/")); writeln(document.querySelector("p")); } ===== Compile like this: dmd yourfile.d dom.d characterencodings.d curl.d You'll need the curl C library from an outside source. If you're on Linux, it is probably already installed. If you're on Windows, check the Internet. // this downloads a file from the web and returns a string curl(site url); // this builds a DOM tree out of html. It's called parseGarbage because // it tries to figure out really bad html - so it works on a lot of web // sites. document.parseGarbage(string); // My dom.d includes a lot of functions you might know from // javascript like getElementById, getElementsByTagName, and the // get element by CSS selector functions document.querySelector("p") // get the first paragraph And then, finally, the writeln puts out the html of an element.
Dec 15 2011