digitalmars.D - XML Benchmarks in D
- Scott Sanders <scott stonecobra.com> Mar 12 2008
- Sean Kelly <sean invisibleduck.org> Mar 12 2008
- Walter Bright <newshound1 digitalmars.com> Mar 12 2008
- N/A <NA NA.na> Mar 12 2008
- Sean Kelly <sean invisibleduck.org> Mar 12 2008
- N/A <NA Na.na> Mar 12 2008
- Scott Sanders <scott stonecobra.com> Mar 12 2008
- N/A <NA NA.com> Mar 13 2008
- Scott Sanders <scott stonecobra.com> Mar 13 2008
- BCS <BCS pathlink.com> Mar 13 2008
- Kris <foo bar.com> Mar 13 2008
- BCS <ao pathlink.com> Mar 13 2008
- "Kris" <foo bar.com> Mar 13 2008
- Alexander Panek <alexander.panek brainsware.org> Mar 14 2008
- "Koroskin Denis" <2korden+dmd gmail.com> Mar 14 2008
- Alexander Panek <alexander.panek brainsware.org> Mar 14 2008
- Robert Fraser <fraserofthenight gmail.com> Mar 14 2008
- "Jarrett Billingsley" <kb3ctd2 yahoo.com> Mar 14 2008
- Bruno Medeiros <brunodomedeiros+spam com.gmail> Mar 23 2008
- Christopher Wright <dhasenan gmail.com> Mar 23 2008
- Sean Kelly <sean invisibleduck.org> Mar 14 2008
- BCS <ao pathlink.com> Mar 14 2008
I have done some benchmarks of the D xml parsers alongside C/C++/Java parsers, and as you can see from the graphs, D is rocking with Tango!
http://dotnot.org/blog/index.php
I wanted to post to let the D community know that good language and library design can really make an impact.
As always, I am open to comments/changes/additions, etc. I will be happy to run any other project code through the benchmark if someone submits a patch to me containing the code.
And Walter, I am trying to use "D Programming Language" everywhere I can :)
Cheers, Scott Sanders
Mar 12 2008
Scott Sanders wrote:
I have done some benchmarks of the D xml parsers alongside C/C++/Java parsers, and as you can see from the graphs, D is rocking with Tango! http://dotnot.org/blog/index.php
Reddit link: http://reddit.com/r/programming/info/6bt6n/comments/
Mar 12 2008
== Quote from Scott Sanders (scott stonecobra.com)'s article
I have done some benchmarks of the D xml parsers alongside C/C++/Java parsers, and as you can see from the graphs, D is rocking with Tango!
http://dotnot.org/blog/index.php I wanted to post to let the D community know that good language and library design can really make an impact.
As always, I am open to comments/changes/additions, etc. I will be happy to run any other project code through the benchmark if someone submits a patch to me containing the code.
The charts look great. I generally handle files that are a few hundred MB to a few gigs and I noticed that the input is a char[], do you also plan on adding file streams as input? N/A
Mar 12 2008
== Quote from N/A (NA NA.na)'s article
I generally handle files that are a few hundred MB to a few gigs and I noticed that the input is a char[], do you also plan on adding file streams as input?
I believe the suggested approach in this case is to access the input as a memory mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline. Sean
Mar 12 2008
== Quote from Sean Kelly (sean invisibleduck.org)'s article
== Quote from N/A (NA NA.na)'s article
I generally handle files that are a few hundred MB to a few gigs and I noticed that the input is a char[], do you also plan on adding file streams as input?
I believe the suggested approach in this case is to access the input as a memory mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline.
Sean
Any examples on how to approach this using Tango? Cheers, N/A
Mar 12 2008
N/A Wrote:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
== Quote from N/A (NA NA.na)'s article
I generally handle files that are a few hundred MB to a few gigs and I noticed that the input is a char[], do you also plan on adding file streams as input?
I believe the suggested approach in this case is to access the input as a memory mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline.
Sean
Any examples on how to approach this using Tango? Cheers, N/A
Should be able to:

    auto fc = new FileConduit ("test.txt");
    auto buf = new MappedBuffer(fc);
    auto doc = new Document!(char);
    doc.parse(buf.getContent());

That should do it.
Mar 12 2008
Should be able to:

    auto fc = new FileConduit ("test.txt");
    auto buf = new MappedBuffer(fc);
    auto doc = new Document!(char);
    doc.parse(buf.getContent());

That should do it.
Thanks, I was wondering on how to do it using the PullParser. Cheers
Mar 13 2008
N/A Wrote:
Should be able to:

    auto fc = new FileConduit ("test.txt");
    auto buf = new MappedBuffer(fc);
    auto doc = new Document!(char);
    doc.parse(buf.getContent());

That should do it.
Thanks, I was wondering on how to do it using the PullParser.
Scott
Mar 13 2008
Sean Kelly wrote:
== Quote from N/A (NA NA.na)'s article
I generally handle files that are a few hundred MB to a few gigs and I noticed that the input is a char[], do you also plan on adding file streams as input?
I believe the suggested approach in this case is to access the input as a memory mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline. Sean
What might be interesting is to make a version that works with slices of the file rather than RAM. (Make the current version into a template specialized on char[] and the new one on some new type?) That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
Mar 13 2008
BCS Wrote:
Sean Kelly wrote:
== Quote from N/A (NA NA.na)'s article
I generally handle files that are a few hundred MB to a few gigs and I noticed that the input is a char[], do you also plan on adding file streams as input?
I believe the suggested approach in this case is to access the input as a memory mapped file. This does place some restrictions on file size in 32-bit applications, but then those are ideally in decline. Sean
What might be interesting is to make a version that works with slices of the file rather than RAM. (Make the current version into a template specialized on char[] and the new one on some new type?) That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
Mar 13 2008
Reply to kris,
BCS Wrote:
What might be interesting is to make a version that works with slices of the file rather than RAM. (Make the current version into a template specialized on char[] and the new one on some new type?) That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
Not as I understand it (I looked this up about a year ago so I'm a bit rusty). On 32 bits, you can't map in 4GB because you need space for the program's code (and on Windows you only get 3GB of address space as the OS gets that last GB). Also, what about a 10GB file? My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving things when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file. I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
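A rough sketch of that array-like object, for illustration only. It assumes present-day D with Phobos's std.mmfile (not the Tango API used elsewhere in this thread), and the FileView type, its materialize method and the demo file name are all made up for the example. MmFile's window argument stands in for the "map in pieces" part; the GC-ish unmapping policy is not attempted.

```d
import std.file : remove, write;
import std.mmfile : MmFile;

/// Hypothetical array-like view over a byte range of a file.
/// It holds only offsets plus a reference to the shared parent mapping;
/// file contents are touched only when an element or a copy is requested.
struct FileView
{
    private MmFile file;  // parent mapping, opened in windowed mode
    private ulong start;  // first byte of this view within the file
    private ulong end;    // one past the last byte of this view

    /// Number of bytes covered by this view.
    ulong length() const
    {
        return end - start;
    }

    /// Index op: MmFile maps in the window holding this byte if needed.
    char opIndex(ulong i)
    {
        assert(i < length, "index out of range");
        return cast(char) file[start + i];
    }

    /// Slice op: returns a new view on the same parent without reading data.
    FileView opSlice(ulong a, ulong b)
    {
        assert(a <= b && b <= length, "slice out of range");
        return FileView(file, start + a, start + b);
    }

    /// Copy the viewed bytes into a "real" array, only when truly required.
    string materialize()
    {
        return (cast(const(char)[]) file[start .. end]).idup;
    }
}

void main()
{
    // Throwaway sample file so the sketch is self-contained.
    enum path = "fileview_demo.xml";
    write(path, "<root><item>hello</item></root>");

    // A 64 KiB window asks MmFile to map pieces of the file, not all of it.
    auto mmf = new MmFile(path, MmFile.Mode.read, 0, null, 64 * 1024);
    scope (exit)
    {
        destroy(mmf);  // unmap before deleting the file
        remove(path);
    }

    auto whole = FileView(mmf, 0, mmf.length);
    auto item = whole[12 .. 17];  // no data copied yet
    assert(item.length == 5);
    assert(item[0] == 'h');
    assert(item.materialize() == "hello");
}
```

The view copies data in materialize because a raw slice handed out by a windowed mapping may stop being valid once a different window is mapped in; that is part of the "games" a parser holding many slices would have to play.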
Mar 13 2008
Reply to BCS:
"BCS" <ao pathlink.com> wrote in message news:55391cb32a6178ca5358fd65a320 news.digitalmars.com...
Reply to kris,
BCS Wrote:
What might be interesting is to make a version that works with slices of the file rather than RAM. (Make the current version into a template specialized on char[] and the new one on some new type?) That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
Not as I understand it (I looked this up about a year ago so I'm a bit rusty). On 32 bits, you can't map in 4GB because you need space for the program's code (and on Windows you only get 3GB of address space as the OS gets that last GB).
Doh. You're right, of course. Thank goodness for 64-bit machines :)
Mar 13 2008
BCS wrote:
Reply to kris,
BCS Wrote:
What might be interesting is to make a version that works with slices of the file rather than RAM. (Make the current version into a template specialized on char[] and the new one on some new type?) That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
Not as I understand it (I looked this up about a year ago so I'm a bit rusty). On 32 bits, you can't map in 4GB because you need space for the program's code (and on Windows you only get 3GB of address space as the OS gets that last GB). Also, what about a 10GB file? My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving things when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file. I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "if" and "whens", but still; if you find yourself needing such a beast of an XML file, you might possibly think of other forms of data structuring (a database, perhaps?).
Mar 14 2008
On Fri, 14 Mar 2008 11:40:20 +0300, Alexander Panek <alexander.panek brainsware.org> wrote:
BCS wrote:
Reply to kris,
BCS Wrote:
What might be interesting is to make a version that works with slices of the file rather than RAM. (Make the current version into a template specialized on char[] and the new one on some new type?) That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
Not as I understand it (I looked this up about a year ago so I'm a bit rusty). On 32 bits, you can't map in 4GB because you need space for the program's code (and on Windows you only get 3GB of address space as the OS gets that last GB). Also, what about a 10GB file? My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving things when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file. I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "if" and "whens", but still; if you find yourself needing such a beast of an XML file, you might possibly think of other forms of data structuring (a database, perhaps?).
It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
Mar 14 2008
Koroskin Denis wrote:
On Fri, 14 Mar 2008 11:40:20 +0300, Alexander Panek <alexander.panek brainsware.org> wrote:
I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "if" and "whens", but still; if you find yourself needing such a beast of an XML file, you might possibly think of other forms of data structuring (a database, perhaps?).
It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
That does, indeed, sound strange. :X
Mar 14 2008
Koroskin Denis wrote:
It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
It's a shame the "O RLY?" owl died out years ago...
Mar 14 2008
"Robert Fraser" <fraserofthenight gmail.com> wrote in message news:freg27$1m7l$1 digitalmars.com...
Koroskin Denis wrote:
It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
It's a shame the "O RLY?" owl died out years ago...
O RLY? Good internet memes never die, they just go into hibernation ;)
Mar 14 2008
Robert Fraser wrote:
Koroskin Denis wrote:
It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
It's a shame the "O RLY?" owl died out years ago...
SRSLY? :P
--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Mar 23 2008
Bruno Medeiros wrote:
Robert Fraser wrote:
Koroskin Denis wrote:
It sounds strange, but even large companies like Google or Yahoo store their temporary search indexes in ULTRA large XML files, and many of them can easily be tens or even hundreds of GBs in size (just an ordinary daily index) before they get "repacked" into a more compact format.
It's a shame the "O RLY?" owl died out years ago...
SRSLY? :P
You know Sir Sly?
Mar 23 2008
== Quote from Alexander Panek (alexander.panek brainsware.org)'s article
BCS wrote:
Reply to kris,
BCS Wrote:
What might be interesting is to make a version that works with slices of the file rather than RAM. (Make the current version into a template specialized on char[] and the new one on some new type?) That way only the parsed metadata needs to stay in RAM. It would take a lot of games mapping stuff in and out of RAM, but it would be interesting to see if it could be done.
It would be interesting, but isn't that kinda what memory-mapped files provide for? You can operate with files up to 4GB in size (on a 32-bit system), even with DOM, where the slices are virtual addresses within paged file-blocks. Effectively, each paged segment of the file is a lower-level slice?
Not as I understand it (I looked this up about a year ago so I'm a bit rusty). On 32 bits, you can't map in 4GB because you need space for the program's code (and on Windows you only get 3GB of address space as the OS gets that last GB). Also, what about a 10GB file? My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving things when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file. I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "if" and "whens", but still; if you find yourself needing such a beast of an XML file, you might possibly think of other forms of data structuring (a database, perhaps?).
It's quite possible that an XML stream could be used as the transport mechanism for the result of a database query. In such an instance, I wouldn't be at all surprised if a response were more than 3-4GB. In fact, I've designed such a system and the proper query would definitely have produced such a dataset. Sean
Mar 14 2008
Reply to Alexander,
BCS wrote:
Not as I understand it (I looked this up about a year ago so I'm a bit rusty). On 32 bits, you can't map in 4GB because you need space for the program's code (and on Windows you only get 3GB of address space as the OS gets that last GB). Also, what about a 10GB file? My idea is to make some sort of lib that lets you handle large data sets (64-bit?). You would ask for a file to be "mapped in" and then you would get an object that syntactically looks like an array. Index ops would actually map in pieces; slices would generate new objects (with a ref to the parent) that would, on demand, map stuff in. Some sort of GC-ish thing would start unmapping/moving things when space gets tight. If you never have to actually convert the data to a "real" array, you don't ever need to copy the stuff; you can just leave it in the file. I'm not sure it's even possible or how it would work, but it would be cool (and highly useful).
I've got this strange feeling in my stomach that shouts out "WTF?!" when I read about >3-4GB XML files. I know, it's about the "if" and "whens", but still; if you find yourself needing such a beast of an XML file, you might possibly think of other forms of data structuring (a database, perhaps?).
Truth be told, I'm not that far from agreeing with you (on seeing that I'd think: "WTF?!?!.... Um... OoooK.... well..."). I can't think of a justification for the lib I described if the only thing it would be used for would be an XML parser. It might be used for managing parts of something like... a database table. <G>
Mar 14 2008








