digitalmars.D.announce - Httparsed - fast native dlang HTTP 1.x message header parser

tchaloupka <chalucha gmail.com> writes:
Hi,
I was missing some commonly usable HTTP parser on code.dlang.org 
and after some research and work I've published httparsed[1].

It's inspired by picohttpparser[2], which is great, but instead of 
a binding I wanted something native to D. Go has its own 
parsers, Rust has its own parsers, why not D?

I think code.dlang.org is missing more small libraries like this 
one, which larger libraries could then build on, as is common in 
other languages. That would improve the ecosystem too. Vibe-d 
is just huuuge.

It is `nothrow`, `@nogc` and can work with betterC. It just parses 
the message header and calls the provided callbacks with slices of 
the original buffer, to be handled as needed by the caller.

Like picohttpparser, it uses the SSE4.2 `_mm_cmpestri` intrinsic 
to speed up the lookup of invalid characters (when built with ldc2 
for a target that supports it).
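
To make the idea concrete, here is a minimal C sketch of a `_mm_cmpestri` range scan. It is not httparsed's code: the function name `find_invalid` is made up, and the byte ranges (control characters except tab, plus DEL) are only modelled on what picohttpparser checks for header values, so treat them as an assumption. When the compiler does not target SSE4.2, the sketch falls back to the plain scalar loop.

```c
#include <assert.h>
#if defined(__SSE4_2__)
#include <nmmintrin.h> /* SSE4.2 string intrinsics */
#endif

/* Returns a pointer to the first "invalid" byte in [buf, end), or `end`
 * if there is none. Invalid here means 0x00-0x08, 0x0a-0x1f or 0x7f
 * (an illustrative set, not httparsed's actual table). */
static const char *find_invalid(const char *buf, const char *end)
{
    assert(buf <= end);
#if defined(__SSE4_2__)
    /* Three inclusive [low, high] pairs, zero-padded to 16 bytes. */
    static const char ranges[16] = {'\x00', '\x08', '\x0a', '\x1f',
                                    '\x7f', '\x7f'};
    const __m128i r16 = _mm_loadu_si128((const __m128i *)ranges);
    while (end - buf >= 16) {
        const __m128i b16 = _mm_loadu_si128((const __m128i *)buf);
        /* Index of the first byte of b16 that falls inside any range,
         * or 16 when no byte of this chunk matches. */
        const int idx = _mm_cmpestri(r16, 6, b16, 16,
            _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_LEAST_SIGNIFICANT);
        if (idx != 16)
            return buf + idx;
        buf += 16;
    }
#endif
    /* Scalar path: the tail, or everything when SSE4.2 is not enabled. */
    for (; buf < end; ++buf) {
        const unsigned char c = (unsigned char)*buf;
        if ((c < 0x20 && c != '\t') || c == 0x7f)
            return buf;
    }
    return end;
}
```

The vector loop checks 16 bytes per instruction, which is where the speedup on long header values would come from. Short inputs never reach it and take the scalar loop.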

It has a pretty thorough test suite.
It can parse incomplete message headers.
It can continue parsing from the last completely parsed line.
It doesn't enforce the method or protocol version itself, so it is 
usable with other internet-message-like protocols such as RTSP.
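
The callback-with-slices and resume-from-last-line behaviour can be sketched in C as follows. This is only an illustration under stated assumptions, not httparsed's API: `parse_headers`, `header_cb`, `record_header` and the two error codes are hypothetical names, the sketch handles only `Name: value` lines (no request line), and it does no character validation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback receiving slices (pointer + length) into the caller's buffer. */
typedef void (*header_cb)(void *ctx, const char *name, size_t name_len,
                          const char *value, size_t value_len);

enum { PARSE_ERROR = -1, PARSE_INCOMPLETE = -2 };

/* Parses "Name: value\r\n" lines from buf[*pos .. len).
 * Returns 0 once the empty line ending the header block is consumed,
 * PARSE_INCOMPLETE if more data is needed, PARSE_ERROR on a malformed line.
 * *pos is only advanced past completely parsed lines, so the caller can
 * append more bytes and call again without re-parsing anything. */
static int parse_headers(const char *buf, size_t len, size_t *pos,
                         header_cb on_header, void *ctx)
{
    assert(*pos <= len);
    for (;;) {
        const char *line = buf + *pos;
        const size_t avail = len - *pos;
        const char *cr = memchr(line, '\r', avail);
        if (cr == NULL || (size_t)(cr - line) + 1 >= avail)
            return PARSE_INCOMPLETE;      /* no full CRLF yet */
        if (cr[1] != '\n')
            return PARSE_ERROR;
        const size_t line_len = (size_t)(cr - line);
        if (line_len == 0) {              /* blank line: end of headers */
            *pos += 2;
            return 0;
        }
        const char *colon = memchr(line, ':', line_len);
        if (colon == NULL || colon == line)
            return PARSE_ERROR;
        const char *value = colon + 1;
        while (value < cr && (*value == ' ' || *value == '\t'))
            ++value;                      /* skip optional leading whitespace */
        on_header(ctx, line, (size_t)(colon - line),
                  value, (size_t)(cr - value));
        *pos += line_len + 2;
    }
}

/* Example callback: counts headers and remembers the last value slice. */
struct seen { int count; const char *last_value; size_t last_value_len; };

static void record_header(void *ctx, const char *name, size_t name_len,
                          const char *value, size_t value_len)
{
    struct seen *s = ctx;
    (void)name; (void)name_len;
    s->count++;
    s->last_value = value;
    s->last_value_len = value_len;
}
```

Nothing is copied or allocated: the callback only ever sees pointers into the buffer the caller passed in, so the caller decides what, if anything, needs to outlive that buffer.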

Performance-wise it's pretty much on par with picohttpparser [3]. 
Without SSE4.2 it's a bit faster, with SSE4.2 it's a bit slower, 
and I can't figure out why :/.
But overall, I'm pretty happy with the outcome.

I've also compared it with the parsers of two popular libraries:

* vibe-d - performs nearly the same as http_parser[4] (which 
itself is pretty slow and now obsolete), but by the look of it 
doesn't do much with regard to RFC conformance - some tests from 
[2] won't pass for sure

* arsd's cgi.d - I haven't expected it to be so much slower than 
vibe-d parser, it's almost 3 times slower, but on the other hand 
it's super simple idiomatic D (again doesn't check or allow what 
RFC says it should and many tests will fail)
   * I guess the main problem would be `idup` on every line and 
autodecoding
   * A stripped-down, minimalistic version of the original [5] is 
here [6]

[1] https://code.dlang.org/packages/httparsed
[2] https://github.com/h2o/picohttpparser
[3] https://i.imgur.com/iRCDGVo.png
[4] https://github.com/nodejs/http-parser
[5] https://github.com/adamdruppe/arsd/blob/402ea062b81197410b05df7f75c299e5e3eef0d8/cgi.d#L1737
[6] https://github.com/tchaloupka/httparsed/blob/230ba9a4a280ba91267a22e97137be12269b5574/bench/bench.d#L194
Dec 14 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 14 December 2020 at 21:59:02 UTC, tchaloupka wrote:
 * arsd's cgi.d - I haven't expected it to be so much slower 
 than vibe-d parser, it's almost 3 times slower, but on the 
 other hand it's super simple idiomatic D (again doesn't check 
 or allow what RFC says it should and many tests will fail)
yeah, I think I actually wrote that about eight years ago and then never revisited it.... actually git blame says "committed on Mar 24, 2012" so almost nine!

And indeed, that git blame shows the bulk of it is still the initial commit, though a few `toLower`s got changed to `asLowerCase` a few years ago... so it used to be even worse! lol

But wanna see something that will make you cry?

https://github.com/adamdruppe/arsd/blob/master/http2.d#L1232

I have another http header parser!!! That's for my client, and as you can see, it is... not great. The case-insensitivity for example is a mega hack and I actually need to fix that eventually.

At least there's some support for line continuations there. I don't remember if I ever actually tested that though, it seems most clients and servers don't do that anyway.
Dec 14 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 15, 2020 at 12:11:44AM +0000, Adam D. Ruppe via
Digitalmars-d-announce wrote:
 On Monday, 14 December 2020 at 21:59:02 UTC, tchaloupka wrote:
 * arsd's cgi.d - I haven't expected it to be so much slower than
 vibe-d parser, it's almost 3 times slower, but on the other hand
 it's super simple idiomatic D (again doesn't check or allow what RFC
 says it should and many tests will fail)
 yeah, I think I actually wrote that about eight years ago and then never revisited it.... actually git blame says "committed on Mar 24, 2012" so almost nine! And indeed, that git blame shows the bulk of it is still the initial commit, though a few `toLower`s got changed to `asLowerCase` a few years ago... so it used to be even worse! lol
Slow or not, cgi.d is totally awesome in my book, because recently it saved my life.

While helping out someone, I threw together a little D script to do what he wanted; only, I run Linux and he runs a Mac, and my script is CLI-only while he's a non-poweruser and has no idea what to do at the command prompt. So naturally my thought was, let's give this a web interface so that there's a fighting chance non-programmers would know how to use it.

Being a program I wrote in literally 4 hours (possibly less), I wasn't going to let it turn into a monster full of hundreds of 3rd party dependencies, so I reached for my trusty solution: arsd's cgi.d. Just a single file, no network dependencies, no complicated builds, just drop the file into my code, import it, and off I go. Better yet, it came with a built-in CLI request tester: perfect for local testing without the hassle of needing to start/stop an entire web service just to run a quick test; plus a compile-time switch to adapt it to any common webserver interface you like: CGI, FastCGI, even standalone HTTP server.

Problem solved in a couple o' hours, as opposed to who knows how long it would have taken to engineer a "real" solution with vibe.d or one of the other heavyweight "frameworks" out there.

It may not be the fastest web module in the D world, but it's certainly danged convenient, does the necessary job with a minimum of fuss, easily adaptable to a variety of common use cases, and best of all, requires basically no dependencies beyond just dropping the file into your code. For that alone, I think Adam deserves a salute.

(But of course, if Adam improves cgi.d to be competitive with vibe.d, then it could totally rock the D world! ;-))

T

--
Written on the window of a clothing store: No shirt, no shoes, no service.
Dec 14 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 15 December 2020 at 00:32:42 UTC, H. S. Teoh wrote:
 It may not be the fastest web module in the D world
It actually does quite well, see: https://github.com/tchaloupka/httpbench (from the same OP here :) )

The header parser is nothing special, but since header parsing is a small part of the overall problem, it is good enough. Though I have been tempted to optimize it a bit more since in a hello world benchmark even a small thing like header parsing can be noticeable. The fact that it does some totally unnecessary GC allocations can perhaps add up too.

(If I was doing all this again from scratch I'd actually be tempted to do a zero-copy, all lazy version. Read from the socket directly into the request-local buffer, then slice into it while parsing, then do decoding on-demand in that same buffer - url encoding always takes more space than the decoded version - and the result should be basically the fastest thing you can get. And if something comes in above typical size, then it can go back to the normal reallocated buffer and still win big on the average request. The problem with doing that now would be maintaining compatibility with my existing API.)
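
The "decode in that same buffer" step works because a percent-escape turns three bytes into one, so the output can never outgrow the input. A minimal C sketch of that idea follows. It is not cgi.d's code; `url_decode_in_place` and `hex_value` are hypothetical names, malformed escapes are simply copied through (an arbitrary choice), and `+` is not turned into a space.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Percent-decodes buf[0 .. len) in place and returns the decoded length.
 * The write cursor `w` never overtakes the read cursor `r`, so no second
 * buffer and no allocation is needed. */
static size_t url_decode_in_place(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t r = 0, w = 0;
    while (r < len) {
        int hi = -1, lo = -1;
        if (buf[r] == '%' && len - r >= 3 &&
            (hi = hex_value(buf[r + 1])) >= 0 &&
            (lo = hex_value(buf[r + 2])) >= 0) {
            buf[w++] = (char)(hi * 16 + lo);  /* "%2F" becomes '/' */
            r += 3;
        } else {
            buf[w++] = buf[r++];              /* plain byte or bad escape */
        }
    }
    return w;
}
```

Applied to a slice of the request buffer, this yields a shorter slice starting at the same address, which is what makes doing the decoding lazily and on demand cheap.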
 (But of course, if Adam improves cgi.d to be competitive with 
 vibe.d
My biggest deficit compared to vibe is prolly documentation. Especially of my advanced features which are practically hidden.
Dec 14 2020
tchaloupka <chalucha gmail.com> writes:
On Tuesday, 15 December 2020 at 00:32:42 UTC, H. S. Teoh wrote:
 For that alone, I think Adam deserves a salute.

 (But of course, if Adam improves cgi.d to be competitive with 
 vibe.d,
 then it could totally rock the D world! ;-))
 T
Yes absolutely, arsd has a somewhat different use case and target audience, no one should expect it to beat the top 10 highly optimized frameworks in the techempower benchmark ;-)

But if these benchmarks helps Adam to make some incremental improvements it's a plus and many of that can be pretty low hanging fruit.

If I take one number for arsd from httpbench, 27469 RPS, it means 36.4us per request. In the HTTP parser test it is about 2.4us per request, while httparsed is about 0.1us per request.

That means that with a performant parser, arsd could go up to around 27548 RPS -> not much of a difference that would be worth the hassle..
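
The estimate can be redone with a few lines of C. This is a back-of-the-envelope model, not a measurement: it assumes requests are served strictly one after another, so that time saved in the parser comes straight off the per-request time, and the function name `rps_after_saving` is made up for the illustration.

```c
#include <assert.h>

/* Requests per second after shaving `saved_us` microseconds off each
 * request, assuming strictly serial request handling (a simplification). */
static double rps_after_saving(double rps, double saved_us)
{
    const double per_request_us = 1e6 / rps;  /* 27469 RPS is about 36.4us */
    assert(saved_us < per_request_us);
    return 1e6 / (per_request_us - saved_us);
}
```

Under that model, `rps_after_saving(27469, 0.1)` comes out at roughly 27545 RPS, close to the 27548 quoted above, while saving the full 2.4us - 0.1us = 2.3us difference between the two parsers would give roughly 29320 RPS. A multi-threaded server will not follow the serial model exactly, so both numbers are only rough guides.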
Dec 15 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 15 December 2020 at 10:04:42 UTC, tchaloupka wrote:
 But if these benchmarks helps Adam to make some incremental 
 improvements it's a plus and many of that can be pretty low 
 hanging fruit.
Yeah, I think the biggest benefit to changing this around is to just avoid creating unnecessary garbage. On the individual item, it doesn't really matter, but it can build up to a totally wasted collection cycle as time goes on. Just on the other hand, in any non-trivial real world application there's likely to be some garbage generated anyway and this will disappear into the noise. Though in the hello world benches it could bring the "max" column down since I'm p sure that is caused by a GC cycle and hello world can potentially avoid having even one :P
 That means that with a performant parser, arsd could go up to 
 around 27548 RPS -> not much of a difference that would be 
 worth the hassle..
Yeah, that one is basically entirely the result of the thread work queue. If everything else was perfect, the thread stuff would still dominate. (My evidence for this is the hybrid and process dispatchers doing pretty consistently better. The thread one though is simple and cross-platform which is nice - like without it, that Mac version probably wouldn't have worked at all since I've written no mac-specific code in this module.)
Dec 15 2020
Jacob Carlborg <doob me.com> writes:
On 2020-12-14 22:59, tchaloupka wrote:
 Hi,
 I was missing some commonly usable HTTP parser on code.dlang.org and 
 after some research and work I've published httparsed[1].
This is awesome. I wanted to use picohttpparser myself and used the C version. But if you have already created an HTTP parser with the same properties in D, that's even better.

--
/Jacob Carlborg
Dec 15 2020