digitalmars.D - Crash my webserver!

reply Andrea Fontana <nospam example.org> writes:
Hi everyone, as I had already announced in the discord channel, I 
was wondering if any of you would like to try and do some tests 
on my http server (serverino). I don't mean a stress test/ddos, 
of course. I'm interested in request parsing errors or any bug 
that can crash the server (5xx error).

Source: https://github.com/trikko/serverino/
Online using nginx as proxy: http://test.andreafontana.it (also 
https)
Online into the wild listening on port 57123.

Andrea
May 13 2023
next sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Online into the wild listening on port 57123.
Not bad. What I found in 10 minutes:

- LF line endings are not accepted
- Host header is mandatory, but not for nginx
- Raw UTF-8 gets mangled in URL and POST parameters, you might be decoding those twice
- `multipart/form-data` encoding errors are silently discarded
- The server seems to handle `application/x-www-form-urlencoded` very differently from `multipart/form-data`? Even though they're both alternative options for HTML `<form>` parameters, and one is somewhat of a superset of the other

Hope this helps.
May 13 2023
parent reply Andrea Fontana <nospam example.org> writes:
On Saturday, 13 May 2023 at 11:21:53 UTC, Vladimir Panteleev 
wrote:
 On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Online into the wild listening on port 57123.
Not bad. What I found in 10 minutes:
I've seen your tests! Thank you Vladimir!
 - LF line endings are not accepted
Do you mean as line separator in headers? I know some (old?) clients use it but I think HTTP protocol requires CRLF
 - Host header is mandatory, but not for nginx
Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?
 - Raw UTF-8 gets mangled in URL and POST parameters, you might 
 be decoding those twice
Interesting, could you please give me an example?
 - `multipart/form-data` encoding errors are silently discarded
They are discarded (and a warning is written to the server error log). You're probably right that I should send back a 400 Bad Request. Or something else?
 - The server seems to handle 
 `application/x-www-form-urlencoded` very differently from 
 `multipart/form-data`? Even though they're both alternative 
 options for HTML `<form>` parameters, and one is somewhat of a 
 superset of the other
Yes, somewhat. But I can't really build a superset, that's why they are managed in two different ways.
 Hope this helps.
Sure! Thanks!
May 13 2023
next sibling parent Andrea Fontana <nospam example.org> writes:
On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
 On Saturday, 13 May 2023 at 11:21:53 UTC, Vladimir Panteleev
 - LF line endings are not accepted
Do you mean as line separator in headers? I know some (old?) clients use it but I think HTTP protocol requires CRLF
From RFC:

«Although the line terminator for the start-line and fields is the sequence CRLF, a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR»

Of course MAY means it is optional (rfc2119). I don't think I'm going to implement a special case for this, it is rarely used by old clients in 2023 :)

Good point, anyway.

Andrea
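
For anyone curious what the optional lenient behaviour could look like, here is a minimal sketch in D. It is not serverino's parser: `splitHeaderLines` is a hypothetical helper, and it assumes the whole header block is already in memory as a `string`. It splits on LF and drops a CR that directly precedes it, so CRLF and bare LF are both accepted.

```d
import std.algorithm.iteration : map, splitter;
import std.array : array;
import std.string : chomp;

/// Hypothetical helper: splits a raw header block into lines,
/// accepting either CRLF or a bare LF as the line terminator.
string[] splitHeaderLines(string raw)
{
    return raw
        .splitter('\n')                 // LF ends a line in both styles
        .map!(line => line.chomp("\r")) // ignore a CR right before the LF
        .array;
}
```

A strict server would instead reject any line that does not end in CRLF, which is also allowed since the RFC only says MAY.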
May 13 2023
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
 Do you mean as line separator in headers? I know some (old?) 
 clients use it but I think HTTP protocol requires CRLF
Ah, OK. I thought the specification allowed either.
 - Host header is mandatory, but not for nginx
Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?
I get a 400 with 1.0 too.
 - Raw UTF-8 gets mangled in URL and POST parameters, you might 
 be decoding those twice
Interesting, could you please give me an example?
printf 'GET /?ппп=ĂÎȘȚ HTTP/1.0\r\nHost: test.andreafontana.it\r\n\r\n' | nc -v test.andreafontana.it 57123

It returns mojibake. However, only for URL and form parameters. Normally these get percent-encoded by user-agents though.
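
To illustrate the percent-encoded form a user agent would normally send, here is a small sketch using Phobos' `std.uri.encodeComponent`. The `requestTarget` function is a hypothetical helper made up for this example; only `encodeComponent` comes from the standard library.

```d
import std.uri : encodeComponent;

/// Hypothetical helper: builds "/?key=value" with both parts percent-encoded.
/// Non-ASCII characters are emitted as their UTF-8 bytes, each written as %XX.
string requestTarget(string key, string value)
{
    return "/?" ~ encodeComponent(key) ~ "=" ~ encodeComponent(value);
}
```

With this, `requestTarget("ппп", "ĂÎȘȚ")` yields a pure-ASCII request target, so the question of which encoding the raw bytes are in only arises after the server has percent-decoded them.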
May 13 2023
prev sibling parent reply Andrea Fontana <nospam example.com> writes:
On Saturday, 13 May 2023 at 22:25:28 UTC, Vladimir Panteleev 
wrote:
 On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
 Do you mean as line separator in headers? I know some (old?) 
 clients use it but I think HTTP protocol requires CRLF
Ah, OK. I thought the specification allowed either.
 - Host header is mandatory, but not for nginx
Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?
I get a 400 with 1.0 too.
 - Raw UTF-8 gets mangled in URL and POST parameters, you 
 might be decoding those twice
Interesting, could you please give me an example?
printf 'GET /?ппп=ĂÎȘȚ HTTP/1.0\r\nHost: test.andreafontana.it\r\n\r\n' | nc -v test.andreafontana.it 57123 It returns mojibake. However, only for URL and form parameters. Normally these get percent-encoded by user-agents though.
Hmm I don't think you can use utf-8 encoding in your request. I think everything must be encoded as old US-ASCII. How can I understand in advance what encoding you're using, otherwise? You could use utf-8 or big5 but I couldn't tell, or am I missing something?

Andrea
May 14 2023
next sibling parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
 It returns mojibake. However, only for URL and form parameters.

 Normally these get percent-encoded by user-agents though.
Hmm I don't think you can use utf-8 encoding in your request. I think everything must be encoded as old US-ASCII. How can I understand in advance what encoding you're using, otherwise? You could use utf-8 or big5 but I couldn't tell, or am I missing something?
Well, bytes are bytes until you decide to look at them in a certain way. Yea, the input may be invalid as per the spec; however, if mojibake indicates that you're decoding them twice, you're probably doing something that's at least unnecessarily inefficient. Maybe you're passing the bytes as char arrays to std.algorithm, which produces dchars, which are then being cast into char before decoding again? I think that would produce this sort of mojibake.
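
As a rough illustration of how this kind of mojibake arises in general, the sketch below treats every UTF-8 code unit as if it were a whole code point and then encodes the result as UTF-8 again. This is only a guess at the class of mistake, not a claim about what serverino actually does, and `decodeBytewise` is a hypothetical name.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.string : representation;
import std.utf : toUTF8;

/// Hypothetical demonstration: widens each byte of a UTF-8 string to a
/// code point (as if the input were Latin-1), then re-encodes as UTF-8.
string decodeBytewise(string utf8)
{
    dchar[] codePoints = utf8.representation // raw bytes, no decoding
        .map!(b => cast(dchar) b)            // each byte becomes one code point
        .array;
    return codePoints.toUTF8;                // second encoding step
}
```

ASCII input passes through unchanged, which would explain why the problem only shows up with raw non-ASCII parameters: "п" (bytes D0 BF) comes back as "Ð¿".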
May 14 2023
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
 Hmm I don't think you can use utf-8 encoding in your request. I 
 think everything must be encoded as old US-ASCII.
Oh also, I noticed that bad UTF-8 in URLs is rejected. Unless you're decoding UTF for the purpose of validating that further logic doesn't have to deal with bad UTF-8, that also indicates a potential inefficiency. Web servers don't need to do any UTF-8 decoding, but it's very easy to do it accidentally in D.
May 14 2023
parent reply Andrea Fontana <nospam example.org> writes:
On Sunday, 14 May 2023 at 11:32:46 UTC, Vladimir Panteleev wrote:
 On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
 Hmm I don't think you can use utf-8 encoding in your request. 
 I think everything must be encoded as old US-ASCII.
Oh also, I noticed that bad UTF-8 in URLs is rejected. Unless you're decoding UTF for the purpose of validating that further logic doesn't have to deal with bad UTF-8, that also indicates a potential inefficiency. Web servers don't need to do any UTF-8 decoding, but it's very easy to do it accidentally in D.
I'm doing some validations on data because that data is parsed and stored for serverino's users :) The UTF problem is actually a caught UTFException thrown by urlencode/decode of the std library. And I'm trying to keep it a bit safe for the user, let's say. I don't think any browser will send an invalid UTF sequence as a URL; it sounds like you're attempting some kind of attack, so I give you back a 400 Bad Request error. It's not the only check I'm doing anyway.

I'm trying to understand what's wrong with the mojibake, still not sure it is a bug :)

Andrea
May 14 2023
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Sunday, 14 May 2023 at 13:53:49 UTC, Andrea Fontana wrote:
 The UTF problem is actually a catched UTFException thrown by 
 urlencode/decode of std library.
This doesn't throw for me:

```d
void main()
{
    import std.uri;
    decode("\xFF");
    encode("\xFF");
}
```

But... looking at the implementation, it does have a baked-in UTF-8 decoder, which is a little ridiculous. `decode` actually decodes percent-encoded UTF-8, and then encodes it back, but makes no attempt to validate the non-encoded parts of the string. The module is pretty old though, so maybe it predates the facilities in `std.utf`.
May 14 2023
parent Andrea Fontana <nospam example.org> writes:
On Sunday, 14 May 2023 at 14:57:07 UTC, Vladimir Panteleev wrote:
 On Sunday, 14 May 2023 at 13:53:49 UTC, Andrea Fontana wrote:
 The UTF problem is actually a catched UTFException thrown by 
 urlencode/decode of std library.
This doesn't throw for me: ```d void main() { import std.uri; decode("\xFF"); encode("\xFF"); } ``` But... looking at the implementation, it does have a baked-in UTF-8 decoder, which is a little ridiculous. `decode` actually decodes percent-encoded UTF-8, and then encodes it back, but makes no attempt to validate the non-encoded parts of the string. The module is pretty old though, so maybe it predates the facilities in `std.utf`.
You mean %ff not \xff!

Andrea
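
The difference between the two inputs can be checked with a short sketch. `decodeFails` is a hypothetical helper; it catches any `Exception` instead of assuming the concrete exception class that `std.uri.decode` throws.

```d
import std.exception : collectException;
import std.uri : decode;

/// Hypothetical helper: true if std.uri.decode rejects the input.
/// collectException returns the caught Exception, or null if none was thrown.
bool decodeFails(string s)
{
    return collectException(decode(s)) !is null;
}
```

Here `decodeFails("\xFF")` should be false, because the raw byte is not percent-encoded and is passed through without validation, while `decodeFails("%FF")` should be true, because the percent-encoded byte goes through the built-in UTF-8 decoder and 0xFF cannot start a valid sequence.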
May 14 2023
prev sibling next sibling parent Johan <j j.nl> writes:
On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Hi everyone, as I had already announced in the discord channel, 
 I was wondering if any of you would like to try and do some 
 tests on my http server (serverino). I don't mean a stress 
 test/ddos, of course. I'm interested in request parsing errors 
 or any bug that can crash the server (5xx error).

 Source: https://github.com/trikko/serverino/
 Online using nginx as proxy: http://test.andreafontana.it (also 
 https)
 Online into the wild listening on port 57123.
Have you already fuzzed your server code?
https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html

cheers,
Johan
May 13 2023
prev sibling parent reply psyscout <oracle.gm gmail.com> writes:
On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Hi everyone, as I had already announced in the discord 
 channel...
Hi Andrea, this question may not be completely related, but hopefully you can answer. I can see there is a worker concept, where each worker is a completely separate application and doesn't share context with other workers. For example, I have a __gshared state with some data that is updated by a separate thread, so I need all workers (threads) to be able to access that state without recreating it multiple times. Is it possible to achieve this without introducing a separate cache or database, just inside a single app, with Serverino serving data requests through multiple threads?
May 14 2023
parent Andrea Fontana <nospam example.org> writes:
On Sunday, 14 May 2023 at 15:19:24 UTC, psyscout wrote:
 On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Hi everyone, as I had already announced in the discord 
 channel...
Hi Andrea, this question may be not completely related, but hopefully you can answer. I can see a worker concept and each worker is a completely separate application and doesn't share context with other workers. For example I have a __gshared state with some data which is being updated by separate thread. So I need all workers (threads) be able to access that state without recreating it multiple times. Is it possible to achieve it without introducing a separate cache or database, just inside single app and Serverino serving data requests through multiple threads?
No: workers are not separate threads, but isolated processes. Keep in mind that the number of workers is dynamic; they can be created and killed as required. You can still use some IPC (sockets, pipes, etc.), but a DB is probably easier to manage.

Andrea
May 14 2023