digitalmars.D - Crash my webserver!

Andrea Fontana (10/10) May 13 2023 Hi everyone, as I had already announced in the discord channel, I

Vladimir Panteleev (12/13) May 13 2023 Not bad. What I found in 10 minutes:

Andrea Fontana (13/27) May 13 2023 I've seen your tests! Thank you Vladimir!

Andrea Fontana (10/14) May 13 2023 From RFC:
Vladimir Panteleev (7/14) May 13 2023 I get a 400 with 1.0 too.

Andrea Fontana (8/24) May 14 2023 Hmm I don't think you can use utf-8 encoding in your request. I

Vladimir Panteleev (9/17) May 14 2023 Well, bytes are bytes until you decide to look at them in a
Vladimir Panteleev (6/8) May 14 2023 Oh also, I noticed that bad UTF-8 in URLs is rejected. Unless

Andrea Fontana (13/21) May 14 2023 I'm doing some validations on data because that data is parsed

Vladimir Panteleev (16/18) May 14 2023 This doesn't throw for me:

Andrea Fontana (3/21) May 14 2023 You mean %ff not \xff!

Johan (5/14) May 13 2023 Have you already fuzzed your server code?
psyscout (12/14) May 14 2023 Hi Andrea,

Andrea Fontana (7/22) May 14 2023 No: workers are not separated threads, but isolated processes.

Andrea Fontana <nospam example.org> writes:

Hi everyone, as I had already announced in the discord channel, I 
was wondering if any of you would like to try and do some tests 
on my http server (serverino). I don't mean a stress test/ddos, 
of course. I'm interested in request parsing errors or any bug 
that can crash the server (5xx error).

Source: https://github.com/trikko/serverino/
Online using nginx as proxy: http://test.andreafontana.it (also 
https)
Online into the wild listening on port 57123.

Andrea

May 13 2023

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Online into the wild listening on port 57123.

Not bad. What I found in 10 minutes:

- LF line endings are not accepted
- Host header is mandatory, but not for nginx
- Raw UTF-8 gets mangled in URL and POST parameters, you might be 
decoding those twice
- `multipart/form-data` encoding errors are silently discarded
- The server seems to handle `application/x-www-form-urlencoded` 
very differently from `multipart/form-data`? Even though they're 
both alternative options for HTML `<form>` parameters, and one is 
somewhat of a superset of the other

Hope this helps.

May 13 2023

Andrea Fontana <nospam example.org> writes:

On Saturday, 13 May 2023 at 11:21:53 UTC, Vladimir Panteleev 
wrote:
 On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Online into the wild listening on port 57123.

 Not bad. What I found in 10 minutes:

I've seen your tests! Thank you Vladimir!

 - LF line endings are not accepted

Do you mean as line separator in headers? I know some (old?) 
clients use it but I think HTTP protocol requires CRLF

 - Host header is mandatory, but not for nginx

Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?

 - Raw UTF-8 gets mangled in URL and POST parameters, you might 
 be decoding those twice

Interesting, could you please give me an example?

 - `multipart/form-data` encoding errors are silently discarded

They are (and a warning is written to the server error log). 
You're probably right and I should send back a 400 Bad Request. 
Or something else?

 - The server seems to handle 
 `application/x-www-form-urlencoded` very differently from 
 `multipart/form-data`? Even though they're both alternative 
 options for HTML `<form>` parameters, and one is somewhat of a 
 superset of the other

Yes, somewhat. But I can't really build a superset, that's why 
they are managed in two different ways.

 Hope this helps.

Sure! Thanks!

May 13 2023

Andrea Fontana <nospam example.org> writes:

On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
 On Saturday, 13 May 2023 at 11:21:53 UTC, Vladimir Panteleev
 - LF line endings are not accepted

 Do you mean as line separator in headers? I know some (old?) 
 clients use it but I think HTTP protocol requires CRLF

From the RFC:

«Although the line terminator for the start-line and fields is 
the sequence CRLF, a recipient MAY recognize a single LF as a 
line terminator and ignore any preceding CR»

Of course, MAY means it is optional (RFC 2119). I don't think I'm 
going to implement a special case for this: in 2023 bare LF is 
rarely used, and only by old clients :)

Good point, anyway.

Andrea
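
For reference, the leniency the RFC permits is small. Below is a minimal sketch in D, assuming a hypothetical helper `splitHeaderLines` (not serverino's code) that receives the whole header block as a string. It splits on LF and strips one trailing CR from each line, so CRLF and bare LF are both accepted.

```d
import std.algorithm.iteration : map, splitter;
import std.array : array;
import std.stdio : writeln;
import std.string : chomp;

/// Hypothetical helper, not serverino's code: splits a header block on LF
/// and strips one trailing CR from each resulting line.
string[] splitHeaderLines(string headerBlock)
{
    return headerBlock
        .splitter('\n')                 // LF ends every line
        .map!(line => line.chomp("\r")) // drop the CR of a CRLF pair, if any
        .array;
}

void main()
{
    // Mixed terminators: the first line ends with CRLF, the second with LF.
    writeln(splitHeaderLines("Host: example.org\r\nAccept: text/html\nX-Test: 1"));
}
```

A strict parser would instead reject any LF that is not preceded by CR; both behaviours are allowed by the quoted text.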

May 13 2023

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
 Do you mean as line separator in headers? I know some (old?) 
 clients use it but I think HTTP protocol requires CRLF

Ah, OK. I thought the specification allowed either.

 - Host header is mandatory, but not for nginx

 Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?

I get a 400 with 1.0 too.

 - Raw UTF-8 gets mangled in URL and POST parameters, you might 
 be decoding those twice

 Interesting, could you please give me an example?

     printf 'GET /?ппп=ĂÎȘȚ HTTP/1.0\r\nHost: test.andreafontana.it\r\n\r\n' | nc -v test.andreafontana.it 57123

It returns mojibake. However, only for URL and form parameters.

Normally these get percent-encoded by user-agents though.
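
To illustrate that last point: a user agent would normally send the UTF-8 bytes of such parameters percent-encoded, so the request line stays plain ASCII. Here is a minimal sketch using Phobos' `std.uri.encodeComponent`; the helper name `buildQuery` is hypothetical.

```d
import std.stdio : writeln;
import std.uri : encodeComponent;

/// Hypothetical helper: builds "key=value" with both parts percent-encoded.
string buildQuery(string key, string value)
{
    return encodeComponent(key) ~ "=" ~ encodeComponent(value);
}

void main()
{
    // Every non-ASCII character becomes one %XX escape per UTF-8 byte.
    writeln("GET /?", buildQuery("ппп", "ĂÎȘȚ"), " HTTP/1.0");
}
```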

May 13 2023

Andrea Fontana <nospam example.org> writes:

On Saturday, 13 May 2023 at 22:25:28 UTC, Vladimir Panteleev 
wrote:
 On Saturday, 13 May 2023 at 11:32:39 UTC, Andrea Fontana wrote:
 Do you mean as line separator in headers? I know some (old?) 
 clients use it but I think HTTP protocol requires CRLF

 Ah, OK. I thought the specification allowed either.

 - Host header is mandatory, but not for nginx

 Only for HTTP/1.1. It's not mandatory for HTTP/1.0, is it?

 I get a 400 with 1.0 too.

 - Raw UTF-8 gets mangled in URL and POST parameters, you 
 might be decoding those twice

 Interesting, could you please give me an example?

     printf 'GET /?ппп=ĂÎȘȚ HTTP/1.0\r\nHost: test.andreafontana.it\r\n\r\n' | nc -v test.andreafontana.it 57123

 It returns mojibake. However, only for URL and form parameters.

 Normally these get percent-encoded by user-agents though.

Hmm I don't think you can use utf-8 encoding in your request. I 
think everything must be encoded as old US-ASCII.

How can I understand in advance what encoding you're using, 
otherwise? You could use utf-8 or big5 but I couldn't tell, or am 
I missing something?

Andrea

May 14 2023

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
 It returns mojibake. However, only for URL and form parameters.

 Normally these get percent-encoded by user-agents though.

 Hmm I don't think you can use utf-8 encoding in your request. I 
 think everything must be encoded as old US-ASCII.

 How can I understand in advance what encoding you're using, 
 otherwise? You could use utf-8 or big5 but I couldn't tell, or 
 am I missing something?

Well, bytes are bytes until you decide to look at them in a 
certain way. Yea, the input may be invalid as per the spec; 
however, if mojibake indicates that you're decoding them twice, 
you're probably doing something that's at least unnecessarily 
inefficient.

Maybe you're passing the bytes as char arrays to std.algorithm, 
which produces dchars, which are then being cast into char before 
decoding again? I think that would produce this sort of mojibake.
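
A minimal sketch of how that kind of double encoding produces mojibake. This only reproduces the symptom under the assumption described above; whether serverino takes this exact path is not established in the thread, and the function name `doubleEncode` is hypothetical.

```d
import std.conv : to;
import std.stdio : writeln;

/// Reproduces the suspected bug: every UTF-8 code unit (byte) is widened to
/// its own code point, and the result is encoded as UTF-8 a second time.
string doubleEncode(string utf8)
{
    dstring widened;
    foreach (char codeUnit; utf8)        // iterate bytes, not characters
        widened ~= cast(dchar) codeUnit; // byte 0xD0 becomes U+00D0, and so on
    return widened.to!string;            // encode those code points as UTF-8
}

void main()
{
    writeln(doubleEncode("abc")); // ASCII survives unchanged
    writeln(doubleEncode("ппп")); // non-ASCII comes out as mojibake
}
```

Pure ASCII input is unaffected, which would explain why the problem only shows up with raw non-ASCII bytes.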

May 14 2023

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
 Hmm I don't think you can use utf-8 encoding in your request. I 
 think everything must be encoded as old US-ASCII.

Oh also, I noticed that bad UTF-8 in URLs is rejected. Unless 
you're decoding UTF for the purpose of validating that further 
logic doesn't have to deal with bad UTF-8, that also indicates a 
potential inefficiency. Web servers don't need to do any UTF-8 
decoding, but it's very easy to do it accidentally in D.
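
A minimal sketch of that idea, assuming two hypothetical helpers, `percentDecodeBytes` and `isValidUtf8` (neither is serverino's code). Percent-escapes are decoded purely at the byte level, and UTF-8 is checked at most once, with `std.utf.validate`, only if the server wants to guarantee valid strings to its users.

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: percent-decodes `input` one byte at a time and never
/// interprets the result as UTF-8. Returns false on a malformed escape.
bool percentDecodeBytes(const(char)[] input, out ubyte[] output)
{
    // Value of one hex digit, or -1 if `c` is not a hex digit.
    static int hexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    for (size_t i = 0; i < input.length; i++)
    {
        if (input[i] != '%')
        {
            output ~= cast(ubyte) input[i]; // raw byte, copied as-is
            continue;
        }
        if (i + 2 >= input.length)
            return false; // truncated escape
        immutable hi = hexValue(input[i + 1]);
        immutable lo = hexValue(input[i + 2]);
        if (hi < 0 || lo < 0)
            return false; // not two hex digits
        output ~= cast(ubyte) (hi * 16 + lo);
        i += 2; // skip the two hex digits
    }
    return true;
}

/// Optional single validation pass over the decoded bytes.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    ubyte[] decoded;
    immutable ok = percentDecodeBytes("%D0%BF", decoded);
    writeln(ok, " ", isValidUtf8(decoded));
}
```

With this split, rejecting bad UTF-8 with a 400 stays possible, but it becomes an explicit policy decision and not a side effect of the decoder.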

May 14 2023

Andrea Fontana <nospam example.org> writes:

On Sunday, 14 May 2023 at 11:32:46 UTC, Vladimir Panteleev wrote:
 On Sunday, 14 May 2023 at 10:56:29 UTC, Andrea Fontana wrote:
 Hmm I don't think you can use utf-8 encoding in your request. 
 I think everything must be encoded as old US-ASCII.

 Oh also, I noticed that bad UTF-8 in URLs is rejected. Unless 
 you're decoding UTF for the purpose of validating that further 
 logic doesn't have to deal with bad UTF-8, that also indicates 
 a potential inefficiency. Web servers don't need to do any 
 UTF-8 decoding, but it's very easy to do it accidentally in D.

I'm doing some validations on data because that data is parsed 
and stored for serverino's users :)

The UTF problem is actually a caught UTFException thrown by the 
standard library's URL encode/decode functions.

I'm also trying to keep things reasonably safe for users. I don't 
think any browser would send an invalid UTF-8 sequence in a URL; 
it looks like an attempted attack, so I reply with a 400 Bad 
Request error.

It's not the only check I'm doing anyway.

I'm still trying to understand what causes the mojibake; I'm not 
sure yet that it is a bug :)

Andrea

May 14 2023

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Sunday, 14 May 2023 at 13:53:49 UTC, Andrea Fontana wrote:
 The UTF problem is actually a caught UTFException thrown by the 
 standard library's URL encode/decode functions.

This doesn't throw for me:

```d
void main()
{
	import std.uri;
	decode("\xFF");
	encode("\xFF");
}
```

But... looking at the implementation, it does have a baked-in 
UTF-8 decoder, which is a little ridiculous. `decode` actually 
decodes percent-encoded UTF-8, and then encodes it back, but 
makes no attempt to validate the non-encoded parts of the string. 
The module is pretty old though, so maybe it predates the 
facilities in `std.utf`.

May 14 2023

Andrea Fontana <nospam example.org> writes:

On Sunday, 14 May 2023 at 14:57:07 UTC, Vladimir Panteleev wrote:
 On Sunday, 14 May 2023 at 13:53:49 UTC, Andrea Fontana wrote:
 The UTF problem is actually a caught UTFException thrown by the 
 standard library's URL encode/decode functions.

 This doesn't throw for me:

 ```d
 void main()
 {
 	import std.uri;
 	decode("\xFF");
 	encode("\xFF");
 }
 ```

 But... looking at the implementation, it does have a baked-in 
 UTF-8 decoder, which is a little ridiculous. `decode` actually 
 decodes percent-encoded UTF-8, and then encodes it back, but 
 makes no attempt to validate the non-encoded parts of the 
 string. The module is pretty old though, so maybe it predates 
 the facilities in `std.utf`.

You mean %ff not \xff!

Andrea
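
The difference between the two inputs can be checked with a small variation of the snippet above. This is a sketch: the helper `decodesCleanly` is hypothetical, and it deliberately catches the generic `Exception` so that it does not depend on which exception type `std.uri.decode` throws.

```d
import std.stdio : writeln;
import std.uri : decode;

/// Returns true if std.uri.decode accepts `s`, false if it throws.
bool decodesCleanly(string s)
{
    try
    {
        decode(s);
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}

void main()
{
    writeln(decodesCleanly("\xFF"));   // raw byte 0xFF in the input
    writeln(decodesCleanly("%FF"));    // percent-encoded byte 0xFF
    writeln(decodesCleanly("%D0%BF")); // percent-encoded, well-formed UTF-8
}
```

The raw byte should pass through, as reported earlier in the thread, while the percent-encoded `%FF` should be rejected because it cannot start a valid UTF-8 sequence.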

May 14 2023

Johan <j j.nl> writes:

On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Hi everyone, as I had already announced in the discord channel, 
 I was wondering if any of you would like to try and do some 
 tests on my http server (serverino). I don't mean a stress 
 test/ddos, of course. I'm interested in request parsing errors 
 or any bug that can crash the server (5xx error).

 Source: https://github.com/trikko/serverino/
 Online using nginx as proxy: http://test.andreafontana.it (also 
 https)
 Online into the wild listening on port 57123.

Have you already fuzzed your server code?

https://johanengelen.github.io/ldc/2018/01/14/Fuzzing-with-LDC.html

cheers,
   Johan

May 13 2023

psyscout <oracle.gm gmail.com> writes:

On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Hi everyone, as I had already announced in the discord 
 channel...

Hi Andrea,

this question may not be completely related, but hopefully you 
can answer it. I see there is a worker concept, where each worker 
is a completely separate application and doesn't share context 
with other workers.

For example, I have a __gshared state with some data that is 
updated by a separate thread. I need all workers (threads) to be 
able to access that state without recreating it multiple times.

Is it possible to achieve this without introducing a separate 
cache or database, just inside a single app, with Serverino 
serving data requests through multiple threads?

May 14 2023

Andrea Fontana <nospam example.org> writes:

On Sunday, 14 May 2023 at 15:19:24 UTC, psyscout wrote:
 On Saturday, 13 May 2023 at 09:03:22 UTC, Andrea Fontana wrote:
 Hi everyone, as I had already announced in the discord 
 channel...

 Hi Andrea,

 this question may not be completely related, but hopefully you 
 can answer it. I see there is a worker concept, where each 
 worker is a completely separate application and doesn't share 
 context with other workers.

 For example, I have a __gshared state with some data that is 
 updated by a separate thread. I need all workers (threads) to 
 be able to access that state without recreating it multiple 
 times.

 Is it possible to achieve this without introducing a separate 
 cache or database, just inside a single app, with Serverino 
 serving data requests through multiple threads?

No: workers are not separate threads, but isolated processes.
Keep in mind that the number of workers is dynamic; they can be 
created and killed as required.

You can still use some form of IPC (sockets, pipes, etc.), but a 
database is probably easier to manage.

Andrea
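
A generic, POSIX-only sketch of why `__gshared` does not help across processes. It uses a plain `fork` and is not how serverino starts its workers; the function name `counterSeenByParent` is hypothetical. The child process changes the variable, and the parent still sees its own copy.

```d
import core.sys.posix.sys.wait : waitpid;
import core.sys.posix.unistd : _exit, fork;
import std.stdio : writeln;

__gshared int counter; // shared between the threads of ONE process only

/// A child process changes `counter`; the parent then returns the value it
/// still sees. Returns -1 if the child could not be created.
int counterSeenByParent()
{
    counter = 0;
    auto pid = fork();
    if (pid < 0)
        return -1; // fork failed
    if (pid == 0)
    {
        counter = 42; // modifies the child's private copy of the memory
        _exit(0);     // leave the child immediately
    }
    int status;
    waitpid(pid, &status, 0); // wait until the child has finished
    return counter;
}

void main()
{
    writeln(counterSeenByParent()); // the child's write is not visible here
}
```

Anything that must be visible to every worker therefore has to live outside the process, for example behind a socket, a pipe, or a database, as suggested above.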

May 14 2023
