www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.file functions and embedded NUL characters [CWE-158]

reply kdevel <kdevel vogtner.de> writes:
Some python code:



    def myfun (filename):
       open (filename, 'w')

    myfun ("a\0c")

which when executed behaves in an exemplary manner:

    Traceback (most recent call last):
      File "./test.py", line 6, in <module>
        myfun ("a\0c")
      File "./test.py", line 4, in myfun
        open (filename, 'w')
    TypeError: file() argument 1 must be encoded string without 
null
    bytes, not str

Other languages like D, perl or even Ada seem to let the embedded 
NUL
character silently truncate the filename. This poses a 
considerable
risk when the input to std.file functions is not controlled by the
program author [1]. E.g.

    rmdirRecurse ("/\0/home/user/subdir");

[1] https://cwe.mitre.org/data/definitions/158.html
Jul 31
next sibling parent reply Doigt <labog outlook.com> writes:
On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:
 Some python code:



    def myfun (filename):
       open (filename, 'w')

    myfun ("a\0c")

 which when executed behaves in an exemplary manner:

    Traceback (most recent call last):
      File "./test.py", line 6, in <module>
        myfun ("a\0c")
      File "./test.py", line 4, in myfun
        open (filename, 'w')
    TypeError: file() argument 1 must be encoded string without 
 null
    bytes, not str

 Other languages like D, perl or even Ada seem to let the 
 embedded NUL
 character silently truncate the filename. This poses a 
 considerable
 risk when the input to std.file functions is not controlled by 
 the
 program author [1]. E.g.

    rmdirRecurse ("/\0/home/user/subdir");

 [1] https://cwe.mitre.org/data/definitions/158.html
Do you actually have a source or an example program that D actually does anything with the null terminator or are you talking out of your ass? D doesn't care about the null terminator and doesn't do anything with it. Try this code on run.dlang.io and report back with your apology: import std.stdio; void main() { writeln("Hello\0World\0!"); }
Jul 31
next sibling parent reply Alex Bryan <abryancs gmail.com> writes:
On Thursday, 31 July 2025 at 21:25:44 UTC, Doigt wrote:
 On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:
 Some python code:



    def myfun (filename):
       open (filename, 'w')

    myfun ("a\0c")

 which when executed behaves in an exemplary manner:

    Traceback (most recent call last):
      File "./test.py", line 6, in <module>
        myfun ("a\0c")
      File "./test.py", line 4, in myfun
        open (filename, 'w')
    TypeError: file() argument 1 must be encoded string without 
 null
    bytes, not str

 Other languages like D, perl or even Ada seem to let the 
 embedded NUL
 character silently truncate the filename. This poses a 
 considerable
 risk when the input to std.file functions is not controlled by 
 the
 program author [1]. E.g.

    rmdirRecurse ("/\0/home/user/subdir");

 [1] https://cwe.mitre.org/data/definitions/158.html
Do you actually have a source or an example program that D actually does anything with the null terminator or are you talking out of your ass? D doesn't care about the null terminator and doesn't do anything with it. Try this code on run.dlang.io and report back with your apology: import std.stdio; void main() { writeln("Hello\0World\0!"); }
D does nothing with the NULL terminator, but I think the argument here is that for many file operations, the filename string eventually gets converted to a C string before it is passed to an API (C stdlib? syscall?) where the null terminator is significant. It is best NOT to be rude and nasty, but if you are, you should at least understand the technical argument
Jul 31
parent Doigt <labog outlook.com> writes:
On Thursday, 31 July 2025 at 21:35:50 UTC, Alex Bryan wrote:
 It is best NOT to be rude and nasty, but if you are, you should 
 at least understand the technical argument
I understand the entire argument of OP. OP is not arguing anything. They are fishing. They know nothing about D. If you don't find such individuals problematic, then perhaps I should start polluting your repos with imaginary problems. I'm bound to guess one right. The solution to lazy, no research, generic, python supremacy horseshit is to reply in a way that scares them away. Like the telemarketer at the door, you slam in it in their face with a big "fuck off my lawn". If they were genuine, they wouldn't have done what they did, especially not in the way they did it. They were fishing and they hooked on a fish alright...
Jul 31
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Thu, Jul 31, 2025 at 09:25:44PM +0000, Doigt via Digitalmars-d wrote:
 On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:
[...]
 Other languages like D, perl or even Ada seem to let the embedded
 NUL character silently truncate the filename. This poses a
 considerable risk when the input to std.file functions is not
 controlled by the program author [1]. E.g.
 
    rmdirRecurse ("/\0/home/user/subdir");
 
 [1] https://cwe.mitre.org/data/definitions/158.html
Do you actually have a source or an example program that D actually does anything with the null terminator or are you talking out of your ass? D doesn't care about the null terminator and doesn't do anything with it.
This tone is uncalled for. OP is specifically pointing the issue with passing NUL-containing strings to underlying OS calls. For example: ``` void main() { auto fp = File("/tmp/x\0reallynow", "w"); // What's the filename of the created file? } ``` T -- Last night, I dreamed about my pet rabbits all lined up like in a parade, and hopping backwards. Then I woke up and realized that it was my receding hare line!
Jul 31
parent reply Doigt <labog outlook.com> writes:
On Thursday, 31 July 2025 at 21:34:36 UTC, H. S. Teoh wrote:
 This tone is uncalled for.  OP is specifically pointing the 
 issue with passing NUL-containing strings to underlying OS 
 calls.  T
No, OP isn't doing that. OP is fishing and it's obvious: 1. No problematic example that is specific to D 2. Generic source that concerns multiple languages but doesn't cite D 3. Immediately pulled out an "examplary" python program and touting the superiority of that language. Any reasonable person should therefore conclude that it's a generalizing statement that was made without prior knowledge about D and that the OP didn't make any substantial testing to prove the problem. They hoped instead that it would be true. Therefore, you are a fish and you just took the bait.
Jul 31
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 01/08/2025 11:14 AM, Doigt wrote:
 On Thursday, 31 July 2025 at 21:34:36 UTC, H. S. Teoh wrote:
 This tone is uncalled for.  OP is specifically pointing the issue with 
 passing NUL-containing strings to underlying OS calls.  T
No, OP isn't doing that. OP is fishing and it's obvious: 1. No problematic example that is specific to D 2. Generic source that concerns multiple languages but doesn't cite D 3. Immediately pulled out an "examplary" python program and touting the superiority of that language. Any reasonable person should therefore conclude that it's a generalizing statement that was made without prior knowledge about D and that the OP didn't make any substantial testing to prove the problem. They hoped instead that it would be true. Therefore, you are a fish and you just took the bait.
This is a valid security problem with std.stdio and std.file. They do not have to produce an example showing that bad behavior exists on our end, this is documented as being possible on the system API's which we wrap without validation. https://github.com/dlang/phobos/ blob/2d8ae67b396f5cc5f4633695c1c5ac67d2bf448e/std/stdio.d#L432
Jul 31
parent Doigt <labog outlook.com> writes:
On Thursday, 31 July 2025 at 23:20:28 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 01/08/2025 11:14 AM, Doigt wrote:
 On Thursday, 31 July 2025 at 21:34:36 UTC, H. S. Teoh wrote:
 This tone is uncalled for.  OP is specifically pointing the 
 issue with passing NUL-containing strings to underlying OS 
 calls.  T
No, OP isn't doing that. OP is fishing and it's obvious: 1. No problematic example that is specific to D 2. Generic source that concerns multiple languages but doesn't cite D 3. Immediately pulled out an "examplary" python program and touting the superiority of that language. Any reasonable person should therefore conclude that it's a generalizing statement that was made without prior knowledge about D and that the OP didn't make any substantial testing to prove the problem. They hoped instead that it would be true. Therefore, you are a fish and you just took the bait.
This is a valid security problem with std.stdio and std.file. They do not have to produce an example showing that bad behavior exists on our end, this is documented as being possible on the system API's which we wrap without validation. https://github.com/dlang/phobos/ blob/2d8ae67b396f5cc5f4633695c1c5ac67d2bf448e/std/stdio.d#L432
you provided the proof for them and you are correct in pointing it out that it exists on our end. But if their guess had been wrong, it would have all wasted our time and energy. This kind of fishing without proof shouldn't be acceptable
Jul 31
prev sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 31 July 2025 at 23:14:46 UTC, Doigt wrote:
 On Thursday, 31 July 2025 at 21:34:36 UTC, H. S. Teoh wrote:
 This tone is uncalled for.  OP is specifically pointing the 
 issue with passing NUL-containing strings to underlying OS 
 calls.  T
No, OP isn't doing that. OP is fishing and it's obvious: 1. No problematic example that is specific to D 2. Generic source that concerns multiple languages but doesn't cite D 3. Immediately pulled out an "examplary" python program and touting the superiority of that language. Any reasonable person should therefore conclude that it's a generalizing statement that was made without prior knowledge about D and that the OP didn't make any substantial testing to prove the problem. They hoped instead that it would be true. Therefore, you are a fish and you just took the bait.
I bet op is a enterprise-grade security researcher; I suggest light mocking when they claim rust is the future or safe/immutable is very important for game dev
Jul 31
prev sibling parent reply kdevel <kdevel vogtner.de> writes:
On Thursday, 31 July 2025 at 21:25:44 UTC, Doigt wrote:
 [...]
 Do you actually have a source or an example program that D 
 actually does anything with the null terminator
I did not argue that D "does" something "with the null terminator". D like perl or Ada ignores the potential occurrence of NUL in the string data when calling the underlying C functions for file system operations. import std; void main () { File ("abc", "w"); std.file.rename ("abc", "ab\0c"); } Instead of throwing as the python code does this D program leaves a file named "ab" in the filesystem.
Jul 31
parent Doigt <labog outlook.com> writes:
On Thursday, 31 July 2025 at 21:55:03 UTC, kdevel wrote:
 On Thursday, 31 July 2025 at 21:25:44 UTC, Doigt wrote:
 [...]
 Do you actually have a source or an example program that D 
 actually does anything with the null terminator
I did not argue that D "does" something "with the null terminator". D like perl or Ada
and this is my entire problem with your entire argument. You are generalizing based on preconceived notions. You are guessing. You aren't actually informed about D. You don't really care what D does and how we program in D. You immediately tout python's superiority in our faces. You impose a solution just because python does it. What about trying out D for real? Have you programmed in D even once? Do you know what is the stance of the community on exceptions? Do you know how we usually (do or don't) handle the null terminator? No, you don't. You're fishing and you have two big ones on your line, but that doesn't mean I'll bite. Like barging into someone's home suddenly without ever meeting them before and demanding they do things for you even though you don't even care that they do it.
Jul 31
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
This is quite a good example of why for PhobosV3 I want us to go through 
a FilePath abstraction, rather than accepting random strings for file names.

This is indeed a security vulnerability, but it isn't on D's side.
All system API's take in a null terminated string, when it should've 
been pointer + length.

If someone has a problem with this currently, you can call 
``isValidPath`` in ``std.path``, which will check for the null character.

https://dlang.org/phobos/std_path.html#isValidPath
Jul 31
next sibling parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 31 July 2025 at 21:47:24 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 This is quite a good example of why for PhobosV3 I want us to 
 go through a FilePath abstraction, rather than accepting random 
 strings for file names.
whats wrong with just changing toStringz? I dont understand the threat profile imagined by the infinitely wise cve org; but you could make toStringz: a) shorten it to the first null char b) replace null with something c) (bad idea but youll love it) assert when detecting extra null
Jul 31
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Thu, Jul 31, 2025 at 10:27:56PM +0000, monkyyy via Digitalmars-d wrote:
 On Thursday, 31 July 2025 at 21:47:24 UTC, Richard (Rikki) Andrew Cattermole
 wrote:
 This is quite a good example of why for PhobosV3 I want us to go
 through a FilePath abstraction, rather than accepting random strings
 for file names.
whats wrong with just changing toStringz? I dont understand the threat profile imagined by the infinitely wise cve org; but you could make toStringz: a) shorten it to the first null char
[...] This is precisely what it currently does, and where the problem is. Consider: you have user input specifying a filename, and your code checks to make sure it doesn't overwrite a file it's not supposed to. As a contrived example, say you prohibit "/etc/passwd" as a filename. Now what happens when the user inputs "/etc/passwd\0ha_you_missed_me" as filename? The OS considers the NUL as the end of the filename, so your user gets access to "/etc/passwd" after all. Another example: maybe you have a database management program and your user specifies a filename to save the database, and you keep an AA of previously-used names which you check, so that the user doesn't stomp over another user's database file. Now consider what happens if user A inputs "database\01.db" and user B inputs "database\02.db". The OS considers the NUL as the end of the filename, so user A's file is actually "database", and so is user B's file, so user B can now overwrite user A's data. These are just some mild cases. I'm pretty sure if you think hard enough, you'll be able to come up with something that will open a security hole by exploiting the fact that what the D code thinks is the filename is different from what the OS thinks is the filename. T -- May you live all the days of your life. -- Jonathan Swift
Jul 31
next sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 31 July 2025 at 23:27:42 UTC, H. S. Teoh wrote:
 On Thu, Jul 31, 2025 at 10:27:56PM +0000, monkyyy via 
 Digitalmars-d wrote:
 On Thursday, 31 July 2025 at 21:47:24 UTC, Richard (Rikki) 
 Andrew Cattermole
 wrote:
 This is quite a good example of why for PhobosV3 I want us 
 to go through a FilePath abstraction, rather than accepting 
 random strings for file names.
whats wrong with just changing toStringz? I dont understand the threat profile imagined by the infinitely wise cve org; but you could make toStringz: a) shorten it to the first null char
[...] This is precisely what it currently does, and where the problem is. Consider: you have user input specifying a filename, and your code checks to make sure it doesn't overwrite a file it's not supposed to. As a contrived example, say you prohibit "/etc/passwd" as a filename. Now what happens when the user inputs "/etc/passwd\0ha_you_missed_me" as filename? The OS considers the NUL as the end of the filename, so your user gets access to "/etc/passwd" after all. Another example: maybe you have a database management program and your user specifies a filename to save the database, and you keep an AA of previously-used names which you check, so that the user doesn't stomp over another user's database file. Now consider what happens if user A inputs "database\01.db" and user B inputs "database\02.db". The OS considers the NUL as the end of the filename, so user A's file is actually "database", and so is user B's file, so user B can now overwrite user A's data. These are just some mild cases. I'm pretty sure if you think hard enough, you'll be able to come up with something that will open a security hole by exploiting the fact that what the D code thinks is the filename is different from what the OS thinks is the filename. T
~~I think these are extremely contrived, passwd is the os's responsibly and I expect all oses have it behind root permissions; also backups~~ swapping toStringz to option 2 then? I think adr's FilePath may come from stealing code from rikki, and in thoery what rikki is suggesting is I call `File(FilePath("foo"))` I didnt like figuring it out; I wont actually store a filepath, I will write string then do whatever the error messages and instinct says will make the code I wrote compile If `"passwd\0hi".toStringz == &"passwd__NULL__DONT_DO_THAThi"[0]` it wont match `passwd`; whats the `file(filepath)` abstraction gaining over option 2 or 3
Jul 31
prev sibling parent Kagamin <spam here.lot> writes:
On Thursday, 31 July 2025 at 23:27:42 UTC, H. S. Teoh wrote:
 As a contrived example, say you prohibit "/etc/passwd" as a 
 filename. Now what happens when the user inputs 
 "/etc/passwd\0ha_you_missed_me" as filename?  The OS considers 
 the NUL as the end of the filename, so your user gets access to 
 "/etc/passwd" after all.
If you need path validation, you probably will do more checks than null check, stdio won't cut it. See how path validation vulnerabilities work IRL: https://github.com/dagster-io/dagster/pull/30002
Aug 04
prev sibling next sibling parent reply kdevel <kdevel vogtner.de> writes:
On Thursday, 31 July 2025 at 21:47:24 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 This is indeed a security vulnerability, but it isn't on D's 
 side.
 All system API's take in a null terminated string, when it 
 should've been pointer + length.
The cause of the problem is the silent truncation when the conversion from D-string to C-const char * takes place. This is not a problem of the API which is in the case of POSIX is totally simple (compared to all the VMS/Windows APIs). There must be a reason why C++ uses const char* in its filesystem functions [1].
 If someone has a problem with this currently, you can call 
 ``isValidPath`` in ``std.path``, which will check for the null 
 character.
Unfortunately isValidPath also flags perfectly legal filenames as invalid, e.g. such containing invalid UTF-8 sequences. [1] https://en.cppreference.com/w/cpp/header/iostream.html
Jul 31
parent kdevel <kdevel vogtner.de> writes:
On Thursday, 31 July 2025 at 22:38:41 UTC, kdevel wrote:
 Unfortunately isValidPath also flags perfectly legal filenames
 as invalid, e.g. such containing invalid UTF-8 sequences.
I stand corrected. This is probably not true: ivp.d: ``` import std; void main (string [] args) { auto input = "a\xfc"; writeln (input, ": ", isValidPath (input)); } ``` $ ./ivp [latin9 mode console] aü: true
Jul 31
prev sibling parent =?UTF-8?Q?S=C3=B6nke_Ludwig?= <sludwig outerproduct.org> writes:
Am 31.07.25 um 23:47 schrieb Richard (Rikki) Andrew Cattermole:
 This is quite a good example of why for PhobosV3 I want us to go through 
 a FilePath abstraction, rather than accepting random strings for file 
 names.
 
 This is indeed a security vulnerability, but it isn't on D's side.
 All system API's take in a null terminated string, when it should've 
 been pointer + length.
 
 If someone has a problem with this currently, you can call 
 ``isValidPath`` in ``std.path``, which will check for the null character.
 
 https://dlang.org/phobos/std_path.html#isValidPath
I strongly suggest to also at least distinguish between Windows and Posix path formats to avoid ambiguity issues. vibe.core.path defines `WindowsPath`, `PosixPath` and `InetPath`, as well as an OS dependent alias `NativePath` (aliases to either `WindowsPath` or `PosixPath`) to avoid having to think about this all the time. However, conversions between path formats are always explicit, with the appropriate validation taking place.
Aug 04
prev sibling next sibling parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:
 Some python code:



    def myfun (filename):
       open (filename, 'w')

    myfun ("a\0c")

 which when executed behaves in an exemplary manner:

    Traceback (most recent call last):
      File "./test.py", line 6, in <module>
        myfun ("a\0c")
      File "./test.py", line 4, in myfun
        open (filename, 'w')
    TypeError: file() argument 1 must be encoded string without 
 null
    bytes, not str

 Other languages like D, perl or even Ada seem to let the 
 embedded NUL
 character silently truncate the filename. This poses a 
 considerable
 risk when the input to std.file functions is not controlled by 
 the
 program author [1]. E.g.

    rmdirRecurse ("/\0/home/user/subdir");

 [1] https://cwe.mitre.org/data/definitions/158.html
https://github.com/dlang/phobos/blob/205256abb1f86faf986f8c789cb733ca4137246e/std/string.d#L368 are you sure? Im not entirely sure if this will always trigger but theres asserts here
Jul 31
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Thursday, 31 July 2025 at 22:36:42 UTC, monkyyy wrote:
 https://github.com/dlang/phobos/blob/205256abb1f86faf986f8c789cb733ca4137246e/std/string.d#L368

 are you sure? Im not entirely sure if this will always trigger 
 but theres asserts here
Yeah, I am aware of this, because I ran into it on another project. (https://github.com/snazzy-d/sdc/pull/410) The issue is, that phobos is compiled without asserts. The OP's issue is looking at this the wrong way. The responsibility is on the user to validate their input. I don't know that the library needs to do this. Even if we did, core.stdc.stdio is still there, and we can't change that. In other words, the assert means *it's on the caller* to make sure they don't pass in a string with 0 terminators in it. The fact it is an assert is a clue that this is a programming error, not a validation error. If we checked for mid-string zero terminators on all calls to `toStringz`, we would kill performance where mostly it isn't necessary (this is only important if you don't trust where the data came from. This would lead to a different sort of problem ("How come D/C interop is so slow!?") What we need in phobos v3 is a set of validating features for user input so we give the user the tools necessary to do this correctly. -Steve
Jul 31
next sibling parent 0xEAB <desisma heidel.beer> writes:
On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer 
wrote:
 The OP's issue is looking at this the wrong way. The 
 responsibility is on the user to validate their input.
Technically, any function calling `toStringz()` should probably inherit that warning from the doc comment of `toStringz` into its own documentation.
Jul 31
prev sibling next sibling parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer 
wrote:
 
 If we checked for mid-string zero terminators on all calls to 
 `toStringz`, we would kill performance where mostly it isn't 
 necessary
I bet the speed test is effectively a tie if its reasonably written
Jul 31
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Friday, 1 August 2025 at 00:22:20 UTC, monkyyy wrote:
 On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer 
 wrote:
 
 If we checked for mid-string zero terminators on all calls to 
 `toStringz`, we would kill performance where mostly it isn't 
 necessary
I bet the speed test is effectively a tie if its reasonably written
How do you "reasonably" write a linear search? Sure you can make it a faster O(n), but it's still O(n). Whereas just "add on a 0 if not there" is an O(1) operation. -Steve
Jul 31
parent monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 1 August 2025 at 00:48:36 UTC, Steven Schveighoffer 
wrote:
 On Friday, 1 August 2025 at 00:22:20 UTC, monkyyy wrote:
 On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer 
 wrote:
 
 If we checked for mid-string zero terminators on all calls to 
 `toStringz`, we would kill performance where mostly it isn't 
 necessary
I bet the speed test is effectively a tie if its reasonably written
How do you "reasonably" write a linear search? Sure you can make it a faster O(n), but it's still O(n). Whereas just "add on a 0 if not there" is an O(1) operation. -Steve
Lets say you have a few paragraph of text that you split by \n then you call toStringz to pass it to raylib(and you pay attention to the immutablity of strings and not replace \n with null) You can do better with char arrays and different data structures but the current api of the sorta maybe kinda dynamic array that airnt special case chars, immutablity, and c liking null termination all combine together to mean its probably making a copy. During that copy you could check.
Jul 31
prev sibling parent reply kdevel <kdevel vogtner.de> writes:
On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer 
wrote:
 [...]
 If we checked for mid-string zero terminators on all calls to 
 `toStringz`, we would kill performance where mostly it isn't 
 necessary (this is only important if you don't trust where the 
 data came from. This would lead to a different sort of problem 
 ("How come D/C interop is so slow!?")
This is a strawman. I am writing about file system functions! Furthermode in the case of `std.file.rename` `toStringz` is not even called. What is used to convert to `char *` seems to be the highly interesting template `tempCString` in `std.internal.cstring`. And of course a library should not assert, nor exit nor ignore the error, but make it handleable: ``` def myfun (filename): open (filename, 'w') try: myfun ("a\0c") except TypeError: print ("error occurred") ```
Aug 01
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Friday, 1 August 2025 at 11:02:37 UTC, kdevel wrote:
 On Friday, 1 August 2025 at 00:11:51 UTC, Steven Schveighoffer 
 wrote:
 [...]
 If we checked for mid-string zero terminators on all calls to 
 `toStringz`, we would kill performance where mostly it isn't 
 necessary (this is only important if you don't trust where the 
 data came from. This would lead to a different sort of problem 
 ("How come D/C interop is so slow!?")
This is a strawman. I am writing about file system functions! Furthermode in the case of `std.file.rename` `toStringz` is not even called. What is used to convert to `char *` seems to be the highly interesting template `tempCString` in `std.internal.cstring`.
I was responding to the link sent about `toStringz`. I don't think we can always check for internal 0 characters there. But in `tempCString`, we can, since we are always copying. However, this is delicate, because you will affect performance. A slice assign is `memcpy`, and `memcpy` is damn fast (and hard to reimplement). If instead you check every character, you will change to a for loop, which will be slow. I think the right answer here is to use [strncpy](https://cplusplus.com/reference/cstring/strncpy/). According to the docs, strncpy will copy up to N characters. But if a NUL character is reached before end of the string, then it zeroes the rest of the buffer. This means we can detect whether a 0 was inside the string by checking the last byte copied. So this really is a limitation of `tempCString`, and we can fix that. However, std.file could just have easily used `toStringz`. In terms of correctness, you are passing in a parameter that has different properties than the one ultimately sent to the underlying call. It's always a fight between correctness and performance here. I always prefer (if it is available) an API that does not rely on zero termination, but when it is not available, it needs to be checked. Where it is checked is important. I'm not sure that's the library's responsibility. In other words, if I pass in `rmdir("foo")`, then why should I pay the penalty of examining `"foo"` for malicious NUL bytes? The source is a literal, I can know whether it has them or not. In fact, there isn't even a way to pass in a `char*` to `rmdir`, which is unfortunate. We have the capability of making an API that allows for all cases -- user input, static data, validated data, actual C strings, etc. We shouldn't make all these go through the same super-defensive gauntlet.
 And of course a library should not assert, nor exit nor ignore 
 the error, but make it handleable:

 ```


 def myfun (filename):
    open (filename, 'w')

 try:
    myfun ("a\0c")
 except TypeError:
    print ("error occurred")

 ```
Python is not D. It cannot do any kind of type introspection, and char* just isn't a thing, so it has to always go through the same path. We don't want to make D as slow as python. -Steve
Aug 01
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer 
wrote:
 If instead you check every character, you will change to a for 
 loop, which will be slow.

 I think the right answer here is to use 
 [strncpy](https://cplusplus.com/reference/cstring/strncpy/). 
 According to the docs, strncpy will copy up to N characters. 
 But if a NUL character is reached before end of the string, 
 then it zeroes the rest of the buffer. This means we can detect 
 whether a 0 was inside the string by checking the last byte 
 copied.
https://github.com/dlang/phobos/issues/10836 Would be quite an easy fix if someone wants to tackle it. -Steve
Aug 01
prev sibling parent reply kdevel <kdevel vogtner.de> writes:
On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer 
wrote:
 [...]
 It's always a fight between correctness and performance here.
Pardon?
 [...]
 I'm not sure that's the library's responsibility. In other 
 words, if I pass in `rmdir("foo")`, then why should I pay the 
 penalty of examining `"foo"` for malicious NUL bytes?
The `NUL` is not representable in `char *`. But all phobos filesystem functions use unadorned `string` parameters for pathnames. Who else if not the designer of the library shall be responsible for coding (not paying!) the extra cpu cycles? If the library expects pathnames without embedded `NUL`s I would create a subtype of `string`, say `fstring` or `cstring`. One then has to discuss which of these calls ``` rmdir ("a\0b"); //with the technique from [1] rmdir (fstring ("a\0b")); rmdir (cast (fstring) ("a\0b")); ``` shall compile and what one expects as runtime behavior. [1] *Implicit type conversion of an argument when a function is called* https://forum.dlang.org/thread/agstjpezerwlgdhphclk forum.dlang.org
Aug 02
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Sunday, 3 August 2025 at 00:03:47 UTC, kdevel wrote:
 On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer 
 wrote:
 [...]
 It's always a fight between correctness and performance here.
Pardon?
I mean in terms of where you apply the checks. You can say "I expect you to have validated this before sending it in", or you can say "I will always validate this, even if you already have, in the name of correctness." And I suppose a better word should have been "tradeoff" and not "fight".
 [...]
 I'm not sure that's the library's responsibility. In other 
 words, if I pass in `rmdir("foo")`, then why should I pay the 
 penalty of examining `"foo"` for malicious NUL bytes?
The `NUL` is not representable in `char *`. But all phobos filesystem functions use unadorned `string` parameters for pathnames. Who else if not the designer of the library shall be responsible for coding (not paying!) the extra cpu cycles? If the library expects pathnames without embedded `NUL`s I would create a subtype of `string`, say `fstring` or `cstring`. One then has to discuss which of these calls
The issue is going to be fixed, not sure if you saw my issue report. It's quite an easy fix actually. But just so you know, C also allows passing in C strings with embedded null characters: ```c #include <unistd.h> int main() { rmdir("hello\0world"); return 0; } ``` And we do expose `core.stdc.posix.unistd`. -Steve
Aug 03
parent reply kdevel <kdevel vogtner.de> writes:
On Sunday, 3 August 2025 at 21:01:05 UTC, Steven Schveighoffer 
wrote:
 On Sunday, 3 August 2025 at 00:03:47 UTC, kdevel wrote:
 On Friday, 1 August 2025 at 17:53:17 UTC, Steven Schveighoffer 
 wrote:
 [...]
 It's always a fight between correctness and performance here.
Pardon?
I mean in terms of where you apply the checks. You can say "I expect you to have validated this before sending it in", or you can say "I will always validate this, even if you already have, in the name of correctness." And I suppose a better word should have been "tradeoff" and not "fight".
I like the fragment "[...] it has to be correct before being fast."
 [...]
 The issue is going to be fixed, not sure if you saw my issue 
 report. It's quite an easy fix actually.
Thanks but sorry I could not yet take a look.
 But just so you know, C also allows passing in C strings with 
 embedded null characters:

 ```c
 #include <unistd.h>

 int main() {
    rmdir("hello\0world");
    return 0;
 }
 ```
On the one hand there is no such thing as a C string with embedded `NUL` (because the `NUL` terminates the string). On the other hand the `rmdir` function takes `const char *pathname`. Your code contains an array argument (`"hello\0world"`). According to C's rules this array decays into a pointer *before* `rmdir` is invoked. Thus one can hardly say that a string with embedded `NUL` is passed to the function.
 And we do expose `core.stdc.posix.unistd`.
Yes. But in the D-function in question is `std.file.rmdir`. I copied the relevant code: ```D import std.stdio : writeln; import std.range : isSomeFiniteCharInputRange; import std.traits : isConvertibleToString; void rmdir (R) (R pathname) if (isSomeFiniteCharInputRange!R && !isConvertibleToString!R) { writeln (__PRETTY_FUNCTION__); writeln (typeid (pathname)); writeln (pathname.length); writeln (pathname); } void rmdir(R)(auto ref R pathname) if (isConvertibleToString!R) { writeln (__PRETTY_FUNCTION__); writeln (typeid (pathname)); writeln (pathname.length); writeln (pathname); } void main () { rmdir ("a\0b"); } ``` which prints ``` void rmdir.rmdir!string.rmdir(string pathname) immutable(char)[] 3 ab <--- there is a NUL between a and b ``` I.e. the function really takes the string with the embedded `NUL`.
Aug 04
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 5 August 2025 at 02:47:22 UTC, kdevel wrote:
 On Sunday, 3 August 2025 at 21:01:05 UTC, Steven Schveighoffer 
 wrote:
 [...]
 The issue is going to be fixed, not sure if you saw my issue 
 report. It's quite an easy fix actually.
Thanks but sorry I could not yet take a look.
TL;DR: We can fix `tempCString` since we are always copying the whole thing. So the issue is moot at that point.
 But just so you know, C also allows passing in C strings with 
 embedded null characters:

 ```c
 #include <unistd.h>

 int main() {
    rmdir("hello\0world");
    return 0;
 }
 ```
On the one hand there is no such thing as a C string with embedded `NUL` (because the `NUL` terminates the string). On the other hand the `rmdir` function takes `const char *pathname`. Your code contains an array argument (`"hello\0world"`). According to C's rules this array decays into a pointer *before* `rmdir` is invoked. Thus one can hardly say that a string with embedded `NUL` is passed to the function.
C does not give me an error for this code. Regardless of what it "technically" is doing, you have the same issue there -- C does not warn you or error on this concern. So it is identically vulnerable to your test case.
 And we do expose `core.stdc.posix.unistd`.
Yes. But in the D-function in question is `std.file.rmdir`.
C already needs to caution people not to call this function with embedded null string array. D can also do the same thing. My point is, I don't imagine we need to "fix" the unistd binding. Again, who's responsible for verification is a tradeoff. In this case, we have an obvious answer for std.file.rmdir, because we can fix `tempCString`. But it also could be solved by just identifying what happens if you pass in such a string, and put the onus on the user to check. -Steve
Aug 04
prev sibling parent Doigt <labog outlook.com> writes:
On Thursday, 31 July 2025 at 20:45:14 UTC, kdevel wrote:
 Some python code:



    def myfun (filename):
       open (filename, 'w')

    myfun ("a\0c")

 which when executed behaves in an exemplary manner:

    Traceback (most recent call last):
      File "./test.py", line 6, in <module>
        myfun ("a\0c")
      File "./test.py", line 4, in myfun
        open (filename, 'w')
    TypeError: file() argument 1 must be encoded string without 
 null
    bytes, not str

 Other languages like D, perl or even Ada seem to let the 
 embedded NUL
 character silently truncate the filename. This poses a 
 considerable
 risk when the input to std.file functions is not controlled by 
 the
 program author [1]. E.g.

    rmdirRecurse ("/\0/home/user/subdir");

 [1] https://cwe.mitre.org/data/definitions/158.html
Rikki pointed out to me that I was blind. I apologize for my tone. You did provide that code that proved your assertion. I was wrong to say what I said and I take back everything I said. I will now retire myself from this discussion.
Jul 31