digitalmars.D.learn - What's the fastest way to check if a slice points to static data

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
I need a fast and hopefully relatively cross-platform (ELF, OMF, 
COFF and MachO) way of checking if a slice points to data in the 
read-only section of the binary, i.e. it's pointing to a 
statically-allocated piece of memory.

<side_note>

Of course a simple solution using meta programming would be:

---
enum isStaticallyAllocated(alias var) = __traits(compiles,
{
     // ensures that the value is known at compile-time
     enum value = var;

     // ensures that it's not a manifest constant and that it's
     // actually going to be part of the binary (modulo linker
     // optimizations like gc-sections).
     static immutable addr = &var;
});

enum x = 3;
static immutable y = 4;
immutable z = 5;
int w = 6;

void main()
{
     enum localX = 3;
     static immutable localY = 4;
     immutable localZ = 5;
     int localW = 6;

     pragma (msg, isStaticallyAllocated!x); // false
     pragma (msg, isStaticallyAllocated!y); // true
     pragma (msg, isStaticallyAllocated!z); // true
     pragma (msg, isStaticallyAllocated!w); // false
     pragma (msg, isStaticallyAllocated!localX); // false
     pragma (msg, isStaticallyAllocated!localY); // true
     pragma (msg, isStaticallyAllocated!localZ); // false
     pragma (msg, isStaticallyAllocated!localW); // false
}
---

However, that doesn't work when all you have is a slice as a run-time
argument to a function.

</side_note>

Additionally, if the slice was constructed from a string literal,
it should be possible to recover a pointer to the zero-terminated
string.

Or in pseudo-code:

---
void main()
{
     import core.stdc.stdio : printf;
     auto p = "test".fastStringZ;
     p || assert(0, "Something is terribly wrong!");
     printf("%s\n", p);
}

import std.traits : isSomeChar;

// Does the magic
bool isStaticallyAllocated(in scope void[] slice)
{
     // XXX_XXX Fix me!!!!
     return true;
}

/**
  * Returns:
  * A pointer to a null-terminated string in O(1) time
  * (with regard to the length of the string and the required
  * memory, if any), or `null` if the time constraint
  * can't be met.
  */
immutable(T)* fastStringZ(T)(return immutable(T)[] s) @trusted
if (isSomeChar!T)
{
     if (isStaticallyAllocated(s) && s.ptr[s.length] == 0)
         return s.ptr;
     else
         return null;
}
---

(Without `isStaticallyAllocated`, `fastStringZ` may *appear* to
work, but if you pass the pointer to e.g. a C library and that
library keeps it after the call has completed, good luck tracking
down memory corruption if the slice was pointing to automatic/dynamic
memory, e.g. a static array buffer on the stack or a GC / RC* heap
allocation.)

* RC: malloc or custom allocator + smart pointer wrapper
Jun 24
Stefan Koch <uplink.coder googlemail.com> writes:
On Saturday, 24 June 2017 at 12:22:54 UTC, Petar Kirov 
[ZombineDev] wrote:
 [ ... ]

Please note that not all static immutable strings have to be null-terminated. It is possible to generate a string at CTFE which may appear the same as a string literal but does not have the \0 at the end.
Jun 24
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Saturday, 24 June 2017 at 13:11:02 UTC, Stefan Koch wrote:
 On Saturday, 24 June 2017 at 12:22:54 UTC, Petar Kirov 
 [ZombineDev] wrote:
 [ ... ]

 Please note that not all static immutable strings have to be null-terminated. It is possible to generate a string at CTFE which may appear the same as a string literal but does not have the \0 at the end.
But in that case, the check `s.ptr[s.length] == 0` in fastStringZ would do the trick, right?

BTW, are you sure? AFAIU, it doesn't matter if the CTFE engine returns a non-null-terminated string expression, since the backend or the glue layer would write it to the object file as if it was a null-terminated string. But you're right if you mean that this trick won't work in CTFE, since the `s.ptr[s.length] == 0` trick is rightfully disallowed there.

---
void main()
{
     static immutable str = generateString();
     pragma (msg, str, " is null-terminated at CT: ", str.isNullTerminated());

     import std.stdio;
     writeln(str, " is null-terminated at RT: ", str.isNullTerminated());
}

string generateString()
{
     string res;
     foreach (i; 0 .. 26)
         res ~= cast(char)('a' + i); // explicit cast: 'a' + i is an int
     return res;
}

import std.traits : isSomeChar;

bool isNullTerminated(T)(scope const T[] str)
if (isSomeChar!T)
{
     // Reading one past the end is rejected during CTFE, so only peek at run time.
     if (!__ctfe)
         return str.ptr[str.length] == 0;
     else
         return false;
}
---

Compilation output:
abcdefghijklmnopqrstuvwxyz is null-terminated at CT: false

Application output:
abcdefghijklmnopqrstuvwxyz is null-terminated at RT: true
Jun 24
ketmar <ketmar ketmar.no-ip.org> writes:
Petar Kirov [ZombineDev] wrote:

 Please note that not all static immutable strings have to be null 
 terminated.
 It is possible to generate a string at ctfe which may appear the same as 
 string literal, but does not have the \0 at the end.
 But in that case, the check `s.ptr[s.length] == 0` in fastStringZ would do the trick, right?
with the edge case when something like the code i posted below manages to make `a` end exactly at the end of the r/o area, and you get a segfault by accessing the out-of-bounds byte.
 BTW, are you sure? AFAIU, it doesn't matter if the CTFE engine returns a
 non-null-terminated string expression, since the backend or the glue layer
 would write it to the object file as if it was a null-terminated string.
---
immutable ubyte[2] a = [65,66];
enum string s = cast(string)a;
immutable ubyte[2] b = [67,68]; // just to show you that there is no zero

void main ()
{
     assert(s[$-1] == 0);
}
---
Jun 24
ketmar <ketmar ketmar.no-ip.org> writes:
p.s.: btw, druntime tries to avoid that edge case by not checking for 
trailing out-of-bounds zero if string ends exactly on dword boundary. it 
will miss some strings this way, but otherwise it is perfectly safe.
Jun 24
ketmar <ketmar ketmar.no-ip.org> writes:
ketmar wrote:

 p.s.: btw, druntime tries to avoid that edge case by not checking for 
 trailing out-of-bounds zero if string ends exactly on dword boundary. it 
 will miss some strings this way, but otherwise it is perfectly safe.
oops. not druntime, phobos, in `std.string.toStringz()`.
Jun 24
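A minimal sketch of the alignment heuristic described above. It follows the idea ketmar attributes to `std.string.toStringz()` but is not copied from Phobos, and `borrowStringZ` is a hypothetical name. It is restricted to `char` strings: the byte just past the slice is read only when the end pointer is not on a 4-byte boundary, so that byte shares an aligned 4-byte word (and therefore an already-mapped page) with the slice's last character. As the examples in this thread show, a zero found there only means a terminator happens to be present, not that the memory is static.

```d
/// Hypothetical helper sketching the toStringz-style peek: returns `s.ptr`
/// if a '\0' immediately follows `s` and reading it cannot cross onto a
/// possibly unmapped page; otherwise returns null (the caller must copy).
immutable(char)* borrowStringZ(string s) @system
{
    // Empty slices have no last character to anchor the peek; be conservative.
    if (s.length == 0)
        return null;

    auto end = s.ptr + s.length;
    // A 4-byte aligned `end` may be the first byte of a new page, so don't
    // dereference it. Otherwise it lies in the same aligned 4-byte word as
    // s[$ - 1], which is known to be readable.
    if ((cast(size_t) end & 3) != 0 && *end == '\0')
        return s.ptr;
    return null;
}
```

In practice the `null` case would fall back to making a zero-terminated copy, e.g. with `std.string.toStringz`. The 4-byte granularity gives up some strings that are in fact terminated, in exchange for never touching a byte that could sit on the next page.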
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Saturday, 24 June 2017 at 14:18:33 UTC, ketmar wrote:
 with the edge case when something like the code i posted below 
 manages to make `a` end exactly at the end of the r/o area, and 
 you get a segfault by accessing the out-of-bounds byte.

 BTW, are you sure? AFAIU, it doesn't matter if the CTFE engine 
 returns a
 non-null-terminated string expression, since the backend or 
 the glue layer
 would write it to the object file as if it was a 
 null-terminated string.
 immutable ubyte[2] a = [65,66];
 enum string s = cast(string)a;
 immutable ubyte[2] b = [67,68]; // just to show you that there is no zero
 void main () { assert(s[$-1] == 0); }
Thanks, I hadn't considered immutable statically allocated fixed-size arrays of chars. Specifically, while mutable fixed-size arrays of both character and non-character types are common, I don't think immutable fixed-size char arrays are used much compared to string literals and CTFE-derived strings. I'm tempted to write in the documentation of my hypothetical fastStringZ function that passing anything but something originating from a slice is UB, though I'm aware how under-specified and hand-wavy this sounds.

On Saturday, 24 June 2017 at 14:21:23 UTC, ketmar wrote:
 ketmar wrote:

 p.s.: btw, druntime tries to avoid that edge case by not 
 checking for trailing out-of-bounds zero if string ends 
 exactly on dword boundary. it will miss some strings this way, 
 but otherwise it is perfectly safe.
 oops. not druntime, phobos, in `std.string.toStringz()`.
Thanks, for some reason I assumed that toStringz always conservatively copies the string, without even checking the code. It looks like the more aggressive optimization was removed at some point, which is visible in this revision:
http://www.dsource.org/projects/phobos/changeset/101#file15
and later Andrei reintroduced it with the more conservative heuristic:
https://github.com/dlang/phobos/commit/460c844b4fb9b96833871c111dd529d22129ab7c
but I didn't manage to find any discussion about it.

***
But in any case, the null-terminated string was just an example application.
I'm interested in a fast way to determine the "storage class" of the memory a slice or a pointer points to. I'm expecting some magic along the lines of checking the range of addresses at which the rodata section resides in memory, similar to how some allocators or the GC know if they own a range of memory.
Any ideas on that?
***
Jun 24
ketmar <ketmar ketmar.no-ip.org> writes:
Petar Kirov [ZombineDev] wrote:

 ***
 But in any case, the null-terminated string was just an example 
 application.
 I'm interested in a fast way to determine the "storage class" of the 
 memory
 a slice or a pointer point to. I'm expecting some magic along the lines of
 checking the range of addresses that the rodata section resides in memory.
 Similar to how some allocators or the GC know if they own a range of 
 memory.
 Any ideas on that?
 ***
the only query you can do is a GC query (see the `core.memory.GC` namespace, the `addrOf()` API, for example). it will tell you if something was allocated with the D GC or not. yet it is not guaranteed to be fast (although it is usually "fast enough"). i think this is all you can get without resorting to ugly platform-specific hacks (that will inevitably break ;-).
Jun 24
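A minimal sketch of the GC query ketmar mentions, using `core.memory.GC.addrOf`, which returns the base address of the GC block containing a pointer, or `null` if the D GC doesn't own that memory. `isGcAllocated` is a hypothetical name. Note that this only answers "is it GC memory?": a `false` result can still mean stack, `malloc`, or static data, so it can't prove on its own that a slice is statically allocated.

```d
import core.memory : GC;

/// Hypothetical helper: true if `p` points anywhere inside a block owned by
/// the D garbage collector (interior pointers included).
bool isGcAllocated(const(void)* p)
{
    // GC.addrOf returns the block's base address, or null for memory the
    // GC didn't allocate (stack, malloc, static data, ...).
    return GC.addrOf(cast(void*) p) !is null;
}
```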
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Saturday, 24 June 2017 at 18:05:55 UTC, ketmar wrote:
 Petar Kirov [ZombineDev] wrote:

 ***
 But in any case, the null-terminated string was just an 
 example application.
 I'm interested in a fast way to determine the "storage class" 
 of the memory
 a slice or a pointer point to. I'm expecting some magic along 
 the lines of
 checking the range of addresses that the rodata section 
 resides in memory.
 Similar to how some allocators or the GC know if they own a 
 range of memory.
 Any ideas on that?
 ***
 the only query you can do is a GC query (see the `core.memory.GC` namespace, the `addrOf()` API, for example). it will tell you if something was allocated with the D GC or not. yet it is not guaranteed to be fast (although it is usually "fast enough").
I'm not interested in asking the GC specifically, but I have looked at its implementation and I know that it keeps such information around: https://github.com/dlang/druntime/blob/v2.074.1/src/gc/impl/conservative/gc.d#L843
 i think this is all what you can get without resorting to ugly 
 platform-specific hacks (that will inevitably break ;-).
Oh, I should have mentioned that I don't expect anything but ugly platform-specific hacks possibly involving the object file format ;) Just enough of them to claim that the solution is somewhat cross-platform :D
Jun 24
ketmar <ketmar ketmar.no-ip.org> writes:
Petar Kirov [ZombineDev] wrote:

 Oh, I should have mentioned that I don't expect anything but ugly 
 platform-specific hacks possibly involving the object file format ;)
 Just enough of them to claim that the solution is somewhat cross-platform 
 :D
i guess you can look at how TLS scanning is done in druntime (at least on GNU/Linux). it does some parsing of the internal structures of the loaded ELF. i guess you can parse the section part of the loaded ELF too, to find out the r/o sections and their address ranges.
Jun 24
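A minimal, Linux-only sketch of the approach suggested above, assuming druntime's `core.sys.linux.link` bindings for glibc's `dl_iterate_phdr` and the ELF constants in `core.sys.linux.elf`. Instead of section headers, which aren't guaranteed to be mapped at run time, it walks the program headers (segments) of every loaded object and reports whether an address falls inside a loadable segment without the write flag. `isInReadOnlySegment` and its helpers are hypothetical names. Caveats: such segments also contain code, RELRO regions still appear writable in the headers, where string literals end up is toolchain-dependent, and the walk is not O(1), so a real implementation would probably cache the ranges.

```d
// Linux-only: relies on glibc's dl_iterate_phdr.
import core.sys.linux.elf : PT_LOAD, PF_W;
import core.sys.linux.link : dl_iterate_phdr, dl_phdr_info;

// State passed through dl_iterate_phdr's `void* data` argument.
private struct Query
{
    size_t addr;   // address being classified
    bool found;    // set if it lies in a non-writable PT_LOAD segment
}

// Called once per loaded object (the executable and each shared library).
private extern (C) int visitObject(dl_phdr_info* info, size_t size, void* data)
    nothrow @nogc
{
    auto q = cast(Query*) data;
    foreach (i; 0 .. info.dlpi_phnum)
    {
        auto ph = &info.dlpi_phdr[i];
        // Only loadable segments without the write flag are of interest.
        if (ph.p_type != PT_LOAD || (ph.p_flags & PF_W))
            continue;
        // p_vaddr is relative to the object's load base (dlpi_addr).
        immutable start = cast(size_t)(info.dlpi_addr + ph.p_vaddr);
        if (q.addr >= start && q.addr - start < ph.p_memsz)
        {
            q.found = true;
            return 1; // non-zero stops the iteration
        }
    }
    return 0; // keep iterating
}

/// Hypothetical helper: true if `p` points into a non-writable loadable
/// segment of the executable or of any loaded shared library.
bool isInReadOnlySegment(const(void)* p)
{
    auto q = Query(cast(size_t) p);
    dl_iterate_phdr(&visitObject, &q);
    return q.found;
}
```

Heap and stack memory are never part of a loaded segment, so they report `false`; on typical toolchains `isInReadOnlySegment("hello".ptr)` should report `true`, but that depends on the compiler placing the literal in `.rodata`.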
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Saturday, 24 June 2017 at 18:46:06 UTC, ketmar wrote:
 Petar Kirov [ZombineDev] wrote:

 Oh, I should have mentioned that I don't expect anything but 
 ugly platform-specific hacks possibly involving the object 
 file format ;)
 Just enough of them to claim that the solution is somewhat 
 cross-platform :D
 i guess you can look at how TLS scanning is done in druntime (at least on GNU/Linux). it does some parsing of the internal structures of the loaded ELF. i guess you can parse the section part of the loaded ELF too, to find out the r/o sections and their address ranges.
that's the stuff :P
Jun 24