digitalmars.dip.ideas - swap
- Dom DiSc (64/64) May 22 Someone said without ref-counting or borrow-check it is not
- Paul Backus (18/23) May 22 Here's the unit test that a `@safe` swap function needs to be
- Dom DiSc (20/37) May 22 This is easy, but then it will also refuse to swap two local
- Dom DiSc (21/25) May 23 It's ugly to use this trick.
- Paul Backus (9/22) May 23 ```d
- Dom DiSc (11/28) May 27 After thinking a lot about this, I would say the third and forth
- Paul Backus (9/16) May 28 Yes, if you simply forbid taking the address of a local object in
- Patrick Schluter (10/45) May 28 Never, ever use XOR swap. It is buggy and slower than a temp
- Dom DiSc (6/15) May 29 I did expect this, but ok, I should forbid using the same object
- Dom DiSc (41/43) May 30 This heavily depends on the time, the copy-constructor needs.
- Richard (Rikki) Andrew Cattermole (50/50) May 29 I've been meaning to reply for a few days now.
- Dom DiSc (15/19) May 30 Yes, dip1000 cannot check the correctness of functions with two
Someone said without ref-counting or borrow-check it is not possible to create a safe swap function. I think this is not true, so I propose that phobos should provide the following swap-function which is usable in safe code: ```d /// Generic swap function without temporary variables /// Types with indirections need to define their own specific swap functions. void swap(T)(ref T x, ref T y) trusted if(is(typeof(x ^= y)) || !hasIndirections!T || isPointer!T || isArray!T) { static if(is(typeof(x ^= y))) { x ^= y; y ^= x; x ^= y; } // numeric types else // flat structs, simple pointers and arrays { ubyte[] ax = (cast(ubyte*)&x)[0 .. T.sizeof]; ubyte[] ay = (cast(ubyte*)&y)[0 .. T.sizeof]; ax[] ^= ay[]; ay[] ^= ax[]; ax[] ^= ay[]; } } system unittest { // type with xor operator int a = 100; int b = -123_456; swap(a,b); assert(a == -123_456); assert(b == 100); // basic array int[] c = [5,4,3,2,1]; int[] d = [6,6,6,6]; swap(c,d); assert(c == [6,6,6,6]); assert(d == [5,4,3,2,1]); // basic pointer (maybe to different types) void* e = &a; void* f = &c; swap(e,f); assert(e == &c); assert(f == &a); // flat struct with gaps struct F { ubyte x; short y; int z; } assert(!hasIndirections!F); assert(F.sizeof == 8); // not 7 F p = F(100, -3, int.min); F q = F(0, 0, -1); (cast(byte*)&q)[1] = 2; // fill the gap with some garbage swap(p,q); assert(p.x==0 && p.y==0 && p.z==-1); assert(q.x==100 && q.y==-3 && q.z==int.min); assert((cast(byte*)&p)[1] == 2); // the garbage was swapped together with the data assert(!(p is F(0, 0, -1))); // memcmp() checks also the garbage in the gaps to be equal // assert(p != F(0, 0, -1)); // with -preview=fieldwise the default comparison is fixed assert(q == F(100, -3, int.min)); } ```
May 22
On Thursday, 22 May 2025 at 09:44:39 UTC, Dom DiSc wrote:Someone said without ref-counting or borrow-check it is not possible to create a safe swap function. I think this is not true, so I propose that phobos should provide the following swap-function which is usable in safe code:Here's the unit test that a ` safe` swap function needs to be able to pass: ```d int* global; safe unittest { int n; int* local = &n; swap(local, local); swap(global, global); assert(!__traits(compiles, swap(local, global))); assert(!__traits(compiles, swap(global, local))); } ``` Your swap function does not pass this test--the 3rd and 4th assertions both fail (when compiling with `-preview=dip1000` to allow taking the address of a local variable).
May 22
On Thursday, 22 May 2025 at 19:38:43 UTC, Paul Backus wrote:Here's the unit test that a ` safe` swap function needs to be able to pass: ```d int* global; safe unittest { int n; int* local = &n; swap(local, local); swap(global, global); assert(!__traits(compiles, swap(local, global))); assert(!__traits(compiles, swap(global, local))); } ``` Your swap function does not pass this test--the 3rd and 4th assertions both fail (when compiling with `-preview=dip1000` to allow taking the address of a local variable).This is easy, but then it will also refuse to swap two local pointers (which is kind of correct, because they have different lifetime): ```d // this will only compile if both x and y are global variables void isGlobal(T)(ref T x, ref T y) safe { x = y; } void swap(T)(ref T x, ref T y) trusted if(is(typeof(x ^= y)) || !hasIndirections!T || isPointer!T || isArray!T) { static if(is(typeof(x ^= y))) { x ^= y; y ^= x; x ^= y; } else { assert(__traits(compiles, isGlobal(x, y))); ubyte[] ax = (cast(ubyte*)&x)[0 .. T.sizeof]; ubyte[] ay = (cast(ubyte*)&y)[0 .. T.sizeof]; ax[] ^= ay[]; ay[] ^= ax[]; ax[] ^= ay[]; } } ```
May 22
On Friday, 23 May 2025 at 06:40:10 UTC, Dom DiSc wrote:```d // this will only compile if both x and y are global variables void isGlobal(T)(ref T x, ref T y) safe { x = y; } ```It's ugly to use this trick. But the real problem is, that two local variables necessarily have different lifetimes, because they are at least declared one after the other. Even if I write ```d int* x, y; ``` the compiler assumes x has longer lifetime than y. We need a way to enforce that two variables are considered to have the same lifetime, together with a new trait to check for that. E.g. ```d bool haveSameLifetime(...) { } ``` then we can allow only variables with same lifetime to be swapped. We don't even need new syntax for this. The compiler simply should consider variables declared in the same statement to start their life simultaneous. Of course this is a mayor change, but this is why I propose a DIP for it.
May 23
On Friday, 23 May 2025 at 09:45:52 UTC, Dom DiSc wrote:On Friday, 23 May 2025 at 06:40:10 UTC, Dom DiSc wrote:```d safe unittest { int n; int*[2] locals = [&n , &n]; swap(locals[0], locals[1]); // same lifetime } ``````d // this will only compile if both x and y are global variables void isGlobal(T)(ref T x, ref T y) safe { x = y; } ```It's ugly to use this trick. But the real problem is, that two local variables necessarily have different lifetimes, because they are at least declared one after the other. Even if I write ```d int* x, y; ``` the compiler assumes x has longer lifetime than y.
May 23
On Friday, 23 May 2025 at 14:04:28 UTC, Paul Backus wrote:```d safe unittest { int n; int*[2] locals = [&n , &n]; swap(locals[0], locals[1]); // same lifetime } ```Yeah, now - this lying in your pocket. Because in this case we can have ```d void swap(T)(T[2] x) trusted if(is(typeof(x[0] ^= x[1])) || !hasIndirections!T || isPointer!T || isArray!T) { static if(is(typeof(x[0] ^= x[1]))) { x[0] ^= x[1]; x[1] ^= x[0]; x[0] ^= x[1]; } // numeric types else // flat structs, simple pointers and arrays { ubyte[] ax = (cast(ubyte*)&x[0])[0 .. T.sizeof]; ubyte[] ay = (cast(ubyte*)&x[1])[0 .. T.sizeof]; ax[] ^= ay[]; ay[] ^= ax[]; ax[] ^= ay[]; } } int global; safe unittest { int local; int*[2] mix = [&global , &local]; // equal lifetime ?!? swap(mix); // works, and dip1000 doesn't complain } ``` So here we can swap globals with locals. Why shouldn't we if we use two parameters? I would still like to have a trait to check for equal lifetime. And some better method to create variables of (really) equal lifetime.
May 23
On Friday, 23 May 2025 at 22:12:22 UTC, Dom DiSc wrote:So here we can swap globals with locals.Because dip1000 is too stupid to detect this. Maybe we simply need a construct "assumeEqualLifetime(x,y)" so that the scope-check is turned off for those two variables and you need to check by hand.
May 23
On Friday, 23 May 2025 at 22:20:42 UTC, Dom DiSc wrote:On Friday, 23 May 2025 at 22:12:22 UTC, Dom DiSc wrote:What's happening is that the parameter of your trusted function incorrectly gets inferred scope, since the compiler doesn't recognize the pointer assignment after you re-interpret cast to a ubyte[] view of the pointer bytes. The example makes me think scope should not be inferred on parameters of trusted functions.So here we can swap globals with locals.Because dip1000 is too stupid to detect this.Maybe we simply need a construct "assumeEqualLifetime(x,y)" so that the scope-check is turned off for those two variables and you need to check by hand.That idea of a ` safe` swap function is that you don't have to check anything by hand to ensure memory safety. By the way, your opening post confuses me a little. You start by trying to prove you can make a ` safe` swap function, and then present a ` trusted` function, so I assume you want a swap function with a safe interface. If that's the case, then it's the function signature that matters, not the implementation. Using the xor swap trick is emphasized as if it's part of the solution, but that just makes the function needlessly trusted instead of safe. It is possible to make a safe swap function that passes your unittest, and the obvious implementation will do: ```D void swap(T)(ref T x, ref T y) safe { auto tmp = x; x = y; y = tmp; } ``` Or alternatively: `import std.algorithm.mutation: swap` Now there is a limitation to ` safe` swap: x and y must be regular 'infinite lifetime' pointers, not scope pointers. Creating a swap of 2 scope pointers isn't possible currently, but that has to do with the expressiveness of `return scope` attributes. You can express a function where `y` gets assigned to `x` like so: ```D void swap(ref scope T x, ref return scope T y) ``` But you can't express that `x` also gets assigned to `y`, because there's simply no syntax / implementation code for that.
May 27
On Tuesday, 27 May 2025 at 15:49:57 UTC, Dennis wrote:That idea of a ` safe` swap function is that you don't have to check anything by hand to ensure memory safety.I thought, we need some new construct to make that possible. But see my last post. Now I don't understand where's the problem.By the way, your opening post confuses me a little. You start by trying to prove you can make a ` safe` swap function, and then present a ` trusted` function, so I assume you want a swap function with a safe interface. If that's the case, then it's the function signature that matters, not the implementation. Using the xor swap trick is emphasized as if it's part of the solution, but that just makes the function needlessly trusted instead of safe.This is because for some objects it may be very expensive or even not possible to create a temporary copy. This is avoided by the xor-construct.It is possible to make a safe swap function that passes your unittest [...] Now there is a limitation to ` safe` swap: x and y must be regular 'infinite lifetime' pointers, not scope pointers.Yes, that's easy even with my function. See the version with "isGlobal" check. But why is this limitation necessary? What can possibly go wrong? You swap two valid objects, and after the first lifetime ends, one of the objects is deleted/freed/marked as garbage/has no valid pointer to it anymore. But does it matter which of the two it is (let by side the case one of the objects is allocated on the stack)? Why and how does it matter? I can't see it. The lifetime of the two pointers is simply swapped together with the variables.Creating a swap of 2 scope pointers isn't possible currently, but that has to do with the expressiveness of `return scope` attributes.Yes, but why does the compiler need to know that the two have been swapped? They are valid objects of the same type. If one is deleted, does it matter which one? I thought the GC will collect only objects with no valid pointer to it. So if the pointer of one of the two goes out of scope, it won't be collected if there are other pointers to it. It does not matter in which variable this valid address is stored (the current GC looks even in non-pointer variables for valid addresses).
May 27
On Wednesday, 28 May 2025 at 06:06:25 UTC, Dom DiSc wrote:This is because for some objects it may be very expensive or even not possible to create a temporary copy. This is avoided by the xor-construct.Okay, but that's a separate problem. I'm just going to discuss the ` safe` interface of swap for simple pointer types, because if we throw performance and special type considerations in the mix it only gets more confusing.But why is this limitation necessary? What can possibly go wrong?Memory corruption in safe code. ```D import std.stdio, std.algorithm; void main() safe { string x = "hello"; void replace(ref string x) { immutable(char)[4] buf = "bye"; string y = buf[]; // y contains a reference to local variable `buf` swap(y, x); // y is assigned to x which has a longer lifetime than `buf` } writeln(x); // hello replace(x); writeln(x); // �, (`buf` is freed, printing garbage memory) } ``` Let me clarify a few things. Ignoring bugs, D without dip1000 / `scope` pointers is already memory safe if you stick to the safe subset. You can freely swap the values of any combination of local and global variables without problem. It's all safe because pointers always point to global memory or GC managed heap memory, preventing use-after-free scenarios. Now there is one scenario where memory gets freed but not because the GC or the end of the program triggered it: local variables. At the end of a function, the stack frame gets cleaned up, freeing all memory used for local variables in that function. Since you can only refer to a local variable name inside the function, direct access is always safe. A problem arises however when you keep a reference to the local variable alive either by returning a nested function, or creating a pointer to the variable. To solve the case with nested functions, the compiler creates a closure with the GC so the stack gets promoted to heap memory. For pointers however, the compiler doesn't do that. It could, but stack pointers are usually created specifically for performance reasons. If you want GC memory you might as well use `new int[]` or an array literal instead of slicing a static array. This does mean that suddenly we're dealing with pointers that can't freely be assigned to anything anymore, but only to things that remain in the same scope as the variable you took the address of. So all this talk about ref-counting, borrow-checking, `scope` pointers and whatnot is about allowing a new scenario where you have a variable that frees itself when it goes out of scope, but you want to allow creating a pointer or 'reference' to that variable which may not outlive the variable is was created from. Again, D is already safe if you simply don't allow creating such references to objects that destruct themselves. You can already do safe manual memory management by allocating in the constructor, implementing a copy constructor, disabling field access, and freeing in the constructor, but that severely restricts use of the allocated payload. For example, you can have a safe reference counted string, but you can't pass the underlying `char[]` to `writeln`, you have to index every `char` manually and pass that. Why? Because `writeln` might assign the `char[]` to a global or something else that lives longer than the reference counted string.Yes, but why does the compiler need to know that the two have been swapped?It doesn't for regular pointers or non-pointers. `swap(ref T x, ref T y)` is always safe when T = int, but when T = string and `x` has a pointer to `char` bytes that are stack-allocated/reference counted/other non-gc pointer, and `y` is a variable that is still accessible after those `char` bytes have been destructed, then you get a dangling pointer, which is obviously not ` safe`.
May 28
On Thursday, 22 May 2025 at 19:38:43 UTC, Paul Backus wrote:Here's the unit test that a ` safe` swap function needs to be able to pass: ```d int* global; safe unittest { int n; int* local = &n; swap(local, local); swap(global, global); assert(!__traits(compiles, swap(local, global))); assert(!__traits(compiles, swap(global, local))); } ``` Your swap function does not pass this test--the 3rd and 4th assertions both fail (when compiling with `-preview=dip1000` to allow taking the address of a local variable).After thinking a lot about this, I would say the third and forth assertion make only sense if by "local" it is meant "allocated on the stack", because swapping anything on the heap is always ok, no matter what lifetime the objects have. And for taking the address of local objects I would say: if this is done, the object must be allocated on the heap, else it is always possible to run into problems, not only with swap. And wasn't the precondition to allow taking the address of local objects exactly, that the compiler then automatically allocate them on the heap?
May 27
On Tuesday, 27 May 2025 at 09:40:04 UTC, Dom DiSc wrote:After thinking a lot about this, I would say the third and forth assertion make only sense if by "local" it is meant "allocated on the stack", because swapping anything on the heap is always ok, no matter what lifetime the objects have. And for taking the address of local objects I would say: if this is done, the object must be allocated on the heap, else it is always possible to run into problems, not only with swap.Yes, if you simply forbid taking the address of a local object in the first place, then there is no problem, and `swap` can easily be made ` safe`, without requiring any weird tricks. The people who said that a ` safe` swap function was impossible without borrow-checking or ref-counting were speaking specifically in the context of `-preview=dip1000`, which allows taking the address of a local variable in ` safe` code. It seems like you were not aware of that context.
May 28
On Wednesday, 28 May 2025 at 16:29:52 UTC, Paul Backus wrote:Yes, if you simply forbid taking the address of a local object in the first place, then there is no problem, and `swap` can easily be made ` safe`, without requiring any weird tricks. The people who said that a ` safe` swap function was impossible without borrow-checking or ref-counting were speaking specifically in the context of `-preview=dip1000`, which allows taking the address of a local variable in ` safe` code.I know. I thought, the new method to allow taking the address of a local variable was implemented by simply allocating them on the heap. So, whenever the compiler sees & is used on locals, it will not allocate the affected variable on the stack. But now I learned, that this is not true for local pointers. I see that this may be necessary for performance, but ok, than other measures like dip1000 are necessary to make this "safe".It seems like you were not aware of that context.I was, but obviously I didn't read the details of the implementation correct. Sorry.
May 29
On Wednesday, 28 May 2025 at 16:29:52 UTC, Paul Backus wrote:The people who said that a ` safe` swap function was impossible without borrow-checking or ref-counting were speaking specifically in the context of `-preview=dip1000`Even in this context there is only one problematic case: swapping stack pointers with heap pointers. But this will always need special handling, and detecting this doesn't require ref-counting or borrow-checking. For the moment it should simply be forbidden. In such strange usecases a workaround by hand would be recommended anyway.
May 29
On Thursday, 22 May 2025 at 09:44:39 UTC, Dom DiSc wrote:Someone said without ref-counting or borrow-check it is not possible to create a safe swap function. I think this is not true, so I propose that phobos should provide the following swap-function which is usable in safe code: ```d /// Generic swap function without temporary variables /// Types with indirections need to define their own specific swap functions. void swap(T)(ref T x, ref T y) trusted if(is(typeof(x ^= y)) || !hasIndirections!T || isPointer!T || isArray!T) { static if(is(typeof(x ^= y))) { x ^= y; y ^= x; x ^= y; } // numeric types else // flat structs, simple pointers and arrays { ubyte[] ax = (cast(ubyte*)&x)[0 .. T.sizeof]; ubyte[] ay = (cast(ubyte*)&y)[0 .. T.sizeof]; ax[] ^= ay[]; ay[] ^= ax[]; ax[] ^= ay[]; } }Never, ever use XOR swap. It is buggy and slower than a temp variablesystem unittest { // type with xor operator int a = 100; int b = -123_456; swap(a,b); assert(a == -123_456); assert(b == 100); // basic array int[] c = [5,4,3,2,1]; int[] d = [6,6,6,6]; swap(c,d); assert(c == [6,6,6,6]); assert(d == [5,4,3,2,1]);Try `swap(c, c)`. The result is not what you expect. Instead of getting the original content, your xor swap erases the content. The xor swap is also slower because each instruction depends on the preceding one which means that it will always take 3 cycles to execute one swap. With a temp variable, 2 of the instructions can be executed together making it possible to execute in 2 cycles.
May 28
On Wednesday, 28 May 2025 at 14:47:23 UTC, Patrick Schluter wrote:Never, ever use XOR swap. It is buggy and slower than a temp variable Try `swap(c, c)`. The result is not what you expect. Instead of getting the original content, your xor swap erases the content.I did expect this, but ok, I should forbid using the same object twice - doesn't make sense anyway.The xor swap is also slower because each instruction depends on the preceding one which means that it will always take 3 cycles to execute one swap. With a temp variable, 2 of the instructions can be executed together making it possible to execute in 2 cycles.This maybe true for the buildin types, but for the interesting cases (like structures with complicated constructors) making a temporary copy maybe very expensive or not possible at all.
May 29
On Wednesday, 28 May 2025 at 14:47:23 UTC, Patrick Schluter wrote:Never, ever use XOR swap. It is buggy and slower than a temp variableThis heavily depends on the time, the copy-constructor needs. My final version now looks so: ```d /// Generic swap function /// need to be trusted, because the current escape-analysis cannot prove its safe /// swap for objects with indirections need to handle all self-references separately /// (if they have none, this swap function would work correct on them, /// but there is no easy way to check for that). void swap(T)(ref T x, ref T y) trusted if(!hasIndirections!T || isPointer!T || isArray!T) { assert(!(x is y), "cannot swap something with itself"); static if(!isScalarType!T) { if(!x || !y) assert(0, "can swap only constructed objects"); assert(!(__traits(isStackobject, x) ^ __traits(isStackobject, y)), "cannot swap heap-objects with stack-objects"); } static if(!hasElaborateCopyConstructor!T) // standard implementation { T tmp = x; x = y; y = tmp; } else // avoid temporary copy, as this may be expensive (or not possible at all) { ubyte[] ax = (cast(ubyte*)&x)[0 .. T.sizeof]; ubyte[] ay = (cast(ubyte*)&y)[0 .. T.sizeof]; ax[] ^= ay[]; ay[] ^= ax[]; ax[] ^= ay[]; } } ```
May 30
I've been meaning to reply for a few days now. Implementation wise, the body of the swap function is irrelevant. As others have stated. This can be trusted code. The issue in terms of safety is for calls to the function, not within the function. Library code especially core functions like swap and move do not need to be proven safe, only provide a safe interface. These two functions have the same problem, its the by-ref of the parameters. By-ref parameters function as two separate parameters within lifetime tracking, an input and an output. ```d T move(ref T input) => input; ``` Actually looks something more akin to: ```d T move(ref T input, ref T output) { scope(exit) output = T.init; return input; } ``` So what we need to track is if the parameter has been modified to contain a different object. ```d T move( same ref T input) => input; ``` When what we need it to be: ```d T move( different ref T input) { scope(exit) input = T.init; return input } ``` This is used in the caller: ```d int* ptr = ...; // variable `ptr` is not null and contains object `1` int* another = move(ptr); // variable `ptr` is null // variable `another` is not null and contains object `1` ``` The way the compiler knows about the ``another`` variable containing the old value of ``ptr`` is due to escape analysis. This is what DIP1000 is meant to solve (it doesn't for swap, but does for move). So an updated signature: ``T move( different return scope ref T input);`` Important to note that in escape analysis attributes like ``return scope`` have the relationship: input "contributes to" output, not: input "becomes" output. I hope that clarifies things a bit.
May 29
On Thursday, 29 May 2025 at 21:03:37 UTC, Richard (Rikki) Andrew Cattermole wrote:The way the compiler knows about the ``another`` variable containing the old value of ``ptr`` is due to escape analysis. This is what DIP1000 is meant to solve (it doesn't for swap, but does for move).Yes, dip1000 cannot check the correctness of functions with two (or more) writable ref-parameters. But the funny fact is: It doesn't need to. If both are heap-objects or both stack-objects, nothing bad can happen. So we only need to mark the swap function trusted to prevent dip1000 from analysing it, and guarantee by an assert within the swap-function that not one object is on the heap and the other on the stack (and that both are different and not NULL). That's all we need to do to make the interface safe. So, swap is one of the cases where we need trusted, because the compiler cannot prove it's safe, but it is.
May 30