digitalmars.D - Memory safety, C#, D and more

bearophile <bearophileHUGS lycos.com> writes:
Here I have collected a few more bits that may be interesting for D
development/design.

-------------------


In C# the fixed statement prevents the garbage collector from relocating a
movable variable. The fixed statement is only permitted in an unsafe context:

http://msdn.microsoft.com/en-us/library/f58wzh21.aspx
http://msdn.microsoft.com/en-us/library/aa664784(VS.71).aspx

So it "pins" a variable, so the GC can't move it anymore in memory, so you can

avoid a conservative GC and keep its moving one.

You can use it for example like this:

int[,,] a = new int[2, 3, 4];
unsafe {
   fixed (int* p = a) {
      for (int i = 0; i < a.Length; ++i) // treat as linear
         p[i] = i;
   }
}

Here int[,,] is a built-in multi-dimensional array made of a single block of
memory.

that save some memory and improve cache locality a bit (but sometimes on
modern CPUs I have seen they may end up a bit slower, because they may require
integer multiplications to find items if a bit shift can't be used).
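
A minimal sketch of the index arithmetic mentioned above, assuming the usual row-major layout (last index varies fastest). flatIndex is a hypothetical helper written for illustration, not part of any library:

```d
// Hypothetical helper: maps (i, j, k) of a d0 x d1 x d2 array to its offset
// inside the single underlying block of memory (row-major order).
size_t flatIndex(size_t i, size_t j, size_t k, size_t d1, size_t d2) {
    // Two multiplications per access; they can become shifts only when
    // d1 and d2 are powers of two.
    return (i * d1 + j) * d2 + k;
}
```

For the int[2, 3, 4] array above, item [1, 2, 3] lands at offset (1 * 3 + 2) * 4 + 3 = 23, the last of the 24 items.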

fixed can also be nested if you want to pin two or more pointers:

fixed (...) fixed (...) { ... }

The pointer is meant as fixed only inside the scope.

When you use "fixed" to take the char* of a string, the compiler in effect
calls toStringz automatically.

You can also use fixed to call another function with a pointer:

class Test {
   unsafe static void Fill(int* p, int count, int value) {
      for (; count != 0; count--)
         *p++ = value;
   }
   static void Main() {
      int[] a = new int[100];
      unsafe {
         fixed (int* p = a) Fill(p, 100, -1);
      }
   }
}

I guess the compiler makes sure to never relocate the "a" array inside that
Fill() method.


everything possible to increase flexibility. D starts from an unsafe situation
and does more to give some safety.

This explains a bit how "fixed" interacts with the generational GC:
http://www.codeproject.com/KB/dotnet/pointers.aspx

>Pinning has a HUGE cost to the garbage collector. I assume that you are
familiar with the generational algorithm of the garbage collection. Let us say
we allocated enough memory to fill Gen 0 Heap (the youngest), and that an
additional allocation will trigger a collection. If that very last allocation
at the end of the heap was pinned, the pinned object moves to generation 1.
(Call GC.GetGeneration(obj) and see). Gen 1 is guaranteed to grow to include
the pinned memory at the very end of the Gen 0 Heap. Even if all other memory
in Gen 0 was freed, that would still leave a huge unreclaimed space of memory
and Gen 0 will begin allocating starting from its previous limit. That is how
bad "pinning" is. [...] when you use fixed, do whatever you have do quickly and
avoid any memory allocation in the process, which can potentially trigger a
garbage collection. If a garbage collection did occur inside a fixed block,
most likely the pinned memory was close to the end of Gen 0 heap.<



For example if you run the following code (not in debug mode):

int* a = stackalloc int[n];
for (int i = 0; i < 3 * n; i++) {
    a[i] = i;
    Console.WriteLine("a[i] = {0}", a[i]);
}

With n=10 it stops running just after i=10 (1 past the length). So the runtime
is able to catch the trespassing outside the allowed memory anyway, and the
docs say it stops the program as soon as possible to avoid malicious code,
avoid troubles, etc.



that's a stack safety, not a heap one.


(often the compiler/runtime isn't able to remove array bound checks, despite
this being a supported feature) and slower than equivalent "release mode" D code.

uses a canary, or sets the memory after the array as not writeable.

After a small test with the following code that performs reads only:

int* a = stackalloc int[n];
for (int i = 0; i < 30 * n; i++) {
    Console.WriteLine("a[{0}] = {1}", i, a[i]);
}

Now the program doesn't stop early: with n=10 it keeps printing until i = 299.
So there's write-safety only.

I have tried with dmd a stack-based "array":

import std.conv: toInt;
import std.c.stdlib: alloca;
void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;
    int* a = cast(int*)alloca(n * int.sizeof);
    for (int i = 0; i < 30 * n; i++) {
        a[i] = i;
        printf("a[%d] = %d\n", i, a[i]);
    }
}

It stops printing after i = 12 (3 items after the last one). If inside the loop
I keep only the printf, it prints up to 300 and more, no read safety.


While the following code with a heap-based array:

import std.conv: toInt;
void main(string[] args) {
    int n = args.length == 2 ? toInt(args[1]) : 10;
    auto aa = new int[n];
    auto a = aa.ptr;
    for (int i = 0; i < 3000 * n; i++) {
        a[i] = i;
        printf("a[%d] = %d\n", i, a[i]);
    }
}

generates an Access Violation after i=15391, so there's not much write safety.



using System;
unsafe sealed class test {
    static unsafe void Main(string[] args) {
        int n = args.Length > 0 ? Int32.Parse(args[0]) : 10;
        int[] a = new int[n];
        unsafe {
            fixed (int* p = a) {
                for (int i = 0; i < 1000 * n; ++i) {
                    p[i] = i;
                    Console.WriteLine("p[{0}] = {1}", i, p[i]);
                }
            }
        }
    }
}

prints items up to i=20 and then throws an exception:
System.IO.IOException, "The handle is invalid"

(in debug code it stops when i is about 25). So even with heap memory and in

faster, because the program stops very close to where the bug is).

Having such safety when working with pointer-based arrays is a very good
thing, and I'd like to have it in D too when I am not compiling in release
mode. Is this doable?
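
One partial approach, sketched under the assumption that the item count is known wherever the pointer is used: slicing a pointer in D yields an array whose indexing is bounds-checked in builds that keep array bounds checks enabled. The names checkedView and fillChecked are hypothetical, chosen for illustration:

```d
// Hypothetical helper: wraps a raw pointer and a known item count in a
// slice, so that indexing goes through D's array bounds checks
// (when they are not compiled out).
int[] checkedView(int* p, size_t n) {
    return p[0 .. n];
}

// Fills the first n items through the checked view, not the raw pointer.
void fillChecked(int* p, size_t n, int value) {
    auto view = checkedView(p, n);
    for (size_t i = 0; i < view.length; i++)
        view[i] = value;
}
```

This only helps if the length passed in is correct, and it does nothing for code that keeps indexing the bare pointer, so it is not the runtime-level protection described above.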

-----------------------------


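0,1,2,3... of items. Note that the compiler does not turn them into powers of
two by itself: [Flags] only marks the enum as a bit field, so the values must
be given explicitly before they can be combined bitwise:
http://weblogs.asp.net/wim/archive/2004/04/07/109095.aspx

[Flags]
public enum ClientStates {
  Ordinary      = 0,
  HasDiscount   = 1,
  IsSupplier    = 2,
  IsBlackListed = 4,
  IsOverdrawn   = 8
}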

ClientStates c = ClientStates.HasDiscount | ClientStates.IsSupplier;


for D2 too.
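
A minimal sketch of how the same kind of combination can be written in D with a plain enum, assuming explicit power-of-two values. hasState is a hypothetical helper, not an existing library function:

```d
// Each member owns one bit, so members can be OR-ed together.
enum ClientStates {
    Ordinary      = 0,
    HasDiscount   = 1 << 0,
    IsSupplier    = 1 << 1,
    IsBlackListed = 1 << 2,
    IsOverdrawn   = 1 << 3
}

// Hypothetical helper: true if `flag` is set in the combined `states`.
bool hasState(int states, ClientStates flag) {
    return (states & flag) != 0;
}
```

With this, ClientStates.HasDiscount | ClientStates.IsSupplier can be stored in an int and queried with hasState.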

-----------------------------

Unrelated. (Java) 'new' considered harmful:
http://www.ddj.com/java/184405016

Bye,
bearophile
May 05 2009