www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Notes 5, GC performance

reply bearophile <bearophileHUGS lycos.com> writes:
1) A more uniform syntax may be better, so the following can both work:
s.sizeof
s.typeof
So later typeof() function can then be removed.



2) If I want to create an AA of simple structs I think I have to do the
following:


import std.stdio: putr = writefln;
import std.string: cmp;

void main() {
  auto s1a = "hello".dup;
  auto s1b = "hello".dup;
  int[string] aa1;
  aa1[s1a] = 10;
  aa1[s1b] = 20;
  putr(aa1);


  struct S2 {
    string s;
    string toString() { return "<" ~ this.s ~ ">"; }
  }
  auto s2a = S2("hello".dup);
  auto s2b = S2("hello".dup);
  int[S2] aa2;
  aa2[s2a] = 10;
  aa2[s2b] = 20;
  putr(aa2); // bug


  struct S3 {
    string s;

    uint toHash() {
      // return s.hash; // not possible yet
      // One-at-a-Time hash by Bob Jenkins
      // slower than built-in hash, but more uniform
      uint h = 0;
      foreach (c; cast(ubyte[])s) {
        h += c;
        h += (h << 10);
        h ^= (h >> 6);
      }
      h += (h << 3);
      h ^= (h >> 11);
      h += (h << 15);
      return h;
    }

    int opEquals(S3* other) {
      return this.s == other.s;
    }

    int opCmp(S3* other) {
      return cmp(this.s, other.s);
    }

    string toString() { return "<" ~ this.s ~ ">"; }
  }

  auto s3a = S3("hello".dup);
  auto s3b = S3("hello".dup);
  int[S3] aa3;
  aa3[s3a] = 10;
  aa3[s3b] = 20;
  putr(aa3);
}

(Strings already have a hashing functions defined, but how can I use that hash
value to avoid such One-at-a-Time hash function?)
Even better: what's the rationale behind the current (IHMO wrong/corner-case)
behavior of structs? I think they have to define an automatic toString and
hashing too. Better is to have a .hash or .toHash or tohash attribute
(property) on any built-in, or to have a global function hash() that gives the
has of anything (calling toHash when necessary, to find it recursively).



3) In thousands of functions/classes in my D libs I have code like the
following; all those functions/classes scan/use keys of the given AAs (because
an AA key is much more useful, with a key you can find the value, while with a
value you generally can't quickly find the key).

static if (IsAA!(TyItems)) {
  foreach (key, val; items) {
    // key processing...
  }
} else {
  foreach (el; items) {
    // el processing...
  }
}

So far I haven't found a way to avoid such silly code duplication.
Note IsAA!() is template true if the given type is an AA.
So I think I'd like the D foreach with 1 variable to scan by default the keys
of a given AA:
foreach(key; someAA) {...}


4) The syntax of a language changes the typical usage of its constructs, and
it's better for such constructs to be very fit to their typical usage. In D AAs
can be defined and used with a simple & short syntax, so I think people
(expecially people used to scripting languages) use them quite often, even in
situations where there are only few keys (while in C++ the usage of hash maps
is less easy, so they are used in situations where the programmer wants to put
more keys). So I think D AAs have to be quick even when there are few keys
(like 6-25 keys). I have written few benchmarks for such situations (that I
think are common), and now I think D AAs aren't much optimized for them.
I already know that scanning a short array is generally faster than looking in
an AA, but I think D AA implementation may be optimized a bit for the situation
with few keys.
In my benchmarks the scan of the array (to see if an item is present) is faster
than the "in" AA if the arrays is less than about 18 items long, if the
retrieval probability is about 1/2 and items/keys are integers.
Adding an element to a dynamic array, with a scan to be sure it's not already
present, is faster than adding an item to an AA if the number of items is less
than about 180, and this is a lot!
There are various places around too look for ideas regarding possible ways to
speed up the AAs for the situations with few keys, one of such places is the
implementation of Python dicts (in Python dicts are used everywhere (each
variable lookup is one or more dict accesses!) so they are tuned for the
situation for few keys too). If you want to look at Python dicts implementation
you can look inside its C sources, in the directory Objects, the files
dictnotes.txt and dictobject.c (now I can't give direct URLS). Python C sources
are quite more tidy than Perl ones :-)



5) I think I may have found another performance problem of the GC used by DMD,
this is a small benchmark I have created reducing a program of mine:

import std.file: read;
static import std.gc;
static import std.c.time;

double clock() {
  return std.c.time.clock() / cast(double)std.c.time.CLOCKS_PER_SEC;
}

struct xsplit {
  string str;

  int opApply(int delegate(ref string) dg) {
    int result;
    size_t i;
    size_t istart = 0;
    bool inword = false;
    size_t str_length = str.length;
    string part;

    for (i = 0; i < str_length; i++) {
      switch (str[i]) {
        case ' ', '\t', '\f', '\r', '\n', '\v':
          if (inword) {
            part = str[istart .. i];
            result = dg(part);
            if (result) goto END;
            inword = false;
          }
          break;

        default:
          if (!inword) {
            istart = i;
            inword = true;
          }
        break;
      }
    }
    if (inword) {
      part = str[istart .. i];
      result = dg(part);
      if (result) goto END;
    }

    END:
      return result;
  }
}

void main() {
  float t0 = clock();
  //string filename = "dictionary.txt"; // 503_000 distinct words, 6.3 MB
  auto words = xsplit(cast(string)read(filename));
  std.gc.disable();
  int[string] parts;
  foreach(w; words)
    parts[w] = 10;
  float t1 = clock();
  printf("%f\n", t1-t0);
}

(The std.gc.disable is nearly necessary, otherwise it burns more than 5 seconds
to create the AA).
The timings on 503_000 distinct words, 6.3 MB are:
  t1-t0 = 1.76 s
  Total running time: 3.57 s
I think it's strange that it takes more time to destroy an AA than to find all
the hashing valees and create it.

------------------------

Few little things I have recently read that may be interesting:

6) Is this problem possible in D too? If the answer is positive, how can the
language (help) avoid such bugs?
http://cocoawithlove.com/2008/04/using-pointers-to-recast-in-c-is-bad.html


7) About OcaML:
OcaML saves us [...] reversing the default behavior that you see in
C++/Java/Perl/Ruby: instead of variables being nullable by default, you have to
explicitly declare that this variable you’re using is possibly null (called
"None").<
8) About const in C++:
I actually meant I would have liked C++ to have const inference. I guess I mean
I would have liked some sort of system where I didn't have to type the keyword
"const" everywhere, but the compiler would look for inconsistencies just as ML
finds type inconsistencies without needing manifest type declarations
everywhere. The flip side of that argument is that if the compiler did the
checking without declarations, I wouldn't have been forced to wade through the
code, thinking about it carefully. It's a good exercise... once...<
9) Something class-related I have read regarding the "Fan" programming language:

Mixins

Building software is often a modeling problem - figuring out how to map a
domain model into code. In an object oriented language, this typically means

classes support single inheritance of both type and implementation. Interfaces
support multiple inheritance of type, but do not support inheritance of
implementation.


interface is often a decision that haunts you. Because once you choose a class
you've burned your only chance for implementation inheritance. If you have a
complicated domain model, then interfaces become a necessary burden - but often
end up resulting in a lot of busy work if you need them to have common
implementation code. Interfaces are also fraught with peril when it comes to
versioning because you can't add a method without breaking all of the
implementing code.


class/interface model. Multiple inheritance offers lots of power, but comes at
the expense of complexity and some pretty nasty pitfalls. Fan takes a middle of

that can have method implementations. To avoid some of the pitfalls of true
multiple inheritance, mixins restrict features such as fields which store
state. Mixins are a very nice feature in the Fan toolbox when it comes to
designing your object oriented models.
<

Bye,
bearophile
Apr 16 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
...
 (The std.gc.disable is nearly necessary, otherwise it burns more than 5
seconds to create the AA).
 The timings on 503_000 distinct words, 6.3 MB are:
   t1-t0 = 1.76 s
   Total running time: 3.57 s
 I think it's strange that it takes more time to destroy an AA than to find all
the hashing valees and create it.
The Phobos GC is not aware of types contained in AAs, so it will walways scan AA nodes regardless of what's in them. Tango's GC is a bit smarter (and I have a ticket pending to improve it further), but it will still not be perfect because the compiler only passes in TypeInfo for the key, not the value, and both are stored in the same memory block so if one must be scanned then the other will be also. This could be the cause of the slowdown you're seeing. Sean
Apr 16 2008
prev sibling next sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
bearophile wrote:
 1) A more uniform syntax may be better, so the following can both work:
 s.sizeof
 s.typeof
 So later typeof() function can then be removed.
I wouldn't mind .typeof for variables. But typeof() can be applied to an expression too. So would you recommend we do (a+b).typeof in that case?
 3) In thousands of functions/classes in my D libs I have code like the
following; all those functions/classes scan/use keys of the given AAs (because
an AA key is much more useful, with a key you can find the value, while with a
value you generally can't quickly find the key).
 
 static if (IsAA!(TyItems)) {
   foreach (key, val; items) {
     // key processing...
   }
 } else {
   foreach (el; items) {
     // el processing...
   }
 }
 
 So far I haven't found a way to avoid such silly code duplication.
 Note IsAA!() is template true if the given type is an AA.
 So I think I'd like the D foreach with 1 variable to scan by default the keys
of a given AA:
 foreach(key; someAA) {...}
Agreed. This is how Python does it too. Makes a lot more sense to me. I think Walter's argument was that an AA is just an array with generalized indices. foreach(x; array) doesn't iterate over keys (the numbers 0..array.length-1), so therefore foreach(x; someAA) shouldn't either. I disagree, but that's what Walters's argued in the past. They are similar syntactically, but the use cases are quite distinct. I don't think the syntactic similarity should be put ahead of utility. But it's a good example you bring up -- because I think it directly counter's Walter's argument. His argument is that the syntactic similarity should be honored over functionality concerns because of "generic code". And yet here you have an example of generic code where you are saying that adherence to syntactic similarities is actually causing you headaches. So that's good evidence from real-world code that the generic code argument is bogus.
 4) The syntax of a language changes the typical usage of its constructs, and
it's better for such constructs to be very fit to their typical usage. In D AAs
can be defined and used with a simple & short syntax, so I think people
(expecially people used to scripting languages) use them quite often, even in
situations where there are only few keys (while in C++ the usage of hash maps
is less easy, so they are used in situations where the programmer wants to put
more keys). So I think D AAs have to be quick even when there are few keys
(like 6-25 keys). I have written few benchmarks for such situations (that I
think are common), and now I think D AAs aren't much optimized for them.
 I already know that scanning a short array is generally faster than looking in
an AA, but I think D AA implementation may be optimized a bit for the situation
with few keys.
 In my benchmarks the scan of the array (to see if an item is present) is
faster than the "in" AA if the arrays is less than about 18 items long, if the
retrieval probability is about 1/2 and items/keys are integers.
 Adding an element to a dynamic array, with a scan to be sure it's not already
present, is faster than adding an item to an AA if the number of items is less
than about 180, and this is a lot!
 There are various places around too look for ideas regarding possible ways to
speed up the AAs for the situations with few keys, one of such places is the
implementation of Python dicts (in Python dicts are used everywhere (each
variable lookup is one or more dict accesses!) so they are tuned for the
situation for few keys too). If you want to look at Python dicts implementation
you can look inside its C sources, in the directory Objects, the files
dictnotes.txt and dictobject.c (now I can't give direct URLS). Python C sources
are quite more tidy than Perl ones :-)
 
Agreed. Smart people have spent a lot of time benchmarking and improving the performance of dicts in Python. And Python's license is quite liberal. http://www.python.org/psf/license/ Basically you can do anything you want with it, but you'd probably have to tack on a "Portions copyright PSF" in any place where you list DMD's copyright info. --bb
Apr 16 2008
prev sibling parent reply "Janice Caron" <caron800 googlemail.com> writes:
On 16/04/2008, bearophile <bearophileHUGS lycos.com> wrote:
  3) In thousands of functions/classes in my D libs I have code like the
following; all those functions/classes scan/use keys of the given AAs (because
an AA key is much more useful, with a key you can find the value, while with a
value you generally can't quickly find the key).

  static if (IsAA!(TyItems)) {
   foreach (key, val; items) {
     // key processing...
   }
  } else {
   foreach (el; items) {
     // el processing...
   }
  }

  So far I haven't found a way to avoid such silly code duplication.
What's wrong with foreach (key, val; items) {...} ? That works for AAs, regular arrays, and any class which overloads opApply with two variables.
Apr 16 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Janice Caron:
? That works for AAs, regular arrays, and any class which overloads opApply
with two variables.<
Generally you can't be sure an iterable class/struct supports two indexes too (for example my hyper-fast primes generator doesn't offer the index argument too to avoid code duplication). Bye, bearophile
Apr 17 2008