digitalmars.D - Re: Few things II

bearophile <bearophileHUGS lycos.com> Aug 11 2007

BCS <ao pathlink.com> Aug 11 2007

bearophile <bearophileHUGS lycos.com> writes:

Oskar Linde:

Many times people have been posting improvements to the built in AA. The
problem is proving that the improvements really are better for the general
case. The hardest part has proven to be defining the "general case".<


I understand. We may collect a large repository of code/snippets that use AAs,
so they can be tuned against that repository.

-----------------

BCS:

I am currently working on a project that is written in C# but which we are
considering converting to D. The use of yield has turned out to be a
significant performance problem. This is the reson I don't like it.<


I see. Let's assume you are right, that yield is too much slow for a low level
language (as D. D is a low leven language that is able to be used as
medium-level language too).
I think yield is good if you use Python, that's a high level language.
I also think a language like D can have both some hi-level constructs and some
lower level ones. Often many parts of a program don't need max speed (a
profiler usually shows just few hot spots in a program), so there you may use
yield. Where speed is more important the language can offer you means to go
faster. This looks like a win-win situation to me (but the programmer must know
what language constructs are slow, so they don't have to be used in speed
critical parts of the code). After developing ShedSkin I think languages fit
for both low level and high level programming can be designed (but a programmer
has to know them well to avoid using the slow features in the speed/memory
critical parts of the code).


D's opApply syntax IMO seems much cleaner and easier to understand.<


I can't agree (but I know Python much more than D, so I am biased), yield of
Python is very simple :-) Here are some examples (to reduce space I have
reduced the indent from 4 to 2 spaces, I have removed most comments and all the
docstrings):

Space-efficient and lazy version of sieve of Eratosthene, by Eppstein:

def xsieve():
  yield 2         # Only even prime.  Sieve only odd numbers.

  # Generate recursively the sequence of primes up to sqrt(n).
  # Each p from the sequence is used to initiate sieving at p*p.
  roots = xsieve()
  root = roots.next()
  square = root*root

  # The main sieving loop.
  # We use a hash table D such that D[n]=2p for p a prime factor of n.
  # Each prime p up to sqrt(n) appears once as a value in D, and is
  # moved to successive odd multiples of p as the sieve progresses.
  D = {}
  n = 3
  while True:
    if n >= square:   # Time to include another square?
      D[square] = root+root
      root = roots.next()
      square = root*root

    if n not in D:    # Not witnessed, must be prime.
      yield n
    else:         # Move witness p to next free multiple.
      p = D[n]
      q = n+p
      while q in D:
        q += p
      del D[n]
      D[q] = p
    n += 2        # Move on to next odd number.


Generator that returns the integer codons of the given (string) sequence:

def xcodons(seq, overlap=0):
  assert 0 <= overlap <= 2
  len_seq = len(seq)
  pos = 0
  while len_seq - pos >= 3:
    yield seq[pos : pos+3]
    pos += 3 - overlap


Divides a given 2D mat in equal blocks, of given size, and yields them:

def xblockfy(mat, blocknrows, blockncols):
  if blocknrows <= 0: raise ValueError("blocknrows must be >= 1")
  if blockncols <= 0: raise ValueError("blockncols must be >= 1")
  if mat and mat[0]:
    nx = len(mat[0])
    ny = len(mat)
    for y in xrange(0, ny, blocknrows):
      ymax = min(y+blocknrows, ny)
      for x in xrange(0, nx, blockncols):
        block = []
        for yy in xrange(y, ymax):
          block.extend(mat[yy][x: x+blockncols])
        yield block


Arithmetic progression generator, that accepts noninteger increments too:

def xfrange(start, end=None, inc=None):
  if inc is None:
    inc = 1.0
  if end is None:
    end = start + 0.0
    start = 0.0
  else:
    start = float(start)
  count = int( ceil((end-start)/inc) )
  for i in xrange(count):
    yield start + i * inc


Lazy iterable split, with key function too (I have created a similar thing for
D):

def xsplit(seq, key=bool, keepkeys=True):
  group = []
  for el in seq:
    if key(el):
      if group:
        yield group
      group = []
      if keepkeys:
        group.append(el)
    else:
      group.append(el)
  yield group


Yields all the distinct pairs of the given seq:

def xpairs(seq):
  lung = len(seq)
  for i,e1 in enumerate(seq):
    for j in xrange(i+1, lung):
      yield e1, seq[j]


Yields all the positions of item in a given iterable:

def xpositions(seq, item):
  for pos, el in enumerate(seq):
    if el == item:
      yield pos


I know you can do those things with D too, this is an example of a translation
from Python that uses yield to D, a lazy generator of all the permutations of a
sequence, algorithm from Phillip Paul Fuchs, modified:

def xpermutations(items, copy=True):
  if copy:
    items = list(items)
    n = len(items)
    p = range(n+1)
    i = 1
    yield items[:]
    while i < n:
      p[i] -= 1
      j = (p[i] if i & 1 else 0)
      items[j], items[i] = items[i], items[j]
      yield items[:]
      i = 1
      while p[i] == 0:
        p[i] = i
        i += 1
  else:
    items = items[:]
    n = len(items)
    p = range(n+1)
    i = 1
    yield items
    while i < n:
      p[i] -= 1
      j = (p[i] if i & 1 else 0)
      items[j], items[i] = items[i], items[j]
      yield items
      i = 1
      while p[i] == 0:
        p[i] = i
        i += 1


Xpermutations!(TyItem) xpermutations(TyItem)(TyItem[] items, bool copy=true) {
  return new Xpermutations!(TyItem)(items, copy);
}

class Xpermutations(TyItem) {
  int[] range(int stop) {
    int[] result;

    if (stop > 0) {
      result.length = stop;
      for(int i = 0; i < stop; i++)
        result[i] = i;
    }
    return result;
  }

  TyItem[] items;
  bool copy;

  this(TyItem[] items, bool copy=true) {
    this.items = items.dup;
    this.copy = copy;
  }

  int opApply(int delegate(ref TyItem[]) dg) {
    int result;

    int n = items.length;
    TyItem aux;
    auto p = range(n + 1);
    int i = 1;

    if (copy) { // yield duplicated arrays (safer)
      auto items2 = items.dup;
      result = dg(items2);  if (result) goto END; // yield items2
      while (i < n) {
        p[i] -= 1;
        int j = (i & 1) ? p[i] : 0;
        aux = items[j]; items[j] = items[i]; items[i] = aux; // swap
        items2 = items.dup;
        result = dg(items2);  if (result) goto END; // yield items2
        i = 1;
        while (p[i] == 0) {
          p[i] = i;
          i++;
        }
      }
    } else { // yield just references (faster)
      result = dg(items);  if (result) goto END; // yield items
      while (i < n) {
        p[i] -= 1;
        int j = (i & 1) ? p[i] : 0;
        aux = items[j]; items[j] = items[i]; items[i] = aux; // swap
        result = dg(items);  if (result) goto END; // yield items
        i = 1;
        while (p[i] == 0) {
          p[i] = i;
          i++;
        }
      }
    }

    END:
    return result;
  }
}


(If you spot problems in that D code you are encouraged to say it, I am a
newbie of D still).
They do similar things, but I can't see how you can say that opApply of D
"seems much cleaner and easier to understand" than Python yield :-)


OTOH you can fake it quiet well with an delegate that stores most of it's state
in an object.<


Something quite similar to a working version of your code is the builtin Python
function iter(), that given an iterable (like an array, an iterable object,
etc) return a lazy iterable object that allows to scan its elements in a
flexible way (I have created some similar functions with D). But iter() doesn't
replace yield.

Bye and thank you for the answers,
bearophile

Aug 11 2007

BCS <ao pathlink.com> writes:

Reply to bearophile,

 Oskar Linde:
 
 Many times people have been posting improvements to the built in AA.
 The problem is proving that the improvements really are better for
 the general case. The hardest part has proven to be defining the
 "general case".<
 


 use AAs, so they can be tuned against that repository.
 
 -----------------
 
 BCS:
 
 D's opApply syntax IMO seems much cleaner and easier to understand.<
 


 yield of Python is very simple :-) Here are some examples 


[dropped python code]

 I know you can do those things with D too, this is an example of a
 translation from Python that uses yield to D, a lazy generator of all
 the permutations of a sequence, algorithm from Phillip Paul Fuchs,
 modified:
 


[dropped D code]

 (If you spot problems in that D code you are encouraged to say it, I
 am a newbie of D still).


I don't see anything that is a problem (I don't follow the details though). 
I'll put a copy at the end that has some edits, mostly for code space and 
editablility.

 
 They do similar things, but I can't see how you can say that opApply
 of D "seems much cleaner and easier to understand" than Python yield
 :-)


Maybe I under-spoke, the opApply is much more transparent. the system is 
doing, and hiding, a lot less complexity. IMO a system hiding to much
complexity 
is not a good thing. Also, opApply is working within the existing constraints 
of the language.
 
 OTOH you can fake it quiet well with an delegate that stores most of
 it's state in an object.<
 


 builtin Python function iter(), that given an iterable (like an array,
 an iterable object, etc) return a lazy iterable object that allows to
 scan its elements in a flexible way (I have created some similar
 functions with D). But iter() doesn't replace yield.
 
 Bye and thank you for the answers,
 bearophile





|Xpermutations!(TyItem) xpermutations(TyItem)(TyItem[] items, bool copy=true)
|{
|  if(copy)
|    return new Xpermutations!(TyItem, true)(items);
|  else
|    return new Xpermutations!(TyItem, false)(items);
|}
|
|class Xpermutations(TyItem, bool copy)
|{
|  int[] range(int stop)
|  {
|    int[] result;
|    if (stop > 0)
|    {
|      result.length = stop;
|      for (int i = 0; i < stop; i++)
|        result[i] = i;
|    }
|    return result;
|  }
|
|  TyItem[] items;
|  bool copy;
|
|  this(TyItem[] items, bool copy=true)
|  {
|    this.items = items.dup;
|    this.copy = copy;
|  }
|
|  int opApply(int delegate(ref TyItem[]) dg)
|  {
|    TyItem[] items2 = items;
|
|    static if(copy) items2 = items.dup;
|    if (int result = dg2(items2)) return result; // yield items2
|
|    int n = items.length;
|    int i = 1;
|    auto p = range(n + 1);
|    TyItem aux;
|
|    while (i < n)
|    {
|      p[i] -= 1;
|      int j = (i & 1) ? p[i] : 0;
|      aux = items[j];
|      items[j] = items[i];
|      items[i] = aux; // swap
|      
|      static if(copy) items2 = items.dup;
|      if (int result = dg2(items2)) return result; // yield items2
|      
|      i = 1;
|      
|      while (p[i] == 0)
|      {
|        p[i] = i;
|        i++;
|      }
|    }
|  return 0;
|  }
|}

Aug 11 2007

D Programming

C/C++ Programming

Other

digitalmars.D - Re: Few things II