www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Algorithms in the std lib

reply bearophile <bearophileHUGS lycos.com> writes:
Seeing how the Phobos is being worked on, I think it's a good moment to think
about other things that it can eventually contain.
In all the languages that I use I slowly create a lib of algorithms and data
structures and similar stuff.

Every programmer has special needs that require external modules, a standard
lib can't contain everything. But often you can find two levels of needs: a
more general one and a more specialized one. For example in the std lib of
Python you can find lazy generators of permutations and combinators. They are
often useful, but they can't be enough for people that work a lot with with
combinatorics. Such people have to write or download a module with much more
specific stuff.

So I think the std lib of D may enjoy some more algorithms, to fulfill better
the "general needs".
Some of the following algorithms may be already present in Phobos and/or Tango,
in such case please ignore them.

The following stuff is already implemented in my dlibs, and I use it:

bisect:
  Return the index where to insert item x in an array a, assuming a is sorted. 

isInSorted:
  Given a sorted array, an item, and an optional key callable (like the key of
bisect()), used a binary search and answers true if the item is present, false
otherwise.

binomial(n, k)
factorial
combinations/xcombinations
permutations/xpermutations
subsets/xsubsets
xlexicalPermutations
apply (see lisp)
mapApply(map an apply on an iterable)

convexHull2D
polygonCentroid
RGB2HSV
HSBV2RGB
floodFill

all/any
  (all or any of the items of an iterable are true or is true a given predicate
on them)

nextPow2
isPow2
powMod

reverseBits
  Quickly reverse the bits in a ubyte/uint. 


partition/xpartition
  splits an iterable in sub-iterables of fixed length

gcd

xprimes (yields primes lazily at hyper speed)

nrank
median/medianObj

thousands(n)  (1000000 => "1_000_000")
joinArray
xsplit
numSorted (smart sort foo1x comes before foo10x)

allEqual
filterAA (associative arrays)
get a.get(2, "def") ==> "def" (for associative arrays)

rotateLeft/rotateRight (for arrays)
AAintersection

maxs
  Given an iterable 'iter', return a new array of the equally maximum items of
'iter'.
mins

xgroupBy/xgroupBync
  This is really useful: it takes an iterable and an optional predicate. This
iterable object yields consecutive key(item) and group arrays from the given
iterable that have the same result or are all true or all false (it can also
yield just the group arrays, or it can yield the index too).

pop (removes last of an array/AA and returns it)
prepend (for arrays, adds at the start)

There is a refined function to flatten nested iterables.

----------------

The following stuff is present in my Python "bag of tricks", I'll probably want
to translate them for D1 too:

baseConvert (base[1,64] => base[1, 64])
To convert numbers to strings from/to any base.

K-means

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single
external function for simpler usages).

Odict (associative array where items are also ordered according to the
insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful
for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph
algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think
aren't useful enough for lot of people.

Bye,
bearophile

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single
external function for simpler usages).

Odict (associative array where items are also ordered according to the
insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful
for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph
algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think
aren't useful enough for lot of people.Soggetto: Algorithms in the std lib

Seeing how the Phobos is being worked on, I think it's a good moment to think
about other things that it can eventually contain.
In all the languages that I use I slowly create a lib of algorithms and data
structures and similar stuff.

Every programmer has special needs that require external modules, a standard
lib can't contain everything. But often you can find two levels of needs: a
more general one and a more specialized one. For example in the std lib of
Python you can find lazy generators of permutations and combinators. They are
often useful, but they can't be enough for people that work a lot with with
combinatorics. Such people have to write or download a module with much more
specific stuff.

So I think the std lib of D may enjoy some more algorithms, to fulfill better
the "general needs".
Some of the following algorithms may be already present in Phobos and/or Tango,
in such case please ignore them.

The following stuff is already implemented in my dlibs, and I use it:

bisect:
  Return the index where to insert item x in an array a, assuming a is sorted. 

isInSorted:
  Given a sorted array, an item, and an optional key callable (like the key of
bisect()), used a binary search and answers true if the item is present, false
otherwise.

binomial(n, k)
factorial
combinations/xcombinations
permutations/xpermutations
subsets/xsubsets
xlexicalPermutations
apply (see lisp)
mapApply(map an apply on an iterable)

convexHull2D
polygonCentroid
RGB2HSV
HSBV2RGB
floodFill

all/any
  (all or any of the items of an iterable are true or is true a given predicate
on them)

nextPow2
isPow2
powMod

reverseBits
  Quickly reverse the bits in a ubyte/uint. 


partition/xpartition
  splits an iterable in sub-iterables of fixed length

gcd

xprimes (yields primes lazily at hyper speed)

nrank
median/medianObj

thousands(n)  (1000000 => "1_000_000")
joinArray
xsplit
numSorted (smart sort foo1x comes before foo10x)

allEqual
filterAA (associative arrays)
get a.get(2, "def") ==> "def" (for associative arrays)

rotateLeft/rotateRight (for arrays)
AAintersection

maxs
  Given an iterable 'iter', return a new array of the equally maximum items of
'iter'.
mins

xgroupBy/xgroupBync
  This is really useful: it takes an iterable and an optional predicate. This
iterable object yields consecutive key(item) and group arrays from the given
iterable that have the same result or are all true or all false (it can also
yield just the group arrays, or it can yield the index too).

pop (removes last of an array/AA and returns it)
prepend (for arrays, adds at the start)

There is a refined function to flatten nested iterables.

----------------

The following stuff is present in my Python "bag of tricks", I'll write some of
this for D1 too:

baseConvert (base[1,64] => base[1, 64])
To convert numbers to strings from/to any base.

K-means

IntervalMap (map of intervals, a kind of associative data structure)

toposort (topological sort, already in Graph below, but also as a single
external function for simpler usages).

Odict (associative array where items are also ordered according to the
insertion order, they are in a double linked list)

mean
mode
stdev

digitalLine
  Brensenham line algorithm, returns an iterator of (x,y) coords (not useful
for the screen)
digitalCircle
segmentSegmentIntersection

triangleContains (predicate, is point into triangle?)

A directed Graph data structure, the class contains most common graph
algorithms.
http://sourceforge.net/projects/pynetwork/
(I have already partially implemented this in D1).

-----------------

You may want to add some things to this list, or to remove things you think
aren't useful enough for lot of people.

Bye,
bearophile
Feb 17 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Seeing how the Phobos is being worked on, I think it's a good moment to think
about other things that it can eventually contain.
 In all the languages that I use I slowly create a lib of algorithms and data
structures and similar stuff.

Ok, I can implement all that... but twice?!? :o) Andrei
Feb 17 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Ok, I can implement all that... but twice?!? :o)

Sorry for the double posting. And regarding the implementation, first of all it can be seen if you and other people think it's generally useful stuff. Later I'll offer some/most implementations :-) Bye, bearophile
Feb 17 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Daniel de Kok:
 How about suffix array construction?

I agree that it's useful to have in a std lib (I have had to implement it some times, but not in D yet). (In those dlibs there are other things like Kd-trees and Range Minimum Query algorithms and data structures, that despite being nice and all are probably not useful to enough people). Bye, bearophile
Feb 17 2009
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Daniel de Kok:
 http://static.danieldk.org/snippets/d/sarr.d

That code needs much more unittests, that test all corner cases you can think of too. Bye, bearophile
Feb 17 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel de Kok wrote:
 On Tue, Feb 17, 2009 at 5:50 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Andrei Alexandrescu:
 Ok, I can implement all that... but twice?!? :o)

And regarding the implementation, first of all it can be seen if you and other people think it's generally useful stuff. Later I'll offer some/most implementations :-)

How about suffix array construction? It's a very underused useful data structure where you compose a parallel array to data array that is sorted by suffix. This is often used in natural language processing to be able to quickly count the frequency of arbitrary long n-grams without having to store counts of n-grams in a large hash map or traversing the data array in linear order. This is a probably pretty lame implementation that I made as one of my first tiny D programs (I have only started to learn D in tiny bits): http://static.danieldk.org/snippets/d/sarr.d Take care, Daniel (P.s. there are some algorithms that can construct suffix arrays faster than a regular sort, but they usually have a lot of requirements, like perfect hashing of all the elements.)

Yah; in fact I already have a Trie implementation (prefix tree) hanging on around std. I'm just afraid to bite the bullet in choosing a container overall design. I'd need more experience, and currently bugs in dmd's copy constructors prevent most experimentation with value types. Andrei
Feb 17 2009
prev sibling next sibling parent Daniel de Kok <me danieldk.org> writes:
On Tue, Feb 17, 2009 at 5:50 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Andrei Alexandrescu:
 Ok, I can implement all that... but twice?!? :o)

Sorry for the double posting. And regarding the implementation, first of all it can be seen if you and other people think it's generally useful stuff. Later I'll offer some/most implementations :-)

How about suffix array construction? It's a very underused useful data structure where you compose a parallel array to data array that is sorted by suffix. This is often used in natural language processing to be able to quickly count the frequency of arbitrary long n-grams without having to store counts of n-grams in a large hash map or traversing the data array in linear order. This is a probably pretty lame implementation that I made as one of my first tiny D programs (I have only started to learn D in tiny bits): http://static.danieldk.org/snippets/d/sarr.d Take care, Daniel (P.s. there are some algorithms that can construct suffix arrays faster than a regular sort, but they usually have a lot of requirements, like perfect hashing of all the elements.)
Feb 17 2009
prev sibling parent Daniel de Kok <me danieldk.org> writes:
On Tue, Feb 17, 2009 at 7:05 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Daniel de Kok:
 http://static.danieldk.org/snippets/d/sarr.d

That code needs much more unittests, that test all corner cases you can think of too.

Indeed, thanks for the suggestion!
Feb 17 2009