www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - K-Nearest Neighbor

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Just found this: 
http://www.reddit.com/r/programming/comments/27qjxd/knearest_neighbor_in_d_language/

Andrei
Jun 09 2014
parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 06/09/2014 06:34 PM, Andrei Alexandrescu wrote:
 Just found this:
 http://www.reddit.com/r/programming/comments/27qjxd/knearest_neighbor_in_d_language/


 Andrei
I wonder what bearophile's response will be. ;) Ali
Jun 09 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Ali Cehreli:

 I wonder what bearophile's response will be. ;)
Despite looking like a silly sequence of optimizations, I do have some general comments on that text. Thanks to Kenji (https://github.com/D-Programming-Language/dmd/pull/3650 ) this code is now valid: void foo(size_t N)(ref int[N] b) if (N == 4) {} void main() { int[5] a; foo(a[1 .. $]); } The D type system is able to understand that if you slice away the first item of an an array of 5 items, you produce a pointer to an array of 4 items. But the D static typing is not very strong (precise), D is not yet using all the fruits given by static typing. D throws away the compile-time knowledge about pointer alignments, so if you write code like this without the 7 shorts of padding, the program crashes at run-time, because of misalignment: uint distance(immutable ref short[nCols - 1] p1, immutable ref short[nCols - 1] p2) pure nothrow nogc { alias TV = short8; enum size_t Vlen = TV.init.array.length; assert(p1.length % Vlen == 0); immutable v1 = cast(immutable TV*)p1.ptr; immutable v2 = cast(immutable TV*)p2.ptr; TV totV = 0; foreach (immutable i; 0 .. p1.length / Vlen) { TV d = v1[i] - v2[i]; totV += d * d; } uint tot = 0; foreach (immutable t; totV.array) tot += t; return tot; } TLabel classify(immutable short[nCols][] trainingSet, immutable ref short[nCols - 1] pixels) pure nothrow nogc { auto closestDistance = uint.max; auto closestLabel = TLabel.max; foreach (immutable ref s; trainingSet) { immutable dist = pixels.distance(s[1 .. $]); if (dist < closestDistance) { closestDistance = dist; closestLabel = s[labelIndex]; } } return closestLabel; } In this program there is all the information necessary to compute simply at compile-time the alignment of v1 and v2 and generate a compile-time error if you try to perform SIMD operations using such unaligned pointers. I don't like D to throw away static information that can be used to avoid run-time crashes, this is the opposite of what is usually called a safe language. D type system is able to keep the length of arrays at compile-time, allowing data types like ushort[N], but in a system language that allows such simple usage of SIMD with core.simd it's also useful to encode in the pointer/array type the alignment. So this code should not compile: uint distance(immutable ref short[nCols - 1] p1, immutable ref short[nCols - 1] p2) pure nothrow nogc { alias TV = short8; enum size_t Vlen = TV.init.array.length; assert(p1.length % Vlen == 0); immutable v1 = cast(immutable TV*)p1.ptr; immutable v2 = cast(immutable TV*)p2.ptr; TV totV = 0; foreach (immutable i; 0 .. p1.length / Vlen) { TV d = v1[i] - v2[i]; totV += d * d; While this should compile: uint distance(immutable ref align(16) short[nCols - 1] p1, immutable ref align(16) short[nCols - 1] p2) pure nothrow nogc { alias TV = short8; enum size_t Vlen = TV.init.array.length; assert(p1.length % Vlen == 0); immutable v1 = cast(immutable TV*)p1.ptr; immutable v2 = cast(immutable TV*)p2.ptr; TV totV = 0; foreach (immutable i; 0 .. p1.length / Vlen) { TV d = v1[i] - v2[i]; totV += d * d; And now this function that calls distance should not compile: TLabel classify(immutable short[nCols][] trainingSet, immutable ref short[nCols - 1] pixels) pure nothrow nogc { auto closestDistance = uint.max; auto closestLabel = TLabel.max; foreach (immutable ref s; trainingSet) { immutable dist = pixels.distance(s[1 .. $]); // error And now this should compile: TLabel classify(immutable align(16) short[nCols][] trainingSet, immutable ref align(16) short[nCols - 1] pixels) pure nothrow nogc { auto closestDistance = uint.max; auto closestLabel = TLabel.max; foreach (immutable ref s; trainingSet) { immutable dist = pixels.distance(s[8 .. $]); And this should compile because std.file.read returns memory allocated by the GC that is align(16): align(16) immutable(short[nCols])[] readData(size_t nCols)(in string fileName) { return cast(typeof(return))std.file.read(fileName); } Where the alignment is not known at compile-time the D compiler could add run-time alignment asserts in debug builds, to give nice run-time error messages. The simpler int[4] or int[] types are still valid and usable, they could be align(1). If you think of them as align(16) you are writing faith-based code when you use SIMD instructions. Automatic variables (stack-allocated) too could allow alignment annotations (perhaps ldc2 is already supporting this syntax): void main() { align(16) ubyte[60] ubs; } I discussed this topic another time in past: http://forum.dlang.org/thread/wldsugyllfpnzeksupku forum.dlang.org Bye, bearophile
Jun 10 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/10/2014 1:46 AM, bearophile wrote:
 I don't like D to
 throw away static information that can be used to avoid run-time crashes, this
 is the opposite of what is usually called a safe language.
To be pedantic, D being a "safe" language means "memory safe", not "no seg faults of any sort". Memory safety means that all memory accessed is valid memory, i.e. no memory corruption. http://en.wikipedia.org/wiki/Memory_safety Memory alignment seg faults are something else entirely.
Jun 10 2014