
digitalmars.D.announce - Port of Python's difflib.SequenceMatcher class

reply Michael Butscher <mbutscher gmx.de> writes:
Hi, 

a D port (version 0.175) of Python's difflib.SequenceMatcher class to 
generate diff's is available at

  http://www.mbutscher.de/snippets/difflib_d20061202.zip

It might need some cleaning up yet but the translated doctests pass 
(except one I couldn't make compile in D, but "in theory" it passes as 
well).

Comments, critique?



Michael
Dec 02 2006
next sibling parent Walter Bright <newshound digitalmars.com> writes:
Michael Butscher wrote:
 a D port (version 0.175) of Python's difflib.SequenceMatcher class to 
 generate diff's is available at
 
   http://www.mbutscher.de/snippets/difflib_d20061202.zip
 
 It might need some cleaning up yet but the translated doctests pass 
 (except one I couldn't make compile in D, but "in theory" it passes as 
 well).
 
 Comments, critique?

Yes: please put up a web page about it! See http://www.digitalmars.com/d/howto-promote.html
Dec 02 2006
prev sibling parent reply Pragma <ericanderton yahoo.removeme.com> writes:
Michael Butscher wrote:
 Hi, 
 
 a D port (version 0.175) of Python's difflib.SequenceMatcher class to 
 generate diff's is available at
 
   http://www.mbutscher.de/snippets/difflib_d20061202.zip
 
 It might need some cleaning up yet but the translated doctests pass 
 (except one I couldn't make compile in D, but "in theory" it passes as 
 well).
 
 Comments, critique?

I agree with Walter that you should throw this up on a page somewhere. I'm curious, but rarely have time to sift through source code unless I'm in need of something specific - I develop using SVN 99% of the time, which does .diff output for me already. But I *am* curious about how the porting went, what the pitfalls were, and how you worked around Python idioms and tuple types. Also, I'm wondering if the D version brings any extra perks like better performance, or less/clearer code?

-- 
- EricAnderton at yahoo
Dec 04 2006
parent reply Michael Butscher <mbutscher gmx.de> writes:
Pragma wrote:
 Michael Butscher wrote:
 Hi, 
 
 a D port (version 0.175) of Python's difflib.SequenceMatcher class to 
 generate diff's is available at
 
   http://www.mbutscher.de/snippets/difflib_d20061202.zip
 
 It might need some cleaning up yet but the translated doctests pass 
 (except one I couldn't make compile in D, but "in theory" it passes as 
 well).
 
 Comments, critique?

I agree with Walter that you should throw this up on a page somewhere.

At least I have mentioned it on the page http://www.mbutscher.de/software.html as a "snippet" (it isn't much more, I think).
 I'm curious, but rarely have time to sift through sourcecode unless I'm 
 in need of something specific - I develop using SVN 99% of the time, 
 which does .diff output for me already.

I will need it later for a project written in Python (a kind of personal wiki without a server) to store different versions of a wiki page. When the time comes, I will add a small C interface for a DLL whose main job is to create a binary diff of two arbitrary byte blocks and to apply that diff to the first block to recreate the second.
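The create-and-apply round trip described here can be sketched in a few lines. This is only an illustration under loud assumptions: it targets a current D2 compiler, `DeltaOp`, `makeDelta` and `applyDelta` are hypothetical names, and the delta is a naive "shared prefix, replaced middle, shared suffix" scheme rather than the SequenceMatcher algorithm or the planned DLL interface.

```d
import std.stdio;

/// One step of a hypothetical delta: copy a range of the old block,
/// or emit literal bytes taken from the new block.
struct DeltaOp
{
    bool copy;              // true: copy old[start .. end]; false: emit data
    size_t start, end;
    const(ubyte)[] data;
}

/// Naive delta (NOT SequenceMatcher): keep the common prefix and suffix,
/// store whatever lies between them in the new block as literal bytes.
DeltaOp[] makeDelta(const(ubyte)[] a, const(ubyte)[] b)
{
    size_t pre = 0;
    while (pre < a.length && pre < b.length && a[pre] == b[pre])
        ++pre;

    size_t suf = 0;
    while (suf < a.length - pre && suf < b.length - pre
           && a[$ - 1 - suf] == b[$ - 1 - suf])
        ++suf;

    DeltaOp[] ops;
    if (pre > 0)
        ops ~= DeltaOp(true, 0, pre, null);
    if (b.length - suf > pre)
        ops ~= DeltaOp(false, 0, 0, b[pre .. $ - suf]);
    if (suf > 0)
        ops ~= DeltaOp(true, a.length - suf, a.length, null);
    return ops;
}

/// Rebuilds the second block from the first block plus the delta.
ubyte[] applyDelta(const(ubyte)[] a, const(DeltaOp)[] ops)
{
    ubyte[] result;
    foreach (op; ops)
    {
        if (op.copy)
            result ~= a[op.start .. op.end];
        else
            result ~= op.data;
    }
    return result;
}

void main()
{
    auto oldPage = cast(const(ubyte)[]) "Hello wiki world";
    auto newPage = cast(const(ubyte)[]) "Hello D world";

    auto delta = makeDelta(oldPage, newPage);
    assert(applyDelta(oldPage, delta) == newPage);
    writeln(delta.length, " delta ops");
}
```

A real version would replace `makeDelta` with the matching blocks found by the ported SequenceMatcher; `applyDelta` could stay much the same.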
 But I *am* curious about how the porting went, what the pitfalls were, 
 and how you worked around Python idioms and tuple types.

- The often used "self" was just translated to "this", therefore the code looks a bit weird in D, e.g.:

        void set_seq2(ST b)
        {
            if (b is this.b)
                return;
            this.b = b;
            this.matching_blocks = null;
            this.opcodes = null;
            this.fullbcount = null;
            this.chain_b();
        }

- One thing I really missed in D was the get() method for Python dictionaries with a default argument. Therefore I created inner functions like

        IndexType j2lenget(IndexType i, IndexType def)
        {
            IndexType* result = i in j2len;
            if (result)
                return *result;
            else
                return def;
        }

Probably this can be done more elegantly, but I personally think that get() should be a standard method of AAs.

- The class used only two types of tuples which had clear purposes, so they were translated into structs without much harm.
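For readers on current compilers: D2's runtime module `object` supplies a `get(key, defaultValue)` function for built-in associative arrays, which covers the Python `dict.get` case. A minimal sketch, assuming a D2 compiler rather than the compiler used for this port; the contents of `j2len` are invented for illustration:

```d
import std.stdio;

void main()
{
    // Same name as in the port; the keys and values are made up.
    int[int] j2len = [3: 2, 7: 1];

    // Key present: the stored value comes back.
    assert(j2len.get(3, 0) == 2);

    // Key absent: the default comes back and nothing is inserted.
    assert(j2len.get(4, 0) == 0);
    assert((4 in j2len) is null);

    writeln(j2len.get(3, 0), " ", j2len.get(4, 0));
}
```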
 Also, I'm 
 wondering if the D version brings any extra perks like better 
 performance, or less/clearer code?

I have not yet done any benchmarks, but I just assume that D is much faster. The D code is a bit longer and IMHO a bit less readable than Python, but I'm much more used to Python than D.

Michael
Dec 06 2006
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Michael Butscher wrote:

 - One thing I really missed in D was the get() method for Python 
 dictionaries with a default argument. Therefore I created inner 
 functions like
 
         IndexType j2lenget(IndexType i, IndexType def)
         {
             IndexType* result = i in j2len;
             if (result)
                 return *result;
             else
                 return def;
         }
 
 Probably this can be done more elegantly, but I personally think that
 get() should be a standard method of AAs.

+1. Me too.

If IFTI were smarter, something like this would do the trick:

    V get(V,K)(V[K] dict, K key, V def = V.init)
    {
        V* ptr = key in dict;
        return ptr? *ptr: def;
    }

The property trick works for AA's too so taking one instance of that:

    char[] get(char[][int] dict, int key, char[] def = null)
    {
        char[]* ptr = key in dict;
        return ptr? *ptr: def;
    }

you can do:

    char[][int] i2s;
    i2s[1] = "Hello";
    i2s[5] = "There";

    writefln( i2s.get(1, "yeh") );
    writefln( i2s.get(2, "default") );
    writefln( i2s.get(1) );
    writefln( i2s.get(2) );

Too bad the template version doesn't work. D doesn't seem to be able to pick out the V and K from an associative array argument.

--bb
Dec 06 2006
next sibling parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Bill Baxter wrote:
 Michael Butscher wrote:
 
 - One thing I really missed in D was the get() method for Python 
 dictionaries with a default argument. Therefore I created inner 
 functions like

         IndexType j2lenget(IndexType i, IndexType def)
         {
             IndexType* result = i in j2len;
             if (result)
                 return *result;
             else
                 return def;
         }

 Probably this can be done more elegantly, but I personally think that
 get() should be a standard method of AAs.

 +1. Me too.
 
 If IFTI were smarter, something like this would do the trick:
 
 V get(V,K)(V[K] dict, K key, V def = V.init)
 {
     V* ptr = key in dict;
     return ptr? *ptr: def;
 }

And what compiler do you use? The above code works perfectly. :)

The following two get functions have been part of my own standard imports for quite a while and I find them very handy.

    T get(T,U)(T[U] aa, U key)
    {
        T* ptr = key in aa;
        return ptr ? *ptr : T.init;
    }

    bool get(T,U,int dummy=1)(T[U] aa, U key, out T val)
    {
        T* ptr = key in aa;
        if (!ptr)
            return false;
        val = *ptr;
        return true;
    }

/Oskar
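How these two helpers might be called, as a sketch assuming a current D2 compiler. They are renamed `getOrInit` and `tryGet` here (hypothetical names, not the ones in the post) so that the example neither overloads two templates under one name nor collides with a `get` provided by the runtime:

```d
import std.stdio;

// Returns the stored value, or T.init when the key is missing.
T getOrInit(T, U)(T[U] aa, U key)
{
    T* ptr = key in aa;
    return ptr ? *ptr : T.init;
}

// Reports whether the key exists; on success the value is written to val.
bool tryGet(T, U)(T[U] aa, U key, out T val)
{
    T* ptr = key in aa;
    if (!ptr)
        return false;
    val = *ptr;
    return true;
}

void main()
{
    string[int] i2s = [1: "Hello", 5: "There"];

    assert(i2s.getOrInit(1) == "Hello");
    assert(i2s.getOrInit(2) is null);       // string.init is null

    string s;
    if (i2s.tryGet(5, s))
        writeln("found: ", s);

    // An out parameter is reset to its init value on every call,
    // so a failed lookup leaves s empty rather than stale.
    assert(!i2s.tryGet(2, s) && s is null);
}
```

The `out` form is handy when `T.init` is itself a legitimate stored value and a missing key has to be told apart from it.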
Dec 07 2006
prev sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Bill Baxter wrote:

 V get(V,K)(V[K] dict, K key, V def = V.init)
 {
     V* ptr = key in dict;
     return ptr? *ptr: def;
 }
 

     char[][int] i2s;
     i2s[1] = "Hello";
     i2s[5] = "There";
 
     writefln( i2s.get(1, "yeh") );
     writefln( i2s.get(2, "default") );
     writefln( i2s.get(1) );
     writefln( i2s.get(2) );
 
 Too bad the template version doesn't work.
 D doesn't seem to be able to pick out the V and K from an associative 
 array argument.

Sorry, I missed this part. The compiler is confused by not being able to tell if V should be char[] or char[3].

    writefln( i2s.get(1, "yeh"[]) );
    writefln( i2s.get(2, "default"[]) );

both work. So you are right. The IFTI could perhaps be improved by figuring out that both V argument types are implicitly convertible to the same type.

/Oskar
Dec 07 2006
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Oskar Linde wrote:
 Bill Baxter wrote:
 
 V get(V,K)(V[K] dict, K key, V def = V.init)
 {
     V* ptr = key in dict;
     return ptr? *ptr: def;
 }

     char[][int] i2s;
     i2s[1] = "Hello";
     i2s[5] = "There";

     writefln( i2s.get(1, "yeh") );
     writefln( i2s.get(2, "default") );
     writefln( i2s.get(1) );
     writefln( i2s.get(2) );

 Too bad the template version doesn't work.
 D doesn't seem to be able to pick out the V and K from an associative 
 array argument.

 Sorry, I missed this part. The compiler is confused by not being able to tell if V should be char[] or char[3].
 
     writefln( i2s.get(1, "yeh"[]) );
     writefln( i2s.get(2, "default"[]) );
 
 both work. So you are right. The IFTI could perhaps be improved by figuring out that both V argument types are implicitly convertible to the same type.
 
 /Oskar

Oh, ok. So I was right, but for the wrong reason. :-) The compiler message wasn't very specific about what it didn't like, just "no match" was all it was willing to divulge. These char[] / char[N] conversion issues are rather annoying.

--bb
Dec 07 2006