
digitalmars.D.learn - Covert a complex C header to D

biocyberman <biocyberman gmail.com> writes:
khash.h 
(http://attractivechaos.github.io/klib/#Khash%3A%20generic%20hash%20table) is a
part of the klib library in C. I want to covert it to D in the process of learning
more about D.

First I tried with Dstep 
(https://github.com/jacob-carlborg/dstep) and read the C to D 
article (https://dlang.org/ctod.html). I managed to covert the 
basic statements to D, but all multiline '#define' macros are 
stripped out. So I am trying to recreate them the D way. For 
example:


#define __KHASH_TYPE(name, khkey_t, khval_t) \
	typedef struct kh_##name##_s { \
		khint_t n_buckets, size, n_occupied, upper_bound; \
		khint32_t *flags; \
		khkey_t *keys; \
		khval_t *vals; \
	} kh_##name##_t;


I changed to:

template __KHASH_TYPE(string name){
   "struct  kh_" ~ name ~"_t { " ~
                 "khint_t n_buckets, size, n_occupied, upper_bound; " ~
                 "khint32_t *flags; " ~
                 "khkey_t *keys; " ~
                 "khval_t *vals; " ~
         "}"

}

// NEXT: use mixin with this template.

I am currently a bit intimidated looking at the KHASH_INIT2 macro 
in khash.h. How do I convert this to equivalent, idiomatic 
D?
Apr 02
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
You are on the right track, converting #defines that declare symbols into template strings to be mixed in. But you also need to parameterise the key type and the value type, as they are also arguments to the macro, so you'd go

    mixin(__KHASH_TYPE!("mytype", string, int));

However, it is generally considered better to use templates where possible, as they are generally easier to reason about (and look nicer). Since this is a relatively simple case we could just go:

    struct kh_hashtable_t(string name, K, V) {
        // kh_hashtable_t is a struct parameterised on the types K and V
        khint_t n_buckets, size, n_occupied, upper_bound;
        khint32_t* flags;
        K* keys;
        V* vals;
    }

and not worry about "name": the compiler will generate an internal name for us. It doesn't matter what it is, but it is guaranteed to be unique, which is the main property we want. We probably don't even need the name parameter at all. (There is also the builtin hash table, declared V[K], e.g. int[string], i.e. a hash table of ints indexed by strings.)

So for KHASH_INIT2, the arguments to the macro are:

    name:         a string
    scope:        a protection modifier (in C they use static inline; in D this
                  would be pragma(inline, true) private). But I would ignore
                  this parameter.
    khkey_t:      the key type
    khval_t:      the value type
    kh_is_map:    a bool (not sure of its purpose)
    __hash_func:  the function used to generate a hash from the key
    __hash_equal: the function used to compare two keys for equality

so you'd want something like

    template KHASH_INIT(string name, K, V, bool kh_is_map,
                        alias keyhash, alias equal = (K a, K b) => a == b)
    {
        //...
    }

where K and V are types, "alias keyhash" is a function that transforms a key into a hash, and "alias equal" is a function that determines whether two keys are equal. You'd call it like

    KHASH_INIT!("some_name", string, int, true,
                (string a) => myFancyHash(a) /* leave equal as a default */);

Let me know if you get stuck.

Nic
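To make the two approaches concrete, here is a minimal sketch that declares the same struct once through a string mixin and once through a plain template. The helper name khashTypeDecl and the alias definitions for khint_t and khint32_t are assumptions made for this example and are not taken from khash.h.

```d
// Assumed stand-ins for the integer typedefs used by khash.h.
alias khint_t = uint;
alias khint32_t = uint;

// String-mixin version: builds the declaration text at compile time.
// khashTypeDecl is a hypothetical helper name chosen for this sketch.
string khashTypeDecl(string name, string K, string V)
{
    return "struct kh_" ~ name ~ "_t { "
         ~ "khint_t n_buckets, size, n_occupied, upper_bound; "
         ~ "khint32_t* flags; "
         ~ K ~ "* keys; "
         ~ V ~ "* vals; "
         ~ "}";
}

// Declares a struct named kh_str2int_t at module scope.
mixin(khashTypeDecl("str2int", "string", "int"));

// Template version: no name parameter, the instantiation is the name.
struct kh_hashtable_t(K, V)
{
    khint_t n_buckets, size, n_occupied, upper_bound;
    khint32_t* flags;
    K* keys;
    V* vals;
}

void main()
{
    kh_str2int_t a;                  // type produced by the mixin
    kh_hashtable_t!(string, int) b;  // type produced by the template

    static assert(is(typeof(a.keys) == string*));
    static assert(is(typeof(b.vals) == int*));
    assert(a.n_buckets == 0 && b.flags is null);
}
```

The mixin version needs the types passed as strings and a unique name for every table, while the template version takes the types directly and leaves naming to the compiler.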
Apr 02
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Sunday, 2 April 2017 at 21:43:52 UTC, biocyberman wrote:
 template __KHASH_TYPE(string name){
   "struct  kh_" ~ name ~"_t { " ~
                 "khint_t n_buckets, size, n_occupied, upper_bound; " ~
                 "khint32_t *flags; " ~
                 "khkey_t *keys; " ~
                 "khval_t *vals; " ~
         "}"

 }
Not that you'll get bitten by it in this case, but in D the pointer declarator * is left associative, i.e. it belongs to the type rather than to the variable name.

In C:

    int *pInt, Int;      // "Int" is an int, not an int*
    int *pInt, Int[3];   // Int is a static array of 3 ints

but in D:

    // misleading:
    int *pInt, Int;             // Int is an int*!!

    // wrong:
    int *pInt, three_Ints[3];   // Error: cannot mix declared types

    // not misleading:
    int* pInt, pInt2;           // BOTH are int*
    int* pInt;                  // pointer to int
    int[3] three_Ints;          // static array of 3 ints
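The D half of this rule can be checked by the compiler itself. The following small sketch uses static assert to confirm the declared types; the variable names are made up for the example.

```d
void main()
{
    // The * is part of the type, so it applies to every name in the list.
    int* p, q;
    static assert(is(typeof(p) == int*));
    static assert(is(typeof(q) == int*));

    // Array dimensions also go on the type, not on the variable name.
    int[3] threeInts;
    static assert(is(typeof(threeInts) == int[3]));

    // Both pointers are usable as pointers.
    int x = 42;
    p = &x;
    q = p;
    assert(*q == 42);
}
```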
Apr 02
biocyberman <biocyberman gmail.com> writes:
Thank you for some excellent tips, Nicholas Wilson. I made this repo: https://github.com/biocyberman/klibD. You are more than welcome to make direct contributions with PRs there. The next milestone I want to reach is to complete the conversion of khash.d and have the test code working with it.
Apr 03
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
I'm very busy atm but I will give some general tips:

Prefer templates over string mixins where possible. This will make the code much more readable.

Try to remove C-isms. Separate declaration and definition is the most glaring example. Also, the functions that deal with the kh_hashtable should be member functions.

All of the "name" parameters in the macros should not be needed, as D has overloading and mangling to handle that.

Other than that, good luck and learn lots!
Apr 03
Stefan Koch <uplink.coder googlemail.com> writes:
On Monday, 3 April 2017 at 11:18:21 UTC, Nicholas Wilson wrote:
    prefer template over string mixins where possible. This will 
 make the code much more readable.
My advice would be the opposite. Templates put much more pressure on the compiler than string mixins do. Also, the code that templates expand to is hard to get at, whereas the code that string mixins expand to can always be printed one way or another.
Apr 03
biocyberman <biocyberman gmail.com> writes:
Could you elaborate on this (i.e. show where mixins are more readable, more debuggable and less stressful to the compiler)? This kind of information will be useful for the tuning stage later. My goal now is to finish the conversion and get the header and the test code (https://github.com/attractivechaos/klib/blob/master/test/khash_test.c) running.

Ali: I noticed the -E option recently but haven't really used it. I have now generated the pre-processed source and am trying to make use of it.
Apr 05
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
While Stefan correctly notes that templates are slower than string mixins, I generally find templates easier to read.

In terms of debuggability: with string mixins you can use pragma(msg, myGeneratedString) to see the generated code. The error messages you get from templates are slightly more difficult to read than normal error messages, in that you have to figure out what the significance of a particular parameter is (is it missing a method or operator? is it a struct instead of a class?). Properly constraining the templates helps with this, although the compiler is usually pretty good.
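Both debugging aids mentioned here can be tried in a few lines. This is only a sketch: makeGetter, S and twice are hypothetical names invented for the example, and the constraint shown is just one simple way to restrict a template.

```d
// Builds the source text of a getter at compile time (hypothetical helper).
string makeGetter(string field)
{
    return "int get_" ~ field ~ "() { return " ~ field ~ "; }";
}

struct S
{
    int size = 3;

    enum code = makeGetter("size");
    pragma(msg, code);  // the compiler prints the generated code while building
    mixin(code);        // declares int get_size()
}

// A constrained template: instantiating it with a non-arithmetic type is
// rejected at the call site instead of failing somewhere inside the body.
T twice(T)(T x)
if (__traits(isArithmetic, T))
{
    return x + x;
}

void main()
{
    S s;
    assert(s.get_size() == 3);
    assert(twice(21) == 42);
    static assert(!__traits(compiles, twice("abc")));
}
```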
Apr 05
Ali Çehreli <acehreli yahoo.com> writes:
Covert has a very different meaning. :)

Ali
Apr 03
biocyberman <biocyberman gmail.com> writes:
On Tuesday, 4 April 2017 at 05:29:42 UTC, Ali Çehreli wrote:
 Covert has a very different meaning. :)

 Ali
Thanks Ali. My fingers argued they are the same :) And I can't find a way to edit my post after posting. I would love to have your input. I have revisited your book several times to read the relevant sections, but these complex macros are still holding me back.
Apr 04
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
Most of those macros are not needed and can just be part of the struct definition, i.e. you want something like

    struct kh_hashtable(K, V, bool _is_map, alias hash_func,
                        alias hash_eq = (K a, K b) => a == b)
    {
        khint_t n_buckets, size, n_occupied, upper_bound;
        khint32_t* flags;
        K* keys;
        V* vals;

        // No need for __KHASH_PROTOTYPES / __KHASH_IMPL,
        // just declare the functions as methods of the struct
        this() { ... }                     // in place of kh_init_##name
        ~this() { ... }                    // for destroy
        resize(khint_t new_size) { ... }   // kh_resize_##name
        // and so on for each method in __KHASH_IMPL
    }
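As a rough illustration of this shape, here is a compilable sketch of a struct parameterised on key type, value type, hash function and equality function, with the operations as methods. It is not a port of khash: it uses D arrays instead of raw pointers, plain linear probing, and no resizing, and the names KHash, lengthHash, put and get are invented for the example. Since a D struct cannot declare a no-argument constructor with a body, the constructor here takes the bucket count.

```d
alias khint_t = uint;  // assumed stand-in for the khash.h typedef

// Deliberately weak hash, used only to force a collision in main().
size_t lengthHash(string s) { return s.length; }

struct KHash(K, V, alias hashFunc, alias hashEq = (K a, K b) => a == b)
{
    private K[] keys;
    private V[] vals;
    private bool[] used;
    private khint_t count;

    // Takes the place of kh_init_##name; nBuckets must be at least 2.
    this(khint_t nBuckets)
    {
        keys = new K[nBuckets];
        vals = new V[nBuckets];
        used = new bool[nBuckets];
    }

    @property khint_t length() const { return count; }

    // Linear probing: the slot holding key, or the first free slot after
    // its home position. Terminates because put() keeps one slot free.
    private size_t slotFor(K key)
    {
        size_t i = hashFunc(key) % keys.length;
        while (used[i] && !hashEq(keys[i], key))
            i = (i + 1) % keys.length;
        return i;
    }

    void put(K key, V val)
    {
        assert(count + 1 < keys.length, "sketch only: no resizing");
        immutable i = slotFor(key);
        if (!used[i])
        {
            used[i] = true;
            keys[i] = key;
            ++count;
        }
        vals[i] = val;
    }

    // Pointer to the stored value, or null if the key is absent.
    V* get(K key)
    {
        immutable i = slotFor(key);
        return used[i] ? &vals[i] : null;
    }
}

void main()
{
    auto h = KHash!(string, int, lengthHash)(8);
    h.put("one", 1);
    h.put("two", 2);   // same length as "one", so it probes to the next slot
    h.put("one", 11);  // existing key: value is overwritten
    assert(h.length == 2);
    assert(*h.get("one") == 11);
    assert(*h.get("two") == 2);
    assert(h.get("three") is null);
}
```

Note that no "name" parameter appears anywhere: KHash!(string, int, lengthHash) is already a distinct type, and put and get are found through ordinary member lookup.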
Apr 04
Ali Çehreli <acehreli yahoo.com> writes:
On 04/02/2017 02:43 PM, biocyberman wrote:
 khash.h
 
(http://attractivechaos.github.io/klib/#Khash%3A%20generic%20hash%20table)
 is a part of klib library in C. I want to covert it to D in the process
 of learning deeper about D.
These are macros used by the library developer to generate library facilities without repetition. Not uncommon for C libraries...

As Nicholas Wilson says, just ignore most of these macros, because in the end what you want are the types and functions that the public interface of the library includes. (Or, that the public documentation of the library includes.)

In this case, looking at the preprocessor output to see what is generated may help. For example, use the -E compiler switch of gcc. If you're not familiar with this switch, you may be intimidated at first, as the output includes all the headers that your header itself includes. Just search for the said library types and functions to see what they ended up looking like after preprocessing.

Ali
Apr 04