
digitalmars.D.learn - strtok?

reply "Carlos Santander B." <csantander619 gmail.com> writes:
I have this C program (not written by me) that uses strtok. To port it 
to D, I wrote this:

//--------------------------------------------------------------
extern(C) char * strtok (char * strToken, char * strDelimit);

char [] tokenize(char [] str, char [] sep)
{
	char * arg1, arg2, res;
	arg2 = toStringz(sep);
	arg1 = (str.length>0) ? toStringz(str) : null;
	res = strtok(arg1,arg2);
	return toString(res);
}
//--------------------------------------------------------------

I would like to have a D only version of this. However, I'm not sure 
what strtok does. Does anybody know how to do this?

-- 
Carlos Santander Bernal

JP2, you'll always live in our minds
Apr 09 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Sat, 09 Apr 2005 11:32:02 -0500, Carlos Santander B.  
<csantander619 gmail.com> wrote:
 I have this C program (not written by me) that uses strtok. To port it  
 to D, I wrote this:

 //--------------------------------------------------------------
 extern(C) char * strtok (char * strToken, char * strDelimit);

 char [] tokenize(char [] str, char [] sep)
 {
 	char * arg1, arg2, res;
 	arg2 = toStringz(sep);
 	arg1 = (str.length>0) ? toStringz(str) : null;
 	res = strtok(arg1,arg2);
 	return toString(res);
 }
 //--------------------------------------------------------------

 I would like to have a D only version of this. However, I'm not sure  
 what strtok does.

From MSDN:

char *strtok( char *strToken, const char *strDelimit );
wchar_t *wcstok( wchar_t *strToken, const wchar_t *strDelimit );
unsigned char *_mbstok( unsigned char*strToken, const unsigned char *strDelimit );

All of these functions return a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.

The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call. wcstok and _mbstok are wide-character and multibyte-character versions of strtok. The arguments and return value of wcstok are wide-character strings; those of _mbstok are multibyte-character strings. These three functions behave identically otherwise.

On the first call to strtok, the function skips leading delimiters and returns a pointer to the first token in strToken, terminating the token with a null character. More tokens can be broken out of the remainder of strToken by a series of calls to strtok. Each call to strtok modifies strToken by inserting a null character after the token returned by that call. To read the next token from strToken, call strtok with a NULL value for the strToken argument. The NULL strToken argument causes strtok to search for the next token in the modified strToken. The strDelimit argument can take any value from one call to the next so that the set of delimiters may vary.

Warning   Each of these functions uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these function from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.
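
To make the calling pattern described above concrete, here is a minimal C sketch of the usual strtok loop: the buffer goes in on the first call, NULL on every later call. The helper name split_in_place and its parameters are made up for illustration and are not part of the C library. It assumes the caller passes a writable buffer, since strtok overwrites delimiters in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library function): stores up to max_out
   token pointers from buf into out and returns how many were found.
   buf must be writable, because strtok overwrites the delimiter that
   follows each token with '\0'. */
size_t split_in_place(char *buf, const char *delims, char **out, size_t max_out)
{
    size_t n = 0;
    char *tok;

    assert(buf != NULL && delims != NULL && out != NULL);

    /* First call passes the buffer; later calls pass NULL to continue
       scanning the same buffer from where the previous call stopped. */
    for (tok = strtok(buf, delims); tok != NULL && n < max_out;
         tok = strtok(NULL, delims)) {
        out[n++] = tok;
    }
    return n;
}
```

With the buffer ",ab.c,,def" and the delimiters ",." this collects the three tokens ab, c and def, and the '.' that followed ab in the buffer has been replaced by a null character. Each returned pointer points into the original buffer, so the tokens are only valid while that buffer is.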
 Does anybody know how to do this?

D can do much better than C, using slices you can tokenize a string without modification and return all the results in an array.

import std.stdio;
import std.string;

char[][] tokenise(char[] input, char[] tokens)
{
    char[][] res = null;
    int start = -1;

    foreach(int i, char c; input) {
        if (tokens.find(c) == -1) {
            if (start == -1) start = i;
        }
        else {
            if (start != -1) {
                res ~= input[start..i];
                start = -1;
            }
        }
    }
    if (start != -1) res ~= input[start..$];
    return res;
}

void main()
{
    char[] input = ",ab.c,,..def,.,g,,h..i,,jkl,";

    writefln(input);
    foreach(char[] s; tokenise(input,",."))
        writefln(s);
}

Regan
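
For anyone comparing this against the C side, the same non-destructive scan can be roughly approximated in C by returning pointer-and-length pairs instead of writing null characters into the input. This is only a sketch: struct slice and tokenise_slices are hypothetical names invented here, and the caller has to supply the output array, which a D dynamic array takes care of automatically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a (pointer, length) pair, a rough C stand-in
   for a D slice. The token is NOT null-terminated. */
struct slice {
    const char *ptr;
    size_t len;
};

/* Hypothetical helper: records up to max_out tokens of input in out
   and returns the count. input is never modified. */
size_t tokenise_slices(const char *input, const char *delims,
                       struct slice *out, size_t max_out)
{
    size_t n = 0;
    const char *p = input;

    assert(input != NULL && delims != NULL && out != NULL);

    while (n < max_out) {
        p += strspn(p, delims);           /* skip a run of delimiters */
        if (*p == '\0')
            break;                        /* nothing left but delimiters */
        size_t len = strcspn(p, delims);  /* length of the next token */
        out[n].ptr = p;
        out[n].len = len;
        n++;
        p += len;
    }
    return n;
}
```

On the test string ",ab.c,,..def,.,g,,h..i,,jkl," with delimiters ",." this finds the seven tokens ab, c, def, g, h, i and jkl and leaves the input untouched. Because the pieces are not null-terminated, they have to be printed with a length, for example with printf's "%.*s".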
Apr 10 2005
parent "Carlos Santander B." <csantander619 gmail.com> writes:
Regan Heath wrote:
  From MSDN:
 
 char *strtok( char *strToken, const char *strDelimit );
 wchar_t *wcstok( wchar_t *strToken, const wchar_t *strDelimit );
 unsigned char *_mbstok( unsigned char*strToken, const unsigned char  
 *strDelimit );
 
 All of these functions return a pointer to the next token found in  
 strToken. They return NULL when no more tokens are found. Each call  
 modifies strToken by substituting a NULL character for each delimiter 
 that  is encountered.
 
 The strtok function finds the next token in strToken. The set of  
 characters in strDelimit specifies possible delimiters of the token to 
 be  found in strToken on the current call. wcstok and _mbstok are  
 wide-character and multibyte-character versions of strtok. The 
 arguments  and return value of wcstok are wide-character strings; those 
 of _mbstok  are multibyte-character strings. These three functions 
 behave identically  otherwise.
 
 On the first call to strtok, the function skips leading delimiters and  
 returns a pointer to the first token in strToken, terminating the token  
 with a null character. More tokens can be broken out of the remainder 
 of  strToken by a series of calls to strtok. Each call to strtok 
 modifies  strToken by inserting a null character after the token 
 returned by that  call. To read the next token from strToken, call 
 strtok with a NULL value  for the strToken argument. The NULL strToken 
 argument causes strtok to  search for the next token in the modified 
 strToken. The strDelimit  argument can take any value from one call to 
 the next so that the set of  delimiters may vary.
 
 Warning   Each of these functions uses a static variable for parsing 
 the  string into tokens. If multiple or simultaneous calls are made to 
 the same  function, a high potential for data corruption and inaccurate 
 results  exists. Therefore, do not attempt to call the same function 
 simultaneously  for different strings and be aware of calling one of 
 these function from  within a loop where another routine may be called 
 that uses the same  function.  However, calling this function 
 simultaneously from multiple  threads does not have undesirable effects.
 

Thanks for that.
 
 D can do much better than C, using slices you can tokenize a string  
 without modification and return all the results in an array.
 
 import std.stdio;
 import std.string;
 
 char[][] tokenise(char[] input, char[] tokens)
 {   
     char[][] res = null;
     int start = -1;
     
     foreach(int i, char c; input) {
         if (tokens.find(c) == -1) {
             if (start == -1) start = i;
         }
         else {
             if (start != -1) {
                 res ~= input[start..i];
                 start = -1;
             }
            
         }
     }
     if (start != -1) res ~= input[start..$];
     return res;
 }
 
 void main()
 {
     char[] input = ",ab.c,,..def,.,g,,h..i,,jkl,";
     
     writefln(input);
     foreach(char[] s; tokenise(input,",."))
         writefln(s);
 }
 
 Regan

And especially thanks for that!

-- 
Carlos Santander Bernal

JP2, you'll always live in our minds
Apr 10 2005