digitalmars.D.learn

digitalmars.D.learn - strtok?

"Carlos Santander B." <csantander619 gmail.com> writes:

I have this C program (not written by me) that uses strtok. To port it 
to D, I wrote this:

//--------------------------------------------------------------
extern(C) char * strtok (char * strToken, char * strDelimit);

char [] tokenize(char [] str, char [] sep)
{
	char * arg1, arg2, res;
	arg2 = toStringz(sep);
	arg1 = (str.length>0) ? toStringz(str) : null;
	res = strtok(arg1,arg2);
	return toString(res);
}
//--------------------------------------------------------------

I would like to have a D-only version of this. However, I'm not sure 
what strtok does. Does anybody know how to do this?

-- 
Carlos Santander Bernal

JP2, you'll always live in our minds

Apr 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sat, 09 Apr 2005 11:32:02 -0500, Carlos Santander B.  
<csantander619 gmail.com> wrote:
 I have this C program (not written by me) that uses strtok. To port it  
 to D, I wrote this:

 //--------------------------------------------------------------
 extern(C) char * strtok (char * strToken, char * strDelimit);

 char [] tokenize(char [] str, char [] sep)
 {
 	char * arg1, arg2, res;
 	arg2 = toStringz(sep);
 	arg1 = (str.length>0) ? toStringz(str) : null;
 	res = strtok(arg1,arg2);
 	return toString(res);
 }
 //--------------------------------------------------------------

 I would like to have a D only version of this. However, I'm not sure  
 what strtok does.

From MSDN:

char *strtok( char *strToken, const char *strDelimit );
wchar_t *wcstok( wchar_t *strToken, const wchar_t *strDelimit );
unsigned char *_mbstok( unsigned char*strToken, const unsigned char  
*strDelimit );

All of these functions return a pointer to the next token found in  
strToken. They return NULL when no more tokens are found. Each call  
modifies strToken by substituting a NULL character for each delimiter that  
is encountered.

The strtok function finds the next token in strToken. The set of  
characters in strDelimit specifies possible delimiters of the token to be  
found in strToken on the current call. wcstok and _mbstok are  
wide-character and multibyte-character versions of strtok. The arguments  
and return value of wcstok are wide-character strings; those of _mbstok  
are multibyte-character strings. These three functions behave identically  
otherwise.

On the first call to strtok, the function skips leading delimiters and  
returns a pointer to the first token in strToken, terminating the token  
with a null character. More tokens can be broken out of the remainder of  
strToken by a series of calls to strtok. Each call to strtok modifies  
strToken by inserting a null character after the token returned by that  
call. To read the next token from strToken, call strtok with a NULL value  
for the strToken argument. The NULL strToken argument causes strtok to  
search for the next token in the modified strToken. The strDelimit  
argument can take any value from one call to the next so that the set of  
delimiters may vary.
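
As a rough illustration of the call sequence described above (this is not part of the MSDN text), here is a minimal C sketch. The helper name split_in_place and its parameters are made up for the example; only strtok itself comes from the C library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf in place with strtok and stores up to
   max_tokens token pointers in out. Returns the number of tokens stored.
   buf must be writable, because strtok overwrites delimiters with '\0'. */
size_t split_in_place(char *buf, const char *delims,
                      char **out, size_t max_tokens)
{
    size_t count = 0;

    /* The first call passes the buffer; later calls pass NULL so that
       strtok continues from where the previous call stopped. */
    for (char *tok = strtok(buf, delims);
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, delims)) {
        out[count++] = tok;
    }
    return count;
}
```

Called on a writable copy of ",ab.c,,..def," with the delimiter set ",.", this should yield the tokens "ab", "c" and "def", and the buffer itself ends up with '\0' bytes where the closing delimiters were.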

Warning   Each of these functions uses a static variable for parsing the  
string into tokens. If multiple or simultaneous calls are made to the same  
function, a high potential for data corruption and inaccurate results  
exists. Therefore, do not attempt to call the same function simultaneously  
for different strings and be aware of calling one of these functions from  
within a loop where another routine may be called that uses the same  
function.  However, calling this function simultaneously from multiple  
threads does not have undesirable effects.
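
One way around the shared static state mentioned in the warning is to keep the scan position with the caller. The following C sketch is only an assumption about how that could look; struct token and next_token are hypothetical names, built on the standard strspn and strcspn functions. It also leaves the input untouched, since it reports each token as a pointer plus a length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token view: points into the original string, not a copy. */
struct token {
    const char *ptr;
    size_t len;
};

/* Hypothetical reentrant tokenizer: *cursor holds the scan position, so
   the state lives with the caller instead of in a static variable. The
   input is never modified. Returns 1 and fills *out when a token is
   found, otherwise returns 0. */
int next_token(const char **cursor, const char *delims, struct token *out)
{
    const char *p = *cursor;

    p += strspn(p, delims);          /* skip leading delimiters */
    if (*p == '\0') {
        *cursor = p;                 /* nothing left to scan */
        return 0;
    }
    out->ptr = p;
    out->len = strcspn(p, delims);   /* token runs to the next delimiter */
    *cursor = p + out->len;
    return 1;
}
```

Because each scan has its own cursor, an outer loop over "a=1;b=2" split on ";" can run an inner scan on each piece split on "=" without the two interfering, which is the situation the warning describes as unsafe with strtok.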

 Does anybody know how to do this?

D can do much better than C: using slices, you can tokenize a string  
without modifying it and return all the results in an array.

import std.stdio;
import std.string;

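// Splits input into tokens separated by any character in tokens.
// Each result is a slice of input, so nothing is copied or modified.
char[][] tokenise(char[] input, char[] tokens)
{
	char[][] res = null;
	int start = -1;	// index where the current token began, -1 if none

	foreach(int i, char c; input) {
		if (tokens.find(c) == -1) {
			// c is not a delimiter: open a token if none is in progress
			if (start == -1) start = i;
		}
		else if (start != -1) {
			// c is a delimiter: close the token in progress
			res ~= input[start..i];
			start = -1;
		}
	}
	// the input may end in the middle of a token
	if (start != -1) res ~= input[start..$];
	return res;
}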

void main()
{
	char[] input = ",ab.c,,..def,.,g,,h..i,,jkl,";
	
	writefln(input);
	foreach(char[] s; tokenise(input,",."))
		writefln(s);
}

Regan

Apr 10 2005

"Carlos Santander B." <csantander619 gmail.com> writes:

Regan Heath wrote:
  From MSDN:
 
 char *strtok( char *strToken, const char *strDelimit );
 wchar_t *wcstok( wchar_t *strToken, const wchar_t *strDelimit );
 unsigned char *_mbstok( unsigned char*strToken, const unsigned char  
 *strDelimit );
 
 All of these functions return a pointer to the next token found in  
 strToken. They return NULL when no more tokens are found. Each call  
 modifies strToken by substituting a NULL character for each delimiter 
 that  is encountered.
 
 The strtok function finds the next token in strToken. The set of  
 characters in strDelimit specifies possible delimiters of the token to 
 be  found in strToken on the current call. wcstok and _mbstok are  
 wide-character and multibyte-character versions of strtok. The 
 arguments  and return value of wcstok are wide-character strings; those 
 of _mbstok  are multibyte-character strings. These three functions 
 behave identically  otherwise.
 
 On the first call to strtok, the function skips leading delimiters and  
 returns a pointer to the first token in strToken, terminating the token  
 with a null character. More tokens can be broken out of the remainder 
 of  strToken by a series of calls to strtok. Each call to strtok 
 modifies  strToken by inserting a null character after the token 
 returned by that  call. To read the next token from strToken, call 
 strtok with a NULL value  for the strToken argument. The NULL strToken 
 argument causes strtok to  search for the next token in the modified 
 strToken. The strDelimit  argument can take any value from one call to 
 the next so that the set of  delimiters may vary.
 
 Warning   Each of these functions uses a static variable for parsing 
 the  string into tokens. If multiple or simultaneous calls are made to 
 the same  function, a high potential for data corruption and inaccurate 
 results  exists. Therefore, do not attempt to call the same function 
 simultaneously  for different strings and be aware of calling one of 
 these function from  within a loop where another routine may be called 
 that uses the same  function.  However, calling this function 
 simultaneously from multiple  threads does not have undesirable effects.
 

Thanks for that.

 
 D can do much better than C, using slices you can tokenize a string  
 without modification and return all the results in an array.
 
 import std.stdio;
 import std.string;
 
 char[][] tokenise(char[] input, char[] tokens)
 {   
     char[][] res = null;
     int start = -1;
     
     foreach(int i, char c; input) {
         if (tokens.find(c) == -1) {
             if (start == -1) start = i;
         }
         else {
             if (start != -1) {
                 res ~= input[start..i];
                 start = -1;
             }
            
         }
     }
     if (start != -1) res ~= input[start..$];
     return res;
 }
 
 void main()
 {
     char[] input = ",ab.c,,..def,.,g,,h..i,,jkl,";
     
     writefln(input);
     foreach(char[] s; tokenise(input,",."))
         writefln(s);
 }
 
 Regan

And especially thanks for that!

-- 
Carlos Santander Bernal

JP2, you'll always live in our minds

Apr 10 2005
