
digitalmars.D.learn - sscanf() and using \ in format = no workie?

AEon <aeon2001 lycos.de> writes:
Until D supports a native version of sscanf(), I have been experimenting
with the command some more, and noted that defining a sscanf format that
contains a \ will fail:

char[] line = `  0:00 InitGame: \gamename\baseq3\blah\\mapname\q3tourney2\protocol\`;

char[] junk, map;
char[] mtempl = "\\mapname\\%s\\";											
junk.length = line.length;
map.length = line.length;
int ret = sscanf( line, mtempl, map.ptr );

printf("\n  \"%.*s\"\n   -> Ret: %d  Map: \"%.*s\"  Junk: \"%.*s\"\n",
line, ret, map, junk );


char[] mtempl = "Initgame: %s\\mapname\\%s\\";
int ret = sscanf( line, mtempl, junk.ptr, map.ptr );


Neither of the two mtempl formats (templates) above will match that line
and grab the mapname "q3tourney2".
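Here is a minimal C sketch of why both templates fail. It is in C rather than D because sscanf() here is the C library function reached through std.c.stdio. The helper names are illustrative, and so is the sample line: it assumes single backslashes between fields, as in Regan's version.

Two rules explain the failures. First, sscanf() compares literal format characters against the input starting at its very first character and never searches forward. Second, "%s" stops only at whitespace, not at a backslash. The post's second template also says "Initgame", and literals are case-sensitive, so the sketch uses the line's actual "InitGame:".

```c
#include <stdio.h>

/* Sample log line as a C literal: each backslash in the data is doubled. */
static const char log_line[] =
    "  0:00 InitGame: \\gamename\\baseq3\\blah\\mapname\\q3tourney2\\protocol\\";

/* First template: the format begins with the literal "\mapname\", but the
   input begins with a space. The very first comparison fails, nothing is
   converted, and sscanf() returns 0. */
int try_first_template(char *map)
{
    return sscanf(log_line, "\\mapname\\%s", map);
}

/* Second template with the prefix corrected (a leading blank in the format
   skips the leading spaces). "%s" runs to the next whitespace, i.e. to the
   end of the line, so it swallows "\mapname\" as well; the literal after it
   never matches and map is not written. Returns 3 (min, sec, junk). */
int try_second_template(int *min, int *sec, char *junk, char *map)
{
    return sscanf(log_line, " %d:%d InitGame: %s\\mapname\\%s",
                  min, sec, junk, map);
}
```

try_first_template() returns 0. try_second_template() returns 3, with junk holding the whole backslash-separated tail and map untouched.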

IIRC I had the exact same problem long ago, and then gave up on sscanf. :(

Any idea what I am doing wrong?

AEon
Apr 04 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 04 Apr 2005 09:01:21 +0200, AEon <aeon2001 lycos.de> wrote:
 Until D supports a native version of sscanf(), I have been experimenting
 with the command some more, and noted that defining a sscanf format that
 contains a \ will fail:

 char[] line = `  0:00 InitGame: \gamename\baseq3\blah\\mapname\q3tourney2\protocol\`;

 char[] junk, map;
 char[] mtempl = "\\mapname\\%s\\";											
 junk.length = line.length;
 map.length = line.length;
 int ret = sscanf( line, mtempl, map.ptr );

 printf("\n  \"%.*s\"\n   -> Ret: %d  Map: \"%.*s\"  Junk: \"%.*s\"\n",
 line, ret, map, junk );


 char[] mtempl = "Initgame: %s\\mapname\\%s\\";
 int ret = sscanf( line, mtempl, junk.ptr, map.ptr );


 Neither of the two mtempl formats (templates) above will match that line
 and grab the mapname "q3tourney2".

 IIRC I had the exact same problem long ago, and then gave up on sscanf.  
 :(

 Any idea what I am doing wrong?
Do you have a copy of MSDN? or similar ANSI C documentation? If not, find some :)

From the MSDN sscanf docs:

"%s: String, up to first white-space character (space, tab or newline). To read strings not delimited by space characters, use set of square brackets ([ ]), as discussed following Table R.7."

So, "%s" means characters up until the next whitespace character.
You can use "%[a-z0-9]" instead to read only the characters listed within the [] and stop at the first character that is not in that set.

So...

import std.c.stdio;
import std.string;

void main()
{
    char[] line = "0:00 InitGame: \\gamename\\baseq3\\blah\\mapname\\q3tourney2\\protocol\\";

    char[] mtempl = "\\mapname\\%[a-z0-9]";
    char[] junk, map;
    int ret;

    map.length = line.length;
    junk.length = line.length;
    ret = line.find("\\mapname");
    ret = sscanf( toStringz(line[ret..$]), toStringz(mtempl), map.ptr );
    printf("\n  \"%.*s\"\n   -> Ret: %d  Map: \"%.*s\"\n", line, ret, map);
}

should get you what you want.

Notes:

- I have used toStringz; this function converts a D string into a C string. Now, technically it's not required above because static strings in D have a null terminator appended (not sure why).

- I have used 'find' to find the \\mapname part of the string (and done no error checking).

Regan
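The same find-then-scan idea can be sketched in C. sscanf() is the C library function either way, and strstr() plays the role of D's find. extract_mapname() is a hypothetical helper. The field width 63 is an added assumption that keeps the copy inside a 64-byte buffer. Like the D version, it relies on the a-z range form of the scanset, which the C library must support.

```c
#include <stdio.h>
#include <string.h>

/* Finds "\mapname\" anywhere in the line, then lets sscanf() start matching
   right there. Returns 1 and fills map on success, 0 otherwise. */
int extract_mapname(const char *line, char map[64])
{
    const char *p = strstr(line, "\\mapname\\");

    if (p == NULL)
        return 0;   /* keyword missing: the error check the D version skips */

    /* %63[a-z0-9] copies characters while they are in the set and stops at
       the first one that is not (here the '\' after the map name). */
    return sscanf(p, "\\mapname\\%63[a-z0-9]", map) == 1;
}
```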
Apr 04 2005
AEon <aeon2001 lycos.de> writes:
Regan Heath wrote:

 Do you have a copy of MSDN? or similar ANSI C documentation? If not, find some :)
MSDN? I have The Waite Group's Essential Guide to ANSI C.
 From the MSDN sscanf docs:
 
 "%s: String, up to first white-space character (space, tab or newline). To read strings not delimited by space characters, use set of square brackets ([ ]), as discussed following Table R.7."
 
 So, "%s" means characters up until the next space.
 You can use "%[a-z0-9]" to stop on any character other than those specified within the [].
 
 So...
 
 import std.c.stdio;
 import std.string;
 
 void main()
 {
     char[] line = "0:00 InitGame: \\gamename\\baseq3\\blah\\mapname\\q3tourney2\\protocol\\";
 
     char[] mtempl = "\\mapname\\%[a-z0-9]";
     char[] junk, map;   
     int ret;
     
     map.length = line.length;
     junk.length = line.length;
     ret = line.find("\\mapname");
     ret = sscanf( toStringz(line[ret..$]), toStringz(mtempl), map.ptr );
     printf("\n  \"%.*s\"\n   -> Ret: %d  Map: \"%.*s\"\n", line, ret, map);
 }
 
 should get you what you want.
 
 Notes:
 
 - I have used toStringz, this function converts a D string into a C string. Now, technically it's not required above because static strings in D have a null terminator appended (not sure why).
 
 - I have used 'find' to find the \\mapname part of the string (and done no error checking).
Thank you for taking the time to explicitly code the above.

I actually did use a find command to quickly do the above using only D string commands. The above was more a general application test for sscanf(), with a view to writing a programmable parser.

Alas, "\\mapname\\%[a-z0-9]" is just not good enough, since the names could contain just about any character < ASCII 127.

And I noticed something else:

" 34:03 Kill: 1 0 7: AEon - gXp killed pezen by MOD_ROCKET_SPLASH"

A log line that contains blanks in names will completely mess up a sscanf() with the format:

int ret = sscanf( line,"%d:%02d Kill: %d %d %d: %s"~ " killed " ~ "%s" ~ "by MOD_ROCKET_SPLASH",
	&min,&sec,&pl1,&pl2,&mod, fragger.ptr,fragged.ptr );

fragger will be "AEon" and not "AEon - gXp".

I then noted that at the very least I would need "%[a-zA-Z0-9]", but when I add "%[a-zA-Z 0-9]" (with a blank) to allow for names with blanks, sscanf totally messes up the read.

Seems like there are certain limitations to sscanf's usefulness. :(

Slowly I seem to recall why I dropped the use of sscanf() for the reasons above; only by now I had forgotten them. Hmpf.

Back to my old code where I hack the lines apart by hand.

AEon
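For the backslash-separated InitGame fields, one way around the character-set problem is a negated scanset. This C sketch assumes, as holds for the lines shown here, that a backslash never occurs inside a field value. "%[^\\]" reads every character except a backslash, so names with blanks, punctuation or digits come through intact. Unlike the a-z range form, "^" negation is plain ANSI C. extract_field() and FIELD_MAX are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 63   /* must agree with the 63 in the format below */

/* Looks for "\key\" in the line and copies the value that follows it, up to
   the next backslash, into out. Returns 1 for a non-empty value, else 0. */
int extract_field(const char *line, const char *key, char out[FIELD_MAX + 1])
{
    char pattern[FIELD_MAX + 3];   /* '\' + key + '\' + NUL */
    const char *p;

    if (strlen(key) > FIELD_MAX)
        return 0;
    sprintf(pattern, "\\%s\\", key);

    p = strstr(line, pattern);
    if (p == NULL)
        return 0;

    /* "%63[^\\]" in C source is the scanset [^\]: anything but a backslash.
       An empty value (two adjacent backslashes) is a matching failure. */
    return sscanf(p + strlen(pattern), "%63[^\\]", out) == 1;
}
```

This only helps where a reserved delimiter exists. In the Kill line the names are bounded by words, not by a character that names cannot contain.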
Apr 04 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 04 Apr 2005 11:20:52 +0200, AEon <aeon2001 lycos.de> wrote:
 Regan Heath wrote:

 Do you have a copy of MSDN? or similar ANSI C documentation? If not, find some :)
MSDN? I have The Waite Group's Essential Guide to ANSI C.
"Microsoft Developer Network": it's the documentation that comes with MS Visual Studio. Does your guide have the documentation for sscanf? Something like what I quoted below about %s and %[a-z]?
 From the MSDN sscanf docs:
 "%s: String, up to first white-space character (space, tab or newline). To read strings not delimited by space characters, use set of square brackets ([ ]), as discussed following Table R.7."
 So, "%s" means characters up until the next space.
 You can use "%[a-z0-9]" to stop on any character other than those specified within the [].
 So...
 import std.c.stdio;
 import std.string;
 
 void main()
 {
     char[] line = "0:00 InitGame: \\gamename\\baseq3\\blah\\mapname\\q3tourney2\\protocol\\";
     char[] mtempl = "\\mapname\\%[a-z0-9]";
     char[] junk, map;
     int ret;
     map.length = line.length;
     junk.length = line.length;
     ret = line.find("\\mapname");
     ret = sscanf( toStringz(line[ret..$]), toStringz(mtempl), map.ptr );
     printf("\n  \"%.*s\"\n   -> Ret: %d  Map: \"%.*s\"\n", line, ret, map);
 }
 should get you what you want.
 Notes:
 - I have used toStringz, this function converts a D string into a C string. Now, technically it's not required above because static strings in D have a null terminator appended (not sure why).
 - I have used 'find' to find the \\mapname part of the string (and done no error checking).
Thank you for taking the time to explicitly code the above.
NP. I just copy/pasted your code and tried to make it work.
 I actually did use a find command to quickly do the above only using D  
 string commands. The above was more a general application test for  
 sscanf() to write a programmable parser.
I understand.
 Alas "\\mapname\\%[a-z0-9]" is just not good enough, since the names  
 could contain just about any character < ASCII 127.
Hmm...
 And I noticed something else:

 " 34:03 Kill: 1 0 7: AEon - gXp killed pezen by MOD_ROCKET_SPLASH"

 A log line that contains blanks in names will completely mess up a
 sscanf() with the format:

 int ret = sscanf( line,"%d:%02d Kill: %d %d %d: %s"~ " killed " ~ "%s" ~  
 "by MOD_ROCKET_SPLASH",
 	&min,&sec,&pl1,&pl2,&mod, fragger.ptr,fragged.ptr );

 fragger will be "AEon" and not "AEon - gXp".

 I then noted that at the very least I would need "%[a-zA-Z0-9]", but
 when I add "%[a-zA-Z 0-9]" (with a blank) to allow for names with blanks,
 sscanf totally messes up the read.

 Seems like there are certain limitations of the sscanf usefulness. :(
Indeed. Can you change the format of the log lines?

If so, choose a delimiter that cannot appear in the log data and use that to delimit the fields. If you cannot, then scan for the keywords, i.e. "Kill: %d %d %d:" <here be player name> "killed" <here be opponent name> "by" <here be device> (you're probably doing this already, right?)
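Scanning for the keywords could look like this in C. This is a sketch: struct kill_event, parse_kill() and the 64-byte name buffers are hypothetical. sscanf() handles only the numeric prefix, and "%n" reports how far it got. The names are then cut out between " killed " and " by " with strstr().

This sidesteps the scanset problem. A scanset such as "%[a-zA-Z 0-9]" is greedy and never backtracks. It either stops early at a character outside the set (the '-' in "AEon - gXp") or runs straight past " killed ". It cannot stop in front of a literal that follows it.

```c
#include <stdio.h>
#include <string.h>

struct kill_event {
    int min, sec, killer_id, victim_id, mod_id;
    char killer[64], victim[64], weapon[64];
};

/* Copies n bytes of src into dst (capacity dstsz > 0), truncating if needed
   and always NUL-terminating. */
static void copy_span(char *dst, size_t dstsz, const char *src, size_t n)
{
    if (n >= dstsz)
        n = dstsz - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Returns 1 and fills *ev for lines like
   " 34:03 Kill: 1 0 7: AEon - gXp killed pezen by MOD_ROCKET_SPLASH",
   or 0 if the line does not have that shape. */
int parse_kill(const char *line, struct kill_event *ev)
{
    static const char KILLED[] = " killed ";
    static const char BY[] = " by ";
    const char *names, *killed, *by;
    int used = -1;

    /* sscanf() only reads the numbers. %n stores how many characters were
       consumed; it is reached only if the ':' after the last number matched. */
    if (sscanf(line, " %d:%d Kill: %d %d %d: %n", &ev->min, &ev->sec,
               &ev->killer_id, &ev->victim_id, &ev->mod_id, &used) != 5
        || used < 0)
        return 0;

    /* The names are whatever lies between the keywords. */
    names = line + used;
    killed = strstr(names, KILLED);
    if (killed == NULL)
        return 0;
    by = strstr(killed + strlen(KILLED), BY);
    if (by == NULL)
        return 0;

    copy_span(ev->killer, sizeof ev->killer, names, (size_t)(killed - names));
    copy_span(ev->victim, sizeof ev->victim, killed + strlen(KILLED),
              (size_t)(by - (killed + strlen(KILLED))));
    copy_span(ev->weapon, sizeof ev->weapon, by + strlen(BY),
              strlen(by + strlen(BY)));
    return 1;
}
```

One limitation remains: if a player's own name contained " killed ", the first occurrence would win. That is the ambiguity Regan's suggestion of a reserved delimiter avoids.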
 Slowly I seem to recall why I dropped the use of sscanf(), for the
 reasons above, only by now I had forgotten them. Hmpf.

 Back to my old code where I hack the lines apart by hand.
Have you tried the regexp library, std.regexp? I'm no regexp expert, but you might be able to use the "split" function in std.regexp to parse your log lines. Not sure.

Regan
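Here is a rough C analogue of the regular-expression route. It uses POSIX <regex.h>, which is not part of ANSI C and is not D's std.regexp. regex_kill() and its pattern are assumptions tailored to the Kill lines shown in this thread.

The key difference from sscanf() is that a regular expression is matched as a whole. The "(.*)" groups therefore end wherever the following " killed " and " by " keywords can still match, even when the names contain blanks.

```c
#include <regex.h>
#include <string.h>

/* Copies capture group *m of src into dst (capacity dstsz > 0). */
static void copy_group(char *dst, size_t dstsz, const char *src,
                       const regmatch_t *m)
{
    size_t n = (size_t)(m->rm_eo - m->rm_so);

    if (n >= dstsz)
        n = dstsz - 1;
    memcpy(dst, src + m->rm_so, n);
    dst[n] = '\0';
}

/* Returns 1 and fills killer/victim/weapon for a matching Kill line, else 0.
   The pattern is compiled on every call only to keep the sketch short. */
int regex_kill(const char *line, char killer[64], char victim[64],
               char weapon[64])
{
    regex_t re;
    regmatch_t m[4];   /* m[0] = whole match, m[1..3] = the three groups */
    int ok;

    if (regcomp(&re, "Kill: [0-9]+ [0-9]+ [0-9]+: (.*) killed (.*) by ([A-Z_]+)$",
                REG_EXTENDED) != 0)
        return 0;

    ok = (regexec(&re, line, 4, m, 0) == 0);
    if (ok) {
        copy_group(killer, 64, line, &m[1]);
        copy_group(victim, 64, line, &m[2]);
        copy_group(weapon, 64, line, &m[3]);
    }
    regfree(&re);
    return ok;
}
```

In a real parser the pattern would be compiled once and reused for every log line.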
Apr 04 2005
AEon <aeon2001 lycos.de> writes:
Regan Heath wrote:

 Regan Heath wrote:
 Do you have a copy of MSDN? or similar ANSI C documentation? If not, find some :)
MSDN? I have The Waite Group's Essential Guide to ANSI C.
"Microsoft Developer Network": it's the documentation that comes with MS Visual Studio. Does your guide have the documentation for sscanf? Something like what I quoted below about %s and %[a-z]?
Nope... the documentation I have seems to be quite old and pretty close to the original ANSI C definitions, without any of the more modern frills (like C99, e.g.). The %[a-z] syntax above is not mentioned, which is why it surprised me. Are you sure this actually *is* ANSI C, and not some later addition that D just "happens" to also support?
 Seems like there are certain limitations of the sscanf usefulness. :(
Indeed. Can you change the format of the log lines?
Alas, I can't change the logs: they come from 30 or so games, and every game engine comes with its own format. Some log files, like those from Half-Life, put names in double quotes, making them a *lot* easier to parse.
 If so, choose a delimiter that cannot appear in the log data and use that to delimit the fields. If you cannot then scan for the keywords i.e. "Kill: %d %d %d:" <here be player name> "killed" <here be opponent name> "by" <here be device> (you're probably doing this already, right?)
I am... that was the reason I was surprised sscanf did not work. Using the keywords should delimit the %s-marked player names in a completely obvious way, but it does not. I will probably write my own function that does something like sscanf, but in a more "solid" way.

I noted that sscanf works quite well with numbers (%d):

"1234:32 Kill"
" 1234:32Kill"

will both work just fine with a "%d:%d Kill" format.
 Have you tried the regexp library? std.regexp. I'm no regexp expert but you might be able to use the "split" function in std.regexp to parse your log lines. Not sure.
A split on the "keywords" would already help. But indeed, regular expressions are still something I need to look into. IIRC, wasn't there some discussion that regular expressions were not yet fully implemented in D?

AEon
Apr 07 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Thu, 07 Apr 2005 10:43:41 +0200, AEon <aeon2001 lycos.de> wrote:
 Regan Heath wrote:

 Regan Heath wrote:
 Do you have a copy of MSDN? or similar ANSI C documentation? If not, find some :)
MSDN? I have The Waite Group's Essential Guide to ANSI C.
"Microsoft Developer Network": it's the documentation that comes with MS Visual Studio. Does your guide have the documentation for sscanf? Something like what I quoted below about %s and %[a-z]?
Nope... the documentation I have seems to be quite old and pretty close to the original ANSI C definitions, without any of the more modern frills (like C99, e.g.). The %[a-z] syntax above is not mentioned, which is why it surprised me. Are you sure this actually *is* ANSI C, and not some later addition that D just "happens" to also support?
You might be right:

"Note that %[a-z] and %[z-a] are interpreted as equivalent to %[abcde...z]. This is a common scanf function extension, but note that the ANSI standard does not require it."

It's important to note that D technically doesn't 'support' it: D is link-compatible with C, and the C libraries that come with D (i.e. the Digital Mars ones) 'support' it. When you call ANSI C functions from D, you're calling C functions. Hence the problems with char[], which is not a C string.
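If portability matters, the range can be avoided altogether. C89 leaves a '-' inside a scanset implementation-defined unless it is the first character, the last character, or directly after a leading '^'. A set spelled out in full therefore works with any conforming C library. Here is a C sketch, where LOWER_ALNUM and read_lower_alnum() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* The characters that "%[a-z0-9]" means on libraries that support ranges. */
#define LOWER_ALNUM "abcdefghijklmnopqrstuvwxyz0123456789"

/* Reads the leading run of lowercase letters and digits from s into out
   (at most 63 characters). Returns the length read, or 0 if s does not
   start with such a character. */
size_t read_lower_alnum(const char *s, char out[64])
{
    /* String-literal concatenation builds "%63[abc...789]" at compile time. */
    if (sscanf(s, "%63[" LOWER_ALNUM "]", out) != 1)
        return 0;
    return strlen(out);
}
```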
 Seems like there are certain limitations of the sscanf usefulness. :(
Indeed. Can you change the format of the log lines?
Alas, I can't change the logs: they come from 30 or so games, and every game engine comes with its own format. Some log files, like those from Half-Life, put names in double quotes, making them a *lot* easier to parse.
Those Half-Life guys have proven to be quite smart on a number of occasions; one more doesn't surprise me.
 If so, choose a delimiter that cannot appear in the log data and use that to delimit the fields. If you cannot then scan for the keywords i.e. "Kill: %d %d %d:" <here be player name> "killed" <here be opponent name> "by" <here be device> (you're probably doing this already, right?)
I am... that was the reason I was surprised sscanf did not work. Using the keywords should delimit the %s-marked player names in a completely obvious way, but it does not. I will probably write my own function that does something like sscanf, but in a more "solid" way.
Have you tried: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/19835
 I noted that sscanf works quite well with numbers %d

 "1234:32 Kill"
 " 1234:32Kill"

 will both work just fine with a "%d:%d Kill" format.
Strings can contain numbers, but numbers cannot contain letters, so numbers are easy :)
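This small C sketch shows why the number cases behave so well, and one catch. parse_time_kill() is a hypothetical name. "%d" skips leading whitespace on its own, and a blank in the format matches zero or more whitespace characters, so both sample lines match "%d:%d Kill".

The catch is that sscanf()'s return value counts only conversions. It is 2 even when the trailing "Kill" does not match. Putting "%n" at the end detects that, because it is only reached when everything before it matched.

```c
#include <stdio.h>

/* Returns 1 only if the whole "%d:%d Kill" pattern matched, else 0. */
int parse_time_kill(const char *line, int *min, int *sec)
{
    int end = -1;   /* stays -1 unless sscanf() gets as far as %n */

    if (sscanf(line, "%d:%d Kill%n", min, sec, &end) != 2)
        return 0;
    return end >= 0;
}
```

With "1234:32 Kick", sscanf() still returns 2, but end stays -1, so the mismatch is caught.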
 Have you tried the regexp library? std.regexp. I'm no regexp expert but you might be able to use the "split" function in std.regexp to parse your log lines. Not sure.
A split on the "keywords" would already help. But indeed, regular expressions are still something I need to look into. IIRC, wasn't there some discussion that regular expressions were not yet fully implemented in D?
There was a lot of discussion about further integrating them, but I believe the std.regexp library does a fairly good job already.

Regan
Apr 07 2005