digitalmars.D.learn - String Parsing with \" in a ".." text line

AEon <AEon_member pathlink.com> writes:
I started to code my parser, a *lot* easier with D, commands like
std.string.split work miracles.

But I am still wondering how to optimize parsing, in this case of a
configuration file:

<code>
// comments
[General]
game		"Quake III Arena"
gameInfo	"Retail, Rocket Arena III, Q3: Team Arena"
gameOpt		"-q3a"			// *** comment
gameMode	"16"
// comments
</code>

I do a 

   std.string.find(line, "game")

to find out if the line contains my key-variable. And then a

  char[][] splitLine = std.string.split(line, "\"");

accessing the value of the var of interest via

 splitLine[1]

Now that is fine and dandy. But when I want to allow the user to use double
quotes (") in the config file, this will turn ugly, since the above split does
not distinguish between " and \".

Any ideas how to elegantly read the var/value pairs should the value contain a
\"?

(In C I did some very evil manual hacking to make that work).

Thanx.

AEon
Mar 20 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 21 Mar 2005 01:27:31 +0000 (UTC), AEon <AEon_member pathlink.com>  
wrote:
 Any ideas how to elegantly read the var/value pairs should the value  
 contain a
 \"?

 (In C I did some very evil manual hacking to make that work).
I think you have to write your own version of split, one that allows "escaped" characters. Once written I'd recommend it for inclusion into std.string.

Regan
Mar 20 2005
AEon <AEon_member pathlink.com> writes:
Regan Heath says...

 Any ideas how to elegantly read the var/value pairs should the value  
 contain a
 \"?

 (In C I did some very evil manual hacking to make that work).
I think you have to write your own version of split, one that allows "escaped" characters. Once written I'd recommend it for inclusion into std.string.
:)... It will take a while to get a useful version written, since I am still learning about all the goodies in std.string.

Basically, what could be useful would be a

  char[][] splitx(char[] stringtosplit, char[] delimiter, char[] nonDelimiters)

of sorts:

  splitx(line, "\"", "\\\"");

A simpler solution would be to use another delimiter in my config files. But that would leave the problem that any delimiter could also be needed in the text.

If I find anything useful, I will post the code.

AEon
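A minimal sketch of the escape-aware split discussed here, written for a current D compiler rather than the 2005-era one. The name splitEscaped and its signature are hypothetical, not part of std.string; the function treats a delimiter preceded by a backslash as literal text and drops the backslash.

```d
import std.stdio : writeln;

/// Hypothetical helper: split `s` on `delim`, except where `delim` is
/// preceded by a backslash. The backslash itself is dropped from the output.
string[] splitEscaped(string s, char delim = '"')
{
    string[] parts;
    char[] current;
    for (size_t i = 0; i < s.length; i++)
    {
        if (s[i] == '\\' && i + 1 < s.length && s[i + 1] == delim)
        {
            current ~= delim;   // escaped delimiter: keep it literally
            i++;                // skip the delimiter we just consumed
        }
        else if (s[i] == delim)
        {
            parts ~= current.idup;  // unescaped delimiter ends the field
            current.length = 0;
        }
        else
        {
            current ~= s[i];
        }
    }
    parts ~= current.idup;          // text after the last delimiter
    return parts;
}

void main()
{
    auto fields = splitEscaped(`gameInfo "Say \"hi\" to Q3" // comment`);
    writeln(fields[1]);  // Say "hi" to Q3
}
```

As with the plain split, the value of interest ends up in element 1. A backslash in front of any other character is passed through unchanged in this sketch.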
Mar 21 2005
Stewart Gordon <smjg_1998 yahoo.com> writes:
AEon wrote:
 I started to code my parser, a *lot* easier with D, commands like
 std.string.split work miracles.
 
 But I am still wondering how to optimize parsing, in this case of a
 configuration file:
 
 <code>
 // comments
 [General]
 game		"Quake III Arena"
 gameInfo	"Retail, Rocket Arena III, Q3: Team Arena"
 gameOpt		"-q3a"			// *** comment
 gameMode	"16"
 // comments
 </code>
Is this a third-party file format? If not, why not define a format that's that little bit easier to parse? I'd be inclined to go for something resembling Windows .ini files. But if you still want to do it this way....
 I do a 
 
    std.string.find(line, "game")
 
 to find out if the line contains my key-variable.
Which won't work if "game" is somewhere in the value, not in the key. How about checking whether the line _begins_ with "game"?
 And then a
 
   char[][] splitLine = std.string.split(line, "\"");
 
 accessing the value of the var of interest via
 
  splitLine[1]
 
 Now that is fine and dandy. But when I want to allow the user to use double
 quotes (") in the config file, this will turn ugly, since the above split does
 not differ between " and \".
<snip>
By using split for this you're making life difficult for yourself. How about just picking out the first and last quotes, using find and findr?

Stewart.

-- 
My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Mar 21 2005
AEon <AEon_member pathlink.com> writes:
Stewart Gordon...

good points

 <code>
 // comments
 [General]
 game		"Quake III Arena"
 gameInfo	"Retail, Rocket Arena III, Q3: Team Arena"
 gameOpt		"-q3a"			// *** comment
 gameMode	"16"
 // comments
 </code>
Is this a third-party file format? If not, why not define a format that's that little bit easier to parse? I'd be inclined to go for something resembling Windows .ini files. But if you still want to do it this way....
True... it is totally up to me to define the format, I felt that was the easiest way to format the cfg file, and is easy to read.
 I do a 
 
    std.string.find(line, "game")
 
 to find out if the line contains my key-variable.
Which won't work if "game" is somewhere in the value, not in the key. How about checking whether the line _begins_ with "game"?
I had thought of that, but then forgot to check for it. Sigh :)
 And then a
 
   char[][] splitLine = std.string.split(line, "\"");
 
 accessing the value of the var of interest via
 
  splitLine[1]
 
 Now that is fine and dandy. But when I want to allow the user to use double
 quotes (") in the config file, this will turn ugly, since the above split does
 not differ between " and \".
<snip> By using split for this you're making life difficult for yourself. How about just picking out the first and last quotes, using find and findr?
Well as long as there is no \" in the line, split will do the job much quicker. Just checked, you are talking regular expression. Still need to learn about those. AEon
Mar 21 2005
Stewart Gordon <smjg_1998 yahoo.com> writes:
AEon wrote:
<snip>
 By using split for this you're making life difficult for yourself.  How 
 about just picking out the first and last quotes, using find and findr?
Well as long as there is no \" in the line, split will do the job much quicker. Just checked, you are talking regular expression. Still need to learn about those.
I actually meant the find and rfind (oops, where did findr come from?) in std.string, not std.regexp. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
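The two suggestions in this sub-thread (test the start of the line for the key, then take everything between the first and last quote) might look like the sketch below. It assumes a current D compiler, where the find and rfind of the 2005 std.string are called indexOf and lastIndexOf; the helper name quotedValue is hypothetical.

```d
import std.algorithm.searching : startsWith;
import std.ascii : isWhite;
import std.stdio : writeln;
import std.string : indexOf, lastIndexOf;

/// Hypothetical helper: return the text between the first and the last
/// double quote on `line`, or null when `line` does not begin with `key`
/// or carries no quoted value.
string quotedValue(string line, string key)
{
    // The key must open the line and be followed by whitespace, so that
    // "game" does not also match "gameInfo".
    if (!line.startsWith(key) || line.length == key.length
            || !isWhite(line[key.length]))
        return null;

    auto first = line.indexOf('"');
    auto last = line.lastIndexOf('"');
    if (first < 0 || last <= first)
        return null;            // fewer than two quotes on the line
    return line[first + 1 .. last];
}

void main()
{
    writeln(quotedValue(`gameOpt "-q3a" // *** comment`, "gameOpt")); // -q3a
    writeln(quotedValue(`game "Quake \"III\" Arena"`, "game"));
    // prints: Quake \"III\" Arena
}
```

Escaped quotes inside the value survive because only the outermost pair is used, although the backslashes are still present and would need a separate replace. A trailing comment that itself contains a double quote would defeat this approach.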
Mar 21 2005
David Medlock <amedlock nospam.org> writes:
AEon wrote:
<snip>
Why not just use an existing scripting language for your configuration files? I would recommend Small (http://www.compuphase.com/small.htm) or Lua (http://www.lua.org/). This scripting language would be useful within your game as well. -David
Mar 21 2005
Stewart Gordon <smjg_1998 yahoo.com> writes:
David Medlock wrote:
<snip>
 Why not just use an existing scripting language for your configuration 
 files?
 
 I would recommend Small (http://www.compuphase.com/small.htm) or
 Lua (http://www.lua.org/).
Around two years ago I invented a configuration language called Configur8. It's basically a slightly more powerful version of Windows INI files (with one or two syntactical differences). It's no match for a scripting language, but is perfect for stuff like the above appears to be. I haven't yet created a D interface, but I plan to do it at some point. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Mar 21 2005
AEon <AEon_member pathlink.com> writes:
David Medlock says...

Why not just use an existing scripting language for your configuration 
files?

I would recommend Small (http://www.compuphase.com/small.htm) or
Lua (http://www.lua.org/).

This scripting language would be useful within your game as well.
Is that not a tad overkill... I only want to define a few variables and log file obituaries, that need to be as readable as possible. AEon
Mar 21 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 21 Mar 2005 16:44:36 +0000 (UTC), AEon <AEon_member pathlink.com>  
wrote:
 David Medlock says...

 Why not just use an existing scripting language for your configuration
 files?

 I would recommend Small (http://www.compuphase.com/small.htm) or
 Lua (http://www.lua.org/).

 This scripting language would be useful within your game as well.
Is that not a tad overkill... I only want to define a few variables and log file obituaries, that need to be as readable as possible.
The simplest possible format...

If you assume your values cannot contain \r\n and your labels/settings cannot contain spaces, then you can simply use the following format:

label<space>value<\r\n>

and parse it by calling "find" on each line, looking for a space, and assuming the rest of the line (minus the \r\n) is the value.

If you decide later on that you need \r\n in your values you can encode them as \, r, \, n, e.g.

label<space>regan\r\nwas\r\nhere<\r\n>

In general, the fewer special characters you define, the fewer special cases you have to handle in values. Further, if you can pick characters you will never want to use in values, you don't have to handle any special cases at all.

Regan
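A minimal sketch of a parser for this label<space>value format, assuming a current D compiler and that the line ending has already been removed from each line. The Setting struct and the function name parseSetting are hypothetical.

```d
import std.array : replace;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical container for one parsed line.
struct Setting
{
    string label;
    string value;
}

/// Hypothetical parser: everything before the first space is the label,
/// everything after it is the value.
Setting parseSetting(string line)
{
    auto space = line.indexOf(' ');
    if (space < 0)
        return Setting(line, "");    // label only, empty value

    auto value = line[space + 1 .. $]
        .replace(`\r`, "\r")         // the two characters \ r become CR
        .replace(`\n`, "\n");        // the two characters \ n become LF
    return Setting(line[0 .. space], value);
}

void main()
{
    auto s = parseSetting(`label regan\r\nwas\r\nhere`);
    writeln(s.label);   // label
    writeln(s.value);   // regan, was and here on separate lines
}
```

The sketch has no way to write a literal backslash followed by r or n in a value; the format as described does not define one either.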
Mar 21 2005
AEon <AEon_member pathlink.com> writes:
Regan Heath says...
The simplest possible format...

If you assume your values cannot contain \r\n and your labels/settings  
cannot contain spaces then you can simply use the following format:

label<space>value<\r\n>

and parse it by calling "find" on each line, looking for a space, and  
assuming the rest of the line (minus the \r\n) is the value.

If you decide later on that you need \r\n in your values you can encode  
them as \, r, \, n eg.

label<space>regan\r\nwas\r\nhere<\r\n>

In general the fewer special characters you define, the fewer special  
cases you have to handle in values. Further if you can pick characters you  
will never want to use in values you don't have to handle any special  
cases at all.
Hmm... sure that would work, but space is definitely something that needs to be used. But your example would be a nightmare to read. The config file format is supposed to be read and changed not only by myself, but by any casual stats user.

A way to do it:

[varlabel]
values of the variable the whole line not allowing // comments

Sure, that would be trivial to do. But that is also a bit less nice to read.

My stats will have quite extensive 4-5 columns of obituaries using " as a delimiter. So a good way to get rid of \" would help. In my ANSI C code, it simply was not possible, and I was lucky enough not to require it (or I had to hack my code to allow for it).

A quite elegant way to solve the problem would be to disallow tabs as values, and use the tab char to separate the columns. Problem with that is, the user might accidentally add a tab and never notice it.

AEon
Mar 21 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 21 Mar 2005 23:10:05 +0000 (UTC), AEon <AEon_member pathlink.com>  
wrote:
 Regan Heath says...
 The simplest possible format...

 If you assume your values cannot contain \r\n and your labels/settings
 cannot contain spaces then you can simply use the following format:

 label<space>value<\r\n>

 and parse it by calling "find" on each line, looking for a space, and
 assuming the rest of the line (minus the \r\n) is the value.

 If you decide later on that you need \r\n in your values you can encode
 them as \, r, \, n eg.

 label<space>regan\r\nwas\r\nhere<\r\n>

 In general the fewer special characters you define, the fewer special
 cases you have to handle in values. Further if you can pick characters  
 you
 will never want to use in values you don't have to handle any special
 cases at all.
Hmm... sure that would work, but space is definitely something that needs to be used.
In the label/setting name?
 But your example would be a nightmare to read.
I don't think so, but I guess this is personal preference. If you like, replace <space> with <tab>, or allow both.
 The config file format is
 supposed to be read and changed by not only myself, but any casual stats  
 user.
KISS (Keep It Simple Stupid) - no insult intended. The more people who have to edit it, the simpler you should attempt to make it. Alternately provide a simple program to read/write it and get them to use that.
 A quite elegant way to solve the problem would be to disallow tabs as  
 values,
 and use the tab char to separate the columns. Problem with that is, the
 user
 might accidentally add a tab and never notice it.
- Treat consecutive tabs as 1 tab.
- Ignore trailing tabs.

I can see 1 potential problem. Depending on the text editor and length of the values in the file, the columns might not line up in the text file. However, Excel and, I imagine, other spreadsheet-style programs can load/save tab-separated value text files, and also comma-separated text files, i.e.

a,b,c
d,e,f
..etc..

Regan
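The two tab rules above can be sketched as follows, assuming a current D compiler. The helper name tabColumns is hypothetical; it splits on every tab and then discards the empty fields that consecutive or trailing tabs produce.

```d
import std.algorithm.iteration : filter, splitter;
import std.array : array;
import std.stdio : writeln;

/// Hypothetical helper: split `line` into tab-separated columns, treating
/// a run of tabs as one separator and ignoring trailing tabs.
string[] tabColumns(string line)
{
    return line.splitter('\t')
               .filter!(field => field.length > 0)  // drop empty fields
               .array;
}

void main()
{
    writeln(tabColumns("0\t\tShotgun\tSG\t\t"));  // ["0", "Shotgun", "SG"]
}
```

A side effect of filtering this way is that leading tabs are ignored too, and a column can never be deliberately empty.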
Mar 21 2005
Derek Parnell <derek psych.ward> writes:
On Mon, 21 Mar 2005 01:27:31 +0000 (UTC), AEon wrote:

<snip>
I have a module that will 'tokenize' lines that will probably suit your needs. I've attached the code but if you can't fetch it that way, let me know and I'll make it available on the web.

-- 
Derek
Melbourne, Australia
22/03/2005 1:46:54 PM

begin 644 test.d
[uuencoded attachment data omitted]
end
begin 644 linetoken.d
[uuencoded attachment data omitted]
end
Mar 21 2005
AEon <AEon_member pathlink.com> writes:
Derek Parnell says...

22/03/2005 1:46:54 PM
begin 644 test.d
What kind of format is that? It looks like something similar to tar, shar or something. Those I would copy/paste into a test file and unpack them via TotalCommander. But your format?

AEon
Mar 22 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon <AEon_member pathlink.com>  
wrote:
 Derek Parnell says...

 22/03/2005 1:46:54 PM
 begin 644 test.d
What kind of format it that. Looks like something similar to tar, shar or something. Those I would copy/paste into a test file and unpack them via TotalCommander. But your format?
It's uuencoded. Search the web for a utility commonly called uudecode. Or, if you have WinACE, rename/save the data as a .uue file, right-click on it in Windows Explorer and use the WinACE "extract here" option.

Regan
Mar 22 2005
J C Calvarese <jcc7 cox.net> writes:
Regan Heath wrote:
 On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon 
 <AEon_member pathlink.com>  wrote:
 
 Derek Parnell says...
...
 It's uuencoded. Search the web for a utility commonly called uudecode. 
 Or,  if you have WinACE rename/save the data as a .uue file right click 
 on it  in windows explorer and use the winace "extract here" option.
 
 Regan
Decoding the fun way: http://www.dsource.org/tutorials/index.php?show_example=146 -- Justin (a/k/a jcc7) http://jcc_7.tripod.com/d/
Mar 22 2005
Derek Parnell <derek psych.ward> writes:
On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon wrote:

 Derek Parnell says...
 
22/03/2005 1:46:54 PM
begin 644 test.d
What kind of format it that. Looks like something similar to tar, shar or something. Those I would copy/paste into a test file and unpack them via TotalCommander. But your format?
It ain't "my" format but a very commonly used one - UUEncode. Most news readers can handle it but I'll make the file available on the web (for now). http://www.users.bigpond.com/ddparnell/linetoken.d -- Derek Parnell Melbourne, Australia 22/03/2005 11:49:15 PM
Mar 22 2005
AEon <AEon_member pathlink.com> writes:
In article <18k6r1o1h3k7g.1478moloz1j32.dlg 40tude.net>, Derek Parnell says...
On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon wrote:

 Derek Parnell says...
 
22/03/2005 1:46:54 PM
begin 644 test.d
What kind of format it that. Looks like something similar to tar, shar or something. Those I would copy/paste into a test file and unpack them via TotalCommander. But your format?
It ain't "my" format but a very commonly used one - UUEncode. Most news readers can handle it but I'll make the file available on the web (for now). http://www.users.bigpond.com/ddparnell/linetoken.d
I had not been suggesting you invented it ;)... Copy/paste, name file uue works just fine with TotalCommander. Had totally forgotten about uue. AEon
Mar 22 2005
AEon <AEon_member pathlink.com> writes:
Derek Parnell says...

begin 644 linetoken.d
A few questions about your code:

  char[][] TokenizeLine(char[] pSource, char[] pDelim = ",", char[] pComment = "//")

As I understand this, pDelim and pComment can be set when calling TokenizeLine(), but need not be, since both have "default" values? If so, another very useful code example.

  int find(dchar[] pStringToScan, dchar pCharToFind)

I noted that you defined your own find() function. Generally, would that function conflict with those defined in the std lib? Or do user-defined functions automatically shadow lib functions?

Amazing piece of code (it will take me a while to read/understand). From your example output test cases:

  TokenizeLine(" abc, def , ghi, ") // default Delim ","  Comment is "//"
  --> {"abc", "def", "ghi", ""}

split(" abc, def , ghi, ", ","), and then applying strip() to every element, would do the same, but not as elegantly :)

  TokenizeLine("character    or spaces to be \t inserted", "")
  --> {"character", "or", "spaces", "to", "be", "inserted"}

An empty delimiter seems to be an alias for \t and " " (space)? Nice!
(Just noted from your info: However, if DelimChar is an empty string, then tokens are delimited by any group of one or more white-space characters. By default, DelimChar is ",".)
Duplicating that with split() would be tough.

  TokenizeLine(" abc; def , ghi; ", ";")
  --> {"abc", "def , ghi", "" }

Noting, you seem to be calling something like strip() though not exactly that function.

  TokenizeLine(" abc, [def , ghi]        ") // default Delim ","  Comment is "//"
  --> {"abc", "[", "def , ghi"}

(Explanation: If a token begins with a bracket (parenthesis, square, or brace), then you will get back two tokens. The first is the opening bracket as a single character string, and the second is all the characters up to, but not including, the matching end bracket, taking nested brackets (of the same type) into consideration.)

Would not:

  --> {"abc", "[", "def , ghi", "]"}

or even

  --> {"abc", "def , ghi" }

be "neater"?

  TokenizeLine(" abc, [def , [ghi, jkl] ]  ")
  --> {"abc", "[", "def , [ghi, jkl] "}

Anything in brackets is treated literally (i.e. as is), so nested brackets are not interpreted. OK.

So if you actually wanted to use [] or () in strings, and that may well happen often, one would actually need to "escape" those in some way? I am not sure that the special treatment brackets require will always be convenient.

  TokenizeLine(` abc, "def , ghi" , jkl `)
  --> {"abc", `"`, "def , ghi", "jkl"}

  TokenizeLine(` "moo"  \t " oi\"nk\"  " \t "ladida " //Comment`, `"`, `//`)
  0-->``
  1-->`moo`
  2-->`t`
  3-->`oi"nk"`
  4-->`t`
  5-->`ladida`

Wishlist:
0: Would wish to not have element 0.
1: fine, but should be element token 0.
2: \t tab no longer recognized, tab should have been ignored
3: Perfect
4: same as 2
5: Perfect
"6" comment ignored, fine

I would hope to do this to a line:

  `"moo" <whitespaces> " oi\"nk\"  " <whitespaces> "ladida "//Comment`

  ->0: `moo`
  ->1: `oi"nk"`
  ->2: `ladida`

Presently it would not be possible to rely on a specific column to contain the info of a specific double-quote pair. Could that be made possible?

Thanx for your work...

AEon
Mar 22 2005
Derek Parnell <derek psych.ward> writes:
On Tue, 22 Mar 2005 14:14:12 +0000 (UTC), AEon wrote:

 Derek Parnell says...
 
begin 644 linetoken.d
 A few questions about your code:

 char[][] TokenizeLine(char[] pSource, char[] pDelim = ",", char[] pComment = "//")

 As I understand this, pDelim and pComment can be set when calling TokenizeLine(), but need not be, since both have "default" values? If so, another very useful code example.

 int find(dchar[] pStringToScan, dchar pCharToFind)

 I noted that you defined your own find() function. Generally, would that function conflict with those defined in the std lib? Or do user-defined functions automatically shadow lib functions?
The D method to resolve such ambiguities is to fully qualify the reference with the package/module name, such as ...

  lPos = util.linetoken.find(lResult, lToken);

[snip]
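A small illustration of full qualification, assuming a current D compiler. The module-level indexOf below is a hypothetical stand-in for a user-defined function that shares its name with a library one; it deliberately returns the string length when the character is absent so the two calls can be told apart.

```d
import std.stdio : writeln;
import std.string;   // makes std.string.indexOf available by its full name

/// Hypothetical user-defined function sharing a name with the library one.
/// Unlike std.string.indexOf it returns s.length when `c` is not found.
size_t indexOf(string s, char c)
{
    foreach (i, ch; s)
        if (ch == c)
            return i;
    return s.length;
}

void main()
{
    // An unqualified name is looked up in this module before any import.
    writeln(indexOf("abc", 'z'));             // 3
    // The fully qualified name selects the library function.
    writeln(std.string.indexOf("abc", 'z'));  // -1
}
```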
 TokenizeLine("character    or spaces to be \t inserted", "")
 --> {"character", "or", "spaces", "to", "be", "inserted"}
 
 An empty delimiter seems to be an alias for \t and " " (space)? Nice!
 (Just noted from your info: However, if DelimChar is an empty string, then
 tokens are delimited by any group of one or more white-space characters. By
 default, DelimChar is ",".)
 Duplicating that with split() would be tough.
Yes, the empty delimiter uses groups of one or more whitespace characters to act as a single delimiter. If you really need *just* the space character to be the delimiter then use that, same with tabs.
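For comparison, in a current D standard library the whitespace-group behaviour described here is what std.array.split does when it is called without a separator. A minimal sketch (it does not use TokenizeLine, whose source is not reproduced in this thread):

```d
import std.array : split;
import std.stdio : writeln;

void main()
{
    // No separator argument: any run of white-space characters
    // acts as a single delimiter.
    auto tokens = "character    or spaces to be \t inserted".split();
    writeln(tokens);
    // ["character", "or", "spaces", "to", "be", "inserted"]
}
```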
 
 TokenizeLine(" abc; def , ghi; ", ";")
 --> {"abc", "def , ghi", "" }
 
 Noting, you seem to be calling something like strip() though not exactly that
 function.
 
 
 TokenizeLine(" abc, [def , ghi]        ") // default Delim ","  Comment is "//"
 --> {"abc", "[", "def , ghi"}
 
 (Explanation: If a token begins with a bracket (parenthesis, square, or brace),
 then you will get back two tokens. The first is the opening bracket as a single
 character string, and the second is all the characters up to, but not
including,
 the matching end bracket, taking nested brackets (of the same type) into
 consideration.)
 
 Would not:
 
 --> {"abc", "[", "def , ghi", "]"}
 
 or even
 
 --> {"abc", "def , ghi" }
 
 be "neater"?
Often, when parsing the tokens you need to know if a token was enclosed in brackets or quotes. By supplying the opening bracket or quote in the returned tokens, you can quickly see which were bracketed tokens. Also, there is no need to supply the closing bracket or quote as you know what that would have been by the opening bracket or quote. In other words, if you come across a token of "{" you know the next token was enclosed in braces, so you don't need to see the final brace.
 
 TokenizeLine(" abc, [def , [ghi, jkl] ]  ")
 --> {"abc", "[", "def , [ghi, jkl] "}
 
 Anything in brackets is treated literally (i.e. as is), so nested brackets are
 not interpreted. OK.
 
 So if you actually wanted to use [] or () in strings, and that may well happen
 often, one would actually need to "escape" those in some way? I am not sure
that
 the special treatment brackets require will always be convenient.
There are two ways (at least) to do that. The first method is to use the escape character (the backslash "\").

  TokenizeLine(` abc, \[def , [ghi, jkl] ] `)
  --> { "abc", "[def", "ghi, jkl", "]" }

  TokenizeLine(`abc, def\, ghi, jkl`)
  --> { "abc", "def, ghi", "jkl"}

Note only 3 tokens.

The other way is to enclose it inside a different sort of bracket/quote.

  TokenizeLine(`He said, '"Let's go down to the river".`, ``)
  --> { `He`, `said,`, `'`, `"Let's go down to the river".` }
 TokenizeLine(` "moo"  \t " oi\"nk\"  " \t "ladida " //Comment`, `"`, `//`)
 0-->``
 1-->`moo`
 2-->`t`
 3-->`oi"nk"`
 4-->`t`
 5-->`ladida`
 
 Wishlist:
 0: Would wish to not have element 0.
 1: fine, but should be element token 0.
 2: \t tab no longer recognized, tab should have been ignored
 3: Perfect
 4: same as 2
 5: Perfect
 "6" comment ignored, fine
Well, you said that the token delimiter was the double-quote. Also, you used the 'raw' string format, so the sequence "\t" is not a tab but literally a backslash-t combination. So this would have been broken up like this ...

  ` `, `moo`, ` \t `, ` oi\"nk\" `, ` \t `, `ladida `, ` //Comment`

Then when leading and trailing spaces are removed you get ...

  ``, `moo`, `\t`, `oi\"nk\"`, `\t`, `ladida`, `//Comment`

Then applying escaped characters ...

  ``, `moo`, `t`, `oi"nk"`, `t`, `ladida`, `//Comment`

Then when removing comments ...

  ``, `moo`, `t`, `oi"nk"`, `t`, `ladida`
 I would hope to do this to a line:
 
 `"moo" <whitespaces> " oi\"nk\"  " <whitespaces> "ladida "//Comment`
 
 ->0: `moo`
 ->1: `oi"nk"`
 ->2: `ladida`
 
  Toks = TokenizeLine(`"moo" <whitespaces> " oi\"nk\" " <whitespaces> "ladida "//Comment`, "");
  // Toks --> { `"`, `moo`, `"`, ` oi"nk" `, `"`, `ladida` }
  int i;
  foreach(char[] aTok; Toks)
  {
      // Skip the quote markers; print only the quoted tokens.
      if (aTok != `"`)
      {
          writefln("->%d: `%s`", i, std.string.strip(aTok));
          i++;
      }
  }
 Presently it would not be possible to rely on a specific column to contain the
 info of a specific double quote pair.
 
 Could that be made possible?
I suppose so, but it is designed to handle free form text and not column-delimited stuff.

-- 
Derek Parnell
Melbourne, Australia
23/03/2005 1:40:47 AM
Mar 22 2005
parent reply AEon <AEon_member pathlink.com> writes:
Derek Parnell,

At least in my case:

[Weapons]
"0"	" killed   by MOD_SHOTGUN"	"Shotgun"	"SG"
"1"	" killed   by MOD_GAUNTLET"	"Gauntlet"	"G"

something quite simple just occurred to me. When using std.string.splitlines() to
read complete lines from a text file, you will *never* encounter a \n in the
line, since that would have placed the content on another line.

So when you have something like this:

"1"	" killed \" \" by MOD_GAUNTLET"	"Gauntlet"	"G"

You could replace every \" with \n and be sure that the line will not lose any
information.

Then char[][] spline = split(line, "\""); And finally replace any \n back to \"
(or right back to " depending how you want to use the spline elements).

Obviously your code is a lot more flexible, but as we all strive for "KISS" ;)
my idea should work quite well.
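A minimal sketch of that replace/split/replace idea, written for a current D compiler (string, std.array) rather than the char[] API of the time. The helper name splitQuoted is made up for this example, and it assumes the line really contains no newline, as argued above.

```d
import std.array : replace, split;
import std.stdio : writefln;

// Hypothetical helper: split a line on unescaped double quotes.
string[] splitQuoted(string line)
{
    // Step 1: hide every escaped quote behind a newline placeholder,
    // which cannot occur inside a single line of the file.
    string masked = line.replace(`\"`, "\n");

    // Step 2: a plain split now only sees the real delimiters.
    string[] parts = masked.split(`"`);

    // Step 3: turn the placeholders back into literal quotes.
    foreach (ref part; parts)
        part = part.replace("\n", `"`);
    return parts;
}

void main()
{
    auto spline = splitQuoted(`"1" " killed \" \" by MOD_GAUNTLET" "Gauntlet" "G"`);
    foreach (i, part; spline)
        writefln("%d: `%s`", i, part);
}
```

Because the line starts with a quote, element 0 is empty and the values land on the odd indices (1, 3, 5, 7), with the whitespace between the quoted columns in between. One limitation of the trick: an escaped backslash directly before a closing quote (\\") would be misread as an escaped quote.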

BTW: I have been noting, many feedback posts should really be archived,
especially all the very useful code-examples.

AEon
Mar 22 2005
parent reply J C Calvarese <jcc7 cox.net> writes:
AEon wrote:
 Derek Parnell,
 BTW: I have been noting, many feedback posts should really be archived,
 especially all the very useful code-examples.
 
 AEon
I'm not sure what you mean by "archive". It's not like Walter clears out these newsgroups at the end of every month.

Walter has even produced some handy index pages such as
http://www.digitalmars.com/d/archives/digitalmars/D/index.html

They are particularly useful because Google indexes them
(http://www.digitalmars.com/d/archives/advancedsearch.html).

Apparently, he hasn't spun out "archives" for this particular newsgroup yet, but I'm sure he will eventually.

On the other hand, if by "archive" you mean gathering together snippets of code, the dsource tutorials project has already done some of this: http://www.dsource.org/tutorials/. If you think new examples are being added too slowly, you're welcome to start adding some yourself. :)

Also, there's an ever-growing amount of useful information available at Wiki4D. Two of my favorite pages:
http://www.prowiki.org/wiki4d/wiki.cgi?NewsDmD
http://www.prowiki.org/wiki4d/wiki.cgi?ErrorMessages

Everyone is invited to add to and/or update the wiki content, too. It's much easier than writing HTML, and quite self-explanatory.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Mar 22 2005
parent reply AEon <aeon2001 lycos.de> writes:
J C Calvarese wrote:

 AEon wrote:
 BTW: I have been noting, many feedback posts should really be archived,
 especially all the very useful code-examples.
I'm not sure what you mean by "archive". It's not like Walter clears out these newsgroups at the end of every month.
True enough, and now that I finally have a newsreader installed, (Mozilla Thunderbird), I can download and search the posts. I had not been aware of this, since I never used newsgroups before :). I am more of a Forum guy.
 Walter has even produced some handy index pages such as 
 http://www.digitalmars.com/d/archives/digitalmars/D/index.html
 
 They are particularly useful because Google indexes them 
 (http://www.digitalmars.com/d/archives/advancedsearch.html).
 
 Apparently, he hasn't spun out "archives" for this particular newsgroup 
 yet, but I'm sure he will eventually.
Personally I hope Walter does not have to "waste" his time with things like that too much, giving him more time to work on D. So if the updates are less regular that is fine.
 On the other hand, if by "archive" you mean gathering together snippets 
 of code, the dsource tutorials projects has already done some of this: 
 http://www.dsource.org/tutorials/. If you think new examples are being 
 added too slowly, you're welcome to start adding some yourself. :)
That was what I had been thinking about. And I would normally help with this, but I am desperately trying to recode some 1000+ hours of AEstats coding to D, and that takes up all my time. But once that is done I'd be glad to help. Till then I should have a more solid grasp of D as well.

AEon
Mar 23 2005
parent reply J C Calvarese <jcc7 cox.net> writes:
AEon wrote:
 J C Calvarese wrote:
...
 True enough, and now that I finally have a newsreader installed, 
 (Mozilla Thunderbird), I can download and search the posts. I had not 
 been aware of this, since I never used newsgroups before :). I am more 
 of a Forum guy.
I made that transition myself a few years ago (web forums -> newsreader). I still regularly use the web interface when I'm away from my home computer, but I much prefer Thunderbird when it's available.
 Apparently, he hasn't spun out "archives" for this particular 
 newsgroup yet, but I'm sure he will eventually.
Personally I hope Walter does not have to "waste" his time with things like that too much, giving him more time to work on D. So if the updates are less regular that is fine.
I think it's automated to where it isn't much effort for him. He probably just pushes a button every month or so. (I don't want him to waste a lot of time on it either, but it's nice to have Google index the newsgroup messages.)
 On the other hand, if by "archive" you mean gathering together 
 snippets of code, the dsource tutorials projects has already done some 
 of this: http://www.dsource.org/tutorials/. If you think new examples 
 are being added too slowly, you're welcome to start adding some 
 yourself. :)
That was what I had been thinking about. And I would normally help with this, but I am desperately trying to recode some 1000+ hours of AEstats coding to D, and that takes up all my time. But once that is done I'd be glad to help. Till then I should have a more solid grasp of D as well. AEon
Good. No pressure, I was just suggesting some easy ways to collaborate. AEstats sounds like an interesting use of time, too. ;)

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Mar 23 2005
parent AEon <aeon2001 lycos.de> writes:
J C Calvarese wrote:

 No pressure, I was just suggesting some easy ways to collaborate. 
 AEstats sounds like an interesting use of time, too. ;)
:9 In 4 days I have been able to do more in AEstats (in D) than I was able to do in AEstats++ (in C) in 3-4 weeks. Since in D I no longer need to use pointers, strings are for free, and D has very powerful, easy to use string functions, most of my code is basically error checking, e.g. config using invalid syntax, missing double quotes and the like. The code itself is very minimal. This is the way C should always have been!

I already have all the hardcoded obituaries replaced with configuration obituaries that are read on the fly. Sure, this is not really a big deal, but trying to code that in C made me weep...

And the best part, AEstats (then called AEstats++) will sooner or later be database driven via MYSQL... and that also should be a *lot* simpler to do than in C++ code.

AEon
Mar 25 2005