digitalmars.D - DMD 0.92 release

Walter (7/7) Jun 07 2004 It's now possible to do assymmetrical operator overloads with commutativ...

DemmeGod (2/11) Jun 07 2004
J Anderson (7/15) Jun 07 2004 Wow, Walter you must have been visited by an angle or something last

J C Calvarese (6/7) Jun 07 2004 Hip! Hip! Hooray!

Charlie (2/9) Jun 07 2004

James Widman (6/8) Jun 07 2004 Angles? Dude, that's sick. I can't help but picture some poor

Sean Kelly (6/12) Jun 07 2004 Great timing :) I had just started looking at streams today. Just to c...

Walter (7/20) Jun 07 2004 Not

David L. Davis (7/14) Jun 07 2004 Walter: Thxs! For the "Added default arguments to function parameters." ...

Walter (5/9) Jun 07 2004 near

David L. Davis (9/18) Jun 07 2004 Walter: Sure, I'd be happy to donate the ifind() and irfind() functions ...

Walter (1/1) Jun 07 2004 Post it and let's have a look!

Arcane Jill (2/2) Jun 08 2004 This is the best release of D I've seen so far. It's brilliant. Well don...

Walter (3/4) Jun 08 2004 Thanks! I hope each one is better than the last .

David L. Davis (87/88) Jun 08 2004 Walter: Here they be. :) I kept to a very simple "KISS" approach, and bu...

Walter (15/106) Jun 08 2004 Thanks! I have some suggestions, though . First, using toupper()

Arcane Jill (15/17) Jun 08 2004 (I'm actually replying to David here, not Walter)

David L. Davis (13/31) Jun 08 2004 Jill: I just missed seeing your message before I posted the code again, ...

David L. Davis (119/124) Jun 08 2004 Walter: Ok, I think I've followed your suggestions correctly. :) Please ...

Walter (12/138) Jun 08 2004 The function uppercases the input string. It shouldn't modify its inputs...

Regan Heath (7/182) Jun 08 2004 A perfect example of where 'in' should mean 'const' and the compiler

David L. Davis (6/12) Jun 08 2004 Regan: If an "in" acts like an "inout" for strings, does it do this for ...

Regan Heath (12/31) Jun 08 2004 No. Strings are passed by reference, (int, real, long, etc.) are not, se...
Arcane Jill (14/41) Jun 08 2004 I echo that sentiment.

David L. Davis (224/225) Jun 08 2004 Walter: Third time around is normally the "Charm!" Anywayz, I've been ha...

Vathix (11/18) Jun 09 2004 hammering

David L. Davis (16/23) Jun 09 2004 Vathix: I looked over your stringclass.d code that's based off of Walter...

Walter (10/10) Jun 09 2004 There's no need to .dup the strings. Just have a loop that looks like th...

David L. Davis (321/331) Jun 10 2004 Walter: Ok, per your advice I've copied the original find() / rfind() fu...

Walter (23/359) Jun 10 2004 That's more like it! Now, can the unit tests also test the case

David L. Davis (384/386) Jun 11 2004 Walter: Opps! Sorry about that...I've now added in some additional unitt...
David L. Davis (398/400) Jun 12 2004 Walter: Darn, I had to fix just one more thing in the code...I discovere...
David L. Davis (119/119) Jun 15 2004 Walter: It would sure be nice to also have the ireplace() and icount() f...

Arcane Jill (6/8) Jun 09 2004 Ah! Now these old ASCII habits really should be dropped. Hauke has writt...

Chr. Grade (6/14) Jun 09 2004 Haven't read any related postings before, so I don't know if I'm far off...
Stewart Gordon (12/20) Jun 09 2004

Hauke Duden (8/30) Jun 09 2004 That is exactly the same as using 0x20 directly since D's character

Stewart Gordon (11/13) Jun 09 2004 Yes, you do have a point there. What's more, there isn't a 1:1 mapping

Hauke Duden (10/21) Jun 09 2004 You're wrong. the Unicode standard defines 1:1 case mappings (see

Stewart Gordon (12/26) Jun 09 2004 There seems to be a contradiction here. That file indicates that

Arcane Jill (17/26) Jun 09 2004 Look, it's perfectly simple. Everybody's right. And because everybody's ...

Hauke Duden (5/8) Jun 09 2004 Ouch. I didn't know that. Makes me feel happy that I stayed away from

Arcane Jill (33/41) Jun 10 2004 Actually, I think I'd quite like to have a bash at writing some of the U...

Hauke Duden (22/55) Jun 10 2004 I think it is a good idea to coordinate our Unicode efforts. I haven't

Walter (6/16) Jun 10 2004 support. Even

Hauke Duden (13/18) Jun 12 2004 Ok, I finally got around to looking at it. It seems that UPR simply

Arcane Jill (7/10) Jun 12 2004 Yeah, I stand corrected. The format isn't useful to us. I thought it wou...

Arcane Jill (35/37) Jun 13 2004 I'm a bit more awake now. The approach that I took when I had to do this...

Hauke Duden (46/92) Jun 13 2004 It sounds like you're really excited about this one ;). Your ideas sound...

Arcane Jill (18/40) Jun 13 2004 I'd thought of that.

Hauke Duden (14/48) Jun 13 2004 Me, panicking? No chance ;).

Hauke Duden (15/46) Jun 09 2004 Where did you get that information? From the data file

Stewart Gordon (17/34) Jun 10 2004 So that's why the uppercase form is given twice. I couldn't find a key

Hauke Duden (7/19) Jun 10 2004 The file format is described here:

Stewart Gordon (18/26) Jun 10 2004 The first column has been omitted from that list.

Arcane Jill (20/22) Jun 10 2004 I know nothing about Welsh, but, I do know that Welsh is NOT an exceptio...

Walter (16/27) Jun 10 2004 /hope/

David L. Davis (12/20) Jun 09 2004 Jill: Don't sweat it, all your advice has been encouraging! :) If I wasn...

Charlie (2/21) Jun 07 2004 Why not tolower() both of the strings ?

Arcane Jill (9/10) Jun 07 2004 That won't cover all cases in Unicode, but there is a similar function d...

Arcane Jill (14/18) Jun 07 2004 Now that we have Hauke's Unichar stuff, we can do better than ASCII, we ...

Jeroen van Bemmel (3/3) Jun 07 2004 Binary operator overloading: for consistency, I would suggest replacing

Ant (4/7) Jun 07 2004 I think this was discussed before.

Ivan Senji (13/24) Jun 08 2004 charset="iso-8859-1"

Stewart Gordon (8/13) Jun 08 2004 Maybe. It could also clear up what's FWIS a common coding error.
Walter (12/12) Jun 08 2004 charset="iso-8859-1"

Hauke Duden (4/13) Jun 08 2004 Default function arguments! Yay! Thanks Walter!
Stewart Gordon (12/19) Jun 08 2004 Kris's dsc.io project did that all along, so what's new?

Ant (4/18) Jun 08 2004 Give him a break, he said before he can't cope with every thing
Walter (8/22) Jun 08 2004 commutative

Stewart Gordon (28/35) Jun 08 2004 Well, just after I posted it, there was a bit of a debate over this and

Walter (4/30) Jun 08 2004 My thought as well was to include a starting bit offset. The downside to

Ivan Senji (3/40) Jun 08 2004 But if bit slicing would work then, then it isn't a loss but a gain!

Arcane Jill (12/21) Jun 08 2004 I don't think Walter will have any problem getting bit slicing to work. ...

Stewart Gordon (13/28) Jun 09 2004 I thought that inout bits were already not supported. Unless that's

Walter (2/2) Jun 09 2004 Another option is to only allow bit slicing on byte boundaries, and only

Arcane Jill (4/6) Jun 09 2004 That's EXACTLY what my workaround does. You can have the code for free i...

Walter (4/12) Jun 10 2004 you

Arcane Jill (6/184) Jun 11 2004 Oky doke - here goes. One thing though - this is merely a workaround for

Sean Kelly (6/18) Jun 11 2004 Very nice. This brings up a question though... assuming I want to know ...
Stewart Gordon (11/16) Jun 14 2004

Arcane Jill (2/4) Jun 14 2004 Excellent!

J Anderson (25/28) Jun 09 2004 Yeah I've been thinking this is probably the best option. Users could
Stewart Gordon (9/11) Jun 10 2004 Yes, that was another suggestion in the debate. But I'm inclined to

Sean Kelly (11/18) Jun 08 2004 Me too, but darned if I've come up with the answer so far. But if push ...

Kris (5/27) Jun 08 2004 I'l be watching for your alternative Sean ...

"Walter" <newshound digitalmars.com> writes:

It's now possible to do asymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overload in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

http://www.digitalmars.com/d/changelog.html

Jun 07 2004

DemmeGod <me demmegod.com> writes:

Package attribute!  Yeee-haw!

On Mon, 07 Jun 2004 14:33:37 -0700, Walter wrote:

 It's now possible to do assymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
 
 http://www.digitalmars.com/d/changelog.html

Jun 07 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Walter wrote:

It's now possible to do assymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

http://www.digitalmars.com/d/changelog.html
  

Wow, Walter you must have been visited by an angle or something last 
night.  You've done a complete backflip on so many issues.  Not that I 
mind, it's a good thing you keep an open mind.

default arguments yay.

-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 07 2004

J C Calvarese <jcc7 cox.net> writes:

J Anderson wrote:
...
 default arguments yay.

Hip! Hip! Hooray!

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/

Jun 07 2004

Charlie <Charlie_member pathlink.com> writes:

<sings>For he's a jolly good fellow</sings>

In article <ca2v2q$1n14$1 digitaldaemon.com>, J C Calvarese says...
J Anderson wrote:
...
 default arguments yay.

Hip! Hip! Hooray!

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/

Jun 07 2004

James Widman <james jwidman.com> writes:

In article <ca2oi1$1d41$1 digitaldaemon.com>,
 J Anderson <REMOVEanderson badmama.com.au> wrote:
 Wow, Walter you must have been visited by an angle or something last 
 night. 

Angles? Dude, that's sick.  I can't help but picture some poor 
Flatlander's 2-dimensional body parts strewn about... :-)

All of us healthy-minded people know Walter was really visited by Mr. 
Hyper-sphere from 4-space.

Jun 07 2004

Sean Kelly <sean f4.ca> writes:

In article <ca2nau$1ath$1 digitaldaemon.com>, Walter says...
It's now possible to do assymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

Great timing :)  I had just started looking at streams today.  Just to clarify,
it looks like the old rules would not evaluate b.opfunc_r(a) if a.opfunc is
defined, whether or not there was an overload for a.opfunc(b).  Do I have this
right?

Sean

Jun 07 2004

"Walter" <newshound digitalmars.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message
news:ca2pn3$1et4$1 digitaldaemon.com...
 In article <ca2nau$1ath$1 digitaldaemon.com>, Walter says...
It's now possible to do assymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

 Great timing :)  I had just started looking at streams today.  Just to clarify,
 it looks like the old rules would not evaluate b.opfunc_r(a) if a.opfunc is
 defined, whether or not there was an overload for a.opfunc(b).  Do I have this
 right?

Right.
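
The rule confirmed here can be sketched for readers on current compilers. This is only a minimal illustration under stated assumptions: it uses present-day D syntax, where the forward and reverse forms are spelled `opBinary` and `opBinaryRight` rather than the `opfunc` / `opfunc_r` names used in this thread, and the `Scalar` and `Wrapper` types are hypothetical. `Scalar` defines a `+` overload, but not one that accepts a `Wrapper`, so the reverse form on `Wrapper` is the one that gets used.

```d
// Hypothetical types, for illustration only.
struct Scalar
{
    int v;

    // Forward form: handles Scalar + int, and nothing else.
    Scalar opBinary(string op)(int rhs) if (op == "+")
    {
        return Scalar(v + rhs);
    }
}

struct Wrapper
{
    int v;

    // Reverse form: handles Scalar + Wrapper from the right-hand side.
    Wrapper opBinaryRight(string op)(Scalar lhs) if (op == "+")
    {
        return Wrapper(lhs.v + v);
    }
}

void main()
{
    auto a = Scalar(1);
    auto b = Wrapper(2);

    auto r1 = a + 3;   // a.opBinary!"+"(3)
    auto r2 = a + b;   // no matching a.opBinary, so b.opBinaryRight!"+"(a)

    static assert(is(typeof(r2) == Wrapper));
    assert(r1.v == 4);
    assert(r2.v == 3);
}
```
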

Jun 07 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca2nau$1ath$1 digitaldaemon.com>, Walter says...
It's now possible to do assymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

http://www.digitalmars.com/d/changelog.html

Walter: Thxs! For the "Added default arguments to function parameters." :)) Now
I can pull out all my wrapper functions...this is some really Great News!!

<*Wonders*> Do you think Phobos.std.string could get a case-insensitive
version of find (ifind) and rfind (irfind) added to it sometime in the near
future? It would be very useful (even if it just does ASCII). Thxs for your
reply in advance. :)

Jun 07 2004

"Walter" <newshound digitalmars.com> writes:

"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:ca2u66$1lui$1 digitaldaemon.com...
 <*Wonders*> Do you think Phobos.std.string could get a case-insensitive
 version of find (ifind) and rfind (irfind) added to it sometime in the near
 future? It would be very useful (even if it just does ASCII). Thxs for your
 reply in advance. :)

Do you want to write one and donate it?

Jun 07 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca37ns$24qn$1 digitaldaemon.com>, Walter says...
"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:ca2u66$1lui$1 digitaldaemon.com...
 <*Wonders*> Do you think Phobos.std.string could get a case-insensitive
 version of find (ifind) and rfind (irfind) added to it sometime in the near
 future? It would be very useful (even if it just does ASCII). Thxs for your
 reply in advance. :)

Do you want to write one and donate it?

Walter: Sure, I'd be happy to donate the ifind() and irfind() functions I've
already written to solve my problem. Both have the normal (char[], char[])
parameters of find() and rfind(), but I've added a third int parameter that
allows setting the starting position for the search within the string, which
normally defaults to "0" when only the first two are passed in. But it's
nothing fancy compared to the code I've seen from others here in the forum. 
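
A defaulted third parameter of this kind can be sketched as below. This is a hedged illustration, not the code being offered in this post: `findFrom` is a hypothetical name, it is case-sensitive, and it assumes present-day Phobos, where the search function this thread calls `find` is available as `std.string.indexOf`.

```d
import std.string : indexOf;

// Hypothetical function: a single declaration with a default argument
// stands in for a separate two-parameter wrapper that passed 0.
ptrdiff_t findFrom(const(char)[] s, const(char)[] sub, size_t start = 0)
{
    if (start > s.length)
        return -1;

    // Search only the tail, then translate the result back to an
    // index into the full string.
    auto pos = indexOf(s[start .. $], sub);
    return pos < 0 ? -1 : pos + cast(ptrdiff_t) start;
}

void main()
{
    assert(findFrom("abcabc", "bc") == 1);      // start defaults to 0
    assert(findFrom("abcabc", "bc", 2) == 4);   // explicit start position
    assert(findFrom("abcabc", "zz") == -1);     // not found
}
```
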

So how would I go about donating code toward making "D" a better developer tool? :))

"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 07 2004

"Walter" <newshound digitalmars.com> writes:

Post it and let's have a look!

Jun 07 2004

Arcane Jill <Arcane_member pathlink.com> writes:

This is the best release of D I've seen so far. It's brilliant. Well done.
Jill

Jun 08 2004

"Walter" <newshound digitalmars.com> writes:

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca3pa5$1r7$1 digitaldaemon.com...
 This is the best release of D I've seen so far. It's brilliant. Well done.

Thanks! I hope each one is better than the last <g>.

Jun 08 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca3jmn$2pjm$1 digitaldaemon.com>, Walter says...
Post it and let's have a look!

Walter: Here they be. :) I kept to a very simple "KISS" approach, and built
these functions upon the existing std.string functions. But if you'd like me to,
I could make these functions independent of the other std.string functions so
that these are in a stand alone raw "D" code format. I just didn't feel at the
time that I should follow a "Recreate the Wheel" approach, when the existing
functions worked fine for what I needed.

Note - My indenting will disappear when I post this thru the Web...sorry about
that! :(

/*******************************************************************
* Function      : int ifind( in char[], in char[], in int = 0 )
* Author        : David L. 'SpottedTiger' Davis
* Language      : DigitalMars "D" aka Mars v0.92
* Created Date  : 03.Jun.04 
* Modified Date : 08.Jun.04 Removed the wrapper function and set 
*                           the third parameter as a default of 0
* Requirements  : std.string
* Licence       : Same as those for the Phobos (Runtime Library)
*******************************************************************
*
* Note: Meant to be a case insensitive version of std.string.find
*       with an optional start looking from this "String Position" parameter.
*/	
int ifind
(
in char[] sStr,
in char[] sSubStr,
in int    iStartPos = 0
)
{
char[] sTmpStr;
int    iRtnVal;

// If either of the string parameters is empty, return not found
if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

// If past the upper boundary, return not found
if ( iStartPos > sStr.length - 1 ) 
return -1;  

// If below the lower boundary, return not found
else if ( iStartPos < 0 ) 
return -1; 

sTmpStr = tolower( sStr[ iStartPos .. sStr.length ] );

if ( iStartPos == 0 ) 
return find( sTmpStr, tolower( sSubStr ) );   
else
{
iRtnVal = find( sTmpStr, tolower( sSubStr ) );

return ( iRtnVal != -1 ) ? iStartPos + iRtnVal : iRtnVal;
} 
} // end int ifind( char[],char[], int = 0 ) 


/*******************************************************************
* Function      : int irfind( in char[], in char[], in int = -1 )
* Author        : David L. 'SpottedTiger' Davis
* Language      : DigitalMars "D" aka Mars v0.92
* Created Date  : 03.Jun.04 
* Modified Date : 08.Jun.04 Removed the wrapper function and set
*                           the third parameter as a default of -1
* Requirements  : std.string
* Licence       : Same as those for the Phobos (Runtime Library)
*******************************************************************
*
* Note: Meant to be a case insensitive version of std.string.rfind
*       with an optional start looking from this "String Position" parameter.
*/	
int irfind
(
in char[] sStr,
in char[] sSubStr,
in int    iEndPos = -1
)
{
char[] sTmpStr;

// If either of the string parameters is empty, return not found
if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

// If iEndPos == -1, search up to the end of the string
if ( iEndPos == -1 ) 
iEndPos = sStr.length - 1;

// If past the upper boundary, return not found
else if ( iEndPos > sStr.length - 1 ) 
return -1;  

// If below the lower boundary, return not found
else if ( iEndPos < 0 ) 
return -1;      

sTmpStr = tolower( sStr[ 0 .. iEndPos + 1 ] );

return rfind( sTmpStr, tolower( sSubStr ) );   

} // end int irfind( char[],char[], int = -1 ) 

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 08 2004

"Walter" <newshound digitalmars.com> writes:

Thanks! I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string. While this works, it's better to avoid
it if you can, i.e. just go through character by character and comparing
that way. Second, I prefer to avoid a starting index; instead just slice the
string to be searched. -Walter
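
Both suggestions can be sketched together. This is a minimal, hypothetical illustration in present-day D, not the code that was eventually contributed to Phobos: `ifindSketch` is an invented name, it handles ASCII only (via `std.ascii.toLower`), and it compares one character at a time so that no temporary string is allocated and neither argument is modified.

```d
import std.ascii : toLower;

// Hypothetical sketch. Returns the index of the first ASCII
// case-insensitive match of sub in s, or -1 if there is none.
// Empty arguments are treated as "not found".
int ifindSketch(const(char)[] s, const(char)[] sub)
{
    if (sub.length == 0 || sub.length > s.length)
        return -1;

    foreach (i; 0 .. s.length - sub.length + 1)
    {
        // Compare character by character, stopping at the first mismatch.
        size_t j = 0;
        while (j < sub.length && toLower(s[i + j]) == toLower(sub[j]))
            ++j;

        if (j == sub.length)
            return cast(int) i;   // every character of sub matched
    }
    return -1;
}

void main()
{
    const(char)[] text = "ApO 123355 PO Box 23";
    assert(ifindSketch(text, "po") == 1);

    // Instead of a start index, the caller slices the string and adds
    // the slice offset back to the result.
    auto rest = text[3 .. $];
    assert(ifindSketch(rest, "PO") + 3 == 11);

    assert(ifindSketch(text, "xyz") == -1);
}
```

Because a slice is only a pointer and a length, passing `text[3 .. $]` costs no copy, which is what makes a separate start-position parameter unnecessary.
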

"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:ca4lma$1m9r$1 digitaldaemon.com...
 In article <ca3jmn$2pjm$1 digitaldaemon.com>, Walter says...
Post it and let's have a look!

 Walter: Here they be. :) I kept to a very simple "KISS" approach, and built
 these functions upon the existing std.string functions. But if you'd like me to,
 I could make these functions independent of the other std.string functions so
 that these are in a stand alone raw "D" code format. I just didn't feel at the
 time that I should follow a "Recreate the Wheel" approach, when the existing
 functions worked fine for what I needed.

 Note - My indenting will disappear when I post this thru the Web...sorry about
 that! :(

 /*******************************************************************
 * Function      : int ifind( in char[], in char[], in int = 0 )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and set
 *                           the third parameter as a default of 0
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.find
 *       with an optional start looking from this "String Position" parameter.
 */
 int ifind
 (
 in char[] sStr,
 in char[] sSubStr,
 in int    iStartPos = 0
 )
 {
 char[] sTmpStr;
 int    iRtnVal;

 // If either of the string parameters are empty, return not found
 if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

 // If greater than to upper boundary return not found
 if ( iStartPos > sStr.length - 1 )
 return -1;

 // If less than to lower boundary return not found
 else if ( iStartPos < 0 )
 return - 1;

 sTmpStr = tolower( sStr[ iStartPos .. sStr.length ] );

 if ( iStartPos == 0 )
 return find( sTmpStr, tolower( sSubStr ) );
 else
 {
 iRtnVal = find( sTmpStr, tolower( sSubStr ) );

 return ( iRtnVal != -1 ) ? iStartPos + iRtnVal : iRtnVal;
 }
 } // end int ifind( char[],char[], int = 0 )


 /*******************************************************************
 * Function      : int irfind( in char[], in char[], in int = -1 )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and set
 *                           the third parameter as a default of -1
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.rfind
 *       with an optional start looking from this "String Position" parameter.
 */
 int irfind
 (
 in char[] sStr,
 in char[] sSubStr,
 in int    iEndPos = -1
 )
 {
 char[] sTmpStr;

 // If either of the string parameters are empty, return not found
 if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

 // If iEndPos == -1 get the full length of the string
 if ( iEndPos == -1 )
 iEndPos = sStr.length - 1;

 // If greater than to upper boundary return not found
 else if ( iEndPos > sStr.length - 1 )
 return -1;

 // If less than to lower boundary return not found
 else if ( iEndPos < 0 )
 return - 1;

 sTmpStr = tolower( sStr[ 0 .. iEndPos + 1 ] );

 return rfind( sTmpStr, tolower( sSubStr ) );

 } // end int irfind( char[],char[], int = -1 )

 -------------------------------------------------------------------
 "Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 08 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca4ot4$1sek$2 digitaldaemon.com>, Walter says...
(in response to David L. Davis)
I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string.

(I'm actually replying to David here, not Walter)
Another big problem with uppercasing the whole string is that it could be very
slow. Imagine if the strings you were comparing were gigabytes long. Now imagine
that the substring you were looking for could have been found right near the
start of the string. Converting the case of all those gigs would have been
unnecessary.

Sorry to be a downer, but I learned this the hard way a couple of years ago. I
actually managed to implement the whole Unicode case comparison algorithm for
real, including special casing and everything. Man, was it S-L-O-W. (It
casefolded everything before the compare even started). Then I optimized it by
making it look at only as much of the strings as it needed to, and after that it
whizzed by.

Jill

Jun 08 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca4r7l$20ig$1 digitaldaemon.com>, Arcane Jill says...
In article <ca4ot4$1sek$2 digitaldaemon.com>, Walter says...
(in response to David L. Davis)
I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string.

(I'm actually replying to David here, not Walter)
Another big problem with uppercasing the whole string is that it could be very
slow. Imagine if the strings you were comparing were gigabytes long. Now imagine
that the substring you were looking for could have been found right near the
start of the string. Converting the case of all those gigs would have been
unnecessary.

Sorry to be a downer, but I learned this the hard way a couple of years ago. I
actually managed to implement the whole Unicode case comparison algorithm for
real, including special casing and everything. Man, was it S-L-O-W. (It
casefolded everything before the compare even started). Then I optimized it by
making it look at only as much of the strings as it needed to, and after that it
whizzed by.

Jill

Jill: I just missed seeing your message before I posted the code again, sorry.
But of course you're right: if a very large string of data is passed in, it will
slow things down a lot... darn, how in the heck did I miss that, 'cause I too
had to deal with a similar problem a few years ago.  :)

I think what it is, is that with Walter giving me half a chance to add something
(no matter how small it may be) to the Phobos std library, it is a real "Honor"
and I don't want to blow it. :) Just knowing how very busy Walter is, I really
appreciate him giving me this golden opportunity to contribute, and to feel a
part of the "D" community.

I feel like a young Skywalker in training, learning how to best use "The Force!"


-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 08 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca4ot4$1sek$2 digitaldaemon.com>, Walter says...
Thanks! I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string. While this works, it's better to avoid
it if you can, i.e. just go through character by character and comparing
that way. Second, I prefer to avoid a starting index; instead just slice the
string to be searched. -Walter

Walter: Ok, I think I've followed your suggestions correctly. :) Please let me
know if I've gotten it right, or if I'm off track somehow?

import std.c.stdio;
import std.string;

/*******************************************************************
* Function      : int ifind( in char[], in char[] )
* Author        : David L. 'SpottedTiger' Davis
* Language      : DigitalMars "D" aka Mars v0.92
* Created Date  : 03.Jun.04 
* Modified Date : 08.Jun.04 Removed the wrapper function and the
*                           default parameter, mainly because the 
*                           string being passed in should be already 
*                           sliced so that the next search will find
*                           the matching sub-string value. Also per 
*                           advice from Walter, <g> I've removed every 
*                           tolower() call, and now locally all characters
*                           in the strings are set to lowercase where they  
*                           sit without the need to create another copy. 
* Requirements  : std.string
* Licence       : Same as those for the Phobos (Runtime Library)
*******************************************************************
*
* Note: Meant to be a case insensitive version of std.string.find
*/	
int ifind
(
in char[] sStr,
in char[] sSubStr
)
{
// If either of the string parameters are empty, return not found
if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

// sStr set to lowercase locally
// lowercase ascii a = '\x61', uppercase ascii A = '\x41'
foreach ( int iStrPos, char cChar; sStr )    
sStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sStr[ iStrPos ] + 0x20 :
sStr[ iStrPos ];

// sSubStr set to lowercase locally    
// lowercase ascii a = '\x61', uppercase ascii A = '\x41' 
foreach ( int iStrPos, char cChar; sSubStr ) 
sSubStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStr[ iStrPos ] +
0x20 : sSubStr[ iStrPos ];  

return find( sStr, sSubStr );

} // end int ifind( in char[], in char[] ) 


/*******************************************************************
* Function      : int irfind( in char[], in char[] )
* Author        : David L. 'SpottedTiger' Davis
* Language      : DigitalMars "D" aka Mars v0.92
* Created Date  : 03.Jun.04 
* Modified Date : 08.Jun.04 Removed the wrapper function and the
*                           default parameter, mainly because the 
*                           string being passed in should be already 
*                           sliced so that the next search will find
*                           the matching sub-string value. Also per 
*                           advice from Walter, <g> I've removed every 
*                           tolower() call, and now locally all characters
*                           in the strings are set to lowercase where they  
*                           sit without the need to create another copy. 
* Requirements  : std.string
* Licence       : Same as those for the Phobos (Runtime Library)
*******************************************************************
*
* Note: Meant to be a case insensitive version of std.string.rfind.
*/	
int irfind
(
in char[] sStr,
in char[] sSubStr
)
{
// If either of the string parameters are empty, return not found
if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

// sStr set to lowercase locally
// lowercase ascii a = '\x61', uppercase ascii A = '\x41'
foreach ( int iStrPos, char cChar; sStr )    
sStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sStr[ iStrPos ] + 0x20 :
sStr[ iStrPos ];

// sSubStr set to lowercase locally    
// lowercase ascii a = '\x61', uppercase ascii A = '\x41' 
foreach ( int iStrPos, char cChar; sSubStr ) 
sSubStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStr[ iStrPos ] +
0x20 : sSubStr[ iStrPos ];

return rfind( sStr, sSubStr ); 

} // end int irfind( in char[], in char[] ) 

// Test ifind() and irfind() to find multiples of the same sub-string
int main()
{

int    iStrPos   = 0;
int    iSlicePos = 0;
char[] sStrTest  = "ApO 123355 PO Box 23, Waterpool Street Portland, Texas";

printf( "Original = %.*s\n", sStrTest );

iStrPos   = 0;
iSlicePos = 0;

while ( iSlicePos != -1 )
{
iSlicePos = ifind( sStrTest[ iStrPos .. sStrTest.length - 1 ], "PO" );

if ( iSlicePos != -1 ) 
{
printf( "Found \'PO\' at position with ifind()= %d\n", iStrPos + iSlicePos );
iStrPos = iStrPos + iSlicePos + "PO".length; 
}
}

printf("\n\n"); 

iStrPos  = sStrTest.length - 1;
iSlicePos = 0;

while ( iSlicePos != -1 && iStrPos >= 0 )
{
iSlicePos = irfind( sStrTest[ 0 .. iStrPos ], "PO" );

if ( iSlicePos != -1 )
{
printf( "Found \'PO\' at position with irfind()= %d\n", iSlicePos );
iStrPos = iSlicePos - "PO".length;
}
}

return 0;

} // end int main()

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 08 2004

"Walter" <newshound digitalmars.com> writes:

The function uppercases the input string. It shouldn't modify its inputs.
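
Why the in-place case conversion reaches the caller can be sketched as follows. This is a hypothetical illustration in present-day D (the function names are invented and the conversion is ASCII only): a `char[]` parameter is a slice that refers to the caller's characters, so writes made through it are visible after the call, whereas a `const(char)[]` parameter lets the compiler reject such writes.

```d
// Hypothetical function that rewrites its argument in place, the way the
// posted ifind() and irfind() do before searching.
void lowerInPlace(char[] s)
{
    foreach (ref c; s)
        if (c >= 'A' && c <= 'Z')
            c += 0x20;   // ASCII distance between 'A' and 'a'
}

// Read-only counterpart: with const(char)[] any attempt to assign to an
// element of s would be a compile-time error.
size_t countUpper(const(char)[] s)
{
    size_t n;
    foreach (c; s)
        if (c >= 'A' && c <= 'Z')
            ++n;
    return n;
}

void main()
{
    char[] original = "PO Box".dup;   // mutable copy of a literal
    assert(countUpper(original) == 3);

    lowerInPlace(original);

    // The caller's data has changed, which is the side effect being
    // objected to here.
    assert(original == "po box");
    assert(countUpper(original) == 0);
}
```
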

"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:ca54is$2h2r$1 digitaldaemon.com...
 In article <ca4ot4$1sek$2 digitaldaemon.com>, Walter says...
Thanks! I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string. While this works, it's better to avoid
it if you can, i.e. just go through character by character and comparing
that way. Second, I prefer to avoid a starting index; instead just slice the
string to be searched. -Walter

 Walter: Ok, I think I've followed your suggestions correctly. :) Please let me
 know if I've gotten it right, or if I'm off track somehow?

 import std.c.stdio;
 import std.string;

 /*******************************************************************
 * Function      : int ifind( in char[], in char[] )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and the
 *                           default parameter, mainly because the
 *                           string being passed in should be already
 *                           sliced so that the next search will find
 *                           the matching sub-string value. Also per
 *                           advice from Walter, <g> I've removed every
 *                           tolower() call, and now locally all characters
 *                           in the strings are set to lowercase where they
 *                           sit without the need to create a another copy.
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.find
 */
 int ifind
 (
 in char[] sStr,
 in char[] sSubStr
 )
 {
 // If either of the string parameters are empty, return not found
 if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

 // sStr set to lowercase locally
 // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
 foreach ( int iStrPos, char cChar; sStr )
 sStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sStr[ iStrPos ] +

0x20 :
 sStr[ iStrPos ];

 // sSubStr set to lowercase locally
 // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
 foreach ( int iStrPos, char cChar; sSubStr )
 sSubStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStr[

iStrPos ] +
 0x20 : sSubStr[ iStrPos ];

 return find( sStr, sSubStr );

 } // end int ifind( in char[], in char[] )


 /*******************************************************************
 * Function      : int irfind( in char[], in char[] )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and the
 *                           default parameter, mainly because the
 *                           string being passed in should be already
 *                           sliced so that the next search will find
 *                           the matching sub-string value. Also per
 *                           advice from Walter, <g> I've removed every
 *                           tolower() call, and now locally all characters
 *                           in the strings are set to lowercase where they
 *                           sit without the need to create a another copy.
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.rfind.
 */
 int irfind
 (
 in char[] sStr,
 in char[] sSubStr
 )
 {
 // If either of the string parameters are empty, return not found
 if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

 // sStr set to lowercase locally
 // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
 foreach ( int iStrPos, char cChar; sStr )
 sStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sStr[ iStrPos ] +

0x20 :
 sStr[ iStrPos ];

 // sSubStr set to lowercase locally
 // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
 foreach ( int iStrPos, char cChar; sSubStr )
 sSubStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStr[

iStrPos ] +
 0x20 : sSubStr[ iStrPos ];

 return rfind( sStr, sSubStr );

 } // end int irfind( in char[], in char[] )

 // Test ifind() and irfind() to find multiples of the same sub-string
 int main()
 {

 int    iStrPos   = 0;
 int    iSlicePos = 0;
 char[] sStrTest  = "ApO 123355 PO Box 23, Waterpool Street Portland,

Texas";
 printf( "Original = %.*s\n", sStrTest );

 iStrPos   = 0;
 iSlicePos = 0;

 while ( iSlicePos != -1 )
 {
 iSlicePos = ifind( sStrTest[ iStrPos .. sStrTest.length - 1 ], "PO" );

 if ( iSlicePos != -1 )
 {
 printf( "Found \'PO\' at position with ifind()= %d\n", iStrPos +

iSlicePos );
 iStrPos = iStrPos + iSlicePos + "PO".length;
 }
 }

 printf("\n\n");

 iStrPos  = sStrTest.length - 1;
 iSlicePos = 0;

 while ( iSlicePos != -1 && iStrPos >= 0 )
 {
 iSlicePos = irfind( sStrTest[ 0 .. iStrPos ], "PO" );

 if ( iSlicePos != -1 )
 {
 printf( "Found \'PO\' at position with irfind()= %d\n", iSlicePos );
 iStrPos = iSlicePos - "PO".length;
 }
 }

 return 0;

 } // end int main()

 -------------------------------------------------------------------
 "Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 08 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 8 Jun 2004 14:56:32 -0700, Walter <newshound digitalmars.com> 
wrote:
 The function uppercases the input string. It shouldn't modify its inputs.

A perfect example of where 'in' should mean 'const' and the compiler 
should catch this error.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 08 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <opr9aqpftm5a2sq9 digitalmars.com>, Regan Heath says...
On Tue, 8 Jun 2004 14:56:32 -0700, Walter <newshound digitalmars.com> 
wrote:
 The function uppercases the input string. It shouldn't modify its inputs.

A perfect example of where 'in' should mean 'const' and the compiler 
should catch this error.

Regan

Regan: If an "in" acts like an "inout" for strings, does it do this for all the
other different types (int, real, long, etc.) too?  :(  Seems confusing, when is
an "in" an "in" and not an "inout"?

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 08 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 8 Jun 2004 22:35:23 +0000 (UTC), David L. Davis 
<SpottedTiger yahoo.com> wrote:

 In article <opr9aqpftm5a2sq9 digitalmars.com>, Regan Heath says...
 On Tue, 8 Jun 2004 14:56:32 -0700, Walter <newshound digitalmars.com>
 wrote:
 The function uppercases the input string. It shouldn't modify its 
 inputs.

 A perfect example of where 'in' should mean 'const' and the compiler
 should catch this error.

 Regan

 Regan: If an "in" acts like an "inout" for strings, does it do this for 
 all the
 other different types (int, real, long, etc.) too?

No. Strings are passed by reference, (int, real, long, etc.) are not, see 
below..

  :(  Seems confusing, when is
 an "in" an "in" and not an "inout"?

Exactly!

I believe strings and other arrays are all passed by reference and due to 
this you can change the *contents* of the string, but not the *reference* 
to the string. If you passed it as an inout you could change both the 
*contents* and the *reference*.
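
A minimal sketch of that distinction follows. It is written against present-day D rather than the 0.92 compiler discussed in this thread, so two things are assumptions of the sketch: `ref` stands in for the `inout` spelling used here, and the first parameter is left unqualified, because `in` implies `const` in current compilers and would reject the write. The names `byValue` and `byRef` are made up for illustration.

```d
import std.stdio;

// Plain slice parameter: the slice (pointer + length) is copied,
// but the copy still refers to the caller's characters.
void byValue(char[] s)
{
    s[0] = '*';        // visible to the caller: the contents are shared
    s = "other".dup;   // not visible: only the local copy is rebound
}

// ref parameter (assumed equivalent of this thread's `inout`):
// the callee works on the caller's own slice variable.
void byRef(ref char[] s)
{
    s[0] = '#';        // changes the caller's original characters
    s = "other".dup;   // and rebinds the caller's slice as well
}

void main()
{
    char[] a = "hello".dup;
    byValue(a);
    writeln(a);   // prints "*ello"

    char[] b = "hello".dup;
    byRef(b);
    writeln(b);   // prints "other"
}
```

The first call shows why ifind() stepped on its input: the slice itself was copied, the characters were not.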

Regan.


Jun 08 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca5evb$1d8$1 digitaldaemon.com>, David L. Davis says...
If an "in" acts like an "inout" for strings, does it do this for all the
other different types (int, real, long, etc.) too?

For arrays, classes and pointers, but not for primitive types or for structs.


 :(

I echo that sentiment.


Seems confusing, when is
an "in" an "in" and not an "inout"?

Unlike C and C++, D provides no DbC mechanism for catching const errors. You can
do it, but you have to try REALLY hard. The following example WILL assert as a
consequence of a DbC const error (I've tested it):

   private char[][] backup;
   void f(in char[] s)
   in
   {
       backup.length = backup.length + 1;
       backup[backup.length-1] = s.dup;
   }
   out
   {
       assert(s == backup[backup.length-1]);
       backup.length = backup.length - 1;
   }
   body
   {
       s[0] ='*'; // violates my DbC assertion of s's constness
   }

   int main(char[][] args)
   {
       char[] s = "hello";
       f(s);
       return 0;
   }

However - even THAT won't work if an exception is thrown or if the code is
multi-threaded. You'd have to also make the whole thing synchronized AND wrapped
in try/catch to ensure you got that. (And, so far as I know, there is no way to
introduce either "synchronized" or "try/catch" in a release build only, without
writing the whole function twice).

So - like you so eloquently put it earlier,

:(

Jill

Jun 08 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca5ct6$2vif$1 digitaldaemon.com>, Walter says...
The function uppercases the input string. It shouldn't modify its inputs.

Walter: Third time around is normally the "Charm!" Anywayz, I've been hammering
away at these two functions ifind() and irfind(), and I believe I've made them
much better than before, thanks to both you and Jill for the advice.

Please, let me know if I've still missed something, but if not I may ask some of
the folks here to do a little testing of these functions. Right now tho I'm
tired and seeing double (it's late here), so I'll check what's up in the
morning. I will be bright-eyed and bushy-tailed tomorrow to fix any problems found.

Thxs for giving me a chance at donating some code...I've learned a few more things
about how to use "D", and that's all good indeed! :))


import std.c.stdio;
import std.string;

/****************************************************************************
* Function      : int ifind( in char[], in char[] )
* Author        : David L. 'SpottedTiger' Davis
* Language      : DigitalMars "D" aka Mars v0.92
* Created Date  : 03.Jun.04 
* Modified Date : 08.Jun.04 Removed the wrapper function and the
*                           default parameter, mainly because the 
*                           string being passed in should be already 
*                           sliced so that the next search will find
*                           the matching sub-string value. Also per 
*                           advice from Walter, <g> I've removed every 
*                           tolower() call, and now locally all characters
*                           in the strings are set to lowercase where they  
*                           sit without the need to create another copy.
*               : 09.Jun.04 Reworked the whole thing! Fixed the problem
*                           with the input string getting stepped on, and
*                           now only the sSubStr is duped to another string,
*                           while the sStr string is looked at in a loop
*                           looking for the matching SubString...a character
*                           at a time.
* Requirements  : std.string
* Licence       : Same as those for the Phobos (Runtime Library)
*****************************************************************************
*
* Note: Meant to be a case insensitive version of std.string.find
*/	
int ifind
(
in char[] sStr,
in char[] sSubStr
)
{
char[] sSubStrTmp;
bool   bFoundMatch   = false;
int    iFound1stPos  = -1;  
int    iSubStrRunner = 0;
char   cCharTmp; 

// If either of the string parameters are empty, return not found
if ( sStr.length < 1 || sSubStr.length < 1 || sSubStr.length > sStr.length )
return -1;

// Get a working copy of sSubStr  
sSubStrTmp = sSubStr.dup;

// sSubStrTmp set to lowercase locally    
// lowercase ascii a = '\x61', uppercase ascii A = '\x41' 
foreach ( int iStrPos, char cChar; sSubStrTmp ) 
sSubStrTmp[ iStrPos ] = ( find( uppercase, cChar ) != -1 )
? sSubStrTmp[ iStrPos ] + 0x20 : sSubStrTmp[ iStrPos ];

foreach ( int iStrPos, char cChar; sStr )    
{   
cCharTmp = cChar;

// If cChar is an uppercase ASCII, make it lowercase for the compare
cCharTmp = ( cCharTmp >= '\x41' && cCharTmp <= '\x5A' ? cCharTmp + '\x20' : cCharTmp );

//printf( "iStrPos=%d, cChar=%c, cCharTmp=%c, sStr.length=%d\n", iStrPos, cChar, cCharTmp, sStr.length );

// Find the very 1st character of the Sub String is found within the Main String
if ( cCharTmp == sSubStrTmp[ 0 ] && bFoundMatch == false )
{
iFound1stPos  = iStrPos;
bFoundMatch   = true;
iSubStrRunner = 1;

if ( sSubStrTmp.length == 1 ) return iFound1stPos;
continue;
}
// Match the rest of the characters in the Sub String within the Main String
else if ( cCharTmp == sSubStrTmp[ iSubStrRunner ] && bFoundMatch == true )
{
iSubStrRunner++;
if ( iSubStrRunner > sSubStrTmp.length - 1 ) return iFound1stPos;               
continue;
}
// Not all characters match, reset
else if ( bFoundMatch == true )
{
// Not a total match, reset back to defaults
iFound1stPos  = -1;
bFoundMatch   = false;
iSubStrRunner = 0;
}    

}     

return -1;

} // end int ifind( in char[], in char[] ) 


/****************************************************************************
* Function      : int irfind( in char[], in char[] )
* Author        : David L. 'SpottedTiger' Davis
* Language      : DigitalMars "D" aka Mars v0.92
* Created Date  : 03.Jun.04 
* Modified Date : 08.Jun.04 Removed the wrapper function and the
*                           default parameter, mainly because the 
*                           string being passed in should be already 
*                           sliced so that the next search will find
*                           the matching sub-string value. Also per 
*                           advice from Walter, <g> I've removed every 
*                           tolower() call, and now locally all characters
*                           in the strings are set to lowercase where they  
*                           sit without the need to create another copy.
*               : 09.Jun.04 Reworked the whole thing! Fixed the problem
*                           with the input string getting stepped on, and
*                           now only the sSubStr is duped to another string,
*                           while the sStr string is looked at in a loop
*                           looking for the matching SubString...a character
*                           at a time.
* Requirements  : std.string
* Licence       : Same as those for the Phobos (Runtime Library)
****************************************************************************
*
* Note: Meant to be a case insensitive version of std.string.rfind.
*/	
int irfind
(
in char[] sStr,
in char[] sSubStr
)
{

char[] sSubStrTmp;
bool   bFoundMatch   = false;
int    iFound1stPos  = -1;  
int    iSubStrRunner = 0;
char   cCharTmp; 

// If either of the string parameters are empty, return not found
if ( sStr.length < 1 || sSubStr.length < 1 || sSubStr.length > sStr.length )
return -1;

// Get a working copy of sSubStr  
sSubStrTmp = sSubStr.dup;

// sSubStrTmp set to lowercase locally    
// lowercase ascii a = '\x61', uppercase ascii A = '\x41' 
foreach ( int iStrPos, char cChar; sSubStrTmp ) 
sSubStrTmp[ iStrPos ] = ( find( uppercase, cChar ) != -1 )
? sSubStrTmp[ iStrPos ] + 0x20 : sSubStrTmp[ iStrPos ];

for ( int iStrPos = sStr.length - 1; iStrPos >= 0; iStrPos-- )
{   

cCharTmp = sStr[ iStrPos ];

// If cChar is an uppercase ASCII, make it lowercase for the compare
cCharTmp = ( cCharTmp >= '\x41' && cCharTmp <= '\x5A' ? cCharTmp + '\x20' : cCharTmp );

//printf( "iStrPos=%d, cChar=%c, cCharTmp=%c, sStr.length=%d\n", iStrPos, sStr[ iStrPos ], cCharTmp, sStr.length );


// Find the very 1st character of the Sub String is found within the Main String
if ( cCharTmp == sSubStrTmp[ 0 ] && bFoundMatch == false )
{
iFound1stPos  = iStrPos;
bFoundMatch   = true;
iSubStrRunner = 1;

//printf( "iStrPos=%d, cChar=%c, cCharTmp=%c, sStr.length=%d\n", iStrPos, sStr[ iStrPos ], cCharTmp, sStr.length );

if ( sSubStrTmp.length == 1 ) return iFound1stPos;

if ( iStrPos + 1 > sStr.length - 1 ) continue;

for ( int iInnerLoop = iStrPos + 1; iInnerLoop < sStr.length; iInnerLoop++ )
{ 
cCharTmp = sStr[ iInnerLoop ];

// If cChar is an uppercase ASCII, make it lowercase for the compare
cCharTmp = ( cCharTmp >= '\x41' && cCharTmp <= '\x5A' ? cCharTmp + '\x20' : cCharTmp );

// Match the rest of the characters in the Sub String within the Main String
if ( cCharTmp == sSubStrTmp[ iSubStrRunner ] && bFoundMatch == true )
{
iSubStrRunner++;
if ( iSubStrRunner > sSubStrTmp.length - 1 ) return iFound1stPos;               
continue;
}
// Not all characters match, reset
else if ( bFoundMatch == true )
{
// Not a total match, reset back to defaults
iFound1stPos  = -1;
bFoundMatch   = false;
iSubStrRunner = 0;
break;
}
}    
}
}

return -1;

} // end int irfind( in char[], in char[] ) 

// Test ifind() and irfind() to find multiple of the same sub-string
int main()
{

int    iStrPos;
int    iSlicePos;
char[] sStrTest  = "ApO 123355 PO Box 23, Waterpool Street Portland, Texas";

printf( "Original Before = %.*s\n\n", sStrTest );

iStrPos   = 0;
iSlicePos = 0;

while ( iSlicePos != -1 )
{
iSlicePos = ifind( sStrTest[ iStrPos .. sStrTest.length ], "PO" );

if ( iSlicePos != -1 ) 
{
printf( "Found \'PO\' at position with ifind()= %d\n", iStrPos + iSlicePos );
iStrPos = iStrPos + iSlicePos + "PO".length; 
}
}

printf("\n\n"); 

iStrPos  = sStrTest.length - 1;
iSlicePos = 0;

while ( iSlicePos != -1 && iStrPos >= 0 )
{
iSlicePos = irfind( sStrTest[ 0 .. iStrPos + 1 ], "PO" );

if ( iSlicePos != -1 )
{
printf( "Found \'PO\' at position with irfind()= %d\n", iSlicePos );
iStrPos = iSlicePos - "PO".length;
}
}

printf("\n\n"); 
printf( "Original After = %.*s\n", sStrTest );

return 0;

} // end int main()

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 08 2004

"Vathix" <vathixSpamFix dprogramming.com> writes:

"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:ca69ie$1813$1 digitaldaemon.com...
 In article <ca5ct6$2vif$1 digitaldaemon.com>, Walter says...
The function uppercases the input string. It shouldn't modify its inputs.

 Walter: Third time around is normally the "Charm!" Anywayz, I've been hammering
 away at these two functions ifind() and irfind(), and I believe I've made them
 much better than before, thanks to both you and Jill for the advice.

Hello, I just wanted to let you know that I wrote those functions awhile ago
for a String class that can be found at www.dprogramming.com/stringclass.d .
It contains all the free functions, and a few others such as findany(),
endswith(), etc; and case insensitive versions. I haven't said much about it
because it's completely based off Walter's code, so it belongs to him. The
class can be stripped out to just use the functions. If the code isn't good
enough, just ignore me; have fun!

Jun 09 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca7pet$fn7$1 digitaldaemon.com>, Vathix says...
Hello, I just wanted to let you know that I wrote those functions awhile ago
for a String class that can be found at www.dprogramming.com/stringclass.d .
It contains all the free functions, and a few others such as findany(),
endswith(), etc; and case insensitive versions. I haven't said much about it
because it's completely based off Walter's code, so it belongs to him. The
class can be stripped out to just use the functions. If the code isn't good
enough, just ignore me; have fun!

Vathix: I looked over your stringclass.d code that's based off of Walter's
string.d, and it does look a lot more in line with what he'll accept. Myself,
I'd just like to have the ifind(char[], char[])/ifind(char[], char) and
irfind(char[], char[])/irfind(char[], char) functions in std.string.

Anyways, it would seem my "C" skills have gotten a bit rusty, cause I've been
relearning a lot of things I had forgotten about in trying to create a good
version of the ifind() and irfind() functions for general use.

So please feel free to post those ifind code portions and see what Walter thinks.
Cause currently I'm beginning to think my versions are still going to be a
little bulkier than what Walter will allow in, and I would like to move on and
finish up my propercase() function (which is why I even mentioned that the
ifind()/irfind() were missing in the first place). :) And besides that, Walter is
a very busy man and I don't want to waste his time.

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 09 2004

"Walter" <newshound digitalmars.com> writes:

There's no need to .dup the strings. Just have a loop that looks like this:

for (i = 0; i < string1.length; i++)
{    char c = toupper(string1[i]);
    if (c != toupper(string2[i]))
        goto nomatch;
}

Note that it compares character by character without needing to allocate
memory. In fact, just copy the logic in find() and rfind(), replacing memchr
and memcmp with case insensitive loops, write some unit tests, and you'll be
there.
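
The allocation-free approach described above can be sketched roughly as follows. This is not the Phobos code and not what was eventually posted; it is an illustration in present-day D, using `std.ascii.toUpper` in place of the C `toupper()`, and a plain nested loop instead of the memchr/memcmp structure of find() and rfind(). The helper name `sameIgnoringCase` is hypothetical, and only ASCII case is handled.

```d
import std.ascii : toUpper;

// Hypothetical helper: true if a and b match when ASCII case is ignored.
// Callers guarantee a.length == b.length. Nothing is allocated or modified.
bool sameIgnoringCase(const(char)[] a, const(char)[] b)
{
    foreach (i, c; a)
        if (toUpper(c) != toUpper(b[i]))
            return false;
    return true;
}

// Case-insensitive forward search: index of the first match, or -1.
ptrdiff_t ifind(const(char)[] s, const(char)[] sub)
{
    if (sub.length > s.length)
        return -1;
    foreach (i; 0 .. s.length - sub.length + 1)
        if (sameIgnoringCase(s[i .. i + sub.length], sub))
            return cast(ptrdiff_t) i;
    return -1;
}

// Case-insensitive reverse search: index of the last match, or -1.
ptrdiff_t irfind(const(char)[] s, const(char)[] sub)
{
    if (sub.length > s.length)
        return -1;
    // i runs from s.length - sub.length down to 0
    for (size_t i = s.length - sub.length + 1; i-- > 0; )
        if (sameIgnoringCase(s[i .. i + sub.length], sub))
            return cast(ptrdiff_t) i;
    return -1;
}

void main()
{
    assert(ifind("ApO 123355 PO Box", "po") == 1);
    assert(irfind("ApO 123355 PO Box", "po") == 11);
    assert(ifind("abcdef", "XY") == -1);
    assert(ifind("abc", "ABC") == 0);   // sub equal to s matches at 0
}
```

Because the parameters are `const` and every comparison is done on the fly, neither input can be stepped on and no `.dup` is needed.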

Jun 09 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca7qp4$hrl$1 digitaldaemon.com>, Walter says...
There's no need to .dup the strings. Just have a loop that looks like this:

for (i = 0; i < string1.length; i++)
{    char c = toupper(string1[i]);
    if (c != toupper(string2[i]))
        goto nomatch;
}

Note that it compares character by character without needing to allocate
memory. In fact, just copy the logic in find() and rfind(), replacing memchr
and memcmp with case insensitive loops, write some unit tests, and you'll be
there.

Walter: Ok, per your advice I've copied the original find() / rfind() functions
from std.string, and modified them into ifind() / irfind(). I sure hope these
will make the grade <g>. 


"" and all the code will be ready to copy and paste. 

Thxs for giving this chance to add something to "D!" :)

[Code listing not preserved in this archive. Surviving fragments:
#debug=string; #debug(string) #import std.string; then #int ifind,
#int irfind, #int ifind and #int irfind(char[] s, char[] sub), each
followed by #unittest, and #int main(). The test calls in main() include
the arguments ( "dfeffgfff", "fff" ) and ( "abcdefcdef", ... ) with
"c", "cd", "x", "xy", "" and "def".]

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 10 2004

"Walter" <newshound digitalmars.com> writes:

That's more like it! Now, can the unit tests also test the case
insensitivity?


Jun 10 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <cabh23$6m1$1 digitaldaemon.com>, Walter says...
That's more like it! Now, can the unit tests also test the case
insensitivity?

Walter: Oops! Sorry about that...I've now added in some additional unittest
entries for all four functions.
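
Since the listing itself did not survive, here is a minimal sketch of what case-insensitivity checks in a `unittest` block might look like. It is an assumption-laden illustration in present-day D, not the code from this post: the single-character `ifind` overload below is a stand-in written for the example, and `std.ascii.toLower` replaces the C library calls. The block only runs when compiled with `-unittest`.

```d
import std.ascii : toLower;

// Stand-in for the ifind(char[], char) form mentioned in this thread:
// index of the first character equal to c when ASCII case is ignored, or -1.
ptrdiff_t ifind(const(char)[] s, char c)
{
    immutable lc = toLower(c);
    foreach (i, ch; s)
        if (toLower(ch) == lc)
            return cast(ptrdiff_t) i;
    return -1;
}

unittest
{
    // Same-case inputs, as a case-sensitive find() would be tested
    assert(ifind(null, 'a') == -1);
    assert(ifind("def", 'a') == -1);
    assert(ifind("abba", 'a') == 0);
    assert(ifind("def", 'f') == 2);

    // Mixed-case inputs: these are the ones that exercise the "i"
    assert(ifind("abba", 'A') == 0);
    assert(ifind("ABBA", 'b') == 1);
    assert(ifind("dEf", 'F') == 2);
}

void main() {}
```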








[Code listing not preserved in this archive. Surviving fragments:
#debug=string; #debug(string) #import std.string; then #int ifind,
#int irfind, #int ifind and #int irfind(char[] s, char[] sub), each
followed by #unittest, and #int main().]

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 11 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <cabh23$6m1$1 digitaldaemon.com>, Walter says...
That's more like it! Now, can the unit tests also test the case
insensitivity?

Walter: Darn, I had to fix just one more thing in the code...I discovered that
if sub == s I was returning -1 instead of 0. This newest version now
fixes that.

Thxs for giving me the chance to write these functions!! :))

[Code listing not preserved in this archive. Surviving fragments:
#debug=string; #debug(string) #import std.string; then #int ifind,
#int irfind, #int ifind and #int irfind, each followed by #unittest,
and #int main().]

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 12 2004

David L. Davis <SpottedTiger yahoo.com> writes:

Walter: It would sure be nice to also have the ireplace() and icount() functions
added to Phobos...if possible. Anyway, I decided to go ahead and write them up
from your replace() and count(), as I had done with ifind() and irfind() per your
advice. I sure hope it's ok to bug you just a little for these kinds of updates.
;)

// Note these functions build off the ifind() defined in a previous post.







[code listing lost in the archive; only these lines survive]

#char[] ireplace
#unittest
#int icount
#unittest

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 15 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...

 sStr[ iStrPos ] + 0x20

Ah! Now these old ASCII habits really should be dropped. Hauke has written this
magnificent charToUpper() routine. It should be used.

 I feel like a young Skywalker in training, learning how to best use "The
Force!"

Other than that: Impressive - Obi-Wan has taught you well. (Hope I'm not too
discouraging).  :)

Jill

Jun 09 2004

Chr. Grade <Chr._member pathlink.com> writes:

Haven't read any related postings before, so I don't know if I'm far off with my
reply. Anyway, if you try to convert a lower char to an upper char in a clever
and old-fashioned way, do it like this, young Jedi:

lower to upper -> var &= 0x5F;
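
A small sketch of that mask in present-day D, assuming ASCII input only. The range guard and the name asciiUpperMasked are additions for illustration; without the guard the mask also rewrites digits and punctuation:

```d
/// Hypothetical helper: uppercase one ASCII letter with the mask trick,
/// leaving every other byte untouched.
char asciiUpperMasked(char c)
{
    return (c >= 'a' && c <= 'z') ? cast(char)(c & 0x5F) : c;
}

unittest
{
    assert(asciiUpperMasked('q') == 'Q');
    assert(asciiUpperMasked('Q') == 'Q');
    assert(asciiUpperMasked('1') == '1');
    // Applied blindly, the mask mangles non-letters:
    assert(('1' & 0x5F) == 0x11);   // a control code, no longer '1'
    assert(('{' & 0x5F) == '[');
}
```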

Chr. Grade

In article <ca71c4$2b8l$1 digitaldaemon.com>, Arcane Jill says...
In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...

 sStr[ iStrPos ] + 0x20

Ah! Now these old ASCII habits really should be dropped. Hauke has written this
magnificent charToUpper() routine. It should be used.

 I feel like a young Skywalker in training, learning how to best use "The
Force!"

Other than that: Impressive - Obi-Wan has taught you well. (Hope I'm not too
discouraging).  :)

Jill

Jun 09 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Arcane Jill wrote:

 In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...
 
 
sStr[ iStrPos ] + 0x20

 
 
 Ah! Now these old ASCII habits really should be dropped. Hauke has written this
 magnificent charToUpper() routine. It should be used.

<snip>

Except that that snippet converts upper to lower.

There's always

     sStr[iStrPos] + 'a' - 'A'

which'll work as long as the uppercase alphabet is a constant offset 
from the lowercase alphabet.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 09 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Stewart Gordon wrote:
 Arcane Jill wrote:
 
 In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...


 sStr[ iStrPos ] + 0x20



 Ah! Now these old ASCII habits really should be dropped. Hauke has 
 written this
 magnificent charToUpper() routine. It should be used.

 
 <snip>
 
 Except that that snippet converts upper to lower.

Well, charToLower then.


 There's always
 
     sStr[iStrPos] + 'a' - 'A'
 
 which'll work as long as the uppercase alphabet is a constant offset 
 from the lowercase alphabet.

That is exactly the same as using 0x20 directly since D's character 
literals are always Unicode (no codepage stuff involved). So 'a'-'A' 
always equals 0x20.

In any case, in Unicode upper and lower case characters do not have a 
constant offset to each other. That is only true for the ASCII subset.
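
To make the point concrete, here is a short sketch in present-day D. It assumes std.uni.toUpper, which applies the simple one-to-one mapping to a single code point; caseOffset is a hypothetical helper name:

```d
import std.uni : toUpper;

/// Hypothetical helper: distance from a code point to its simple uppercase form.
int caseOffset(dchar c)
{
    return cast(int) c - cast(int) toUpper(c);
}

unittest
{
    assert('a' - 'A' == 0x20);
    assert(caseOffset('a') == 0x20);        // ASCII
    assert(caseOffset('\u00E9') == 0x20);   // U+00E9 -> U+00C9, still 0x20
    assert(caseOffset('\u0101') == 1);      // U+0101 -> U+0100
    assert(caseOffset('\u00FF') == -121);   // U+00FF -> U+0178, upper form is higher
}
```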

Hauke

Jun 09 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Hauke Duden wrote:

<snip>
 In any case, in Unicode upper and lower case characters do not have a 
 constant offset to each other. That is only true for the ASCII subset.

Yes, you do have a point there.  What's more, there isn't a 1:1 mapping 
between uppercase and lowercase characters.  And the mappings that there 
are aren't language independent.  So we can't write a single formula 
that'll correctly case-convert all text in all languages.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 09 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Stewart Gordon wrote:
 Hauke Duden wrote:
 
 <snip>
 
 In any case, in Unicode upper and lower case characters do not have a 
 constant offset to each other. That is only true for the ASCII subset.

 
 
 Yes, you do have a point there.  What's more, there isn't a 1:1 mapping 
 between uppercase and lowercase characters.

You're wrong. The Unicode standard defines 1:1 case mappings (see 
http://www.unicode.org/Public/UNIDATA/UCD.html). There is also an 
additional "special casing" with one-to-many mappings but only a handful 
of characters are affected. It would be nice to support that too, but 
for everyday work the 1:1 mappings are usually sufficient.


   And the mappings that there
 are aren't language independent.

Huh? Casing is not affected by locale. Maybe you are thinking about 
collation?

Hauke

Jun 09 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Hauke Duden wrote:

 Stewart Gordon wrote:

<snip>
 Yes, you do have a point there.  What's more, there isn't a 1:1 
 mapping between uppercase and lowercase characters.

 
 You're wrong. The Unicode standard defines 1:1 case mappings (see 
 http://www.unicode.org/Public/UNIDATA/UCD.html).

There seems to be a contradiction here.  That file indicates that 
UnicodeData.txt only contains 1:1 mappings.  But just as I wondered, 
there's a 2:1 mapping in 03C2 and 03C3.

 There is also an additional "special casing" with one-to-many 
 mappings but only a handful of characters are affected. It would be 
 nice to support that too, but for everyday work the 1:1 mappings are 
 usually sufficient.

So, which characters do the one-to-many mappings bring about?

 And the mappings that there are aren't language independent.

 
 Huh? Casing is not affected by locale. Maybe you are thinking about 
 collation?

What do you mean by that?

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment.  Please keep
replies on the 'group where everyone may benefit.

Jun 09 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca7cgp$2svc$1 digitaldaemon.com>, Stewart Gordon says...
There seems to be a contradiction here.  That file indicates that 
UnicodeData.txt only contains 1:1 mappings.  But just as I wondered, 
there's a 2:1 mapping in 03C2 and 03C3.

Look, it's perfectly simple. Everybody's right. And because everybody's right,
everybody's accusing everybody else of being wrong. THERE ARE TWO ANSWERS.

"Simple casing" is a one-to-one mapping from character to character, and is
locale-independent.

"Full casing" is a one-to-many mapping from string to string, and is ALMOST
locale-independent, but not quite.

Hauke's brilliant library supports simple casing, not full casing. That's why
both the input and the output are characters, not strings.



So, which characters do the one-to-many mappings bring about?

For example, the German character 'ß' uppercases to "SS" when using full casing,
but it stays as 'ß' using simple casing.



 And the mappings that there are aren't language independent.

 
 Huh? Casing is not affected by locale. Maybe you are thinking about 
 collation?

What do you mean by that?

Full casing (but not simple casing) has localized exceptions ONLY for Turkish,
Lithuanian and Azeri. In principle, other exceptions could be added in the
future. Simple casing is completely locale independent.
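
The difference between the two kinds of casing can be sketched in present-day D as follows. This is an illustration only: fullUpperSketch is a hypothetical helper with a two-entry hand-written table (the real data lives in SpecialCasing.txt), locale exceptions are ignored, and everything else falls back to std.uni.toUpper on a single code point, i.e. simple casing:

```d
import std.uni : toUpper;

/// Hypothetical helper: uppercase one code point, consulting a tiny
/// special-casing table before the simple one-to-one mapping.
/// The result is a string because full casing may return several code points.
dstring fullUpperSketch(dchar c)
{
    switch (c)
    {
        case '\u00DF': return "SS"d;             // LATIN SMALL LETTER SHARP S
        case '\u1FB2': return "\u1FBA\u0399"d;   // GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI
        default:       return [toUpper(c)].idup; // simple casing: one in, one out
    }
}

unittest
{
    assert(fullUpperSketch('a') == "A"d);
    assert(fullUpperSketch('\u00DF') == "SS"d);
    assert(fullUpperSketch('\u1FB2') == "\u1FBA\u0399"d);
}
```

The return type is the visible consequence of the distinction: a character-to-character function cannot express full casing at all.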

Collation is a different kettle of fish, and we currently have no libraries to
support it.

Arcane Jill

Jun 09 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Arcane Jill wrote:
 Full casing (but not simple casing) has localized exceptions ONLY for Turkish,
 Lithuanian and Azeri. In principle, other exceptions could be added in the
 future. Simple casing is completely locale independent.

Ouch. I didn't know that. Makes me feel happy that I stayed away from 
the special casings up to now ;).

Thanks for clearing up the misunderstanding!

Hauke

Jun 09 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca7lhq$9eo$1 digitaldaemon.com>, Hauke Duden says...
Arcane Jill wrote:
 Full casing (but not simple casing) has localized exceptions ONLY for Turkish,
 Lithuanian and Azeri. In principle, other exceptions could be added in the
 future. Simple casing is completely locale independent.

Ouch. I didn't know that. Makes me feel happy that I stayed away from 
the special casings up to now ;).

Thanks for clearing up the misunderstanding!

Hauke


Actually, I think I'd quite like to have a bash at writing some of the Unicode
algorithms. I've finished the Int class now (Well, almost. I've just got to add
a little bit of memory management, but I know how to do that now). After that, I
was planning to move onto the next bit of my crypto lib (random numbers). But -
the Unicode functions would be relatively quick to write (compared with the
crypto stuff), and it would be quite nice to have a break and do something else
for a change.

If I do that, I'll need to collaborate with you, Hauke. There's no point in
duplicating effort, and we could do with a common format for the compiled
Unicode data files. In a way, that's YOUR area of expertise, not mine, because
you seemed to know that >>9 was more efficient than [n], something I wouldn't
have known. Also, we mustn't forget the UPR format I mentioned, which has the
benefits of being binary, easily parsable, extendable, publicly available, open
source, and easily updateable with each new version of Unicode.

I could do normalization functions first - canonical/compatibility equivalence;
finding glyph boundaries, that sort of thing. But I don't want to be treading on
your toes, which I would be if I went and invented a new format for the compiled
data. So I don't want to do that without collaboration.

My concern is that you probably only compiled in enough information to do simple
casing, so I wouldn't be able to extract normalization/boundary information from
your compiled format. (But I'm guessing, as I haven't studied your source code
in depth).

I think it would be great if a D standard library had FULL Unicode support. Even
C++ and Java don't do that. (And that's not even mentioning Java's crippled
16-bit chars). It would effectively turn D into the language of choice for
Unicode apps.

Anyway, let me know what you think (and check out UPR, if you have time. The URL
is http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/, with the format
itself documented at
http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/format.html - or we
could invent our own, but why re-invent the wheel?).

Arcane Jill

Jun 10 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Arcane Jill wrote:
 Actually, I think I'd quite like to have a bash at writing some of the Unicode
 algorithms. I've finished the Int class now (Well, almost. I've just got to add
 a little bit of memory management, but I know how to do that now). After that, I
 was planning to move onto the next bit of my crypto lib (random numbers). But -
 the Unicode functions would be relatively quick to write (compared with the
 crypto stuff), and it would be quite nice to have a break and do something else
 for a change.
 
 If I do that, I'll need to collaborate with you, Hauke. There's no point in
 duplicating effort, and we could do with a common format for the compiled
 Unicode data files. In a way, that's YOUR area of expertise, not mine, because
 you seemed to know that >>9 was more efficient than [n], something I wouldn't
 have known. Also, we mustn't forget the UPR format I mentioned, which has the
 benefits of being binary, easily parsable, extendable, publicly available, open
 source, and easily updateable with each new version of Unicode.
 
 I could do normalization functions first - canonical/compatibility equivalence;
 finding glyph boundaries, that sort of thing. But I don't want to be treading on
 your toes, which I would be if I went and invented a new format for the compiled
 data. So I don't want to do that without collaboration.

I think it is a good idea to coordinate our Unicode efforts. I haven't 
written any code for normalization, so this would be really useful.

What I have worked on lately (when I found a few minutes of spare time) 
is a string interface that abstracts from the specific encoding plus 
implementations for some common encodings (UTF-X, Latin-1, ASCII, system 
code page). It includes the usual string functions like comparing 
(characters ordered by index - no collation), searching, concatenation, 
etc. Caseless comparison and searching is also implemented (using the 
simple lower mapping - no full case folding).

So if you write normalization routines that would be great!

 My concern is that you probably only compiled in enough information to do simple
 casing, so I wouldn't be able to extract normalization/boundary information from
 your compiled format. (But I'm guessing, as I haven't studied your source code
 in depth).

That's true - the unichar data only contains the case mappings and 
character type info. I think it is important to separate the different 
Unicode tables, so that using a single Unicode routine won't cause ALL 
the data to be linked into the program.

 I think it would be great if a D standard library had FULL Unicode support. Even
 C++ and Java don't do that. (And that's not even mentioning Java's crippled
 16-bit chars). It would effectively turn D into the language of choice for
 Unicode apps.

I agree - that is my goal as well. In fact I see it as an opportunity to 
influence the language in its early stages so that it will have 
standardized(!) Unicode support. It prevents every component developer 
from implementing his own, which can cause lots of unnecessary bloat 
(Unicode data isn't small...).

 Anyway, let me know what you think (and check out UPR, if you have time. The URL
 is http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/, with the format
 itself documented at
 http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/format.html - or we
 could invent our own, but why re-invent the wheel?).

I haven't had time to look at it yet, but I promise to do so ;).


Hauke

Jun 10 2004

"Walter" <newshound digitalmars.com> writes:

"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:ca9csf$2ujf$1 digitaldaemon.com...
 Arcane Jill wrote:
 I think it would be great if a D standard library had FULL Unicode support.
 Even C++ and Java don't do that. (And that's not even mentioning Java's
 crippled 16-bit chars). It would effectively turn D into the language of
 choice for Unicode apps.

 I agree - that is my goal as well. In fact I see it as an opportunity to
 influence the language in its early stages so that it will have
 standardized(!) Unicode support. It prevents every component developer
 from implementing his own, which can cause lots of unnecessary bloat
 (Unicode data isn't small...).

I agree too, and am glad you two are taking the lead on it.

Jun 10 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Arcane Jill wrote:
 Anyway, let me know what you think (and check out UPR, if you have time. The URL
 is http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/, with the format
 itself documented at
 http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/format.html - or we
 could invent our own, but why re-invent the wheel?).

Ok, I finally got around to looking at it. It seems that UPR simply 
defines a binary representation that contains the normal Unicode data 
files in a more organized, easier-to-access way.

But the data does not seem to be compressed at all (please correct me if 
I have missed something). Also, each entry can be 1, 2 or 4 bytes in 
size, but 3 bytes is actually the most size-efficient representation for 
uncompressed Unicode code points.

I fear that UPR doesn't quite cut it for data that is compiled 
statically into executables. After all, we don't want a "hello world" 
program to be several megabytes in size, right? That'd only cause people 
to ignore Unicode even more than they do now.

Hauke

Jun 12 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <caep38$1vfr$1 digitaldaemon.com>, Hauke Duden says...
Ok, I finally got around to looking at it. It seems that UPR simply 
defines a binary representation that contains the normal Unicode data 
files in a more organized, easier-to-access way.

Yeah, I stand corrected. The format isn't useful to us. I thought it would be,
from reading the blurb, but it's just as easy for us to ignore it. I say we
forget UPR then.

I've got some ideas, but it's too early in the morning for me right now. Will
get back to you later when I've woken up a bit.

Jill

Jun 12 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cagrbo$1rlb$1 digitaldaemon.com>, Arcane Jill says...

I've got some ideas, but it's too early in the morning for me right now. Will
get back to you later when I've woken up a bit.

I'm a bit more awake now. The approach that I took when I had to do this sort of
thing for my employer some years ago turned out to be wrong, in hindsight. But I
learn from my mistakes. Back then, I had a tool to parse the Unicode database
files (no problem there) into a custom format binary file, which then got turned
into a byte array and stuck into the source code. It was a bad idea because my
format was not extendable, and it included only those parts of the database
which I actually needed. For this project, we need all of it.

I have an approach now which will work, and I've got some tests up and running
to prove that it works. It's a two-stage process. In stage one, a function I
wrote parses the Unicode database files and produces some HUMUNGOUSLY large
binary files, containing every scrap of information there is on Unicode, in an
easily accessible form. Then a second phase function comes along and uses those
large files as input, creating as its output - D source files. These end up
quite small, because the function figures out the best way to pack the data. The
source files created declare const lookup tables (using your 12-bit/9-bit split
with duplicate tables removed).

This approach leaves each Unicode property having its own independent source
file(s). Since each source file will become a single .obj file, when you link
with the library, you will only get the data for those properties you actually
need. If you never call a function to get the bidi-combining-class, for example,
then that function, and the data to support it, won't even get linked in.
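
A rough sketch of the lookup structure this describes, in present-day D. It builds the tables at run time purely for illustration, whereas the post generates const tables as source code; the names TwoStage and buildTwoStage are hypothetical, and the 9-bit page size is taken from the split mentioned above:

```d
/// Two-stage table: stage1 maps a page number (code point >> 9) to a slot
/// in a pool that holds each distinct 512-entry page only once.
struct TwoStage(T)
{
    enum pageBits = 9;
    enum pageSize = 1 << pageBits;
    ushort[] stage1;   // one entry per page
    T[][] pool;        // deduplicated pages

    T opIndex(dchar c) const
    {
        return pool[stage1[c >> pageBits]][c & (pageSize - 1)];
    }
}

/// Hypothetical builder: cut a flat per-code-point array into pages and
/// share identical pages. flat.length must be a multiple of the page size.
TwoStage!T buildTwoStage(T)(const(T)[] flat)
{
    enum pageSize = TwoStage!T.pageSize;
    assert(flat.length % pageSize == 0);
    TwoStage!T t;
    foreach (p; 0 .. flat.length / pageSize)
    {
        auto page = flat[p * pageSize .. (p + 1) * pageSize];
        size_t slot = 0;
        while (slot < t.pool.length && t.pool[slot] != page)
            ++slot;                       // linear search is fine for a sketch
        if (slot == t.pool.length)
            t.pool ~= page.dup;
        t.stage1 ~= cast(ushort) slot;
    }
    return t;
}

unittest
{
    // Toy property: "is an ASCII hex digit", one byte per code point.
    auto flat = new ubyte[0x110000];
    foreach (dchar c; "0123456789abcdefABCDEF")
        flat[c] = 1;

    auto t = buildTwoStage(flat);
    assert(t.stage1.length == 0x110000 / 512);
    assert(t.pool.length == 2);   // page 0 plus one shared all-zero page
    assert(t['F'] == 1);
    assert(t['G'] == 0);
    assert(t['\U0010FFFF'] == 0);
}
```

For this toy property 1,114,112 flat entries shrink to 2,176 stage-one entries plus two pages, which is the effect of "duplicate tables removed".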

And the beauty of this approach is that it is completely extendable to all
Unicode properties in all of their files.

Oh, and there's another good thing too. Since the source code writing is
automated, it follows that variations of lookup algorithms can just happen
automagically. For example, the isASCIIHexDigit() property can be implemented
very efficiently with only a tiny amount of data and a slightly modified lookup
function. The source-code-writing tool could figure that out and use the smaller
data lookup.

I haven't started on any actual Unicode /algorithms/ yet - just getting the
fast+small property lookups working was quite a challenge.

So it's going well. If you want to email me privately to consult, you can head
over to dsource and post me a message there. This is going to be fun!

Jill

Jun 13 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Arcane Jill wrote:
 In article <cagrbo$1rlb$1 digitaldaemon.com>, Arcane Jill says...
 
 
I've got some ideas, but it's too early in the morning for me right now. Will
get back to you later when I've woken up a bit.

 
 
 I'm a bit more awake now. The approach that I took when I had to do this sort of
 thing for my employer some years ago turned out to be wrong, in hindsight. But I
 learn from my mistakes. Back then, I had a tool to parse the Unicode database
 files (no problem there) into a custom format binary file, which then got turned
 into a byte array and stuck into the source code. It was a bad idea because my
 format was not extendable, and it included only those parts of the database
 which I actually needed. For this project, we need all of it.
 
 I have an approach now which will work, and I've got some tests up and running
 to prove that it works. It's a two-stage process. In stage one, a function I
 wrote parses the Unicode database files and produces some HUMUNGOUSLY large
 binary files, containing every scrap of information there is on Unicode, in an
 easily accessible form. Then a second phase function comes along and uses those
 large files as input, creating as its output - D source files. These end up
 quite small, because the function figures out the best way to pack the data. The
 source files created declare const lookup tables (using your 12-bit/9-bit split
 with duplicate tables removed).
 
 This approach leaves each Unicode property having its own independent source
 file(s). Since each source file will become a single .obj file, when you link
 with the library, you will only get the data for those properties you actually
 need. If you never call a function to get the bidi-combining-class, for example,
 then that function, and the data to support it, won't even get linked in.
 
 And the beauty of this approach is that it is completely extendable to all
 Unicode properties in all of their files.
 
 Oh, and there's another good thing too. Since the source code writing is
 automated, it follows that variations of lookup algorithms can just happen
 automagically. For example, the isASCIIHexDigit() property can be implemented
 very efficiently with only a tiny amount of data and a slightly modified lookup
 function. The source-code-writing tool could figure that out and use the smaller
 data lookup.
 
 I haven't started on any actual Unicode /algorithms/ yet - just getting the
 fast+small property lookups working was quite a challenge.
 
 So it's going well. If you want to email me privately to consult, you can head
 over to dsource and post me a message there. This is going to be fun!


It sounds like you're really excited about this one ;). Your ideas sound 
good as well, but some comments:

- the optimal page size is different for each Unicode property. For 
example, the new unichar module uses 128 elements per page for the 
character types and 512 elements per page for the mapping tables.
It would be optimal if the data creation tool would automatically figure 
out the best size and include the corresponding constants in the source 
file. A simple brute-force try-them-all approach should suffice, since 
there really are only about half a dozen realistic page sizes (they need 
to be a power of 2).

- a simple RLE compression of the final data that is compiled into the 
executable has proven to be very effective, since Unicode data usually 
contains lots of big gaps and ranges with the same properties. This 
dramatically reduces the size of the compiled executable.

- if you have multiple properties of the same type it can be a huge 
space saver to use the same "page pool" when decomposing the data into 
small pages. This worked well in unichar, which now has a single page 
pool for the lower, upper and title mappings. The combined pool data is 
only slightly larger than the pool for a single mapping.

- I'm not convinced that the lookup tables and algorithms should be 
created in a completely automatic way. A certain level of automation is 
obviously necessary, but I think it would pay off if the data can be 
filtered before it is decomposed into the lookup tables. The algorithms 
for accessing those tables would have to be adaptable too. Another 
example from the unichar module: the case mapping data is now stored as 
offsets relative to the original character index. This has two 
advantages. Number one is that the biggest offset fits very comfortably 
into 2 bytes, so we save one byte per element. The second advantage is 
that this dramatically increases the number of pages with the same 
contents, so the page pool ends up being a lot smaller.

- and my last concern: it seems that you want to develop a very general 
tool to implement every aspect of the Unicode standard. That is very 
commendable and nothing is wrong with it in itself, but I would advise 
you to reflect on the amount of work that is necessary to implement all 
that stuff. I have no idea how much time you can put into this project, 
but I know that my own time is unfortunately very limited. If you are in 
a similar situation it may be wise to tune the goals down a bit and 
progress in smaller steps, implementing one Unicode algorithm at a time. 
There is not much use in a full Unicode library that ends up being 
vaporware.
Then again if you DO have the time, please do not let my skepticism 
dampen your enthusiasm ;).
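
Two of the points above, the brute-force page-size search and the RLE pass, can be sketched in present-day D as follows. Everything here is illustrative: the function names are hypothetical, the cost model simply counts table entries (it ignores entry widths), and the run layout is not the one unichar uses:

```d
/// Hypothetical cost model: stage-one entries plus entries in the
/// deduplicated page pool, for pages of 2^pageBits elements.
size_t tableCost(const(ubyte)[] flat, uint pageBits)
{
    immutable pageSize = (cast(size_t) 1) << pageBits;
    const(ubyte)[][] pool;
    foreach (p; 0 .. flat.length / pageSize)
    {
        auto page = flat[p * pageSize .. (p + 1) * pageSize];
        bool seen = false;
        foreach (q; pool)
            if (q == page) { seen = true; break; }
        if (!seen)
            pool ~= page;
    }
    return flat.length / pageSize + pool.length * pageSize;
}

/// Try every power-of-two page size from 32 to 1024 and keep the cheapest.
uint bestPageBits(const(ubyte)[] flat)
{
    uint best = 5;
    foreach (uint bits; 6 .. 11)
        if (tableCost(flat, bits) < tableCost(flat, best))
            best = bits;
    return best;
}

/// One run of identical values (hypothetical layout).
struct Run { ubyte value; uint count; }

/// Simple run-length encoding: long gaps collapse into a single Run.
Run[] rleEncode(const(ubyte)[] data)
{
    Run[] runs;
    foreach (b; data)
    {
        if (runs.length && runs[$ - 1].value == b)
            ++runs[$ - 1].count;
        else
            runs ~= Run(b, 1);
    }
    return runs;
}

unittest
{
    // Toy property table: 1 for ASCII hex digits, 0 everywhere else.
    auto flat = new ubyte[0x110000];
    foreach (dchar c; "0123456789abcdefABCDEF")
        flat[c] = 1;
    assert(tableCost(flat, 9) == 2176 + 2 * 512);
    assert(bestPageBits(flat) == 10);

    ubyte[] data = [0, 0, 0, 0, 7, 7, 0, 0, 0];
    assert(rleEncode(data) == [Run(0, 4), Run(7, 2), Run(0, 3)]);
}
```

The best size depends entirely on the data: this nearly empty toy table favours the largest page tried, while denser properties would be expected to favour smaller pages, which is why searching per property is attractive.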


Hauke

P.S.: I haven't found any Unicode-related project on dsource.org. What 
were you referring to when you said I can contact you there?

Jun 13 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <caigii$15jj$1 digitaldaemon.com>, Hauke Duden says...

It would be optimal if the data creation tool would automatically figure 
out the best size and include the corresponding constants in the source 
file.

That's what I figured.

- I'm not convinced that the lookup tables and algorithms should be 
created in a completely automatic way. A certain level of automation is 
obviously necessary, but I think it would pay off if the data can be 
filtered before it is decomposed into the lookup tables.

I'd thought of that.

The algorithms 
for accessing those tables would have to be adaptable too. Another 
example from the unichar module: the case mapping data is now stored as 
offsets relative to the original character index.

Did that too. And I reduced the titlecase mapping down to almost nothing by
subtracting it from the uppercase mapping.
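
A sketch of the offset idea in present-day D, using std.uni.toUpper for the simple mapping. The helper names are hypothetical, and the 16-bit limit is the assumption stated earlier in this thread; it is checked with an assert here because it need not hold for every later Unicode version:

```d
import std.uni : toUpper;

/// Hypothetical encoder: the uppercase mapping of c stored as a signed
/// offset from c itself (0 when c has no mapping).
short upperDelta(dchar c)
{
    immutable d = cast(int) toUpper(c) - cast(int) c;
    // Assumption from the thread: every offset fits in 16 bits.
    assert(d >= short.min && d <= short.max, "offset does not fit in 16 bits");
    return cast(short) d;
}

/// Inverse of upperDelta.
dchar applyDelta(dchar c, short delta)
{
    return cast(dchar)(cast(int) c + delta);
}

unittest
{
    // Neighbouring letters share one offset, so pages of offsets repeat
    // far more often than pages of absolute code points would.
    foreach (char c; "abcxyz")
        assert(upperDelta(c) == -32);
    assert(upperDelta('A') == 0);
    assert(upperDelta('\u0101') == -1);
    assert(applyDelta('\u00FF', upperDelta('\u00FF')) == '\u0178');
}
```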


- and my last concern: it seems that you want to develop a very general 
tool to implement every aspect of the Unicode standard. That is very 
commendable and nothing is wrong with it in itself, but I would advise 
you to reflect on the amount of work that is necessary to implement all 
that stuff.

Panick ye not. I'm just thinking ahead. For the moment, it's really just
property access I'm doing, then come the normalization algorithms. And then I'll
stop, and go and redo Ints a bit better. I do have an idea of how much work is
involved, but I've done this before (in C++, and less well) so I know what's
involved.


There is not much use in a full Unicode library that ends up being 
vaporware.

Well it can't ever be that if it's open source. If you or I get bored with it
and drop out, someone else can carry on.


Then again if you DO have the time, please do not let my skepticism 
dampen your enthusiasm ;).

Cool.


P.S.: I haven't found any Unicode-related project on dsource.org. What 
were you referring to when you said I can contact you there?

Ah, no, there isn't any. But I have a user account there, and the Deimos
project. My username is "Arcane Jill". It seems to be possible to send private
messages to members. I mentioned that as a possibility because I am reluctant to
post my email address on a public forum.

Jill

Jun 13 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Arcane Jill wrote:
- and my last concern: it seems that you want to develop a very general 
tool to implement every aspect of the Unicode standard. That is very 
commendable and nothing is wrong with it in itself, but I would advise 
you to reflect on the amount of work that is necessary to implement all 
that stuff.

 
 
 Panick ye not. I'm just thinking ahead. For the moment, it's really just
 property access I'm doing, then come the normalization algorithms. And then I'll
 stop, and go and redo Ints a bit better. I do have an idea of how much work is
 involved, but I've done this before (in C++, and less well) so I know what's
 involved.

Me, panicking? No chance ;).
Just wanted to make sure you know about the scope of this.


There is not much use in a full Unicode library that ends up being 
vaporware.

 
 
 Well it can't ever be that if it's open source. If you or I get bored with it
 and drop out, someone else can carry on.

With luck someone might. But it is an old story in open source projects 
that there are lots of initial enthusiasts, but very few people who 
have enough endurance to stay active over a longer period of time. And 
there's also the question of experience and skill...

I think it is prudent to not count on external help to magically show 
up. With some projects it does, with many it doesn't. So we should make 
sure that we don't bite off more than we can chew.

Then again if you DO have the time, please do not let my skepticism 
dampen your enthusiasm ;).

 
 
 Cool.
 
 
 
P.S.: I haven't found any Unicode-related project on dsource.org. What 
were you referring to when you said I can contact you there?

 
 
 Ah, no, there isn't any. But I have a user account there, and the Deimos
 project. My username is "Arcane Jill". It seems to be possible to send private
 messages to members. I mentioned that as a possibility because I am reluctant to
 post my email address on a public forum.

Ah, ok. So the reply address you use with your NG posts 
(Arcane_member ...) is invalid? Anyway, mine is not, so if you need to 
contact me...

Hauke

Jun 13 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Stewart Gordon wrote:
 Hauke Duden wrote:
 
 Stewart Gordon wrote:

 
 <snip>
 
 Yes, you do have a point there.  What's more, there isn't a 1:1 
 mapping between uppercase and lowercase characters.


 You're wrong. The Unicode standard defines 1:1 case mappings (see 
 http://www.unicode.org/Public/UNIDATA/UCD.html).

 
 
 There seems to be a contradiction here.  That file indicates that 
 UnicodeData.txt only contains 1:1 mappings.  But just as I wondered, 
 there's a 2:1 mapping in 03C2 and 03C3.

Where did you get that information? From the data file
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt:

03C2;GREEK SMALL LETTER FINAL SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3
03C3;GREEK SMALL LETTER SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3

The interesting entries are the last three. Their format is 
UPPER;LOWER;TITLE. So both letters have an upper and title mapping to 
03A3 and no lower mapping.
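
Reading those three fields can be sketched in present-day D as below. CaseEntry and parseCaseFields are hypothetical names; the sketch assumes the 15-field, semicolon-separated record layout of the two lines quoted above, and treats an empty mapping field as "maps to itself":

```d
import std.array : split;
import std.conv : to;

/// Simple case mappings of one code point.
struct CaseEntry { dchar code, upper, lower, title; }

/// Hypothetical parser for one UnicodeData.txt record.
CaseEntry parseCaseFields(string line)
{
    auto f = line.split(";");
    assert(f.length == 15, "expected 15 semicolon-separated fields");

    // An empty field means "no mapping", i.e. the character maps to itself.
    dchar field(string s, dchar fallback)
    {
        return s.length ? cast(dchar) s.to!uint(16) : fallback;
    }

    immutable code = cast(dchar) f[0].to!uint(16);
    // The last three fields are UPPER;LOWER;TITLE.
    return CaseEntry(code, field(f[12], code), field(f[13], code), field(f[14], code));
}

unittest
{
    auto e = parseCaseFields(
        "03C2;GREEK SMALL LETTER FINAL SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3");
    assert(e.code == 0x03C2);
    assert(e.upper == 0x03A3);
    assert(e.lower == 0x03C2);   // empty field: maps to itself
    assert(e.title == 0x03A3);
}
```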

 There is also an additional "special casing" with one-to-many mappings
 but only a handful of characters are affected. It would be nice to 
 support that too, but for everyday work the 1:1 mappings are usually 
 sufficient.

 
 
 So, which characters do the one-to-many mappings bring about?

An example of a character with special casing is 1FB2 (GREEK SMALL 
LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI). Its upper case maps to 1FBA 
+ 0399 (GREEK CAPITAL LETTER ALPHA WITH VARIA + GREEK CAPITAL LETTER IOTA).

 And the mappings that there are aren't language independent.


 Huh? Casing is not affected by locale. Maybe you are thinking about 
 collation?

 
 
 What do you mean by that?

Collation is a locale dependent comparison of strings. I.e. it defines 
the "phone book" ordering of strings in a particular language.

Hauke

Jun 09 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Hauke Duden wrote:
<snip>
 The interesting entries are the last three. Their format is 
 UPPER;LOWER;TITLE. So both letters have an upper and title mapping to 
 03A3 and no lower mapping.

So that's why the uppercase form is given twice.  I couldn't find a key 
to the columns anywhere.

<snip>
 So, which characters do the one-to-many mappings bring about?

 
 An example of a character with special casing is 1FB2 (GREEK SMALL 
 LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI). Its upper case maps to 1FBA 
 + 0399 (GREEK CAPITAL LETTER ALPHA WITH VARIA + GREEK CAPITAL LETTER IOTA).

Yes, there are two possible meanings of "one-to-many mapping": a letter 
that splits into two letters when its case is changed, or a letter that 
case-converts to different letters depending on context, like Greek sigma.

How does it handle the title case of Welsh digraphs, for example?  Or is 
that another localised exception yet to be written into the standard?

 And the mappings that there are aren't language independent.

 Huh? Casing is not affected by locale. Maybe you are thinking about 
 collation?

 What do you mean by that?

 
 Collation is a locale dependent comparison of strings. I.e. it defines 
 the "phone book" ordering of strings in a particular language.

I didn't think that had anything to do with the fact that, e.g. in 
Turkish, the uppercase form of 0069 is 0130 instead of 0049.
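
A rough sketch of what that language dependence looks like in code, in Python for illustration only. The helper upper_for_language and its "tr"/"az" language tags are hypothetical names invented here, not part of any library; the default str.upper() is language-neutral and maps 0069 to 0049.

```python
# Turkic exceptions to the default mapping:
#   U+0069 (i)  uppercases to U+0130 (capital I with dot above)
#   U+0131 (dotless i) uppercases to U+0049 (I)
TURKIC_UPPER = {"i": "\u0130", "\u0131": "I"}


def upper_for_language(text, language=None):
    """Uppercase text, applying the Turkic exception when asked to.

    `language` is a hypothetical tag; only "tr" and "az" are special here.
    """
    if language in ("tr", "az"):
        text = "".join(TURKIC_UPPER.get(ch, ch) for ch in text)
    return text.upper()


default_result = upper_for_language("istanbul")
turkish_result = upper_for_language("istanbul", "tr")
```

The point is only that the caller has to supply the language; nothing in the string itself says which mapping is wanted.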

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 10 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Stewart Gordon wrote:
 Hauke Duden wrote:
 <snip>
 
 The interesting entries are the last three. Their format is 
 UPPER;LOWER;TITLE. So both letters have an upper and title mapping to 
 03A3 and no lower mapping.

 
 
 So that's why the uppercase form is given twice.  I couldn't find a key 
 to the columns anywhere.

The file format is described here:
http://www.unicode.org/Public/UNIDATA/UCD.html#UCD_Files
(see the section about UnicodeData.txt)

 How does it handle the title case of Welsh digraphs, for example?  Or is 
 that another localised exception yet to be written into the standard?

If you know the index of that character you can look it up in this file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt


Hauke

Jun 10 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Hauke Duden wrote:

<snip>
 The file format is described here:
 http://www.unicode.org/Public/UNIDATA/UCD.html#UCD_Files
 (see the section about UnicodeData.txt)

The first column has been omitted from that list.

 How does it handle the title case of Welsh digraphs, for example?  Or 
 is that another localised exception yet to be written into the standard?

 
 If you know the index of that character you can look it up in this file:
 http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

No, a Welsh digraph isn't a single character.  The digraphs are single 
letters, each composed of two characters.  CH, DD, FF, LL, NG, RH, TH (have I 
missed one?) are all single letters in Welsh, but AFAICF none of them has 
its own Unicode character, so they have to be built from the regular Latin 
letters in the ranges 0043..0054 and 0063..0074.

FWIS these digraphs are title-cased together, e.g. properly 
LLanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch not
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch

OST, looking at the Welsh Wikipedia, they generally seem to be written 
in mixed case.  Who is right?  Or is it a matter of preference?

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 10 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca9gkh$2f2$1 digitaldaemon.com>, Stewart Gordon says...
 How does it handle the title case of Welsh digraphs, for example?  Or 
 is that another localised exception yet to be written into the standard?



I know nothing about Welsh, but, I do know that Welsh is NOT an exception
according to the rules of Unicode. Therefore, according to the rules:

Lowercase:  llan...
Titlecase:  Llan...
Uppercase:  LLAN...
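
The three forms above can be checked with a short sketch (Python here, only because its string methods follow the Unicode tables). For contrast it also shows a digraph that does have its own code points, U+01C4..U+01C6, where the titlecase form is a distinct character; that contrast is my own addition for illustration.

```python
# Welsh "ll" is two ordinary Latin characters, so only the first is
# capitalised when title-casing.
word = "llan"
welsh_cases = (word.lower(), word.title(), word.upper())

# U+01C6 LATIN SMALL LETTER DZ WITH CARON is a single-character digraph
# with separate titlecase (U+01C5) and uppercase (U+01C4) forms.
dz = "\u01c6"
dz_title = dz.title()
dz_upper = dz.upper()
```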

Now, what I'm about to say may possibly make me a little unpopular. I /hope/
not, but I wish to be accurate, and, well, what can I say? Please don't shoot
the messenger! The fact is, /if/ Unicode has got it wrong, then the place to
complain about it is the Unicode Consortium public forum at
http://www.unicode.org/consortium/distlist.html - NOT the D forum. Our job is to
implement the Unicode standard as it exists today at revision 4.1 - even if we
think that standard is wrong. It would be inappropriate for us to start tweaking
it here and there just because we don't like bits of it. Errors and omissions in
the standard are certainly possible (and even likely), but a standard is a
standard, and such errors will inevitably be fixed in the course of time. If and
when the standard changes, that's when we should change with it.

I mean - it's not like C++ or Java can do any better!

My sincere apologies in advance if I've offended anyone.
Arcane Jill

Jun 10 2004

"Walter" <newshound digitalmars.com> writes:

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca9if2$5ag$1 digitaldaemon.com...
 Now, what I'm about to say may possibly make me a little unpopular. I /hope/
 not, but I wish to be accurate, and, well, what can I say? Please don't shoot
 the messenger! The fact is, /if/ Unicode has got it wrong, then the place to
 complain about it is the Unicode Consortium public forum at
 http://www.unicode.org/consortium/distlist.html - NOT the D forum. Our job is to
 implement the Unicode standard as it exists today at revision 4.1 - even if we
 think that standard is wrong. It would be inappropriate for us to start tweaking
 it here and there just because we don't like bits of it. Errors and omissions in
 the standard are certainly possible (and even likely), but a standard is a
 standard, and such errors will inevitably be fixed in the course of time. If and
 when the standard changes, that's when we should change with it.

My experience with implementing Standards is that the right way is to shut
off one's brain and pedantically, exactly, implement it, right or wrong. All
trying to fix bugs in the Standards does is cause "your implementation is
different from the Standard, therefore you are wrong" bug reports. And to be
frank, they're right.

I agree with you, Jill.

Jun 10 2004

David L. Davis <SpottedTiger yahoo.com> writes:

In article <ca71c4$2b8l$1 digitaldaemon.com>, Arcane Jill says...
In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...

 sStr[ iStrPos ] + 0x20

Ah! Now these old ASCII habits really should be dropped. Hauke has written this
magnificent charToUpper() routine. It should be used.

 I feel like a young Skywalker in training, learning how to best use "The
Force!"

Other than that: Impressive - Obi-Wan has taught you well. (Hope I'm not too
discouraging).  :)

Jill

Jill: Don't sweat it, all your advice has been encouraging! :) If I wasn't
getting any feedback at all from anyone, now that would be "discouraging" in my
mind...again thxs for your advice. 

After all, if these functions meet Walter and the "D" forum's approval, they just
might become a part of the std.string for everyone to use. After work, I'll
check out Hauke's charToLower() function, and see what kind of requirements it
has. And if it looks like a good fix, I'll ask Hauke if I may use it...giving
him full credit for his work of course. :)

David

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"

Jun 09 2004

Charlie <Charlie_member pathlink.com> writes:

In article <ca2u66$1lui$1 digitaldaemon.com>, David L. Davis says...
In article <ca2nau$1ath$1 digitaldaemon.com>, Walter says...
It's now possible to do asymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

http://www.digitalmars.com/d/changelog.html

Walter: Thxs! For the "Added default arguments to function parameters." :)) Now
I can pull out all my wrapper functions...this is some really Great News!!

<*Wonders*> Do you think Phobos.std.string could get a non-case sensitive
version of find (ifind) and rfind (irfind) added to it sometime in the near
future? It would be very useful (even if it just does ASCII). Thxs for your
reply in advance. :)

Why not tolower() both of the strings ?

Jun 07 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca3asj$2a3o$1 digitaldaemon.com>, Charlie says...

Why not tolower() both of the strings ?

That won't cover all cases in Unicode, but there is a similar function designed
for just that purpose. It's called casefold(). In general, two normalized
strings a and b are considered case-insensitively-equal iff casefold(a) ==
casefold(b).
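
To illustrate the difference from plain lowercasing, here is a minimal sketch in Python, whose str.casefold() implements Unicode case folding. The function name caseless_equal is made up for this example, and wrapping the fold in NFD normalization is my reading of "normalized strings" above, not something spelled out in this thread.

```python
import unicodedata


def caseless_equal(a, b):
    """Compare two strings case-insensitively via normalization + case folding."""

    def key(text):
        decomposed = unicodedata.normalize("NFD", text)
        # Folding can undo normalization, so normalize once more afterwards.
        return unicodedata.normalize("NFD", decomposed.casefold())

    return key(a) == key(b)


# Lowercasing both sides is not enough: sharp s stays sharp s.
lower_matches = "Stra\u00dfe".lower() == "STRASSE".lower()

# Case folding maps sharp s to "ss", and final sigma to ordinary sigma.
fold_matches = caseless_equal("Stra\u00dfe", "STRASSE")
sigma_matches = caseless_equal("\u03c2", "\u03a3")
```

Here lower_matches comes out false while the two folded comparisons come out true, which is the gap between tolower() and casefold() described above.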

I just had a quick look through Hauke's unichar code and noticed, however, that
charToCasefold() is not present? Hauke - any reason why you missed that one out?
Did you assume it to be the same thing as charToLower()? It isn't, of course.

Jill

Jun 07 2004

Arcane Jill <Arcane_member pathlink.com> writes:

A truly brilliant release. WOW! Thanks Walter.


I just need to reply to this:

<*Wonders*> Do you think Phobos.std.string could get a non-case sensitive
version of find (ifind) and rfind (irfind) added to it sometime in the near
future? It would be very useful (even if it just does ASCII). Thxs for your
reply in advance. :)

Now that we have Hauke's Unichar stuff, we can do better than ASCII, we can do
the whole of Unicode.

However - as a TEMPORARY MEASURE - we should do case-comparison on a
character-by-character basis, just like you do for ASCII. This will get it right
for something like 99% of all cases. (To catch the remaining cases we'd need the
Unicode normalization and case folding algorithms, which no-one's implemented
yet, but which we will have one day).

So long as we document that D's case-comparison rules CURRENTLY use simple
casing instead of special casing, and do not YET handle normalization issues,
no-one is going to complain, and - as you say - the ability to do
case-insensitive stuff is very useful.
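
As a rough sketch of that temporary measure (in Python for illustration; the real thing would of course be D code built on Hauke's routines): every character is compared through a one-to-one lowercase mapping. The names simple_lower, ifind and irfind are placeholders, and returning -1 for "not found" is an assumption modelled on find/rfind.

```python
def simple_lower(ch):
    """One-to-one lowercase mapping; leave ch alone if it would expand."""
    low = ch.lower()
    return low if len(low) == 1 else ch


def ifind(haystack, needle):
    """Index of the first case-insensitive match of needle, or -1."""
    h = [simple_lower(c) for c in haystack]
    n = [simple_lower(c) for c in needle]
    for start in range(len(h) - len(n) + 1):
        if h[start:start + len(n)] == n:
            return start
    return -1


def irfind(haystack, needle):
    """Index of the last case-insensitive match of needle, or -1."""
    h = [simple_lower(c) for c in haystack]
    n = [simple_lower(c) for c in needle]
    for start in range(len(h) - len(n), -1, -1):
        if h[start:start + len(n)] == n:
            return start
    return -1
```

As expected for simple casing, ifind("Straße", "STRASSE") finds nothing, because matching sharp s against "SS" needs the full folding rules; that is one of the remaining cases that would have to be documented.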

Jill

Jun 07 2004

"Jeroen van Bemmel" <someone somewhere.com> writes:

Binary operator overloading: for consistency, I would suggest replacing 
opCmp() with opLt(), opLe(), etc and to split opEquals() into opEquals() and 
opNotEquals() (or is the latter a typo in the documentation?)

Jun 07 2004

Ant <duitoolkit yahoo.ca> writes:

On Tue, 08 Jun 2004 08:01:35 +0200, Jeroen van Bemmel wrote:

 Binary operator overloading: for consistency, I would suggest replacing 
 opCmp() with opLt(), opLe(), etc and to split opEquals() into opEquals() and 
 opNotEquals() (or is the latter a typo in the documentation?)

I think this was discussed before.
I think the rationale is in the web documentation.

Ant

Jun 07 2004

"Ivan Senji" <ivan.senji public.srce.hr> writes:

"Walter" <newshound digitalmars.com> wrote in message
news:ca2nau$1ath$1 digitaldaemon.com...
 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
 
 http://www.digitalmars.com/d/changelog.html

WOW! Walter you are really making it harder and harder to complain about
the language!!! :)

Just a little question:
 The Expression within an array's brackets is now an AssignExpression
 (meaning that commas are no longer allowed).

Could this mean that rectangular arrays are coming in some future? :)

Jun 08 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Ivan Senji wrote:

<snip>
 Just a little question:
 The Expression within an array's brackets is now an AssignExpression
 (meaning that commas are no longer allowed).
 
 Could this mean that rectangular arrays are coming in some future? :)

Maybe.  It could also clear up what's FWIS a common coding error.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 08 2004

"Walter" <newshound digitalmars.com> writes:

I want to at least make rectangular arrays possible with opIndex().

"Ivan Senji" <ivan.senji public.srce.hr> wrote in message
news:ca3ps0$3am$3 digitaldaemon.com...
 WOW! Walter you are really making it harder and harder to complain about
 the language!!! :)

 Just a little question:
 The Expression within an array's brackets is now an AssignExpression
 (meaning that commas are no longer allowed).

 Could this mean that rectangular arrays are coming in some future? :)

Jun 08 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Walter wrote:
 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
 
 http://www.digitalmars.com/d/changelog.html

Default function arguments! Yay! Thanks Walter!

I can now delete about 1/3 of my functions :) :)

Hauke

Jun 08 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Walter wrote:

 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).

Kris's dsc.io project did that all along, so what's new?

And any particular reason for not inventing a whole new operator or two 
for stream I/O, as I briefly suggested?

http://www.digitalmars.com/drn-bin/wwwnews?D/25096


And have you actually seen my functions to fix bit array slicing?

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 08 2004

Ant <Ant_member pathlink.com> writes:

In article <ca459b$pbu$1 digitaldaemon.com>, Stewart Gordon says...
Walter wrote:

 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).

Kris's dsc.io project did that all along, so what's new?

And any particular reason for not inventing a whole new operator or two 
for stream I/O, as I briefly suggested?

http://www.digitalmars.com/drn-bin/wwwnews?D/25096


And have you actually seen my functions to fix bit array slicing?

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313

Give him a break, he said before he can't cope with every thing
that is offered here.

Ant

Jun 08 2004

"Walter" <newshound digitalmars.com> writes:

"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:ca459b$pbu$1 digitaldaemon.com...
 Walter wrote:

 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.

 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).

 Kris's dsc.io project did that all along, so what's new?

 And any particular reason for not inventing a whole new operator or two
 for stream I/O, as I briefly suggested?

 http://www.digitalmars.com/drn-bin/wwwnews?D/25096

I still think there's got to be a better way.

 And have you actually seen my functions to fix bit array slicing?

 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313

It fixes it by making a copy. I'm not sure this is the right approach, since
all the other array slicing points to the original.

Jun 08 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Walter wrote:

<snip>
 And have you actually seen my functions to fix bit array slicing?

 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313

 
 
 It fixes it by making a copy. I'm not sure this is the right approach, since
 all the other array slicing points to the original.

Well, just after I posted it, there was a bit of a debate over this and 
the alternative: modifying the representation of a bit[] to include a 
bit offset.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/2524

But you can't please all of the people (or programs) all of the time. 
Copying would break generic programming as far as slices would no longer 
be necessarily into the original array.  (Obviously, this discrepancy 
would need to be clearly documented.)  But it wouldn't break any 
existing, working programs, since bit slicing doesn't work at this time. 
  (Assuming that the GDC crowd haven't created their own fix....)

OTOH, code that casts bit arrays to pointers would fall apart if we 
introduced bit offsets.  Maybe we'd need to either disallow such casts 
or allow them only along with some 'byteAlign' property that returns 
either the original array (if already byte-aligned) or a copy.

Maybe we could start a vote.  Put me down as a 'don't know'....

Even if we do take the bit offset path, it wouldn't be tricky to adapt 
my functions to this representation.  Of course, we'd need access to the 
internals.  Maybe (as I think I've seen in some of the internal modules 
dealing with general arrays) they'd be modified to take a struct 
representing the internal representation of a bit[], rather than the 
bit[] itself.
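
To show what the bit-offset representation amounts to, here is a minimal sketch of a slice that refers to the original storage instead of copying it. It is in Python for illustration, not D; the class name BitView and its slice method are invented for this example, and numbering bits from the least significant end of each byte is an assumption.

```python
class BitView:
    """A window onto shared storage: (data, first bit, number of bits)."""

    def __init__(self, data, offset=0, length=None):
        self.data = data          # shared bytearray, never copied
        self.offset = offset      # index of the first bit within data
        self.length = len(data) * 8 - offset if length is None else length

    def _locate(self, index):
        if not 0 <= index < self.length:
            raise IndexError("bit index out of range")
        position = self.offset + index
        return position >> 3, 1 << (position & 7)

    def __getitem__(self, index):
        byte, mask = self._locate(index)
        return 1 if self.data[byte] & mask else 0

    def __setitem__(self, index, value):
        byte, mask = self._locate(index)
        if value:
            self.data[byte] |= mask
        else:
            self.data[byte] &= ~mask & 0xFF

    def slice(self, i, j):
        """Return a view of bits i..j that shares the same storage."""
        if not 0 <= i <= j <= self.length:
            raise IndexError("slice bounds out of range")
        return BitView(self.data, self.offset + i, j - i)


storage = bytearray(8)
whole = BitView(storage)
part = whole.slice(3, 10)   # starts in the middle of a byte
part[0] = 1                 # writes through to bit 3 of the original
```

The extra shift and mask on every access is the performance cost of this approach; in exchange, a slice behaves like every other array slice and points into the original.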

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 08 2004

"Walter" <newshound digitalmars.com> writes:

My thought as well was to include a starting bit offset. The downside to
this is the performance loss.

"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:ca4sd9$22j1$1 digitaldaemon.com...
 Well, just after I posted it, there was a bit of a debate over this and
 the alternative: modifying the representation of a bit[] to include a
 bit offset.

 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/2524

 But you can't please all of the people (or programs) all of the time.
 Copying would break generic programming as far as slices would no longer
 be necessarily into the original array.  (Obviously, this discrepancy
 would need to be clearly documented.)  But it wouldn't break any
 existing, working programs, since bit slicing doesn't work at this time.
   (Assuming that the GDC crowd haven't created their own fix....)

 OTOH, code that casts bit arrays to pointers would fall apart if we
 introduced bit offsets.  Maybe we'd need to either disallow such casts
 or allow them only along with some 'byteAlign' property that returns
 either the original array (if already byte-aligned) or a copy.

 Maybe we could start a vote.  Put me down as a 'don't know'....

 Even if we do take the bit offset path, it wouldn't be tricky to adapt
 my functions to this representation.  Of course, we'd need access to the
 internals.  Maybe (as I think I've seen in some of the internal modules
 dealing with general arrays) they'd be modified to take a struct
 representing the internal representation of a bit[], rather than the
 bit[] itself.

 Stewart.

 -- 
 My e-mail is valid but not my primary mailbox, aside from its being the
 unfortunate victim of intensive mail-bombing at the moment.  Please keep
 replies on the 'group where everyone may benefit.

Jun 08 2004

"Ivan Senji" <ivan.senji public.srce.hr> writes:

"Walter" <newshound digitalmars.com> wrote in message
news:ca54d7$2gmg$1 digitaldaemon.com...
 My thought as well was to include a starting bit offset. The downside to
 this is the performance loss.

But if bit slicing worked as a result, that wouldn't be a loss but a gain!

 "Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
 news:ca4sd9$22j1$1 digitaldaemon.com...
 Well, just after I posted it, there was a bit of a debate over this and
 the alternative: modifying the representation of a bit[] to include a
 bit offset.

 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/2524

 But you can't please all of the people (or programs) all of the time.
 Copying would break generic programming as far as slices would no longer
 be necessarily into the original array.  (Obviously, this discrepancy
 would need to be clearly documented.)  But it wouldn't break any
 existing, working programs, since bit slicing doesn't work at this time.
   (Assuming that the GDC crowd haven't created their own fix....)

 OTOH, code that casts bit arrays to pointers would fall apart if we
 introduced bit offsets.  Maybe we'd need to either disallow such casts
 or allow them only along with some 'byteAlign' property that returns
 either the original array (if already byte-aligned) or a copy.

 Maybe we could start a vote.  Put me down as a 'don't know'....

 Even if we do take the bit offset path, it wouldn't be tricky to adapt
 my functions to this representation.  Of course, we'd need access to the
 internals.  Maybe (as I think I've seen in some of the internal modules
 dealing with general arrays) they'd be modified to take a struct
 representing the internal representation of a bit[], rather than the
 bit[] itself.

 Stewart.

 --
 My e-mail is valid but not my primary mailbox, aside from its being the
 unfortunate victim of intensive mail-bombing at the moment.  Please keep
 replies on the 'group where everyone may benefit.

Jun 08 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca55jc$2ika$1 digitaldaemon.com>, Ivan Senji says...
"Walter" <newshound digitalmars.com> wrote in message
news:ca54d7$2gmg$1 digitaldaemon.com...
 My thought as well was to include a starting bit offset. The downside to
 this is the performance loss.

But if bit slicing worked as a result, that wouldn't be a loss but a gain!

I don't think Walter will have any problem getting bit slicing to work. Both
Stewart and myself have our own workarounds (his by copy, mine by reference), so
it's obviously easily doable.

The bit SLICE (or bit array, depending on your point of view) was never a
problem (apart from the bugs). The bit /itself/ is the problem. Walter's
suggestion will make bit slicing work, but the code below will still fall over:

       bit[] b;
       b.length = 64;
       bit* p = &b[3];
       *p = 1;

Anywhere where you get a pointer to a bit, or a reference to a bit (and this
includes passing a bit as an out or inout function parameter) you get a problem.
However, if Walter could make all of these situations compile-errors, he may
have got it sussed!

Jill

Jun 08 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Arcane Jill wrote:

<snip>
 The bit SLICE (or bit array, depending on your point of view) was never a
 problem (apart from the bugs). The bit /itself/ is the problem. Walter's
 suggestion will make bit slicing work, but the code below will still fall over:
 
 
      bit[] b;
      b.length = 64;
      bit* p = &b[3];
      *p = 1;

 
 
 Anywhere where you get a pointer to a bit, or a reference to a bit (and this
 includes passing a bit as an out or inout function parameter) you get a problem.

I thought that inout bits were already not supported.  Unless that's 
only in foreach....

 However, if Walter could make all of these situations compile-errors, he may
 have got it sussed!

Or have a bit offset in the bit pointer itself.  Which would turn it 
into a 35-bit object....

Of course, it could be 64-bit, in the form (byteAddress << 32) | 
(bitOffset << 29), which would make incrementing it a doddle....
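
A quick sketch of why that layout makes incrementing easy, in Python for illustration (the function names are made up, and a 32-bit byte address is assumed): adding 1 << 29 steps the 3-bit offset, and the carry out of its top bit lands exactly on the lowest bit of the byte address.

```python
STEP = 1 << 29                 # one bit, in the packed representation
MASK64 = 0xFFFFFFFFFFFFFFFF    # keep results within 64 bits


def make_bit_pointer(byte_address, bit_offset):
    """Pack a byte address and a bit offset (0..7) into one 64-bit value."""
    assert 0 <= bit_offset < 8
    return (byte_address << 32) | (bit_offset << 29)


def next_bit(pointer):
    """Advance to the next bit; overflow of the offset carries into the address."""
    return (pointer + STEP) & MASK64


def unpack(pointer):
    """Return (byte address, bit offset)."""
    return pointer >> 32, (pointer >> 29) & 7


p = make_bit_pointer(0x1000, 6)
p = next_bit(p)      # now bit 7 of byte 0x1000
p = next_bit(p)      # wraps to bit 0 of byte 0x1001
```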

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 09 2004

"Walter" <newshound digitalmars.com> writes:

Another option is to only allow bit slicing on byte boundaries, and only
allow pointers to bits if they are in bit 0 of a byte.

Jun 09 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca7qp5$hrl$2 digitaldaemon.com>, Walter says...
Another option is to only allow bit slicing on byte boundaries, and only
allow pointers to bits if they are in bit 0 of a byte.

That's EXACTLY what my workaround does. You can have the code for free if you
want.

Jill

Jun 09 2004

"Walter" <newshound digitalmars.com> writes:

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca7ri6$iuo$1 digitaldaemon.com...
 In article <ca7qp5$hrl$2 digitaldaemon.com>, Walter says...
Another option is to only allow bit slicing on byte boundaries, and only
allow pointers to bits if they are in bit 0 of a byte.

 That's EXACTLY what my workaround does. You can have the code for free if you
 want.

Thanks!

Jun 10 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cabhl0$7f2$2 digitaldaemon.com>, Walter says...
 That's EXACTLY what my workaround does. You can have the code for free if you
 want.

Thanks!

Oky doke - here goes. One thing though - this is merely a workaround for
existing bugs, it does not really add any new functionality beyond what such
arrays are supposed to do already. So I don't imagine you will use this code.
You'd probably prefer to just fix the bugs, then a workaround won't be needed at
all.

module etc.workaround.bitslice;

/*    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    This is a workaround for the bug whereby:

        // given a non-null bit[] b;
        b[i..j]

    references the wrong data, often causing an access violation.

    Usage:  Replace             With
            --------------------------------------------------
            b[i..j]             bitSlice(b, i, j)
            b[] = expr          bitSliceAssign(b, expr)
            b[i..j] = expr      bitSliceAssign(b, i, j, expr)
            b ~ c               bitSliceCat(b, c)
            b ~= c              bitSliceCatAssign(b, c)
*/

//===============================================
// This version is for reading from a bit slice
//
unittest
{
    bit b[256];
    b[0] = b[16] = 1;
    confirm(bitSlice(b,0,16) == bitSlice(b,16,32));
}

bit[] bitSlice(bit[] b, uint i, uint j)
{
    version(BitSliceWorkaround)
    {
        if (((i | j) & 7) != 0) throw new BitSliceException("Can only slice by whole bytes");
        BitSliceUnion u;

        // Convert from a bit slice to a ubyte slice
        u.bitRef = b;
        assert((u.length & 7) == 0);
        u.length >>= 3;

        // Take the desired slice
        u.ubyteRef = u.ubyteRef[i>>3..j>>3];

        // Convert it back to a bit slice
        u.length <<= 3;
        return u.bitRef;
    }
    else
    {
        return b[i..j];    // Assumes bit slicing works. This will be unit tested.
    }
}

//===================================================================
// This version is for writing to a bit slice with a constant value
//

unittest
{
    bit b[256];
    bitSliceAssign(b,16,32,1);
    confirm(b[16] == 1);
}

bit[] bitSliceAssign(bit[] b, bit e)
{
    return bitSliceAssign(b, 0, b.length, e);
}

bit[] bitSliceAssign(bit[] b, uint i, uint j, bit e)
{
    version(BitSliceWorkaround)
    {
        if (((i | j) & 7) != 0) throw new BitSliceException("Can only slice by whole bytes");
        BitSliceUnion u;

        // Convert from a bit slice to a ubyte slice
        u.bitRef = b;
        assert((u.length & 7) == 0);
        u.length >>= 3;

        // Write into the desired slice
        u.ubyteRef[i>>3..j>>3] = (e ? 0xFF : 0);

        // Convert back to a bit slice
        u.length <<= 3;
        return u.bitRef;
    }
    else
    {
        return b[i..j] = e;    // Assumes bit slicing works. This will be unit tested.
    }
}

//=========================================================
// This version is for pasting one bit slice into another
//

unittest
{
    bit b[256];
    bit e[16];
    e[0] = 1;
    bitSliceAssign(b, 16, 32, bitSlice(e, 0, 16));
    confirm(b[16] == 1);
}

bit[] bitSliceAssign(bit[] b, bit[] e)
{
    return bitSliceAssign(b, 0, b.length, e);
}

bit[] bitSliceAssign(bit[] b, uint i, uint j, bit[] e)
in
{
    assert(j - i == e.length);
}
body
{
    version(BitSliceWorkaround)
    {
        if (((i | j) & 7) != 0) throw new BitSliceException("Can only slice by whole bytes");
        BitSliceUnion ub, ue;

        // Convert from bit slices to ubyte slices
        ub.bitRef = b;
        assert((ub.length & 7) == 0);
        ub.length >>= 3;

        ue.bitRef = e;
        assert((ue.length & 7) == 0);
        ue.length >>= 3;

        // Write the desired slice
        ub.ubyteRef[i>>3..j>>3] = ue.ubyteRef[0..(j-i)>>3];

        // Convert everything back to bit slices
        ub.length <<= 3;
        ue.length <<= 3;
        return ub.bitRef;
    }
    else
    {
        return b[i..j] = e[0..j-i];    // Assumes bit slicing works. This will be unit tested.
    }
}

//=============================================================
// This version is for concatenating one bit slice onto another
//

bit[] bitSliceCat(bit[] b, bit[] c)
{
    bit[] r;
    r.length = b.length + c.length;
    bitSliceAssign(r, 0, b.length, b);
    bitSliceAssign(r, b.length, r.length, c);
    return r;
}

//=============================================================
// This version is for concatenating one bit slice onto another and assigning
// the result back onto the original
//

bit[] bitSliceCatAssign(inout bit[] b, bit[] c)
{
    uint bLen = b.length;
    b.length = bLen + c.length;
    bitSliceAssign(b, bLen, b.length, c);
    return b;
}

// Supporting stuff

private union BitSliceUnion
{
    bit[]    bitRef;
    ubyte[] ubyteRef;
    uint    length;
}

class BitSliceException : Exception
{
    this(char[] s)
    {
        super(s);
    }
}

void confirm(int assertion)
{
    debug
    {
        if (!assertion)
        {
            printf("This version of the D compiler contains a bug which prevents\n");
            printf("slicing of bit arrays from working properly\n\n");
            printf("You need to define the symbol BitSliceWorkaround, and replace\n");
            printf("all bit array slicing operations with the appropriate function\n");
            printf("from etc.workaround.bitslice\n");
        }
    }
    assert(assertion);
}

Jun 11 2004

Sean Kelly <sean f4.ca> writes:

In article <cad0gf$29ad$1 digitaldaemon.com>, Arcane Jill says...
In article <cabhl0$7f2$2 digitaldaemon.com>, Walter says...
 That's EXACTLY what my workaround does. You can have the code for free if you
 want.

Thanks!

Oky doke - here goes. One thing though - this is merely a workaround for
existing bugs, it does not really add any new functionality beyond what such
arrays are supposed to do already. So I don't imagine you will use this code.
You'd probably prefer to just fix the bugs, then a workaround won't be needed at
all.

Very nice.  This brings up a question though... assuming I want to know how many
bits are in a byte, should I use the standard C defines or will D provide its
own method?  I know it will probably be quite a long time before D is ported to
a system that doesn't use 8-bit bytes, but I'm the careful type :)

Sean

Jun 11 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Arcane Jill wrote:

<snip>
 Oky doke - here goes. One thing though - this is merely a workaround for
 existing bugs, it does not really add any new functionality beyond what such
 arrays are supposed to do already. So I don't imagine you will use this code.
 You'd probably prefer to just fix the bugs, then a workaround won't be needed
 at all.

<snip>

Just as I've been writing the bit offset implementation that we've been 
talking about....

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/495

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 14 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cak50j$n70$1 digitaldaemon.com>, Stewart Gordon says...
Just as I've been writing the bit offset implementation that we've been 
talking about....

Excellent!

Jun 14 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Walter wrote:

Another option is to only allow bit slicing on byte boundaries, and only
allow pointers to bits if they are in bit 0 of a byte.
  

Yeah, I've been thinking this is probably the best option.  Users could 
make their own bit-pointers for the extra 3 bits.  Perhaps you could 
enable something like (for the boundary approach):

bit [] array;
...
bit * bp = &array[0];

bp[1] = 1; //Access bit 1 in bp pointer 1

bp[1000] //Try to access bit 1000

A third option I was thinking about:

Use an extra 32 bits to always store the bit offset from the start of the 
bit array (thus requiring 64 bits).  I'm sure that would enable some 
parts of the algorithm to be optimised, such as iterating through the 
loop and slicing, where only one value out of the two would need to be 
incremented.

When converting to another pointer type (such as void), compute the byte 
boundary and lose the bit location information <- that could also be 
done with Stewart's suggestion.  Now if the user wrote something like:

byte * bp = cast(byte*) &array[0];

Then the compiler could optimise out the extra 32-bits.

Rationale: We don't have any bit pointer at the moment so we have no 
performance to lose.  This way a bit pointer is more functional and you 
can still slice along the boundary.
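
For illustration only, here is a rough sketch of what that third option might look like as a library type. It is written for a current D compiler, so it uses ubyte storage rather than the bit type. The name BitPtr, its members, and the choice of numbering bits from the least significant end of each byte are all assumptions made for this sketch, not anything the compiler does.

```d
// Hypothetical sketch: a "fat" bit pointer made of a byte pointer plus a
// bit offset. Bit 0 is assumed to be the least significant bit of a byte.
struct BitPtr
{
    ubyte* base;    // start of the underlying byte storage
    size_t offset;  // bit offset measured from base

    // Read bit i relative to this pointer.
    bool opIndex(size_t i) const
    {
        const size_t n = offset + i;
        return ((base[n >> 3] >> (n & 7)) & 1) != 0;
    }

    // Write bit i relative to this pointer.
    void opIndexAssign(bool value, size_t i)
    {
        const size_t n = offset + i;
        const ubyte mask = cast(ubyte)(1 << (n & 7));
        if (value)
            base[n >> 3] |= mask;
        else
            base[n >> 3] &= ~cast(int) mask;
    }

    // Converting to a byte pointer keeps the byte boundary and drops the
    // bit position, as described above.
    ubyte* toBytePtr()
    {
        return base + (offset >> 3);
    }
}
```

Only the offset needs to change when stepping through bits, and no bounds checking is done here, so indexing past the end of the storage is the caller's problem.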

-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 09 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Walter wrote:

 Another option is to only allow bit slicing on byte boundaries, and only 
 allow pointers to bits if they are in bit 0 of a byte.

Yes, that was another suggestion in the debate.  But I'm inclined to 
believe some of my experiments could be put to practical use.

I'll probably do some more experimenting over the weekend....

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 10 2004

Sean Kelly <sean f4.ca> writes:

In article <ca4pg0$1tfs$1 digitaldaemon.com>, Walter says...
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:ca459b$pbu$1 digitaldaemon.com...

 And any particular reason for not inventing a whole new operator or two
 for stream I/O, as I briefly suggested?

 http://www.digitalmars.com/drn-bin/wwwnews?D/25096

I still think there's got to be a better way.

Me too, but darned if I've come up with the answer so far.  But if push comes to
shove I do prefer:

ostream << a << b << c;

to:

print( ostream, a );
print( ostream, b );
print( ostream, c );

I'm going to play around with the possibilities in the next few days and see if
I can't come up with an alternative :p
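
For reference, the chained form only works because each << hands the stream back for the next one. A minimal sketch of that, using current D's opBinary overloading and a made-up Sink class that just collects text in memory (both the class and its text member are hypothetical, chosen only to keep the example self-contained):

```d
import std.conv : to;

// Hypothetical output sink that accumulates text so the result can be checked.
class Sink
{
    string text;

    // Each << appends the value as text and returns the sink itself,
    // which is what lets the calls be chained left to right.
    Sink opBinary(string op : "<<", T)(T value)
    {
        text ~= to!string(value);
        return this;
    }
}
```

With this, `sink << a << b << c;` is evaluated as `((sink << a) << b) << c`, so it does the same work as the three separate print calls.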


Sean

Jun 08 2004

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

I'll be watching for your alternative, Sean ...

"Sean Kelly" <sean f4.ca> wrote in message
news:ca4sn3$231k$1 digitaldaemon.com...
 In article <ca4pg0$1tfs$1 digitaldaemon.com>, Walter says...
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:ca459b$pbu$1 digitaldaemon.com...

 And any particular reason for not inventing a whole new operator or two
 for stream I/O, as I briefly suggested?

 http://www.digitalmars.com/drn-bin/wwwnews?D/25096

I still think there's got to be a better way.

 Me too, but darned if I've come up with the answer so far.  But if push comes to
 shove I do prefer:

 ostream << a << b << c;

 to:

 print( ostream, a );
 print( ostream, b );
 print( ostream, c );

 I'm going to play around with the possibilities in the next few days and see if
 I can't come up with an alternative :p


 Sean

Jun 08 2004
