digitalmars.D - DMD 0.92 release

reply "Walter" <newshound digitalmars.com> writes:
It's now possible to do asymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

http://www.digitalmars.com/d/changelog.html
Jun 07 2004
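For readers who want to see what a member `<<` overload can look like, here is a minimal sketch. It assumes a current D2 compiler, where operators are overloaded through the templated `opBinary` member rather than the named functions DMD 0.92 used, so it is not the code the release notes describe. The `TextSink` class and its `buffer` field are hypothetical names made up for the illustration.

```d
import std.conv : to;

// Hypothetical text sink; the class and field names are illustrative only.
class TextSink
{
    string buffer;

    // Member overload of `<<` (D2 syntax, an assumption about the compiler):
    // appends the text form of any value and returns the sink so that
    // calls can be chained from left to right.
    TextSink opBinary(string op : "<<", T)(T value)
    {
        buffer ~= to!string(value);
        return this;
    }
}

unittest
{
    auto sink = new TextSink;
    sink << "answer = " << 42 << '\n';
    assert(sink.buffer == "answer = 42\n");
}
```

Because the overload is a member of the left-hand operand, no free operator function or argument-dependent lookup is involved, which is the point Walter makes above.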
next sibling parent DemmeGod <me demmegod.com> writes:
Package attribute!  Yeee-haw!

On Mon, 07 Jun 2004 14:33:37 -0700, Walter wrote:

 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
 
 http://www.digitalmars.com/d/changelog.html
Jun 07 2004
prev sibling next sibling parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Walter wrote:

It's now possible to do asymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

http://www.digitalmars.com/d/changelog.html
  
Wow, Walter you must have been visited by an angle or something last night. You've done a complete backflip on so many issues. Not that I mind, it's a good thing you keep an open mind. Default arguments, yay.

-- 
-Anderson: http://badmama.com.au/~anderson/
Jun 07 2004
next sibling parent reply J C Calvarese <jcc7 cox.net> writes:
J Anderson wrote:
...
 default arguments yay.
Hip! Hip! Hooray! -- Justin (a/k/a jcc7) http://jcc_7.tripod.com/d/
Jun 07 2004
parent Charlie <Charlie_member pathlink.com> writes:
<sings>For he's a jolly good fellow</sings>

In article <ca2v2q$1n14$1 digitaldaemon.com>, J C Calvarese says...
J Anderson wrote:
...
 default arguments yay.
Hip! Hip! Hooray! -- Justin (a/k/a jcc7) http://jcc_7.tripod.com/d/
Jun 07 2004
prev sibling parent James Widman <james jwidman.com> writes:
In article <ca2oi1$1d41$1 digitaldaemon.com>,
 J Anderson <REMOVEanderson badmama.com.au> wrote:
 Wow, Walter you must have been visited by an angle or something last 
 night. 
Angles? Dude, that's sick. I can't help but picture some poor Flatlander's 2-dimensional body parts strewn about... :-) All of us healthy-minded people know Walter was really visited by Mr. Hyper-sphere from 4-space.
Jun 07 2004
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <ca2nau$1ath$1 digitaldaemon.com>, Walter says...
It's now possible to do asymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).
Great timing :) I had just started looking at streams today. Just to clarify, it looks like the old rules would not evaluate b.opfunc_r(a) if a.opfunc is defined, whether or not there was an overload for a.opfunc(b). Do I have this right? Sean
Jun 07 2004
parent "Walter" <newshound digitalmars.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message
news:ca2pn3$1et4$1 digitaldaemon.com...
 In article <ca2nau$1ath$1 digitaldaemon.com>, Walter says...
It's now possible to do asymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).
 Great timing :) I had just started looking at streams today. Just to clarify,
 it looks like the old rules would not evaluate b.opfunc_r(a) if a.opfunc is
 defined, whether or not there was an overload for a.opfunc(b). Do I have this
 right?
Right.
Jun 07 2004
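The forward and reversed overloads Sean asks about can be illustrated with a small sketch. It assumes a current D2 compiler, where the `opfunc` / `opfunc_r` pair from this thread is spelled `opBinary` / `opBinaryRight`; the lookup rules of DMD 0.92 itself are not reproduced here. `Label` is a hypothetical type invented for the example.

```d
// Hypothetical wrapper type, used only to show forward vs. reversed overloads.
struct Label
{
    string text;

    // Handles `label + "suffix"`.
    Label opBinary(string op : "+")(string rhs) const
    {
        return Label(text ~ rhs);
    }

    // Handles `"prefix" + label`. The left operand is a plain string with
    // no overload of its own, so the reversed form on the right operand is used.
    Label opBinaryRight(string op : "+")(string lhs) const
    {
        return Label(lhs ~ text);
    }
}

unittest
{
    auto l = Label("b");
    assert((l + "c").text == "bc");
    assert(("a" + l).text == "ab");
}
```

The two results differ even though `+` is normally commutative, which is the kind of asymmetrical overload the announcement refers to.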
prev sibling next sibling parent reply David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca2nau$1ath$1 digitaldaemon.com>, Walter says...
It's now possible to do asymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

http://www.digitalmars.com/d/changelog.html
Walter: Thxs! For the "Added default arguments to function parameters." :)) Now I can pull out all my wrapper functions...this is some really Great News!!

<*Wonders*> Do you think Phobos.std.string could get a non-case sensitive version of find (ifind) and rfind (irfind) added to it sometime in the near future? It would be very useful (even if it just does ASCII). Thxs for your reply in advance. :)
Jun 07 2004
next sibling parent reply "Walter" <newshound digitalmars.com> writes:
"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:ca2u66$1lui$1 digitaldaemon.com...
 <*Wonders*> Do you think Phobos.std.string could get a non-case sensitive
 version of find (ifind) and rfind (irfind) added to it sometime in the near
 future? It would be very useful (even if it just does ASCII). Thxs for your
 reply in advance. :)
Do you want to write one and donate it?
Jun 07 2004
parent reply David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca37ns$24qn$1 digitaldaemon.com>, Walter says...
"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:ca2u66$1lui$1 digitaldaemon.com...
 <*Wonders*> Do you think Phobos.std.string could get a non-case sensitive
 version of find (ifind) and rfind (irfind) added to it sometime in the near
 future? It would be very useful (even if it just does ASCII). Thxs for your
 reply in advance. :)
Do you want to write one and donate it?
Walter: Sure, I'd be happy to donate the ifind() and irfind() functions I've already written to solve my problem. Both have the normal (char[], char[]) parameters of find() and rfind(), but I've added a third int parameter that allows setting the starting position for the search within the string, which normally defaults to "0" when only the first two are passed in. But it's nothing fancy compared to the code I've seen from others here in the forum. So how would I go about donating code toward making "D" a better developer tool? :))

"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 07 2004
parent reply "Walter" <newshound digitalmars.com> writes:
Post it and let's have a look!
Jun 07 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
This is the best release of D I've seen so far. It's brilliant. Well done.
Jill
Jun 08 2004
parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca3pa5$1r7$1 digitaldaemon.com...
 This is the best release of D I've seen so far. It's brilliant. Well done.
Thanks! I hope each one is better than the last <g>.
Jun 08 2004
prev sibling parent reply David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca3jmn$2pjm$1 digitaldaemon.com>, Walter says...
Post it and let's have a look!
Walter: Here they be. :) I kept to a very simple "KISS" approach, and built these functions upon the existing std.string functions. But if you'd like me to, I could make these functions independent of the other std.string functions so that these are in a stand alone raw "D" code format. I just didn't feel at the time that I should follow a "Recreate the Wheel" approach, when the existing functions worked fine for what I needed.

Note - My indenting will disappear when I post this thru the Web...sorry about that! :(

/*******************************************************************
 * Function      : int ifind( in char[], in char[], in int = 0 )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and set
 *                           the third parameter as a default of 0
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.find
 *       with an optional start looking from this "String Position" parameter.
 */
int ifind
(
    in char[] sStr,
    in char[] sSubStr,
    in int    iStartPos = 0
)
{
    char[] sTmpStr;
    int    iRtnVal;

    // If either of the string parameters are empty, return not found
    if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

    // If greater than the upper boundary return not found
    if ( iStartPos > sStr.length - 1 )
        return -1;

    // If less than the lower boundary return not found
    else if ( iStartPos < 0 )
        return -1;

    sTmpStr = tolower( sStr[ iStartPos .. sStr.length ] );

    if ( iStartPos == 0 )
        return find( sTmpStr, tolower( sSubStr ) );
    else
    {
        iRtnVal = find( sTmpStr, tolower( sSubStr ) );

        return ( iRtnVal != -1 ) ? iStartPos + iRtnVal : iRtnVal;
    }
} // end int ifind( char[],char[], int = 0 )


/*******************************************************************
 * Function      : int irfind( in char[], in char[], in int = -1 )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and set
 *                           the third parameter as a default of -1
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.rfind
 *       with an optional start looking from this "String Position" parameter.
 */
int irfind
(
    in char[] sStr,
    in char[] sSubStr,
    in int    iEndPos = -1
)
{
    char[] sTmpStr;

    // If either of the string parameters are empty, return not found
    if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

    // If iEndPos == -1 get the full length of the string
    if ( iEndPos == -1 )
        iEndPos = sStr.length - 1;

    // If greater than the upper boundary return not found
    else if ( iEndPos > sStr.length - 1 )
        return -1;

    // If less than the lower boundary return not found
    else if ( iEndPos < 0 )
        return -1;

    sTmpStr = tolower( sStr[ 0 .. iEndPos + 1 ] );

    return rfind( sTmpStr, tolower( sSubStr ) );

} // end int irfind( char[],char[], int = -1 )

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 08 2004
parent reply "Walter" <newshound digitalmars.com> writes:
Thanks! I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string. While this works, it's better to avoid
it if you can, i.e. just go through character by character and comparing
that way. Second, I prefer to avoid a starting index; instead just slice the
string to be searched. -Walter
Jun 08 2004
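Walter's two suggestions (compare character by character instead of building lower-cased copies, and let the caller slice instead of passing a start index) might be sketched as follows. This is one possible reading, not the code that eventually went into Phobos. It assumes a current D2 compiler and ASCII-only folding, and the names `asciiLower` and `ifindAscii` are hypothetical. The -1 "not found" result and the "empty input is not found" rule follow David's posted functions.

```d
// Lower-case a single ASCII letter; every other byte is returned unchanged.
char asciiLower(char c)
{
    return (c >= 'A' && c <= 'Z') ? cast(char)(c + ('a' - 'A')) : c;
}

// Case-insensitive (ASCII only) search. Returns the index of the first
// match of `needle` in `haystack`, or -1 if there is none.
// Neither input is modified and no temporary string is allocated.
ptrdiff_t ifindAscii(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length == 0 || needle.length > haystack.length)
        return -1;

    foreach (start; 0 .. haystack.length - needle.length + 1)
    {
        size_t i = 0;
        while (i < needle.length
               && asciiLower(haystack[start + i]) == asciiLower(needle[i]))
            ++i;
        if (i == needle.length)
            return cast(ptrdiff_t) start;   // stop at the first match
    }
    return -1;
}

unittest
{
    const(char)[] text = "ApO 123355 PO Box 23";
    assert(ifindAscii(text, "po") == 1);

    // Resume after the first hit by slicing instead of passing a start index.
    auto rest = text[3 .. $];
    assert(ifindAscii(rest, "po") == 8);    // index is relative to the slice
    assert(ifindAscii(text, "xyz") == -1);
}
```

Because the result is relative to the slice, the caller adds the slice offset (3 + 8 = 11 here) to get a position in the original string.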
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca4ot4$1sek$2 digitaldaemon.com>, Walter says...
(in response to David L. Davis)
I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string.
(I'm actually replying to David here, not Walter)

Another big problem with uppercasing the whole string is that it could be very slow. Imagine if the strings you were comparing were gigabytes long. Now imagine that the substring you were looking for could have been found right near the start of the string. Converting the case of all those gigs would have been unnecessary.

Sorry to be a downer, but I learned this the hard way a couple of years ago. I actually managed to implement the whole Unicode case comparison algorithm for real, including special casing and everything. Man, was it S-L-O-W. (It casefolded everything before the compare even started). Then I optimized it by making it look at only as much of the strings as it needed to, and after that it whizzed by.

Jill
Jun 08 2004
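The optimization Jill describes, folding case only as far as the comparison actually needs to look, can be sketched for the simple ASCII case. This is not her Unicode implementation, which handled special casing and is not shown in the thread. `foldAscii` and `icmpLazy` are hypothetical names, and a current D2 compiler is assumed.

```d
// Fold one ASCII letter to lower case; other bytes pass through unchanged.
char foldAscii(char c)
{
    return (c >= 'A' && c <= 'Z') ? cast(char)(c + ('a' - 'A')) : c;
}

// Three-way case-insensitive comparison (ASCII only).
// Characters are folded one pair at a time, so the scan stops at the
// first difference instead of converting both strings up front.
int icmpLazy(const(char)[] a, const(char)[] b)
{
    const n = a.length < b.length ? a.length : b.length;
    foreach (i; 0 .. n)
    {
        const x = foldAscii(a[i]);
        const y = foldAscii(b[i]);
        if (x != y)
            return x < y ? -1 : 1;   // early exit: the rest is never touched
    }
    if (a.length == b.length)
        return 0;
    return a.length < b.length ? -1 : 1;
}

unittest
{
    assert(icmpLazy("Hello", "hELLO") == 0);
    assert(icmpLazy("apple", "Banana") < 0);
    assert(icmpLazy("abc", "ab") > 0);
}
```

With gigabyte-sized inputs that differ in the first few bytes, this touches only those bytes, whereas a convert-then-compare approach would fold everything first.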
parent David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca4r7l$20ig$1 digitaldaemon.com>, Arcane Jill says...
In article <ca4ot4$1sek$2 digitaldaemon.com>, Walter says...
(in response to David L. Davis)
I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string.
(I'm actually replying to David here, not Walter) Another big problem with uppercasing the whole string is that it could be very slow. Imagine if the strings you were comparing were gigabytes long. Now imagine that the substring you were looking for could have been found right near the start of the string. Converting the case of all those gigs would have been unnecessary. Sorry to be a downer, but I learned this the hard way a couple of years ago. I actually managed to implement the whole Unicode case comparison algorithm for real, including special casing and everything. Man, was it S-L-O-W. (It casefolded everything before the compare even started). Then I optimized it by making look at only as much of the strings as it needed to, and after that it whizzed by. Jill
Jill: I just missed seeing your message before I posted the code again, sorry. But of course you're right, if a very large string of data is passed in it will slow things down a lot... darn, how in the heck did I miss that, cause I too have had to deal with a similar problem a few years ago. :)

I think what it is, is with Walter giving me half a chance to add something (no matter how small it may be) to the Phobos.std library, is a real "Honor" and I don't want to blow it. :) Just knowing how very busy Walter is, I really appreciate him giving me this golden opportunity to contribute, and to feel a part of the "D" community. I feel like a young Skywalker in training, learning how to best use "The Force!"

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 08 2004
prev sibling parent reply David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca4ot4$1sek$2 digitaldaemon.com>, Walter says...
Thanks! I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string. While this works, it's better to avoid
it if you can, i.e. just go through character by character and comparing
that way. Second, I prefer to avoid a starting index; instead just slice the
string to be searched. -Walter
Walter: Ok, I think I've followed your suggestions correctly. :) Please let me know if I've gotten it right, or if I'm off track somehow?

import std.c.stdio;
import std.string;

/*******************************************************************
 * Function      : int ifind( in char[], in char[] )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and the
 *                           default parameter, mainly because the
 *                           string being passed in should be already
 *                           sliced so that the next search will find
 *                           the matching sub-string value. Also per
 *                           advice from Walter, <g> I've removed every
 *                           tolower() call, and now locally all characters
 *                           in the strings are set to lowercase where they
 *                           sit without the need to create another copy.
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.find
 */
int ifind
(
    in char[] sStr,
    in char[] sSubStr
)
{
    // If either of the string parameters are empty, return not found
    if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

    // sStr set to lowercase locally
    // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
    foreach ( int iStrPos, char cChar; sStr )
        sStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sStr[ iStrPos ] + 0x20 : sStr[ iStrPos ];

    // sSubStr set to lowercase locally
    // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
    foreach ( int iStrPos, char cChar; sSubStr )
        sSubStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStr[ iStrPos ] + 0x20 : sSubStr[ iStrPos ];

    return find( sStr, sSubStr );

} // end int ifind( in char[], in char[] )


/*******************************************************************
 * Function      : int irfind( in char[], in char[] )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and the
 *                           default parameter, mainly because the
 *                           string being passed in should be already
 *                           sliced so that the next search will find
 *                           the matching sub-string value. Also per
 *                           advice from Walter, <g> I've removed every
 *                           tolower() call, and now locally all characters
 *                           in the strings are set to lowercase where they
 *                           sit without the need to create another copy.
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.rfind.
 */
int irfind
(
    in char[] sStr,
    in char[] sSubStr
)
{
    // If either of the string parameters are empty, return not found
    if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

    // sStr set to lowercase locally
    // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
    foreach ( int iStrPos, char cChar; sStr )
        sStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sStr[ iStrPos ] + 0x20 : sStr[ iStrPos ];

    // sSubStr set to lowercase locally
    // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
    foreach ( int iStrPos, char cChar; sSubStr )
        sSubStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStr[ iStrPos ] + 0x20 : sSubStr[ iStrPos ];

    return rfind( sStr, sSubStr );

} // end int irfind( in char[], in char[] )

// Test ifind() and irfind() to find multiples of the same sub-string
int main()
{
    int    iStrPos   = 0;
    int    iSlicePos = 0;
    char[] sStrTest  = "ApO 123355 PO Box 23, Waterpool Street Portland, Texas";

    printf( "Original = %.*s\n", sStrTest );

    iStrPos   = 0;
    iSlicePos = 0;

    while ( iSlicePos != -1 )
    {
        iSlicePos = ifind( sStrTest[ iStrPos .. sStrTest.length - 1 ], "PO" );

        if ( iSlicePos != -1 )
        {
            printf( "Found \'PO\' at position with ifind()= %d\n", iStrPos + iSlicePos );
            iStrPos = iStrPos + iSlicePos + "PO".length;
        }
    }

    printf("\n\n");

    iStrPos   = sStrTest.length - 1;
    iSlicePos = 0;

    while ( iSlicePos != -1 && iStrPos >= 0 )
    {
        iSlicePos = irfind( sStrTest[ 0 .. iStrPos ], "PO" );

        if ( iSlicePos != -1 )
        {
            printf( "Found \'PO\' at position with irfind()= %d\n", iSlicePos );
            iStrPos = iSlicePos - "PO".length;
        }
    }

    return 0;

} // end int main()

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 08 2004
next sibling parent reply "Walter" <newshound digitalmars.com> writes:
The function uppercases the input string. It shouldn't modify its inputs.
Jun 08 2004
next sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Tue, 8 Jun 2004 14:56:32 -0700, Walter <newshound digitalmars.com> 
wrote:
 The function uppercases the input string. It shouldn't modify its inputs.
A perfect example of where 'in' should mean 'const' and the compiler should catch this error. Regan
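As a point of comparison, and only as an aside about later compilers rather than DMD 0.92: in current D2, an `in` parameter is const, so the kind of write Regan wants rejected no longer compiles. The sketch below checks this with `__traits(compiles, ...)`; `probe` is a hypothetical function that exists only for the check.

```d
// Hypothetical function used only to probe what a D2 compiler accepts.
void probe(in char[] s)
{
    // Reading through an `in` parameter is allowed...
    static assert(__traits(compiles, s[0] == 'x'));

    // ...but writing through it is rejected at compile time.
    static assert(!__traits(compiles, s[0] = 'x'));
}

unittest
{
    probe("abc");   // compiles and runs; nothing is modified
}
```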
 "David L. Davis" <SpottedTiger yahoo.com> wrote in message
 news:ca54is$2h2r$1 digitaldaemon.com...
 In article <ca4ot4$1sek$2 digitaldaemon.com>, Walter says...
Thanks! I have some suggestions, though <g>. First, using toupper()
allocates memory for the new string. While this works, it's better to
avoid
it if you can, i.e. just go through character by character and 
comparing
that way. Second, I prefer to avoid a starting index; instead just 
slice
the
string to be searched. -Walter
Walter: Ok, I think I've followed your suggestions correctly. :) Please
let me
 know if I've gotten it right, or if I'm off track somehow?

 import std.c.stdio;
 import std.string;

 /*******************************************************************
 * Function      : int ifind( in char[], in char[] )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and the
 *                           default parameter, mainly because the
 *                           string being passed in should be already
 *                           sliced so that the next search will find
 *                           the matching sub-string value. Also per
 *                           advice from Walter, <g> I've removed every
 *                           tolower() call, and now locally all 
 characters
 *                           in the strings are set to lowercase where 
 they
 *                           sit without the need to create a another 
 copy.
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.find
 */
 int ifind
 (
 in char[] sStr,
 in char[] sSubStr
 )
 {
 // If either of the string parameters are empty, return not found
 if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

 // sStr set to lowercase locally
 // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
 foreach ( int iStrPos, char cChar; sStr )
 sStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sStr[ iStrPos ] +
0x20 :
 sStr[ iStrPos ];

 // sSubStr set to lowercase locally
 // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
 foreach ( int iStrPos, char cChar; sSubStr )
 sSubStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStr[
iStrPos ] +
 0x20 : sSubStr[ iStrPos ];

 return find( sStr, sSubStr );

 } // end int ifind( in char[], in char[] )


 /*******************************************************************
 * Function      : int irfind( in char[], in char[] )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and the
 *                           default parameter, mainly because the
 *                           string being passed in should be already
 *                           sliced so that the next search will find
 *                           the matching sub-string value. Also per
 *                           advice from Walter, <g> I've removed every
 *                           tolower() call, and now locally all characters
 *                           in the strings are set to lowercase where they
 *                           sit without the need to create another copy.
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 *******************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.rfind.
 */
 int irfind
 (
 in char[] sStr,
 in char[] sSubStr
 )
 {
 // If either of the string parameters are empty, return not found
 if ( sStr.length < 1 || sSubStr.length < 1 ) return -1;

 // sStr set to lowercase locally
 // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
 foreach ( int iStrPos, char cChar; sStr )
 sStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sStr[ iStrPos ] + 0x20 : sStr[ iStrPos ];

 // sSubStr set to lowercase locally
 // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
 foreach ( int iStrPos, char cChar; sSubStr )
 sSubStr[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStr[ iStrPos ] + 0x20 : sSubStr[ iStrPos ];

 return rfind( sStr, sSubStr );

 } // end int irfind( in char[], in char[] )

 // Test ifind() and irfind() to find multiples of the same sub-string
 int main()
 {

 int    iStrPos   = 0;
 int    iSlicePos = 0;
 char[] sStrTest  = "ApO 123355 PO Box 23, Waterpool Street Portland, Texas";
 printf( "Original = %.*s\n", sStrTest );

 iStrPos   = 0;
 iSlicePos = 0;

 while ( iSlicePos != -1 )
 {
 iSlicePos = ifind( sStrTest[ iStrPos .. sStrTest.length - 1 ], "PO" );

 if ( iSlicePos != -1 )
 {
 printf( "Found \'PO\' at position with ifind()= %d\n", iStrPos + iSlicePos );
 iStrPos = iStrPos + iSlicePos + "PO".length;
 }
 }

 printf("\n\n");

 iStrPos  = sStrTest.length - 1;
 iSlicePos = 0;

 while ( iSlicePos != -1 && iStrPos >= 0 )
 {
 iSlicePos = irfind( sStrTest[ 0 .. iStrPos ], "PO" );

 if ( iSlicePos != -1 )
 {
 printf( "Found \'PO\' at position with irfind()= %d\n", iSlicePos );
 iStrPos = iSlicePos - "PO".length;
 }
 }

 return 0;

 } // end int main()

 -------------------------------------------------------------------
 "Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 08 2004
parent reply David L. Davis <SpottedTiger yahoo.com> writes:
In article <opr9aqpftm5a2sq9 digitalmars.com>, Regan Heath says...
On Tue, 8 Jun 2004 14:56:32 -0700, Walter <newshound digitalmars.com> 
wrote:
 The function uppercases the input string. It shouldn't modify its inputs.
A perfect example of where 'in' should mean 'const' and the compiler should catch this error. Regan
Regan: If an "in" acts like an "inout" for strings, does it do this for all the other different types (int, real, long, etc.) too? :( Seems confusing, when is an "in" an "in" and not an "inout?"
-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 08 2004
next sibling parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 8 Jun 2004 22:35:23 +0000 (UTC), David L. Davis 
<SpottedTiger yahoo.com> wrote:

 In article <opr9aqpftm5a2sq9 digitalmars.com>, Regan Heath says...
 On Tue, 8 Jun 2004 14:56:32 -0700, Walter <newshound digitalmars.com>
 wrote:
 The function uppercases the input string. It shouldn't modify its 
 inputs.
A perfect example of where 'in' should mean 'const' and the compiler should catch this error. Regan
Regan: If an "in" acts like an "inout" for strings, does it do this for all the other different types (int, real, long, etc.) too?
No. Strings are passed by reference, (int, real, long, etc.) are not, see below..
  :(  Seems confusing, when is
 an "in" and "in" and not an "inout?"
Exactly! I believe strings and other arrays are all passed by reference and due to this you can change the *contents* of the string, but not the *reference* to the string. If you passed it as an inout you could change both the *contents* and the *reference*. Regan.
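
A minimal sketch of the contents-versus-reference distinction described above. It is written in present-day D syntax rather than the DMD 0.92 dialect under discussion, so `ref` stands in for what this thread calls `inout`; the function names are made up for illustration.

```d
// Hypothetical names, present-day D syntax (not DMD 0.92).

// The slice (pointer + length) is copied, but the characters are shared.
void byValue(char[] s)
{
    s[0] = '*';      // visible to the caller: same underlying characters
    s = s[1 .. $];   // not visible: only the local copy of the slice moves
}

// With ref, the caller's slice itself can be changed as well.
void byRef(ref char[] s)
{
    s[0] = '#';      // visible to the caller
    s = s[1 .. $];   // also visible: the caller's slice now starts one later
}

unittest
{
    char[] a = "hello".dup;
    byValue(a);
    assert(a == "*ello");   // contents changed, slice unchanged
    byRef(a);
    assert(a == "ello");    // contents changed, then slice moved
}
```

Under these assumptions, the first function is the behaviour that surprised David: nothing in the parameter list stops it from writing through to the caller's characters.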
 -------------------------------------------------------------------
 "Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 08 2004
prev sibling parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca5evb$1d8$1 digitaldaemon.com>, David L. Davis says...
If an "in" acts like an "inout" for strings, does it do this for all the
other different types (int, real, long, etc.) too?
For array, classes and pointers, but not for primitive types or for structs.
 :(
I echo that sentiment.
Seems confusing, when is
an "in" and "in" and not an "inout?"
Unlike C and C++, D provides no DbC mechanism for catching const errors. You can do it, but you have to try REALLY hard. The following example WILL assert as a consequence of a DbC const error (I've tested it):
   private char[][] backup;
   void f(in char[] s)
   in
   {
       backup.length = backup.length + 1;
       backup[backup.length-1] = s.dup;
   }
   out
   {
       assert(s == backup[backup.length-1]);
       backup.length = backup.length - 1;
   }
   body
   {
       s[0] ='*'; // violates my DbC assertion of s's constness
   }

   int main(char[][] args)
   {
       char[] s = "hello";
       f(s);
       return 0;
   }
However - even THAT won't work if an exception is thrown or if the code is multi-threaded. You'd have to also make the whole thing synchronized AND wrapped in try/catch to ensure you got that. (And, so far as I know, there is no way to introduce either "synchronized" or "try/catch" in a release build only, without writing the whole function twice). So - like you so eloquently put it earlier, :( Jill
Jun 08 2004
prev sibling parent reply David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca5ct6$2vif$1 digitaldaemon.com>, Walter says...
The function uppercases the input string. It shouldn't modify its inputs.
Walter: Third time around is normally the "Charm!" Anywayz, I've been hammering away at these two functions ifind() and irfind(), and I believe I've made them much better than before, thanks to both you and Jill for the advice. Please, let me know if I've still missed something, but if not I may ask some of the folks here to do a little testing of these functions. Right now tho I'm tired and seeing double (it's late here), so I'll check what's up in the morning. I will be bright-eyed and bushy-tailed tomorrow to fix any problems found.

Thxs for giving a chance at donating some code...I've learned a few more things about how to use "D", and that's all good indeed! :))

import std.c.stdio;
import std.string;

/****************************************************************************
 * Function      : int ifind( in char[], in char[] )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and the
 *                           default parameter, mainly because the
 *                           string being passed in should be already
 *                           sliced so that the next search will find
 *                           the matching sub-string value. Also per
 *                           advice from Walter, <g> I've removed every
 *                           tolower() call, and now locally all characters
 *                           in the strings are set to lowercase where they
 *                           sit without the need to create another copy.
 *               : 09.Jun.04 Reworked the whole thing! Fixed the problem
 *                           with the input string getting stepped on, and
 *                           now only the sSubStr is duped to another string.
 *                           While the sStr string is looked at in a loop
 *                           looking for the matching SubString...a character
 *                           at a time.
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 ****************************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.find
 */
int ifind
(
    in char[] sStr,
    in char[] sSubStr
)
{
    char[] sSubStrTmp;
    bool   bFoundMatch   = false;
    int    iFound1stPos  = -1;
    int    iSubStrRunner = 0;
    char   cCharTmp;

    // If either of the string parameters are empty, return not found
    if ( sStr.length < 1 || sSubStr.length < 1 || sSubStr.length > sStr.length )
        return -1;

    // Get a working copy of sSubStr
    sSubStrTmp = sSubStr.dup;

    // sSubStrTmp set to lowercase locally
    // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
    foreach ( int iStrPos, char cChar; sSubStrTmp )
        sSubStrTmp[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStrTmp[ iStrPos ] + 0x20 : sSubStrTmp[ iStrPos ];

    foreach ( int iStrPos, char cChar; sStr )
    {
        cCharTmp = cChar;

        // If cChar is an uppercase ASCII, make it lowercase for the compare
        cCharTmp = ( cCharTmp >= '\x41' && cCharTmp <= '\x5A' ? cCharTmp + '\x20' : cCharTmp );

        //printf( "iStrPos=%d, cChar=%c, cCharTmp=%c, sStr.length=%d\n", iStrPos, cChar, cCharTmp, sStr.length );

        // Find the very 1st character of the Sub String is found within the Main String
        if ( cCharTmp == sSubStrTmp[ 0 ] && bFoundMatch == false )
        {
            iFound1stPos  = iStrPos;
            bFoundMatch   = true;
            iSubStrRunner = 1;

            if ( sSubStrTmp.length == 1 )
                return iFound1stPos;

            continue;
        }
        // Match the rest of the characters in the Sub String is found within the Main String
        else if ( cCharTmp == sSubStrTmp[ iSubStrRunner ] && bFoundMatch == true )
        {
            iSubStrRunner++;

            if ( iSubStrRunner > sSubStrTmp.length - 1 )
                return iFound1stPos;

            continue;
        }
        // Not all characters match, reset
        else if ( bFoundMatch == true )
        {
            // Not a total match, reset back to defaults
            iFound1stPos  = -1;
            bFoundMatch   = false;
            iSubStrRunner = 0;
        }
    }

    return -1;

} // end int ifind( in char[], in char[] )

/****************************************************************************
 * Function      : int irfind( in char[], in char[] )
 * Author        : David L. 'SpottedTiger' Davis
 * Language      : DigitalMars "D" aka Mars v0.92
 * Created Date  : 03.Jun.04
 * Modified Date : 08.Jun.04 Removed the wrapper function and the
 *                           default parameter, mainly because the
 *                           string being passed in should be already
 *                           sliced so that the next search will find
 *                           the matching sub-string value. Also per
 *                           advice from Walter, <g> I've removed every
 *                           tolower() call, and now locally all characters
 *                           in the strings are set to lowercase where they
 *                           sit without the need to create another copy.
 *               : 09.Jun.04 Reworked the whole thing! Fixed the problem
 *                           with the input string getting stepped on, and
 *                           now only the sSubStr is duped to another string.
 *                           While the sStr string is looked at in a loop
 *                           looking for the matching SubString...a character
 *                           at a time.
 * Requirements  : std.string
 * Licence       : Same as those for the Phobos (Runtime Library)
 ****************************************************************************
 *
 * Note: Meant to be a case insensitive version of std.string.rfind.
 */
int irfind
(
    in char[] sStr,
    in char[] sSubStr
)
{
    char[] sSubStrTmp;
    bool   bFoundMatch   = false;
    int    iFound1stPos  = -1;
    int    iSubStrRunner = 0;
    char   cCharTmp;

    // If either of the string parameters are empty, return not found
    if ( sStr.length < 1 || sSubStr.length < 1 || sSubStr.length > sStr.length )
        return -1;

    // Get a working copy of sSubStr
    sSubStrTmp = sSubStr.dup;

    // sSubStrTmp set to lowercase locally
    // lowercase ascii a = '\x61', uppercase ascii A = '\x41'
    foreach ( int iStrPos, char cChar; sSubStrTmp )
        sSubStrTmp[ iStrPos ] = ( find( uppercase, cChar ) != -1 ) ? sSubStrTmp[ iStrPos ] + 0x20 : sSubStrTmp[ iStrPos ];

    for ( int iStrPos = sStr.length - 1; iStrPos >= 0; iStrPos-- )
    {
        cCharTmp = sStr[ iStrPos ];

        // If cChar is an uppercase ASCII, make it lowercase for the compare
        cCharTmp = ( cCharTmp >= '\x41' && cCharTmp <= '\x5A' ? cCharTmp + '\x20' : cCharTmp );

        //printf( "iStrPos=%d, cChar=%c, cCharTmp=%c, sStr.length=%d\n", iStrPos, sStr[ iStrPos ], cCharTmp, sStr.length );

        // Find the very 1st character of the Sub String is found within the Main String
        if ( cCharTmp == sSubStrTmp[ 0 ] && bFoundMatch == false )
        {
            iFound1stPos  = iStrPos;
            bFoundMatch   = true;
            iSubStrRunner = 1;

            //printf( "iStrPos=%d, cChar=%c, cCharTmp=%c, sStr.length=%d\n", iStrPos, sStr[ iStrPos ], cCharTmp, sStr.length );

            if ( sSubStrTmp.length == 1 )
                return iFound1stPos;

            if ( iStrPos + 1 > sStr.length - 1 )
                continue;

            for ( int iInnerLoop = iStrPos + 1; iInnerLoop < sStr.length; iInnerLoop++ )
            {
                cCharTmp = sStr[ iInnerLoop ];

                // If cChar is an uppercase ASCII, make it lowercase for the compare
                cCharTmp = ( cCharTmp >= '\x41' && cCharTmp <= '\x5A' ? cCharTmp + '\x20' : cCharTmp );

                // Match the rest of the characters in the Sub String is found within the Main String
                if ( cCharTmp == sSubStrTmp[ iSubStrRunner ] && bFoundMatch == true )
                {
                    iSubStrRunner++;

                    if ( iSubStrRunner > sSubStrTmp.length - 1 )
                        return iFound1stPos;

                    continue;
                }
                // Not all characters match, reset
                else if ( bFoundMatch == true )
                {
                    // Not a total match, reset back to defaults
                    iFound1stPos  = -1;
                    bFoundMatch   = false;
                    iSubStrRunner = 0;
                    break;
                }
            }
        }
    }

    return -1;

} // end int irfind( in char[], in char[] )

// Test ifind() and irfind() to find multiple of the same sub-string
int main()
{
    int    iStrPos;
    int    iSlicePos;
    char[] sStrTest = "ApO 123355 PO Box 23, Waterpool Street Portland, Texas";

    printf( "Original Before = %.*s\n\n", sStrTest );

    iStrPos   = 0;
    iSlicePos = 0;

    while ( iSlicePos != -1 )
    {
        iSlicePos = ifind( sStrTest[ iStrPos .. sStrTest.length - 1 ], "PO" );

        if ( iSlicePos != -1 )
        {
            printf( "Found \'PO\' at position with ifind()= %d\n", iStrPos + iSlicePos );
            iStrPos = iStrPos + iSlicePos + "PO".length;
        }
    }

    printf("\n\n");

    iStrPos   = sStrTest.length - 1;
    iSlicePos = 0;

    while ( iSlicePos != -1 && iStrPos >= 0 )
    {
        iSlicePos = irfind( sStrTest[ 0 .. iStrPos + 1 ], "PO" );

        if ( iSlicePos != -1 )
        {
            printf( "Found \'PO\' at position with irfind()= %d\n", iSlicePos );
            iStrPos = iSlicePos - "PO".length;
        }
    }

    printf("\n\n");
    printf( "Original After = %.*s\n", sStrTest );

    return 0;

} // end int main()

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 08 2004
next sibling parent reply "Vathix" <vathixSpamFix dprogramming.com> writes:
"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:ca69ie$1813$1 digitaldaemon.com...
 In article <ca5ct6$2vif$1 digitaldaemon.com>, Walter says...
The function uppercases the input string. It shouldn't modify its inputs.
Walter: Third time around is normally the "Charm!" Anywayz, I've been hammering
 away at these two functions ifind() and irfind(), and I believe I've made them
 much better than before, thanks to both you and Jill for the advice.
Hello, I just wanted to let you know that I wrote those functions awhile ago for a String class that can be found at www.dprogramming.com/stringclass.d . It contains all the free functions, and a few others such as findany(), endswith(), etc; and case insensitive versions. I haven't said much about it because it's completely based off Walter's code, so it belongs to him. The class can be stripped out to just use the functions. If the code isn't good enough, just ignore me; have fun!
Jun 09 2004
parent David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca7pet$fn7$1 digitaldaemon.com>, Vathix says...
Hello, I just wanted to let you know that I wrote those functions awhile ago
for a String class that can be found at www.dprogramming.com/stringclass.d .
It contains all the free functions, and a few others such as findany(),
endswith(), etc; and case insensitive versions. I haven't said much about it
because it's completely based off Walter's code, so it belongs to him. The
class can be stripped out to just use the functions. If the code isn't good
enough, just ignore me; have fun!
Vathix: I looked over your stringclass.d code that's based off of Walter's string.d, and it does look a lot more in line with what he'll accept. Myself, I'd just like to have the ifind(char[], char[])/ifind(char[], char) and irfind(char[], char[])/irfind(char[], char) functions in std.string. Anyways, it would seem my "C" skills have gotten a bit rusty, cause I've been relearning a lot of things I had forgotten about in trying to create a good version of the ifind() and irfind() functions for general use. So please feel free to post those ifind code portions and see what Walter thinks. Cause currently I'm beginning to think my versions are still going to be a little bulkier than what Walter will allow in, and I would like to move on and finish up my propercase() function (which is why I even mentioned that ifind()/irfind() were missing in the first place). :) And besides that, Walter is a very busy man and I don't want to waste his time.
-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 09 2004
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
There's no need to .dup the strings. Just have a loop that looks like this:

for (i = 0; i < string1.length; i++)
{    char c = toupper(string1[i]);
    if (c != toupper(string2[i]))
        goto nomatch;
}

Note that it compares character by character without needing to allocate
memory. In fact, just copy the logic in find() and rfind(), replacing memchr
and memcmp with case insensitive loops, write some unit tests, and you'll be
there.
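
The advice above can be sketched roughly as follows. This is not the Phobos find() logic and not the code anyone in this thread posted; it is a plain nested loop in present-day D, with a hypothetical name, ASCII-only folding via std.ascii.toLower, and the assumption that an empty substring matches at position 0.

```d
import std.ascii : toLower;

/// Hypothetical helper: index of the first case-insensitive (ASCII only)
/// occurrence of sub in s, or -1 if there is none. Nothing is allocated
/// and neither input is modified.
ptrdiff_t ifindSketch(const(char)[] s, const(char)[] sub)
{
    if (sub.length == 0)
        return 0;                  // assumption: empty needle matches at 0
    if (sub.length > s.length)
        return -1;

    // Try every start position where sub could still fit.
    foreach (start; 0 .. s.length - sub.length + 1)
    {
        size_t i = 0;
        // Compare character by character, folding case on the fly.
        while (i < sub.length && toLower(s[start + i]) == toLower(sub[i]))
            ++i;
        if (i == sub.length)
            return cast(ptrdiff_t) start;
    }
    return -1;
}

unittest
{
    assert(ifindSketch("abcdefCdef", "cD") == 2);
    assert(ifindSketch("Hello", "LLO") == 2);
    assert(ifindSketch("abc", "abcd") == -1);
}
```

Folding each character at comparison time is what removes the need for .dup: the inputs are only ever read.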
Jun 09 2004
parent reply David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca7qp4$hrl$1 digitaldaemon.com>, Walter says...
There's no need to .dup the strings. Just have a loop that looks like this:

for (i = 0; i < string1.length; i++)
{    char c = toupper(string1[i]);
    if (c != toupper(string2[i]))
        goto nomatch;
}

Note that it compares character by character without needing to allocate
memory. In fact, just copy the logic in find() and rfind(), replacing memchr
and memcmp with case insensitive loops, write some unit tests, and you'll be
there.
Walter: Ok, per your advice I've copied the original find() / rfind() functions from std.string, and modified them into ifind() / irfind(). I sure hope these will make the grade <g>.

"" and all the code will be ready to copy and paste.

Thxs for giving this chance to add something to "D!" :)

[Code listing not preserved in this archive. The surviving fragments show ifind() and irfind() in both char and char[] variants, each followed by a unittest block, plus a test main().]

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 10 2004
parent reply "Walter" <newshound digitalmars.com> writes:
That's more like it! Now, can the unit tests also test the case
insensitivity?

"David L. Davis" <SpottedTiger yahoo.com> wrote in message
news:caa4h3$150f$1 digitaldaemon.com...
 In article <ca7qp4$hrl$1 digitaldaemon.com>, Walter says...
There's no need to .dup the strings. Just have a loop that looks like this:

for (i = 0; i < string1.length; i++)
{    char c = toupper(string1[i]);
    if (c != toupper(string2[i]))
        goto nomatch;
}

Note that it compares character by character without needing to allocate
memory. In fact, just copy the logic in find() and rfind(), replacing memchr
and memcmp with case insensitive loops, write some unit tests, and you'll be
there.
Walter: Ok, per your advice I've copied the original find() / rfind() functions
 from std.string, and modified them into ifind() / irfind(). I sure hope these
 will make the grade <g>.

 "" and all the code will be ready to copy and paste.

 Thxs for giving this chance to add something to "D!" :)

 [Quoted code listing not preserved in this archive. The surviving fragments show ifind() and irfind() in both char and char[] variants, each followed by a unittest block, plus a test main().]

 -------------------------------------------------------------------
 "Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 10 2004
next sibling parent David L. Davis <SpottedTiger yahoo.com> writes:
In article <cabh23$6m1$1 digitaldaemon.com>, Walter says...
That's more like it! Now, can the unit tests also test the case
insensitivity?
Walter: Oops! Sorry about that...I've now added some additional unittest entries for all four functions.

[Code listing not preserved in this archive. The surviving fragments show the same layout as before: ifind() and irfind() in both char and char[] variants, each followed by a unittest block, plus a test main().]

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 11 2004
prev sibling next sibling parent David L. Davis <SpottedTiger yahoo.com> writes:
In article <cabh23$6m1$1 digitaldaemon.com>, Walter says...
That's more like it! Now, can the unit tests also test the case
insensitivity?
Walter: Darn, I had to fix just one more thing in the code...I discovered that if sub == s, I was returning -1 instead of 0. This newest version now fixes that. Thxs for giving me the chance to write these functions!! :))

[Code listing not preserved in this archive. The surviving fragments show the same layout as before: ifind() and irfind() in both char and char[] variants, each followed by a unittest block, plus a test main().]

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 12 2004
prev sibling parent David L. Davis <SpottedTiger yahoo.com> writes:
Walter: It would sure be nice to also have the ireplace() and icount() functions
added to Phobos...if possible. Anyway, I decided to go ahead and write them up
from your replace() and count() as I had done with ifind() and irfind() per your
advice. I sure hope it's ok to bug you just a little for these kinds of updates.
;)

// Note these functions build off the ifind() defined in a previous post.

[Code listing not preserved in this archive. The surviving fragments show char[] ireplace and int icount, each followed by a unittest block.]

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 15 2004
prev sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...

 sStr[ iStrPos ] + 0x20
Ah! Now these old ASCII habits really should be dropped. Hauke has written this magnificent charToUpper() routine. It should be used.
 I feel like a young Skywalker in training, learning how to best use "The
Force!"
Other than that: Impressive - Obi Won has taught you well. (Hope I'm not too discouraging). :) Jill
Jun 09 2004
next sibling parent Chr. Grade <Chr._member pathlink.com> writes:
Haven't read any related postings before, so I don't know if I'm far off with my
reply. Anyway, if you try to convert a lower char to an upper char in a clever
and old-fashioned way, do it like this, young Jedi:

lower to upper -> var &= 0x5F;

Chr. Grade
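
A small sketch of what that mask does, assuming plain ASCII input; the helper names are made up. The raw `& 0x5F` is only safe when the character is already known to be a lowercase letter, so a guarded variant is shown next to it.

```d
// Hypothetical helper names; plain ASCII only.

// The raw trick: clears bit 5 (0x20) and bit 7 (0x80) unconditionally.
char maskUpper(char c)
{
    return cast(char)(c & 0x5F);
}

// Guarded form: apply the mask only to 'a' .. 'z', leave the rest alone.
char asciiUpper(char c)
{
    return (c >= 'a' && c <= 'z') ? cast(char)(c & 0x5F) : c;
}

unittest
{
    assert(maskUpper('a') == 'A');
    assert(maskUpper('1') == 0x11);   // a digit becomes a control character
    assert(asciiUpper('1') == '1');
    assert(asciiUpper('z') == 'Z');
}
```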

In article <ca71c4$2b8l$1 digitaldaemon.com>, Arcane Jill says...
In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...

 sStr[ iStrPos ] + 0x20
Ah! Now these old ASCII habits really should be dropped. Hauke has written this magnificent charToUpper() routine. It should be used.
 I feel like a young Skywalker in training, learning how to best use "The
Force!"
Other than that: Impressive - Obi Won has taught you well. (Hope I'm not too discouraging). :) Jill
Jun 09 2004
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:

 In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...
 
 
sStr[ iStrPos ] + 0x20
Ah! Now these old ASCII habits really should be dropped. Hauke has written this magnificent charToUpper() routine. It should be used.
<snip> Except that that snippet converts upper to lower. There's always sStr[iStrPos] + 'a' - 'A' which'll work as long as the uppercase alphabet is a constant offset from the lowercase alphabet. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 09 2004
parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Stewart Gordon wrote:
 Arcane Jill wrote:
 
 In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...


 sStr[ iStrPos ] + 0x20
Ah! Now these old ASCII habits really should be dropped. Hauke has written this magnificent charToUpper() routine. It should be used.
<snip> Except that that snippet converts upper to lower.
Well, charToLower then.
 There's always
 
     sStr[iStrPos] + 'a' - 'A'
 
 which'll work as long as the uppercase alphabet is a constant offset 
 from the lowercase alphabet.
That is exactly the same as using 0x20 directly since D's character literals are always unicode (no codepage stuff involved). So 'a'-'A' always equals 0x20. In any case, in Unicode upper and lower case characters do not have a constant offset to each other. That is only true for the ASCII subset. Hauke
Jun 09 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Hauke Duden wrote:

<snip>
 In any case, in Unicode upper and lower case characters do not have a 
 constant offset to each other. That is only true for the ASCII subset.
Yes, you do have a point there. What's more, there isn't a 1:1 mapping between uppercase and lowercase characters. And the mappings that there are aren't language independent. So we can't write a single formula that'll correctly case-convert all text in all languages. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 09 2004
parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Stewart Gordon wrote:
 Hauke Duden wrote:
 
 <snip>
 
 In any case, in Unicode upper and lower case characters do not have a 
 constant offset to each other. That is only true for the ASCII subset.
Yes, you do have a point there. What's more, there isn't a 1:1 mapping between uppercase and lowercase characters.
You're wrong. The Unicode standard defines 1:1 case mappings (see http://www.unicode.org/Public/UNIDATA/UCD.html). There is also an additional "special casing" with one-to-many mappings, but only a handful of characters are affected. It would be nice to support that too, but for everyday work the 1:1 mappings are usually sufficient.
 And the mappings that there are aren't language independent.
Huh? Casing is not affected by locale. Maybe you are thinking about collation? Hauke
Jun 09 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Hauke Duden wrote:

 Stewart Gordon wrote:
<snip>
 Yes, you do have a point there.  What's more, there isn't a 1:1 
 mapping between uppercase and lowercase characters.
You're wrong. the Unicode standard defines 1:1 case mappings (see http://www.unicode.org/Public/UNIDATA/UCD.html).
There seems to be a contradiction here. That file indicates that UnicodeData.txt only contains 1:1 mappings. But just as I wondered, there's a 2:1 mapping in 03C2 and 03C3.
 There is also an additional "special casing" with one-to-many 
 mappings but only a handful of characters are affected. It would be 
 nice to support that too, but for everyday work the 1:1 mappings are 
 usually sufficient.
So, which characters do the one-to-many mappings bring about?
 And the mappings that there are aren't language independent.
Huh? Casing is not affected by locale. Maybe you are thinking about collation?
What do you mean by that? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 09 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca7cgp$2svc$1 digitaldaemon.com>, Stewart Gordon says...
There seems to be a contradiction here.  That file indicates that 
UnicodeData.txt only contains 1:1 mappings.  But just as I wondered, 
there's a 2:1 mapping in 03C2 and 03C3.
Look, it's perfectly simple. Everybody's right. And because everybody's right, everybody's accusing everybody else of being wrong. THERE ARE TWO ANSWERS. "Simple casing" is a one-to-one mapping from character to character, and is locale-independent. "Full casing" is a one-to-many mapping from string to string, and is ALMOST locale independent, but not quite. Hauke's brilliant library supports simple casing, not full casing. That's why both the input and the output are characters, not strings.
So, which characters do the one-to-many mappings bring about?
For example, the German character 'ß' uppercases to "SS" when using full casing, but it stays as 'ß' using simple casing.
 And the mappings that there are aren't language independent.
Huh? Casing is not affected by locale. Maybe you are thinking about collation?
What do you mean by that?
Full casing (but not simple casing) has localized exceptions ONLY for Turkish, Lithuanian and Azeri. In principle, other exceptions could be added in the future. Simple casing is completely locale independent. Collation is a different kettle of fish, and we currently have no libraries to support it. Arcane Jill
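
A toy sketch of why the two kinds of casing need different function signatures. It does not use Hauke's library or any real Unicode tables: simpleUpper() here only knows ASCII (via std.ascii.toUpper), fullUpper() hard-codes the single 'ß' example from this post, and both names are hypothetical.

```d
import std.ascii : toUpper;

// Simple casing: one character in, one character out.
// Sketch only: the "table" is just ASCII, so 'ß' comes back unchanged,
// which is also what simple casing does with it.
dchar simpleUpper(dchar c)
{
    return toUpper(c);
}

// Full casing: the result is a string, because one character may expand
// to several. Only the 'ß' -> "SS" case is filled in here.
dstring fullUpper(dchar c)
{
    if (c == 'ß')
        return "SS"d;
    return [simpleUpper(c)].idup;
}

unittest
{
    assert(simpleUpper('q') == 'Q');
    assert(simpleUpper('ß') == 'ß');
    assert(fullUpper('ß') == "SS"d);
    assert(fullUpper('a') == "A"d);
}
```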
Jun 09 2004
parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:
 Full casing (but not simple casing) has localized exceptions ONLY for Turkish,
 Lithuanian and Azeri. In principle, other exceptions could be added in the
 future. Simple casing is completely locale independent.
Ouch. I didn't know that. Makes me feel happy that I stayed away from the special casings up to now ;). Thanks for clearing up the misunderstanding! Hauke
Jun 09 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca7lhq$9eo$1 digitaldaemon.com>, Hauke Duden says...
Arcane Jill wrote:
 Full casing (but not simple casing) has localized exceptions ONLY for Turkish,
 Lithuanian and Azeri. In principle, other exceptions could be added in the
 future. Simple casing is completely locale independent.
Ouch. I didn't know that. Makes me feel happy that I stayed away from the special casings up to now ;). Thanks for clearing up the misunderstanding! Hauke
Actually, I think I'd quite like to have a bash at writing some of the Unicode algorithms. I've finished the Int class now (Well, almost. I've just got to add a little bit of memory management, but I know how to do that now). After that, I was planning to move onto the next bit of my crypto lib (random numbers). But - the Unicode functions would be relatively quick to write (compared with the crypto stuff), and it would be quite nice to have a break and do something else for a change.

If I do that, I'll need to collaborate with you, Hauke. There's no point in duplicating effort, and we could do with a common format for the compiled unicode data files. In a way, that's YOUR area of expertise, not mine, because you seemed to know that >>9 was more efficient than [n], something I wouldn't have known. Also, we mustn't forget the UPR format I mentioned, which has the benefits of being binary, easily parsable, extendable, publicly available, open source, and easily updateable with each new version of Unicode.

I could do normalization functions first - canonical/compatibility equivalence; finding glyph boundaries, that sort of thing. But I don't want to be treading on your toes, which I would be if I went and invented a new format for the compiled data. So I don't want to do that without collaboration. My concern is that you probably only compiled in enough information to do simple casing, so I wouldn't be able to extract normalization/boundary information from your compiled format. (But I'm guessing, as I haven't studied your source code in depth).

I think it would be great if a D standard library had FULL Unicode support. Even C++ and Java don't do that. (And that's not even mentioning Java's crippled 16-bit chars). It would effectively turn D into the language of choice for Unicode apps.

Anyway, let me know what you think (and check out UPR, if you have time. The URL is http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/, with the format itself documented at http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/format.html - or we could invent our own, but why re-invent the wheel?).

Arcane Jill
Jun 10 2004
next sibling parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:
 Actually, I think I'd quite like to have a bash at writing some of the Unicode
 algorithms. I've finished the Int class now (Well, almost. I've just got to add
 a little bit of memory management, but I know how to do that now). After that,
 I was planning to move onto the next bit of my crypto lib (random numbers).
 But - the Unicode functions would be relatively quick to write (compared with
 the crypto stuff), and it would be quite nice to have a break and do something
 else for a change.
 
 If I do that, I'll need to collaborate with you, Hauke. There's no point in
 duplicating effort, and we could do with a common format for the compiled
 unicode data files. In a way, that's YOUR area of expertise, not mine, because
 you seemed to know that >>9 was more efficient than [n], something I wouldn't
 have known. Also, we mustn't forget the UPR format I mentioned, which has the
 benefits of being binary, easily parsable, extendable, publicly available, open
 source, and easily updateable with each new version of Unicode.
 
 I could do normalization functions first - canonical/compatibility equivalence;
 finding glyph boundaries, that sort of thing. But I don't want to be treading
 on your toes, which I would be if I went and invented a new format for the
 compiled data. So I don't want to do that without collaboration.
I think it is a good idea to coordinate our Unicode efforts. I haven't written any code for normalization, so this would be really useful. What I have worked on lately (when I found a few minutes of spare time) is a string interface that abstracts from the specific encoding, plus implementations for some common encodings (UTF-X, Latin-1, ASCII, system code page). It includes the usual string functions like comparing (characters ordered by index - no collation), searching, concatenation, etc. Caseless comparison and searching are also implemented (using the simple lower mapping - no full case folding). So if you write normalization routines that would be great!
 My concern is that you probably only compiled in enough information to do
 simple casing, so I wouldn't be able to extract normalization/boundary
 information from your compiled format. (But I'm guessing, as I haven't studied
 your source code in depth).
That's true - the unichar data only contains the case mappings and character type info. I think it is important to separate the different Unicode tables, so that using a single Unicode routine won't cause ALL the data to be linked into the program.
 I think it would be great if a D standard library had FULL Unicode support.
 Even C++ and Java don't do that. (And that's not even mentioning Java's
 crippled 16-bit chars). It would effectively turn D into the language of
 choice for Unicode apps.
I agree - that is my goal as well. In fact I see it as an opportunity to influence the language in its early stages so that it will have standardized(!) Unicode support. It prevents every component developer from implementing his own, which can cause lots of unnecessary bloat (Unicode data isn't small...).
 Anyway, let me know what you think (and check out UPR, if you have time. The
 URL is http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/, with the
 format itself documented at
 http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/format.html - or we
 could invent our own, but why re-invent the wheel?).
I haven't had time to look at it yet, but I promise to do so ;). Hauke
Jun 10 2004
parent "Walter" <newshound digitalmars.com> writes:
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:ca9csf$2ujf$1 digitaldaemon.com...
 Arcane Jill wrote:
 I think it would be great if a D standard library had FULL Unicode support.
 Even C++ and Java don't do that. (And that's not even mentioning Java's
 crippled 16-bit chars). It would effectively turn D into the language of
 choice for Unicode apps.
I agree - that is my goal as well. In fact I see it as an opportunity to influence the language in its early stages so that it will have standardized(!) Unicode support. It prevents every component developer from implementing his own, which can cause lots of unnecessary bloat (Unicode data isn't small...).
I agree too, and am glad you two are taking the lead on it.
Jun 10 2004
prev sibling parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:
 Anyway, let me know what you think (and check out UPR, if you have time. The
 URL is http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/, with the
 format itself documented at
 http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/format.html - or we
 could invent our own, but why re-invent the wheel?).
Ok, I finally got around to looking at it. It seems that UPR simply defines a binary representation that contains the normal Unicode data files in a more organized, easier-to-access way. But the data does not seem to be compressed at all (please correct me if I have missed something). Also, each entry can be 1,2 or 4 bytes in size, but 3 bytes is actually the most size-efficient representation for uncompressed Unicode code points. I fear that UPR doesn't quite cut it for data that is compiled statically into executables. After all, we don't want a "hello world" program to be several megabytes in size, right? That'd only cause people to ignore Unicode even more than they do now. Hauke
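To put a number on the 3-byte remark: a code point never exceeds 0x10FFFF (21 bits), so three bytes per entry are always enough. What follows is only a minimal sketch of such a packing; the names pack3 and unpack3 are made up for illustration and are not part of unichar or UPR.

```d
import std.stdio;

// Pack each code point into exactly three bytes, big-endian.
// 0x10FFFF needs 21 bits, so three bytes always suffice.
ubyte[] pack3(const(dchar)[] codePoints)
{
    ubyte[] packed;
    foreach (c; codePoints)
    {
        packed ~= cast(ubyte)(c >> 16);
        packed ~= cast(ubyte)(c >> 8);
        packed ~= cast(ubyte) c;
    }
    return packed;
}

// Read back the i-th code point from a packed buffer.
dchar unpack3(const(ubyte)[] packed, size_t i)
{
    immutable p = i * 3;
    return cast(dchar)((packed[p] << 16) | (packed[p + 1] << 8) | packed[p + 2]);
}

void main()
{
    dchar[] sample = ['A', '\u03A3', '\U0010FFFF'];
    auto packed = pack3(sample);
    writefln("%s code points -> %s bytes (vs %s as 4-byte values)",
             sample.length, packed.length, sample.length * 4);
    assert(unpack3(packed, 1) == '\u03A3');
}
```

This saves a quarter of the space compared with 4-byte entries, at the cost of unaligned reads; whether that trade is worth it for a given table is a separate question.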
Jun 12 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <caep38$1vfr$1 digitaldaemon.com>, Hauke Duden says...
Ok, I finally got around to looking at it. It seems that UPR simply 
defines a binary representation that contains the normal Unicode data 
files in a more organized, easier-to-access way.
Yeah, I stand corrected. The format isn't useful to us. I thought it would be, from reading the blurb, but it's just as easy for us to ignore it. I say we forget UPR then. I've got some ideas, but it's too early in the morning for me right now. Will get back to you later when I've woken up a bit. Jill
Jun 12 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cagrbo$1rlb$1 digitaldaemon.com>, Arcane Jill says...

I've got some ideas, but it's too early in the morning for me right now. Will
get back to you later when I've woken up a bit.
I'm a bit more awake now. The approach that I took when I had to do this sort of thing for my employer some years ago turned out to be wrong, in hindsight. But I learn from my mistakes. Back then, I had a tool to parse the Unicode database files (no problem there) into a custom format binary file, which then got turned into a byte array and stuck into the source code. It was a bad idea because my format was not extendable, and it included only those parts of the database which I actually needed. For this project, we need all of it.

I have an approach now which will work, and I've got some tests up and running to prove that it works. It's a two-stage process. In stage one, a function I wrote parses the Unicode database files and produces some HUMUNGOUSLY large binary files, containing every scrap of information there is on Unicode, in an easily accessible form. Then a second phase function comes along and uses those large files as input, creating as its output - D source files. These end up quite small, because the function figures out the best way to pack the data. The source files created declare const lookup tables (using your 12-bit/9-bit split with duplicate tables removed).

This approach leaves each Unicode property having its own independent source file(s). Since each source file will become a single .obj file, when you link with the library, you will only get the data for those properties you actually need. If you never call a function to get the bidi-combining-class, for example, then that function, and the data to support it, won't even get linked in. And the beauty of this approach is that it is completely extendable to all Unicode properties in all of their files.

Oh, and there's another good thing too. Since the source code writing is automated, it follows that variations of lookup algorithms can just happen automagically. For example, the isASCIIHexDigit() property can be implemented very efficiently with only a tiny amount of data and a slightly modified lookup function. The source-code-writing tool could figure that out and use the smaller data lookup.

I haven't started on any actual Unicode /algorithms/ yet - just getting the fast+small property lookups working was quite a challenge. So it's going well. If you want to email me privately to consult, you can head over to dsource and post me a message there. This is going to be fun! Jill
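For anyone following along, here is a rough sketch of what a two-stage table with duplicate pages removed could look like. It assumes a page size of 512 entries (the >>9 split mentioned earlier) and builds the table at run time rather than emitting D source; TwoStageTable and build are names invented for this sketch, not the actual tool.

```d
import std.stdio;

enum pageBits = 9;                 // 512 entries per page (the ">>9" split)
enum pageSize = 1 << pageBits;

struct TwoStageTable
{
    ushort[] index;    // one entry per page of code points: which pooled page to use
    ubyte[][] pool;    // distinct pages only; identical pages are stored once

    // Look up the property value for a code point.
    ubyte opIndex(dchar c) const
    {
        return pool[index[c >> pageBits]][c & (pageSize - 1)];
    }
}

// Build the table from a flat array holding one value per code point.
// flat.length must be a multiple of pageSize.
TwoStageTable build(const(ubyte)[] flat)
{
    TwoStageTable t;
    ushort[immutable(ubyte)[]] seen;   // page contents -> position in the pool

    for (size_t start = 0; start < flat.length; start += pageSize)
    {
        immutable(ubyte)[] page = flat[start .. start + pageSize].idup;
        if (auto p = page in seen)
            t.index ~= *p;             // page already pooled: reuse it
        else
        {
            auto id = cast(ushort) t.pool.length;
            seen[page] = id;
            t.pool ~= page.dup;
            t.index ~= id;
        }
    }
    return t;
}

void main()
{
    // Example property: "is an ASCII hex digit", one byte per code point.
    auto flat = new ubyte[0x110000];
    foreach (char ch; "0123456789abcdefABCDEF")
        flat[ch] = 1;

    auto table = build(flat);
    writefln("%s index entries, %s distinct pages", table.index.length, table.pool.length);
    assert(table['A'] == 1 && table['G'] == 0 && table['\U0010FFFF'] == 0);
}
```

For a sparse property like this one, almost every page is all zeros, so the pool collapses to a couple of pages plus the index. A generator would write index and pool out as const arrays instead of building them at start-up.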
Jun 13 2004
parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:
 In article <cagrbo$1rlb$1 digitaldaemon.com>, Arcane Jill says...
 
 
I've got some ideas, but it's too early in the morning for me right now. Will
get back to you later when I've woken up a bit.
I'm a bit more awake now. The approach that I took when I had to do this sort of thing for my employer some years ago turned out to be wrong, in hindsight. But I learn from my mistakes. Back then, I had a tool to parse the Unicode database files (no problem there) into a custom format binary file, which then got turned into a byte array and stuck into the source code. It was a bad idea because my format was not extendable, and it included only those parts of the database which I actually needed. For this project, we need all of it. I have an approach now which will work, and I've got some tests up and running to prove that it works. It's a two-stage process. In stage one, a function I wrote parses the Unicode database files and produces some HUMUNGOUSLY large binary files, containing every scrap of information there is on Unicode, in an easily accessible form. Then a second phase function comes along and uses those large files as input, creating as its output - D source files. These end up quite small, because the function figures out the best way to pack the data. The source files created declare const lookup tables (using your 12-bit/9-bit split with duplicate tables removed). This approach leaves each Unicode property having its own independent source file(s). Since each source file will become a single .obj file, when you link with the library, you will only get the data for those properties you actually need. If you never call a function to get the bidi-combining-class, for example, then that function, and the data to support it, won't even get linked in. And the beauty of this approach is that it is completely extendable to all Unicode properties in all of their files. Oh, and there's another good thing too. Since the source code writing is automated, it follows that variations of lookup algorithms has just happen automagically. 
For example, the isASCIIHexDigit() property can be implemented very efficiently with only a tiny amount of data and a slightly modified lookup function. The source-code-writing tool could figure that out and use the smaller data lookup. I haven't started on any actual Unicode /algorithms/ yet - just getting the fast+small property lookups working was quite a challenge. So it's going well. If you want to email me privately to consult, you can head over to dsource and post me a message there. This is going to be fun!
It sounds like you're really excited about this one ;). Your ideas sound good as well, but some comments:
- the optimal page size is different for each Unicode property. For example, the new unichar module uses 128 elements per page for the character types and 512 elements per page for the mapping tables. It would be optimal if the data creation tool would automatically figure out the best size and include the corresponding constants in the source file. A simple brute-force try-them-all approach should suffice, since there really are only about half a dozen realistic page sizes (they need to be a power of 2).
- a simple RLE compression of the final data that is compiled into the executable has proven to be very effective, since Unicode data usually contains lots of big gaps and ranges with the same properties. This dramatically reduces the size of the compiled executable.
- if you have multiple properties of the same type it can be a huge space saver to use the same "page pool" when decomposing the data into small pages. This worked well in unichar, which now has a single page pool for the lower, upper and title mappings. The combined pool data is only slightly larger than the pool for a single mapping.
- I'm not convinced that the lookup tables and algorithms should be created in a completely automatic way. A certain level of automation is obviously necessary, but I think it would pay off if the data can be filtered before it is decomposed into the lookup tables. The algorithms for accessing those tables would have to be adaptable too. Another example from the unichar module: the case mapping data is now stored as offsets relative to the original character index. This has two advantages. Number one is that the biggest offset fits very comfortably into 2 bytes, so we save one byte per element. The second advantage is that this dramatically increases the number of pages with the same contents, so the page pool ends up being a lot smaller.
- and my last concern: it seems that you want to develop a very general tool to implement every aspect of the Unicode standard. That is very commendable and nothing is wrong with it in itself, but I would advise you to reflect on the amount of work that is necessary to implement all that stuff. I have no idea how much time you can put into this project, but I know that my own time is unfortunately very limited. If you are in a similar situation it may be wise to tune the goals down a bit and progress in smaller steps, implementing one Unicode algorithm at a time. There is not much use in a full Unicode library that ends up being vaporware. Then again if you DO have the time, please do not let my skepticism dampen your enthusiasm ;). Hauke
P.S.: I haven't found any Unicode-related project on dsource.org. What were you referring to when you said I can contact you there?
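To illustrate two of the points above together (relative offsets and RLE), here is a small sketch. It stores the uppercase mapping as an offset from the original character and then run-length encodes the result. Run and rleEncode are invented for this example; the real unichar data layout may well differ. The only library call used is std.uni.toUpper on a single dchar.

```d
import std.stdio;
import std.uni : toUpper;

struct Run
{
    uint count;    // how many consecutive entries share the value
    short value;   // the shared value (here: a case-mapping offset)
}

// Collapse consecutive equal values into (count, value) runs.
Run[] rleEncode(const(short)[] data)
{
    Run[] runs;
    foreach (v; data)
    {
        if (runs.length > 0 && runs[$ - 1].value == v)
            runs[$ - 1].count++;
        else
            runs ~= Run(1, v);
    }
    return runs;
}

void main()
{
    // Uppercase mapping for U+0041..U+007A, stored relative to the character.
    short[] offsets;
    for (dchar c = 'A'; c <= 'z'; c++)
        offsets ~= cast(short)(cast(int) toUpper(c) - cast(int) c);

    auto runs = rleEncode(offsets);
    writefln("%s offsets compress to %s runs", offsets.length, runs.length);
    assert(runs == [Run(32, 0), Run(26, -32)]);
}
```

Stored as absolute targets, the 26 lowercase letters would give 26 different values and nothing to compress. Stored as offsets they are all -32, which is why the same trick also makes whole pages identical.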
Jun 13 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <caigii$15jj$1 digitaldaemon.com>, Hauke Duden says...

It would be optimal if the data creation tool would automatically figure 
out the best size and include the corresponding constants in the source 
file.
That's what I figured.
- I'm not convinced that the lookup tables and algorithms should be 
created in a completely automatic way. A certain level of automation is 
obviously necessary, but I think it would pay off if the data can be 
filtered before it is decomposed into the lookup tables.
I'd thought of that.
The algorithms 
for accessing those tables would have to be adaptable too. Another 
example from the unichar module: the case mapping data is now stored as 
offsets relative to the original character index.
Did that too. And I reduced the titlecase mapping down to almost nothing by subtracting it from the uppercase mapping.
- and my last concern: it seems that you want to develop a very general 
tool to implement every aspect of the Unicode standard. That is very 
comendable and nothing is wrong with it in itself, but I would advice 
you to reflect on the amount of work that is necessary to implement all 
that stuff.
Panick ye not. I'm just thinking ahead. For the moment, it's really just property access I'm doing, then come the normalization algorithms. And then I'll stop, and go and redo Ints a bit better. I do have an idea of how much work is involved, but I've done this before (in C++, and less well) so I know what's involved.
There is not much use in a full Unicode library that ends up being 
vaporware.
Well it can't ever be that if it's open source. If you or I get bored with it and drop out, someone else can carry on.
Then again if you DO have the time, please do not let my skepticism 
dampen your enthusiasm ;).
Cool.
P.S.: I haven't found any Unicode-related project on dsource.org. What 
were you referring to when you said I can contact you there?
Ah, no, there isn't any. But I have a user account there, and the Deimos project. My username is "Arcane Jill". It seems to be possible to send private messages to members. I mentioned that as a possibility because I am reluctant to post my email address on a public forum. Jill
Jun 13 2004
parent Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:
- and my last concern: it seems that you want to develop a very general 
tool to implement every aspect of the Unicode standard. That is very 
comendable and nothing is wrong with it in itself, but I would advice 
you to reflect on the amount of work that is necessary to implement all 
that stuff.
Panick ye not. I'm just thinking ahead. For the moment, it's really just property access I'm doing, then come the normalization algorithms. And then I'll stop, and go and redo Ints a bit better. I do have an idea of how much work is involved, but I've done this before (in C++, and less well) so I know what's involved.
Me, panicking? No chance ;). Just wanted to make sure you know about the scope of this.
There is not much use in a full Unicode library that ends up being 
vaporware.
Well it can't ever be that if it's open source. If you or I get bored with it and drop out, someone else can carry on.
With luck someone might. But it is an old story in open source projects that there are lots of initial enthusiasts, but very few people who have enough endurance to stay active over a longer period of time. And there's also the question of experience and skill... I think it is prudent to not count on external help to magically show up. With some projects it does, with many it doesn't. So we should make sure that we don't bite off more than we can chew.
Then again if you DO have the time, please do not let my skepticism 
dampen your enthusiasm ;).
Cool.
P.S.: I haven't found any Unicode-related project on dsource.org. What 
were you referring to when you said I can contact you there?
Ah, no, there isn't any. But I have a user account there, and the Deimos project. My username is "Arcane Jill". It seems to be possible to send private messages to members. I mentioned that as a possiblity because I am reluctant to post my email address on a public forum.
Ah, ok. So the reply address you use with your NG posts (Arcane_member ...) is invalid? Anyway, mine is not, so if you need to contact me... Hauke
Jun 13 2004
prev sibling parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Stewart Gordon wrote:
 Hauke Duden wrote:
 
 Stewart Gordon wrote:
<snip>
 Yes, you do have a point there.  What's more, there isn't a 1:1 
 mapping between uppercase and lowercase characters.
You're wrong. The Unicode standard defines 1:1 case mappings (see http://www.unicode.org/Public/UNIDATA/UCD.html).
There seems to be a contradiction here. That file indicates that UnicodeData.txt only contains 1:1 mappings. But just as I wondered, there's a 2:1 mapping in 03C2 and 03C3.
Where did you get that information? From the data file http://www.unicode.org/Public/UNIDATA/UnicodeData.txt:
03C2;GREEK SMALL LETTER FINAL SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3
03C3;GREEK SMALL LETTER SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3
The interesting entries are the last three. Their format is UPPER;LOWER;TITLE. So both letters have an upper and title mapping to 03A3 and no lower mapping.
 There is also an additional "special casing" with one-to-many mappings
 but only a handful of characters are affected. It would be nice to 
 support that too, but for everyday work the 1:1 mappings are usually 
 sufficient.
So, which characters do the one-to-many mappings bring about?
An example of a character with special casing is 1FB2 (GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI). Its upper case maps to 1FBA + 0399 (GREEK CAPITAL LETTER ALPHA WITH VARIA + GREEK CAPITAL LETTER IOTA).
 And the mappings that there are aren't language independent.
Huh? Casing is not affected by locale. Maybe you are thinking about collation?
What do you mean by that?
Collation is a locale dependent comparison of strings. I.e. it defines the "phone book" ordering of strings in a particular language. Hauke
Jun 09 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Hauke Duden wrote:
<snip>
 The interesting entries are the last three. Their format is 
 UPPER;LOWER;TITLE. So both letters have an upper and title mapping to 
 03A3 and no lower mapping.
So that's why the uppercase form is given twice. I couldn't find a key to the columns anywhere. <snip>
 So, which characters do the one-to-many mappings bring about?
An example of a character with special casing is 1FB2 (GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI). Its upper case maps to 1FBA + 0399 (GREEK CAPITAL LETTER ALPHA WITH VARIA + GREEK CAPITAL LETTER IOTA).
Yes, there are two possible meanings of "one-to-many mapping". A letter splitting into two letters when the case is changed. Or a letter that case-converts to different letters depending on context, like Greek sigma. How does it handle the title case of Welsh digraphs, for example? Or is that another localised exception yet to be written into the standard?
 And the mappings that there are aren't language independent.
Huh? Casing is not effected by locale. Maybe you are thinking about collation?
What do you mean by that?
Collation is a locale dependent comparison of strings. I.e. it defines the "phone book" ordering of strings in a particular language.
I didn't think that had anything to do with the fact that, e.g. in Turkish, the uppercase form of 0069 is 0130 instead of 0049. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 10 2004
parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Stewart Gordon wrote:
 Hauke Duden wrote:
 <snip>
 
 The interesting entries are the last three. Their format is 
 UPPER;LOWER;TITLE. So both letters have an upper and title mapping to 
 03A3 and no lower mapping.
So that's why the uppercase form is given twice. I couldn't find a key to the columns anywhere.
The file format is described here: http://www.unicode.org/Public/UNIDATA/UCD.html#UCD_Files (see the section about UnicodeData.txt)
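As a concrete illustration of that layout, here is a small sketch that pulls the three simple case mappings out of one UnicodeData.txt line. It assumes the semicolon-separated format shown earlier in the thread, with the last three fields being UPPER;LOWER;TITLE (fields 12 to 14 counting from zero). CaseMappings and parseLine are names made up for this example, and treating an empty field as "maps to itself" is my own convention here.

```d
import std.array : split;
import std.conv : to;
import std.stdio;

struct CaseMappings
{
    dchar code;    // the character itself (field 0)
    dchar upper;   // simple uppercase mapping (field 12)
    dchar lower;   // simple lowercase mapping (field 13)
    dchar title;   // simple titlecase mapping (field 14)
}

// Parse one line of UnicodeData.txt. An empty mapping field is treated
// as "no mapping", i.e. the character maps to itself.
CaseMappings parseLine(string line)
{
    auto f = line.split(";");
    dchar code = cast(dchar) to!uint(f[0], 16);

    dchar field(size_t i)
    {
        return f[i].length ? cast(dchar) to!uint(f[i], 16) : code;
    }

    return CaseMappings(code, field(12), field(13), field(14));
}

void main()
{
    auto m = parseLine("03C2;GREEK SMALL LETTER FINAL SIGMA;Ll;0;L;;;;;N;;;03A3;;03A3");
    writefln("%04X: upper=%04X lower=%04X title=%04X",
             cast(uint) m.code, cast(uint) m.upper, cast(uint) m.lower, cast(uint) m.title);
    assert(m.upper == '\u03A3' && m.lower == '\u03C2' && m.title == '\u03A3');
}
```

Range entries and the one-to-many mappings in the separate special casing data are deliberately ignored; this only covers the 1:1 mappings under discussion.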
 How does it handle the title case of Welsh digraphs, for example?  Or is 
 that another localised exception yet to be written into the standard?
If you know the index of that character you can look it up in this file: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt Hauke
Jun 10 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Hauke Duden wrote:

<snip>
 The file format is described here:
 http://www.unicode.org/Public/UNIDATA/UCD.html#UCD_Files
 (see the section about UnicodeData.txt)
The first column has been omitted from that list.
 How does it handle the title case of Welsh digraphs, for example?  Or 
 is that another localised exception yet to be written into the standard?
If you know the index of that character you can look it up in this file: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
No, a Welsh digraph isn't a character. The digraphs are various single letters, each composed of two characters. CH, DD, FF, LL, NG, RH, TH (have I missed one?) are all single letters in Welsh, but AFAICF none of them has its own Unicode character, leaving the regular Latin letters in the ranges 0043..0054 and 0063..0074 to be combined to make them. FWIS these digraphs are title-cased together, e.g. properly
LLanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch
not
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch
OST, looking at the Welsh Wikipedia, they generally seem to be written in mixed case. Who is right? Or is it a matter of preference? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 10 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca9gkh$2f2$1 digitaldaemon.com>, Stewart Gordon says...
 How does it handle the title case of Welsh digraphs, for example?  Or 
 is that another localised exception yet to be written into the standard?
I know nothing about Welsh, but I do know that Welsh is NOT an exception according to the rules of Unicode. Therefore, according to the rules:
Lowercase: llan...
Titlecase: Llan...
Uppercase: LLAN...
Now, what I'm about to say may possibly make me a little unpopular. I /hope/ not, but I wish to be accurate, and, well, what can I say? Please don't shoot the messenger! The fact is, /if/ Unicode has got it wrong, then the place to complain about it is the Unicode Consortium public forum at http://www.unicode.org/consortium/distlist.html - NOT the D forum. Our job is to implement the Unicode standard as it exists today at revision 4.1 - even if we think that standard is wrong. It would be inappropriate for us to start tweaking it here and there just because we don't like bits of it. Errors and omissions in the standard are certainly possible (and even likely), but a standard is a standard, and such errors will inevitably be fixed in the course of time. If and when the standard changes, that's when we should change with it. I mean - it's not like C++ or Java can do any better! My sincere apologies in advance if I've offended anyone. Arcane Jill
Jun 10 2004
parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca9if2$5ag$1 digitaldaemon.com...
 Now, what I'm about to say may possibly make me a little unpopular. I /hope/
 not, but I wish to be accurate, and, well, what can I say? Please don't shoot
 the messenger! The fact is, /if/ Unicode has got it wrong, then the place to
 complain about it is the Unicode Consortium public forum at
 http://www.unicode.org/consortium/distlist.html - NOT the D forum. Our job is
 to implement the Unicode standard as it exists today at revision 4.1 - even if
 we think that standard is wrong. It would be inappropriate for us to start
 tweaking it here and there just because we don't like bits of it. Errors and
 omissions in the standard are certainly possible (and even likely), but a
 standard is a standard, and such errors will inevitably be fixed in the course
 of time. If and when the standard changes, that's when we should change with
 it.
My experience with implementing Standards is that the right way is to shut off one's brain and pedantically, exactly, implement it, right or wrong. All trying to fix bugs in the Standards does is cause "your implementation is different from the Standard, therefore you are wrong" bug reports. And to be frank, they're right. I agree with you, Jill.
Jun 10 2004
prev sibling parent David L. Davis <SpottedTiger yahoo.com> writes:
In article <ca71c4$2b8l$1 digitaldaemon.com>, Arcane Jill says...
In article <ca54is$2h2r$1 digitaldaemon.com>, David L. Davis says...

 sStr[ iStrPos ] + 0x20
Ah! Now these old ASCII habits really should be dropped. Hauke has written this magnificent charToUpper() routine. It should be used.
 I feel like a young Skywalker in training, learning how to best use "The
Force!"
Other than that: Impressive - Obi-Wan has taught you well. (Hope I'm not too discouraging). :) Jill
Jill: Don't sweat it, all your advice has been encouraging! :) If I wasn't getting any feedback at all from anyone, now that would be "discouraging" in my mind...again thxs for your advice. After all, if these functions meet Walter and the "D" forum's approval, they just might become a part of the std.string for everyone to use. After work, I'll check out Hauke's charToLower() function, and see what kind of requirements it has. And if it looks like a good fix, I'll ask Hauke if I may use it...giving him full credit for his work of course. :) David ------------------------------------------------------------------- "Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Jun 09 2004
prev sibling next sibling parent reply Charlie <Charlie_member pathlink.com> writes:
In article <ca2u66$1lui$1 digitaldaemon.com>, David L. Davis says...
In article <ca2nau$1ath$1 digitaldaemon.com>, Walter says...
It's now possible to do assymmetrical operator overloads with commutative
operators like +.

And it's now possible to create a << stream operator overloading in D. Not
that I endorse such a use of operator overloading for non-arithmetic
purposes, but it's now possible (without doing free operator functions or
needing ADL, either!).

http://www.digitalmars.com/d/changelog.html
Walter: Thxs! For the "Added default arguments to function parameters." :)) Now I can pull out all my wrapper functions...this is some really Great News!! <*Wonders*> Do you think Phobos.std.string could get a non-case sensitive version of find (ifind) and rfind (irfind) added to it sometime in the near future? It would be very useful (even if it just does ASCII). Thxs for your reply in advance. :)
Why not tolower() both of the strings ?
Jun 07 2004
next sibling parent How to compare case in Unicode <How_member pathlink.com> writes:
In article <ca3asj$2a3o$1 digitaldaemon.com>, Charlie says...

Why not tolower() both of the strings ?
That won't cover all cases in Unicode, but there is a similar function designed for just that purpose. It's called casefold(). In general, two normalized strings a and b are considered case-insensitively-equal iff casefold(a) == casefold(b). I just had a quick look through Hauke's unichar code and noticed, however, that charToCasefold() is not present? Hauke - any reason why you missed that one out? Did you assume it to be the same thing as charToLower()? It isn't, of course. Jill
Jun 07 2004
prev sibling parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca3asj$2a3o$1 digitaldaemon.com>, Charlie says...

Why not tolower() both of the strings ?
That won't cover all cases in Unicode, but there is a similar function designed for just that purpose. It's called casefold(). In general, two normalized strings a and b are considered case-insensitively-equal iff casefold(a) == casefold(b). I just had a quick look through Hauke's unichar code and noticed, however, that charToCasefold() is not present? Hauke - any reason why you missed that one out? Did you assume it to be the same thing as charToLower()? It isn't, of course. Jill
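A small illustration of why lowercasing is not the same as case folding, using the Greek sigma from the UnicodeData.txt discussion elsewhere in this thread. This is only a sketch: foldChar is a hypothetical stand-in with a single hand-written entry (final sigma U+03C2 folds to U+03C3), where a real implementation would be driven by the Unicode case folding data. std.uni.toLower on a single dchar is the only library mapping used.

```d
import std.algorithm : equal, map;
import std.stdio;
import std.uni : toLower;

// Hypothetical one-entry case fold: final sigma folds to ordinary sigma,
// everything else falls back to the simple lowercase mapping.
dchar foldChar(dchar c)
{
    if (c == '\u03C2')
        return '\u03C3';
    return toLower(c);
}

// Compare two strings after lowercasing each character.
bool equalByLower(string a, string b)
{
    return equal(a.map!(c => toLower(c)), b.map!(c => toLower(c)));
}

// Compare two strings after folding each character.
bool equalByFold(string a, string b)
{
    return equal(a.map!(c => foldChar(c)), b.map!(c => foldChar(c)));
}

void main()
{
    string upper = "\u039F\u0394\u039F\u03A3";   // capital omicron, delta, omicron, sigma
    string lower = "\u03BF\u03B4\u03BF\u03C2";   // same word in lowercase, ending in final sigma

    writeln("equal by toLower:  ", equalByLower(upper, lower));
    writeln("equal by foldChar: ", equalByFold(upper, lower));
    assert(!equalByLower(upper, lower));
    assert(equalByFold(upper, lower));
}
```

Lowercasing the capital sigma gives U+03C3, which never equals the final sigma U+03C2 already in the lowercase word, so the toLower comparison fails where the folded one succeeds. Normalization is a separate step and is not shown here.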
Jun 07 2004
prev sibling parent Arcane Jill <Arcane_member pathlink.com> writes:
A truly brilliant release. WOW! Thanks Walter.


I just need to reply to this:

<*Wonders*> Do you think Phobos.std.string could get a non-case sensitive
version of find (ifind) and rfind (irfind) added to it sometime in the near
future? It would be very useful (even if it just does ASCII). Thxs for your
reply in advance. :)
Now that we have Hauke's Unichar stuff, we can do better than ASCII, we can do the whole of Unicode. However - as a TEMPORARY MEASURE - we should do case-comparison on a character-by-character basis, just like you do for ASCII. This will get it right for something like 99% of all cases. (To catch the remaining cases we'd need the Unicode normalization and case folding algorithms, which no-one's implemented yet, but which we will have one day). So long as we document that D's case-comparison rules CURRENTLY use simple casing instead of special casing, and do not YET handle normalization issues, no-one is going to complain, and - as you say - the ability to do case-insensitive stuff is very useful. Jill
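Something along these lines is what I mean by character-by-character comparison. It is a sketch only: this ifind is a made-up signature, not an existing std.string function, it uses std.uni.toLower as a stand-in for Hauke's charToLower(), and it returns the match position counted in code points rather than bytes.

```d
import std.conv : to;
import std.stdio;
import std.uni : toLower;

// Case-insensitive search using the simple per-character lowercase mapping.
// Returns the index (in code points) of the first match, or -1 if none.
ptrdiff_t ifind(string haystack, string needle)
{
    dstring h = haystack.to!dstring;
    dstring n = needle.to!dstring;
    if (n.length > h.length)
        return -1;

    foreach (i; 0 .. h.length - n.length + 1)
    {
        bool match = true;
        foreach (j; 0 .. n.length)
        {
            if (toLower(h[i + j]) != toLower(n[j]))
            {
                match = false;
                break;
            }
        }
        if (match)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    writeln(ifind("Hello World", "WORLD"));
    assert(ifind("Hello World", "WORLD") == 6);
    assert(ifind("\u00C4rger", "\u00E4R") == 0);   // A-umlaut matches a-umlaut
    assert(ifind("abc", "x") == -1);
}
```

As documented above, this uses simple casing only: no special casing, no case folding and no normalization, so it will miss the remaining edge cases until those algorithms exist.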
Jun 07 2004
prev sibling next sibling parent reply "Jeroen van Bemmel" <someone somewhere.com> writes:
Binary operator overloading: for consistency, I would suggest replacing 
opCmp() with opLt(), opLe(), etc and to split opEquals() into opEquals() and 
opNotEquals() (or is the latter a typo in the documentation?) 
Jun 07 2004
parent Ant <duitoolkit yahoo.ca> writes:
On Tue, 08 Jun 2004 08:01:35 +0200, Jeroen van Bemmel wrote:

 Binary operator overloading: for consistency, I would suggest replacing 
 opCmp() with opLt(), opLe(), etc and to split opEquals() into opEquals() and 
 opNotEquals() (or is the latter a typo in the documentation?)
I think this was discussed before. I think the rationale is in the web documentation.

Ant
Jun 07 2004
prev sibling next sibling parent reply "Ivan Senji" <ivan.senji public.srce.hr> writes:
"Walter" <newshound digitalmars.com> wrote in message news:ca2nau$1ath$1 digitaldaemon.com...
 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
 
 http://www.digitalmars.com/d/changelog.html
WOW! Walter, you are really making it harder and harder to complain about the language!!! :)

Just a little question:

  "The Expression within an array's brackets is now an AssignExpression (meaning that commas are no longer allowed)."

Could this mean that rectangular arrays are coming at some point in the future? :)
Jun 08 2004
next sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Ivan Senji wrote:

<snip>
 Just a little question:

 The Expression within an array's brackets is now an AssignExpression
 (meaning that commas are no longer allowed).

 Could this mean that rectangular arrays are coming in some future? :)
Maybe. It could also clear up what's FWIS a common coding error. Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 08 2004
prev sibling parent "Walter" <newshound digitalmars.com> writes:
I want to at least make rectangular arrays possible with opIndex().

"Ivan Senji" <ivan.senji public.srce.hr> wrote in message news:ca3ps0$3am$3 digitaldaemon.com...
  WOW! Walter you are really making it harder and harder to complain about
  the language!!! :)

  Just a little question:
  The Expression within an array's brackets is now an AssignExpression
  (meaning that commas are no longer allowed).

  Could this mean that rectangular arrays are coming in some future? :)
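To illustrate what "rectangular arrays with opIndex()" could look like from the user's side, here is a minimal sketch in present-day D (not the D of this release). The Matrix type, its fields, and the row-major layout are all hypothetical; the point is only that, once the bracket contents are no longer a comma expression, m[r, c] can be passed to a user-defined opIndex as two separate arguments.

```d
// Hypothetical rectangular array built on a flat, row-major buffer.
struct Matrix
{
    size_t rows, cols;
    double[] cells;

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        cells = new double[rows * cols];
        cells[] = 0.0; // new double[] starts out as NaN, so clear it
    }

    // m[r, c] as an rvalue
    double opIndex(size_t r, size_t c) const
    {
        assert(r < rows && c < cols);
        return cells[r * cols + c];
    }

    // m[r, c] = value
    void opIndexAssign(double value, size_t r, size_t c)
    {
        assert(r < rows && c < cols);
        cells[r * cols + c] = value;
    }
}

unittest
{
    auto m = Matrix(2, 3);
    m[1, 2] = 4.5;
    assert(m[1, 2] == 4.5);
    assert(m[0, 0] == 0.0);
}
```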
Jun 08 2004
prev sibling next sibling parent Hauke Duden <H.NS.Duden gmx.net> writes:
Walter wrote:
 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
 
 http://www.digitalmars.com/d/changelog.html
Default function arguments! Yay! Thanks Walter! I can now delete about 1/3 of my functions :) :) Hauke
Jun 08 2004
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter wrote:

 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
Kris's dsc.io project did that all along, so what's new?

And any particular reason for not inventing a whole new operator or two for stream I/O, as I briefly suggested?

http://www.digitalmars.com/drn-bin/wwwnews?D/25096

And have you actually seen my functions to fix bit array slicing?

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 08 2004
next sibling parent Ant <Ant_member pathlink.com> writes:
In article <ca459b$pbu$1 digitaldaemon.com>, Stewart Gordon says...
Walter wrote:

 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.
 
 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
Kris's dsc.io project did that all along, so what's new? And any particular reason for not inventing a whole new operator or two for stream I/O, as I briefly suggested? http://www.digitalmars.com/drn-bin/wwwnews?D/25096 And have you actually seen my functions to fix bit array slicing? http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313
Give him a break; he said before that he can't cope with everything that is offered here.

Ant
Jun 08 2004
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:ca459b$pbu$1 digitaldaemon.com...
 Walter wrote:

 It's now possible to do asymmetrical operator overloads with commutative
 operators like +.

 And it's now possible to create a << stream operator overloading in D. Not
 that I endorse such a use of operator overloading for non-arithmetic
 purposes, but it's now possible (without doing free operator functions or
 needing ADL, either!).
Kris's dsc.io project did that all along, so what's new? And any particular reason for not inventing a whole new operator or two for stream I/O, as I briefly suggested? http://www.digitalmars.com/drn-bin/wwwnews?D/25096
I still think there's got to be a better way.
 And have you actually seen my functions to fix bit array slicing?

 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313
It fixes it by making a copy. I'm not sure this is the right approach, since all the other array slicing points to the original.
Jun 08 2004
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter wrote:

<snip>
 And have you actually seen my functions to fix bit array slicing?

 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/313
It fixes it by making a copy. I'm not sure this is the right approach, since all the other array slicing points to the original.
Well, just after I posted it, there was a bit of a debate over this and the alternative: modifying the representation of a bit[] to include a bit offset.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/2524

But you can't please all of the people (or programs) all of the time. Copying would break generic programming as far as slices would no longer be necessarily into the original array. (Obviously, this discrepancy would need to be clearly documented.) But it wouldn't break any existing, working programs, since bit slicing doesn't work at this time. (Assuming that the GDC crowd haven't created their own fix....)

OTOH, code that casts bit arrays to pointers would fall apart if we introduced bit offsets. Maybe we'd need to either disallow such casts or allow them only along with some 'byteAlign' property that returns either the original array (if already byte-aligned) or a copy.

Maybe we could start a vote. Put me down as a 'don't know'....

Even if we do take the bit offset path, it wouldn't be tricky to adapt my functions to this representation. Of course, we'd need access to the internals. Maybe (as I think I've seen in some of the internal modules dealing with general arrays) they'd be modified to take a struct representing the internal representation of a bit[], rather than the bit[] itself.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
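For readers weighing the two alternatives, the "bit offset" representation can be sketched as follows. This is written in present-day D (where the bit type no longer exists) and is purely illustrative: the BitSlice name, its three fields, and the slice() method are assumptions, not the compiler's internal layout nor the functions linked above. What it shows is that a slice carrying a starting bit offset can refer to the original storage at any bit position without copying, at the cost of a shift and mask on every access.

```d
// Hypothetical by-reference bit slice: shared bytes plus a starting bit offset.
struct BitSlice
{
    ubyte[] data;   // shared storage, never copied by slicing
    size_t offset;  // index of the first bit of this slice within data
    size_t length;  // number of bits in this slice

    // Read bit i of the slice.
    bool opIndex(size_t i) const
    {
        assert(i < length);
        immutable pos = offset + i;
        return ((data[pos >> 3] >> (pos & 7)) & 1) != 0;
    }

    // Write bit i of the slice.
    void opIndexAssign(bool value, size_t i)
    {
        assert(i < length);
        immutable pos = offset + i;
        immutable ubyte mask = cast(ubyte)(1 << (pos & 7));
        if (value)
            data[pos >> 3] |= mask;
        else
            data[pos >> 3] &= cast(ubyte) ~mask;
    }

    // Sub-slice [i, j): only the offset and length change.
    BitSlice slice(size_t i, size_t j)
    {
        assert(i <= j && j <= length);
        return BitSlice(data, offset + i, j - i);
    }
}

unittest
{
    auto storage = new ubyte[4];
    auto whole = BitSlice(storage, 0, 32);
    auto part = whole.slice(3, 11);   // not byte-aligned
    part[0] = true;                   // writes through to the original
    assert(whole[3]);
    assert(storage[0] == 0b0000_1000);
}
```

The extra arithmetic in opIndex and opIndexAssign is the kind of per-access overhead the performance concern in this thread refers to.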
Jun 08 2004
parent reply "Walter" <newshound digitalmars.com> writes:
My thought as well was to include a starting bit offset. The downside to
this is the performance loss.

"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:ca4sd9$22j1$1 digitaldaemon.com...
 Well, just after I posted it, there was a bit of a debate over this and
 the alternative: modifying the representation of a bit[] to include a
 bit offset.

 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/2524

 But you can't please all of the people (or programs) all of the time.
 Copying would break generic programming as far as slices would no longer
 be necessarily into the original array.  (Obviously, this discrepancy
 would need to be clearly documented.)  But it wouldn't break any
 existing, working programs, since bit slicing doesn't work at this time.
   (Assuming that the GDC crowd haven't created their own fix....)

 OTOH, code that casts bit arrays to pointers would fall apart if we
 introduced bit offsets.  Maybe we'd need to either disallow such casts
 or allow them only along with some 'byteAlign' property that returns
 either the original array (if already byte-aligned) or a copy.

 Maybe we could start a vote.  Put me down as a 'don't know'....

 Even if we do take the bit offset path, it wouldn't be tricky to adapt
 my functions to this representation.  Of course, we'd need access to the
 internals.  Maybe (as I think I've seen in some of the internal modules
 dealing with general arrays) they'd be modified to take a struct
 representing the internal representation of a bit[], rather than the
 bit[] itself.

 Stewart.

 -- 
 My e-mail is valid but not my primary mailbox, aside from its being the
 unfortunate victim of intensive mail-bombing at the moment.  Please keep
 replies on the 'group where everyone may benefit.
Jun 08 2004
parent reply "Ivan Senji" <ivan.senji public.srce.hr> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:ca54d7$2gmg$1 digitaldaemon.com...
 My thought as well was to include a starting bit offset. The downside to
 this is the performance loss.
But if bit slicing would work then, then it isn't a loss but a gain!
 "Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
 news:ca4sd9$22j1$1 digitaldaemon.com...
 Well, just after I posted it, there was a bit of a debate over this and
 the alternative: modifying the representation of a bit[] to include a
 bit offset.

 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/2524

 But you can't please all of the people (or programs) all of the time.
 Copying would break generic programming as far as slices would no longer
 be necessarily into the original array.  (Obviously, this discrepancy
 would need to be clearly documented.)  But it wouldn't break any
 existing, working programs, since bit slicing doesn't work at this time.
   (Assuming that the GDC crowd haven't created their own fix....)

 OTOH, code that casts bit arrays to pointers would fall apart if we
 introduced bit offsets.  Maybe we'd need to either disallow such casts
 or allow them only along with some 'byteAlign' property that returns
 either the original array (if already byte-aligned) or a copy.

 Maybe we could start a vote.  Put me down as a 'don't know'....

 Even if we do take the bit offset path, it wouldn't be tricky to adapt
 my functions to this representation.  Of course, we'd need access to the
 internals.  Maybe (as I think I've seen in some of the internal modules
 dealing with general arrays) they'd be modified to take a struct
 representing the internal representation of a bit[], rather than the
 bit[] itself.

 Stewart.

 --
 My e-mail is valid but not my primary mailbox, aside from its being the
 unfortunate victim of intensive mail-bombing at the moment.  Please keep
 replies on the 'group where everyone may benefit.
Jun 08 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca55jc$2ika$1 digitaldaemon.com>, Ivan Senji says...
"Walter" <newshound digitalmars.com> wrote in message
news:ca54d7$2gmg$1 digitaldaemon.com...
 My thought as well was to include a starting bit offset. The downside to
 this is the performance loss.
But if bit slicing would work then, then it isn't a loss but a gain!
I don't think Walter will have any problem getting bit slicing to work. Both Stewart and myself have our own workarounds (his by copy, mine by reference), so it's obviously easily doable. The bit SLICE (or bit array, depending on your point of view) was never a problem (apart from the bugs). The bit /itself/ is the problem. Walter's suggestion will make bit slicing work, but the code below will still fall over:
       bit[] b;
       b.length = 64;
       bit* p = &b[3];
       *p = 1;
Anywhere where you get a pointer to a bit, or a reference to a bit (and this includes passing a bit as an out or inout function parameter) you get a problem. However, if Walter could make all of these situations compile-errors, he may have got it sussed! Jill
Jun 08 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:

<snip>
 The bit SLICE (or bit array, depending on your point of view) was never a
 problem (apart from the bugs). The bit /itself/ is the problem. Walter's
 suggestion will make bit slicing work, but the code below will still fall over:
 
 
      bit[] b;
      b.length = 64;
      bit* p = &b[3];
      *p = 1;
Anywhere where you get a pointer to a bit, or a reference to a bit (and this includes passing a bit as an out or inout function parameter) you get a problem.
I thought that inout bits were already not supported. Unless that's only in foreach....
 However, if Walter could make all of these situations compile-errors, he may
 have got it sussed!
Or have a bit offset in the bit pointer itself. Which would turn it into a 35-bit object.... Of course, it could be 64-bit, in the form (byteAddress << 32) | (bitOffset << 29), which would make incrementing it a doddle....

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
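The 64-bit form mentioned above can be worked through in a few lines. The sketch below is in present-day D and assumes 32-bit byte addresses, as the formula does; the BitPtr name and its helper functions are hypothetical. Because the 3-bit offset sits in bits 29..31, directly below the address, adding 1 << 29 advances by one bit and the overflow from offset 7 carries straight into the byte address.

```d
// Hypothetical 64-bit bit pointer: (byteAddress << 32) | (bitOffset << 29).
struct BitPtr
{
    ulong raw;

    static BitPtr make(uint address, uint offset)
    {
        assert(offset < 8);
        return BitPtr((cast(ulong) address << 32) | (cast(ulong) offset << 29));
    }

    uint byteAddress() const { return cast(uint)(raw >> 32); }
    uint bitOffset() const   { return cast(uint)((raw >> 29) & 7); }

    // Advance by one bit; offset 7 + 1 carries into the byte address.
    void increment() { raw += 1UL << 29; }
}

unittest
{
    auto p = BitPtr.make(10, 6);
    p.increment();
    assert(p.byteAddress == 10 && p.bitOffset == 7);
    p.increment();   // carry out of the 3-bit offset field
    assert(p.byteAddress == 11 && p.bitOffset == 0);
}
```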
Jun 09 2004
parent reply "Walter" <newshound digitalmars.com> writes:
Another option is to only allow bit slicing on byte boundaries, and only
allow pointers to bits if they are in bit 0 of a byte.
Jun 09 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca7qp5$hrl$2 digitaldaemon.com>, Walter says...
Another option is to only allow bit slicing on byte boundaries, and only
allow pointers to bits if they are in bit 0 of a byte.
That's EXACTLY what my workaround does. You can have the code for free if you want. Jill
Jun 09 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca7ri6$iuo$1 digitaldaemon.com...
 In article <ca7qp5$hrl$2 digitaldaemon.com>, Walter says...
Another option is to only allow bit slicing on byte boundaries, and only
allow pointers to bits if they are in bit 0 of a byte.
 That's EXACTLY what my workaround does. You can have the code for free if
 you want.
Thanks!
Jun 10 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cabhl0$7f2$2 digitaldaemon.com>, Walter says...
 That's EXACTLY what my workaround does. You can have the code for free if
 you want.
Thanks!
Oky doke - here goes. One thing though - this is merely a workaround for existing bugs, it does not really add any new functionality beyond what such arrays are supposed to do already. So I don't imagine you will use this code. You'd probably prefer to just fix the bugs, then a workaround won't be needed at all.
module etc.workaround.bitslice;

/*    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    This is a workaround for the bug whereby:

        // given a non-null bit[] b;
        b[i..j]

    references the wrong data, often causing an access violation.

    Usage:  Replace             With
            --------------------------------------------------
            b[i..j]             bitSlice(b, i, j)
            b[] = expr          bitSliceAssign(b, expr)
            b[i..j] = expr;     bitSliceAssign(b, i, j, expr);
            b ~ c               bitSliceCat(b, c);
            b ~= c              bitSliceCatAssign(b, c);
*/

//===============================================
// This version is for reading from a bit slice
//
unittest
{
    bit b[256];
    b[0] = b[16] = 1;
    confirm(bitSlice(b,0,16) == bitSlice(b,16,32));
}

bit[] bitSlice(bit[] b, uint i, uint j)
{
    version(BitSliceWorkaround)
    {
        if (((i | j) & 7) != 0) throw new BitSliceException("Can only slice by whole bytes");
        BitSliceUnion u;

        // Convert from a bit slice to a ubyte slice
        u.bitRef = b;
        assert((u.length & 7) == 0);
        u.length >>= 3;

        // Take the desired slice
        u.ubyteRef = u.ubyteRef[i>>3..j>>3];

        // Convert it back to a bit slice
        u.length <<= 3;
        return u.bitRef;
    }
    else
    {
        return b[i..j];    // Assumes bit slicing works. This will be unit tested.
    }
}

//===================================================================
// This version is for writing to a bit slice with a constant value
//

unittest
{
    bit b[256];
    bitSliceAssign(b,16,32,1);
    confirm(b[16] == 1);
}

bit[] bitSliceAssign(bit[] b, bit e)
{
    return bitSliceAssign(b, 0, b.length, e);
}

bit[] bitSliceAssign(bit[] b, uint i, uint j, bit e)
{
    version(BitSliceWorkaround)
    {
        if (((i | j) & 7) != 0) throw new BitSliceException("Can only slice by whole bytes");
        BitSliceUnion u;

        // Convert from a bit slice to a ubyte slice
        u.bitRef = b;
        assert((u.length & 7) == 0);
        u.length >>= 3;

        // Write into the desired slice
        u.ubyteRef[i>>3..j>>3] = (e ? 0xFF : 0);

        // Convert back to a bit slice
        u.length <<= 3;
        return u.bitRef;
    }
    else
    {
        return b[i..j] = e;    // Assumes bit slicing works. This will be unit tested.
    }
}

//=========================================================
// This version is for pasting one bit slice into another
//

unittest
{
    bit b[256];
    bit e[16];
    e[0] = 1;
    bitSliceAssign(b, 16, 32, bitSlice(e, 0, 16));
    confirm(b[16] == 1);
}

bit[] bitSliceAssign(bit[] b, bit[] e)
{
    return bitSliceAssign(b, 0, b.length, e);
}

bit[] bitSliceAssign(bit[] b, uint i, uint j, bit[] e)
in
{
    assert(j - i == e.length);
}
body
{
    version(BitSliceWorkaround)
    {
        if (((i | j) & 7) != 0) throw new BitSliceException("Can only slice by whole bytes");
        BitSliceUnion ub, ue;

        // Convert from bit slices to ubyte slices
        ub.bitRef = b;
        assert((ub.length & 7) == 0);
        ub.length >>= 3;

        ue.bitRef = e;
        assert((ue.length & 7) == 0);
        ue.length >>= 3;

        // Write the desired slice
        ub.ubyteRef[i>>3..j>>3] = ue.ubyteRef[0..(j-i)>>3];

        // Convert everything back to bit slices
        ub.length <<= 3;
        ue.length <<= 3;
        return ub.bitRef;
    }
    else
    {
        return b[i..j] = e[0..j-i];    // Assumes bit slicing works. This will be unit tested.
    }
}

//=============================================================
// This version is for concatenating one bit slice onto another
//

bit[] bitSliceCat(bit[] b, bit[] c)
{
    bit[] r;
    r.length = b.length + c.length;
    bitSliceAssign(r, 0, b.length, b);
    bitSliceAssign(r, b.length, r.length, c);
    return r;
}

//=============================================================
// This version is for concatenating one bit slice onto another and assigning
// the result back onto the original
//

bit[] bitSliceCatAssign(inout bit[] b, bit[] c)
{
    uint bLen = b.length;
    b.length = bLen + c.length;
    bitSliceAssign(b, bLen, b.length, c);
    return b;
}

// Supporting stuff

private union BitSliceUnion
{
    bit[]    bitRef;
    ubyte[] ubyteRef;
    uint    length;
}

class BitSliceException : Exception
{
    this(char[] s)
    {
        super(s);
    }
}

void confirm(int assertion)
{
    debug
    {
        if (!assertion)
        {
            printf("This version of the D compiler contains a bug which prevents\n");
            printf("slicing of bit arrays from working properly\n\n");
            printf("You need to define the symbol BitSliceWorkaround, and replace\n");
            printf("all bit array slicing operations with the appropriate function\n");
            printf("from etc.workaround.bitslice\n");
        }
    }
    assert(assertion);
}
Jun 11 2004
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <cad0gf$29ad$1 digitaldaemon.com>, Arcane Jill says...
In article <cabhl0$7f2$2 digitaldaemon.com>, Walter says...
 That's EXACTLY what my workaround does. You can have the code for free if
 you want.
Thanks!
Oky doke - here goes. One thing though - this is merely a workaround for existing bugs, it does not really add any new functionality beyond what such arrays are supposed to do already. So I don't imagine you will use this code. You'd probably prefer to just fix the bugs, then a workaround won't be needed at all.
Very nice. This brings up a question though... assuming I want to know how many bits are in a byte, should I use the standard C defines or will D provide its own method? I know it will probably be quite a long time before D is ported to a system that doesn't use 8-bit bytes, but I'm the careful type :) Sean
Jun 11 2004
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:

<snip>
 Oky doke - here goes. One thing though - this is merely a workaround for
 existing bugs, it does not really add any new functionality beyond what such
 arrays are supposed to do already. So I don't imagine you will use this code.
 You'd probably prefer to just fix the bugs, then a workaround won't be needed
 at all.
<snip> Just as I've been writing the bit offset implementation that we've been talking about.... http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/495 Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 14 2004
parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <cak50j$n70$1 digitaldaemon.com>, Stewart Gordon says...
Just as I've been writing the bit offset implementation that we've been 
talking about....
Excellent!
Jun 14 2004
prev sibling next sibling parent J Anderson <REMOVEanderson badmama.com.au> writes:
Walter wrote:

Another option is to only allow bit slicing on byte boundaries, and only
allow pointers to bits if they are in bit 0 of a byte.
  
Yeah, I've been thinking this is probably the best option. Users could make their own bit-pointers for the extra 3 bits. Perhaps you could enable something like this (for the boundary approach):

    bit[] array;
    ...
    bit* bp = &array[0];
    bp[1] = 1;   // Access bit 1 via the pointer bp
    bp[1000]     // Try to access bit 1000

A third option I was thinking about: always use a 32-bit value to offset the bit pointer from the start of the bit array (thus requiring 64 bits). I'm sure that would enable some parts of the algorithm to be optimised, such as iterating through a loop and slicing, where only one value out of the two would need to be incremented. When converting to another pointer type (such as void), compute the byte boundary and lose the bit location information (that could also be done with Stewart's suggestion). Now if the user wrote something like:

    byte* bp = cast(byte*) &array[0];

then the compiler could optimise out the extra 32 bits.

Rationale: We don't have any bit pointer at the moment, so we have no performance to lose. This way a bit pointer is more functional and you can still slice along the boundary.

-- 
-Anderson: http://badmama.com.au/~anderson/
Jun 09 2004
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter wrote:

 Another option is to only allow bit slicing on byte boundaries, and only 
 allow pointers to bits if they are in bit 0 of a byte.
Yes, that was another suggestion in the debate. But I'm inclined to believe some of my experiments could be put to practical use. I'll probably do some more experimenting over the weekend.... Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 10 2004
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <ca4pg0$1tfs$1 digitaldaemon.com>, Walter says...
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:ca459b$pbu$1 digitaldaemon.com...

 And any particular reason for not inventing a whole new operator or two
 for stream I/O, as I briefly suggested?

 http://www.digitalmars.com/drn-bin/wwwnews?D/25096
I still think there's got to be a better way.
Me too, but darned if I've come up with the answer so far. But if push comes to shove I do prefer:

    ostream << a << b << c;

to:

    print( ostream, a );
    print( ostream, b );
    print( ostream, c );

I'm going to play around with the possibilities in the next few days and see if I can't come up with an alternative :p

Sean
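For reference, the chained << form needs nothing more than a member operator that returns the stream. The sketch below is in present-day D, where the overload is spelled opBinary with op == "<<" (in the D of this release the member would be opShl). The Sink class is hypothetical and simply collects text in memory; it is not dsc.io or any Phobos stream.

```d
import std.conv : to;

// Hypothetical in-memory stream; real code would write to a file or console.
class Sink
{
    string buffer;

    // Member operator: sink << value appends the text form of value and
    // returns the sink itself, which is what makes chaining work.
    Sink opBinary(string op, T)(T value) if (op == "<<")
    {
        buffer ~= to!string(value);
        return this;
    }
}

unittest
{
    auto sink = new Sink;
    int a = 1;
    double b = 2.5;
    string c = "three";

    auto same = sink << a << ' ' << b << ' ' << c;
    assert(same is sink);
    assert(sink.buffer == "1 2.5 three");
}
```

Because the operator is a template member of the left-hand operand, no free operator functions or argument-dependent lookup are involved.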
Jun 08 2004
parent "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
I'll be watching for your alternative, Sean ...

"Sean Kelly" <sean f4.ca> wrote in message
news:ca4sn3$231k$1 digitaldaemon.com...
 In article <ca4pg0$1tfs$1 digitaldaemon.com>, Walter says...
"Stewart Gordon" <smjg_1998 yahoo.com> wrote in message
news:ca459b$pbu$1 digitaldaemon.com...

 And any particular reason for not inventing a whole new operator or two
 for stream I/O, as I briefly suggested?

 http://www.digitalmars.com/drn-bin/wwwnews?D/25096
I still think there's got to be a better way.
 Me too, but darned if I've come up with the answer so far. But if push
 comes to shove I do prefer:

 ostream << a << b << c;

 to:

 print( ostream, a );
 print( ostream, b );
 print( ostream, c );

 I'm going to play around with the possibilities in the next few days and
 see if I can't come up with an alternative :p


 Sean
Jun 08 2004