
digitalmars.D - Probabilistic search - psearch.tar.gz

reply sundaresh <sundaresh_member pathlink.com> writes:
This (attached) search function of an unordered array of elements appears
to have a logarithmic search time, like binary search of an ordered array.
Apr 04 2006
next sibling parent Wang Zhen <nehzgnaw gmail.com> writes:
sundaresh wrote:
 This (attached) search function of an unordered array of elements appears
 to have a logarithmic search time, like binary search of an ordered array.
 
 

I believe the time complexity of this algorithm is no better than that of a simple linear search. Would you care to prove your claim of O(log n) complexity? BTW, this newsgroup is for the D programming language.
Apr 04 2006
prev sibling next sibling parent reply sundaresh <sundaresh_member pathlink.com> writes:
I was told to post here by Walter. I did think of posting it in the C newsgroup
but felt otherwise, since D is apparently upward compatible with C, or so the
specs on the site told me, so I presumed that every C program was a D program.
As far as proof goes, I have included a test program. The results clearly
indicate that for an array of size 512, of random numbers, with 2048 trials
of different distributions, and for each trial up to 2048 random selections
of elements in the array to look up, the number of comparisons is a constant
at 9, or lg(512). Please run the test program. If it is a bug and you can point
it out, I would be indebted. But that was why I posted it here.
If it isn't a bug, I was hoping it would be my contribution.

I was hoping for an enlightened response.

You can lie all you want, but that doesn't change the truth.
The sun does rise in the east. There are only 4 seasons in a year, and
the value of pi is fixed.

I would encourage users to please try it out and respond. The proof of the
pudding is in the eating.

Not many experts here, Walter. I am disappointed.
Apr 04 2006
next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
sundaresh wrote:
 I was told to post here by Walter. I did think of posting it in the C newsgroup
 but felt otherwise, since D is apparently upward compatible with C, or so the
 specs on the site told me, so I presumed that every C program was a D program.
 As far as proof goes, I have included a test program. The results clearly
 indicate that for an array of size 512, of random numbers, with 2048 trials
 of different distributions, and for each trial up to 2048 random selections
 of elements in the array to look up, the number of comparisons is a constant
 at 9, or lg(512). Please run the test program. If it is a bug and you can point
 it out, I would be indebted. But that was why I posted it here.
 If it isn't a bug, I was hoping it would be my contribution.
 
 I was hoping for an enlightened response.
 
 You can lie all you want, but that doesn't change the truth.
 The sun does rise in the east. There are only 4 seasons in a year, and
 the value of pi is fixed.
 
 I would encourage users to please try it out and respond. The proof of the
 pudding is in the eating.
 
 Not many experts here, Walter. I am disappointed.

I'm not sure if this is intended as a joke or not, but I am sorry to tell you that your code is not logarithmic. It is easy to prove that there is no algorithm better than O(n) for searching an unordered array on a deterministic single thread. I guess the error in your code is the computation of the number of comparisons: comparisons = ++depth; Cheers, /Oskar
Apr 04 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:
 
 I'm not sure if this is intended as a joke or not, but I am sorry to
 tell you that your code is not logarithmic. It is easy to prove that
 there is no algorithm better than O(n) for searching an unordered array
 on a deterministic single thread.

Perhaps, but some algorithms have a better average case than others assuming certain (typical) conditions. If there were simply no point, why all the bother back in the 60s? By the way, as I just woke up, I haven't had a chance to look at the algorithm yet. Sean
Apr 04 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:e0u9gl$1ppd$1 digitaldaemon.com...
 Oskar Linde wrote:
 I'm not sure if this is intended as a joke or not, but I am sorry to
 tell you that your code is not logarithmic. It is easy to prove that
 there is no algorithm better than O(n) for searching an unordered array
 on a deterministic single thread.

Perhaps, but some algorithms have a better average case than others assuming certain (typical) conditions. If there were simply no point, why all the bother back in the 60s? By the way, as I just woke up, I haven't had a chance to look at the algorithm yet.

Take your time, no rush, it is impossible anyway. To find something in an unordered set requires N comparison operations at least. Always. As morning tea reading, I suggest US Patent 5,533,051 on compression of random data. It claims that it will compress two arbitrary bits into one bit. http://gailly.net/05533051.html Andrew Fedoniouk. http://terrainformatica.com
Apr 04 2006
next sibling parent Sean Kelly <sean f4.ca> writes:
Andrew Fedoniouk wrote:
 
 Take your time, no rush, it is impossible anyway.
 To find something in an unordered set requires N comparison operations at
 least. Always.

Oops... you're right, of course. That's what I get for reading these forums before having my coffee ;-) Sean
Apr 04 2006
prev sibling parent reply John Demme <me teqdruid.com> writes:
Andrew Fedoniouk wrote:

 
 "Sean Kelly" <sean f4.ca> wrote in message
 news:e0u9gl$1ppd$1 digitaldaemon.com...
 Oskar Linde wrote:
 I'm not sure if this is intended as a joke or not, but I am sorry to
 tell you that your code is not logarithmic. It is easy to prove that
 there is no algorithm better than O(n) for searching an unordered array
 on a deterministic single thread.

Perhaps, but some algorithms have a better average case than others assuming certain (typical) conditions. If there were simply no point, why all the bother back in the 60s? By the way, as I just woke up, I haven't had a chance to look at the algorithm yet.

Take your time, no rush, it is impossible anyway. To find something in unordered set requires N comparison operations at least. Always. As a morning tea reading I suggest to read US Patent 5,533,051 on compression of random data. It claims that it will compress two arbitrary bits into one bit. http://gailly.net/05533051.html Andrew Fedoniouk. http://terrainformatica.com

Come now... I'm sure he's talking about the average case, not the worst case. ~John Demme
Apr 04 2006
next sibling parent John Demme <me teqdruid.com> writes:
John Demme wrote:

 Andrew Fedoniouk wrote:
 
 
 "Sean Kelly" <sean f4.ca> wrote in message
 news:e0u9gl$1ppd$1 digitaldaemon.com...
 Oskar Linde wrote:
 I'm not sure if this is intended as a joke or not, but I am sorry to
 tell you that your code is not logarithmic. It is easy to prove that
 there is no algorithm better than O(n) for searching an unordered array
 on a deterministic single thread.

Perhaps, but some algorithms have a better average case than others assuming certain (typical) conditions. If there were simply no point, why all the bother back in the 60s? By the way, as I just woke up, I haven't had a chance to look at the algorithm yet.

Take your time, no rush, it is impossible anyway. To find something in unordered set requires N comparison operations at least. Always. As a morning tea reading I suggest to read US Patent 5,533,051 on compression of random data. It claims that it will compress two arbitrary bits into one bit. http://gailly.net/05533051.html Andrew Fedoniouk. http://terrainformatica.com

Come now... I'm sure he's talking about the average case, not the worst case. ~John Demme

Oops... Didn't read Sean's post.... Searching an unordered set doesn't always take n time... If the search terminates after finding the first entry, it can only take n in the worst case. Given that, the average case can be quite good. I suspect, however, that given truly random data, it's tough to beat n/2. ~John Demme
Apr 04 2006
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
John Demme wrote:
 Andrew Fedoniouk wrote:
 
 "Sean Kelly" <sean f4.ca> wrote in message
 news:e0u9gl$1ppd$1 digitaldaemon.com...
 Oskar Linde wrote:
 I'm not sure if this is intended as a joke or not, but I am sorry to
 tell you that your code is not logarithmic. It is easy to prove that
 there is no algorithm better than O(n) for searching an unordered array
 on a deterministic single thread.

assuming certain (typical) conditions. If there were simply no point, why all the bother back in the 60s? By the way, as I just woke up, I haven't had a chance to look at the algorithm yet.

To find something in unordered set requires N comparison operations at least. Always. As a morning tea reading I suggest to read US Patent 5,533,051 on compression of random data. It claims that it will compress two arbitrary bits into one bit. http://gailly.net/05533051.html Andrew Fedoniouk. http://terrainformatica.com

Come now... I'm sure he's talking about the average case, not the worst case.

I don't think it matters. No matter the algorithm, you must examine every element of the search string. It's just that typical searches are actually worse than O(N), as they often examine each element more than once. Sean
Apr 04 2006
parent reply "Rioshin an'Harthen" <rharth75 hotmail.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:e0ud2e$1tvn$1 digitaldaemon.com...
 John Demme wrote:
 Andrew Fedoniouk wrote:

 "Sean Kelly" <sean f4.ca> wrote in message
 news:e0u9gl$1ppd$1 digitaldaemon.com...
 Oskar Linde wrote:
 I'm not sure if this is intended as a joke or not, but I am sorry to
 tell you that your code is not logarithmic. It is easy to prove that
 there is no algorithm better than O(n) for searching an unordered array
 on a deterministic single thread.

assuming certain (typical) conditions. If there were simply no point, why all the bother back in the 60s? By the way, as I just woke up, I haven't had a chance to look at the algorithm yet.

To find something in unordered set requires N comparison operations at least. Always. As a morning tea reading I suggest to read US Patent 5,533,051 on compression of random data. It claims that it will compress two arbitrary bits into one bit. http://gailly.net/05533051.html Andrew Fedoniouk. http://terrainformatica.com

Come now... I'm sure he's talking about the average case, not the worst case.

I don't think it matters. No matter the algorithm, you must examine every element of the search string. It's just that typical searches are actually worse than O(N), as they often examine each element more than once.

You have to examine every element of the search string? How come I haven't heard that? :) Try (approximately - my C(++) experience shows too much):

char *needleInHaystack(char *haystack, char needle)
{
    if( !haystack )
        return null;
    for( char *loc = haystack; *loc != '\0'; loc++ )
        if( *loc == needle )
            return loc;
    return null;
}

This definitely does not examine every element of the string, except in the worst-case scenario - just enough to find the first occurrence, no matter what the string is or what character the needle is.
Apr 04 2006
next sibling parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Rioshin an'Harthen" <rharth75 hotmail.com> wrote in message 
news:e0ugi0$229d$1 digitaldaemon.com...
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:e0ud2e$1tvn$1 digitaldaemon.com...
 John Demme wrote:
 Andrew Fedoniouk wrote:

 "Sean Kelly" <sean f4.ca> wrote in message
 news:e0u9gl$1ppd$1 digitaldaemon.com...
 Oskar Linde wrote:
 I'm not sure if this is intended as a joke or not, but I am sorry to
 tell you that your code is not logarithmic. It is easy to prove that
 there is no algorithm better than O(n) for searching an unordered array
 on a deterministic single thread.

assuming certain (typical) conditions. If there were simply no point, why all the bother back in the 60s? By the way, as I just woke up, I haven't had a chance to look at the algorithm yet.

To find something in unordered set requires N comparison operations at least. Always. As a morning tea reading I suggest to read US Patent 5,533,051 on compression of random data. It claims that it will compress two arbitrary bits into one bit. http://gailly.net/05533051.html Andrew Fedoniouk. http://terrainformatica.com

Come now... I'm sure he's talking about the average case, not the worst case.

I don't think it matters. No matter the algorithm, you must examine every element of the search string. It's just that typical searches are actually worse than O(N), as they often examine each element more than once.

You have to examine every element of the search string? How come I haven't heard that? :) Try (approximately - my C(++) experience shows too much): char *needleInHaystack(char *haystack, char needle) { if( !haystack ) return null; for( char *loc = haystack; *loc != '\0'; loc++ ) if( *loc == needle ) return loc; return null; } This definitely does not examine every element of the string, except in the worst-case scenario - just enough to find the first occurence, no matter what the string is or what character the needle is.

Let's assume that the unordered set (string) here has a flat distribution of element (char) occurrences. You can cry, pray, read C++ books, whatever you like, but it will take you N/2 comparisons on average, and in the worst case N comparisons, to find the first occurrence of something there. The only thing which affects the N/2 average is the quality of your random generator (flatness of distribution). And this quality is the usual reason for "wonders" in such discoveries. Andrew Fedoniouk. http://terrainformatica.com
Apr 04 2006
parent BCS <BCS_member pathlink.com> writes:
Andrew Fedoniouk wrote:
 
 Let's assume that the unordered set (string) here has a flat distribution of
 element (char) occurrences.
 
 You can cry, pray, read C++ books, whatever you like, but it will take you
 N/2 comparisons on average,
 and in the worst case N comparisons, to find the first occurrence of something
 there.
 
 The only thing which affects the N/2 average is the quality of your random
 generator (flatness of distribution).
 And this quality is the usual reason for "wonders" in such discoveries.
 
 Andrew Fedoniouk.
 http://terrainformatica.com
 

That is true if you are using N for the size of your alphabet. Consider: B is a string of random bits with an even distribution of 1s and 0s. Find the first 1 bit. The mean search length is something like 1 to 2.
Apr 04 2006
prev sibling parent Sean Kelly <sean f4.ca> writes:
Rioshin an'Harthen wrote:
 
 You have to examine every element of the search string? How come I haven't 
 heard that? :)

Potentially, yes. Unless the algorithm has notably unique best-case behavior (such as quicksort), there's really no point in considering anything else. Sean
Apr 04 2006
prev sibling parent reply sundaresh <sundaresh_member pathlink.com> writes:
FYI, there is absolutely nothing wrong with that statement. It is
certainly not a bug, if there is one, and I am not ruling out that
possibility. comparisons is never decremented or decreased, so how can it be
reduced to the value of 9, if it increases above 9, as you claim/assume
that it supposedly does? The value of comparisons is set prior to any/all
recursive calls to psearch. If, on the other hand, you claim that it is
possibly a characteristic of the random number generator, since it makes
the decision of which half to search first - but then again, that decision
is binary, and I think that any random number generator should generate as
many even numbers as odd numbers, at least for a reasonably sized sample.

I am not one to contest theory, and if this algorithm is correct, then
there probably is a very valid and sound theoretical explanation as to
why it works, but I am not one to reject evidence either, in the face of it,
especially of such an astounding nature. It is well acknowledged that
empirical testing of methods leads to a convincing degree of assurance in them
and is a reasonable way to assess their reliability.

My apologies if I have irked you.

To have an open mind while looking at it is all I asked for.

The reason I posted it here was because I was not sure myself.



In article <e0u7jg$1n3i$1 digitaldaemon.com>, Oskar Linde says...
sundaresh wrote:
 I was told to post here by Walter. I did think of posting it in the C newsgroup
 but felt otherwise, since D is apparently upward compatible with C, or so the
 specs on the site told me, so I presumed that every C program was a D program.
 As far as proof goes, I have included a test program. The results clearly
 indicate that for an array of size 512, of random numbers, with 2048 trials
 of different distributions, and for each trial up to 2048 random selections
 of elements in the array to look up, the number of comparisons is a constant
 at 9, or lg(512). Please run the test program. If it is a bug and you can point
 it out, I would be indebted. But that was why I posted it here.
 If it isn't a bug, I was hoping it would be my contribution.
 
 I was hoping for an enlightened response.
 
 You can lie all you want, but that doesn't change the truth.
 The sun does rise in the east. There are only 4 seasons in a year, and
 the value of pi is fixed.
 
 I would encourage users to please try it out and respond. The proof of the
 pudding is in the eating.
 
 Not many experts here, Walter. I am disappointed.

I'm not sure if this is intended as a joke or not, but I am sorry to tell you that your code is not logarithmic. It is easy to prove that there is no algorithm better than O(n) for searching an unordered array on a deterministic single thread. I guess the error in your code is the computation of the number of comparisons: comparisons = ++depth; Cheers, /Oskar

Apr 04 2006
next sibling parent BCS <BCS_member pathlink.com> writes:
sundaresh wrote:
 FYI, there is absolutely nothing wrong with that statement. It is
 certainly not a bug, if there is one, and I am not ruling out that
 possibility. comparisons is never decremented or decreased, so how can it be
 reduced to the value of 9, if it increases above 9, as you claim/assume
 that it supposedly does. The value of comparisons is set prior to any/all
 recursive calls to psearch. If on the other hand, you claim that it is 
 possibly the characteristic of the random number generator, since it makes
 the decision of which half to search for first, but then again, that decision
 is binary and I think that any random number generator should generate as
 many even numbers as odd numbers, at least for a reasonably sized sample.
 
 I am not one to contest theory, and if this algorithm is correct, then
 there probably is a very valid and sound theoretical explanation as to
 why it works, but I am not one to reject evidence either, in the face of it,
 especially of such an astounding nature. It is well acknowledged that
 empirical testing of methods leads to a convincing degree of assurance in them
 and is a reasonable way to assess their reliability.
 
 My apologies if I have irked you. 
 
 To have an open mind, while looking at it is all I asked for.
 
 The reason I posted it here was because I was not sure myself.
 

How about trying a different dataset? Grab chunks out of "Alice in Wonderland" or something like that. I have heard of cases where it was possible to get more performance than you should be able to by leveraging predictable non-randomness in a data set. An example: in English the letter "Q" is almost always followed by "U"; therefore, if you are searching for "U" and you find a "Q", check the next char.
Apr 04 2006
prev sibling next sibling parent Georg Wrede <georg.wrede nospam.org> writes:
Point 1: we do have to allow for the different expression of content and 
intent that arises from writers, who are from a fundamentally different 
cultural background. (Possibly this is made even harder if such a person 
uses some automatic translation between languages.)

So, I vote for understanding, meditation and deep thought, before 
dismissing the ideas presented.

Point 2: there are several "test cases" for algorithms. One is "worst 
case", another is "truly random data", another is "best case", yet 
another is "average case", and finally we have "plausible data". (There 
are others, I know.)

Since _no_ interesting data is truly random (because by definition, 
such data does not contain Information!), we can dismiss the "truly 
random data" case.

Now, depending on the application domain, it may or may not be 
significant what the "worst case" performance is. Similarly, "best case" 
usually doesn't count, but its weight has to be evaluated on a case by 
case basis.

At this point we are between the "average case" and "plausible data". 
Here we actually can find surprising results, if we look hard enough. 
For all practical purposes, the algorithm which performs best _given_ 
"actual like" data, would be the one to choose. (Assuming we are even 
able to present such "actual like" data in the first place.)

---

In summary, instead of quibbling over theoretical merits of this 
particular method, we might do better by trying to evaluate which kind 
of data this algorithm may perform best with, and thereafter compare it 
to those generally suggested for that kind of data.


sundaresh wrote:
 FYI, there is absolutely nothing wrong with that statement. It is 
 certainly not a bug, if there is one, and I am not ruling out that 
 possibility. comparisons is never decremented or decreased, so how can
 it be reduced to the value of 9, if it increases above 9, as you
 claim/assume that it supposedly does. The value of comparisons is set
 prior to any/all recursive calls to psearch. If on the other hand,
 you claim that it is possibly the characteristic of the random number
 generator, since it makes the decision of which half to search for
 first, but then again, that decision is binary and I think that any
 random number generator should generate as many even numbers as odd
 numbers, at least for a reasonably sized sample.
 
 I am not one to contest theory, and if this algorithm is correct,
 then there probably is a very valid and sound theoretical explanation
 as to why it works, but I am not one to reject evidence either, in
 the face of it, especially of such an astounding nature. It is well
 acknowledged that empirical testing of methods leads to a convincing
 degree of assurance in them and is a reasonable way to assess their
 reliability.
 
 My apologies if I have irked you.
 
 To have an open mind, while looking at it is all I asked for.
 
 The reason I posted it here was because I was not sure myself.
 
 
 
 In article <e0u7jg$1n3i$1 digitaldaemon.com>, Oskar Linde says...
 
 sundaresh wrote:
 
 I was told to post here by Walter. I did think of posting it in
 the C newsgroup but felt otherwise, since D is apparently upward
 compatible with C, or so the specs on the site told me, so I
 presumed that every C program was a D program. As far as proof
 goes, I have included a test program. The results clearly 
 indicate that for an array of size 512, of random numbers, with
 2048 trials of different distributions, and for each trial up to
 2048 random selections of elements in the array to look up, the
 number of comparisons is a constant at 9, or lg(512). Please run
 the test program. If it is a bug and you can point it out, I
 would be indebted. But that was why I posted it here. If it isn't
 a bug, I was hoping it would be my contribution.
 
 I was hoping for an enlightened response.
 
 You can lie all you want, but that doesn't change the truth. The
 sun does rise in the east. There are only 4 seasons in a year,
 and the value of pi is fixed.
 
 I would encourage users to please try it out and respond. The
 proof of the pudding is in the eating.
 
 Not many experts here, Walter. I am disappointed.

I'm not sure if this is intended as a joke or not, but I am sorry to tell you that your code is not logarithmic. It is easy to prove that there is no algorithm better than O(n) for searching an unordered array on a deterministic single thread. I guess the error in your code is the computation of the number of comparisons: comparisons = ++depth; Cheers, /Oskar


Apr 04 2006
prev sibling parent reply kwilson nowhere.com writes:
Hi there Sundaresh,

The "bug" within your program is simply that your base case (or terminating
case) for the recursion is (size == 1), and you increment/decrement a variable
(i.e. depth) outside of this base case "if" statement.

This leads to a recursively maximized value of 9 (or log base 2 of 512), as
pointed out. Then, since no variable is updated in the base case "if" statement,
you don't actually add to the comparison total accordingly. Thus you mistakenly
get a maximal depth of 9 for each search. This is not actually the depth to find
the element you are searching for, but rather the depth to the "size == 1" base
case.

Now you will say that "this is not true" since the base case is reached many
times during the search process before the search is terminated....not just once
after depth equals nine. The static "depth" variable is not being updated
properly, though.

Once you try the changes suggested below you will see that the "Average depth"
of the 2048 searches per distribution will be approximately 256. Single "depths"
or comparisons will range from 1 to 512 as would be expected. The average per
distribution is approximately N/2 as several others in this thread have pointed
out, as the correct average case search for an element in a randomized array of
size N. If you want to argue that the "depth" of the actual comparisons never
exceeds 9, then that is obviously true because log base 2 of 512 is 9.....but
that doesn't mean that the number of comparisons is 9. Sorry ;)

Hope the code change and viewing the results will clear things up if anyone
can't follow my explanation above (my fault for not explaining things well
enough, I'm afraid).


Try these simple changes:

in psearch.c 

on line 9:
add ------>  ++comparisons;


on line 18:
comment out --------> //comparisons = ++depth;


on line 31:
comment out --------> //--depth;

Thanks,
K.Wilson
Apr 04 2006
next sibling parent reply sundaresh <sundaresh_member pathlink.com> writes:
I believe that I have replied to this already. My first suspicion was with
comparisons, but there is nothing wrong with it. comparisons is updated prior
to the call to the base case of psearch, so why should it be updated again during
the base case, since in the base case no recursive call to psearch is being made?
Doing so might actually result in 256 or so, as you claim, but that would be
introducing a bug and not eliminating it. Sorry, these arguments have only
served to convince me even further of the correctness and workability
of this algorithm.

In article <e0uug0$2mq8$1 digitaldaemon.com>, kwilson nowhere.com says...
Hi there Sundaresh,

The "bug" within your program is simply that your base case (or terminating
case) for the recursion is (size == 1) and you increment/decrement a variable
(ie. depth)  outside of this base case "if" statement. 

This leads to a recursively maximized value of 9 (or log base 2 of 512), as
pointed out. Then, since no variable is updated in the base case "if" statement,
you don't actually add to the comparison total accordingly. Thus you mistakenly
get a maximal depth of 9 for each search. This is not actually the depth to find
the element you are searching for, but rather the depth to the "size == 1" base
case.

Now you will say that "this is not true" since the base case is reached many
times during the search process before the search is terminated....not just once
after depth equals nine. The static "depth" variable is not being updated
properly, though.

Once you try the changes suggested below you will see that the "Average depth"
of the 2048 searches per distribution will be approximately 256. Single "depths"
or comparisons will range from 1 to 512 as would be expected. The average per
distribution is approximately N/2 as several others in this thread have pointed
out, as the correct average case search for an element in a randomized array of
size N. If you want to argue that the "depth" of the actual comparisons never
exceeds 9, then that is obviously true because log base 2 of 512 is 9.....but
that doesn't mean that the number of comparisons is 9. Sorry ;)

Hope the code change and viewing the results will clear things up if anyone
can't follow my explanation above (my fault for not explaining things well
enough, I'm afraid).


Try these simple changes:

in psearch.c 

on line 9:
add ------>  ++comparisons;


on line 18:
comment out --------> //comparisons = ++depth;


on line 31:
comment out --------> //--depth;

Thanks,
K.Wilson

Apr 05 2006
next sibling parent kris <foo bar.com> writes:
sundaresh wrote:
 I believe that I have replied to this already. My first suspicion was with
 comparisons, but there is nothing wrong with it. Comparisons is updated prior
 to call to the base case psearch, so why should it be updated again during the
 base case, since in the base case, no recursive call to psearch is being made.
 Doing so might actually result in 256 or so as you claim, but that would be
 introducing a bug and not eliminating it. Sorry, these arguments have only
 served to convince me even further of the correctness and workability
 of this algorithm.

Jolly good! Capital stuff, what! Now then, my good man ~ you've qualified for tonight's free gift: a front-row ticket to the greatest show on the planet!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Cast (in order of appearance.)
M = Man looking for an argument
R = Receptionist
Q = Abuser
A = Arguer (John Cleese)
C = Complainer (Eric Idle)
H = Head Hitter

M: Ah. I'd like to have an argument, please.
R: Certainly sir. Have you been here before?
M: No, I haven't, this is my first time.
R: I see. Well, do you want to have just one argument, or were you thinking of taking a course?
M: Well, what is the cost?
R: Well, It's one pound for a five minute argument, but only eight pounds for a course of ten.
M: Well, I think it would be best if I perhaps started off with just the one and then see how it goes.
R: Fine. Well, I'll see who's free at the moment.
(Pause)
R: Mr. DeBakey's free, but he's a little bit conciliatory. Ahh yes, Try Mr. Barnard; room 12.
M: Thank you. (Walks down the hall. Opens door.)
Q: WHAT DO YOU WANT?
M: Well, I was told outside that...
Q: Don't give me that, you snotty-faced heap of parrot droppings!
M: What?
Q: Shut your festering gob, you tit! Your type really makes me puke, you vacuous, coffee-nosed, maloderous, pervert!!!
M: Look, I CAME HERE FOR AN ARGUMENT, I'm not going to just stand...!!
Q: OH, oh I'm sorry, but this is abuse.
M: Oh, I see, well, that explains it.
Q: Ah yes, you want room 12A, Just along the corridor.
M: Oh, Thank you very much. Sorry.
Q: Not at all.
M: Thank You. (Under his breath) Stupid git!! (Walks down the corridor)
M: (Knock)
A: Come in.
M: Ah, Is this the right room for an argument?
A: I told you once.
M: No you haven't.
A: Yes I have.
M: When?
A: Just now.
M: No you didn't.
A: Yes I did.
M: You didn't
A: I did!
M: You didn't!
A: I'm telling you I did!
M: You did not!!
A: Oh, I'm sorry, just one moment. Is this a five minute argument or the full half hour?
M: Oh, just the five minutes.
A: Ah, thank you. Anyway, I did.
M: You most certainly did not.
A: Look, let's get this thing clear; I quite definitely told you.
M: No you did not.
A: Yes I did.
M: No you didn't.
A: Yes I did.
M: No you didn't.
A: Yes I did.
M: No you didn't.
A: Yes I did.
M: You didn't.
A: Did.
M: Oh look, this isn't an argument.
A: Yes it is.
M: No it isn't. It's just contradiction.
A: No it isn't.
M: It is!
A: It is not.
M: Look, you just contradicted me.
A: I did not.
M: Oh you did!!
A: No, no, no.
M: You did just then.
A: Nonsense!
M: Oh, this is futile!
A: No it isn't.
M: I came here for a good argument.
A: No you didn't; no, you came here for an argument.
M: An argument isn't just contradiction.
A: It can be.
M: No it can't. An argument is a connected series of statements intended to establish a proposition.
A: No it isn't.
M: Yes it is! It's not just contradiction.
A: Look, if I argue with you, I must take up a contrary position.
M: Yes, but that's not just saying 'No it isn't.'
A: Yes it is!
M: No it isn't!
A: Yes it is!
M: Argument is an intellectual process. Contradiction is just the automatic gainsaying of any statement the other person makes.
(short pause)
A: No it isn't.
M: It is.
A: Not at all.
M: Now look.
A: (Rings bell) Good Morning.
M: What?
A: That's it. Good morning.
M: I was just getting interested.
A: Sorry, the five minutes is up.
M: That was never five minutes!
A: I'm afraid it was.
M: It wasn't.
(Pause)
A: I'm sorry, but I'm not allowed to argue anymore.
M: What?!
A: If you want me to go on arguing, you'll have to pay for another five minutes.
M: Yes, but that was never five minutes, just now. Oh come on!
A: (Hums)
M: Look, this is ridiculous.
A: I'm sorry, but I'm not allowed to argue unless you've paid!
M: Oh, all right. (pays money)
A: Thank you.
(short pause)
M: Well?
A: Well what?
M: That wasn't really five minutes, just now.
A: I told you, I'm not allowed to argue unless you've paid.
M: I just paid!
A: No you didn't.
M: I DID!
A: No you didn't.
M: Look, I don't want to argue about that.
A: Well, you didn't pay.
M: Aha. If I didn't pay, why are you arguing? I Got you!
A: No you haven't.
M: Yes I have. If you're arguing, I must have paid.
A: Not necessarily. I could be arguing in my spare time.
M: Oh I've had enough of this.
A: No you haven't.
M: Oh Shut up.
(Walks down the stairs. Opens door.)
M: I want to complain.
C: You want to complain! Look at these shoes. I've only had them three weeks and the heels are worn right through.
M: No, I want to complain about...
C: If you complain nothing happens, you might as well not bother.
M: Oh!
C: Oh my back hurts, it's not a very fine day and I'm sick and tired of this office.
(Slams door. Walks down corridor, opens next door.)
M: Hello, I want to... Ooooh!
H: No, no, no. Hold your head like this, then go Waaah. Try it again.
M: uuuwwhh!!
H: Better, Better, but Waah, Waah! Put your hand there.
M: No.
H: Now..
M: Waaaaah!!!
H: Good, Good! That's it.
M: Stop hitting me!!
H: What?
M: Stop hitting me!!
H: Stop hitting you?
M: Yes!
H: Why did you come in here then?
M: I wanted to complain.
H: Oh no, that's next door. It's being-hit-on-the-head lessons in here.
M: What a stupid concept.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sundaresh-dude; if you want to argue the merits of your outstanding algorithm, you need to choose the right room to begin. This is the room for discussion upon the D language itself ~ not language-agnostic algorithms. Might I suggest you try the /slashdot/ room instead? Just down the hallway, and to the right a bit. Yes, that's right. You may get a good argument there
Apr 05 2006
prev sibling parent "Rioshin an'Harthen" <rharth75 hotmail.com> writes:
"sundaresh" <sundaresh_member pathlink.com> wrote in message 
news:e0vqej$12o5$1 digitaldaemon.com...
I believe that I have replied to this already. My first suspicion was with
 comparisons, but there is nothing wrong with it. Comparisons is updated
 prior to the call to the base case psearch, so why should it be updated
 again during the base case, since in the base case no recursive call to
 psearch is being made.

That's just it. You don't count how many comparisons it takes to find a
match, just how deep it has to go. And with an original size of 512, that
always leads to log2(512). Since you only count the depth of the search,
not the amount of comparisons, you'll find it's almost always 9 - it may
be shorter, depending on which item was randomly picked to be searched for.
 Doing so might actually result in 256 or so as you claim, but that would 
 be
 introducing a bug and not eliminating it.

Frankly, you are so wrong in this. Now you only count the depth, not how
many items it searches. And it's the items that matter, not the depth.
It's the amount of comparisons that tells what time the algorithm has -
and in the case of this, it's most definitely O(N).
 Sorry, these arguments have only served to convince me even further of the
 correctness and workability of this algorithm.

*sigh* A fool, then. Many have shown you where you've gone wrong, but you
seem to be "I'm right and I know it, you'd just introduce a bug in it"...
If you refuse to listen, go ahead. Mathematically, it is possible to prove
that no search of unordered data can be done in time less than O(N) - in
the worst case you always have to test *all* occurrences.

I'm attaching an altered version of psearch with this message, so you can
test it yourself. It's a zip archive with MS-DOS line endings (since
that's the system I happen to be on). I strongly recommend after unzipping
and compiling that you invoke it by directing standard output to a file -
it's going to display a lot of data on the amount of comparisons it took
to find something, as well as how it found it. Then you'll see it's not
O(log N) but O(N).
Apr 05 2006
prev sibling next sibling parent reply sundaresh <sundaresh_member pathlink.com> writes:
Sorry about that, you were right: for all but the base case, the value of
comparisons is incremented in the else part. So here is the corrected
version. But the search time still appears to be O(lg(n)), since the
average number of comparisons is just one more, at 10, and I think the
worst-case count is 11, so this still seems to be comparable to binary
search.

#include <stdlib.h>

int comparisons;

void *psearch(void *base, int size, int width, void *element,
              int (*compare)(void *, void *)) {
    static int depth = 0;

    if (size <= 1) {
        ++comparisons;
        return ! size || compare(base, element) ? NULL : base;
    }
    else {
        int half, right;
        void *found;

        half = size / 2;
        right = abs(rand()) % 2;
        comparisons = ++depth;
        if (right) {
            found = psearch(base + half * width, size - half, width, element, compare);
            if (! found) {
                found = psearch(base, half, width, element, compare);
            }
        }
        else {
            found = psearch(base, half, width, element, compare);
            if (! found) {
                found = psearch(base + half * width, size - half, width, element, compare);
            }
        }
        --depth;
        return found;
    }
}


In article <e0uug0$2mq8$1 digitaldaemon.com>, kwilson nowhere.com says...
Hi there Sundaresh,

The "bug" within your program is simply that your base case (or terminating
case) for the recursion is (size == 1) and you increment/decrement a variable
(ie. depth)  outside of this base case "if" statement. 

This leads to a recursively maximized value of 9 (or log base 2 of 512), as
pointed out. Then, since no variable is updated in the base case "if" statement,
you don't actually add to the comparison total accordingly. Thus you mistakenly
get a maximal depth of 9 for each search. This is not actually the depth to find
the element you are searching for, but rather the depth to the "size == 1" base
case.

Now you will say that "this is not true" since the base case is reached many
times during the search process before the search is terminated....not just once
after depth equals nine. The static "depth" variable is not being updated
properly, though.

Once you try the changes suggested below you will see that the "Average depth"
of the 2048 searches per distribution will be approximately 256. Single "depths"
or comparisons will range from 1 to 512 as would be expected. The average per
distribution is approximately N/2 as several others in this thread have pointed
out, as the correct average case search for an element in a randomized array of
size N. If you want to argue that the "depth" of the actual comparisons never
exceeds 9, then that is obviously true because log base 2 of 512 is 9.....but
that doesn't mean that the number of comparisons is 9. Sorry ;)

Hope the code change and viewing the results will clear things up if anyone
can't follow my explanation above (my fault for not explaining things well
enough, I'm afraid).


Try these simple changes:

in psearch.c 

on line 9:
add ------>  ++comparisons;


on line 18:
comment out --------> //comparisons = ++depth;


on line 31:
comment out --------> //--depth;

Thanks,
K.Wilson

Apr 05 2006
next sibling parent Nic Tiger <g_tiger progtech.ru> writes:
sundaresh wrote:
 #include<stdlib.h>
 
 int comparisons;
 
 void *psearch(void *base, int size, int width, void *element, int
 (*compare)(void *, void *)) {
 static int depth = 0;
 
 if(size <= 1) {

Replace this line
 ++comparisons;

if ( size ) ++comparisons;
 return ! size || compare(base, element) ? NULL : base;
 }
 else {
 int half, right;
 void *found;
 
 half = size / 2;
 right = abs(rand()) % 2;

and remove this line
 comparisons = ++depth;

 if(right) {
 found = psearch(base + half * width, size - half, width, element, compare);
 if(! found) {
 found = psearch(base, half, width, element, compare);
 }
 }
 else {
 found = psearch(base, half, width, element, compare);
 if(! found) {
 found = psearch(base + half * width, size - half, width, element, compare);
 }
 }
 --depth;
 return found;
 }
 }

Then you will have the *REAL* count of compare() function calls. Even
better would be to supply a test compare function that increments
comparisons before returning the comparison result.

If you don't apply the mentioned changes, your code will count something
completely different from the "comparison number".

Nic Tiger
Apr 05 2006
prev sibling parent Nic Tiger <g_tiger progtech.ru> writes:
this is your latest code, with proper comparison counting and a simple
test; just compile it and run.
the last printed line will state:
   average_cmp=247
which is essentially ~(N/2) for the 512-item test

as a note, you should write compare(*(void**)base, element); otherwise it
is difficult to use any of the existing comparison functions (strcmp or
similar)

Nic Tiger.

-------- test suite for psearch --------
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int comparisons = 0;

int compare (void *p1, void *p2) {
   comparisons ++;
   return strcmp ( *(const char**)p1, (const char*)p2 );
}

void *psearch(void *base, int size, int width, void *element, int 
(*compare)(void *, void *)) {
   static int depth = 0;

   if(size <= 1) {
     return ! size || compare(base, element) ? NULL : base;
   } else {
     int half, right;
     void *found;

     half = size / 2;
     right = abs(rand()) % 2;
     //comparisons = ++depth;
     if(right) {
       found = psearch((char*)base + half * width, size - half, width, 
element, compare);
       if(! found) {
         found = psearch((char*)base, half, width, element, compare);
       }
     }
     else {
       found = psearch((char*)base, half, width, element, compare);
       if(! found) {
         found = psearch((char*)base + half * width, size - half, width, 
element, compare);
       }
     }
     --depth;
     return found;
   }
}

int main () {
   char *list[512];
   int i;
   for ( i = 0; i < 512; i ++ ) {
     char buf[32];
     sprintf ( buf, "str%d", i );
     list[i] = strdup(buf);
   }

   int total_cmp = 0;
   for ( i = 0; i < 512; i ++ ) {
     comparisons = 0;
     void *res = psearch ( list, 512, sizeof(char*), list[i], compare );
     if ( res )
       printf ( "%d comparisons to find %s (found at index %d of "
                "list)\n", comparisons, list[i], (int)((char**)res - list) );
     else
       printf ( "%s not found, %d comparisons\n", list[i], comparisons );
     total_cmp += comparisons;
   }
   printf ( "average_cmp=%d\n", total_cmp/512 );
}
Apr 05 2006
prev sibling parent sundaresh <sundaresh_member pathlink.com> writes:
My apologies, making these changes does result in a linear worst case
behaviour, so it is no better than sequential search.

In article <e0uug0$2mq8$1 digitaldaemon.com>, kwilson nowhere.com says...
[snip - explanation and suggested changes quoted in full upthread]
Apr 05 2006
prev sibling parent pragma <pragma_member pathlink.com> writes:
In article <e0u6na$1mdo$1 digitaldaemon.com>, sundaresh says...
Not many experts here walter.I am dissapointed.

Well, as you say, "the proof [...] is in the eating". You, my good man,
have merely had a taste of our wonderful community so far! :)

In all seriousness, I think Wang's request for proof/data was the prime
example of an enlightened response. I'd be surprised if others lurking
here weren't thinking the same thing and just didn't bother to post
redundantly.

My $0.02: As this group is still participating in the formative steps of
the D language, we get a lot of "pie in the sky" ideas around here. If I
may be so bold as to speak for the group: we've all learned to approach
things here with a dose of skepticism (as professionals do) before
embracing them completely. The unfortunate side effect of this behavior
is that perfectly good new ideas still have to fight for some degree of
recognition thanks to this de-facto acceptance process. And of course
Walter's opinion trumps everyone's when it comes to what actually gets
into D. ;)

Also, I've noticed that people here actually read posts rather than skip
over the longer ones and hunt for attachments. So posting just a .gz
along with a one-liner citing how awesome your algorithm is probably
didn't help. It really wouldn't hurt your case at all if you composed a
small abstract about the algorithm (how it works, what it's based on) and
supplemented that with some timing and performance data. At least that
way, nobody has to even touch a compiler unless they feel the need to
validate your claims. :)

- EricAnderton at yahoo
Apr 04 2006
prev sibling next sibling parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
sundaresh wrote:
 This (attached) search function of an unordered array of elements appears
 to have a logarithmic search time, like binary search of an ordered array.
 
 

It doesn't compile; on VC++ 2005 I get:

psearch.c(19) : error C2036: 'void *' : unknown size
psearch.c(27) : error C2036: 'void *' : unknown size

I fixed it by casting base to int: (int)base

anyway, I'd like (please) to see human-readable pseudo code that doesn't
involve mangling with pointers, just array indices.
Apr 04 2006
prev sibling parent reply Hasan Aljudy <hasan.aljudy gmail.com> writes:

sundaresh wrote:
 This (attached) search function of an unordered array of elements appears
 to have a logarithmic search time, like binary search of an ordered array.
 
 

ok, I just wrote a D version of the algorithm, and used the CPU counter
to benchmark performance.

before I post the code, here's how you can use it, assuming you compile
it into dps.exe
dps > log.txt

(give it a few seconds to run all the test cases, shouldn't take too
long). you can examine the log file, and you will see clearly that the
increase in time is linearly related to the increase in array size. or,
you can use gnuplot to see it more clearly.
gnuplot
set terminal png
set output "graph.png"
plot "log.txt" with lines


I've attached my results.

and here's the D code, according to what I understood of the algorithm.
if something is wrong in my understanding, please point me to it.

---------
import std.stdio;
import std.random;

template psearch(T)
{
    bool psearch( T[] arr, T element )
    {
        if( arr.length == 1 )
        {
            if( arr[0] == element ) return true;
            else return false;
        }

        int choice = rand() % 2;

        T[] first, second;
        int middle = arr.length / 2;

        if( choice == 1 )
        {
            first = arr[0..middle];
            second = arr[middle..$];
        }
        else
        {
            first = arr[middle..$];
            second = arr[0..middle];
        }

        if( !psearch( first, element ) )
        {
            return psearch( second, element );
        }
        return false;
    }
}

alias psearch!(int) probSearch;

void main()
{
    CPUTimer timer = new CPUTimer();

    for( int i = 1000; i < 100000; i += 500 )
    {
        int[] arr = randomArray( i ); //random int array of size i

        timer.timein();            //start cpu timer
        probSearch( arr, 27 );     //look for 27
        timer.timeout();           //stop cpu timer

        //print: length time
        writefln( "%d %d", i, timer.getSmallTime() );
    }
}

class CPUTimer
{
    long time;

    void timein()
    {
        time = rdtsc();
    }

    void timeout()
    {
        time = rdtsc() - time;
    }

    //a small version of the time ..
    int getSmallTime()
    {
        return (time/1000L);
    }
}

//x86 only
long rdtsc()
{
    asm
    {
        naked;
        rdtsc;
        ret;
    }
}

int[] randomArray( int size )
{
    int[] arr = new int[size];
    for( int i = 0; i < size; i++ )
    {
        //random data, mod by size/2 to give 27 a reasonable chance of appearing
        arr[i] = rand() % (size/2);
    }
    return arr;
}
Apr 05 2006
parent reply Sean Kelly <sean f4.ca> writes:
Hasan Aljudy wrote:
 
 and here's the D code, according to what I understood of the algorithm.
 if something is wrong in my understanding, please point me to it.
 
 ---------
 import std.stdio;
 import std.random;
 
 template psearch(T)
 {
     bool psearch( T[] arr, T element )
     {
         if( arr.length == 1 )
         {
             if( arr[0] == element ) return true;
             else return false;
         }
 
         int choice = rand() % 2;
 
         T[] first, second;
         int middle = arr.length / 2;
 
         if( choice == 1 )
         {
             first = arr[0..middle];
             second = arr[middle..$];
         }
         else
         {
             first = arr[middle..$];
             second = arr[0..middle];
         }
 
         if( !psearch( first, element ) )
         {
             return psearch( second, element );
         }
         return false;
     }
 }

Thanks for posting this. I hadn't taken the time to look at the algorithm
yet, and this got me interested :-)

The algorithm is obviously O(N) for single-element searches, and
equivalent in theoretical complexity to the naive approach for substring
searches. In practical terms it's even worse because of the N recursive
function calls, but it could always be rewritten as a loop with some
bookkeeping.

It's obvious that the algorithm favors certain pattern positions in the
search string, ie. (1/4)N, (3/4)N, and so on, while traditional searches
typically perform best if the pattern is near the beginning of the
string. Though this raises an interesting point--the probabilistic search
above is not a "find first" or "find last" algorithm. Probably not a big
deal for unordered sets, but it might be otherwise.

Sean
Apr 05 2006
parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
Sean Kelly wrote:
 Hasan Aljudy wrote:
 
 and here's the D code, according to what I understood of the algorithm.
 if something is wrong in my understanding, please point me to it.

 ---------
 import std.stdio;
 import std.random;

 template psearch(T)
 {
     bool psearch( T[] arr, T element )
     {
         if( arr.length == 1 )
         {
             if( arr[0] == element ) return true;
             else return false;
         }

         int choice = rand() % 2;

         T[] first, second;
         int middle = arr.length / 2;

         if( choice == 1 )
         {
             first = arr[0..middle];
             second = arr[middle..$];
         }
         else
         {
             first = arr[middle..$];
             second = arr[0..middle];
         }

         if( !psearch( first, element ) )
         {
             return psearch( second, element );
         }
         return false;
     }
 }

Thanks for posting this. I hadn't taken the time to look at the algorithm
yet, and this got me interested :-)

The algorithm is obviously O(N) for single-element searches, and
equivalent in theoretical complexity to the naive approach for substring
searches. In practical terms it's even worse because of the N recursive
function calls, but it could always be rewritten as a loop with some
bookkeeping.

It's obvious that the algorithm favors certain pattern positions in the
search string, ie. (1/4)N, (3/4)N, and so on, while traditional searches
typically perform best if the pattern is near the beginning of the
string. Though this raises an interesting point--the probabilistic search
above is not a "find first" or "find last" algorithm. Probably not a big
deal for unordered sets, but it might be otherwise.

Sean

I think I have a mistake in the last bit of it ..

I wrote:

if( !psearch( first, element ) )
{
    return psearch( second, element );
}
return false;

when it should be:

if( psearch( first, element ) )
{
    return true;
}
else
{
    return psearch( second, element );
}
Apr 05 2006