www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Why function template 'among' is of complexity O(1) and not O(n)?

reply "Gopan" <gggopan gmail.com> writes:
Recently, I came across the following thread
http://forum.dlang.org/thread/l8nocr$1dv5$1 digitalmars.com?page=2

On Monday, 16 December 2013 at 22:10:52 UTC, Andrei Alexandrescu 
wrote:
 On 12/16/13 1:45 PM, Walter Bright wrote:
 On 12/16/2013 12:38 PM, Andrei Alexandrescu wrote:
 uint among(T, Us...)(T v, Us vals)
 {
     foreach (i, U; Us)
     {
         if (v == vals[i]) return i + 1;
     }
     return 0;
 }
This has O(n) behavior, which might be unexpected for the user.
It's O(1) because the data size is in the source text. There's no variable n.
To me, it looks like a Linear Search and hence of complexity O(n). Could somebody please elaborate why it is O(1) and not O(n)? If there is an article/document that I am to read to understand this, please cite it. (I am a beginner level programmer and new to D Language.) Thanks, Gopan.
Feb 19 2014
parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
On Wednesday, 19 February 2014 at 08:58:10 UTC, Gopan wrote:
 Recently, I came across the following thread
 http://forum.dlang.org/thread/l8nocr$1dv5$1 digitalmars.com?page=2

 On Monday, 16 December 2013 at 22:10:52 UTC, Andrei 
 Alexandrescu wrote:
 On 12/16/13 1:45 PM, Walter Bright wrote:
 On 12/16/2013 12:38 PM, Andrei Alexandrescu wrote:
 uint among(T, Us...)(T v, Us vals)
 {
    foreach (i, U; Us)
    {
        if (v == vals[i]) return i + 1;
    }
    return 0;
 }
This has O(n) behavior, which might be unexpected for the user.
It's O(1) because the data size is in the source text. There's no variable n.
To me, it looks like a Linear Search and hence of complexity O(n). Could somebody please elaborate why it is O(1) and not O(n)? If there is an article/document that I am to read to understand this, please cite it. (I am a beginner level programmer and new to D Language.) Thanks, Gopan.
It's O(k) with k = vals.length, which is the number of arguments given to the function, which is a compile time constant and not dependent on any input of the final program. In O(n) the n always relates to the input somehow.
Feb 19 2014
parent reply "Gopan" <gggopan gmail.com> writes:
On Wednesday, 19 February 2014 at 09:09:25 UTC, Tobias Pankrath 
wrote:
 On Wednesday, 19 February 2014 at 08:58:10 UTC, Gopan wrote:
 Recently, I came across the following thread
 http://forum.dlang.org/thread/l8nocr$1dv5$1 digitalmars.com?page=2

 On Monday, 16 December 2013 at 22:10:52 UTC, Andrei 
 Alexandrescu wrote:
 On 12/16/13 1:45 PM, Walter Bright wrote:
 On 12/16/2013 12:38 PM, Andrei Alexandrescu wrote:
 uint among(T, Us...)(T v, Us vals)
 {
   foreach (i, U; Us)
   {
       if (v == vals[i]) return i + 1;
   }
   return 0;
 }
This has O(n) behavior, which might be unexpected for the user.
It's O(1) because the data size is in the source text. There's no variable n.
To me, it looks like a Linear Search and hence of complexity O(n). Could somebody please elaborate why it is O(1) and not O(n)? If there is an article/document that I am to read to understand this, please cite it. (I am a beginner level programmer and new to D Language.) Thanks, Gopan.
It's O(k) with k = vals.length, which is the number of arguments given to the function, which is a compile time constant and not dependent on any input of the final program. In O(n) the n always relates to the input somehow.
I agree that vals.length is known at compile time. But while running the application, it iterates through every val in the input list until a match is found. So, how can that be considered O(1)? See my test app and output. uint among(T, Us...)(T v, Us vals) { foreach (i, U; Us) { writeln("checking ", v, " == ", vals[i]); if (v == vals[i]) return i + 1; } return 0; } void main() { uint index = among(3, 1,2,5,3); writeln("Index of 3 in (1,2,5,3) is ", index); } outrput of running the app: checking 3 == 1 checking 3 == 2 checking 3 == 5 checking 3 == 3 Index of 3 in (1,2,5,3) is 4 Or, is my undertanding about Big-O notation of complexity wrong? Thanks, Gopan
Feb 19 2014
next sibling parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
On Wednesday, 19 February 2014 at 09:46:04 UTC, Gopan wrote:
 Index of 3 in (1,2,5,3) is 4

 Or, is my undertanding about Big-O notation of complexity wrong?

 Thanks,
 Gopan
O(1) = O(k) for any constant k.
Feb 19 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Wednesday, 19 February 2014 at 13:03:38 UTC, Tobias Pankrath 
wrote:
 On Wednesday, 19 February 2014 at 09:46:04 UTC, Gopan wrote:
 Index of 3 in (1,2,5,3) is 4

 Or, is my undertanding about Big-O notation of complexity 
 wrong?

 Thanks,
 Gopan
O(1) = O(k) for any constant k.
I don't think it is legit to speak about k as constant here. It is constant for any specific function instance but not for template meta-algorithm as a whole.
Feb 19 2014
parent "Tobias Pankrath" <tobias pankrath.net> writes:
On Wednesday, 19 February 2014 at 13:11:32 UTC, Dicebot wrote:
 On Wednesday, 19 February 2014 at 13:03:38 UTC, Tobias Pankrath 
 wrote:
 On Wednesday, 19 February 2014 at 09:46:04 UTC, Gopan wrote:
 Index of 3 in (1,2,5,3) is 4

 Or, is my undertanding about Big-O notation of complexity 
 wrong?

 Thanks,
 Gopan
O(1) = O(k) for any constant k.
I don't think it is legit to speak about k as constant here. It is constant for any specific function instance but not for template meta-algorithm as a whole.
Yep, the meta-algorithm is O(n) but the specific instance used in the program is O(k). We need to agree on a) what the variables mean and on b) the artifact whose complexity is of interest.
Feb 19 2014
prev sibling next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 02/19/2014 01:46 AM, Gopan wrote:

 On Wednesday, 19 February 2014 at 09:09:25 UTC, Tobias Pankrath wrote:
 On Wednesday, 19 February 2014 at 08:58:10 UTC, Gopan wrote:
 Recently, I came across the following thread
 http://forum.dlang.org/thread/l8nocr$1dv5$1 digitalmars.com?page=2

 On Monday, 16 December 2013 at 22:10:52 UTC, Andrei Alexandrescu wrote:
 On 12/16/13 1:45 PM, Walter Bright wrote:
 On 12/16/2013 12:38 PM, Andrei Alexandrescu wrote:
 uint among(T, Us...)(T v, Us vals)
 {
   foreach (i, U; Us)
   {
       if (v == vals[i]) return i + 1;
   }
   return 0;
 }
This has O(n) behavior, which might be unexpected for the user.
It's O(1) because the data size is in the source text. There's no variable n.
To me, it looks like a Linear Search and hence of complexity O(n). Could somebody please elaborate why it is O(1) and not O(n)? If there is an article/document that I am to read to understand this, please cite it. (I am a beginner level programmer and new to D Language.) Thanks, Gopan.
It's O(k) with k = vals.length, which is the number of arguments given to the function, which is a compile time constant and not dependent on any input of the final program. In O(n) the n always relates to the input somehow.
I agree that vals.length is known at compile time. But while running the application, it iterates through every val in the input list until a match is found. So, how can that be considered O(1)? See my test app and output. uint among(T, Us...)(T v, Us vals) { foreach (i, U; Us) { writeln("checking ", v, " == ", vals[i]); if (v == vals[i]) return i + 1; } return 0; } void main() { uint index = among(3, 1,2,5,3); writeln("Index of 3 in (1,2,5,3) is ", index); } outrput of running the app: checking 3 == 1 checking 3 == 2 checking 3 == 5 checking 3 == 3 Index of 3 in (1,2,5,3) is 4 Or, is my undertanding about Big-O notation of complexity wrong?
You get that output only because you have inserted the writeln() expressions in there. That foreach over variadic template arguments is a "compile-time foreach"[1]. For example, if v were 2 and vals were "1, 2, 3", the original code would be expanded to the following: if (2 == 1) return 1; // i == 0 if (2 == 2) return 2; // i == 1 if (2 == 3) return 3; // i == 3 Since all of those conditions is known to be true or false at compile time, the compiler will reduce the code to the following: return 2; Ali [1] http://ddili.org/ders/d.en/tuples.html
Feb 19 2014
next sibling parent "Dicebot" <public dicebot.lv> writes:
On Wednesday, 19 February 2014 at 19:45:39 UTC, Ali Çehreli wrote:
 ...
This is much better explanation - O(n) compile-time algorithm which results in O(1) run-time code generated :)
Feb 19 2014
prev sibling next sibling parent reply "anonymous" <anonymous example.com> writes:
On Wednesday, 19 February 2014 at 19:45:39 UTC, Ali Çehreli wrote:
 On 02/19/2014 01:46 AM, Gopan wrote:
[...]
 uint among(T, Us...)(T v, Us vals)
 {
      foreach (i, U; Us)
      {
      writeln("checking ", v, " == ", vals[i]);
          if (v == vals[i])
          return i + 1;
      }
      return 0;
 }
[...]
 You get that output only because you have inserted the 
 writeln() expressions in there.

 That foreach over variadic template arguments is a 
 "compile-time foreach"[1]. For example, if v were 2 and vals 
 were "1, 2, 3", the original code would be expanded to the 
 following:

     if (2 == 1) return 1;    // i == 0
     if (2 == 2) return 2;    // i == 1
     if (2 == 3) return 3;    // i == 3

 Since all of those conditions is known to be true or false at 
 compile time, the compiler will reduce the code to the 
 following:

     return 2;

 Ali

 [1] http://ddili.org/ders/d.en/tuples.html
Nope. v and vals are runtime parameters, their values are not known at compile time. What's known at compile time is vals.length.
Feb 19 2014
parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Wednesday, 19 February 2014 at 21:52:06 UTC, anonymous wrote:
 Nope. v and vals are runtime parameters, their values are not
 known at compile time. What's known at compile time is
 vals.length.
While this is correct, note that there is also an overload that takes `values` as compile-time parameters, generating a switch statement, which is especially interesting because the switch statement is a good opportunity for the compiler to optimise.
Feb 19 2014
prev sibling parent reply "Gopan" <gggopan gmail.com> writes:
On Wednesday, 19 February 2014 at 19:45:39 UTC, Ali Çehreli wrote:
 On 02/19/2014 01:46 AM, Gopan wrote:

 uint among(T, Us...)(T v, Us vals)
 {
      foreach (i, U; Us)
      {
          writeln("checking ", v, " == ", vals[i]);
          if (v == vals[i])
              return i + 1;
      }
      return 0;
 }

 void main()
 {
      uint index = among(3, 1,2,5,3);
      writeln("Index of 3 in (1,2,5,3) is ", index);
 }

 outrput of running the app:
 checking 3 == 1
 checking 3 == 2
 checking 3 == 5
 checking 3 == 3
 Index of 3 in (1,2,5,3) is 4

 Or, is my undertanding about Big-O notation of complexity
wrong? You get that output only because you have inserted the writeln() expressions in there. That foreach over variadic template arguments is a "compile-time foreach"[1]. For example, if v were 2 and vals were "1, 2, 3", the original code would be expanded to the following: if (2 == 1) return 1; // i == 0 if (2 == 2) return 2; // i == 1 if (2 == 3) return 3; // i == 3 Since all of those conditions is known to be true or false at compile time, the compiler will reduce the code to the following: return 2; Ali
Great !! Thank you Ali. I am convinced. You have just taught me my first lesson of CTFE. After removing writeln() from the template, the following statements compiled without error, which means 'n' got computed at compile time. const uint n = among(3, 1,2,5,3); //no error int[n] arr; //no error. And, int v = 3; const uint n = among(v, 1,2,5,3); //Error: variable v cannot be read at compile time int[n] arr; Error: Integer constant expression expected instead of among(v, 1, 2, 5, 3) My conclusion is that when both 'v' and 'vals' are known at compile time, the complexity is perfectly O(1). I would even like to say O(0) at runtime :) And, when either 'v' or 'vals' is not known at compile time, the complexity is O(vals.length) because the computation happens at runtime.
 [1] http://ddili.org/ders/d.en/tuples.html
I have already started learning your tutorial. It is great !! Thanks all who posted. I learnt from every post. I love this forum.
Feb 20 2014
parent reply "anonymous" <anonymous example.com> writes:
On Thursday, 20 February 2014 at 09:42:34 UTC, Gopan wrote:
 const uint n = among(3, 1,2,5,3); //no error
 int[n] arr; //no error.

 And,
 int v = 3;
 const uint n = among(v, 1,2,5,3); //Error: variable v cannot be 
 read at compile time
 int[n] arr; Error: Integer constant expression expected instead 
 of among(v, 1, 2, 5, 3)

 My conclusion is that when both 'v' and 'vals' are known at 
 compile time, the complexity is perfectly O(1).  I would even 
 like to say O(0) at runtime :)
 And, when either 'v' or 'vals' is not known at compile time, 
 the complexity is O(vals.length) because the computation 
 happens at runtime.
When v and vals are known at compile time, that opens up the possibility of CTFE. It's necessary but not sufficient. When a value is needed at compile time (e.g. initializer for a static variable, length of a static array), then CTFE is forced. When the value is not needed at compile time, the optimizer may go for it, but that's not guaranteed.
Feb 20 2014
parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Thursday, 20 February 2014 at 13:40:38 UTC, anonymous wrote:
 When the value is not needed at compile time, the optimizer may 
 go for it, but that's not guaranteed.
That's not really true. The optimizer will never try CTFE because of the halting problem. Runtime is runtime, compile-time is compile-time - they are clearly separated by the context of the expression alone.
Feb 20 2014
parent reply "anonymous" <anonymous example.com> writes:
On Thursday, 20 February 2014 at 17:08:11 UTC, Jakob Ovrum wrote:
 On Thursday, 20 February 2014 at 13:40:38 UTC, anonymous wrote:
 When the value is not needed at compile time, the optimizer 
 may go for it, but that's not guaranteed.
That's not really true. The optimizer will never try CTFE because of the halting problem. Runtime is runtime, compile-time is compile-time - they are clearly separated by the context of the expression alone.
LDC does it. The halting problem just dictates that it can't be done in all cases. test.d: --- int among(Value, Values...) (Value value, Values values) { foreach (i, v; values) if (value == v) return i + 1; return 0; } int main() { return among(3, 1,2,5,3); } --- `ldmd2 -O -c test.d && objdump -d -M intel test.o` (excerpt): --- 0000000000000000 <_Dmain>: 0: b8 04 00 00 00 mov eax,0x4 5: c3 ret 6: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0] d: 00 00 00 --- I.e. int main() {return 4;}
Feb 20 2014
parent "anonymous" <anonymous example.com> writes:
On Thursday, 20 February 2014 at 13:40:38 UTC, anonymous wrote:
 When a
 value is needed at compile time (e.g. initializer for a static
 variable, length of a static array), then CTFE is forced. When
 the value is not needed at compile time, the optimizer may go 
 for
 it, but that's not guaranteed.
On Thursday, 20 February 2014 at 21:23:08 UTC, anonymous wrote:
 On Thursday, 20 February 2014 at 17:08:11 UTC, Jakob Ovrum 
 wrote:
 On Thursday, 20 February 2014 at 13:40:38 UTC, anonymous wrote:
 When the value is not needed at compile time, the optimizer 
 may go for it, but that's not guaranteed.
That's not really true. The optimizer will never try CTFE because of the halting problem. Runtime is runtime, compile-time is compile-time - they are clearly separated by the context of the expression alone.
LDC does it. The halting problem just dictates that it can't be done in all cases.
While it's evaluation at compile time, that optimization should not be called "CTFE". The way I had worded it, it sounded like __ctfe may or may not be true, depending on the optimizer's mood. That's not so, of course.
Feb 21 2014
prev sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Wednesday, 19 February 2014 at 09:46:04 UTC, Gopan wrote:
 On Wednesday, 19 February 2014 at 09:09:25 UTC, Tobias Pankrath
 It's O(k) with k = vals.length, which is the number of 
 arguments given to the function, which is a compile time 
 constant and not dependent on any input of the final program. 
 In O(n) the n always relates to the input somehow.
I agree that vals.length is known at compile time. But while running the application, it iterates through every val in the input list until a match is found. So, how can that be considered O(1)? ... Or, is my undertanding about Big-O notation of complexity wrong?
Your understanding is probably fine. It's just one of those technicalities which is a bit misleading. The algorithm is O(vals.length), there's no arguing about that, but since vals.length is a constant, it's also O(1) because O(1) = O(2) = O(1,000,000) = O(k) for any constant k. If you want to get even more anal about it, searching an array is technically O(1) because an array cannot be bigger than size_t.max, and since size_t.max is a constant, O(size_t.max) is O(1). Again, completely misleading but technically correct in an esoteric, academic way.
Feb 19 2014
next sibling parent reply "Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:
On Wednesday, 19 February 2014 at 22:05:49 UTC, Peter Alexander 
wrote:
 If you want to get even more anal about it, searching an array 
 is technically O(1) because an array cannot be bigger than 
 size_t.max, and since size_t.max is a constant, O(size_t.max) 
 is O(1). Again, completely misleading but technically correct 
 in an esoteric, academic way.
That's basically saying that on any linear bouded automaton, all algorithms (except an infinite loop) are O(1) - which is silly. To the OP: The among() function is defined to template out into multiple versions of itself depending on the number of arguments, rather than using variadic parameter checking during runtime. When the template writes out the definition of the function, if the parameters can be interpreted during compile-time, then the foreach ( O(n) ) will be run by the compiler to detect the correct parameter and the function will be written out as just as a return statement in the body, without the loop. So during runtime, the function would have an O(1) runtime. If the parameters can't be determined during compile-time then the function will be written out with the contents of the foreach copied into the body as many times as there are parameters, making the runtime O(n), where n is the number of parameters to the function. Where people are getting a bit technical is that, because the function is generated by a template the following two calls: among("a", "b", "c"); among("a", "b", "c", "d"); end up declaring two separate functions, one which accepts three parameters and the other which accepts four. The version which accepts three parameters will always run the same speed no matter what you call it - since it does the same thing exactly three times. The version which accepts four parameters is the same. Since they don't have any logic - they just run through a constant list of operations with no loop - one can say that they are O(1). Technically, it's true, but also a mildly pointless distinction to make.
Feb 19 2014
parent reply "anonymous" <anonymous example.com> writes:
On Wednesday, 19 February 2014 at 22:46:45 UTC, Chris Williams
wrote:
 When the template writes out the definition of the function, if 
 the parameters can be interpreted during compile-time, then the 
 foreach ( O(n) ) will be run by the compiler to detect the 
 correct parameter and the function will be written out as just 
 as a return statement in the body, without the loop. So during 
 runtime, the function would have an O(1) runtime.
That's an optimization the compiler may do, but it's not guaranteed. And `among` is not special in this regard. [...]
 The version which accepts three parameters will always run the 
 same speed no matter what you call it - since it does the same 
 thing exactly three times.
Well, it returns as soon as the value is found. So among(v, "b", "c") finishes quicker when v is "b" than when it's "c" or "a". It only does the same thing three times when it didn't find the value after two times.
Feb 19 2014
parent reply "Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:
On Thursday, 20 February 2014 at 00:36:38 UTC, anonymous wrote:
 On Wednesday, 19 February 2014 at 22:46:45 UTC, Chris Williams
 wrote:
 When the template writes out the definition of the function, 
 if the parameters can be interpreted during compile-time, then 
 the foreach ( O(n) ) will be run by the compiler to detect the 
 correct parameter and the function will be written out as just 
 as a return statement in the body, without the loop. So during 
 runtime, the function would have an O(1) runtime.
That's an optimization the compiler may do, but it's not guaranteed. And `among` is not special in this regard.
I think "expected to do" is probably the better phrasing at the moment. Traits/annotations would be largely unusable unless foreach was able to behave as a static construct. And since there is no way for one to declare "static foreach", the only way foreach can be used for running over traits, is for the compiler to detect constant values in the parameters to the statement.
Feb 19 2014
parent reply "anonymous" <anonymous example.com> writes:
On Thursday, 20 February 2014 at 01:41:25 UTC, Chris Williams
wrote:
 On Thursday, 20 February 2014 at 00:36:38 UTC, anonymous wrote:
 On Wednesday, 19 February 2014 at 22:46:45 UTC, Chris Williams
 wrote:
 When the template writes out the definition of the function, 
 if the parameters can be interpreted during compile-time, 
 then the foreach ( O(n) ) will be run by the compiler to 
 detect the correct parameter and the function will be written 
 out as just as a return statement in the body, without the 
 loop. So during runtime, the function would have an O(1) 
 runtime.
That's an optimization the compiler may do, but it's not guaranteed. And `among` is not special in this regard.
I think "expected to do" is probably the better phrasing at the moment. Traits/annotations would be largely unusable unless foreach was able to behave as a static construct. And since there is no way for one to declare "static foreach", the only way foreach can be used for running over traits, is for the compiler to detect constant values in the parameters to the statement.
The foreach is definitely always evaluated at compile time. That's independent of the values being known statically. It generates a couple of if statements. But optimizing those if statements away for static values is not guaranteed or expected. In other words, with code, this: --- uint among(Value, Values...) (Value value, Values values) { foreach (i, v; values) if (value == v) return i + 1; return 0; } auto foo = among("a", "b", "c"); --- definitely boils down to something like this: --- uint among_string_string_string(string value, string v0, string v2) { if(value == v0) return 1; if(value == v1) return 2; return 0; } auto foo = among_string_string_string("a", "b", "c"); --- But it's a mere optimization to get to this: --- auto foo = 0; ---
Feb 19 2014
parent "Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:
On Thursday, 20 February 2014 at 02:10:50 UTC, anonymous wrote:
 The foreach is definitely always evaluated at compile time.
 That's independent of the values being known statically. It
 generates a couple of if statements. But optimizing those if
 statements away for static values is not guaranteed or expected.
Ah, roger.
Feb 19 2014
prev sibling parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
On Wednesday, 19 February 2014 at 22:05:49 UTC, Peter Alexander 
wrote:
 O(1) = O(2) = O(1,000,000) = O(k) for any constant k.

 If you want to get even more anal about it, searching an array 
 is technically O(1) because an array cannot be bigger than 
 size_t.max, and since size_t.max is a constant, O(size_t.max) 
 is O(1). Again, completely misleading but technically correct 
 in an esoteric, academic way.
That's a useless oversimplification since for all sizes lesser than size_t.max there is a better input-length depending bound for the complexity. Just like every algorithm that is in O(n) is also in O(n^3), but usually you are interested • in the smallest upper bound • that depends on the input. The input dependency is important because it lets us reason about how the algorithm might scale. Now, let n be the size of input, for example in this application, n is input.length. -- int[] input = getInputFromSomeWhere(); foreach(i; input) { if(i.among!(1, 42, 12)) { writeln(i); break; } } -- Now, among does not depend of input.length and it will never do in any application. That is the sole reason I say it's O(1).
Feb 20 2014
parent "Tobias Pankrath" <tobias pankrath.net> writes:
 }
 --

 Now, among does not depend of input.length and it will never do 
 in any application. That is the sole reason I say it's O(1).
The application is of course O(n).
Feb 20 2014