
digitalmars.D - chevrons for D templates in emacs

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
With the help of Pascal J. Bourguignon on the gnu.emacs.help newsgroup I 
got some chevrons working for D templates. The current version has quite 
a few shortcomings (no proper nesting, and the electric behavior could 
be better), but it does get us started towards a comprehensive solution.

Paste this into your ~/.emacs (assuming you already use Bill Baxter's 
d-mode and of course emacs):

(defun d-chevrons ()
   (font-lock-add-keywords
    nil '(("\\(!(\\)[^()]*\\()\\)"
           0 (progn
               (compose-region (match-beginning 1) (match-end 1)
                               ?« 'decompose-region)
               (compose-region (match-beginning 2) (match-end 2)
                               ?» 'decompose-region)
               nil))))
   (font-lock-mode 1))

(add-hook 'd-mode-hook 'd-chevrons)
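
To illustrate the effect (a hypothetical snippet, not from the original 
post): with the hook active, a template instantiation is merely displayed 
with chevrons; the file contents on disk are unchanged:

// What you type, and what is saved to disk:
Stack!(int) s;

// What the buffer displays with d-chevrons active:
Stack«int» s;

Note that because the pattern body is [^()]*, a nested instantiation such 
as Foo!(Bar!(int)) is not composed, which is the "no proper nesting" 
shortcoming mentioned above.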


The result looks pretty darn yummy.

Andrei
Oct 12 2008
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
Oct 12 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o)

Speaking of which, here's another challenge for everybody who'd like to waste some cycles.

Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments.

Andrei
Oct 12 2008
Anon <no one.com> writes:
Andrei Alexandrescu Wrote:

 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

Implement a min-heap on top of a fixed-size array, holding the 1,000,000 largest files seen so far. "Insert" all pairs into the heap; if the heap is full and the current file is smaller than everything in it, the pair is discarded in O(1). Each insert (and each eviction of the smallest) is O(log(heap size)) = O(log 1,000,000) per file in Google's cache, and peeking at the top is O(1).
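
A minimal D sketch of this scheme (editorial; Entry, BoundedMinHeap, and the 
sample data are hypothetical names, and the real API is stubbed out with an 
array):

import std.stdio;

struct Entry { string name; ulong size; }

// Fixed-capacity min-heap keyed on size: the root is the smallest file
// kept so far, so it is the one to evict when a bigger file shows up.
struct BoundedMinHeap {
    Entry[] a;
    size_t n;

    this(size_t capacity) { a = new Entry[capacity]; }

    // Offer a candidate; keeps only the `capacity` largest seen so far.
    void offer(Entry e) {
        if (n < a.length) {                 // not full yet: sift up
            size_t i = n++;
            while (i > 0 && a[(i - 1) / 2].size > e.size) {
                a[i] = a[(i - 1) / 2];
                i = (i - 1) / 2;
            }
            a[i] = e;
        } else if (e.size > a[0].size) {    // beats the current minimum
            a[0] = e;
            siftDown(0);
        }                                   // else: discard in O(1)
    }

    // Remove and return the smallest kept entry in O(log n).
    Entry popMin() {
        auto top = a[0];
        a[0] = a[--n];
        siftDown(0);
        return top;
    }

    private void siftDown(size_t i) {
        for (;;) {
            size_t c = 2 * i + 1;
            if (c >= n) break;
            if (c + 1 < n && a[c + 1].size < a[c].size) c++;
            if (a[i].size <= a[c].size) break;
            auto t = a[i]; a[i] = a[c]; a[c] = t;
            i = c;
        }
    }
}

void main() {
    auto h = BoundedMinHeap(3);              // stand-in for 1_000_000
    foreach (e; [Entry("a", 5), Entry("b", 9), Entry("c", 1),
                 Entry("d", 7), Entry("e", 3)])
        h.offer(e);

    Entry[] result;                          // drain ascending...
    while (h.n > 0) result ~= h.popMin();
    foreach_reverse (e; result)              // ...print descending
        writefln("file: %s, size: %s", e.name, e.size);
}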
Oct 12 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Anon wrote:
 Andrei Alexandrescu Wrote:
 
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

grousing about !( ) after that!!!

Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

Implement a min-heap on top of a fixed-size array, holding the 1,000,000 largest files seen so far. "Insert" all pairs into the heap; if the heap is full and the current file is smaller than everything in it, the pair is discarded in O(1). Each insert (and each eviction of the smallest) is O(log(heap size)) = O(log 1,000,000) per file in Google's cache, and peeking at the top is O(1).

A winner, but gets penalty points for not admitting to being superdan :o).

Andrei
Oct 12 2008
Janderson <ask me.com> writes:
Anon wrote:
 Andrei Alexandrescu Wrote:
 
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

grousing about !( ) after that!!!

Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

Implement a min-heap on top of a fixed-size array, holding the 1,000,000 largest files seen so far. "Insert" all pairs into the heap; if the heap is full and the current file is smaller than everything in it, the pair is discarded in O(1). Each insert (and each eviction of the smallest) is O(log(heap size)) = O(log 1,000,000) per file in Google's cache, and peeking at the top is O(1).

This is similar to a trick I've used in the past for A*. Basically you only care about the top item(s), so you only sort those. With A*, of course, there's the potential that you might run out of top items. However, items that were candidates for the queue but were pushed out are already partially sorted, so you can save some time there too.

-Joel
Oct 12 2008
Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

I smell a clever trick to divert us all from rolling our eyes about the emacs chevron trick :-)

Fine by me. I'll bite...

auto heap = new BinaryHeap!(File, ulong)(1_000_000);

while (googleApi.hasMoreFiles()) {
   File f = googleApi.getNextFile();
   ulong size = googleApi.sizeOfFile(f);
   heap.add(f, size);
}

while (heap.size() > 0) {
   ulong size = heap.getTopValue();
   File f = heap.removeTopItem();
   Stdout.formatln("file: {0}, size: {1}", f.getName(), size);
}

The BinaryHeap class I'm using is described pretty well here:

http://en.wikipedia.org/wiki/Binary_heap

Items with the highest value automatically bubble to the top of the heap, and items with lower values sink to the bottom, eventually falling completely out of the collection.

The performance is n log m, where 'n' is the total number of documents, and 'm' is the number of <filename, size> pairs that we care about (in this case 1,000,000).

--benji
Oct 12 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

I smell a clever trick to divert us all from rolling our eyes about the emacs chevron trick :-)

Fine by me. I'll bite...

auto heap = new BinaryHeap!(File, ulong)(1_000_000);

while (googleApi.hasMoreFiles()) {
   File f = googleApi.getNextFile();
   ulong size = googleApi.sizeOfFile(f);
   heap.add(f, size);
}

while (heap.size() > 0) {
   ulong size = heap.getTopValue();
   File f = heap.removeTopItem();
   Stdout.formatln("file: {0}, size: {1}", f.getName(), size);
}

The BinaryHeap class I'm using is described pretty well here:

http://en.wikipedia.org/wiki/Binary_heap

Items with the highest value automatically bubble to the top of the heap, and items with lower values sink to the bottom, eventually falling completely out of the collection.

The performance is n log m, where 'n' is the total number of documents, and 'm' is the number of <filename, size> pairs that we care about (in this case 1,000,000).

--benji

Winner! One nice touch is that you don't even bother to sort the heap; you only remove from it. I wonder if this will be sensibly faster overall than just heapsorting the heap. Thoughts?

Andrei
Oct 12 2008
Benji Smith <dlanguage benjismith.net> writes:

Andrei Alexandrescu wrote:
 One nice touch is that you don't even bother to sort the heap; you only 
 remove from it. I wonder if this will be overall sensibly faster than 
 just heapsorting the heap. Thoughts?

I should have mentioned... The heap doesn't need to be sorted. Anytime you remove an element from the heap, it's guaranteed to have the highest value. That's the nice thing about the heap. Multiple consecutive removals of the topmost item are functionally identical to sorting it in descending order.

Of course, every time you remove the topmost item, it has a cost of log n, where n is the number of remaining items in the heap. But sorting the whole heap would cost n log n anyhow, so the iterative removal process is actually slightly less expensive (since n decreases with every iteration).

I actually don't have the code implemented in D, but I've attached my Java implementation, so you can get an idea of how it works.

--benji
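
To put a number on "slightly less expensive" (an editorial aside, not from the original post): draining all n items costs

\sum_{k=1}^{n} \log_2 k \;=\; \log_2 n! \;\approx\; n \log_2 n - n \log_2 e

by Stirling's approximation, so repeated removal does roughly n log2(e), about 1.44 n, comparisons less work than the flat n log2(n) estimate, in line with Benji's reasoning.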
Oct 12 2008
Christopher Wright <dhasenan gmail.com> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 One nice touch is that you don't even bother to sort the heap; you 
 only remove from it. I wonder if this will be overall sensibly faster 
 than just heapsorting the heap. Thoughts?

I should have mentioned... The heap doesn't need to be sorted. Anytime you remove an element from the heap, it's guaranteed to have the highest value. That's the nice thing about the heap. Multiple consecutive removals of the topmost item are functionally identical to sorting it in descending order. Of course, every time you remove the topmost item, it has a cost of log n, where n is the number of remaining items in the heap. But sorting the whole heap would cost n log n anyhow, so the iterative removal process is actually slightly less expensive (since n decreases with every iteration). I actually don't have the code implemented in D, but I've attached my Java implementation, so you can get an idea of how it works. --benji

Neither Tango nor Phobos contains a heap implementation. I've submitted one to Tango, but I don't think anyone's done anything about it.

My implementation's rather smaller, but that may be due to the number of comments yours has; the comment lines and code lines are approaching parity in yours. I believe that mine also releases memory if you add a lot to it and subsequently remove a lot from it.
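
How a heap might release memory on removal isn't shown in the thread; here 
is a sketch of one common policy (purely an assumption, not Christopher's 
actual code): shrink the backing array once occupancy falls below a quarter 
of its capacity, which keeps pops amortized at O(log n):

// Hypothetical shrink step at the end of a pop() routine,
// given fields `Entry[] a` (storage) and `size_t n` (live count):
if (n <= a.length / 4 && a.length > 16) {
    a = a[0 .. a.length / 2].dup;  // copy into a smaller allocation;
                                   // the GC can then reclaim the old block
}

The quarter/half gap prevents thrashing: a push right after a shrink will 
not immediately force a grow.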
Oct 13 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Benji Smith wrote:
 Andrei Alexandrescu wrote:
 One nice touch is that you don't even bother to sort the heap; you 
 only remove from it. I wonder if this will be overall sensibly faster 
 than just heapsorting the heap. Thoughts?

I should have mentioned... The heap doesn't need to be sorted. Anytime you remove an element from the heap, it's guaranteed to have the highest value. That's the nice thing about the heap. Multiple consecutive removals of the topmost item are functionally identical to sorting it in descending order. Of course, every time you remove the topmost item, it has a cost of log n, where n is the number of remaining items in the heap. But sorting the whole heap would cost n log n anyhow, so the iterative removal process is actually slightly less expensive (since n decreases with every iteration). I actually don't have the code implemented in D, but I've attached my Java implementation, so you can get an idea of how it works. --benji

Neither Tango nor Phobos contains a heap implementation. I've submitted one to Tango, but I don't think anyone's done anything about it.

There are heap primitives in std.algorithm, just not published yet. Anyhow, in case you or anyone would be interested in putting your ideas in Phobos, I encourage you to share some code with me so I can look over it.

Andrei
Oct 13 2008
Klaus Oberhofer <kobi goppertsweiler.de> writes:
Andrei Alexandrescu wrote:
 There are heap primitives in std.algorithm, just not published yet. 
 Anyhow, in case you or anyone would be interested in putting your 
 ideas in Phobos, I encourage you to share some code with me so I can 
 look over it.

 Andrei

Some time ago I implemented a binary heap because I needed it to create Huffman codes. (Warning: the code may not be up to date with the current Tango library.) Parts of the code are inspired by the MINTL library, which contains an implementation too. See my code at

http://www.dsource.org/projects/nova/browser/trunk/nova/ds/priorityqueue.d

MINTL seems to be abandoned, but I found it in the necrophilia repos:

http://necrophilia.googlecode.com/svn/trunk/tests/shared/mintl/arrayheap.d

KlausO
Oct 13 2008
Christopher Wright <dhasenan gmail.com> writes:
Klaus Oberhofer wrote:
 Andrei Alexandrescu wrote:
 There are heap primitives in std.algorithm, just not published yet. 
 Anyhow, in case you or anyone would be interested in putting your 
 ideas in Phobos, I encourage you to share some code with me so I can 
 look over it.

 Andrei

Some time ago I implemented a binary heap because I needed it to create Huffman codes. (Warning: the code may not be up to date with the current Tango library.) Parts of the code are inspired by the MINTL library, which contains an implementation too. See my code at http://www.dsource.org/projects/nova/browser/trunk/nova/ds/priorityqueue.d MINTL seems to be abandoned, but I found it in the necrophilia repos: http://necrophilia.googlecode.com/svn/trunk/tests/shared/mintl/arrayheap.d KlausO

There isn't a lot you can do to mess up a heap, but there are some minor issues with yours:

1. It only has one direction (which isn't documented).
2. It has a key/value pair as its element type.
3. It's only keyed on ints.

My heap was apparently accepted into Tango last week, with some minor fixes and cleanup (and switched into a struct rather than a class; not sure I like that). The only issue I see with the fixed version is that it uses recursion in a couple of places where it could loop instead.

Anyway, the code's here:

http://dsource.org/projects/tango/browser/trunk/tango/util/container/more/Heap.d?rev=3959
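
On the recursion point: a heap's sift-down is tail-recursive, so turning it 
into a loop is a mechanical rewrite. Both snippets below are editorial 
sketches of a max-heap sift-down, not Tango's actual code:

// Recursive form (the tail call is the last thing that happens):
void siftDownRec(T)(T[] a, size_t i, size_t n) {
    size_t c = 2 * i + 1;
    if (c >= n) return;
    if (c + 1 < n && a[c + 1] > a[c]) c++;   // pick the larger child
    if (a[i] >= a[c]) return;
    auto t = a[i]; a[i] = a[c]; a[c] = t;
    siftDownRec(a, c, n);
}

// The same routine with the tail call turned into a loop:
void siftDownLoop(T)(T[] a, size_t i, size_t n) {
    for (;;) {
        size_t c = 2 * i + 1;
        if (c >= n) return;
        if (c + 1 < n && a[c + 1] > a[c]) c++;
        if (a[i] >= a[c]) return;
        auto t = a[i]; a[i] = a[c]; a[c] = t;
        i = c;
    }
}

Since tail-call elimination isn't guaranteed by the compiler, the loop form 
also avoids the O(log n) stack frames.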
Oct 13 2008
KlausO <oberhofer users.sf.net> writes:
Christopher Wright wrote:
 
 There isn't a lot you can do to mess up a heap, but there are some minor 
 issues with yours:
 1. It only has one direction (which isn't documented).
 2. It has a key/value pair as its element type.
 3. It's only keyed on ints.

I admit, the implementation has a special purpose: it sorts codes by their probability to create Huffman codes. All three points are owed to that context. So the class name "PriorityQueue" is misleading; maybe I should have chosen a better name like "CodeByProbabilitySortingPriorityQueue" :)
 
 My heap was apparently accepted into Tango last week, with some minor 
 fixes and cleanup (and switched into a struct rather than a class; not 
 sure I like that). The only issue I see with the fixed version is that 
 it uses recursion in a couple places where it could loop instead.
 
 Anyway, the code's here: 
 http://dsource.org/projects/tango/browser/trunk/tango/util/container/more/Heap.d?rev=3959 
 

Thank you for the link. I'll try to use your general-purpose implementation.

KlausO
Oct 14 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Sun, 12 Oct 2008 19:11:00 -0500,
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments.

A straightforward solution would be putting name-size pairs into a tree sorted by size; with the tree capped at a million elements, each insertion is O(log 1e6). Whenever the tree grows past a million elements, remove the smallest, which is also O(log 1e6). You can optimize it by skipping elements smaller than the smallest element in the tree whenever the tree is full.

I haven't found any trees in Phobos though. Am I missing something?
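
For what it's worth, Phobos has since gained std.container.rbtree (it wasn't 
there when this was posted); here is an editorial sketch of the capped-tree 
idea using it, with Entry a hypothetical name/size pair and a tiny cap 
standing in for 1,000,000:

import std.container.rbtree;
import std.stdio;

struct Entry { string name; ulong size; }

void main() {
    // Ordered by size, duplicates allowed; front is the smallest kept.
    auto tree = new RedBlackTree!(Entry, "a.size < b.size", true);
    enum cap = 3;

    foreach (e; [Entry("a", 5), Entry("b", 9), Entry("c", 1),
                 Entry("d", 7), Entry("e", 3)]) {
        if (tree.length < cap) {
            tree.insert(e);                     // O(log cap)
        } else if (tree.front.size < e.size) {  // beats the smallest kept
            tree.removeFront();                 // O(log cap)
            tree.insert(e);
        }                                       // else: skip in O(1)
    }

    foreach_reverse (e; tree[])                 // descending by size
        writefln("file: %s, size: %s", e.name, e.size);
}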
Oct 12 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Sun, 12 Oct 2008 19:11:00 -0500,
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

grousing about !( ) after that!!!

Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments.

A straightforward solution would be putting name-size pairs into a tree sorted by size; with the tree capped at a million elements, each insertion is O(log 1e6). Whenever the tree grows past a million elements, remove the smallest, which is also O(log 1e6). You can optimize it by skipping elements smaller than the smallest element in the tree whenever the tree is full. I haven't found any trees in Phobos though. Am I missing something?

The trick here was to figure out that you don't really need a sorted container; an "almost sorted" container suffices, and a heap is exactly what the doctor prescribed.

Andrei
Oct 12 2008
KennyTM~ <kennytm gmail.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

Straightforward solution: use a priority queue.

auto list = new PriorityQueue!(Documents);
// Documents.opCmp returns the difference of two documents' lengths.
// The SHORTEST document gets priority.

foreach (doc; google_cache) {      // O(N)
  if (list.length >= 1_000_000)
    list.pop();                    // O(log 1M)
  list.push(doc);                  // O(log 1M)
}

auto array = convertToArray(list); // O(1M log 1M)
array.reverse();                   // O(1M)

write(array);                      // O(1M)

Total time complexity: O(N log 1M).
Oct 12 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

Straightforward solution: use a priority queue.

auto list = new PriorityQueue!(Documents);

One point taken off for not using the new syntax PriorityQueue!Documents. Ahem.

On a serious note, you should specify the unusual comparison predicate:

auto list = new PriorityQueue!(Documents, "a > b");

as I supposed the default version would use less-than. Or does it?
 // Documents.opCmp returns difference of two documents' length.
 // the SHORTEST document gets priority.
 
 foreach (doc; google_cache) {     // O(N)
   if (list.length >= 1_000_000)
     list.pop();                   // O(log 1M)
   list.push(doc);                 // O(log 1M)
 }
 
 auto array = convertToArray(list); // O(1M log 1M)
 array.reverse();                   // O(1M)
 
 write(array);                      // O(1M).
 
 Total time complexity: O(N log 1M).

Great! I'm glad there's so much interest and competency around here.

Andrei
Oct 12 2008
KennyTM~ <kennytm gmail.com> writes:
Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

Straightforward solution: use a priority queue.

auto list = new PriorityQueue!(Documents);

One point taken off for not using the new syntax PriorityQueue!Documents. Ahem.

On a serious note, you should specify the unusual comparison predicate:

Will use it when it's announced :)
 auto list = new PriorityQueue!(Documents, "a > b");
 
 as I supposed the default version would use less-than. Or does it?
 

I don't know :). I haven't written a priority queue in D (only stacks and queues) :)
 // Documents.opCmp returns difference of two documents' length.
 // the SHORTEST document gets priority.

 foreach (doc; google_cache) {     // O(N)
   if (list.length >= 1_000_000)
     list.pop();                   // O(log 1M)
   list.push(doc);                 // O(log 1M)
 }

 auto array = convertToArray(list); // O(1M log 1M)
 array.reverse();                   // O(1M)

 write(array);                      // O(1M).

 Total time complexity: O(N log 1M).

Great! I'm glad there's so much interest and competency around here.

Andrei

Oct 12 2008
KennyTM~ <kennytm gmail.com> writes:
KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"

I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!

I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if it fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs, with practical and/or theoretical arguments. Andrei

Straightforward solution: use a priority queue.

auto list = new PriorityQueue!(Documents);
// Documents.opCmp returns the difference of two documents' lengths.
// The SHORTEST document gets priority.

foreach (doc; google_cache) {      // O(N)
  if (list.length >= 1_000_000)
    list.pop();                    // O(log 1M)
  list.push(doc);                  // O(log 1M)
}

auto array = convertToArray(list); // O(1M log 1M)
array.reverse();                   // O(1M)

write(array);                      // O(1M)

Total time complexity: O(N log 1M).

Sorry, wrong solution. Forgot to compare before pop.

auto list = new PriorityQueue!(Documents, "a < b");
// Documents.opCmp returns the difference of two documents' lengths.
// The SHORTEST document gets priority.

foreach (doc; google_cache) {      // O(N)
  if (list.length >= 1_000_000) {
    if (list.top() >= doc)
      continue;                    // doc can't be among the 1M largest
    list.pop();                    // O(log 1M)
  }
  list.push(doc);                  // O(log 1M)
}

auto array = convertToArray(list); // O(1M log 1M)
array.reverse();                   // O(1M)

write(array);                      // O(1M)

Total time complexity: O(N log 1M).
Oct 12 2008