www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - chevrons for D templates in emacs

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
With the help of Pascal J. Bourguignon on the gnu.emacs.help newsgroup I 
got some chevrons working for D templates. The current version has quite 
a few shortcomings (no proper nesting and electric behavior could be 
better) but it does get us started towards finding a comprehensive 
solution.

Paste this into your ~/.emacs (assuming you already use Bill Baxter's 
d-mode and of course emacs):

(defun d-chevrons ()
   (font-lock-add-keywords
    nil '(("\\(!(\\)[^()]*\\()\\)"
           0 (progn
               (compose-region (match-beginning 1) (match-end 1)
                               ?« 'decompose-region)
                               (compose-region (match-beginning 2) 
(match-end 2)
                                               ?» 'decompose-region)
               nil))))
   (font-lock-mode 1))

(add-hook 'd-mode-hook 'd-chevrons)


The result looks pretty darn yummy.

Andrei
Oct 12 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
Oct 12 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
Oct 12 2008
next sibling parent reply Anon <no one.com> writes:
Andrei Alexandrescu Wrote:

 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
implement a max heap on top of a fixed size array. "insert" all pairs to the heap (if the current is smaller than everything on the heap and the heap is full the pair is discarded) insert is O(log(heap height)) =O(log (1_000_000)) for each file in google's cache, remove top is O(1).
Oct 12 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Anon wrote:
 Andrei Alexandrescu Wrote:
 
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
implement a max heap on top of a fixed size array. "insert" all pairs to the heap (if the current is smaller than everything on the heap and the heap is full the pair is discarded) insert is O(log(heap height)) =O(log (1_000_000)) for each file in google's cache, remove top is O(1).
A winner but gets penalty points for not admitting being superdan :o). Andrei
Oct 12 2008
prev sibling parent Janderson <ask me.com> writes:
Anon wrote:
 Andrei Alexandrescu Wrote:
 
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
implement a max heap on top of a fixed size array. "insert" all pairs to the heap (if the current is smaller than everything on the heap and the heap is full the pair is discarded) insert is O(log(heap height)) =O(log (1_000_000)) for each file in google's cache, remove top is O(1).
This is a similar trick I've used in the past for A*. Basically you only care about the top item(s) so you only sort them those. With A* of course there's the potential that you might run out of top items. However items that where candidates for the queue but where pushed out are already partially sorted so you can save some cycles time there too. -Joel
Oct 12 2008
prev sibling next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
I smell a clever trick to divert us all from rolling our eyes about the emacs chevron trick :-) Fine by me. I'll bite... auto heap = new BinaryHeap!(File, ulong)(1_000_000); while (googleApi.hasMoreFiles()) { File f = googleApi.getNextFile(); ulong size = googleApi.sizeOfFile(f); heap.add(f, size); } while (heap.size() > 0) { ulong size = heap.getTopValue(); File f = heap.removeTopItem(); Stdout.formatln("file: {0}, size: {1}", f.getName(), size); } The BinaryHeap class I'm using is described pretty well here: http://en.wikipedia.org/wiki/Binary_heap Items with the highest value automatically bubble to the top of the heap, and items with lower values sink to the bottom, eventually falling completely out of the collection. The performance is n log m, where 'n' is the total number of documents, and 'm' is the number of <filename, size> pairs that we care about (in this case 1,000,000). --benji
Oct 12 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
I smell a clever trick to divert us all from rolling our eyes about the emacs chevron trick :-) Fine by me. I'll bite... auto heap = new BinaryHeap!(File, ulong)(1_000_000); while (googleApi.hasMoreFiles()) { File f = googleApi.getNextFile(); ulong size = googleApi.sizeOfFile(f); heap.add(f, size); } while (heap.size() > 0) { ulong size = heap.getTopValue(); File f = heap.removeTopItem(); Stdout.formatln("file: {0}, size: {1}", f.getName(), size); } The BinaryHeap class I'm using is described pretty well here: http://en.wikipedia.org/wiki/Binary_heap Items with the highest value automatically bubble to the top of the heap, and items with lower values sink to the bottom, eventually falling completely out of the collection. The performance is n log m, where 'n' is the total number of documents, and 'm' is the number of <filename, size> pairs that we care about (in this case 1,000,000). --benji
Winner! One nice touch is that you don't even bother to sort the heap; you only remove from it. I wonder if this will be overall sensibly faster than just heapsorting the heap. Thoughts? Andrei
Oct 12 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 One nice touch is that you don't even bother to sort the heap; you only 
 remove from it. I wonder if this will be overall sensibly faster than 
 just heapsorting the heap. Thoughts?
I should have mentioned... The heap doesn't need to be sorted. Anytime you remove an element from the heap, it's guaranteed to have the highest value. That's the nice thing about the heap. Multiple consecutive removals of the topmost item is functionally identical to sorting it in descending order. Of course, every time you remove the topmost item, it has a cost of log n, where n is the number of remaining items in the heap. But sorting the whole heap would cost n log n anyhow, so the iterative removal process is actually slightly less expensive (since n decreases with every iteration). I actually don't have the code implemented in D, but I've attached my Java implementation, so you can get an idea of how it works. --benji
Oct 12 2008
parent reply Christopher Wright <dhasenan gmail.com> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 One nice touch is that you don't even bother to sort the heap; you 
 only remove from it. I wonder if this will be overall sensibly faster 
 than just heapsorting the heap. Thoughts?
I should have mentioned... The heap doesn't need to be sorted. Anytime you remove an element from the heap, it's guaranteed to have the highest value. That's the nice thing about the heap. Multiple consecutive removals of the topmost item is functionally identical to sorting it in descending order. Of course, every time you remove the topmost item, it has a cost of log n, where n is the number of remaining items in the heap. But sorting the whole heap would cost n log n anyhow, so the iterative removal process is actually slightly less expensive (since n decreases with every iteration). I actually don't have the code implemented in D, but I've attached my Java implementation, so you can get an idea of how it works. --benji
Neither Tango nor Phobos contains a heap implementation. I've submitted one to Tango, but I don't think anyone's done anything about it. My implementation's rather smaller, but that may be due to the number of comments yours has. The comment lines and code lines are approaching parity in yours. I believe that mine also releases memory if you add a lot to it and subsequently remove a lot from it.
Oct 13 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Benji Smith wrote:
 Andrei Alexandrescu wrote:
 One nice touch is that you don't even bother to sort the heap; you 
 only remove from it. I wonder if this will be overall sensibly faster 
 than just heapsorting the heap. Thoughts?
I should have mentioned... The heap doesn't need to be sorted. Anytime you remove an element from the heap, it's guaranteed to have the highest value. That's the nice thing about the heap. Multiple consecutive removals of the topmost item is functionally identical to sorting it in descending order. Of course, every time you remove the topmost item, it has a cost of log n, where n is the number of remaining items in the heap. But sorting the whole heap would cost n log n anyhow, so the iterative removal process is actually slightly less expensive (since n decreases with every iteration). I actually don't have the code implemented in D, but I've attached my Java implementation, so you can get an idea of how it works. --benji
Neither Tango nor Phobos contains a heap implementation. I've submitted one to Tango, but I don't think anyone's done anything about it.
There are heap primitives in std.algorithm, just not published yet. Anyhow, in case you or anyone would be interested in putting your ideas in Phobos, I encourage you to share some code with me so I can look over it. Andrei
Oct 13 2008
parent reply Klaus Oberhofer <kobi goppertsweiler.de> writes:
Andrei Alexandrescu schrieb:
 There are heap primitives in std.algorithm, just not published yet. 
 Anyhow, in case you or anyone would be interested in putting your 
 ideas in Phobos, I encourage you to share some code with me so I can 
 look over it.

 Andrei
Some time ago I implemented a binary heap because I needed it to create Huffman codes (Warning: Maybe the code does not catch up to the current Tango library.). Parts of the code are inspired by the MINTL library, which contains an implementation, too. See my code at http://www.dsource.org/projects/nova/browser/trunk/nova/ds/priorityqueue.d MINTL seems to be abandoned, but I found it in the necrophilia repos: http://necrophilia.googlecode.com/svn/trunk/tests/shared/mintl/arrayheap.d KlausO
Oct 13 2008
parent reply Christopher Wright <dhasenan gmail.com> writes:
Klaus Oberhofer wrote:
 Andrei Alexandrescu schrieb:
 There are heap primitives in std.algorithm, just not published yet. 
 Anyhow, in case you or anyone would be interested in putting your 
 ideas in Phobos, I encourage you to share some code with me so I can 
 look over it.

 Andrei
Some time ago I implemented a binary heap because I needed it to create Huffman codes (Warning: Maybe the code does not catch up to the current Tango library.). Parts of the code are inspired by the MINTL library, which contains an implementation, too. See my code at http://www.dsource.org/projects/nova/browser/trunk/nova/ds/priorityqueue.d MINTL seems to be abandoned, but I found it in the necrophilia repos: http://necrophilia.googlecode.com/svn/trunk/tests/shared/mintl/arrayheap.d KlausO
There isn't a lot you can do to mess up a heap, but there are some minor issues with yours: 1. It only has one direction (which isn't documented). 2. It has a key/value pair as its element type. 3. It's only keyed on ints. My heap was apparently accepted into Tango last week, with some minor fixes and cleanup (and switched into a struct rather than a class; not sure I like that). The only issue I see with the fixed version is that it uses recursion in a couple places where it could loop instead. Anyway, the code's here: http://dsource.org/projects/tango/browser/trunk/tango/util/container/more/Heap.d?rev=3959
Oct 13 2008
parent KlausO <oberhofer users.sf.net> writes:
Christopher Wright schrieb:
 
 There isn't a lot you can do to mess up a heap, but there are some minor 
 issues with yours:
 1. It only has one direction (which isn't documented).
 2. It has a key/value pair as its element type.
 3. It's only keyed on ints.
I admit, the implementation has a special purpose. It sorts codes by their propability to create Huffman codes. All these three points are owed to this context. So the class name "PriorityQueue" is misleading, maybe I had to choose a better name like "CodeByProbabilitySortingPriorityQueue" :)
 
 My heap was apparently accepted into Tango last week, with some minor 
 fixes and cleanup (and switched into a struct rather than a class; not 
 sure I like that). The only issue I see with the fixed version is that 
 it uses recursion in a couple places where it could loop instead.
 
 Anyway, the code's here: 
 http://dsource.org/projects/tango/browser/trunk/tango/util/container/
ore/Heap.d?rev=3959 
 
Thank you for the link. I try to use your general purpose implementation. KlausO
Oct 14 2008
prev sibling next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Sun, 12 Oct 2008 19:11:00 -0500,
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments.
A straight-forward solution would be putting name-size pairs into a tree sorted by size, which is O(1e6 log 1e6). Then, if the tree contains more than a million elements, remove the smallest which is also O(1e6 log 1e6). You can optimize it by skipping elements smaller than the smallest element in the tree if the tree is full. I haven't found any trees in Phobos though. Am I missing something?
Oct 12 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Sun, 12 Oct 2008 19:11:00 -0500,
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments.
A straight-forward solution would be putting name-size pairs into a tree sorted by size, which is O(1e6 log 1e6). Then, if the tree contains more than a million elements, remove the smallest which is also O(1e6 log 1e6). You can optimize it by skipping elements smaller than the smallest element in the tree if the tree is full. I haven't found any trees in Phobos though. Am I missing something?
The trick here was to figure that you don't really need a sorted container; an "almost sorted" container suffices, and a heap is exactly what the doctor prescribed. Andrei
Oct 12 2008
prev sibling next sibling parent reply KennyTM~ <kennytm gmail.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
Straightforward solution: Use insertion sort. auto list = new PriorityQueue!(Documents); // Documents.opCmp returns difference of two documents' length. // the SHORTEST document gets priority. foreach (doc; google_cache) { // O(N) if (list.length >= 1_000_000) list.pop(); // O(log 1M) list.push(doc); // O(log 1M) } auto array = convertToArray(list); // O(1M log 1M) array.reverse(); // O(1M) write(array); // O(1M). Total time complexity: O(N log 1M).
Oct 12 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
Straightforward solution: Use insertion sort. auto list = new PriorityQueue!(Documents);
One point taken off for not using the new syntax PriorityQueue!Documents. Ahem. On a serious note, you should specify the unusual comparison predicate: auto list = new PriorityQueue!(Documents, "a > b"); as I supposed the default version would use less-than. Or does it?
 // Documents.opCmp returns difference of two documents' length.
 // the SHORTEST document gets priority.
 
 foreach (doc; google_cache) {     // O(N)
   if (list.length >= 1_000_000)
     list.pop();                   // O(log 1M)
   list.push(doc);                 // O(log 1M)
 }
 
 auto array = convertToArray(list); // O(1M log 1M)
 array.reverse();                   // O(1M)
 
 write(array);                      // O(1M).
 
 Total time complexity: O(N log 1M).
Great! I'm glad there's so much interest and competency around here. Andrei
Oct 12 2008
parent KennyTM~ <kennytm gmail.com> writes:
Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
Straightforward solution: Use insertion sort. auto list = new PriorityQueue!(Documents);
One point taken off for not using the new syntax PriorityQueue!Documents. Ahem. On a serious note, you should specify the unusual comparison predicate:
Will use it when it's announced :)
 auto list = new PriorityQueue!(Documents, "a > b");
 
 as I supposed the default version would use less-than. Or does it?
 
I don't know :). I haven't written a priority queue in D (only stacks and queues) :)
 // Documents.opCmp returns difference of two documents' length.
 // the SHORTEST document gets priority.

 foreach (doc; google_cache) {     // O(N)
   if (list.length >= 1_000_000)
     list.pop();                   // O(log 1M)
   list.push(doc);                 // O(log 1M)
 }

 auto array = convertToArray(list); // O(1M log 1M)
 array.reverse();                   // O(1M)

 write(array);                      // O(1M).

 Total time complexity: O(N log 1M).
Great! I'm glad there's so much interest and competency around here. Andrei
Oct 12 2008
prev sibling parent KennyTM~ <kennytm gmail.com> writes:
KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
    nil '(("\\(!(\\)[^()]*\\()\\)"
I guess this is why I don't use emacs. I don't want to hear any more grousing about !( ) after that!!!
I agree about that, but why don't you use std.algorithm? :o) Speaking of which, here's another challenge for everybody who'd like to waste some cycles. Say you have a simple API for accessing a large collection of files - e.g. all of Google's cached documents. The task, should you accept it, is to find the 1,000,000 largest ones of those files. The output should be filenames sorted in decreasing order of size. Assume the API gives you <filename, size> pairs in a serial fashion. You can't use parallel processing (e.g. map/reduce on clusters), but you are allowed to use threads on one machine if if fancies you. Speed is highly desirable. Devise your algorithm and explain how fast it runs with practical and/or theoretical arguments. Andrei
Straightforward solution: Use insertion sort. auto list = new PriorityQueue!(Documents); // Documents.opCmp returns difference of two documents' length. // the SHORTEST document gets priority. foreach (doc; google_cache) { // O(N) if (list.length >= 1_000_000) list.pop(); // O(log 1M) list.push(doc); // O(log 1M) } auto array = convertToArray(list); // O(1M log 1M) array.reverse(); // O(1M) write(array); // O(1M). Total time complexity: O(N log 1M).
Sorry, wrong solution. Forgot to compare before pop. auto list = new PriorityQueue!(Documents, "a < b"); // Documents.opCmp returns difference of two documents' length. // the SHORTEST document gets priority. foreach (doc; google_cache) { // O(N) if (list.length >= 1_000_000 && list.top() < doc) list.pop(); // O(log 1M) list.push(doc); // O(log 1M) } auto array = convertToArray(list); // O(1M log 1M) array.reverse(); // O(1M) write(array); // O(1M). Total time complexity: O(N log 1M).
Oct 12 2008
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Christopher Wright:
 http://dsource.org/projects/tango/browser/trunk/tango/util/container/more/Heap.d?rev=3959
Very nice, thank you. Few quick comments (no deep comments, I haven't used it nor read it carefully): In this lines: assert (other.pop is 4); I think it's better to use assert(other.pop == 4); This lines show that a bulk addition method may be useful, that accepts any lazy/eagar iterable: MaxHeap!(uint) h; h ~= 1; h ~= 3; h ~= 2; h ~= 4; The following aliases can be moved just below their respective methods, with a /// ditto before them: alias pop remove; alias push opCatAssign; This style of comments: /** Inserts all elements in the given array into the heap. */ Can sometimes be replaced by: /// Inserts all elements in the given array into the heap. The class ddoc misses the explanation for the Min template argument: struct Heap(T, bool Min) { Some lines of comments are too much long, they may be broken at 80-95 chars long. An optional 'key' callable can be added; it can be used as sorting key mapper (Swartz-style). If not specified it can be "null" that means the current behaviour. Few other handy methods may be added: A merge() (~ and ~=) method can be added, maybe. heapreplace(heap, item) (pop and return the smallest item from the heap, and also push the new item. The heap size doesn't change) heapify(iterable) nlargest(n, iterable) nsmallest(n, iterable) The following is not a critic, just a note, I think programs with such low code density are harder to read to me, they seem like fresh snow: /** Inserts all elements in the given array into the heap. */ void push (T[] array) { while (heap.length < next + array.length) { heap.length = 2 * heap.length + 32; } foreach (t; array) push (t); } I'd write that as this, allowing me to have more code on the screen: /// Inserts all elements in the given array into the heap. void push(T[] array) { while (heap.length < next + array.length) heap.length = 2 * heap.length + 32; foreach (t; array) push(t); } In the future FibonacciHeap too may be useful to be added to the Std libraries. Bye, bearophile
Oct 14 2008
next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
bearophile wrote:
 Christopher Wright:
 http://dsource.org/projects/tango/browser/trunk/tango/util/container/more/Heap.d?rev=3959
Very nice, thank you. Few quick comments (no deep comments, I haven't used it nor read it carefully): In this lines: assert (other.pop is 4); I think it's better to use assert(other.pop == 4);
Any particular reason?
 This lines show that a bulk addition method may be useful, that accepts any
lazy/eagar iterable:
 MaxHeap!(uint) h;
 h ~= 1;
 h ~= 3;
 h ~= 2;
 h ~= 4;
Agreed. The implementation that I submitted had a push method that took an array. If I had given it more thought, I would've accepted any Tango container, as well, and maybe have a vararg version. (Except, ugh, the new containers are all structs. That means writing a template method for the task rather than using an interface.)
 The following aliases can be moved just below their respective methods, with a
/// ditto before them:
 alias pop       remove;
 alias push      opCatAssign;
That was changed after I submitted the code.
 This style of comments:
 /** Inserts all elements in the given array into the heap. */
 
 Can sometimes be replaced by:
 /// Inserts all elements in the given array into the heap.
And then it takes slightly more work to change it into a multiline comment.
 The class ddoc misses the explanation for the Min template argument:
 struct Heap(T, bool Min) {
The version I submitted didn't have the Min template argument; it had a virtual comparison method instead. That method was documented. The added argument was not documented.
 Some lines of comments are too much long, they may be broken at 80-95 chars
long.
Dunno how that slipped by me.
 An optional 'key' callable can be added; it can be used as sorting key mapper
(Swartz-style). If not specified it can be "null" that means the current
behaviour.
I actually need this behavior. The version I submitted had a virtual comparison method, so that would have been reasonably simple to implement: class ComparatorHeap (T, alias comparator) : Heap!(T) { override int comp (T left, T right) { return comparator (left, right); } } Or: class HeapMap (TKey, TValue) : Heap!(KeyValuePair!(TKey, TValue)) { // add methods for inserting a key and a value without having to // construct a KeyValuePair } As it stands, you'd need to use composition and forward a lot of methods to an internal struct -- that's just hideous. Tango has been on a struct rampage. It's part of their campaign against extensibility. I guess making everything private or final just wasn't enough. I jest. But it is a serious concern of mine. If I see anything labeled "package" rather than "protected", my liver will burst. At the very least, I'll write a script to change anything private to protected, and anything protected or package to public.
 Few other handy methods may be added:
 
 A merge() (~ and ~=) method can be added, maybe.
 
 heapreplace(heap, item) (pop and return the smallest item from the heap, and
also push the new item. The heap size doesn't change)
 
 heapify(iterable)
 nlargest(n, iterable) 
 nsmallest(n, iterable) 
 
 
 The following is not a critic, just a note, I think programs with such low
code density are harder to read to me, they seem like fresh snow:
<snip>
 I'd write that as this, allowing me to have more code on the screen:
I use a large monitor. Whitespace makes things more readable for me. Though the giant tabs are annoying; I use a tabstop of four. Still, this is only a display issue.
 In the future FibonacciHeap too may be useful to be added to the Std libraries.
 
 Bye,
 bearophile
Thanks for pointing out those issues. I'll correct those I can, but even after that, the Tango heap will not suffice for my needs.
Oct 14 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Christopher Wright:

Any particular reason?<
'is' is for object equivalence. But you have values there.
And then it takes slightly more work to change it into a multiline comment.<
Right.
I actually need this behavior.<
I'll probably add that key mapping function to your code in the following days. If you want I may show you the code later...
The version I submitted had a virtual comparison method, so that would have
been reasonably simple to implement:<
Interesting.
I use a large monitor. Whitespace makes things more readable for me.<
I think older people tend to write more compact code, because in the past monitors were smaller (and probably because of other factors, like the amount of code you can put on the screen at once) while the new style (that can be sparse lines on the screen. I think my style is midway.
the Tango heap will not suffice for my needs.<
Well, if you are the author you can change the situation, can't you? :-) Bye, bearophile
Oct 14 2008
prev sibling next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Actually, I STRONGLY urge that nobody use this heap.

It's a struct. If you access it by value and append to it, then access 
it from any other alias, you will erase a random element from the heap.

If it were a class, this would not be possible. As I implemented it, it 
would not be possible.

Containers as value types is such an idiotic idea for precisely this reason.
Oct 14 2008
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Christopher Wright" wrote
 Actually, I STRONGLY urge that nobody use this heap.

 It's a struct. If you access it by value and append to it, then access it 
 from any other alias, you will erase a random element from the heap.

 If it were a class, this would not be possible. As I implemented it, it 
 would not be possible.

 Containers as value types is such an idiotic idea for precisely this 
 reason.
Unless such value types have copy constructors which make a copy of their elements... Or are you calling the authors of STL idiots ;) -Steve
Oct 14 2008
prev sibling next sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Christopher Wright wrote:
 Actually, I STRONGLY urge that nobody use this heap.
 
 It's a struct. If you access it by value and append to it, then access 
 it from any other alias, you will erase a random element from the heap.
 
 If it were a class, this would not be possible. As I implemented it, it 
 would not be possible.
 
 Containers as value types is such an idiotic idea for precisely this 
 reason.
Correction: as partial value types. If they were completely value types, it'd be stupid for a much different reason -- namely, that they're very difficult to resize. But this class of bugs would not be an issue. D's builtin arrays have the same problem. In their case, the only real way around the issue is setting the first sizeof(size_t) bytes to the length rather than putting that on the stack. Any situation in which only part of your state is shared is a potential for getting data structures into an erroneous state.
Oct 14 2008
parent Christopher Wright <dhasenan gmail.com> writes:
Christopher Wright wrote:
 Christopher Wright wrote:
 Actually, I STRONGLY urge that nobody use this heap.

 It's a struct. If you access it by value and append to it, then access 
 it from any other alias, you will erase a random element from the heap.

 If it were a class, this would not be possible. As I implemented it, 
 it would not be possible.

 Containers as value types is such an idiotic idea for precisely this 
 reason.
Correction: as partial value types. If they were completely value types, it'd be stupid for a much different reason -- namely, that they're very difficult to resize. But this class of bugs would not be an issue.
Argh, I'm making more of an idiot of myself. If they're value types by merit of being entirely on the stack, yes. If they're value types by merit of the behavior of assignment, then I'm full of crap and you should pay me no mind.
Oct 14 2008
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Actually, I STRONGLY urge that nobody use this heap.
 
 It's a struct. If you access it by value and append to it, then access 
 it from any other alias, you will erase a random element from the heap.
 
 If it were a class, this would not be possible. As I implemented it, it 
 would not be possible.
 
 Containers as value types is such an idiotic idea for precisely this 
 reason.
I'd say this statement reflects a lack of understanding of value types, rather than value containers are an idiotic idea. Andrei
Oct 14 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Christopher Wright wrote:
 Actually, I STRONGLY urge that nobody use this heap.

 It's a struct. If you access it by value and append to it, then access 
 it from any other alias, you will erase a random element from the heap.

 If it were a class, this would not be possible. As I implemented it, 
 it would not be possible.

 Containers as value types is such an idiotic idea for precisely this 
 reason.
I'd say this statement reflects a lack of understanding of value types, rather than value containers are an idiotic idea. Andrei
I'll bite. Why is it a good idea that containers are implements as structs? I too was pretty astonished to discover this pattern in tango. And why just for a few container types? Stack, Heap, and Vector are structs, while all other container types are classes. --benji
Oct 14 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Christopher Wright wrote:
 Actually, I STRONGLY urge that nobody use this heap.

 It's a struct. If you access it by value and append to it, then 
 access it from any other alias, you will erase a random element from 
 the heap.

 If it were a class, this would not be possible. As I implemented it, 
 it would not be possible.

 Containers as value types is such an idiotic idea for precisely this 
 reason.
I'd say this statement reflects a lack of understanding of value types, rather than value containers are an idiotic idea. Andrei
I'll bite. Why is it a good idea that containers are implements as structs?
As the Mexican would say: why not? Andrei
Oct 14 2008
parent reply Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Christopher Wright wrote:
 Containers as value types is such an idiotic idea for precisely this 
 reason.
I'd say this statement reflects a lack of understanding of value types, rather than value containers are an idiotic idea.
I'll bite. Why is it a good idea that containers are implements as structs?
As the Mexican would say: why not?
Well... here are a few reasons why not: Structs can't implement interfaces, and a collection API is the poster-child of good interface usage. Structs can't inherit from abstract classes. A good deal of functionality in a collection class can be implemented once in an abstract base class, making it simpler for subclass authors to implement the API without duplicating a bunch of boilerplate. Conceptually -- and there's no hard and fast rule about this -- structs usually either represent small logically-atomic values (DateTime), or some fixed-size collection of logically-atomic values (Vector3). Using a struct as an arbitrarily-sized container seems, on the face of it, to break those conventions. --benji
Oct 14 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Benji Smith wrote:
 Andrei Alexandrescu wrote:
 Christopher Wright wrote:
 Containers as value types is such an idiotic idea for precisely 
 this reason.
I'd say this statement reflects a lack of understanding of value types, rather than value containers are an idiotic idea.
I'll bite. Why is it a good idea that containers are implements as structs?
As the Mexican would say: why not?
Well... here are a few reasons why not: Structs can't implement interfaces, and a collection API is the poster-child of good interface usage.
I don't want to seem like picking on you, but I would like to answer this message too. In my humble opinion, collections APIs are the poster-child of the misguided joy of the beginnings of object-orientation, right next to the mammal isa animal example. Organizing collections in hierarchies is of occasional benefit, but the real benefit comes from decoupling containers from algorithms via iterators, the way the STL did. The savings there were that an O(m * n) problem has been solved with writing O(m + n) code. Container hierarchies do not achieve such a reduction unless efforts are being made that have little to do with hierarchical organization. At most they factor out some code in the upper classes, but hierarchies help when types have many commonalities and few difference, and that is very rarely the case with containers.
 Structs can't inherit from abstract classes. A good deal of 
 functionality in a collection class can be implemented once in an 
 abstract base class, making it simpler for subclass authors to implement 
 the API without duplicating a bunch of boilerplate.
This also reveals a beginner's view of inheritance. You don't inherit to reuse. You inherit to be reused. If all you want is to reuse, use composition.
 Conceptually -- and there's no hard and fast rule about this -- structs 
 usually either represent small logically-atomic values (DateTime), or 
 some fixed-size collection of logically-atomic values (Vector3). Using a 
 struct as an arbitrarily-sized container seems, on the face of it, to 
 break those conventions.
I see how it breaks those conventions, but I've never heard of them. The convention I heard of is that if you want dynamic polymorphism you use classes, otherwise you don't. Andrei
Oct 15 2008
next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 15 Oct 2008 09:17:57 -0500,
Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
Oct 15 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism. Andrei
Oct 15 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 15 Oct 2008 09:49:08 -0500,
Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally." So how do I return it? Should I return it by value, hoping that the implied reference semantics will stay that way in future releases? How does your Array!() behave?
Oct 15 2008
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Sergey Gromov" wrote
 Wed, 15 Oct 2008 09:49:08 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. 
 The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally." So how do I return it? Should I return it by value, hoping that the implied reference semantics will stay that way in future releases? How does your Array!() behave?
There's nothing requiring that a struct must implement value semantics. The fact that you are expecting such just because it is a struct is a mistake. The reality is that only structs can implement value semantics because of the inability to override A = A in a class. That being said, I think containers work best as a mixture of interface and compile-time typing. See http://www.dsource.org/projects/dcollections. -Steve
Oct 15 2008
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:49:08 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally."
No. "Don't submit to the dogma that struct == value semantics, because that was never true".
 So how do I return it?  Should I 
 return it by value, hoping that the implied reference semantics will stay 
 that way in future releases?  How does your Array!() behave?
That's not finished, but e.g. Appender can be copied around and always refers to the same underlying array. Andrei
Oct 15 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 15 Oct 2008 14:04:44 -0500,
Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:49:08 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally."
No. "Don't submit to the dogma that struct == value semantics, because that was never true".
I don't understand. Structs are value types, everything else are implementation details. And yes, I assume that struct copying yields a separate, independent object unless documented otherwise. I don't usually need such documentation in my code because I either remember what I'm doing or the reference components of structs are obvious. But with a library struct with undocumented, private implementation it becomes an issue.
 So how do I return it?  Should I 
 return it by value, hoping that the implied reference semantics will stay 
 that way in future releases?  How does your Array!() behave?
That's not finished, but e.g. Appender can be copied around and always refers to the same underlying array.
This means that without reading docs I won't be able to tell whether my code is going to work or not. Because the only truly copyable struct- based solution I can imagine is a struct with a single pointer field pointing at a heap-allocated implementation. And I doubt that your Appender is implemented this way. I'm afraid I'll have to actually look into library sources to make sure Appender does what I expect.
Oct 15 2008
next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Oct 16, 2008 at 5:46 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
 Wed, 15 Oct 2008 14:04:44 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:49:08 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally."
No. "Don't submit to the dogma that struct == value semantics, because that was never true".
I don't understand. Structs are value types, everything else are implementation details.
I think the point is that value *type* doesn't necessarily imply value *semantics*. Like with D's built-in arrays. --bb
Oct 15 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Thu, 16 Oct 2008 06:44:30 +0900,
Bill Baxter wrote:
 On Thu, Oct 16, 2008 at 5:46 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
 Wed, 15 Oct 2008 14:04:44 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:49:08 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally."
No. "Don't submit to the dogma that struct == value semantics, because that was never true".
I don't understand. Structs are value types, everything else are implementation details.
I think the point is that value *type* doesn't necessarily imply value *semantics*. Like with D's built-in arrays.
Arrays are very widely used and very well documented to the point they're quite obvious. There are no hidden implementation details. Nevertheless, there are well-known problems which directly follow from the arrays' sort-of-reference semantics.
Oct 15 2008
parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Oct 16, 2008 at 7:47 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
 Thu, 16 Oct 2008 06:44:30 +0900,
 Bill Baxter wrote:
 On Thu, Oct 16, 2008 at 5:46 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
 Wed, 15 Oct 2008 14:04:44 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:49:08 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally."
No. "Don't submit to the dogma that struct == value semantics, because that was never true".
I don't understand. Structs are value types, everything else are implementation details.
I think the point is that value *type* doesn't necessarily imply value *semantics*. Like with D's built-in arrays.
Arrays are very widely used and very well documented to the point they're quite obvious. There are no hidden implementation details.
And I think the issues are pretty much the same with all struct-based container. So if you understand the issues with D's arrays, then all you have to know is "same thing for other struct containers".
 Nevertheless, there are well-known problems which directly follow from
 the arrays' sort-of-reference semantics.
Yeh, these do make me wonder about whether it was a good idea to make D's arrays behave the way they do. But since they do, and since you can't really avoid having to learn how they work and what their pitfalls are, I don't think it's so bad to make other containers behave the same way. Less to learn that way. But I'd definitely be willing to entertain new ground-up designs in which arrays behaved more like pure references, or were not copyable, or similar. --bb
Oct 15 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Thu, 16 Oct 2008 08:42:23 +0900,
Bill Baxter wrote:
 On Thu, Oct 16, 2008 at 7:47 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
 Thu, 16 Oct 2008 06:44:30 +0900,
 Bill Baxter wrote:
 On Thu, Oct 16, 2008 at 5:46 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
 Wed, 15 Oct 2008 14:04:44 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:49:08 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. The
 convention I heard of is that if you want dynamic polymorphism you use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally."
No. "Don't submit to the dogma that struct == value semantics, because that was never true".
I don't understand. Structs are value types, everything else are implementation details.
I think the point is that value *type* doesn't necessarily imply value *semantics*. Like with D's built-in arrays.
Arrays are very widely used and very well documented to the point they're quite obvious. There are no hidden implementation details.
And I think the issues are pretty much the same with all struct-based container. So if you understand the issues with D's arrays, then all you have to know is "same thing for other struct containers".
No. Built-in array is transparent. Appender is not. Because if Appender is transparent, too, and there is a change in GC which makes its current implementation slow, you won't be able to fix that. Therefore Appender must be opaque and must specify its interface. And copy semantics becomes a part of that interface. It may transfer ownership on assignment for instance. Or declare behavior undefined if there are multiple copies of it. You see, I can't even say "multiple references" because they are not, and "multiple copies" sounds ambiguous. The bottom line is, your knowledge of built-in arrays is useless here. You must study docs for every container you use.
 Nevertheless, there are well-known problems which directly follow from
 the arrays' sort-of-reference semantics.
Yeh, these do make me wonder about whether it was a good idea to make D's arrays behave the way they do. But since they do, and since you can't really avoid having to learn how they work and what their pitfalls are, I don't think it's so bad to make other containers behave the same way. Less to learn that way.
They cannot behave the same way unless they're very similar to the built- in arrays, which makes them pretty useless. Even built-in AA behaves differently. BTW I was quite surprised to find out that it was implemented exactly as a struct with a single pointer inside. Well, there could have been reasons to implement a built-in type like this. I cannot see such reasons for Appender.
 But I'd definitely be willing to entertain new ground-up designs in
 which arrays behaved more like pure references, or were not copyable,
 or similar.
 
 --bb
 
Oct 15 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Sergey Gromov:
 Or declare behavior undefined if there are 
 multiple copies of it.  You see, I can't even say "multiple references" 
 because they are not, and "multiple copies" sounds ambiguous.
Just a note: my ArrayBuilder is a struct (with 3 fields inside, each sizeof 1 CPU word). I have already used it a lot in my libs, probably about 50 times, but I've never had to copy one of them. I have used a struct because a class isn't necessary, and because the struct methods are faster at my benchmarks. Bye, bearophile
Oct 15 2008
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Sergey Gromov" wrote
 Wed, 15 Oct 2008 14:04:44 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:49:08 -0500,
 Andrei Alexandrescu wrote:
 Sergey Gromov wrote:
 Wed, 15 Oct 2008 09:17:57 -0500,
 Andrei Alexandrescu wrote:
 I see how it breaks those conventions, but I've never heard of them. 
 The
 convention I heard of is that if you want dynamic polymorphism you 
 use
 classes, otherwise you don't.
How about if you want pass-by-value you use structs, otherwise you don't? My argument against structs-collections is that I want to toss collections around and build other collections out of them without messing with pointers.
A struct can choose to implement value or reference semantics as it pleases. The only thing it can't readily implement is dynamic polymorphism.
I can imagine the documentation: "Ignore the fact it's a struct, it's actually a reference internally."
No. "Don't submit to the dogma that struct == value semantics, because that was never true".
I don't understand. Structs are value types, everything else are implementation details. And yes, I assume that struct copying yields a separate, independent object unless documented otherwise. I don't usually need such documentation in my code because I either remember what I'm doing or the reference components of structs are obvious. But with a library struct with undocumented, private implementation it becomes an issue.
First, there is no compiler required semantics. It is up to the designer. If the designer says 'this is return by reference', then the designer is telling you what it is. Second, just because you have your expectations doesn't mean they should be right ;) The only things that are guaranteed are done so by the compiler. Everything else is up to the designer.
 So how do I return it?  Should I
 return it by value, hoping that the implied reference semantics will 
 stay
 that way in future releases?  How does your Array!() behave?
That's not finished, but e.g. Appender can be copied around and always refers to the same underlying array.
This means that without reading docs I won't be able to tell whether my code is going to work or not. Because the only truly copyable struct- based solution I can imagine is a struct with a single pointer field pointing at a heap-allocated implementation. And I doubt that your Appender is implemented this way. I'm afraid I'll have to actually look into library sources to make sure Appender does what I expect.
All you should need to do is look at the docs. If the docs suck, you have a gripe. If the implementation isn't what you would have done, but there is nothing wrong with it, tough. Go write your own. -Steve
Oct 15 2008
prev sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 I don't want to seem like picking on you, but I would like to answer
 this message too.
I don't feel like you're picking on me! Without disagreements, there wouldn't be much to discuss :) Though I do think you should lay off with the "beginner" business. The people who disagree with you disagree for legitimate reasons, not because they're inexperienced. Perhaps Josh Bloch is a beginner?
 In my humble opinion, collections APIs are the poster-child of the
 misguided joy of the beginnings of object-orientation, right next to the
 mammal isa animal example. Organizing collections in hierarchies is of
 occasional benefit, but the real benefit comes from decoupling
 containers from algorithms via iterators, the way the STL did. The
 savings there were that an O(m * n) problem has been solved with writing
 O(m + n) code. Container hierarchies do not achieve such a reduction
 unless efforts are being made that have little to do with hierarchical
 organization. At most they factor out some code in the upper classes,
 but hierarchies help when types have many commonalities and few
 difference, and that is very rarely the case with containers.
Nonsense. Iterators are not the only way to "decouple containers from algorithms". An iterator in STL is just a compile-time interface for accessing the elements in a container. You can argue that using type hierarchies or interface inheritance is an undue runtime burden and that iterators solve that problem effectively. But, from a data-modeling perspective, there is absolutely no difference between these two: sort(List<T> list); sort(iterator<T> start, iterator<T> end);
 Structs can't inherit from abstract classes. A good deal of 
 functionality in a collection class can be implemented once in an 
 abstract base class, making it simpler for subclass authors to 
 implement the API without duplicating a bunch of boilerplate.
This also reveals a beginner's view of inheritance. You don't inherit to reuse. You inherit to be reused. If all you want is to reuse, use composition.
When the rubber hits the road, I'll take my "beginners view of inheritance any time". One of my favorite tiny classes in Java is this little nugget: import java.util.LinkedHashMap; import java.util.Map; public class CacheMap<K, V> extends LinkedHashMap<K, V> { int maxCapacity; public CacheMap(int maxCapacity) { this.maxCapacity = maxCapacity; } protected boolean removeEldestEntry(Map.Entry<K, V> eldest) { return size() > maxCapacity; } } Now I have a fully-functional Map class that automatically removes entries (in FIFO order) when the collection reaches some maximum size. In every other way, it performs exactly like any other LinkedHashMap (which always iterates in insertion-order). If I used composition, and if I wanted to comply with with Map interface, I'd have to manually re-implement every single method, wrapping the underlying functionality and delegating the calls myself. If the type system can do it for me, why not? And since I've implemented Map<K, V>, I can pass my container to any method that cares about "mappiness", regardless of whether it cares about "cachiness".
 Conceptually -- and there's no hard and fast rule about this -- 
 structs usually either represent small logically-atomic values 
 (DateTime), or some fixed-size collection of logically-atomic values 
 (Vector3). Using a struct as an arbitrarily-sized container seems, on 
 the face of it, to break those conventions.
I see how it breaks those conventions, but I've never heard of them. The convention I heard of is that if you want dynamic polymorphism you use classes, otherwise you don't.
The D documentation implies otherwise: http://www.digitalmars.com/d/2.0/struct.html Though I suppose you're free to use whatever conventions you like. --benji
Oct 15 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 Andrei Alexandrescu wrote:
 In my humble opinion, collections APIs are the poster-child of the
 misguided joy of the beginnings of object-orientation, right next to the
 mammal isa animal example. Organizing collections in hierarchies is of
 occasional benefit, but the real benefit comes from decoupling
 containers from algorithms via iterators, the way the STL did. The
 savings there were that an O(m * n) problem has been solved with writing
 O(m + n) code. Container hierarchies do not achieve such a reduction
 unless efforts are being made that have little to do with hierarchical
 organization. At most they factor out some code in the upper classes,
 but hierarchies help when types have many commonalities and few
 difference, and that is very rarely the case with containers.
Nonsense.
I know you're going to hate this, but I'm unable to follow through with this discussion. There is much background and many aspects to discuss, and I simply can't afford the time. Since not too long ago it has dawned on the programming community that OOP is not all it's cracked to be, and that there are quite a few amendments and qualifications that need to be made to what once seemed to be very attractive principles. But such developments are quite recent, and they may sound downright heretic and indeed nonsensical to many. That's why there's need for a lot of explaining. There's a book I do recommend though to bring some good info on the subject: Agile Software Development by Robert Martin. The book is good, and for this particular subject I recommend Chapter 10. Andrei
Oct 15 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Benji Smith wrote:
 Andrei Alexandrescu wrote:
 In my humble opinion, collections APIs are the poster-child of the
 misguided joy of the beginnings of object-orientation, right next to the
 mammal isa animal example. Organizing collections in hierarchies is of
 occasional benefit, but the real benefit comes from decoupling
 containers from algorithms via iterators, the way the STL did. The
 savings there were that an O(m * n) problem has been solved with writing
 O(m + n) code. Container hierarchies do not achieve such a reduction
 unless efforts are being made that have little to do with hierarchical
 organization. At most they factor out some code in the upper classes,
 but hierarchies help when types have many commonalities and few
 difference, and that is very rarely the case with containers.
Nonsense.
I know you're going to hate this, but I'm unable to follow through with this discussion. There is much background and many aspects to discuss, and I simply can't afford the time.
Naw, I don't hate it. I'm up to my eyeballs too, and I had no delusions that I'd convince you anyhow. I just wanted to put those ideas on the table, since I think there are a lot of other people like me out there. Those people might like to use D too, and their development styles are valid too, so I'm standing up for them. Anyhoo... catcha later! --benji
Oct 15 2008
prev sibling parent reply "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Tue, Oct 14, 2008 at 5:53 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Christopher Wright:
 http://dsource.org/projects/tango/browser/trunk/tango/util/container/more/Heap.d?rev=3959
Very nice, thank you. Few quick comments (no deep comments, I haven't used it nor read it carefully): In this lines: assert (other.pop is 4); I think it's better to use assert(other.pop == 4); This lines show that a bulk addition method may be useful, that accepts any lazy/eagar iterable: MaxHeap!(uint) h; h ~= 1; h ~= 3; h ~= 2; h ~= 4; The following aliases can be moved just below their respective methods, with a /// ditto before them: alias pop remove; alias push opCatAssign; This style of comments: /** Inserts all elements in the given array into the heap. */ Can sometimes be replaced by: /// Inserts all elements in the given array into the heap. The class ddoc misses the explanation for the Min template argument: struct Heap(T, bool Min) { Some lines of comments are too much long, they may be broken at 80-95 chars long. An optional 'key' callable can be added; it can be used as sorting key mapper (Swartz-style). If not specified it can be "null" that means the current behaviour. Few other handy methods may be added: A merge() (~ and ~=) method can be added, maybe. heapreplace(heap, item) (pop and return the smallest item from the heap, and also push the new item. The heap size doesn't change) heapify(iterable) nlargest(n, iterable) nsmallest(n, iterable) The following is not a critic, just a note, I think programs with such low code density are harder to read to me, they seem like fresh snow: /** Inserts all elements in the given array into the heap. */ void push (T[] array) { while (heap.length < next + array.length) { heap.length = 2 * heap.length + 32; } foreach (t; array) push (t); } I'd write that as this, allowing me to have more code on the screen: /// Inserts all elements in the given array into the heap. void push(T[] array) { while (heap.length < next + array.length) heap.length = 2 * heap.length + 32; foreach (t; array) push(t); }
Bearophile, you're doing it again. Why are you so hung up on how other people format their code? Please, please get over it.
Oct 14 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Jarrett Billingsley:
 Why are you so hung up on how
 other people format their code?
Not all coding styles are created equal. Some are better. Bye, bearophile
Oct 15 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"bearophile" wrote
 Jarrett Billingsley:
 Why are you so hung up on how
 other people format their code?
Not all coding styles are created equal. Some are better.
Yes, but clearly not yours. Mine is much better. -Steve
Oct 15 2008