
digitalmars.D - [OT] Just curious: What's this algorithm's name?

reply "Nick Sabalausky" <a a.a> writes:
Nothing important, just something I'm curious about:

At least a couple times recently I've come across a certain pattern for 
dealing with directed graphs (ones that may have cycles). It seems general 
enough that I'm surprised I don't seem to recognize it (but then again, it 
has been years since I've formally studied algorithms - I only have vague 
recollections of names like "Prim's"). A brief search didn't seem to turn up 
anything that jumped out at me as being a match.

Here's what it's for:

Suppose you have a directed graph, which may have cycles, and you want to 
compute something for each node. But the final result for each node is 
dependent on the final results for each of the nodes it points to. (It may 
sound like this would just cause an endless "feedback loop" ie infinite 
recursion whenever there's a cycle, but there are applications of this where 
that isn't a problem. Ie, in cases where the computation stops recursing 
when it gets back to itself.)

A basic example: You have a graph. Various strings can be computed for each 
node (based on the data stored in the node). Additionally, for each node, 
all N edges are numbered from 1 to N. For each node "X", you want to compute 
a single set of unique strings: All the strings computed from node X, plus 
all the strings computed from all the nodes reachable from node X using 
*only* even-numbered edges (ie, "edge % 2 == 0").

The way that's done:

There's two main phases:

1. "Spontaneously generate" a partial result for each node X. This partial 
result is *only* the strings generated from node X itself. It does not 
attempt to include any strings from the other nodes reachable from X. 
Additionally, while you're at each node X, you generate a "propagation 
graph": Ie, make note of which edges are even-numbered and will therefore 
need to be traversed.

2. "Propagate" the results. For each node X, add in (propagate) any new 
strings that come directly from the nodes you noted in the "propagation 
graph" (ie, the edges you already identified as matching "edge % 2 == 0"). 
Repeat this step 2 until you can go through the entire graph without 
propagating any new strings.

Granted, that specific example can be simplified somewhat since "follow only 
even-numbered edges" just happens to be trivial without actually creating a 
propagation graph. But anyways, that's the basic idea.
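The two phases above can be sketched in code. This is a minimal sketch (in Python rather than D), with a made-up graph encoding and made-up names, not an implementation from any particular library:

```python
# Hypothetical sketch of the two phases described above: each node
# "spontaneously generates" its own strings, then results are propagated
# along the noted (even-numbered) edges until a full pass adds nothing new.

def compute_string_sets(nodes, edges, own_strings):
    """nodes: iterable of node ids.
    edges: dict mapping node -> list of successors, where list index + 1
           is the edge number (so even-numbered edges sit at odd indices).
    own_strings: dict mapping node -> set of strings the node itself
                 generates.
    """
    # Phase 1: spontaneous generation, plus the "propagation graph"
    # (only the even-numbered edges; edge numbers start at 1).
    result = {n: set(own_strings[n]) for n in nodes}
    prop = {n: [succ for i, succ in enumerate(edges.get(n, []))
                if (i + 1) % 2 == 0]
            for n in nodes}

    # Phase 2: propagate until a pass over the whole graph changes nothing.
    changed = True
    while changed:
        changed = False
        for n in nodes:
            for succ in prop[n]:
                before = len(result[n])
                result[n] |= result[succ]
                if len(result[n]) != before:
                    changed = True
    return result
```

Cycles are harmless here: once two nodes on a cycle have each other's strings, further passes add nothing, so the loop terminates.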

There are at least two uses of this algorithm when building LALR parse 
tables: For one, it's used to generate all the lookahead 
information (which is where I got much of the terminology). And I've 
recently realized that, unless I'm mistaken, it also appears to be needed 
when determining what terminals can be the "first" terminal in a 
nonterminal's derivation (which is, in turn, more necessary information for 
computing the lookahead information).
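For the FIRST-terminal case, the same "repeat until nothing changes" loop might look like the following. A hypothetical sketch, assuming a grammar encoded as a dict from nonterminal to a list of productions (tuples of symbols, with an empty tuple for an epsilon production); the names and encoding are invented for illustration:

```python
EPS = ''  # stands in for the empty string (epsilon)

def first_sets(grammar):
    """Compute FIRST for each nonterminal by iterating to a fixpoint.
    grammar: dict nonterminal -> list of productions (tuples of symbols).
    Symbols not present as keys are treated as terminals."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                nullable_prefix = True
                for sym in prod:
                    if sym not in grammar:           # terminal: add and stop
                        if sym not in first[nt]:
                            first[nt].add(sym)
                            changed = True
                        nullable_prefix = False
                        break
                    added = (first[sym] - {EPS}) - first[nt]
                    if added:                        # pull in FIRST(sym)
                        first[nt] |= added
                        changed = True
                    if EPS not in first[sym]:        # sym not nullable: stop
                        nullable_prefix = False
                        break
                if nullable_prefix and EPS not in first[nt]:
                    first[nt].add(EPS)               # whole rhs can vanish
                    changed = True
    return first
```

The "spontaneous" part is the terminals added directly from each production; the propagation part is pulling in FIRST of leading nonterminals, repeated until a full pass changes no set.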

Even though I don't remember ever encountering this particular algorithm 
anywhere besides generating LALR parse tables, I have a feeling there are 
other practical applications of it.

So anyone know offhand what this graph-processing algorithm is called? Or is 
it just too basic a usage of graphs for it to even have a name?
Apr 03 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Apr 04, 2012 at 01:33:21AM -0400, Nick Sabalausky wrote:
[...]
 Suppose you have a directed graph, which may have cycles, and you want
 to compute something for each node. But the final result for each node
 is dependent on the final results for each of the nodes it points to.
 (It may sound like this would just cause an endless "feedback loop" ie
 infinite recursion whenever there's a cycle, but there are
 applications of this where that isn't a problem. Ie, in cases where
 the computation stops recursing when it gets back to itself.)

Sounds like the word you want is "closure".


T

-- 
One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie. -- The Silicon Valley Tarot
Apr 04 2012
parent "Nick Sabalausky" <a a.a> writes:
"H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message 
news:mailman.1351.1333549896.4860.digitalmars-d puremagic.com...
 On Wed, Apr 04, 2012 at 01:33:21AM -0400, Nick Sabalausky wrote:
 [...]
 Suppose you have a directed graph, which may have cycles, and you want
 to compute something for each node. But the final result for each node
 is dependent on the final results for each of the nodes it points to.
 (It may sound like this would just cause an endless "feedback loop" ie
 infinite recursion whenever there's a cycle, but there are
 applications of this where that isn't a problem. Ie, in cases where
 the computation stops recursing when it gets back to itself.)

Sounds like the word you want is "closure".

Now that you mention it, that *does* seem to make sense:

(Warning: Little bit of LALR internals here)

The LALR theory materials I had read didn't refer to what I just 
described (ie, basically the lookahead generation) as "closure". But 
before that, when actually generating the state graph itself, they did 
use the term "closure" for going from what they called the "kernel" 
items in a state to the entire state with *all* applicable items 
included.

The algorithm looked a little different, but now that I think about it, 
those "kernel" items *are* essentially the "spontaneously generated" 
elements (and computing them involves looking at another graph - the AST 
of the grammar's BNF-like description). And then computing what they 
called the "closure" of the state is essentially the same as the "step 
2" I described: traversing the graph of the BNF over and over to find 
new elements until there's no new elements found.

It would indeed appear that the term "closure" the books used for state 
graph generation is also applicable to the lookahead generation 
algorithm I described in the OP.

I think another part of what made me not realize to call it "closure" is 
that somehow I had the impression of "closure" being a much more 
general, even nebulous term. For example, I knew I'd also heard it used 
in the context of database theory. But, looking that up now, that *is* 
the same damn algorithm, too.

So I guess a /facepalm is in order here ;)

Thanks!
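That kernel-to-closure step can be sketched the same way. A hypothetical illustration of closing an LR item set (items as (lhs, rhs, dot) triples; the encoding and names are made up here, not taken from any LALR textbook's code):

```python
# Hypothetical sketch of the "kernel -> closure" step for LR item sets:
# starting from the kernel items, keep adding a fresh item for every
# production of any nonterminal that appears right after a dot, until
# no new items appear.

def close_items(kernel, grammar):
    """kernel: set of items, each (lhs, rhs_tuple, dot_position).
    grammar: dict mapping nonterminal -> list of rhs tuples."""
    items = set(kernel)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            # A nonterminal right after the dot "spontaneously" pulls in
            # all of its own productions, dot at the start.
            if dot < len(rhs) and rhs[dot] in grammar:
                for prod in grammar[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items
```

It's the same shape as the lookahead propagation: kernel items play the "spontaneously generated" role, and the loop is the closure.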
Apr 04 2012
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Apr 04, 2012 at 05:03:39PM -0400, Nick Sabalausky wrote:
 "H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message 
 news:mailman.1351.1333549896.4860.digitalmars-d puremagic.com...

 Sounds like the word you want is "closure".

Now that you mention it, that *does* seem to make sense: (Warning: Little bit of LALR internals here)

I'm familiar with LALR internals. :-) [...]
 It would indeed appear that the term "closure" the books used for
 state graph generation is also applicable for the lookahead generation
 algorithm I described in the OP.
 
 I think another part of what made me not realize to call it "closure"
 is that somehow I had the impression of "closure" being a much more
 general, even nebulous term. For example, I knew I'd also heard it
 used in the context of database theory. But, looking that up now, that
 *is* the same damn algorithm, too.
 
 So I guess a /facepalm is in order here ;)

Well, the term itself *is* quite general. Mathematically speaking, for 
something to be "closed" means that every combination of operations on a 
set results in something that is already in the set. It's closed in the 
sense that no operation will get you "outside the set".

So what's a closure? Given a set that *isn't* closed (i.e., some 
operations may take you "outside the set", so to speak), the "closure" 
of the set is a larger set that *is* closed.

OK, an example is in order. Take the set A={0,1,2} and the operation +. 
A is not closed, because although 0+1 and 0+2 are in A, 1+2 is not in A. 
To form a closure, then, we take the elements produced by + that aren't 
already in A, and add them to A. So for example, we add 3 to A, now 1+2 
is closed. But then 2+2 is not in the set. No problem, we just add that 
to the set too. But now the newly added elements 3 and 4 are themselves 
not closed under +, because 1+4 isn't in the set, for example. So we 
have to add *those* elements as well. (I hope you can see the parallel 
here with the algo in your OP -- you're trying to incrementally compute 
the closure.)

Taken to its logical conclusion, we find eventually that in order for A 
to be closed under the operation +, we have to add *all* positive 
numbers to it. So the closure of A is actually the entire set of natural 
numbers.

Now, this example isn't exactly the best one, because the closure is 
infinite, so an incremental algorithm to compute it would never 
terminate. However, not all closures are infinite. Closure under grammar 
rule applications, like in your OP, is finite. Basically you keep 
applying the rules until it doesn't yield anything more that you don't 
already have. Then you can stop, because you've found the closure.


T

-- 
Prosperity breeds contempt, and poverty breeds consent. -- Suck.com
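The incremental idea can be made concrete with a finite variant. Since closure of {0,1,2} under plain + is infinite, this sketch uses addition modulo 7 instead, so the iteration terminates (the generic helper name is invented for illustration):

```python
# Hypothetical sketch: compute the closure of a set under a binary
# operation by repeatedly applying it and collecting anything new,
# stopping when a full pass produces no new elements (a fixpoint).

def close_under(initial, op):
    s = set(initial)
    changed = True
    while changed:
        changed = False
        for a in list(s):          # snapshot: the set grows as we go
            for b in list(s):
                c = op(a, b)
                if c not in s:
                    s.add(c)
                    changed = True
    return s
```

With op = addition mod 7, starting from {0,1,2}, the first pass adds 3 and 4, the next adds 5 and 6, and then nothing new appears: the closure is all of {0,...,6}.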
Apr 04 2012
parent "Nick Sabalausky" <a a.a> writes:
"H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message 
news:mailman.1358.1333576694.4860.digitalmars-d puremagic.com...
 On Wed, Apr 04, 2012 at 05:03:39PM -0400, Nick Sabalausky wrote:
 "H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message
 news:mailman.1351.1333549896.4860.digitalmars-d puremagic.com...

 Sounds like the word you want is "closure".

Now that you mention it, that *does* seem to make sense: (Warning: Little bit of LALR internals here)

I'm familiar with LALR internals. :-) [...]
 It would indeed appear that the term "closure" the books used for
 state graph generation is also applicable for the lookahead generation
 algorithm I described in the OP.

 I think another part of what made me not realize to call it "closure"
 is that somehow I had the impression of "closure" being a much more
 general, even nebulous term. For example, I knew I'd also heard it
 used in the context of database theory. But, looking that up now, that
 *is* the same damn algorithm, too.

 So I guess a /facepalm is in order here ;)

Well, the term itself *is* quite general. Mathematically speaking, for 
something to be "closed" means that every combination of operations on a 
set results in something that is already in the set. It's closed in the 
sense that no operation will get you "outside the set".

So what's a closure? Given a set that *isn't* closed (i.e., some 
operations may take you "outside the set", so to speak), the "closure" 
of the set is a larger set that *is* closed.

OK, an example is in order. Take the set A={0,1,2} and the operation +. 
A is not closed, because although 0+1 and 0+2 are in A, 1+2 is not in A. 
To form a closure, then, we take the elements produced by + that aren't 
already in A, and add them to A. So for example, we add 3 to A, now 1+2 
is closed. But then 2+2 is not in the set. No problem, we just add that 
to the set too. But now the newly added elements 3 and 4 are themselves 
not closed under +, because 1+4 isn't in the set, for example. So we 
have to add *those* elements as well. (I hope you can see the parallel 
here with the algo in your OP -- you're trying to incrementally compute 
the closure.)

Taken to its logical conclusion, we find eventually that in order for A 
to be closed under the operation +, we have to add *all* positive 
numbers to it. So the closure of A is actually the entire set of natural 
numbers.

Now, this example isn't exactly the best one, because the closure is 
infinite, so an incremental algorithm to compute it would never 
terminate. However, not all closures are infinite. Closure under grammar 
rule applications, like in your OP, is finite. Basically you keep 
applying the rules until it doesn't yield anything more that you don't 
already have. Then you can stop, because you've found the closure.

Yea, that makes sense. I've been starting to think of it like: In both 
everyday English and in math/CS, "closure" is essentially "everything's 
taken care of, no loose ends". Not a good technical definition, of 
course, but a way to help remember and internalize it.
Apr 04 2012
prev sibling parent bls <bizprac orange.fr> writes:
On 04/03/2012 10:33 PM, Nick Sabalausky wrote:
 Suppose you have a directed graph, which may have cycles, and you want to
 compute something for each node. But the final result for each node is
 dependent on the final results for each of the nodes it points to. (It may
 sound like this would just cause an endless "feedback loop" ie infinite
 recursion whenever there's a cycle, but there are applications of this where
 that isn't a problem. Ie, in cases where the computation stops recursing
 when it gets back to itself.)

Had to find my R. Sedgewick and N. Wirth algorithm books.. But: these 
books do not even give a hint. Nada.

However, the problem you describe smells like something the "Chain of 
Responsibility" pattern is made for. Though this pattern requires a 
break condition, because you don't want an endless search for a 
responsible handler. (Finally: we are programmers, not politicians.)

I have a hard time imagining how your graph node might look. 
SOOOooooo... Here you go
Apr 04 2012