www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Numerically largest entry in a trie of digits

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
Just thought I'd pick all the bright minds here: I have a set of numbers
in string form (as decimal digits), stored in a trie.  How do I retrieve
the largest number in the set?

My initial guess was to find the rightmost child in the trie.
Unfortunately, this is incorrect, because the rightmost child may not
necessarily be the largest numerically. For example, consider the set
{1, 9, 10}. In trie form, 10 would be a child of 1 (it would be stored
as the digit path 1 -> 0), and the rightmost child would be 9, which is
less than 10.

It would appear that what I want is the rightmost *longest* entry in the
trie.  Is there a more efficient way to compute this, short of
brute-force traversal over the entire trie?


T

-- 
Change is inevitable, except from a vending machine.
Apr 19 2022
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
It sounds to me like you want to build a separate index for the trie 
rather than do something clever.
Apr 19 2022
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/19/22 12:35 PM, H. S. Teoh wrote:
 Just thought I'd pick all the bright minds here: I have a set of numbers
 in string form (as decimal digits), stored in a trie.  How do I retrieve
 the largest number in the set?
 
 My initial guess was to find the rightmost child in the trie.
 Unfortunately, this is incorrect, because the rightmost child may not
 necessarily be the largest numerically. For example, consider the set
 {1, 9, 10}. In trie form, 10 would be a child of 1 (it would be stored
 as the digit path 1 -> 0), and the rightmost child would be 9, which is
 less than 10.
 
 It would appear that what I want is the rightmost *longest* entry in the
 trie.  Is there a more efficient way to compute this, short of
 brute-force traversal over the entire trie?
 
Maybe store the longest chain in each node when inserting? Then you look for the longest chain first, and rightmost if there's more than one. -Steve
Apr 19 2022
prev sibling next sibling parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 19 April 2022 at 16:35:21 UTC, H. S. Teoh wrote:
 Just thought I'd pick all the bright minds here: I have a set 
 of numbers in string form (as decimal digits), stored in a 
 trie.  How do I retrieve the largest number in the set?

 My initial guess was to find the rightmost child in the trie. 
 Unfortunately, this is incorrect, because the rightmost child 
 may not necessarily be the largest numerically. For example, 
 consider the set {1, 9, 10}. In trie form, 10 would be a child 
 of 1 (it would be stored as the digit path 1 -> 0), and the 
 rightmost child would be 9, which is less than 10.

 It would appear that what I want is the rightmost *longest* 
 entry in the trie.  Is there a more efficient way to compute 
 this, short of brute-force traversal over the entire trie?


 T
Maybe maintain overall order via a linked list (head and tail pointers for the whole tree + next/prev on each node), modified at insertion/removal of nodes? That should provide O(1) access to largest and smallest number.
Apr 19 2022
prev sibling next sibling parent reply Dom DiSc <dominikus scherkl.de> writes:
On Tuesday, 19 April 2022 at 16:35:21 UTC, H. S. Teoh wrote:
 Just thought I'd pick all the bright minds here: I have a set 
 of numbers in string form (as decimal digits), stored in a 
 trie.  How do I retrieve the largest number in the set?
Is it necessary to store decimal numbers? Because if they would be stored binary, the highest number would automatically be among the longest strings.
Apr 19 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 19, 2022 at 09:16:18PM +0000, Dom DiSc via Digitalmars-d wrote:
 On Tuesday, 19 April 2022 at 16:35:21 UTC, H. S. Teoh wrote:
 Just thought I'd pick all the bright minds here: I have a set of
 numbers in string form (as decimal digits), stored in a trie.  How
 do I retrieve the largest number in the set?
Is it necessary to store decimal numbers? Because if they would be stored binary, the highest number would automatically be among the longest strings.
Obviously, the same holds for decimal numbers. The question is, how to efficiently locate it in the trie (without traversing the entire trie). T -- A linguistics professor was lecturing to his class one day. "In English," he said, "A double negative forms a positive. In some languages, though, such as Russian, a double negative is still a negative. However, there is no language wherein a double positive can form a negative." A voice from the back of the room piped up, "Yeah, yeah."
Apr 19 2022
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/19/22 14:28, H. S. Teoh wrote:
 On Tue, Apr 19, 2022 at 09:16:18PM +0000, Dom DiSc via Digitalmars-d 
wrote:
 Is it necessary to store decimal numbers?
 Because if they would be stored binary, the highest number would
 automatically be among the longest strings.
Obviously, the same holds for decimal numbers.
I think Dom DiSc meant zero pad all numbers to have the same maximum length? If there is indeed a maximum length that does not incur too much cost (I think tries naturally don't have that kind of cost?), then the rightmost will be the highest number. (?) Ali
Apr 19 2022
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 19, 2022 at 02:37:07PM -0700, Ali Çehreli via Digitalmars-d wrote:
 On 4/19/22 14:28, H. S. Teoh wrote:
 On Tue, Apr 19, 2022 at 09:16:18PM +0000, Dom DiSc via Digitalmars-d
wrote:
 Is it necessary to store decimal numbers?
 Because if they would be stored binary, the highest number would
 automatically be among the longest strings.
Obviously, the same holds for decimal numbers.
I think Dom DiSc meant zero pad all numbers to have the same maximum length? If there is indeed a maximum length that does not incur too much cost (I think tries naturally don't have that kind of cost?), then the rightmost will be the highest number. (?)
[...] !!! Ali, you're a genius! Pad the numbers with 0's up to some maximum number of digits (for this particular trie I'm not expecting more than a handful of digits anyway), and that would ensure numerical order == lexicographic order, so the problem becomes trivial. It also simultaneously solves a bunch of other issues I have, all stemming from the fact that with digit strings of non-equal lengths, numerical order != lexicographic order. Thanks!!! T -- Why have vacation when you can work?? -- EC
Apr 19 2022
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 19, 2022 at 05:12:30PM -0700, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Apr 19, 2022 at 02:37:07PM -0700, Ali Çehreli via Digitalmars-d wrote:
[...]
 !!! Ali, you're a genius!  Pad the numbers with 0's up to some maximum
 number of digits (for this particular trie I'm not expecting more than
 a handful of digits anyway), and that would ensure numerical order ==
 lexicographic order, so the problem becomes trivial. It also
 simultaneously solves a bunch of other issues I have, all stemming
 from the fact that with digit strings of non-equal lengths, numerical
 order != lexicographic order.
[...] Hmm, actually, it turns out that padding with 0's breaks the very reason I'm using a trie in the first place (it's for detecting potential matches of a set of numbers based on an incomplete input prefix). So it's a no-go, even though it was a genius idea. :-( Looks like the best solution is just to separately index the set with a sorted linear array... Steven's idea of storing max lengths in each node does work for this specific problem, but it makes other operations needlessly hard. So also no-go. :-/ T -- Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
Apr 19 2022
prev sibling next sibling parent Gregor =?UTF-8?B?TcO8Y2ts?= <gregormueckl gmx.de> writes:
On Tuesday, 19 April 2022 at 16:35:21 UTC, H. S. Teoh wrote:
 Just thought I'd pick all the bright minds here: I have a set 
 of numbers in string form (as decimal digits), stored in a 
 trie.  How do I retrieve the largest number in the set?
Just to get the dumb answer out of the way: we need to assume that the trie is updated dynamically, right? So just trivially taking the maximum while inserting all the members won't work. In the dynamic case, keeping a separate sorted list sounds best if memory usage is of no concern to you. If you really need only the largest value and performance is a concern, a max heap would probably be faster than a fully sorted list.
Apr 20 2022
prev sibling parent Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Tuesday, 19 April 2022 at 16:35:21 UTC, H. S. Teoh wrote:
 Just thought I'd pick all the bright minds here: I have a set 
 of numbers in string form (as decimal digits), stored in a 
 trie.  How do I retrieve the largest number in the set?

 My initial guess was to find the rightmost child in the trie. 
 Unfortunately, this is incorrect, because the rightmost child 
 may not necessarily be the largest numerically. For example, 
 consider the set {1, 9, 10}. In trie form, 10 would be a child 
 of 1 (it would be stored as the digit path 1 -> 0), and the 
 rightmost child would be 9, which is less than 10.

 It would appear that what I want is the rightmost *longest* 
 entry in the trie.  Is there a more efficient way to compute 
 this, short of brute-force traversal over the entire trie?


 T
Perhaps have each node children sorted first by: - having themselves a child - digit itself Then rightmost/leftmost chain of nodes should give what you want. Alexandru.
Apr 20 2022