
digitalmars.D.learn - GC and big non-binary trees

reply Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:

I'm faced with huge non-binary trees:
leaves: > 8000000
sub-leaves: 1-40
depth: > 17


Sadly the parent member is required and results in cross-references.
Occasionally some branches will be dropped.

Should I
1) simply drop the branch by removing its root from the members list,
   (huge GC costs)

2) walk down the dropped branch and set parent to null,
   (medium dropping costs, medium GC costs)

3) or walk down the dropped branch and set parent and members to null?
   (huge dropping costs)
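
For illustration, the node layout behind these three options might look like the sketch below; the type and field names are assumptions, not taken from the original post.

```d
import std.algorithm.mutation : remove;
import std.algorithm.searching : countUntil;

// Hypothetical node type: a back-reference plus a list of children.
class Node
{
    Node parent;     // required back-reference (the cross-references)
    Node[] members;  // 1-40 sub-leaves per node
}

// Option 1: unlink the branch root from its parent's members list.
// The whole subtree becomes garbage in one cheap step and is left
// entirely to the GC.
void dropBranch(Node branch)
{
    auto p = branch.parent;
    auto i = p.members.countUntil!(m => m is branch);
    if (i >= 0)
        p.members = p.members.remove(i);
    branch.parent = null;
}
```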

Tests so far indicate that (2) is the appropriate solution for my
current trees (~20000 leaves), but the tree size will grow significantly.

Any ideas how the GC will react or how to improve branch dropping
and GC performance?

Thomas
 

Apr 07 2005
next sibling parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
(NOTE: All this assumes the current GC, which is a mark-and-sweep GC).

I assume from what you're saying that the branch is a substantial 
portion of the tree.

Remember that, when the GC scans, it only scans the *live* portion of 
the tree.  So, if you disconnect a large subset of the tree (even if 
that subset still has links back to the live data), then the GC will 
never scan any of the nodes of the branch.  So it doesn't matter whether 
you set the 'parent' members in them to null or not.

Concept:
The cost of a pass of the GC is proportional to the size of the live set 
at the time of the sweep.

On the other hand, having a lot of garbage lingering around means that 
you will more quickly run out of available memory, which causes the GC 
to run sooner.  So, if you are *certain* that the entire branch is 
garbage (no outside references into any node), then you could decide to 
recursively delete all of the nodes in the branch.  That would increase 
your 'dropping' costs, but would probably result in better long-term 
performance.  However, it might not be a good option if you're worried 
about the real-time performance of your drop operation.



If you're worried about the real-time performance of your 'dropping' 
operation, but don't want to have big GC sweeps either, then you could 
implement a delayed drop.  To do this, keep a global structure around 
(perhaps as simple as a dynamic array of references) that holds a list of 
dropped branches.  During idle time, you iterate over that array, and 
recursively delete all of the nodes in each dropped branch.  (Or, just 
zero the array, and kick off a GC pass.)  Thus, dropping a branch is as 
simple as removing it from its parent and storing a reference to the 
branch on this array.  If you want, you can also disable the GC during 
your critical windows, and re-enable it during your idle times.
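
A minimal sketch of this delayed-drop scheme, assuming a `Node` type with `parent`/`members` links and using today's `core.memory.GC` API (in 2005 the equivalent calls lived in `std.gc`):

```d
import core.memory : GC;
import std.algorithm.mutation : remove;
import std.algorithm.searching : countUntil;

class Node
{
    Node parent;
    Node[] members;
}

Node[] droppedBranches;  // pending garbage, processed during idle time

// Real-time path: unlink the branch and remember it -- no GC work here.
void dropBranch(Node branch)
{
    auto p = branch.parent;
    if (p !is null)
    {
        auto i = p.members.countUntil!(m => m is branch);
        if (i >= 0)
            p.members = p.members.remove(i);
    }
    branch.parent = null;
    droppedBranches ~= branch;   // defer the real cleanup
}

// Critical window: keep the collector from pausing us at a bad moment.
void enterCriticalWindow() { GC.disable(); }

// Idle path: forget the dropped branches so nothing keeps them alive,
// then re-enable the collector and kick off a pass while time is cheap.
void idleCleanup()
{
    droppedBranches = null;
    GC.enable();
    GC.collect();
}
```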

The effect of this would be to have very fast real-time performance 
(drops are quick, and GC never runs during critical windows). 
Allocation in that window would also be pretty fast.  (Since the GC is 
disabled, it never stops to sweep memory; if it ever lacks memory, it 
just gets more from the OS.)  Yet, during idle times, you free up a lot 
of memory.

You would end up with a system where your total memory allocation is 
M+C, where M is the maximum amount of memory you would need (typically), 
and C is the amount of memory that is allocated/freed in a real-time 
window.  During the real-time portion of your program, the GC allocates 
from C; during the idle portion, you release objects (or the GC collects 
them), refilling C.



Hope this makes sense, and at least some of it applies to your work :)



Thomas Kuehne wrote:
 <snip>
Apr 07 2005
parent Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:

Russ Lewis wrote on Thu, 07 Apr 2005 13:34:46 -0700:
 (NOTE: All this assumes the current GC, which is a mark-and-sweep GC).

 I assume from what you're saying that the branch is a substantial 
 portion of the tree.

 Remember that, when the GC scans, it only scans the *live* portion of 
 the tree.  So, if you disconnect a large subset of the tree (even if 
 that subset still has links back to the live data), then the GC will 
 never scan any of the nodes of the branch.  So it doesn't matter whether 
 you set the 'parent' members in them to null or not.
Argh, why did I think it was a reference-counting GC? Back to the drawing board!
 Concept:
 The cost of a pass of the GC is proportional to the size of the live set 
 at the time of the sweep.

 On the other hand, having a lot of garbage lingering around means that 
 you will more quickly run out of available memory, which causes the GC 
 to run sooner.  So, if you are *certain* that the entire branch is 
 garbage (no outside references into any node), then you could decide to 
 recursively delete all of the nodes in the branch.  That would increase 
 your 'dropping' costs, but would probably result in better long-term 
 performance.  However, it might not be a good option if you're worried 
 about the real-time performance of your drop operation.
It's not a realtime application, rather a small memory-eating animal! As I
know the branch size at dropping time, I'll try to delete all leaves in big
branches and leave small branches to the GC.

<snip>
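
A sketch of that size-threshold strategy, again assuming a hypothetical `Node` type; `destroy` plus `GC.free` stands in for the `delete` expression of 2005-era D, and the threshold value is made up:

```d
import core.memory : GC;
import std.algorithm.mutation : remove;
import std.algorithm.searching : countUntil;

class Node
{
    Node parent;
    Node[] members;
}

enum size_t bigBranch = 10_000;  // made-up tuning knob

// Free every node in the branch eagerly, children first.
void freeRecursively(Node n)
{
    foreach (child; n.members)
        freeRecursively(child);
    n.members = null;
    destroy(n);
    GC.free(cast(void*) n);
}

// Big branches are deleted by hand; small ones are simply abandoned
// and left for the next GC sweep.
void dropBranch(Node branch, size_t branchSize)
{
    auto p = branch.parent;
    if (p !is null)
    {
        auto i = p.members.countUntil!(m => m is branch);
        if (i >= 0)
            p.members = p.members.remove(i);
    }
    branch.parent = null;

    if (branchSize > bigBranch)
        freeRecursively(branch);
}
```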
 Hope this makes sense, and at least some of it applies to your work :)
Thanks, you gave me some ideas worth playing around with.

Thomas
Apr 07 2005
prev sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Can't you just make a tree of that size with dummy data, and see how it 
performs?

Thomas Kuehne wrote:
 <snip>
Apr 07 2005
parent Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> writes:

Georg Wrede wrote on Fri, 08 Apr 2005 00:45:10 +0300:
 Can't you just make a tree of that size with dummy data, and see how it 
 performs?
No, I haven't got enough RAM for that.

Thomas
Apr 07 2005