www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Multidimensional slicing

reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
Hello all,

Suppose I have some data in an 2-dimensional associative array (it could be 
larger dimension, but let's limit it to 2d for now).  I'm using an associative 
array because the underlying data is sparse, i.e. for any given index pair (i, 
j) there's most likely not an entry.

It's trivial to take a slice of this data using the first dimension, i.e. if 
a[i][j] gives the value for the key pair (i, j), a[i] will give the array of 
values for all values of j, and a[i].keys will give me all the values of j that 
are "connected" to i.

... however, I can't do this the opposite way, i.e. to slice it a[][j] -- I get 
an error message, e.g.:

     Error: ulong[ulong][ulong] cannot be sliced with []

and similarly there is no way to identify a[][j].keys i.e. all the i indexes 
associated with j.

So, I'm curious: is there any practical way to slice across the 2nd (or higher) 
dimension in this way, so that I get access to both values and keys?

The application here is a large (but sparse) dataset of links forming a 
bipartite graph of user-object weighted links.  The goal would be for the 2d 
array to store those weights, and the slices would allow you to look at that 
collection from one of two viewpoints -- i.e. all the objects a given user is 
linked to, or all the users a given object is linked to.

Of course, it's trivial to have two associative arrays, say userLinks and 
objectLinks, where the keys are respectively object and user IDs and the values 
are the weights, but that means storing the weight values twice.  I'd like to
be 
able to have one entity that stores the links and weights, and which can be 
sliced either way without needing to copy data and without any added CPU 
overhead (the planned simulation and data sizes are large, so something that is 
costly CPU-wise or that requires additional allocation is unlikely to scale).

Thanks in advance for any advice,

Best wishes,

     -- Joe
Oct 24 2012
next sibling parent "Nathan M. Swan" <nathanmswan gmail.com> writes:
On Wednesday, 24 October 2012 at 14:25:33 UTC, Joseph Rushton 
Wakeling wrote:
 Hello all,

 Suppose I have some data in an 2-dimensional associative array 
 (it could be larger dimension, but let's limit it to 2d for 
 now).  I'm using an associative array because the underlying 
 data is sparse, i.e. for any given index pair (i, j) there's 
 most likely not an entry.

 It's trivial to take a slice of this data using the first 
 dimension, i.e. if a[i][j] gives the value for the key pair (i, 
 j), a[i] will give the array of values for all values of j, and 
 a[i].keys will give me all the values of j that are "connected" 
 to i.

 ... however, I can't do this the opposite way, i.e. to slice it 
 a[][j] -- I get an error message, e.g.:

     Error: ulong[ulong][ulong] cannot be sliced with []

 and similarly there is no way to identify a[][j].keys i.e. all 
 the i indexes associated with j.

 So, I'm curious: is there any practical way to slice across the 
 2nd (or higher) dimension in this way, so that I get access to 
 both values and keys?

 The application here is a large (but sparse) dataset of links 
 forming a bipartite graph of user-object weighted links.  The 
 goal would be for the 2d array to store those weights, and the 
 slices would allow you to look at that collection from one of 
 two viewpoints -- i.e. all the objects a given user is linked 
 to, or all the users a given object is linked to.

 Of course, it's trivial to have two associative arrays, say 
 userLinks and objectLinks, where the keys are respectively 
 object and user IDs and the values are the weights, but that 
 means storing the weight values twice.  I'd like to be able to 
 have one entity that stores the links and weights, and which 
 can be sliced either way without needing to copy data and 
 without any added CPU overhead (the planned simulation and data 
 sizes are large, so something that is costly CPU-wise or that 
 requires additional allocation is unlikely to scale).

 Thanks in advance for any advice,

 Best wishes,

     -- Joe

Short answer: no. Long answer: It would be inefficient to create a slice like you ask if your data structure is associative arrays. Here's the code: ulong[] result; foreach(i, js; a) { foreach (j, w; js) { if (j == jYouAskedFor) { result ~= w; } } } // use result... The worst case here is O(n), where n is the number of links. The slicing of a[i] is O(1). The two associative arrays is probably your best bet, but I'm no expert on this. Hope it helps, Nathan M. Swan
Oct 25 2012
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Joseph Rushton Wakeling:

 Suppose I have some data in an 2-dimensional associative array 
 (it could be larger dimension, but let's limit it to 2d for 
 now).  I'm using an associative array because the underlying 
 data is sparse, i.e. for any given index pair (i, j) there's 
 most likely not an entry.

In Python people that want to use a associative arrays to store a sparse 2D array usually use just one associative array, where the keys are (r,c) pairs. But you can't "slice" this. In low level languages for sparse 2D arrays often people use something different. One popular choice is to use just a (dense) array for the first coordinate, and to put just index-value pairs (with ordered indexes) into such arrays. It's a simple solution, but it's often good enough. If the rows contain many items (more than 50-100) it's better to chunk each row in some way. If you want to slice on both coordinates consider using two data structures, where the second contains pointers to the first, and if you want to add/remove items then maybe each value needs a pointer to the start of the array. See also: http://en.wikipedia.org/wiki/Sparse_matrix
 It's trivial to take a slice of this data using the first 
 dimension, i.e. if a[i][j] gives the value for the key pair (i, 
 j), a[i] will give the array of values for all values of j, and 
 a[i].keys will give me all the values of j that are "connected" 
 to i.

 ... however, I can't do this the opposite way, i.e. to slice it 
 a[][j] -- I get an error message, e.g.:

     Error: ulong[ulong][ulong] cannot be sliced with []

 and similarly there is no way to identify a[][j].keys i.e. all 
 the i indexes associated with j.

Maybe you are able to create a wrapper, that wraps the 2D associative array, and uses a little "most recently used" cache stored in a little array to memoize some of the last lookups on the first coordinate. But first it's better to try simpler solutions. Bye, bearophile
Oct 26 2012