digitalmars.D.learn - Multidimensional slicing

Joseph Rushton Wakeling (31/31) Oct 24 2012 Hello all,

Nathan M. Swan (20/57) Oct 25 2012 Short answer: no.
bearophile (23/38) Oct 26 2012 In Python people that want to use a associative arrays to store a

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

Hello all,

Suppose I have some data in an 2-dimensional associative array (it could be 
larger dimension, but let's limit it to 2d for now).  I'm using an associative 
array because the underlying data is sparse, i.e. for any given index pair (i, 
j) there's most likely not an entry.

It's trivial to take a slice of this data using the first dimension, i.e. if 
a[i][j] gives the value for the key pair (i, j), a[i] will give the array of 
values for all values of j, and a[i].keys will give me all the values of j that 
are "connected" to i.

... however, I can't do this the opposite way, i.e. to slice it a[][j] -- I get 
an error message, e.g.:

     Error: ulong[ulong][ulong] cannot be sliced with []

and similarly there is no way to identify a[][j].keys i.e. all the i indexes 
associated with j.

So, I'm curious: is there any practical way to slice across the 2nd (or higher) 
dimension in this way, so that I get access to both values and keys?

The application here is a large (but sparse) dataset of links forming a 
bipartite graph of user-object weighted links.  The goal would be for the 2d 
array to store those weights, and the slices would allow you to look at that 
collection from one of two viewpoints -- i.e. all the objects a given user is 
linked to, or all the users a given object is linked to.

Of course, it's trivial to have two associative arrays, say userLinks and 
objectLinks, where the keys are respectively object and user IDs and the values 
are the weights, but that means storing the weight values twice.  I'd like to
be 
able to have one entity that stores the links and weights, and which can be 
sliced either way without needing to copy data and without any added CPU 
overhead (the planned simulation and data sizes are large, so something that is 
costly CPU-wise or that requires additional allocation is unlikely to scale).

Thanks in advance for any advice,

Best wishes,

     -- Joe

Oct 24 2012

"Nathan M. Swan" <nathanmswan gmail.com> writes:

On Wednesday, 24 October 2012 at 14:25:33 UTC, Joseph Rushton 
Wakeling wrote:
 Hello all,

 Suppose I have some data in an 2-dimensional associative array 
 (it could be larger dimension, but let's limit it to 2d for 
 now).  I'm using an associative array because the underlying 
 data is sparse, i.e. for any given index pair (i, j) there's 
 most likely not an entry.

 It's trivial to take a slice of this data using the first 
 dimension, i.e. if a[i][j] gives the value for the key pair (i, 
 j), a[i] will give the array of values for all values of j, and 
 a[i].keys will give me all the values of j that are "connected" 
 to i.

 ... however, I can't do this the opposite way, i.e. to slice it 
 a[][j] -- I get an error message, e.g.:

     Error: ulong[ulong][ulong] cannot be sliced with []

 and similarly there is no way to identify a[][j].keys i.e. all 
 the i indexes associated with j.

 So, I'm curious: is there any practical way to slice across the 
 2nd (or higher) dimension in this way, so that I get access to 
 both values and keys?

 The application here is a large (but sparse) dataset of links 
 forming a bipartite graph of user-object weighted links.  The 
 goal would be for the 2d array to store those weights, and the 
 slices would allow you to look at that collection from one of 
 two viewpoints -- i.e. all the objects a given user is linked 
 to, or all the users a given object is linked to.

 Of course, it's trivial to have two associative arrays, say 
 userLinks and objectLinks, where the keys are respectively 
 object and user IDs and the values are the weights, but that 
 means storing the weight values twice.  I'd like to be able to 
 have one entity that stores the links and weights, and which 
 can be sliced either way without needing to copy data and 
 without any added CPU overhead (the planned simulation and data 
 sizes are large, so something that is costly CPU-wise or that 
 requires additional allocation is unlikely to scale).

 Thanks in advance for any advice,

 Best wishes,

     -- Joe

Short answer: no.

Long answer: It would be inefficient to create a slice like you 
ask if your data structure is associative arrays. Here's the code:

     ulong[] result;
     foreach(i, js; a) {
         foreach (j, w; js) {
             if (j == jYouAskedFor) {
                 result ~= w;
             }
         }
     }
     // use result...

The worst case here is O(n), where n is the number of links. The 
slicing of a[i] is O(1).

The two associative arrays is probably your best bet, but I'm no 
expert on this.

Hope it helps,
Nathan M. Swan

Oct 25 2012

"bearophile" <bearophileHUGS lycos.com> writes:

Joseph Rushton Wakeling:

 Suppose I have some data in an 2-dimensional associative array 
 (it could be larger dimension, but let's limit it to 2d for 
 now).  I'm using an associative array because the underlying 
 data is sparse, i.e. for any given index pair (i, j) there's 
 most likely not an entry.

In Python people that want to use a associative arrays to store a 
sparse 2D array usually use just one associative array, where the 
keys are (r,c) pairs. But you can't "slice" this.

In low level languages for sparse 2D arrays often people use 
something different. One popular choice is to use just a (dense) 
array for the first coordinate, and to put just index-value pairs 
(with ordered indexes) into such arrays. It's a simple solution, 
but it's often good enough. If the rows contain many items (more 
than 50-100) it's better to chunk each row in some way.

If you want to slice on both coordinates consider using two data 
structures, where the second contains pointers to the first, and 
if you want to add/remove items then maybe each value needs a 
pointer to the start of the array.

See also:
http://en.wikipedia.org/wiki/Sparse_matrix


 It's trivial to take a slice of this data using the first 
 dimension, i.e. if a[i][j] gives the value for the key pair (i, 
 j), a[i] will give the array of values for all values of j, and 
 a[i].keys will give me all the values of j that are "connected" 
 to i.

 ... however, I can't do this the opposite way, i.e. to slice it 
 a[][j] -- I get an error message, e.g.:

     Error: ulong[ulong][ulong] cannot be sliced with []

 and similarly there is no way to identify a[][j].keys i.e. all 
 the i indexes associated with j.

Maybe you are able to create a wrapper, that wraps the 2D 
associative array, and uses a little "most recently used" cache 
stored in a little array to memoize some of the last lookups on 
the first coordinate. But first it's better to try simpler 
solutions.

Bye,
bearophile

Oct 26 2012

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Multidimensional slicing