Tutorial 3 - AnamCollections - dealing with many similar objects

The AnamCollection code is designed to make it easy to deal with situations where you have many objects of the same class and want to perform analysis across them. For instance, in a sliding-window analysis, you may have a class which implements model fitting and metric measurement for a single window; an AnamCollection then lets you collate the results across windows. You can think of an AnamCollection as a list of objects with additional helper functions for collating data and for HDF5 serialisation/deserialisation.

In general, you will want to subclass AnamCollection to use it. An example can be found in test_classes3.py:

#!/usr/bin/python3

from anamnesis import AbstractAnam, AnamCollection, register_class

class CollectableSubjectStats(AbstractAnam):

    hdf5_outputs = ['zstats', 'rts']

    hdf5_defaultgroup = 'subjectstats'

    def __init__(self, zstats=None, rts=None):
        """
        zstats must be a numpy array of [width, height] dimensions
        rts must be a numpy array of [ntrials, ] dimensions
        """
        AbstractAnam.__init__(self)

        self.zstats = zstats
        self.rts = rts

register_class(CollectableSubjectStats)

class StatsCollection(AnamCollection):
    anam_combine = ['zstats', 'rts']

register_class(StatsCollection)

In this case, we have two variables, each of which will be a numpy array. The AnamCollection class is specifically designed for use with numpy arrays.

For an example of writing data with this class, see test_script3_write.py:

#!/usr/bin/python3

import h5py
import numpy as np

from test_classes3 import CollectableSubjectStats, StatsCollection

# Create a collection to put our data into
collection = StatsCollection()

# Simulate 5 people's worth of data
for person in range(5):
    # 10x10 zstats - low resolution image!
    zstats = np.random.randn(10, 10)
    # 100 trials - note: randn is zero-mean, so this is noise scaled by 450.0
    # rather than reaction times centred on 450ms
    rts = np.random.randn(100) * 450.0

    p = CollectableSubjectStats(zstats, rts)

    collection.append(p)

# Write the data to a file
with h5py.File('test_script3.hdf5', 'w') as f:
    collection.to_hdf5(f.create_group('data'))

Note that objects can be appended to the collection using the normal .append() method, and the collection can then be written to an HDF5 file in the usual way.

When using an AnamCollection-derived object, the simplest form of use is to treat it as a list, which lets you retrieve the objects stored within it. This can be seen in the first few lines of the script below.

In addition, if you request any member named in the collection's anam_combine variable, the class collates the identically named variable from every object in the list and returns the stacked result. In most cases you will use this with numpy arrays, in which case you end up with a numpy array with one additional dimension: if each object has a numpy array of dimension (10, 10) and you have 3 objects, the combined array will have size (10, 10, 3). The combined data is accessed as a member variable of the collection; for instance, if the name data was in anam_combine and your collection was named collection, you would access it as collection.data.

Note that this member is only available once you have called the update_cache() function on the collection; this is for reasons of efficiency. Therefore, after modifying, adding or deleting members in the list, you should call update_cache() again. There is also a clear_cache() function, but it is rarely needed.
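The shape of the stacked result can be illustrated with plain NumPy. This is only a sketch of the layout described above, assuming the new dimension is appended last; it does not use anamnesis, and np.stack is a stand-in for whatever AnamCollection does internally.

```python
import numpy as np

# Three stand-in "objects", each holding a (10, 10) array.
# Object i is filled with the value i so it is easy to identify later.
per_object = [np.full((10, 10), float(i)) for i in range(3)]

# Stack along a new trailing axis.
# Assumption: this mirrors the (10, 10, 3) layout described in the text.
combined = np.stack(per_object, axis=-1)

print(combined.shape)

# Indexing the last axis recovers each object's original array
second = combined[..., 1]
print(np.array_equal(second, per_object[1]))
```

With this layout, the last index selects the object and the leading indices address positions within that object's array.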

For a full example, see test_script3_read.py:

#!/usr/bin/python3

from anamnesis import obj_from_hdf5file

import test_classes3  # noqa: F401

# Load our collection of data
c = obj_from_hdf5file('test_script3.hdf5')

# Demonstrate how we have access to each individual object
for p in c.members:
    print(p.zstats.shape, p.zstats.min(), p.zstats.max())

# Make sure that our cache is up-to-date before we demonstrate
# the stacked data methods
c.update_cache()

# Demonstrate that we have access to stacked versions of the data
print(c.zstats.shape)
print(c.rts.shape)
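Once the stacked arrays are available, group-level summaries become reductions along one axis. The sketch below uses randomly generated stand-in arrays rather than data loaded through anamnesis. The shapes are an assumption: they follow from the write script (5 subjects) and from the trailing-axis layout described earlier, so check them against the shapes your own collection prints.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for c.zstats and c.rts after update_cache()
zstats = rng.standard_normal((10, 10, 5))  # (width, height, subject)
rts = rng.standard_normal((100, 5))        # (trial, subject)

# Average image across subjects: reduce over the trailing subject axis
group_mean_map = zstats.mean(axis=-1)

# One mean value per subject: reduce over the trial axis
per_subject_mean_rt = rts.mean(axis=0)

print(group_mean_map.shape)
print(per_subject_mean_rt.shape)
```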