Tutorial 2 - More advanced serialisation features

Anamnesis has support for some more advanced features regarding serialisation.

Most people will not require these features, but they are used in NAF (the project from which anamnesis was extracted).

These features can be best described by the name of the member variables or function names which are used to configure them.

Anamnesis implicitly reserves these names for its own functionality. Any future additions will use the prefixes hdf5_ or anam_, so to avoid clashes with future versions of anamnesis, do not use variable or function names with these prefixes.

  1. hdf5_defaultgroup (member variable)
  2. hdf5_aliases (member variable)
  3. hdf5_mapnames (member variable)
  4. extra_data (member variable)
  5. extra_bcast (member variable)
  6. init_from_hdf5 (member function)
  7. refs (member variable)
  8. shortdesc (member variable)
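The advice about reserved prefixes can be turned into a quick check on your own classes. The following is a minimal sketch in plain Python and is not part of anamnesis: the helper `risky_names`, the `KNOWN_NAMES` set (collected from the names used in this tutorial) and the `Example` class are all assumptions made for illustration.

```python
# Prefixes which the tutorial says are reserved for future use.
RESERVED_PREFIXES = ('hdf5_', 'anam_')

# Names which this tutorial already uses for configuration (assumed list).
KNOWN_NAMES = {'hdf5_outputs', 'hdf5_defaultgroup', 'hdf5_aliases',
               'hdf5_mapnames', 'anam_combine'}


def risky_names(cls):
    """Return class attributes which use a reserved prefix but are not
    one of the known configuration names.

    Only attributes defined directly on the class are inspected;
    instance attributes set in __init__ are not seen.
    """
    return sorted(name for name in vars(cls)
                  if name.startswith(RESERVED_PREFIXES)
                  and name not in KNOWN_NAMES)


class Example:
    hdf5_outputs = ['value']   # fine: a known configuration name
    hdf5_cache = {}            # risky: could clash with a future version
    anam_version = 2           # risky for the same reason
    value = None


print(risky_names(Example))
```

A check like this could be run from a test suite so that a clash is noticed before upgrading anamnesis.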

The example classes used in this tutorial are placed in test_classes2.py.

#!/usr/bin/python3

from anamnesis import AbstractAnam, AnamCollection, register_class


class CollectableSubjectStats(AbstractAnam):

    hdf5_outputs = ['zstats', 'rts']

    hdf5_defaultgroup = 'subjectstats'

    def __init__(self, zstats=None, rts=None):
        """
        zstats must be a numpy array of [width, height] dimensions
        rts must be a numpy array of [ntrials, ] dimensions
        """
        AbstractAnam.__init__(self)

        self.zstats = zstats
        self.rts = rts


register_class(CollectableSubjectStats)


class StatsCollection(AnamCollection):
    anam_combine = ['zstats', 'rts']


register_class(StatsCollection)

All of the files needed to run these examples are generated by the script test_script2_write.py. This is also where several examples of the actual usage of the variables within classes can be seen.

#!/usr/bin/python3

import shutil

import h5py

from test_classes2 import (ComplexPerson,
                           ComplexPlace,
                           ComplexTrain)

# Create a person and a place
s = ComplexPerson('Anna', 45)

print(s.name)
print(s.age)

loc = ComplexPlace('York')
print(loc.location)

t = ComplexTrain('Glasgow')
print(t.destination)

# Serialise the person and place to disk
f = h5py.File('test_script2.hdf5', 'w')
s.to_hdf5(f.create_group(s.hdf5_defaultgroup))
loc.to_hdf5(f.create_group(loc.hdf5_defaultgroup))
t.to_hdf5(f.create_group(t.hdf5_defaultgroup))
f.close()

# Serialise the person to disk using a different name
# To do this, we copy the HDF5 file and manually edit it
shutil.copyfile('test_script2.hdf5', 'test_script2_aliases.hdf5')
f = h5py.File('test_script2_aliases.hdf5', 'a')
f['person'].attrs['class'] = 'test_classes2.OldComplexPerson'
f.close()

hdf5_defaultgroup

This variable is usually used when serialising a single instance of a class into and out of an HDF5 file. Its use obviates the need to specify a group name when reading from an HDF5 file using the from_hdf5file function.

E.g., if we have two classes, one of which has an hdf5_defaultgroup set to person and the other to place, we can load each of the instances without specifying where they are in the file, as follows:

#!/usr/bin/python3

from test_classes2 import ComplexPerson, ComplexPlace  # noqa: F401

# Load the classes from the HDF5 file using
# the default hdf5group names
s = ComplexPerson.from_hdf5file('test_script2.hdf5')
loc = ComplexPlace.from_hdf5file('test_script2.hdf5')

# Show that we have reconstructed the object
print("Person")
print(type(s))
print(s.name)
print(s.age)


print("Place")
print(type(loc))
print(loc.location)
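The idea behind hdf5_defaultgroup can be sketched without anamnesis or h5py. The code below is an illustration only: `resolve_group` and the two stand-in classes are hypothetical, and the fallback behaviour (use the explicit group name if one is given, otherwise use the class default) is an assumption about how such a lookup might work rather than a description of the library's internals.

```python
class Person:
    hdf5_defaultgroup = 'person'


class Place:
    hdf5_defaultgroup = 'place'


class NoDefault:
    pass


def resolve_group(cls, group=None):
    """Pick the group name to read from (hypothetical helper).

    An explicit name always wins; otherwise fall back to the
    class-level hdf5_defaultgroup, if there is one.
    """
    if group is not None:
        return group
    default = getattr(cls, 'hdf5_defaultgroup', None)
    if default is None:
        raise ValueError('No group given and no hdf5_defaultgroup set')
    return default


print(resolve_group(Person))            # uses the class default
print(resolve_group(Place, 'archive'))  # explicit name overrides it
```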

hdf5_aliases

hdf5_aliases is a list which allows developers to specify additional class names which should be matched by the given class. As an example, if hdf5_aliases in the test_classes2.ComplexPerson class is set to ['test_classes2.OldComplexPerson'], any files which were created using the old class name (OldComplexPerson) will now be read using the ComplexPerson class instead:

#!/usr/bin/python3

from anamnesis import obj_from_hdf5file

import test_classes2  # noqa: F401

# Demonstrate reading a file which has the old class name
# in the HDF5 file
s = obj_from_hdf5file('test_script2_aliases.hdf5', 'person')

# Show that we have reconstructed the object
print("Person")
print(type(s))
print(s.name)
print(s.age)
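One way to picture the alias mechanism is as a lookup table from class names to classes, in which each alias is simply an extra key. The sketch below is plain Python and does not use anamnesis' real registry: the `registry` dictionary, the `register` function and the stand-in `Person` class are assumptions made for illustration.

```python
# Hypothetical registry mapping stored class names to Python classes.
registry = {}


def register(cls):
    """Record cls under its own qualified name and under each alias."""
    own_name = '{}.{}'.format(cls.__module__, cls.__name__)
    registry[own_name] = cls
    for alias in getattr(cls, 'hdf5_aliases', []):
        registry[alias] = cls
    return cls


class Person:
    # Files written under the old class name should still load as Person.
    hdf5_aliases = ['test_classes2.OldComplexPerson']


register(Person)

# Looking up the old name stored in the file yields the new class.
found = registry['test_classes2.OldComplexPerson']
print(found is Person)
```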

hdf5_mapnames

hdf5_mapnames is a rather specialised variable for which most users will not have a use. It allows users to control the mapping of variable names into and out of the HDF5 file - in other words, it decouples the names of the groups and attributes in the HDF5 file from those in the Python class.

As a concrete example, let us say that we are using a Python class which has a variable called _order but that, for neatness' sake, we would rather that this was called order in the HDF5 file. In this case, we would define the hdf5_mapnames variable as follows.

    hdf5_mapnames = {'_order': 'order'}
    hdf5_outputs = ['_order']

Note that hdf5_mapnames is a dictionary which maps Python member variable names to HDF5 entry names, and that we still list the original variable name in hdf5_outputs.
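To make the direction of the mapping concrete, here is a minimal sketch in plain Python of how the entry names could be derived from the two variables. The function `hdf5_entry_names` and the extra member `data` are hypothetical and are not part of anamnesis; any name without a mapping is assumed to be stored under its own name.

```python
def hdf5_entry_names(outputs, mapnames):
    """Map each Python member name to the name used in the file.

    Names without an entry in mapnames are stored unchanged.
    """
    return {name: mapnames.get(name, name) for name in outputs}


hdf5_mapnames = {'_order': 'order'}
hdf5_outputs = ['_order', 'data']   # 'data' is a made-up second member

names = hdf5_entry_names(hdf5_outputs, hdf5_mapnames)
print(names)
```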

You can have as many mappings as you want, but be very careful not to have a name both in hdf5_outputs and as a target in hdf5_mapnames. For example, this is bad (assuming that your class has member variables _order and myvariable):

    # Don't do this
    hdf5_mapnames = {'_order': 'myvariable'}
    hdf5_outputs = ['myvariable']

For (hopefully) obvious reasons, this makes no sense as you are attempting to serialise both the _order and myvariable variables into the HDF5 entry with name myvariable. Don’t Do This (TM).
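A clash of this kind is easy to detect mechanically. The sketch below applies the rule as stated above (a name must not appear both in hdf5_outputs and as a target in hdf5_mapnames); `find_clashes` is a hypothetical helper written in plain Python and is not provided by anamnesis.

```python
def find_clashes(outputs, mapnames):
    """Return names listed in outputs which are also mapping targets.

    A member mapped to its own name is harmless, so it is ignored.
    """
    targets = {target for source, target in mapnames.items()
               if source != target}
    return sorted(name for name in outputs if name in targets)


# The bad configuration from the text: 'myvariable' is both an
# output and the target of the '_order' mapping.
print(find_clashes(['myvariable'], {'_order': 'myvariable'}))

# The good configuration from earlier in this section has no clash.
print(find_clashes(['_order'], {'_order': 'order'}))
```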

extra_data

The extra_data variable is a dictionary which can be used by users of a class to serialise and unserialise additional data which is not normally saved by the object.

To use this, treat extra_data as a standard dictionary, for example:

#!/usr/bin/python3

import h5py
from anamnesis import obj_from_hdf5file

from test_classes2 import ComplexPerson

# Create an example object
p = ComplexPerson('Bob', 75)
p.extra_data['hometown'] = 'Oxford'

print(p)
print(p.extra_data)

# Save the object out
f = h5py.File('test_script2_extradata.hdf5', 'w')
p.to_hdf5(f.create_group(p.hdf5_defaultgroup))
f.close()

# Delete our object
del p

# Re-load our object
p = obj_from_hdf5file('test_script2_extradata.hdf5')

# Show that we recovered the object and the extra data
print(p)
print(p.extra_data)

extra_bcast

The extra_bcast variable is a list of member variable names similar to that in the main hdf5_outputs variable. The difference is that variables listed in extra_bcast will be transferred via MPI when the object is sent or broadcast, but will not be placed into the HDF5 file during serialisation/unserialisation.

The most common use of this is when there is some cached information in the class which you do not want to recompute on every MPI node but do not need to save into the HDF5 file. In this case, the name of the variable containing the cache would not be listed in hdf5_outputs but would be listed in extra_bcast. It is also possible in that instance that you would wish to use the init_from_hdf5 function as documented below.
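The difference between the two lists can be sketched in plain Python, without MPI or HDF5. Everything here is illustrative: `CachedModel`, `file_payload` and `broadcast_payload` are hypothetical names, and the sketch assumes that a broadcast carries the hdf5_outputs members as well as the extra_bcast ones.

```python
class CachedModel:
    hdf5_outputs = ['weights']   # saved to file and broadcast
    extra_bcast = ['_cache']     # broadcast only, never saved

    def __init__(self):
        self.weights = [1.0, 2.0]
        self._cache = {'total': 3.0}   # expensive to recompute


def file_payload(obj):
    """Members which would be written to the HDF5 file."""
    return {name: getattr(obj, name) for name in obj.hdf5_outputs}


def broadcast_payload(obj):
    """Members which would be sent to other MPI nodes (assumed to be
    the file members plus the extra_bcast members)."""
    names = list(obj.hdf5_outputs) + list(getattr(obj, 'extra_bcast', []))
    return {name: getattr(obj, name) for name in names}


m = CachedModel()
print(sorted(file_payload(m)))
print(sorted(broadcast_payload(m)))
```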

init_from_hdf5

The optional function init_from_hdf5 is called after the object has had its members loaded while it is being unserialised from an HDF5 file. This means that you can perform any post-processing which you find necessary; for instance, if a class has a cache which is not serialised and therefore needs rebuilding after the object is reloaded, you can use this function to do so. To see how this works, look at the example class ComplexTrain in the test_classes2.py file shown above and examine the output from the test_script2_initfromhdf5.py script which uses this class:

#!/usr/bin/python3

from anamnesis import obj_from_hdf5file

from test_classes2 import ComplexPerson  # noqa: F401

# Load the train object and watch for the printed output from the
# init_from_hdf5 function
p = obj_from_hdf5file('test_script2.hdf5', 'train')
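The order of events during loading can be imitated in plain Python. The sketch below is not anamnesis code: `Train` is a stand-in class, `load_sketch` is a hypothetical loader which reads from a dictionary instead of an HDF5 file, and it is assumed that the hook takes no arguments and that the object is built with a no-argument constructor before its members are restored.

```python
class Train:
    hdf5_outputs = ['destination']

    def __init__(self, destination='Unknown'):
        self.destination = destination
        self._label = self._build_label()   # cached, not serialised

    def _build_label(self):
        return 'Train to {}'.format(self.destination)

    def init_from_hdf5(self):
        # Called after the saved members are restored: rebuild the cache
        # so that it reflects the loaded destination.
        self._label = self._build_label()


def load_sketch(cls, stored):
    """Stand-in loader: create the object, restore its members, then
    call the optional init_from_hdf5 hook if the class defines one."""
    obj = cls()
    for name in cls.hdf5_outputs:
        setattr(obj, name, stored[name])
    hook = getattr(obj, 'init_from_hdf5', None)
    if callable(hook):
        hook()
    return obj


t = load_sketch(Train, {'destination': 'Glasgow'})
print(t._label)
```

Without the hook, the cached label would still read "Train to Unknown" after loading, because it was computed in the constructor before the destination was restored.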

refs

Full use of this variable requires the addition of anamnesis’ report functionality. This will be ported from NAF soon.

shortdesc

Full use of this variable requires the addition of anamnesis’ report functionality. This will be ported from NAF soon.