# 1. Data wrangling

## 1.1. Working with data from the AllenSDK

The data we use in our connectivity models comes from the AAV tracing experiments performed at the Allen Institute for Brain Science. In each experiment, an AAV viral tracer is injected into a region of the mouse brain, and the brain is imaged after the virus has propagated down the axons of the infected neurons. This imaging reveals how a group of infected neurons is structurally connected to other regions of the brain.

### 1.1.1. AllenSDK Package

allensdk is a Python package provided by the Allen Institute for retrieving and manipulating the data generated by its experiments. We use the allensdk.core subpackage, which serves the data from the viral tracing experiments described above. Specifically, we use the allensdk.core.MouseConnectivityCache object to pull experimental data and to register it in the Allen 3D Reference Space. More information can be found in the AllenSDK documentation.

## 1.2. Core Package

### 1.2.1. VoxelModelCache

VoxelModelCache extends allensdk.core.MouseConnectivityCache to download and pull the latest iteration of our voxel model. Additionally, this class implements the get_experiment_data method, which pulls the injection and projection volumes of every experiment that satisfies all supplied parameters.

>>> from mcmodels.core import VoxelModelCache
>>> cache = VoxelModelCache(manifest_file='connectivity/voxel_model_manifest.json')
>>> # download the normalized connection density (a regionalized model metric)
>>> normalized_connection_density = cache.get_normalized_connection_density()
>>> # get all wildtype, cortical experiment data
>>> # this method returns a VoxelData object
>>> cortex_data = cache.get_experiment_data(injection_structure_ids=[315], cre=None)


See VoxelModelCache and VoxelData for more information.
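The layout of the source and target matrices built from the experiment data (one row per experiment, one column per masked voxel) can be sketched with NumPy alone. This is a minimal illustration on random toy volumes. It does not call mcmodels, and `stack_masked`, `source_mask` and `target_mask` are hypothetical names chosen for the example.

```python
import numpy as np

# Hypothetical toy data standing in for downloaded volumes:
# 3 experiments, each with a 4x3x5 injection and projection volume.
rng = np.random.default_rng(0)
shape = (4, 3, 5)
injection_volumes = [rng.random(shape) for _ in range(3)]
projection_volumes = [rng.random(shape) for _ in range(3)]

# Hypothetical boolean masks selecting source (injection) and target voxels.
source_mask = np.zeros(shape, dtype=bool)
source_mask[:, :, 3:] = True               # e.g. part of one hemisphere
target_mask = np.ones(shape, dtype=bool)   # e.g. the whole volume


def stack_masked(volumes, mask):
    """Mask and flatten each volume, then stack the vectors as rows."""
    return np.vstack([volume[mask] for volume in volumes])


source = stack_masked(injection_volumes, source_mask)
target = stack_masked(projection_volumes, target_mask)
print(source.shape, target.shape)  # (3, 24) (3, 60)
```

Boolean indexing always flattens in the same (C) order, so column `j` refers to the same voxel in every row.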

### 1.2.2. Mask class

In our package, the methods for registering data in the 3D reference space are defined in the Mask class. Specifically, with a Mask we can:

- query only specific structures of the brain
- map masked vectors of the brain back to their corresponding locations in the 3D reference space
- determine the structure to which each element of a masked vector belongs
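The three operations listed above can be illustrated with a toy class built on NumPy boolean indexing. `ToyMask` and its methods are hypothetical and only mirror the ideas, not the actual Mask API. The sketch assumes the AllenSDK hemisphere convention (1 = left, 2 = right, 3 = both). It also assumes, for illustration only, that the last volume axis runs left to right with the right hemisphere in its upper half.

```python
import numpy as np


class ToyMask:
    """Hypothetical stand-in for the Mask operations listed above.

    `annotation` is a 3D integer volume of structure ids (0 = no structure).
    """

    def __init__(self, annotation, structure_ids, hemisphere_id=3):
        self.annotation = annotation
        mask = np.isin(annotation, structure_ids)
        # Assumption: last axis is left-right; right = indices >= midline.
        midline = annotation.shape[2] // 2
        if hemisphere_id == 1:      # left hemisphere only
            mask[:, :, midline:] = False
        elif hemisphere_id == 2:    # right hemisphere only
            mask[:, :, :midline] = False
        self.mask = mask

    def mask_volume(self, volume):
        """Keep only the masked voxels, as a flat vector."""
        return volume[self.mask]

    def map_masked_to_volume(self, vector, fill_value=0):
        """Put a masked vector back at its voxel locations in a full volume."""
        vector = np.asarray(vector)
        volume = np.full(self.mask.shape, fill_value, dtype=vector.dtype)
        volume[self.mask] = vector
        return volume

    def get_key(self):
        """Structure id of each element of a masked vector."""
        return self.mask_volume(self.annotation)


# Toy annotation: structures 10 and 20 laid out along the left-right axis.
annotation = np.zeros((2, 2, 4), dtype=int)
annotation[:, :, 1] = 10
annotation[:, :, 2] = 20
annotation[:, :, 3] = 10

m = ToyMask(annotation, structure_ids=[10, 20], hemisphere_id=2)
key = m.get_key()
print(key.tolist())  # [20, 10, 20, 10, 20, 10, 20, 10]
restored = m.map_masked_to_volume(np.arange(key.size))
```

Masking and mapping back are inverses on the masked voxels. Voxels outside the mask, such as the left-hemisphere voxel at `[0, 0, 1]`, receive `fill_value`.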

Mask is most often initialized through the Mask.from_cache classmethod, using a VoxelModelCache object and optional keyword arguments for subsetting by hemisphere or by structure. For the new voxel-scale model, we define the source to be the right hemisphere and, in this case, the cortex:

>>> from mcmodels.core import Mask
>>> # right hemisphere (hemisphere_id=2) of the cortex (structure id 315)
>>> mask = Mask.from_cache(cache, structure_ids=[315], hemisphere_id=2)


The get_experiment_data method of VoxelModelCache (or VoxelData) sets source and target matrices as attributes. Each row holds the masked, flattened injection or projection volume of one experiment. One can determine the structure_id of a given column in either of these arrays using the get_key method of the Mask object:

>>> import numpy as np
>>> # one structure id per masked voxel, i.e. per column
>>> key = mask.get_key()
>>> np.unique(key)
np.array([315])
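Because the key is aligned with the columns of the source and target matrices, plain NumPy indexing is enough to move between columns and structures. The sketch below uses a made-up key and matrix. The structure ids 101, 202 and 303 are arbitrary placeholders, not real ontology ids. It looks up the structure of one column, selects all columns of one structure, and sums values per structure. The summation is a simple aggregation for illustration and is not a description of how the regionalized model is computed.

```python
import numpy as np

# Hypothetical key: the structure id of each column (masked voxel).
key = np.array([101, 101, 202, 101, 202, 303])
# Hypothetical target matrix: 2 experiments (rows) x 6 masked voxels (columns).
target = np.arange(12, dtype=float).reshape(2, 6)

# Which structure does column 2 belong to?
col = 2
print(key[col])  # 202

# Which columns belong to structure 202, and what values do they hold?
cols_202 = np.flatnonzero(key == 202)
print(cols_202)  # [2 4]
voxels_202 = target[:, cols_202]  # shape (2, 2)

# Sum each experiment's values within every structure present in the key.
structures = np.unique(key)
per_structure = np.column_stack(
    [target[:, key == s].sum(axis=1) for s in structures]
)  # shape (2, 3): one column per entry of `structures`
```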


By default, the key includes only the structure ids specified when the Mask object was constructed. However, we can pass specific structure_ids to the get_key method if we are interested in a finer or coarser level of the ontology, for example the summary structures:

>>> # get the set of summary structures
>>> structure_tree = cache.get_structure_tree()
>>> summary_structures = structure_tree.get_structures_by_set_id([167587189])
>>> # the new CCF does not have structure 934 as a structure id
>>> structure_ids = [s['id'] for s in summary_structures if s['id'] != 934]
>>> # relabel each masked voxel by the summary structure containing it
>>> key = mask.get_key(structure_ids=structure_ids)
>>> len(np.unique(key))
293
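Passing coarser structure ids to get_key amounts to relabeling every voxel with the requested structure that contains it. The sketch below illustrates that idea on a small made-up ontology. The `parent` map, `descendants` and `rollup_key` are hypothetical helpers, not mcmodels or AllenSDK functions. With real data, the descendant lists would come from the AllenSDK StructureTree (for example its `descendant_ids` method).

```python
import numpy as np

# Hypothetical toy ontology: child id -> parent id (None for the root).
parent = {1: None, 10: 1, 11: 10, 12: 10, 20: 1, 21: 20}


def descendants(structure_id):
    """Return structure_id plus every structure below it in the toy ontology."""
    found = {structure_id}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in found and child not in found:
                found.add(child)
                changed = True
    return found


def rollup_key(fine_key, structure_ids):
    """Relabel each element of a fine-grained key with the requested structure
    that contains it; elements outside all requested structures become 0."""
    coarse = np.zeros_like(fine_key)
    for sid in structure_ids:
        coarse[np.isin(fine_key, list(descendants(sid)))] = sid
    return coarse


fine_key = np.array([11, 12, 11, 21, 20, 5])
print(rollup_key(fine_key, [10, 20]).tolist())  # [10, 10, 10, 20, 20, 0]
```

The sketch assumes the requested structures do not overlap. If one were an ancestor of another, whichever was processed later would overwrite the earlier labels.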


The key array has one entry per voxel in the right-hemisphere cortex, because that is how our mask is defined. However, if we only want the indices for a given structure:

>>> # get the structure id corresponding to VISp
>>> visp_id = structure_tree.get_structures_by_acronym(["VISp"])[0]["id"]

>>> # our key is a masked, flattened volume; let's map it back