Welcome

high5py is a high-level interface to h5py, which is itself a high-level interface to the HDF5 library. You can use high5py to make one-line calls for the most common HDF5 tasks, like saving and loading data. For example:

import numpy as np
import high5py as hi5

hi5.save_dataset('data.h5', np.random.rand(100), name='x')
x = hi5.load_dataset('data.h5', name='x')

Installation

From PyPI

The easiest way to install high5py is using pip (and PyPI):

pip install high5py

From source

To install from source, download the source code from GitHub:

# Using SSH
git clone git@github.com:jhtu/high5py.git

# Using HTTPS
git clone https://github.com/jhtu/high5py.git

Next, navigate to the high5py root directory (the one containing setup.py). Then run

pip install .

Testing the code

To be sure the code is working, run the unit tests:

python -c 'import high5py as hi5; hi5.run_all_tests()'

Documentation

A tutorial notebook is available in the source code at examples/tutorial.ipynb. The full documentation is available at ReadTheDocs. You can also build it manually with Sphinx. To do so, navigate to the high5py root directory (the one containing setup.py). Then run

sphinx-build docs docs/_build

You can then open docs/_build/index.html in a web browser.

Licensing

high5py is published under the BSD 3-clause license. The license file is available here.

API

high5py.append_attributes(filepath, attributes, name='data')

Append HDF5 group/dataset attributes (never overwrites existing attributes).

Parameters

filepath (str) – Path to HDF5 file.
attributes (dict) – Attributes to append.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to ‘data’.

high5py.append_dataset(filepath, data, name='data', description=None, compression_level=None)

Append dataset to HDF5 file (never overwrites file).

Parameters

filepath (str) – Path to HDF5 file.
data (array-like, scalar, or str) – Data to save.
name (str, optional) – HDF5 dataset name (e.g., /group/dataset). Defaults to ‘data’.
description (str, optional) – String describing dataset. Description is saved as an HDF5 attribute of the dataset. Defaults to None, for which no description is saved.
compression_level (int or None, optional) – Integer from 0 to 9 specifying compression level for gzip filter, which is available on all h5py installations and offers good compression with moderate speed. Defaults to None, for which no compression/filter is applied.

high5py.delete(filepath, name)

Delete group/dataset in HDF5 file.

Parameters

filepath (str) – Path to HDF5 file.
name (str) – HDF5 name (e.g., /group/old_dataset).

high5py.exists(filepath, name)

Determine if group/dataset name exists in HDF5 file.

Parameters

filepath (str) – Path to HDF5 file.
name (str) – HDF5 group/dataset name (e.g., /group/dataset).

Returns

exists (bool) – True if the group/dataset name exists in the HDF5 file, False otherwise.

high5py.from_npz(npz_filepath, h5_filepath)

Load data from an NPZ (compressed numpy archive) file and save to HDF5. NPZ array names are preserved.

Parameters

npz_filepath (str) – Path to NPZ file.
h5_filepath (str) – Path to HDF5 file.

high5py.info(filepath, name='/', return_info=False)

Print information about an HDF5 file/group/dataset and, optionally, return it.

Parameters

filepath (str) – Path to HDF5 file.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to root group (‘/’).
return_info (bool, optional) – If True, return a dictionary of results. Defaults to False.

Returns

info (dict, optional) – Dictionary of key, value pairs describing specified file/group/dataset. Only provided if return_info is True.

high5py.list_all(filepath, name='/', return_info=False)

List all groups and datasets in HDF5 file or group.

Parameters

filepath (str) – Path to HDF5 file.
name (str, optional) – HDF5 group name (e.g., /group). Defaults to root group (‘/’).
return_info (bool, optional) – If True, return a dictionary of results. Defaults to False.

Returns

info (dict, optional) – Dictionary of key, value pairs describing specified file/group. Only provided if return_info is True.

high5py.load_attributes(filepath, name='data')

Load HDF5 group/dataset attributes from HDF5 file.

Parameters

filepath (str) – Path to HDF5 file.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to ‘data’.

Returns

attributes (dict) – Dictionary of loaded attributes.

high5py.load_dataset(filepath, name='data', start_index=None, end_index=None)

Load dataset from HDF5 file.

Parameters

filepath (str) – Path to HDF5 file.
name (str, optional) – HDF5 dataset name (e.g., /group/dataset). Defaults to ‘data’.
start_index (int, optional) – Start index for slicing HDF5 dataset. Providing a slice index here may be more efficient than returning the entire dataset and then slicing. Defaults to None, for which no slicing will be done on the beginning of the dataset.
end_index (int, optional) – End index for slicing HDF5 dataset. Providing a slice index here may be more efficient than returning the entire dataset and then slicing. Defaults to None, for which no slicing will be done on the end of the dataset.

Returns

data (array-like, scalar, or str) – Dataset values are returned with the same type with which they were saved (usually some sort of numpy array), except that single-element arrays are returned as scalars.

high5py.rename(filepath, old_name, new_name, new_description=None)

Rename group/dataset in HDF5 file.

Parameters

filepath (str) – Path to HDF5 file.
old_name (str) – Old HDF5 name (e.g., /group/old_dataset).
new_name (str) – New HDF5 name (e.g., /group/new_dataset).
new_description (str, optional) – New string describing dataset. Description is saved as an HDF5 attribute of the dataset. Defaults to None, for which the old description is kept.

high5py.replace_dataset(filepath, data, name='data', description=None, compression_level=None)

Replace/overwrite a dataset in an HDF5 file (does not overwrite the whole file).

Parameters

filepath (str) – Path to HDF5 file.
data (array-like, scalar, or str) – Data to save.
name (str, optional) – HDF5 dataset name (e.g., /group/dataset). Defaults to ‘data’.
description (str, optional) – String describing dataset. Description is saved as an HDF5 attribute of the dataset. Defaults to None, for which no description is saved.
compression_level (int or None, optional) – Integer from 0 to 9 specifying compression level for gzip filter, which is available on all h5py installations and offers good compression with moderate speed. Defaults to None, for which no compression/filter is applied.

high5py.save_attributes(filepath, attributes, name='data', overwrite=True)

Save HDF5 group/dataset attributes (overwrites existing attributes by default).

Parameters

filepath (str) – Path to HDF5 file.
attributes (dict) – Attributes to save.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to ‘data’.
overwrite (bool, optional) – If True, saving overwrites existing attributes. Otherwise, new attributes are appended to existing ones. Defaults to True.

high5py.save_dataset(filepath, data, name='data', description=None, overwrite=True, compression_level=None)

Save dataset to HDF5 file (overwrites file by default).

Parameters

filepath (str) – Path to HDF5 file.
data (array-like, scalar, or str) – Data to save.
name (str, optional) – HDF5 dataset name (e.g., /group/dataset). Defaults to ‘data’.
description (str, optional) – String describing dataset. Description is saved as an HDF5 attribute of the dataset. Defaults to None, for which no description is saved.
overwrite (bool, optional) – If True, saving overwrites the file. Otherwise, data is appended to the file. Defaults to True.
compression_level (int or None, optional) – Integer from 0 to 9 specifying compression level for gzip filter, which is available on all h5py installations and offers good compression with moderate speed. Defaults to None, for which no compression/filter is applied.

high5py.to_npz(h5_filepath, npz_filepath, name='/')

Save an HDF5 group/dataset to NPZ (compressed numpy archive) format. Subgroups such as path/group/subgroup/dataset will be saved with array names such as path_group_subgroup_dataset.

Parameters

h5_filepath (str) – Path to HDF5 file.
npz_filepath (str) – Path to NPZ file.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to root group (‘/’).

Release notes

high5py 0.2

A few new features, a small bug fix, and some internal changes.

New features and improvements

Added new function load_attributes that loads all attributes from a group/dataset and returns them as a dictionary.

Bug fixes

The start_index and end_index arguments to load_dataset are now functional. Previously, argument values could be passed in but would not affect the function behavior.
Fixed a minor bug in the tutorial by updating the expected exception when trying to append a dataset whose name already exists.

Internal changes

The repository files have been reorganized.
The documentation now uses markdown syntax.
The documentation has been modified to reduce redundant/duplicated files/text.

high5py 0.1

First public release.