Welcome
high5py is a high-level interface to h5py, which is itself a high-level interface to the HDF5 library.
You can use high5py to make one-line calls for the most common HDF5 tasks, like saving and loading data.
For example:
import numpy as np
import high5py as hi5
hi5.save_dataset('data.h5', np.random.rand(100), name='x')
x = hi5.load_dataset('data.h5', name='x')
Installation
From PyPI
The easiest way to install high5py is with pip (from PyPI):
pip install high5py
From source
To install from source, clone the repository from GitHub:
# Using SSH
git clone git@github.com:jhtu/high5py.git
# Using HTTPS
git clone https://github.com/jhtu/high5py.git
Next, navigate to the high5py root directory (the one containing setup.py). Then run
pip install .
Testing the code
To be sure the code is working, run the unit tests:
python -c 'import high5py as hi5; hi5.run_all_tests()'
Documentation
A tutorial notebook is available in the source code at examples/tutorial.ipynb.
The full documentation is available at ReadTheDocs.
You can also build it manually with Sphinx.
To do so, navigate to the high5py root directory (the one containing setup.py).
Then run
sphinx-build docs docs/_build
You can then open docs/_build/index.html in a web browser.
Licensing
high5py is published under the BSD 3-clause license.
The license file is available here.
API
- high5py.append_attributes(filepath, attributes, name='data')
Append HDF5 group/dataset attributes (never overwrites existing attributes).
- Parameters
filepath (str) – Path to HDF5 file.
attributes (dict) – Attributes to append.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to ‘data’.
- high5py.append_dataset(filepath, data, name='data', description=None, compression_level=None)
Append dataset to HDF5 file (never overwrites file).
- Parameters
filepath (str) – Path to HDF5 file.
data (array-like, scalar, or str) – Data to save.
name (str, optional) – HDF5 dataset name (e.g., /group/dataset). Defaults to ‘data’.
description (str, optional) – String describing dataset. Description is saved as an HDF5 attribute of the dataset. Defaults to None, for which no description is saved.
compression_level (int or None, optional) – Integer from 0 to 9 specifying compression level for gzip filter, which is available on all h5py installations and offers good compression with moderate speed. Defaults to None, for which no compression/filter is applied.
- high5py.delete(filepath, name)
Delete group/dataset in HDF5 file.
- Parameters
filepath (str) – Path to HDF5 file.
name (str) – HDF5 name (e.g., /group/old_dataset).
- high5py.exists(filepath, name)
Determine if group/dataset name exists in HDF5 file.
- Parameters
filepath (str) – Path to HDF5 file.
name (str) – HDF5 group/dataset name (e.g., /group/dataset).
- Returns
exists (bool) – True if the group/dataset exists in the HDF5 file, otherwise False.
- high5py.from_npz(npz_filepath, h5_filepath)
Load data from an NPZ (compressed numpy archive) file and save to HDF5. NPZ array names are preserved.
- Parameters
npz_filepath (str) – Path to NPZ file.
h5_filepath (str) – Path to HDF5 file.
- high5py.info(filepath, name='/', return_info=False)
Print and return information about HDF5 file/group/dataset.
- Parameters
filepath (str) – Path to HDF5 file.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to root group (‘/’).
return_info (bool, optional) – If True, return a dictionary of results. Defaults to False.
- Returns
info (dict, optional) – Dictionary of key, value pairs describing specified file/group/dataset. Only provided if return_info is True.
- high5py.list_all(filepath, name='/', return_info=False)
List all groups and datasets in HDF5 file or group.
- Parameters
filepath (str) – Path to HDF5 file.
name (str, optional) – HDF5 group name (e.g., /group). Defaults to root group (‘/’).
return_info (bool, optional) – If True, return a dictionary of results. Defaults to False.
- Returns
info (dict, optional) – Dictionary of key, value pairs describing specified file/group. Only provided if return_info is True.
- high5py.load_attributes(filepath, name='data')
Load HDF5 group/dataset attributes from HDF5 file.
- Parameters
filepath (str) – Path to HDF5 file.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to ‘data’.
- Returns
attributes (dict) – Dictionary of loaded attributes.
- high5py.load_dataset(filepath, name='data', start_index=None, end_index=None)
Load dataset from HDF5 file.
- Parameters
filepath (str) – Path to HDF5 file.
name (str, optional) – HDF5 dataset name (e.g., /group/dataset). Defaults to ‘data’.
start_index (int, optional) – Start index for slicing HDF5 dataset. Providing a slice index here may be more efficient than returning the entire dataset and then slicing. Defaults to None, for which no slicing will be done on the beginning of the dataset.
end_index (int, optional) – End index for slicing HDF5 dataset. Providing a slice index here may be more efficient than returning the entire dataset and then slicing. Defaults to None, for which no slicing will be done on the end of the dataset.
- Returns
data (array-like, scalar, or str) – Dataset values are returned with the same type with which they were saved (usually some sort of numpy array), except that single-element arrays are returned as scalars.
- high5py.rename(filepath, old_name, new_name, new_description=None)
Rename group/dataset in HDF5 file.
- Parameters
filepath (str) – Path to HDF5 file.
old_name (str) – Old HDF5 name (e.g., /group/old_dataset).
new_name (str) – New HDF5 name (e.g., /group/new_dataset).
new_description (str, optional) – New string describing dataset. Description is saved as an HDF5 attribute of the dataset. Defaults to None, for which the old description is kept.
- high5py.replace_dataset(filepath, data, name='data', description=None, compression_level=None)
Replace/overwrite a dataset in an HDF5 file (does not overwrite the whole file).
- Parameters
filepath (str) – Path to HDF5 file.
data (array-like, scalar, or str) – Data to save.
name (str, optional) – HDF5 dataset name (e.g., /group/dataset). Defaults to ‘data’.
description (str, optional) – String describing dataset. Description is saved as an HDF5 attribute of the dataset. Defaults to None, for which no description is saved.
compression_level (int or None, optional) – Integer from 0 to 9 specifying compression level for gzip filter, which is available on all h5py installations and offers good compression with moderate speed. Defaults to None, for which no compression/filter is applied.
- high5py.save_attributes(filepath, attributes, name='data', overwrite=True)
Save HDF5 group/dataset attributes (overwrites existing attributes by default).
- Parameters
filepath (str) – Path to HDF5 file.
attributes (dict) – Attributes to save.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to ‘data’.
overwrite (bool, optional) – If True, saving overwrites existing attributes. Otherwise, new attributes are appended to existing ones. Defaults to True.
- high5py.save_dataset(filepath, data, name='data', description=None, overwrite=True, compression_level=None)
Save dataset to HDF5 file (overwrites file by default).
- Parameters
filepath (str) – Path to HDF5 file.
data (array-like, scalar, or str) – Data to save.
name (str, optional) – HDF5 dataset name (e.g., /group/dataset). Defaults to ‘data’.
description (str, optional) – String describing dataset. Description is saved as an HDF5 attribute of the dataset. Defaults to None, for which no description is saved.
overwrite (bool, optional) – If True, saving overwrites the file. Otherwise, data is appended to the file. Defaults to True.
compression_level (int or None, optional) – Integer from 0 to 9 specifying compression level for gzip filter, which is available on all h5py installations and offers good compression with moderate speed. Defaults to None, for which no compression/filter is applied.
- high5py.to_npz(h5_filepath, npz_filepath, name='/')
Save an HDF5 group/dataset to NPZ (compressed numpy archive) format. Subgroups such as path/group/subgroup/dataset will be saved with array names such as path_group_subgroup_dataset.
- Parameters
h5_filepath (str) – Path to HDF5 file.
npz_filepath (str) – Path to NPZ file.
name (str, optional) – HDF5 group/dataset name (e.g., /group/dataset). Defaults to root group (‘/’).
Release notes
high5py 0.2
A few new features, a small bug fix, and some internal changes.
New features and improvements
Added a new method, load_attributes, that loads all attributes from a group/dataset and returns them as a dictionary.
Bug fixes
The start_index and end_index arguments to load_dataset are now functional. Previously, argument values could be passed in but would not affect the function behavior.
Fixed a minor bug in the tutorial by updating the expected exception when trying to append a dataset whose name already exists.
Internal changes
The repository files have been reorganized.
The documentation now uses markdown syntax.
The documentation has been modified to reduce redundant/duplicated files/text.
high5py 0.1
First public release.