jamesmudd / jhdf

Licence: MIT license
A pure Java HDF5 library

Programming Languages

java
68154 projects - #9 most used programming language
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to jhdf

Uproot3
ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+275.9%)
Mutual labels:  bigdata, file-format
Vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀
Stars: ✭ 6,793 (+8084.34%)
Mutual labels:  bigdata, hdf5
Uproot4
ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-3.61%)
Mutual labels:  bigdata, file-format
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+51.81%)
Mutual labels:  bigdata
zipdump
Analyze zipfile, either local, or from url
Stars: ✭ 25 (-69.88%)
Mutual labels:  file-format
greycat
GreyCat - Data Analytics, Temporal data, What-if, Live machine learning
Stars: ✭ 104 (+25.3%)
Mutual labels:  bigdata
npy2bdv
Fast writing of numpy 3d-arrays into HDF5 Fiji/BigDataViewer files.
Stars: ✭ 25 (-69.88%)
Mutual labels:  hdf5
bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
Stars: ✭ 112 (+34.94%)
Mutual labels:  bigdata
EMsoft
Public EMsoft repository
Stars: ✭ 44 (-46.99%)
Mutual labels:  hdf5
2019 egu workshop jupyter notebooks
Short course on interactive analysis of Big Earth Data with Jupyter Notebooks
Stars: ✭ 29 (-65.06%)
Mutual labels:  bigdata
nix
Neuroscience information exchange format
Stars: ✭ 64 (-22.89%)
Mutual labels:  file-format
awesome-coder-resources
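A refueling station on the programming road! [Continuously updated... stars welcome, come back often...] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, etc.]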
Stars: ✭ 54 (-34.94%)
Mutual labels:  bigdata
jupyterlab-h5web
A JupyterLab extension to explore and visualize HDF5 file contents. Based on https://github.com/silx-kit/h5web.
Stars: ✭ 41 (-50.6%)
Mutual labels:  hdf5
chatnoir-resiliparse
A robust web archive analytics toolkit
Stars: ✭ 26 (-68.67%)
Mutual labels:  bigdata
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (-75.9%)
Mutual labels:  file-format
PersonNotes
A personal notes collection: technical notes recorded in a quick and rough style .. 📚☕️⌨️🎧
Stars: ✭ 61 (-26.51%)
Mutual labels:  bigdata
dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+62.65%)
Mutual labels:  bigdata
lectures-hse-spark
Scalable machine learning and big data analysis with Apache Spark
Stars: ✭ 20 (-75.9%)
Mutual labels:  bigdata
bigdata-doc
Big data study notes, learning roadmap, and a collection of technical case studies.
Stars: ✭ 37 (-55.42%)
Mutual labels:  bigdata
ReClassicfication
Maybe one day a WINE-style implementation of the classic Mac Toolbox.
Stars: ✭ 29 (-65.06%)
Mutual labels:  file-format

jHDF - Pure Java HDF5 library

This project is a pure Java implementation for accessing HDF5 files. It is written from the file format specification and does not use any HDF Group code; it is not a wrapper around the C libraries. The file format specification is available from the HDF Group, and more information on the format is available on Wikipedia.

The intention is to make a clean Java API to access HDF5 data. Currently, the project is targeting HDF5 read-only compatibility. For progress see the change log. Java 8, 11 and 17 are officially supported.

Here is an example of reading a dataset with jHDF (see ReadDataset.java):

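import io.jhdf.HdfFile;
import io.jhdf.api.Dataset;
import java.nio.file.Paths;

// try-with-resources closes the file when the block exits
try (HdfFile hdfFile = new HdfFile(Paths.get("/path/to/file.hdf5"))) {
	Dataset dataset = hdfFile.getDatasetByPath("/path/to/dataset");
	// data will be a Java array with the dimensions of the HDF5 dataset
	Object data = dataset.getData();
}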
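
Because getData() returns a plain Object, it can help to inspect the array before casting it. The following is a minimal sketch using only the JDK's reflection API. ArrayShape and its methods are hypothetical helpers, not part of jHDF, and the sketch assumes the returned array is rectangular (every row at a given level has the same length).

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of jHDF) for inspecting a nested Java array. */
final class ArrayShape {

    private ArrayShape() {}

    /**
     * Returns the length of each dimension, outermost first.
     * Assumes a rectangular array; stops descending at an empty dimension.
     * A non-array argument yields an empty result.
     */
    static int[] shapeOf(Object data) {
        List<Integer> dims = new ArrayList<>();
        Object current = data;
        // Follow the first element of each level until a non-array is reached
        while (current != null && current.getClass().isArray()) {
            int length = Array.getLength(current);
            dims.add(length);
            current = length > 0 ? Array.get(current, 0) : null;
        }
        return dims.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Returns the innermost element type, e.g. double for double[][]. */
    static Class<?> elementTypeOf(Object data) {
        Class<?> type = data.getClass();
        while (type.isArray()) {
            type = type.getComponentType();
        }
        return type;
    }
}
```

With the data object from the example above, ArrayShape.shapeOf(data) would give the dimensions, and ArrayShape.elementTypeOf(data) the element type to use when casting, for instance to double[][].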

For an example of traversing the tree inside an HDF5 file see PrintTree.java. For accessing attributes see ReadAttribute.java.

Why should I use jHDF?

  • Easy integration with JVM-based projects. The library is available on Maven Central and GitHub Packages, so using it should be as easy as adding any other dependency. To use the libraries supplied by the HDF Group you need to load native code, which means you need to handle this in your build, and it complicates distribution of your software on multiple platforms.
  • The API design intends to be familiar to Java programmers, so hopefully it works as you might expect. (If this is not the case, open an issue with suggestions for improvement.)
  • No use of JNI, so you avoid all the issues associated with calling native code from the JVM.
  • Fully debuggable: you can step through the whole library with a Java debugger.
  • Provides access to dataset ByteBuffers to allow for custom reading logic, or integration with other libraries.
  • Integration with Java logging via SLF4J.
  • Performance? Maybe. The library uses Java NIO MappedByteBuffers, which should provide fast file access. In addition, when accessing chunked datasets the library is parallelized to take advantage of modern CPUs. jHDF will also allow parallel reading of multiple datasets or multiple files. I have seen cases where jHDF is significantly faster than the C libraries, but as with all performance issues, it is case specific, so you will need to do your own tests on the cases you care about. If you do run tests, please post the results so everyone can benefit.
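
To illustrate the kind of access the MappedByteBuffer and ByteBuffer points refer to, here is a minimal sketch using only the JDK. It is not jHDF code: MappedIntReader is a hypothetical class, and the file is assumed to contain nothing but little-endian 32-bit integers rather than real HDF5 structures.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical example (not part of jHDF) of reading a file via memory mapping. */
final class MappedIntReader {

    private MappedIntReader() {}

    /** Maps the whole file read-only and decodes it as little-endian 32-bit ints. */
    static int[] readLittleEndianInts(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            // A single mapping is limited to Integer.MAX_VALUE bytes
            if (size > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("File too large for a single mapping");
            }
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Set the byte order before creating the int view; the view inherits it
            buffer.order(ByteOrder.LITTLE_ENDIAN);
            int[] values = new int[buffer.remaining() / Integer.BYTES];
            buffer.asIntBuffer().get(values);
            return values;
        }
    }
}
```

The mapping lets the operating system page data in on demand instead of copying it through an intermediate stream, which is the reason memory mapping is often fast for large files. Whether it is faster in a given case still needs measuring.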

Why should I not use jHDF?

  • If you want to write HDF5 files. Currently, this is not supported. This will be supported in the future, but full read-only compatibility is currently the goal. If you would be interested in this please comment on, or react to, issue #354.
  • If jHDF does not yet support a feature you need. If this is the case you should receive an UnsupportedHdfException; open an issue and support can be added. For scheduling, the features which will allow the most files to be read are prioritized. If you really want to use a new feature feel free to work on it and open a PR; any help is much appreciated.
  • If you want to read slices of chunked datasets (slicing of contiguous datasets is supported since v0.6.6). This is an excellent feature of HDF5, and one reason why it's suited to large datasets. Support will be added in the future, but currently it is not possible. If you would be interested in this please comment on, or react to, issue #52.
  • If you want to read datasets larger than can fit in a Java array (i.e. Integer.MAX_VALUE elements). This issue would also be addressed by slicing.
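
The array size limit in the last point can be checked up front from a dataset's dimensions. This is a minimal sketch, taking the Integer.MAX_VALUE element limit stated above at face value. ArrayCapacity is a hypothetical helper, not part of jHDF, and the dimensions are assumed to be available as an int[].

```java
/** Hypothetical helper (not part of jHDF) for checking the Java array size limit. */
final class ArrayCapacity {

    private ArrayCapacity() {}

    /**
     * Total number of elements for the given dimensions.
     * Uses long arithmetic so the product cannot silently wrap around;
     * Math.multiplyExact throws ArithmeticException if even a long overflows.
     * An empty dimensions array is treated as a scalar with one element.
     */
    static long elementCount(int[] dimensions) {
        long count = 1;
        for (int dimension : dimensions) {
            count = Math.multiplyExact(count, (long) dimension);
        }
        return count;
    }

    /** True if the element count does not exceed Integer.MAX_VALUE. */
    static boolean fitsInJavaArray(int[] dimensions) {
        return elementCount(dimensions) <= Integer.MAX_VALUE;
    }
}
```

For example, dimensions of 1000 x 1000 give 1,000,000 elements and pass the check, while 50000 x 50000 gives 2,500,000,000 elements and does not.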

Why did I start jHDF?

Mostly it's a challenge. HDF5 is a fairly complex file format with lots of flexibility, so writing a library to access it is interesting. Also, as a widely used file format for storing scientific, engineering, and commercial data, it would seem like a good idea to be able to read HDF5 files with more than one library. In particular, JVM languages are among the most widely used, so having a native HDF5 implementation seems useful.

Developing jHDF

  • Fork this repository and clone your fork
  • Inside the jhdf directory run ./gradlew build (./gradlew.bat build on Windows). This will fetch dependencies and run the build and tests.
  • Import the Gradle project jhdf into your IDE.
  • Make your changes and add tests.
  • Run ./gradlew check to run the build and tests.
  • Once you have made any changes please open a pull request.

To see other available Gradle tasks run ./gradlew tasks

If you have read this far please consider starring this repo. If you are using jHDF in a commercial product please consider making a donation. Thanks!