
scikit-hep / root_pandas

Licence: MIT license
A Python module for conveniently loading/saving ROOT files as pandas DataFrames



⚠️root_pandas is deprecated and unmaintained⚠️

root_pandas is built upon root_numpy which has not been actively maintained in several years. This is mostly due to the emergence of new alternatives which are both faster and more flexible.

root_pandas: conveniently loading/saving ROOT files as pandas DataFrames


root_pandas is a convenience package built around the root_numpy library. It allows you to easily load and store pandas DataFrames using the columnar ROOT data format used in high energy physics.

It's modeled closely after the existing pandas API for reading and writing HDF5 files. This means that in many cases, it is possible to substitute the use of HDF5 with ROOT and vice versa.

On top of that, root_pandas offers several features that go beyond what pandas offers with read_hdf and to_hdf.

These include:

  • Specifying multiple input filenames, in which case they are read as if they were one continuous file.
  • Selecting several columns at once using * globbing and {A,B} shell patterns.
  • Flattening source files containing arrays by storing one array element each in the DataFrame, duplicating any scalar variables.


Reading ROOT files

This is how you can read the contents of a ROOT file into a DataFrame:

from root_pandas import read_root

df = read_root('myfile.root')

If there are several ROOT trees in the input file, you have to specify the tree key:

df = read_root('myfile.root', 'mykey')

You can also directly read multiple ROOT files at once by passing a list of file names:

df = read_root(['file1.root', 'file2.root'], 'mykey')

In this case, each file must have the same set of columns under the given key.

Specific columns can be selected like this:

df = read_root('myfile.root', columns=['variable1', 'variable2'])

You can also use * in the column names to read in any matching branch:

df = read_root('myfile.root', columns=['variable*'])

In addition, you can use shell brace patterns as in

df = read_root('myfile.root', columns=['variable{1,2}'])

You can also use * and {a,b} simultaneously, and several times per string.
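To make the pattern rules concrete, here is a minimal sketch of how * globbing and {a,b} brace patterns could be expanded against a list of branch names. It is not root_pandas's own implementation: expand_braces and match_columns are hypothetical helpers written for this illustration, using only the standard library's re and fnmatch modules, and the branch names are made up.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand shell-style {a,b} groups into a list of plain patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that several brace groups in one string are all expanded.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def match_columns(patterns, branches):
    """Return the branches matched by any pattern, in branch order, without duplicates."""
    selected = []
    for branch in branches:
        for pattern in patterns:
            if any(fnmatchcase(branch, p) for p in expand_braces(pattern)):
                selected.append(branch)
                break
    return selected


# Hypothetical branch names, standing in for the branches of a ROOT tree.
branches = ["variable1", "variable2", "variable3", "other_x", "other_y"]
print(match_columns(["variable{1,2}"], branches))
print(match_columns(["variable*", "other_{x,y}"], branches))
```

Expanding the braces first and globbing afterwards is what lets both kinds of pattern appear in the same string, and more than once.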

If you want to transform your variables using a ROOT selection string, you have to put a noexpand: prefix in front of the column name that you want to use the selection string in:

df = read_root('myfile.root', columns=['noexpand:sqrt(variable1)'])

Working with stored arrays can be a bit inconvenient in pandas. root_pandas makes it easy to flatten your input data, providing you with a DataFrame containing only scalars:

df = read_root('myfile.root', columns=['arrayvariable', 'othervariable'], flatten=['arrayvariable'])

Assuming the ROOT file contains the array [1, 2, 3] in the first entry of the arrayvariable column, flattening will expand this entry into three rows, each containing one of the array elements. The scalar values of that entry are duplicated into each of the new rows. The automatically created __array_index column records the index that each element had in its array before flattening.
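The effect of flattening can be sketched on plain Python records, without ROOT or pandas. This is only an illustration of the behaviour described above, not the code root_pandas runs: flatten_rows is a hypothetical helper, the sample values are invented, and the 0-based numbering of __array_index is an assumption of the sketch.

```python
def flatten_rows(rows, array_column):
    """Expand one array-valued column into one row per array element.

    Scalar values are copied into every new row, and the element's
    position in its original array is stored under '__array_index'.
    """
    flat = []
    for row in rows:
        for index, element in enumerate(row[array_column]):
            new_row = dict(row)               # duplicate the scalar values
            new_row[array_column] = element   # keep a single array element
            new_row["__array_index"] = index  # assumption: counted from 0
            flat.append(new_row)
    return flat


# Invented sample data: two entries, the first holding the array [1, 2, 3].
rows = [
    {"arrayvariable": [1, 2, 3], "othervariable": 0.5},
    {"arrayvariable": [4], "othervariable": 1.5},
]
for row in flatten_rows(rows, "arrayvariable"):
    print(row)
```

In this sketch, an entry whose array is empty contributes no rows at all.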

There is also support for working with files that don't fit into memory: if the chunksize parameter is specified, read_root returns an iterator that yields DataFrames, each containing up to chunksize rows.

for df in read_root('bigfile.root', chunksize=100000):
    # process df here
    pass

If bigfile.root doesn't contain an index, the default indices of the individual DataFrame chunks will still increase continuously, as if they were parts of a single large DataFrame.
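The continuously increasing index can be pictured with a small standard-library sketch. It only mimics the bookkeeping and reads no ROOT file: iter_chunk_indices is a hypothetical helper, and the starting value first_index is a free parameter of the sketch rather than a statement about what root_pandas uses.

```python
def iter_chunk_indices(n_rows, chunksize, first_index=0):
    """Yield one range of row indices per chunk, each at most `chunksize` long.

    Each range starts where the previous one stopped, so the chunks
    together look like the index of a single large table.
    """
    for start in range(0, n_rows, chunksize):
        stop = min(start + chunksize, n_rows)
        yield range(first_index + start, first_index + stop)


# 250 rows read in chunks of 100: the last chunk is shorter than the others.
for chunk_index in iter_chunk_indices(n_rows=250, chunksize=100):
    print(chunk_index[0], chunk_index[-1], len(chunk_index))
```

Because no chunk restarts its numbering, rows keep a unique label if the chunks are later concatenated.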

You can also combine any of the above options at the same time.

Reading in chunks also works with progress bars:

from progressbar import ProgressBar
pbar = ProgressBar()
for df in pbar(read_root('bigfile.root', chunksize=100000)):
    # process df here
    pass

# or
from tqdm import tqdm
for df in tqdm(read_root('bigfile.root', chunksize=100000), unit='chunks'):
    # process df here
    pass

Writing ROOT files

root_pandas patches the pandas DataFrame to have a to_root method that allows you to save it into a ROOT file:

df.to_root('out.root', key='mytree')

You can also call the to_root function and specify the DataFrame as the first argument:

from root_pandas import to_root

to_root(df, 'out.root', key='mytree')

By default, to_root erases the existing contents of the file. Use mode='a' to append:

for df in read_root('bigfile.root', chunksize=100000):
    df.to_root('out.root', mode='a')

Warning: When using this feature to stream data from one ROOT file into another, you shouldn't forget to os.remove the output file first, otherwise you will append more and more data to it on each run of your program.
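One way to follow this advice is to clear the output file at the start of every run, before the first append. The sketch below shows the pattern with a plain text file standing in for the ROOT output, so it runs without ROOT installed. reset_output and run_once are hypothetical helpers, and the file name and chunk contents are placeholders.

```python
import os
import tempfile


def reset_output(path):
    """Remove a stale output file so that appending starts from scratch."""
    if os.path.exists(path):
        os.remove(path)


def run_once(path, chunks):
    """Stand-in for one program run: clear the output, then append every chunk."""
    reset_output(path)
    for chunk in chunks:
        # With root_pandas, this step would be: chunk.to_root(path, mode='a')
        with open(path, "a") as handle:
            handle.write(chunk + "\n")


out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
run_once(out_path, ["chunk 0", "chunk 1"])
run_once(out_path, ["chunk 0", "chunk 1"])  # a second run does not double the data

with open(out_path) as handle:
    print(handle.read().splitlines())
```

Checking os.path.exists first keeps the very first run from failing when there is no output file to delete yet.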

The DataFrame index

When reading a ROOT file, root_pandas will automatically add a pandas index to the DataFrame, which starts at 1 and counts up for each entry. When writing the DataFrame to a ROOT file, it stores the DataFrame index in a __index__ branch. Currently, only single-dimensional indices are supported.
