Crimson
Crimson converts non-standard bioinformatics tool outputs to JSON or YAML.
Currently it can convert outputs of the following tools:
- FastQC (fastqc)
- FusionCatcher (fusioncatcher)
- samtools flagstat (flagstat)
- Picard metrics tools (picard)
- STAR log file (star)
- STAR-Fusion hits table (star-fusion)
- Variant Effect Predictor plain text output (vep)
The conversion can be done using the command line interface or by calling the tool-specific parser functions in your Python script.
Installation
Crimson is available on the Python Package Index and you can install it via pip:
$ pip install crimson
It is also available on BioConda, either through the conda package manager or as a Docker container.
To run Crimson with Docker, you may also use the GitHub Docker registry. This registry hosts the latest version, but it does not host version 1.1.0 or any earlier version.
$ docker pull ghcr.io/bow/crimson
Usage
As a command line tool
The general command is crimson {program_name} and by default the output is written to stdout. For example, to use the picard parser, you would execute:
$ crimson picard /path/to/a/picard.metrics
You can also specify a file name directly to write to a file. The following command will write the output to a file named converted.json:
$ crimson picard /path/to/a/picard.metrics converted.json
Some parsers also accept additional input formats. The FastQC parser, for example, also works if you specify a path to a FastQC output directory:
$ crimson fastqc /path/to/a/fastqc/dir
or a path to a zipped result:
$ crimson fastqc /path/to/a/fastqc_result.zip
When in doubt, use the --help flag:
$ crimson --help # for the general help
$ crimson fastqc --help # for parser-specific (FastQC) help
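If you would rather drive the command line interface from a script than type the commands by hand, the sketch below shows one way to do it. The helper names build_crimson_command and run_crimson are hypothetical and not part of Crimson. It also assumes that the crimson executable is on your PATH and that the default output on stdout is JSON.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_crimson_command(
    program_name: str,
    input_path: str,
    output_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `crimson {program_name} input [output]`."""
    command = ["crimson", program_name, input_path]
    if output_path is not None:
        # A second positional argument makes crimson write to a file.
        command.append(output_path)
    return command


def run_crimson(program_name: str, input_path: str) -> Any:
    """Run crimson and decode what it prints to stdout.

    Assumption: the default output format on stdout is JSON.
    """
    result = subprocess.run(
        build_crimson_command(program_name, input_path),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode the output as str
    )
    return json.loads(result.stdout)
```

Calling run_crimson("picard", "/path/to/a/picard.metrics") would then return the converted metrics as Python objects, provided the assumptions above hold.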
As a Python library function
Generally, the function to import is located at crimson.{program_name}.parse. For example, to use the picard parser in your script, you can do:
from crimson import picard
# You can specify the input file name as a string ...
parsed = picard.parse("/path/to/a/picard.metrics")
# ... or a file handle
with open("/path/to/a/picard.metrics") as src:
    parsed = picard.parse(src)
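Once you have a parsed result, you will often want to serialize it yourself. The sketch below is one way to do that, under the assumption that the parser returns plain dictionaries, lists, strings, and numbers that the standard json module can handle. The helper to_json_text and the stand-in parser _stub_parse are hypothetical. The stub is there only so the example runs without Crimson installed, and its return value is made up for illustration.

```python
import json
from typing import Any, Callable


def to_json_text(parse: Callable[[str], Any], path: str) -> str:
    """Parse `path` with the given parser function and return JSON text."""
    # Assumption: the parser returns plain dicts, lists, strings, and
    # numbers, which json.dumps can serialize directly.
    return json.dumps(parse(path), indent=2, sort_keys=True)


def _stub_parse(path: str) -> dict:
    # Hypothetical stand-in for a parser function such as picard.parse.
    # The returned structure is illustrative, not real Crimson output.
    return {"source": path, "metrics": {"reads": 100}}


text = to_json_text(_stub_parse, "/path/to/a/picard.metrics")
print(text)
```

With Crimson installed, you would pass picard.parse in place of _stub_parse.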
Why?
- Not enough tools use standard output formats.
- Writing and re-writing the same parsers across different scripts is not a productive way to spend the day.
Local Development
Setting up a local development environment requires that you install all of the supported Python versions. We use pyenv for this.
# Clone the repository and cd into it.
$ git clone https://github.com/bow/crimson
$ cd crimson
# Create your local development environment.
$ make install-dev
# Run the test and linter suite to verify the setup.
$ make lint test
# Whenever in doubt, just run `make` without any arguments.
$ make
Contributing
If you are interested, Crimson accepts the following types of contributions:
- Documentation additions (if anything seems unclear, feel free to open an issue)
- Bug reports
- Support for additional tool outputs that can be converted to JSON or YAML
For any of these, feel free to open an issue in the issue tracker or submit a pull request.
License
Crimson is BSD-licensed. Refer to the LICENSE file for the full license.