quinlan-lab / hts-python

Licence: MIT license
pythonic wrapper for htslib

hts-python

Pythonic wrapper for the htslib C API using Python cffi.

There is enough functionality for this to be useful, but it still needs a lot of work.

A taste

>>> from hts import Bam
>>> bam = Bam("hts/test/small.bam")  # BAM file borrowed from pybedtools (thanks)
>>> list(bam.header.seqs)
['chr2L', 'chr2R', 'chr3L', 'chr3R', 'chr4', 'chrX']

# region query creates index if needed:
>>> a = next(bam('chr2L:9000-11000'))
>>> a
Alignment('HWUSI-NAME:2:69:512:1017#0')
>>> a.target, a.pos, a.strand
('chr2L', 9329, '-')
>>> a.qlen, a.rlen
(36, 36)
>>> a.strand
'-'
>>> a.seq
'TACAAATCTTACGTAAACACTCCAAGCATGAATTCG'
>>> a.qual[:10]
[56, 63, 53, 62, 64, 62, 51, 44, 58, 59]

>>> a.flag, a.flag_str
(16, 'REVERSE')

>>> a.cigar
Cigar('36M')

>>> str(a)[:40]
'HWUSI-NAME:2:69:512:1017#0\t16\tchr2L\t9330'
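The flag value 16 and its string form 'REVERSE' in the session above follow the bit layout of the SAM FLAG field. As a rough illustration only (this is plain Python, not hts-python's implementation), the decoding can be sketched as below. The helper names flag_to_str and strand_from_flag are hypothetical; the bit names are the ones printed by samtools flags.

```python
# Bits of the SAM FLAG field, with the names used by `samtools flags`.
SAM_FLAGS = [
    (0x1, "PAIRED"),
    (0x2, "PROPER_PAIR"),
    (0x4, "UNMAP"),
    (0x8, "MUNMAP"),
    (0x10, "REVERSE"),
    (0x20, "MREVERSE"),
    (0x40, "READ1"),
    (0x80, "READ2"),
    (0x100, "SECONDARY"),
    (0x200, "QCFAIL"),
    (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def flag_to_str(flag):
    """Hypothetical helper: comma-joined names of the bits set in `flag`."""
    return ",".join(name for bit, name in SAM_FLAGS if flag & bit)


def strand_from_flag(flag):
    """Hypothetical helper: '-' if the REVERSE bit (0x10) is set, else '+'."""
    return "-" if flag & 0x10 else "+"


print(flag_to_str(16), strand_from_flag(16))
```

The same session also shows a.pos as 9329 while the SAM text line shows 9330, which matches the usual convention of 0-based positions in BAM and 1-based positions in SAM text.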

There are also wrappers for:

  • Fai for querying FASTA files.
  • Tbx for tabix files (indexed bed/gff/sam, etc.).
  • fisher for Fisher's exact test.
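For readers unfamiliar with Fisher's exact test, here is a minimal pure-Python sketch of the two-sided test on a 2x2 table. It only shows what the statistic computes. It is not the wrapped implementation, and the function name and signature are assumptions that do not reflect the hts-python API.

```python
from math import comb  # requires Python 3.8+


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration; not the hts-python wrapper.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance for floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


print(fisher_exact_two_sided(3, 1, 1, 3))
```

This sketch uses exact integer binomials followed by floating-point division, so it is only suitable for small counts.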

Installation

  1. Install htslib (https://github.com/samtools/htslib.git) using make install.
  2. Install the Python cffi package with pip or easy_install.
  3. Run python setup.py install (--user) from this directory.

Development

This is a work in progress that relies on the hts library. All of the wrapped functions are included in hts/hts_concat.h and are then available from Python as, e.g., htslib.sam_read1.
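To show the general cffi pattern of declaring a C function and then calling it from Python, here is a small sketch that wraps strlen from the standard C library. Using strlen instead of an htslib function is an assumption made so the example runs without htslib installed. The way hts-python actually builds its htslib object from hts_concat.h may differ, and c_strlen is a hypothetical name.

```python
try:
    from cffi import FFI  # third-party package: pip install cffi
except ImportError:
    FFI = None


def c_strlen(text):
    """Call C strlen() through cffi; return None if cffi is unavailable.

    `text` must be a bytes object, which cffi passes as `const char *`.
    """
    if FFI is None:
        return None
    ffi = FFI()
    # Declare the C signature, much as a header file would.
    ffi.cdef("size_t strlen(const char *s);")
    # dlopen(None) opens the standard C library namespace on POSIX systems.
    libc = ffi.dlopen(None)
    return libc.strlen(text)


print(c_strlen(b"htslib"))
```

Once declared, the C function is an attribute of the loaded library object, which is the same shape as the htslib.sam_read1 access mentioned above.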

When C functions not provided by the API are needed, they are added to hts_extra.c/.h.

One can run the tests with: python -c "import hts; hts.doctests()"
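For context, doctest-based tests run the interactive examples embedded in docstrings and compare the printed results. The sketch below uses only the standard-library doctest module on a made-up function. The internals of hts.doctests() are not shown here and may be organised differently, so reverse_complement and run_doctests are hypothetical names.

```python
import doctest


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    >>> reverse_complement("TACAAATC")
    'GATTTGTA'
    """
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def run_doctests(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Make the function visible under its own name inside the examples.
    globs = {func.__name__: func}
    test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


print(run_doctests(reverse_complement))
```

Inside a regular module, calling doctest.testmod() is the more common entry point; the explicit parser and runner are used here so the sketch works on a single function.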

There is enough functionality for this to be quite useful, but most of it is limited to getters. There are not yet setters to, for example, update an INFO field or modify the BAM quality scores.

Things to work on:

  1. Make properties settable in hts.bam. Currently, they are read-only properties. At the very least, it will be useful to have setters for seq, base_q, qname, tname, pos, strand, flag.

  2. Wrap B/VCF stuff? (in progress)

Why

Why use this when pysam exists? It is an experiment with Python cffi and an attempt to provide Pythonic access to htslib.