pyseer

SEER, reimplemented in python by Marco Galardini and John Lees

pyseer --phenotypes phenotypes.tsv --kmers kmers.gz --distances structure.tsv --min-af 0.01 --max-af 0.99 --cpu 15 --filter-pvalue 1E-8
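
When pyseer is launched from a pipeline script, the invocation above can be assembled programmatically. The following is a minimal sketch, assuming the `pyseer` executable is on `PATH`; the helper `build_pyseer_command` is hypothetical (it is not part of pyseer) and uses only the options that appear in the command above.

```python
import shlex


def build_pyseer_command(phenotypes, kmers, distances,
                         min_af=0.01, max_af=0.99, cpu=1,
                         filter_pvalue=None):
    """Assemble a pyseer argument list (hypothetical helper, not part of pyseer)."""
    cmd = [
        "pyseer",
        "--phenotypes", str(phenotypes),
        "--kmers", str(kmers),
        "--distances", str(distances),
        "--min-af", str(min_af),
        "--max-af", str(max_af),
        "--cpu", str(cpu),
    ]
    # Only add the p-value filter when the caller asks for one.
    if filter_pvalue is not None:
        cmd += ["--filter-pvalue", str(filter_pvalue)]
    return cmd


cmd = build_pyseer_command("phenotypes.tsv", "kmers.gz", "structure.tsv",
                           cpu=15, filter_pvalue="1E-8")
# Print the command as a shell-quoted string for inspection.
print(shlex.join(cmd))
```

Keeping the command as a list means it can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting issues.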

Motivation

K-mer-based GWAS is particularly well suited to bacterial samples, given their high genetic variability. This approach was implemented by Lees, Vehkala et al. in the SEER software.

The reimplementation presented here should be consistent with the current version of the C++ seer (though we do not guarantee this for all possible cases).

In addition to all of the original features, this version implements many new ones (input types, association models and output parsing). See the documentation and paper for full details.

Citations

Unitigs and elastic net preprint: Lees, John A., Tien Mai, T., et al. Improved inference and prediction of bacterial genotype-phenotype associations using pangenome-spanning regressions. bioRxiv 852426 (2019) doi: 10.1101/852426

pyseer and LMM implementation paper: Lees, John A., Galardini, M., et al. pyseer: a comprehensive tool for microbial pangenome-wide association studies. Bioinformatics 34:4310–4312 (2018). doi: 10.1093/bioinformatics/bty539

Original SEER implementation paper: Lees, John A., et al. Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes. Nature communications 7:12797 (2016). doi: 10.1038/ncomms12797

Documentation

Full documentation is available at readthedocs.

You can also build the docs locally (requires sphinx) by typing:

cd docs/
make html

Prerequisites

The versions pyseer was tested against are given in parentheses:

  • python 3+ (3.6.6)
  • numpy (1.15.2)
  • scipy (1.1.0)
  • pandas (0.23.4)
  • scikit-learn (0.20.0)
  • statsmodels (0.9.0)
  • pysam (0.15.1)
  • glmnet_python (commit 946b65c)
  • DendroPy (4.4.0)

If you would like to use the scree_plot_pyseer script, you will also need matplotlib. To use the scripts that map and annotate k-mers, you will also need bwa, bedtools, bedops and pybedtools.
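
To compare a local environment against the tested versions listed above, a short check along these lines may help. This is a sketch only: it assumes Python 3.8 or later (for `importlib.metadata`), the distribution names are assumptions about how each package is published, and glmnet_python is left out because it is pinned to a commit rather than a release.

```python
from importlib import metadata

# Versions copied from the prerequisites list. The keys are assumed
# distribution names; adjust them if your installer uses different ones.
TESTED_VERSIONS = {
    "numpy": "1.15.2",
    "scipy": "1.1.0",
    "pandas": "0.23.4",
    "scikit-learn": "0.20.0",
    "statsmodels": "0.9.0",
    "pysam": "0.15.1",
    "DendroPy": "4.4.0",
}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(TESTED_VERSIONS).items():
    tested = TESTED_VERSIONS[name]
    print(f"{name}: installed={version or 'missing'}, tested={tested}")
```

A newer installed version is not necessarily a problem; the listed versions are only those the software was tested against.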

Installation

The easiest way to install pyseer and its dependencies is through conda::

conda install pyseer

If you need conda, download miniconda and add the necessary channels::

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

pyseer can also be installed through pip: download this repository (or one of the releases), cd into it, and run:

python -m pip install .

If you want multithreading, make sure that you are using a Python 3 interpreter:

python3 -m pip install .
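
Since multithreading depends on a Python 3 interpreter, a wrapper script might pick the value passed to `--cpu` defensively. The sketch below is one way to do that; `choose_cpu` is a hypothetical helper (not part of pyseer), and falling back to a single CPU on older interpreters is an assumption made for illustration.

```python
import os
import sys


def choose_cpu(requested, version_info=sys.version_info, available=None):
    """Return a value for --cpu (hypothetical helper, not part of pyseer)."""
    # Assumption: without Python 3 there is no multithreading, so use one CPU.
    if version_info[0] < 3:
        return 1
    if available is None:
        # os.cpu_count() can return None when the count is undetermined.
        available = os.cpu_count() or 1
    # Never request more CPUs than the machine reports, and never fewer than one.
    return max(1, min(requested, available))


print(choose_cpu(15))
```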

If you want the next pre-release, just clone/download this repository and run:

python pyseer-runner.py

Copyright

Copyright 2017 EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
