biocore / Emp

Licence: bsd-3-clause
Code repository of the Earth Microbiome Project.

Earth Microbiome Project

The Earth Microbiome Project (EMP) is a systematic attempt to characterize global microbial taxonomic and functional diversity for the benefit of the planet and humankind. Most of the data generated so far come from 16S rRNA amplicon sequencing; the majority of these data have been described and published (article) and are referred to as EMP 16S Release 1. The project also includes additional 16S data, as well as data from 18S and ITS amplicon sequencing, shotgun metagenomic sequencing, and metabolomic profiling, collectively referred to as EMP Multi-omics. For more information about the EMP (people, publications, news, protocols and standards, and more), please see the EMP website.

This GitHub repository describes the EMP catalogue: how it is generated and how to use it. The EMP dataset is generated from samples that individual researchers have compiled and contributed to the EMP. The samples from each group of researchers constitute an individual EMP study. In addition to analyses by contributing researchers on individual studies, we perform cross-study meta-analyses. EMP 16S Release 1, a meta-analysis of the first 97 16S rRNA amplicon studies, has been published (article, preprint), and the code and methods used for that manuscript are provided here. EMP 16S Release 2, currently unpublished, includes additional 16S rRNA amplicon data. We are currently finalizing the EMP500, a multi-omics meta-analysis of 50 studies including >500 samples, each processed for 16S, 18S, and ITS amplicon sequencing, shotgun metagenomic sequencing, and metabolomic profiling. Methods and standard operating procedures (SOPs) for additional amplicon sequencing, shotgun sequencing, and metabolomics related to EMP 16S Release 2 and the EMP500 are provided here as well.

Organization of this repository

This repository contains the directories listed below. Each directory has contents related to EMP 16S Release 1 and EMP Multi-omics (EMP500).

  • methods: Methods used in EMP analyses. Includes sample processing for extraction and sequencing, and computational methods for performing analyses and generating figures for meta-analyses of the EMP dataset.
  • protocols: Laboratory protocols and SOPs for sample and metadata collection, sample tracking, amplicon sequencing, shotgun sequencing, and metabolomics.
  • code: IPython notebooks and scripts (Python, Java, R, Bash) developed for meta-analysis of EMP data; this code is used in methods.
  • data: Data files resulting from or used in processing and analysis.
  • papers: Preprints of major meta-analyses of the EMP dataset and links to papers about individual studies.
  • presentations: Links to slide decks from presentations on the EMP.
  • legacy: Early code, results, and website documents from the initial phase of the EMP (2010-2013).

Getting involved

There are several ways to get involved with the EMP:

  • Use the EMP catalogue in your own research. Download the whole catalogue or just a few studies, merge and analyze them with your own data, or query the catalogue. Please skip to the next section for detailed instructions.
  • Join the analysis team. If you are interested in getting involved with EMP meta-analyses, you can begin by reviewing the open issues on this GitHub page. You can add comments to an existing issue to propose your ideas, or create a new issue entirely. Note that the initial meta-analysis of the EMP has been published. You can view the existing code and methods (instructions) for generating figures for the meta-analysis.
  • Contribute samples. We are not currently soliciting samples for the EMP. If you have an idea for samples you might like to submit in the future, you may email Dr. Justin Shaffer.

Using the EMP catalogue

The EMP catalogue is a diverse and standardized set of thousands of microbiomes for use by the public. Here are some of the ways you can use this resource:

  • Download EMP 16S Release 1 from our FTP site. EMP 16S Release 1 contains merged and quality-filtered mapping files, BIOM tables, OTU/sequence information, and alpha/beta-diversity results for ~25,000 samples from the 97 studies in the initial meta-analysis of the EMP. The FTP site contains README files about its contents, and the individual files are listed here.

  • Download individual studies from the Qiita EMP Portal. For each study, you can download metadata (mapping file), feature tables (BIOM file), and demultiplexed raw sequence files. Like the rest of Qiita, the EMP Portal requires the Google Chrome browser.

  • Merge your data with all or part of the EMP dataset. If you sequenced your samples using the EMP 16S rRNA primers and picked OTUs using either Deblur or closed-reference picking against Greengenes 13.8 or Silva 123, you can merge your BIOM table with the relevant merged EMP 16S Release 1 BIOM table or with one of the individual per-study BIOM tables from Qiita. Basic instructions for initial processing of your data are provided. You can then use QIIME1 or QIIME2 to merge the BIOM tables and mapping files.

  • Query the EMP catalogue using Redbiom. Redbiom is a command-line tool that allows users to query the Qiita database, including EMP studies. It allows you to find samples based on the sequences or taxa they contain or on sample metadata, and to export selected sample data and metadata. Once you have Redbiom installed, you can carry out queries such as those described here:

    # First, summarize the contexts available. A context represents a partition by 
    # processing parameters (e.g., closed-reference OTU picking) and preparation 
    # (e.g., 16S V4).
    
    redbiom summarize contexts | cut -f 1,2,3
    
    # Create a variable for the context. For this example, we will use the closed-
    # reference 16S V4 context by setting a local bash variable "ctx". 
    
    ctx=Pick_closed-reference_OTUs-illumina-16S-v4-66f541
    
    # Query 1: "Show me all the genera that were observed at pH > 8."
    # First we search for samples with pH > 8, then select the features from those 
    # samples, then summarize the taxonomy of those features, then grep for just 
    # the genera and count them.
    
    redbiom search metadata "where ph > 8" | redbiom select features-from-samples \
    --context $ctx | redbiom summarize taxonomy --context $ctx | grep g__ | wc -l
    
    # Answer: There are 1423 genera found in samples with pH > 8.
    
    # Query 2: "Show me all sites where Pyrobaculum are found." 
    # First we search for features that are genus Pyrobaculum, then search for 
    # samples containing those features, then fetch sample metadata for those 
    # samples and output the metadata file, then grab the columns for latitude and 
    # longitude (note: these are not guaranteed to reside in columns 10 and 11).
    
    redbiom search taxon --context $ctx g__Pyrobaculum | redbiom search features \
    --context $ctx | redbiom fetch sample-metadata --context $ctx \
    --output g__Pyrobaculum_metadata.txt; cut -f 10,11 g__Pyrobaculum_metadata.txt
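
Because the latitude and longitude columns are not guaranteed to sit at fixed positions, it can be safer to select them by header name rather than by index. The following is a minimal sketch, assuming the metadata file is tab-separated with a single header row; the helper name `select_columns` and the column names `latitude` and `longitude` in the usage comment are illustrative assumptions, so check the header of your own file first.

```shell
# Print named columns from a tab-separated file, whatever their position.
# Usage (column names are assumptions; inspect your file's header first):
#   select_columns g__Pyrobaculum_metadata.txt latitude longitude
select_columns() {
    local file=$1
    shift
    # Join the requested column names with commas so awk gets one variable.
    local names
    names=$(IFS=,; printf '%s' "$*")
    awk -F '\t' -v OFS='\t' -v names="$names" '
        BEGIN { n = split(names, want, ",") }
        NR == 1 {
            # Record the position of every header field.
            for (i = 1; i <= NF; i++) pos[$i] = i
            # Resolve each requested name to a column index, or fail loudly.
            for (j = 1; j <= n; j++) {
                if (!(want[j] in pos)) {
                    printf "column not found: %s\n", want[j] > "/dev/stderr"
                    exit 1
                }
                idx[j] = pos[want[j]]
            }
        }
        {
            # Emit the selected fields in the order they were requested.
            line = $(idx[1])
            for (j = 2; j <= n; j++) line = line OFS $(idx[j])
            print line
        }
    ' "$file"
}
```

The function exits with a non-zero status if a requested column is missing, which avoids silently printing the wrong fields when a metadata export changes its column order.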
    

Citing the EMP

If you use the EMP 16S Release 1 data in your research, please cite Thompson et al., "A communal catalogue reveals Earth's multiscale microbial diversity", Nature, 2017 (article).

If you use EMP protocols in your research, please cite earthmicrobiome.org and the relevant papers referenced therein.

File name abbreviation conventions

Some abbreviations used in this repository:

  • demux is shorthand for "demultiplexed", which describes the fastq data after it is split into per-sample fastq files using barcodes.
  • deblur refers to the exact-sequence de novo OTU picking method Deblur.
  • cr refers to closed-reference OTU picking.
  • or refers to open-reference OTU picking.
  • refseqs refers to reference sequence collections that could be used in reference-based OTU picking.
  • mc2 means that an OTU must have a minimum sequence count of 2 to be included.
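
To illustrate what a threshold like mc2 means in practice, here is a minimal sketch that drops features whose total count across all samples falls below a minimum. It is not the code used by the EMP pipeline; it assumes a tab-separated feature table (for example, one exported from BIOM format with `biom convert --to-tsv`) in which header and comment lines start with `#`, the first column is the feature ID, and the remaining columns are per-sample counts. The function name `filter_min_count` is hypothetical.

```shell
# Keep only features (rows) whose summed count across samples is >= MIN.
# Usage: filter_min_count TABLE.tsv [MIN]   (MIN defaults to 2, as in "mc2")
filter_min_count() {
    local table=$1
    local min=${2:-2}
    awk -F '\t' -v min="$min" '
        /^#/ { print; next }   # pass header and comment lines through unchanged
        {
            # Sum every column after the feature ID; non-numeric fields
            # (such as a trailing taxonomy column) contribute 0.
            total = 0
            for (i = 2; i <= NF; i++) total += $i
            if (total >= min) print
        }
    ' "$table"
}
```

For example, `filter_min_count table.tsv` removes singletons (features observed only once in the whole table), while `filter_min_count table.tsv 10` would apply a stricter, purely illustrative cutoff.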

Finding older data

If you're looking for data generated and used for the ISME 14 EMP presentations, look here.
