Alternatives and detailed information of crazydoc

Edinburgh-Genome-Foundry / crazydoc

Licence: MIT License

Read DNA sequences from colourful Microsoft Word documents

Programming Languages

python

139335 projects - #7 most used programming language

Projects that are alternatives of or similar to crazydoc

poly

A Go package for engineering organisms.

Stars: ✭ 270 (+1400%)

Mutual labels: synthetic-biology, molecular-biology

reg-gen

Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data.

Stars: ✭ 64 (+255.56%)

Mutual labels: bioinformatics

OpenGene.jl

(No maintenance) OpenGene, core libraries for NGS data analysis and bioinformatics in Julia

Stars: ✭ 60 (+233.33%)

Mutual labels: bioinformatics

GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.

Stars: ✭ 64 (+255.56%)

Mutual labels: bioinformatics

full spectrum bioinformatics

An open-access bioinformatics text

Stars: ✭ 26 (+44.44%)

Mutual labels: bioinformatics

ccs

CCS: Generate Highly Accurate Single-Molecule Consensus Reads (HiFi Reads)

Stars: ✭ 79 (+338.89%)

Mutual labels: bioinformatics

dna-sculpture

3D printed sculpture of a DNA molecule, showing my own genome

Stars: ✭ 22 (+22.22%)

Mutual labels: bioinformatics

chromap

Fast alignment and preprocessing of chromatin profiles

Stars: ✭ 93 (+416.67%)

Mutual labels: bioinformatics

geneview

Genomics data visualization in Python by using matplotlib.

Stars: ✭ 38 (+111.11%)

Mutual labels: bioinformatics

StackedDAE

Stacked Denoising AutoEncoder based on TensorFlow

Stars: ✭ 23 (+27.78%)

Mutual labels: bioinformatics

bio tools

Useful bioinformatic scripts

Stars: ✭ 35 (+94.44%)

Mutual labels: bioinformatics

referenceseeker

Rapid determination of appropriate reference genomes.

Stars: ✭ 65 (+261.11%)

Mutual labels: bioinformatics

flexidot

Highly customizable, ambiguity-aware dotplots for visual sequence analyses

Stars: ✭ 73 (+305.56%)

Mutual labels: bioinformatics

SSAKE

🍶Genome assembly with short sequence reads

Stars: ✭ 20 (+11.11%)

Mutual labels: dna-sequences

perbase

Per-base per-nucleotide depth analysis

Stars: ✭ 46 (+155.56%)

Mutual labels: bioinformatics

adversarial-relation-classification

Unsupervised domain adaptation method for relation extraction

Stars: ✭ 18 (+0%)

Mutual labels: bioinformatics

SumStatsRehab

GWAS summary statistics files QC tool

Stars: ✭ 19 (+5.56%)

Mutual labels: bioinformatics

plasmidtron

Assembling the cause of phenotypes and genotypes from NGS data

Stars: ✭ 27 (+50%)

Mutual labels: bioinformatics

epiviz

EpiViz is a scientific information visualization tool for genetic and epigenetic data, used to aid in the exploration and understanding of correlations between various genome features.

Stars: ✭ 65 (+261.11%)

Mutual labels: bioinformatics

seqviz

DNA sequence viewer supporting custom, GenBank, FASTA, NCBI accession, and iGEM input.

Stars: ✭ 99 (+450%)

Mutual labels: synthetic-biology

Crazydoc is a Python library to parse one of the most common DNA representation formats: the joyfully coloured and stylishly annotated MS-Word document.

Crazydoc returns Biopython records of the sequences contained in an MS-Word document, with record features corresponding to the various sequence highlightings (background color, boldness, italics, case change, etc.). The records can be saved as GenBank files or easily plotted.
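As a rough illustration of how styled stretches of text can become sequence features, here is a minimal Python sketch. It is not Crazydoc's implementation: the StyledRun type and the runs_to_sequence_and_features function are hypothetical names introduced for this example, and the style labels are plain strings chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StyledRun:
    """A stretch of text sharing one set of styles (hypothetical type)."""
    text: str
    styles: frozenset = frozenset()


def runs_to_sequence_and_features(runs):
    """Concatenate runs into one sequence and return (sequence, features).

    Each feature is a (start, end, style) tuple with a 0-based,
    end-exclusive span. Consecutive runs sharing a style are merged
    into a single feature.
    """
    sequence_parts = []
    features = []
    open_spans = {}  # style -> start position of the span being extended
    position = 0
    for run in runs:
        # Close the spans whose style does not continue into this run.
        for style in list(open_spans):
            if style not in run.styles:
                features.append((open_spans.pop(style), position, style))
        # Open a span for every style that starts at this run.
        for style in run.styles:
            open_spans.setdefault(style, position)
        sequence_parts.append(run.text)
        position += len(run.text)
    # Close the spans still open at the end of the sequence.
    for style, start in open_spans.items():
        features.append((start, position, style))
    return "".join(sequence_parts), sorted(features)


runs = [
    StyledRun("ATGC"),
    StyledRun("GGTCTC", frozenset({"bold"})),
    StyledRun("AAT", frozenset({"bold", "highlight_color=yellow"})),
    StyledRun("TTTT"),
]
sequence, features = runs_to_sequence_and_features(runs)
for start, end, style in features:
    print(style, start, end, sequence[start:end])
```

Each resulting span plays the role of one annotated feature on the record: the bold stretch covers two consecutive runs, and the highlighted stretch is nested inside it.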

Motivation

While other standards such as FASTA or GenBank are better supported by modern sequence editors, none enjoys the same popularity among molecular biologists as MS-Word's .docx format, which is limited only by the sophistication and creativity of the user.

Relying on a loose syntax and unclear specifications, this format has however suffered from a lack of support in the developer community and is generally incompatible with mainstream software pipelines. This library converts MS-Word DNA sequences to more computing-friendly formats: Biopython records, FASTA, or annotated GenBank files.

Usage

To obtain all sequences contained in a .docx file as annotated Biopython records:

from crazydoc import CrazydocParser
parser = CrazydocParser(['highlight_color', 'bold', 'underline'])
biopython_records = parser.parse_doc_file("./example.docx")

You can then plot the obtained records:

from crazydoc import CrazydocSketcher
sketcher = CrazydocSketcher()
for record in biopython_records:
    sketch = sketcher.translate_record(record)
    ax, _ = sketch.plot()
    ax.set_title(record.id)
    ax.figure.savefig('%s.png' % record.id)

To write the sequences as GenBank records, with annotations:

from crazydoc import records_to_genbank
records_to_genbank(biopython_records)

Note that records_to_genbank() truncates the record name to 20 characters to fit the GenBank format. Additionally, slashes (/) are replaced with hyphens (-) in the filenames.

To read protein sequences, pass is_protein=True:

biopython_records = parser.parse_doc_file(protein_path, is_protein=True)

This will return protein records, which will be saved with a GenPept extension (.gp) by records_to_genbank(biopython_records, is_protein=True), unless specified otherwise with extension=.
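The naming rules described in this section can be sketched in plain Python. This is an illustration only, not the code of records_to_genbank(): the helper name genbank_output_names is hypothetical, and the ".gb" default for DNA records, the leading dot in the extension, and building the filename from the untruncated record ID are all assumptions.

```python
def genbank_output_names(record_id, is_protein=False, extension=None):
    """Return (record_name, filename) for a record (hypothetical helper)."""
    # The record name is truncated to 20 characters.
    record_name = record_id[:20]
    if extension is None:
        # ".gp" for proteins comes from the text above;
        # ".gb" for DNA records is an assumption.
        extension = ".gp" if is_protein else ".gb"
    # Slashes are replaced with hyphens in the filename.
    # Assumption: the filename is built from the untruncated ID.
    filename = record_id.replace("/", "-") + extension
    return record_name, filename


name, filename = genbank_output_names("pEGF/backbone_with_a_long_name")
print(name)
print(filename)
```

Passing is_protein=True switches the default extension to ".gp", and an explicit extension argument overrides either default.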

Installation

You can install crazydoc with pip:

sudo pip install crazydoc

Alternatively, you can unzip the sources in a folder and type:

sudo python setup.py install

License = MIT

Everyone is welcome to contribute!

More biology software

Crazydoc is part of the EGF Codons synthetic biology software suite for DNA design, manufacturing and validation.
