Licence: apache-2.0
NLP pipeline software using common workflow language

NLP Pipeline
============

|codacy_grade| |travis| |documentation| |pypi_version| |pypi_supported| |zenodo|

nlppln is a Python package for creating NLP pipelines using `Common Workflow Language <http://www.commonwl.org/>`_ (CWL). It provides steps for (generic) NLP functionality, such as tokenization, lemmatization, and part-of-speech tagging, and helps users construct workflows from these steps.

A text processing step consists of a (Python) command line tool and a CWL specification for using this tool. Most tools provided by nlppln wrap existing NLP functionality. The command line tools are made with `Click <http://click.pocoo.org>`_, a Python package for creating command line interfaces.
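
As a rough illustration of the tool half of such a step, the sketch below shows what a small Click-based command line tool could look like. The ``lowercase`` command, its ``in_files`` argument, and its ``--out_dir`` option are hypothetical names chosen for this example; they are not tools shipped with nlppln, and the matching CWL specification is not shown.

.. code-block:: python

    import os

    import click


    def lowercase_text(text):
        """Return the text in lowercase (a stand-in for real NLP functionality)."""
        return text.lower()


    # Hypothetical example tool; the names below are assumptions, not nlppln API.
    @click.command()
    @click.argument('in_files', nargs=-1, type=click.Path(exists=True))
    @click.option('--out_dir', '-o', required=True, type=click.Path(),
                  help='Directory in which to save the output files.')
    def lowercase(in_files, out_dir):
        """Write a lowercased copy of every input text file to the output directory."""
        os.makedirs(out_dir, exist_ok=True)
        for in_file in in_files:
            with open(in_file, encoding='utf-8') as f:
                text = f.read()
            # Keep the original file name, but write into the output directory
            out_file = os.path.join(out_dir, os.path.basename(in_file))
            with open(out_file, 'w', encoding='utf-8') as f:
                f.write(lowercase_text(text))

Saved as a script with the usual ``if __name__ == '__main__': lowercase()`` guard added, such a tool could be called as ``python lowercase.py a.txt b.txt --out_dir out``. Keeping the text transformation in a plain function (``lowercase_text``) makes it testable without going through the command line.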

To create a workflow, write a Python script::

    from nlppln import WorkflowGenerator

    with WorkflowGenerator() as wf:
        # Declare the workflow input: a directory of text files
        txt_dir = wf.add_input(txt_dir='Directory')

        # Chain the steps; each step's output is the next step's input
        frogout = wf.frog_dir(in_dir=txt_dir)
        saf = wf.frog_to_saf(in_files=frogout)
        ner_stats = wf.save_ner_data(in_files=saf)
        new_saf = wf.replace_ner(metadata=ner_stats, in_files=saf)
        txt = wf.saf_to_txt(in_files=new_saf)

        # Declare the workflow outputs
        wf.add_outputs(ner_stats=ner_stats, txt=txt)

        # Write the workflow to a CWL file
        wf.save('anonymize.cwl')

The resulting workflow can be run using a CWL runner, such as `cwltool <https://github.com/common-workflow-language/cwltool/>`_:

.. code-block:: sh

    cwltool anonymize.cwl --txt_dir /path/to/directory/with/txt/files/
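
If the workflow has to be started from Python rather than from a shell, the same cwltool invocation can be assembled with the standard library. This is a minimal sketch, assuming cwltool is installed and on the ``PATH``; ``build_cwltool_command`` and ``run_workflow`` are hypothetical helper names and are not part of nlppln or cwltool.

.. code-block:: python

    import subprocess


    def build_cwltool_command(workflow, **inputs):
        """Build the cwltool argument list for a workflow file and its inputs."""
        command = ['cwltool', workflow]
        for name, value in inputs.items():
            # Each workflow input becomes a ``--name value`` pair
            command.extend(['--{}'.format(name), str(value)])
        return command


    def run_workflow(workflow, **inputs):
        """Run the workflow with cwltool; raises CalledProcessError on failure."""
        return subprocess.run(build_cwltool_command(workflow, **inputs), check=True)

For example, ``run_workflow('anonymize.cwl', txt_dir='/path/to/directory/with/txt/files/')`` would issue the same command as the shell example.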

To create new (e.g., project-specific) NLP functionality, you can use `nlppln-gen <https://github.com/nlppln/nlppln-gen>`_ to generate boilerplate (i.e., empty) command line tools and CWL specifications.

The full documentation can be found on `Read the Docs <http://nlppln.readthedocs.io/en/latest/>`_.

Installation
############

Install nlppln using pip:

.. code-block:: sh

    pip install nlppln

Please check the `installation guidelines <http://nlppln.readthedocs.io/en/latest/installation.html>`_ for additional required software.

License
#######

Copyright (c) 2016-2018, Netherlands eScience Center, University of Twente

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

.. |codacy_grade| image:: https://api.codacy.com/project/badge/Grade/24cd15fe1d9e4a51ab4be8c247e95c47
   :target: https://www.codacy.com/app/jvdzwaan/nlppln?utm_source=github.com&utm_medium=referral&utm_content=nlppln/nlppln&utm_campaign=Badge_Grade
   :alt: Codacy Badge

.. |travis| image:: https://travis-ci.org/nlppln/nlppln.svg?branch=master
   :target: https://travis-ci.org/nlppln/nlppln
   :alt: Build Status

.. |documentation| image:: https://readthedocs.org/projects/nlppln/badge/?version=latest
   :target: http://nlppln.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. |pypi_version| image:: https://badge.fury.io/py/nlppln.svg
   :target: https://badge.fury.io/py/nlppln
   :alt: PyPI version

.. |pypi_supported| image:: https://img.shields.io/pypi/pyversions/nlppln.svg
   :target: https://pypi.python.org/pypi/nlppln
   :alt: PyPI

.. |zenodo| image:: https://zenodo.org/badge/65198876.svg
   :target: https://zenodo.org/badge/latestdoi/65198876
   :alt: DOI
