
asyml / Forte

Licence: apache-2.0
Forte is a flexible and powerful NLP builder for text. This is part of the CASL project: http://casl-project.ai/

Programming Languages

python

Projects that are alternatives of or similar to Forte

Mycroft Core
Mycroft Core, the Mycroft Artificial Intelligence platform.
Stars: ✭ 5,489 (+6067.42%)
Mutual labels:  natural-language-processing, natural-language
Drl4nlp.scratchpad
Notes on Deep Reinforcement Learning for Natural Language Processing papers
Stars: ✭ 26 (-70.79%)
Mutual labels:  information-retrieval, natural-language-processing
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Stars: ✭ 584 (+556.18%)
Mutual labels:  information-retrieval, natural-language-processing
Cdqa
⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
Stars: ✭ 500 (+461.8%)
Mutual labels:  information-retrieval, natural-language-processing
Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-56.18%)
Mutual labels:  natural-language-processing, natural-language
Deep Semantic Similarity Model
My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.
Stars: ✭ 509 (+471.91%)
Mutual labels:  information-retrieval, natural-language-processing
Nlp Recipes
Natural Language Processing Best Practices & Examples
Stars: ✭ 5,783 (+6397.75%)
Mutual labels:  natural-language-processing, natural-language
Nlp Python Deep Learning
NLP in Python with Deep Learning
Stars: ✭ 374 (+320.22%)
Mutual labels:  natural-language-processing, natural-language
Wikisql
A large annotated semantic parsing corpus for developing natural language interfaces.
Stars: ✭ 965 (+984.27%)
Mutual labels:  natural-language-processing, natural-language
Pke
Python Keyphrase Extraction module
Stars: ✭ 855 (+860.67%)
Mutual labels:  information-retrieval, natural-language-processing
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+416.85%)
Mutual labels:  information-retrieval, natural-language-processing
Scdv
Text classification with Sparse Composite Document Vectors.
Stars: ✭ 54 (-39.33%)
Mutual labels:  information-retrieval, natural-language-processing
Botlibre
An open platform for artificial intelligence, chat bots, virtual agents, social media automation, and live chat automation.
Stars: ✭ 412 (+362.92%)
Mutual labels:  natural-language-processing, natural-language
Languagetool
Style and Grammar Checker for 25+ Languages
Stars: ✭ 5,641 (+6238.2%)
Mutual labels:  natural-language-processing, natural-language
Cogcomp Nlp
CogComp's Natural Language Processing libraries and Demos:
Stars: ✭ 410 (+360.67%)
Mutual labels:  natural-language-processing, natural-language
Texar Pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 636 (+614.61%)
Mutual labels:  data-processing, natural-language-processing
Link Grammar
The CMU Link Grammar natural language parser
Stars: ✭ 286 (+221.35%)
Mutual labels:  natural-language-processing, natural-language
Lingua
👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Stars: ✭ 341 (+283.15%)
Mutual labels:  natural-language-processing, natural-language
Knowledge Graphs
A collection of research on knowledge graphs
Stars: ✭ 845 (+849.44%)
Mutual labels:  information-retrieval, natural-language-processing
Ludwig
Data-centric declarative deep learning framework
Stars: ✭ 8,018 (+8908.99%)
Mutual labels:  natural-language-processing, natural-language




Forte is a toolkit for building Natural Language Processing pipelines, featuring cross-task interaction, adaptable data-model interfaces and composable pipelines. Forte was originally developed at CMU and is actively contributed to by Petuum in collaboration with other institutes. This project is part of the CASL Open Source family.

Forte provides a platform to assemble state-of-the-art NLP and ML technologies in a highly composable fashion, covering a wide spectrum of tasks from Information Retrieval and Natural Language Understanding to Natural Language Generation.

With Forte, it is simple to build an integrated system that can search documents, analyze them, extract information and generate language, all in one place. This allows developers to fully utilize the strengths of individual modules, combine the results from each step, and enable the system to make fully informed decisions at the end of the pipeline.

Forte makes it easy to integrate arbitrary third-party tools (check out these examples!). It also offers a collection of deep learning modules via Texar and a convenient model-data interface for casting tasks to models.

Core Design Principles

The core design principle of Forte is the abstraction of NLP concepts and machine learning models. It not only separates data, model and tasks but also enables interactions between different components of the pipeline. Based on this principle, we make Forte:

  • Composable: Forte helps users decompose a problem into data, models and tasks. Tasks can be further divided into sub-tasks. A complex use case can be solved by composing heterogeneous modules via straightforward Python APIs or declarative configuration files. The components (e.g. models or tasks) in the pipeline can be flexibly swapped in and out, as long as the API contracts are matched. This approach greatly improves module reusability, enables fast development and makes it more flexible to use libraries.

  • Generalizable and Extensible: Forte not only generalizes well across a wide range of NLP tasks, but also extends easily to new tasks or new domains. In particular, Forte provides the Ontology system, which helps users define types according to their specific tasks. Users can declaratively specify the types through simple JSON files, and the Code Generation tool will automatically generate ready-to-use Python files for their project. Check out our Ontology Generation documentation for more details.

  • Universal Data Flow: Forte provides a universal data flow that passes data seamlessly between pipeline steps. Central to Forte's composable architecture, this transparent data flow facilitates flexible process interventions and simple pipeline management. Because it adapts to generic data formats, Forte is well suited to data inspection, component swapping and result sharing, which is particularly helpful during team collaboration.
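The composability idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `ToyPack`, `Step`, `run_pipeline` and the two step functions are hypothetical names invented for this sketch and are not part of Forte's API. The point is that each step reads from and writes to one shared container, so steps can be swapped as long as they honour the same contract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# NOTE: every name in this sketch is hypothetical and is NOT Forte's API.


@dataclass
class ToyPack:
    """A stand-in for a shared data container passed between steps."""
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


# The "API contract": a step takes a pack and returns the same pack.
Step = Callable[[ToyPack], ToyPack]


def whitespace_tokenizer(pack: ToyPack) -> ToyPack:
    # Writes a "tokens" layer that later steps can rely on.
    pack.annotations["tokens"] = pack.text.split()
    return pack


def capitalized_word_tagger(pack: ToyPack) -> ToyPack:
    # Reads the tokens written by an earlier step and adds its own layer.
    tokens = pack.annotations["tokens"]
    pack.annotations["capitalized"] = [t for t in tokens if t[:1].isupper()]
    return pack


def run_pipeline(steps: List[Step], text: str) -> ToyPack:
    """Run each step in order over one shared pack."""
    pack = ToyPack(text)
    for step in steps:
        pack = step(pack)
    return pack


result = run_pipeline(
    [whitespace_tokenizer, capitalized_word_tagger],
    "Forte was developed at CMU",
)
print(result.annotations["capitalized"])
```

Replacing `whitespace_tokenizer` with any other function that fills the `"tokens"` layer would leave the tagger untouched, which is the kind of component swapping the principles describe.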


Figure (forte_arch.jpg): A high-level architecture of Forte showing how ontology and entries work with the pipeline.

Figure (forte_results.jpg): Forte stores results in data packs and uses the ontology to represent task logic.
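To give a feel for declaring types in JSON and turning them into Python classes, here is a minimal sketch. It rests on loud assumptions: the JSON layout (`entry_name`, `attributes`), the `TYPE_MAP` table and the `build_entry_class` helper are all hypothetical and do not reproduce Forte's ontology schema or its Code Generation tool, which writes Python source files rather than building classes at runtime.

```python
import json
from dataclasses import make_dataclass

# Hypothetical spec: these keys are illustrative, not Forte's ontology schema.
SPEC = """
{
  "entry_name": "Token",
  "attributes": [
    {"name": "text", "type": "str"},
    {"name": "ner", "type": "str"}
  ]
}
"""

# Maps type names used in the spec to Python types (assumed for this sketch).
TYPE_MAP = {"str": str, "int": int, "float": float}


def build_entry_class(spec_json: str) -> type:
    """Build a dataclass whose fields mirror the declared attributes."""
    spec = json.loads(spec_json)
    fields = [(attr["name"], TYPE_MAP[attr["type"]]) for attr in spec["attributes"]]
    return make_dataclass(spec["entry_name"], fields)


Token = build_entry_class(SPEC)
token = Token(text="Pittsburgh", ner="LOC")
print(token.text, token.ner)
```

For the real JSON format and generator options, refer to the Ontology Generation documentation mentioned above.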

Package Overview

forte: an open-source toolkit for NLP
forte.data.readers: a data module for reading different formats of text data, such as CoNLL and OntoNotes
forte.processors: a collection of processors for building NLP pipelines
forte.trainer: a collection of modules for training different NLP tasks
ft.onto.base_ontology: a module containing basic ontologies such as Token, Sentence and Document

Library API Example

A simple example that runs a Named Entity Recognizer:

import yaml

from forte.pipeline import Pipeline
from forte.data.readers import CoNLL03Reader
from forte.processors import CoNLLNERPredictor
from ft.onto.base_ontology import Token, Sentence
from forte.common.configuration import Config


# Load the data and model settings; the context managers close the files.
with open("config_data.yml", "r") as data_file:
    config_data = yaml.safe_load(data_file)
with open("config_model.yml", "r") as model_file:
    config_model = yaml.safe_load(model_file)

# Bundle both settings into a single configuration object.
config = Config({}, default_hparams=None)
config.add_hparam('config_data', config_data)
config.add_hparam('config_model', config_model)


# Assemble the pipeline: a CoNLL-03 reader followed by the NER predictor.
pl = Pipeline()
pl.set_reader(CoNLL03Reader())
pl.add(CoNLLNERPredictor(), config=config)

pl.initialize()

# Process the test set and print the NER tag of every token, per sentence.
for pack in pl.process_dataset(config.config_data.test_path):
    for pred_sentence in pack.get_data(context_type=Sentence, request={Token: {"fields": ["ner"]}}):
        print("============================")
        print(pred_sentence["context"])
        print("The entities are...")
        print(pred_sentence["Token"]["ner"])
        print("============================")

Find more examples here.

Download and Installation

To install the released version from PyPI:

pip install forte

To install from source:

git clone https://github.com/asyml/forte.git
cd forte
pip install .

Getting Started

Troubleshooting

  1. If you run the generate_ontology script and encounter the following error:
    Traceback (most recent call last):
      File "~/anaconda3/bin/generate_ontology", line 33, in <module>
        sys.exit(load_entry_point('forte', 'console_scripts', 'generate_ontology')())
      File "~/anaconda3/bin/generate_ontology", line 22, in importlib_load_entry_point
        for entry_point in distribution(dist_name).entry_points
      File "~/anaconda3/lib/python3.6/site-packages/importlib_metadata/__init__.py", line 418, in distribution
        return Distribution.from_name(package)
      File "~/anaconda3/lib/python3.6/site-packages/importlib_metadata/__init__.py", line 184, in from_name
        raise PackageNotFoundError(name)
    importlib_metadata.PackageNotFoundError: forte
    
    This is likely caused by multiple conflicting installations, such as installing both from source and from pip. One way to solve this is to manually remove the script ~/anaconda3/bin/generate_ontology and re-install the package.
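Before deleting anything, it may help to check what the current environment actually sees. The sketch below uses only the standard library (`importlib.metadata`, which assumes Python 3.8 or newer, and `shutil.which`); the helper name `describe_install` is hypothetical. A script on the PATH combined with a missing distribution would be consistent with the stale-script situation described above.

```python
import shutil
from importlib import metadata  # standard library from Python 3.8 onwards


def describe_install(dist_name: str, script_name: str) -> dict:
    """Report the installed version (if any) and the console script location."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not visible to this interpreter.
        version = None
    return {"version": version, "script_path": shutil.which(script_name)}


report = describe_install("forte", "generate_ontology")
print(report)

if report["script_path"] and report["version"] is None:
    # The script exists but its package metadata does not: likely a leftover.
    print("Stale script found; consider removing it and re-installing.")
```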

Contributing

If you are interested in making enhancements to Forte, please first go over our Code of Conduct and Contribution Guidelines.

License

Apache License 2.0

