CogCompNLP


This project collects a number of core libraries for Natural Language Processing (NLP) developed by the Cognitive Computation Group.

How to use it?

Depending on what you are after, follow one of the options below:

  • If you want to annotate raw text (i.e. you do not need to open the annotator boxes and retrain them), look into the pipeline.
  • If you want to train and test an NLP annotator (i.e. you want to open an annotator box), see the list of components below and choose the one you need. We recommend JDK 8, as no other version is officially supported or tested.
  • If you want to read a corpus, look into the corpus-readers module.
  • If you want to do feature extraction, look into the edison module.

CogComp's main NLP libraries

Each library contains a detailed README with instructions on how to use it. In addition, the Javadoc for the whole project is available online.

| Module | Description |
| --- | --- |
| nlp-pipeline | Provides an end-to-end NLP processing application that runs a variety of NLP tools on input text. |
| core-utilities | Provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc. |
| corpusreaders | Provides classes to read documents from corpora into core-utilities data structures. |
| curator | Supports use of CogComp NLP Curator, a tool to run NLP applications as services. |
| edison | A library for feature extraction from core-utilities data structures. |
| lemmatizer | An application that uses WordNet and simple rules to find the root forms of words in plain text. |
| tokenizer | An application that identifies sentence and word boundaries in plain text. |
| transliteration | An application that transliterates names between different scripts. |
| pos | An application that identifies the part of speech (e.g. verb + tense, noun + number) of each word in plain text. |
| ner | An application that identifies named entities in plain text according to two different sets of categories. |
| md | An application that identifies entity mentions in plain text. |
| relation-extraction | An application that identifies entity mentions, then identifies relation pairs among the detected mentions. |
| quantifier | A tool that detects mentions of quantities in text and normalizes them to a standard form. |
| inference | A suite of unified wrappers around a set of optimization libraries, as well as some basic approximate solvers. |
| depparse | An application that identifies the dependency parse tree of a sentence. |
| verbsense | A system that addresses the verb sense disambiguation (VSD) problem for English. |
| prepsrl | An application that identifies semantic relations expressed by prepositions and develops statistical learning models for predicting the relations. |
| commasrl | Software that extracts the relations that commas participate in. |
| similarity | Software that compares objects, especially strings, and returns a score indicating how similar they are. |
| temporal-normalizer | A temporal extractor and normalizer. |
| dataless-classifier | Classifies text into a user-specified label hierarchy using only the textual label descriptions. |
| external-annotators | A collection of useful external annotators. |
  • Questions? Have a look at our FAQs.

Using each library programmatically

To include one of the modules in your Maven project, add the following snippet, replacing #modulename# and #version# with the relevant module name and the version listed in this project's pom.xml file. Note that you also need to add the <repository> element for the CogComp Maven repository inside the <repositories> element.

    <dependencies>
        ...
        <dependency>
            <groupId>edu.illinois.cs.cogcomp</groupId>
            <artifactId>#modulename#</artifactId>
            <version>#version#</version>
        </dependency>
        ...
    </dependencies>
    ...
    <repositories>
        <repository>
            <id>CogCompSoftware</id>
            <name>CogCompSoftware</name>
            <url>http://cogcomp.org/m2repo/</url>
        </repository>
    </repositories>
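
If you generate pom fragments from a script or tool, the two placeholders in the snippet above can be filled in programmatically. The following is only a minimal sketch: `PomSnippet` and `render` are hypothetical names that are not part of CogCompNLP, and the values `some-module` and `1.2.3` are made-up placeholders, not real coordinates. Take the actual module name and version from this project's pom.xml.

```java
import java.util.Objects;

/** Hypothetical helper (not part of CogCompNLP): fills in the dependency template shown above. */
final class PomSnippet {
    // Template mirrors the <dependency> element from the README snippet;
    // only the two placeholders vary.
    private static final String TEMPLATE = String.join("\n",
            "<dependency>",
            "    <groupId>edu.illinois.cs.cogcomp</groupId>",
            "    <artifactId>#modulename#</artifactId>",
            "    <version>#version#</version>",
            "</dependency>");

    private PomSnippet() {}

    /** Returns the dependency element with both placeholders replaced. */
    static String render(String moduleName, String version) {
        Objects.requireNonNull(moduleName, "moduleName");
        Objects.requireNonNull(version, "version");
        return TEMPLATE
                .replace("#modulename#", moduleName)
                .replace("#version#", version);
    }

    public static void main(String[] args) {
        // "some-module" and "1.2.3" are placeholder values for illustration only.
        System.out.println(render("some-module", "1.2.3"));
    }
}
```

The rendered element still has to be placed inside your own <dependencies> block, and the <repositories> entry from the snippet above is needed as well.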

Citing

If you are using the framework, please cite our paper:

@inproceedings{2018_lrec_cogcompnlp,
    author = {Daniel Khashabi and Mark Sammons and Ben Zhou and Tom Redman and Christos Christodoulopoulos and Vivek Srikumar and Nicholas Rizzolo and Lev Ratinov and Guanheng Luo and Quang Do and Chen-Tse Tsai and Subhro Roy and Stephen Mayhew and Zhili Feng and John Wieting and Xiaodong Yu and Yangqiu Song and Shashank Gupta and Shyam Upadhyay and Naveen Arivazhagan and Qiang Ning and Shaoshi Ling and Dan Roth},
    title = {CogCompNLP: Your Swiss Army Knife for NLP},
    booktitle = {11th Language Resources and Evaluation Conference},
    year = {2018},
    url = "http://cogcomp.org/papers/2018_lrec_cogcompnlp.pdf",
}