cleanNLP: A Tidy Data Model for Natural Language Processing

Author: Taylor B. Arnold
License: LGPL-2


Overview

The cleanNLP package is designed to make it as painless as possible to turn raw text into feature-rich data frames. A minimal working example of using cleanNLP consists of loading the package, initializing an NLP backend, and running the function cnlp_annotate. The output is given as a list of data frame objects (classed as a "cnlp_annotation"). Here is an example using the udpipe backend:

library(cleanNLP)
cnlp_init_udpipe()

annotation <- cnlp_annotate(input = c(
        "Here is the first text. It is short.",
        "Here's the second. It is short too!",
        "The third text is the shortest."
))
lapply(annotation, head)
$token
  doc_id sid tid token token_with_ws lemma  upos xpos
1      1   1   1  Here         Here   here   ADV   RB
2      1   1   2    is           is     be   AUX  VBZ
3      1   1   3   the          the    the   DET   DT
4      1   1   4 first        first  first   ADJ   JJ
5      1   1   5  text          text  text  NOUN   NN
6      1   1   6     .            .      . PUNCT    .
                                                  feats tid_source relation
1                                          PronType=Dem          0     root
2 Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin          1      cop
3                             Definite=Def|PronType=Art          5      det
4                                Degree=Pos|NumType=Ord          5     amod
5                                           Number=Sing          1    nsubj
6                                                  <NA>          1    punct

$document
  doc_id
1      1
2      2
3      3

The token table breaks the text into tokens and provides lemmatized forms of the words, part-of-speech tags, and dependency relationships. Two short case studies linked from the repository show sample usage of the library.
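Because the result is a plain list of data frames, the tables can be manipulated with standard R tools. A minimal sketch, assuming the annotation object from the example above:

tokens <- annotation$token

# frequency of universal part-of-speech tags across all documents
table(tokens$upos)

# lemmas of the nouns, by document
subset(tokens, upos == "NOUN", select = c(doc_id, lemma))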

Please see the notes below, and the official package documentation on CRAN, for more options to control the way that text is parsed.

Installation

You can download the package from within R directly from CRAN:

install.packages("cleanNLP")

After installation, you should be able to use the udpipe backend (as used in the minimal example and case studies above; model files are installed automatically) or the stringi backend without any additional setup. For most users, these out-of-the-box options are a good starting point. In order to use the two Python backends, you must install the associated cleannlp Python module. We recommend and support the Python 3.7 version of Anaconda Python. After obtaining Python, install the module by running pip in a terminal:

pip install cleannlp

Once installed, running the respective backend initialization functions will provide further instructions for downloading the required models.

API Overview

V3

There have been numerous changes in the newly released version 3.0.0. While these require some modifications to existing code, they have been carefully designed to make the package easier to both install and use. The three most important changes are:

  • The object returned by cnlp_annotate is now a named list. Users can access its elements with the dollar sign operator, as sketched after this list. Functions such as cnlp_get_token and cnlp_get_dependency are no longer needed or included.
  • The dependencies are now attached to the tokens table to make them easier to use.
  • The CoreNLP backend now uses a Python implementation. It is simpler to install than the Java backend but is still missing some features.
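
A minimal sketch of the v3 access pattern, assuming the annotation object created earlier:

tokens <- annotation$token    # dollar-sign access replaces cnlp_get_token()
docs   <- annotation$document

# dependency columns now live directly in the token table
tokens[, c("doc_id", "sid", "tid", "token", "tid_source", "relation")]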

If you are running into any issues with the package, first make sure you are using updated materials (mostly available from links within this repository).

Backends

The cleanNLP package is designed to allow users to make use of various NLP annotation algorithms without having to worry (too much) about the output format, which is standardized as much as possible. There are four backends currently available, each with its own pros and cons:

  • stringi: a fast parser that only requires the stringi package, but produces only tokenized text
  • udpipe: a parser with no external dependencies that produces tokens, lemmas, part-of-speech tags, and dependency relationships. This is the recommended starting point given its balance of ease of use and functionality, and it supports the widest range of natural languages.
  • spacy: based on the Python library of the same name, a more feature-complete parser that includes named entity recognition and word embeddings. It requires a working Python installation and some additional setup. Recommended for users who are familiar with Python or plan to make heavy use of the package.
  • corenlp: another Python backend (formerly Java) that is an official port of the Java library of the same name.

The last two backends (spacy and corenlp) require additional setup, namely installing Python and the associated Python module, as documented above. To select the desired backend, simply initialize it prior to running the annotation:

cnlp_init_stringi(locale="en_GB")
cnlp_init_udpipe(model_name="english")
cnlp_init_spacy(model_name="en")
cnlp_init_corenlp(lang="en")
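
For example, to re-annotate text with a different backend, initialize it and call cnlp_annotate again. A sketch (the extra tables produced depend on the backend and model installed):

cnlp_init_spacy(model_name = "en")
anno <- cnlp_annotate(input = "The cleanNLP package makes NLP tidy.")
names(anno)  # spacy may add tables (such as entities) beyond $token and $document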

The code above explicitly sets the default English model for each backend; you can choose a different model or language when initializing the backend. For udpipe, models are downloaded automatically. For spacy and corenlp, the following helper functions are available:

cnlp_download_spacy(model_name="en")
cnlp_download_corenlp(lang="en")

Simply change the model name or language code to download alternative models.
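
For instance, to work with German text (a sketch; the model names below are assumptions, so check each backend's documentation for the exact identifiers):

cnlp_init_udpipe(model_name = "german")   # udpipe model downloads on first use

cnlp_download_spacy(model_name = "de")    # spacy models must be fetched before use
cnlp_init_spacy(model_name = "de")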

Citation

If you make use of the toolkit in your work, please cite the following paper.

@article{,
  title   = "A Tidy Data Model for Natural Language Processing Using cleanNLP",
  author  = "Arnold, Taylor B",
  journal = "R Journal",
  volume  = "9",
  number  = "2",
  year    = "2017"
}

Note, however, that the library has evolved since the paper was published. For specific help with the package's API, please check the updated documentation linked from this repository.

Note

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
