A simple NLP library for profiling datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level, granular statistical information about the text in that column.

Stars: ✭ 181 (+13.13%)

Mutual labels: natural-language-processing, text-mining

Lazynlp

Library to scrape and clean web pages to create massive datasets.

Stars: ✭ 1,985 (+1140.63%)

Mutual labels: natural-language-processing, text-mining

Articutapi

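API of Articut, a Chinese word segmenter with semantic part-of-speech tagging. Word segmentation, also called tokenization, is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.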

Stars: ✭ 252 (+57.5%)

Mutual labels: natural-language-processing, pos-tagging

Jumanpp

Juman++ (a Morphological Analyzer Toolkit)

Stars: ✭ 254 (+58.75%)

Mutual labels: tokenizer, pos-tagging

Text mining resources

Resources for learning about Text Mining and Natural Language Processing

Stars: ✭ 358 (+123.75%)

Mutual labels: natural-language-processing, text-mining

sinling

A collection of NLP tools for Sinhalese (සිංහල).

Stars: ✭ 38 (-76.25%)

Mutual labels: tokenizer, pos-tagging

Nlp In Practice

Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.

Stars: ✭ 790 (+393.75%)

Mutual labels: natural-language-processing, text-mining

Metasra Pipeline

MetaSRA: normalized sample-specific metadata for the Sequence Read Archive

Stars: ✭ 33 (-79.37%)

Mutual labels: natural-language-processing, text-mining

Greynir

The greynir.is natural language processing website for Icelandic

Stars: ✭ 47 (-70.62%)

Mutual labels: tokenizer, natural-language-processing

Gsoc2018 3gm

💫 Automated codification of Greek Legislation with NLP

Stars: ✭ 36 (-77.5%)

Mutual labels: natural-language-processing, text-mining

Chemdataextractor

Automatically extract chemical information from scientific documents

Stars: ✭ 152 (-5%)

Mutual labels: natural-language-processing, text-mining

Pytorch Pos Tagging

A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.

Stars: ✭ 96 (-40%)

Mutual labels: natural-language-processing, pos-tagging

Cogcomp Nlpy

CogComp's light-weight Python NLP annotators

Stars: ✭ 115 (-28.12%)

Mutual labels: natural-language-processing, text-mining

Hanlp

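Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin conversion, simplified/traditional Chinese conversion, natural language processing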

Stars: ✭ 24,626 (+15291.25%)

Mutual labels: natural-language-processing, pos-tagging

Awesome Nlp

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

Stars: ✭ 12,626 (+7791.25%)

Mutual labels: natural-language-processing, text-mining

Kadot

Kadot, the unsupervised natural language processing library.

Stars: ✭ 108 (-32.5%)

Mutual labels: tokenizer, natural-language-processing

Py Nltools

A collection of basic Python modules for spoken natural language processing

Stars: ✭ 46 (-71.25%)

Mutual labels: tokenizer, natural-language-processing

Cleannlp

R package providing annotators and a normalized data model for natural language processing

Stars: ✭ 174 (+8.75%)

Mutual labels: natural-language-processing, r-package

Vntk

Vietnamese NLP Toolkit for Node

Stars: ✭ 170 (+6.25%)

Mutual labels: natural-language-processing, pos-tagging

Hands On Natural Language Processing With Python

This repository is for my Udemy students. You can find all the lecture code here, along with the files mentioned for reading. Feel free to clone it, and if you have any problem, just raise a question.

Stars: ✭ 146 (-8.75%)

Mutual labels: natural-language-processing, text-mining

hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R

Stars: ✭ 101 (-36.87%)

Mutual labels: tokenizer, r-package

crminer

⛔ ARCHIVED ⛔ Fetch 'Scholarly' Full Text from 'Crossref'

Stars: ✭ 17 (-89.37%)

Mutual labels: text-mining, r-package

Nlpython

This repository contains code for Natural Language Processing in Python. All of the code relates to my book, "Python Natural Language Processing".

Stars: ✭ 265 (+65.63%)

Mutual labels: natural-language-processing, text-mining

Text-Classification-LSTMs-PyTorch

The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To provide a better understanding of the model, a Tweets dataset provided by Kaggle is used.

Stars: ✭ 45 (-71.87%)

Mutual labels: text-mining, tokenizer

Vncorenlp

A Vietnamese natural language processing toolkit (NAACL 2018)

Stars: ✭ 354 (+121.25%)

Mutual labels: natural-language-processing, pos-tagging

Graphbrain

Language, Knowledge, Cognition

Stars: ✭ 294 (+83.75%)

Mutual labels: natural-language-processing, text-mining

Pyshorttextcategorization

Various Algorithms for Short Text Mining

Stars: ✭ 429 (+168.13%)

Mutual labels: natural-language-processing, text-mining

Awesome Hungarian Nlp

A curated list of NLP resources for Hungarian

Stars: ✭ 121 (-24.37%)

Mutual labels: natural-language-processing, text-mining

Jcseg

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for the latest Lucene, Solr, and Elasticsearch.

Stars: ✭ 754 (+371.25%)

Mutual labels: natural-language-processing, pos-tagging

Text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

Stars: ✭ 715 (+346.88%)

Mutual labels: natural-language-processing, text-mining

Kagome

Self-contained Japanese Morphological Analyzer written in pure Go

Stars: ✭ 554 (+246.25%)

Mutual labels: tokenizer, pos-tagging

Nlp Notebooks

A collection of notebooks for Natural Language Processing from NLP Town

Stars: ✭ 513 (+220.63%)

Mutual labels: natural-language-processing, text-mining

Thot

Thot toolkit for statistical machine translation

Stars: ✭ 53 (-66.87%)

Mutual labels: tokenizer, natural-language-processing

Scattertext

Beautiful visualizations of how language differs among document types.

Stars: ✭ 1,722 (+976.25%)

Mutual labels: natural-language-processing, text-mining

Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Stars: ✭ 132 (-17.5%)

Mutual labels: tokenizer, natural-language-processing

Finnlp Progress

NLP progress in Fintech. A repository to track the progress in Natural Language Processing (NLP) related to the domain of Finance, including the datasets, papers, and current state-of-the-art results for the most popular tasks.

Stars: ✭ 148 (-7.5%)

Mutual labels: natural-language-processing

Visdial Rl

PyTorch code for Learning Cooperative Visual Dialog Agents using Deep Reinforcement Learning

Stars: ✭ 157 (-1.87%)

Mutual labels: natural-language-processing

Rentrez

Talk with NCBI Entrez using R

Stars: ✭ 151 (-5.62%)

Mutual labels: r-package

Qualtrics

Download ⬇️ Qualtrics survey data directly into R!