A Python package to run contextualized topic modeling. CTMs combine BERT with topic models to get coherent topics. Also supports multilingual tasks. Cross-lingual Zero-shot model published at EACL 2021.

Stars: ✭ 318 (+639.53%)

Mutual labels: nlp-machine-learning

Lyrics Corpora

An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and billboard charts

Stars: ✭ 13 (-69.77%)

Mutual labels: corpus

Dstc8 Schema Guided Dialogue

The Schema-Guided Dialogue Dataset

Stars: ✭ 277 (+544.19%)

Mutual labels: nlp-machine-learning

Nlp base

Basic natural language models

Stars: ✭ 524 (+1118.6%)

Mutual labels: nlp-machine-learning

Fakenewscorpus

A dataset of millions of news articles scraped from a curated list of data sources.

Stars: ✭ 255 (+493.02%)

Mutual labels: corpus

Letslearnai.github.io

Lets Learn AI

Stars: ✭ 33 (-23.26%)

Mutual labels: nlp-machine-learning

wordfish-python

extract relationships from standardized terms from corpus of interest with deep learning 🐟

Stars: ✭ 19 (-55.81%)

Mutual labels: corpus

Small Chinese Corpus

Some useful small Chinese corpus datasets

Stars: ✭ 462 (+974.42%)

Mutual labels: corpus

ConveRT-pytorch

ConveRT Paper Pytorch Implementation

Stars: ✭ 49 (+13.95%)

Mutual labels: nlp-machine-learning

Insuranceqa Corpus Zh

🚁 Insurance industry corpus, chatbot

Stars: ✭ 821 (+1809.3%)

Mutual labels: corpus

Chinese Nlp Corpus

Collections of Chinese NLP corpus

Stars: ✭ 438 (+918.6%)

Mutual labels: corpus

fairseq-tagging

a Fairseq fork for sequence tagging/labeling tasks

Stars: ✭ 26 (-39.53%)

Mutual labels: nlp-machine-learning

SpiCE-Corpus

An open-access corpus of conversational bilingual speech in Cantonese and English

Stars: ✭ 33 (-23.26%)

Mutual labels: corpus

Wordless

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

Stars: ✭ 378 (+779.07%)

Mutual labels: corpus

Nlp chinese corpus

Large Scale Chinese Corpus for NLP

Stars: ✭ 6,656 (+15379.07%)

Mutual labels: corpus

Text mining resources

Resources for learning about Text Mining and Natural Language Processing

Stars: ✭ 358 (+732.56%)

Mutual labels: nlp-machine-learning

Sdtm mapper

AI SDTM mapping (R for ML, Python, TensorFlow for DL)

Stars: ✭ 27 (-37.21%)

Mutual labels: nlp-machine-learning

Lingua

👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

Stars: ✭ 341 (+693.02%)

Mutual labels: nlp-machine-learning

Deeppavlov

An open source library for deep learning end-to-end dialog systems and chatbots.

Stars: ✭ 5,525 (+12748.84%)

Mutual labels: nlp-machine-learning

Dab

Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ

Stars: ✭ 294 (+583.72%)

Mutual labels: nlp-machine-learning

Talismane

NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser

Stars: ✭ 38 (-11.63%)

Mutual labels: nlp-machine-learning

Cluecorpus2020

Large-scale Pre-training Corpus for Chinese (100 GB)

Stars: ✭ 278 (+546.51%)

Mutual labels: corpus

Chinese models for spacy

Models for spaCy that support Chinese

Stars: ✭ 543 (+1162.79%)

Mutual labels: nlp-machine-learning

Data Science Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.

Stars: ✭ 273 (+534.88%)

Mutual labels: nlp-machine-learning

Company Names Corpus

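Company name corpus. Organization name corpus. Company short names, abbreviations, brand terms, and enterprise names. Can be used for Chinese word segmentation and organization named-entity recognition.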

Stars: ✭ 868 (+1918.6%)

Mutual labels: corpus

Customer satisfaction analysis

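An opinion-mining project based on user-generated content (UGC) from online homestay listings. It covers data mining and NLP processing, including data collection, topic extraction, and sentiment analysis. The goal is to overcome inconsistencies between user ratings and review text and to evaluate satisfaction with online homestays in real time; the project includes online review collection and visual sentiment analysis. It provides a Baidu Maps POI query entry point that supports automated batch queries of POI information. It builds an LDA topic clustering model on an online homestay corpus and uses each topic's central words to find the corresponding topic attribute dictionary. User ratings serve as labels, and the character-level TextCNN bundled with litNlp performs sentiment analysis, with the sentiment classification probability distribution taken as the sentiment trend. Finally, a POI heat map shows homestay satisfaction across regions. See the link for software versions.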

Stars: ✭ 262 (+509.3%)

Mutual labels: nlp-machine-learning

Cluepretrainedmodels

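A collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and specialized similarity models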

Stars: ✭ 493 (+1046.51%)

Mutual labels: corpus

Medical-Names-Corpus

Medical corpus. Medical institution name corpus. Drug standard codes.

Stars: ✭ 26 (-39.53%)

Mutual labels: corpus

Tika Python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Stars: ✭ 997 (+2218.6%)

Mutual labels: nlp-machine-learning

EdgarAllanPoetry

Computer-generated poetry

Stars: ✭ 22 (-48.84%)

Mutual labels: corpus

Weixin public corpus

WeChat official account corpus

Stars: ✭ 465 (+981.4%)

Mutual labels: corpus

fastmorph

Fast corpus search engine originally made for the Corpus of Written Tatar language

Stars: ✭ 14 (-67.44%)

Mutual labels: corpus

Naive Bayes Classifier

The Naive Bayes classifier is a classification algorithm. It uses the Bernoulli and Multinomial Naive Bayes equations to classify documents (text) as ham or spam.

Stars: ✭ 6 (-86.05%)

Mutual labels: corpus

Species-Names-Corpus

Species name corpus. Plant names, animal names.

Stars: ✭ 23 (-46.51%)

Mutual labels: corpus

Awesome Persian Nlp Ir

Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources

Stars: ✭ 460 (+969.77%)

Mutual labels: corpus

DeepSentiPers

Repository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"

Stars: ✭ 17 (-60.47%)

Mutual labels: corpus

Chatterbot Corpus

A multilingual dialog corpus

Stars: ✭ 964 (+2141.86%)

Mutual labels: corpus

Filipino-Text-Benchmarks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

Stars: ✭ 22 (-48.84%)

Mutual labels: corpus

Bookcorpus

Crawl BookCorpus

Stars: ✭ 443 (+930.23%)

Mutual labels: corpus

fuzzing-corpus

My fuzzing corpus

Stars: ✭ 120 (+179.07%)

Mutual labels: corpus

Rasa Ui

Rasa UI is a frontend for the Rasa Framework

Stars: ✭ 796 (+1751.16%)

Mutual labels: nlp-machine-learning

knowledge-extraction-recipes-forms

Knowledge Extraction For Forms Accelerators & Examples

Stars: ✭ 144 (+234.88%)

Mutual labels: nlp-machine-learning

Corpora

A collection of small corpuses of interesting data for the creation of bots and similar stuff.

Stars: ✭ 4,293 (+9883.72%)

Mutual labels: corpus

Predicting Myers Briggs Type Indicator With Recurrent Neural Networks

Stars: ✭ 43 (+0%)

Mutual labels: nlp-machine-learning

Coursera Natural Language Processing Specialization

Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.

Stars: ✭ 39 (-9.3%)

Mutual labels: nlp-machine-learning

Typing Assistant

Typing Assistant autocompletes words and suggests predictions for the next word. This makes typing faster and more intelligent, and reduces effort.

Stars: ✭ 32 (-25.58%)

Mutual labels: corpus

Seq2seq Chatbot

Chatbot in 200 lines of code using TensorLayer

Stars: ✭ 777 (+1706.98%)

Mutual labels: corpus
