2239 open source projects that are alternatives to or similar to Nlp_chinese_corpus

Eda nlp for chinese

An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese text. NLP data augmentation. Paper reading notes.

Stars: ✭ 660 (-90.08%)

Mutual labels: chinese, text-classification

Chinese Xinhua

📙 Chinese Xinhua Dictionary database. Includes xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.

Stars: ✭ 8,705 (+30.78%)

Mutual labels: chinese, chinese-nlp

Reuters Full Data Set

Full dataset of Reuters composed of 8,551,441 news titles, links and timestamps (Jan 2007 - Aug 2016). Generate your own up to today!

Stars: ✭ 159 (-97.61%)

Mutual labels: news, dataset

Zhopenie

Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)

Stars: ✭ 98 (-98.53%)

Mutual labels: chinese, chinese-nlp

Ngram2vec

Four word embedding models implemented in Python. Supports arbitrary context features.

Stars: ✭ 703 (-89.44%)

Mutual labels: chinese, word2vec

Hotnewsanalysis

Analysis of hot news topics using text mining techniques.

Stars: ✭ 93 (-98.6%)

Mutual labels: news, word2vec

Weibo terminater

The final Weibo crawler. Scrape anything from Weibo: comments, post contents, followers, anything. The Terminator.

Stars: ✭ 2,295 (-65.52%)

Mutual labels: chinese, corpus

Char Rnn Chinese

Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch. Based on the code at https://github.com/karpathy/char-rnn. Supports Chinese and other things.

Stars: ✭ 192 (-97.12%)

Mutual labels: chinese, language-model

Cnn Text Classification Tf Chinese

CNN for Chinese Text Classification in Tensorflow

Stars: ✭ 237 (-96.44%)

Mutual labels: chinese, text-classification

FinBERT-QA

Financial Domain Question Answering with pre-trained BERT Language Model

Stars: ✭ 70 (-98.95%)

Mutual labels: question-answering, bert

DrFAQ

DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.

Stars: ✭ 29 (-99.56%)

Mutual labels: question-answering, bert

Simpletransformers

Transformers for Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI

Stars: ✭ 2,881 (-56.72%)

Mutual labels: question-answering, text-classification

sqlmap-wiki-zhcn

Possibly the most complete Chinese documentation for sqlmap.

Stars: ✭ 51 (-99.23%)

Mutual labels: wiki, chinese

trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Stars: ✭ 711 (-89.32%)

Mutual labels: news, corpus

Fakenewscorpus

A dataset of millions of news articles scraped from a curated list of data sources.

Stars: ✭ 255 (-96.17%)

Mutual labels: dataset, corpus

Chinese Text Classification

Chinese-Text-Classification: Chinese text classification implemented with a TensorFlow CNN (convolutional neural network). QQ group: 522785813; WeChat group QR code: http://www.tensorflownews.com/

Stars: ✭ 284 (-95.73%)

Mutual labels: chinese, text-classification

word2vec-movies

Bag of Words Meets Bags of Popcorn in Python 3: a Chinese-language tutorial

Stars: ✭ 54 (-99.19%)

Mutual labels: word2vec, chinese

wechsel

Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

Stars: ✭ 39 (-99.41%)

Mutual labels: language-model, bert

KitanaQA

KitanaQA: Adversarial training and data augmentation for neural question-answering models

Stars: ✭ 58 (-99.13%)

Mutual labels: question-answering, bert

sarcasm-detection-for-sentiment-analysis

Sarcasm Detection for Sentiment Analysis

Stars: ✭ 21 (-99.68%)

Mutual labels: text-classification, word2vec

embedding study

A study of character embeddings generated by Chinese pre-trained models; tests how BERT and ELMo perform on Chinese.

Stars: ✭ 94 (-98.59%)

Mutual labels: chinese, bert

classifier multi label

multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification

Stars: ✭ 127 (-98.09%)

Mutual labels: text-classification, bert

TV4Dialog

No description or website provided.

Stars: ✭ 33 (-99.5%)

Mutual labels: corpus, chinese

trove

Weakly supervised medical named entity classification

Stars: ✭ 55 (-99.17%)

Mutual labels: text-classification, bert

classifier multi label seq2seq attention

multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification, seq2seq, attention, beam search

Stars: ✭ 26 (-99.61%)

Mutual labels: text-classification, bert

bert-movie-reviews-sentiment-classifier

Build a Movie Reviews Sentiment Classifier with Google's BERT Language Model

Stars: ✭ 12 (-99.82%)

Mutual labels: language-model, bert

COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers

Rank 1 / 216

Stars: ✭ 24 (-99.64%)

Mutual labels: text-classification, bert

Text Cnn

CNN-based Chinese text classification with embedded Word2vec word vectors.

Stars: ✭ 298 (-95.52%)

Mutual labels: text-classification, word2vec

Nlu sim

All kinds of baseline models for sentence similarity (semantic similarity models for sentence pairs).

Stars: ✭ 286 (-95.7%)

Mutual labels: question-answering, word2vec

Albert zh

A Lite BERT for Self-supervised Learning of Language Representations; a large collection of Chinese pre-trained ALBERT models.

Stars: ✭ 3,500 (-47.42%)

Mutual labels: bert, chinese-corpus

Cluener2020

CLUENER2020: Chinese fine-grained named entity recognition.

Stars: ✭ 689 (-89.65%)

Mutual labels: chinese, dataset

SQUAD2.Q-Augmented-Dataset

Augmented version of SQUAD 2.0 for Questions

Stars: ✭ 31 (-99.53%)

Mutual labels: question-answering, bert

CLUEmotionAnalysis2020

CLUE Emotion Analysis Dataset: a fine-grained sentiment analysis dataset.

Stars: ✭ 3 (-99.95%)

Mutual labels: corpus, chinese

bert tokenization for java

This is a Java version of the Chinese tokenization described in BERT.

Stars: ✭ 39 (-99.41%)

Mutual labels: chinese-nlp, bert

ganbert-pytorch

Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace

Stars: ✭ 60 (-99.1%)

Mutual labels: text-classification, bert

TorchBlocks

A PyTorch-based toolkit for natural language processing

Stars: ✭ 85 (-98.72%)

Mutual labels: text-classification, bert

textgo

Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!

Stars: ✭ 33 (-99.5%)

Mutual labels: text-classification, bert

Giveme5W

Extraction of the five journalistic W-questions (5W) from news articles

Stars: ✭ 16 (-99.76%)

Mutual labels: news, question-answering

feedIO

A Feed Aggregator that Knows What You Want to Read.

Stars: ✭ 26 (-99.61%)

Mutual labels: news, text-classification

cdQA-ui

⛔ [NOT MAINTAINED] A web interface for cdQA and other question answering systems.

Stars: ✭ 19 (-99.71%)

Mutual labels: question-answering, bert

Zhparser

zhparser is a PostgreSQL extension for full-text search of Chinese language

Stars: ✭ 418 (-93.72%)

Mutual labels: chinese, chinese-nlp

kwx

BERT, LDA, and TFIDF based keyword extraction in Python

Stars: ✭ 33 (-99.5%)

Mutual labels: text-classification, bert

Species-Names-Corpus

Species names corpus. Plant names and animal names.

Stars: ✭ 23 (-99.65%)

Mutual labels: corpus, dataset

iamQA

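A Chinese wiki encyclopedia QA reading-comprehension question answering system. It uses an NER model built on CCKS2016 data, a reading comprehension model from CMRC2018, and W2V word-vector search, and is deployed with torchserve.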

Stars: ✭ 46 (-99.31%)

Mutual labels: question-answering, bert

Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

Stars: ✭ 151 (-97.73%)

Mutual labels: text-classification, bert

AskNowNQS

A question answering system for RDF knowledge graphs.

Stars: ✭ 32 (-99.52%)

Mutual labels: word2vec, question-answering

Medical-Names-Corpus

Medical corpus. Corpus of medical institution names. Drug standard codes.

Stars: ✭ 26 (-99.61%)

Mutual labels: corpus, dataset

Bertweet

BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)

Stars: ✭ 282 (-95.76%)

Mutual labels: text-classification, language-model

FewCLUE

FewCLUE: a few-shot learning evaluation benchmark, Chinese edition.

Stars: ✭ 251 (-96.23%)

Mutual labels: chinese, bert

Tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Stars: ✭ 5,077 (-23.72%)

Mutual labels: language-model, bert

Cluecorpus2020

Large-scale Pre-training Corpus for Chinese (100G of Chinese pre-training text).

Stars: ✭ 278 (-95.82%)

Mutual labels: chinese, corpus

Giveme5w1h

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?

Stars: ✭ 316 (-95.25%)

Mutual labels: news, question-answering

Chinese-Word-Segmentation-in-NLP

State of the art Chinese Word Segmentation with Bi-LSTMs

Stars: ✭ 23 (-99.65%)

Mutual labels: chinese, language-model

Nlp Projects

word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding

Stars: ✭ 360 (-94.59%)

Mutual labels: text-classification, word2vec

Bert Pytorch

Google AI 2018 BERT pytorch implementation

Stars: ✭ 4,642 (-30.26%)

Mutual labels: language-model, bert

Small Chinese Corpus

Some useful Chinese corpus datasets (small Chinese corpus data).

Stars: ✭ 462 (-93.06%)

Mutual labels: corpus, chinese-nlp

Dynamic Memory Networks Plus Pytorch

Implementation of Dynamic memory networks plus in Pytorch