KashgariKashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Stars: ✭ 2,235 (+3963.64%)
Delfta Deep Learning Framework for Text
Stars: ✭ 289 (+425.45%)
textgoText preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
Stars: ✭ 33 (-40%)
Kevinpro-NLP-demoAll NLP you Need Here. 个人实现了一些好玩的NLP demo,目前包含13个NLP应用的pytorch实现
Stars: ✭ 117 (+112.73%)
MetaCatMinimally Supervised Categorization of Text with Metadata (SIGIR'20)
Stars: ✭ 52 (-5.45%)
policy-data-analyzerBuilding a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-60%)
kwxBERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (-40%)
MacadamMacadam是一个以Tensorflow(Keras)和bert4keras为基础,专注于文本分类、序列标注和关系抽取的自然语言处理工具包。支持RANDOM、WORD2VEC、FASTTEXT、BERT、ALBERT、ROBERTA、NEZHA、XLNET、ELECTRA、GPT-2等EMBEDDING嵌入; 支持FineTune、FastText、TextCNN、CharCNN、BiRNN、RCNN、DCNN、CRNN、DeepMoji、SelfAttention、HAN、Capsule等文本分类算法; 支持CRF、Bi-LSTM-CRF、CNN-LSTM、DGCNN、Bi-LSTM-LAN、Lattice-LSTM-Batch、MRC等序列标注算法。
Stars: ✭ 149 (+170.91%)
ASTRASelf-training with Weak Supervision (NAACL 2021)
Stars: ✭ 127 (+130.91%)
WeFEND-AAAI20Dataset for paper "Weak Supervision for Fake News Detection via Reinforcement Learning" published in AAAI'2020.
Stars: ✭ 67 (+21.82%)
ganbert-pytorchEnhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace
Stars: ✭ 60 (+9.09%)
CleanlabThe standard package for machine learning with noisy labels, finding mislabeled data, and uncertainty quantification. Works with most datasets and models.
Stars: ✭ 2,526 (+4492.73%)
weaselWeakly Supervised End-to-End Learning (NeurIPS 2021)
Stars: ✭ 117 (+112.73%)
text2classMulti-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-72.73%)
BertweetBERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Stars: ✭ 282 (+412.73%)
TorchBlocksA PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (+54.55%)
Nlp Experiments In PytorchPyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (-36.36%)
Dan Jurafsky Chris Manning NlpMy solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+125.45%)
anonymisationAnonymization of legal cases (Fr) based on Flair embeddings
Stars: ✭ 85 (+54.55%)
Nlp pytorch projectEmbedding, NMT, Text_Classification, Text_Generation, NER etc.
Stars: ✭ 153 (+178.18%)
concept-based-xaiLibrary implementing state-of-the-art Concept-based and Disentanglement Learning methods for Explainable AI
Stars: ✭ 41 (-25.45%)
WeSHClass[AAAI 2019] Weakly-Supervised Hierarchical Text Classification
Stars: ✭ 83 (+50.91%)
Learning-From-RulesImplementation of experiments in paper "Learning from Rules Generalizing Labeled Exemplars" to appear in ICLR2020 (https://openreview.net/forum?id=SkeuexBtDr)
Stars: ✭ 46 (-16.36%)
knodleA PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently develop and compare their own methods.
Stars: ✭ 76 (+38.18%)
Bert Bilstm Crf NerTensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services
Stars: ✭ 3,838 (+6878.18%)
classifier multi labelmulti-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification
Stars: ✭ 127 (+130.91%)
keras-bert-nerKeras solution of Chinese NER task using BiLSTM-CRF/BiGRU-CRF/IDCNN-CRF model with Pretrained Language Model: supporting BERT/RoBERTa/ALBERT
Stars: ✭ 7 (-87.27%)
Pytorch-NLUPytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech ta…
Stars: ✭ 151 (+174.55%)
WSDM-Cup-2019[ACM-WSDM] 3rd place solution at WSDM Cup 2019, Fake News Classification on Kaggle.
Stars: ✭ 62 (+12.73%)
NSP-BERTThe code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
Stars: ✭ 166 (+201.82%)
Filipino-Text-BenchmarksOpen-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-60%)
HiGitClassHiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories (ICDM'19)
Stars: ✭ 58 (+5.45%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+4478.18%)
WeSTClass[CIKM 2018] Weakly-Supervised Neural Text Classification
Stars: ✭ 67 (+21.82%)
Bert seq2seqpytorch实现bert做seq2seq任务,使用unilm方案,现在也可以做自动摘要,文本分类,情感分析,NER,词性标注等任务,支持GPT2进行文章续写。
Stars: ✭ 298 (+441.82%)
parsbert-ner🤗 ParsBERT Persian NER Tasks
Stars: ✭ 15 (-72.73%)
Chatbot cn基于金融-司法领域(兼有闲聊性质)的聊天机器人,其中的主要模块有信息抽取、NLU、NLG、知识图谱等,并且利用Django整合了前端展示,目前已经封装了nlp和kg的restful接口
Stars: ✭ 791 (+1338.18%)
Nlp chinese corpus大规模中文自然语言处理语料 Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+12001.82%)
Marktool这是一款基于web的通用文本标注工具,支持大规模实体标注、关系标注、事件标注、文本分类、基于字典匹配和正则匹配的自动标注以及用于实现归一化的标准名标注,同时也支持文本的迭代标注和实体的嵌套标注。标注规范可自定义且同类型任务中可“一次创建多次复用”。通过分级实体集合扩大了实体类型的规模,并设计了全新高效的标注方式,提升了用户体验和标注效率。此外,本工具增加了审核环节,可对多人的标注结果进行一致性检验和调整,提高了标注语料的准确率和可靠性。
Stars: ✭ 190 (+245.45%)
backpropBackprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+316.36%)
wrenchWRENCH: Weak supeRvision bENCHmark
Stars: ✭ 185 (+236.36%)
Spacy Streamlit👑 spaCy building blocks and visualizers for Streamlit apps
Stars: ✭ 360 (+554.55%)
Snips NluSnips Python library to extract meaning from text
Stars: ✭ 3,583 (+6414.55%)
neuro-comma🇷🇺 Punctuation restoration production-ready model for Russian language 🇷🇺
Stars: ✭ 46 (-16.36%)
WSDECWeakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.
Stars: ✭ 95 (+72.73%)