Nlp chinese corpus大规模中文自然语言处理语料 Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+9688.24%)
Lightnlp基于Pytorch和torchtext的自然语言处理深度学习框架。
Stars: ✭ 739 (+986.76%)
BertweetBERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Stars: ✭ 282 (+314.71%)
Bert language understandingPre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN
Stars: ✭ 933 (+1272.06%)
ShallowlearnAn experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (+188.24%)
ulm-basenetImplementation of ULMFit algorithm for text classification via transfer learning
Stars: ✭ 94 (+38.24%)
backpropBackprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+236.76%)
Product-Categorization-NLPMulti-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-55.88%)
Text Cnn嵌入Word2vec词向量的CNN中文文本分类
Stars: ✭ 298 (+338.24%)
Ml ProjectsML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python
Stars: ✭ 127 (+86.76%)
Nlp In PracticeStarter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+1061.76%)
Few Shot Text ClassificationFew-shot binary text classification with Induction Networks and Word2Vec weights initialization
Stars: ✭ 32 (-52.94%)
Spacy💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+32220.59%)
Lotclass[EMNLP 2020] Text Classification Using Label Names Only: A Language Model Self-Training Approach
Stars: ✭ 160 (+135.29%)
Nlp learning结合python一起学习自然语言处理 (nlp): 语言模型、HMM、PCFG、Word2vec、完形填空式阅读理解任务、朴素贝叶斯分类器、TFIDF、PCA、SVD
Stars: ✭ 188 (+176.47%)
FNet-pytorchUnofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+200%)
Nlp Projectsword2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding
Stars: ✭ 360 (+429.41%)
CukatifyCukatify is a music social media project
Stars: ✭ 21 (-69.12%)
Icdar 2019 SroieICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction
Stars: ✭ 202 (+197.06%)
PLBARTOfficial code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
Stars: ✭ 151 (+122.06%)
Nlp classificationImplementing nlp papers relevant to classification with PyTorch, gluonnlp
Stars: ✭ 202 (+197.06%)
JfasttextJava interface for fastText
Stars: ✭ 193 (+183.82%)
Ai lawall kinds of baseline models for long text classificaiton( text categorization)
Stars: ✭ 243 (+257.35%)
Marktool这是一款基于web的通用文本标注工具,支持大规模实体标注、关系标注、事件标注、文本分类、基于字典匹配和正则匹配的自动标注以及用于实现归一化的标准名标注,同时也支持文本的迭代标注和实体的嵌套标注。标注规范可自定义且同类型任务中可“一次创建多次复用”。通过分级实体集合扩大了实体类型的规模,并设计了全新高效的标注方式,提升了用户体验和标注效率。此外,本工具增加了审核环节,可对多人的标注结果进行一致性检验和调整,提高了标注语料的准确率和可靠性。
Stars: ✭ 190 (+179.41%)
Pyss3A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Stars: ✭ 191 (+180.88%)
Text ClassificationMachine Learning and NLP: Text Classification using python, scikit-learn and NLTK
Stars: ✭ 239 (+251.47%)
SimpletransformersTransformers for Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
Stars: ✭ 2,881 (+4136.76%)
HdltexHDLTex: Hierarchical Deep Learning for Text Classification
Stars: ✭ 191 (+180.88%)
theedhum-nandrumA sentiment classifier on mixed language (and mixed script) reviews in Tamil, Malayalam and English
Stars: ✭ 16 (-76.47%)
KashgariKashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Stars: ✭ 2,235 (+3186.76%)
FastnlpfastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Stars: ✭ 2,441 (+3489.71%)
tkseemArabic Tokenization Library. It provides many tokenization algorithms.
Stars: ✭ 45 (-33.82%)
TextvecText vectorization tool to outperform TFIDF for classification tasks
Stars: ✭ 167 (+145.59%)
Fancy NlpNLP for human. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.
Stars: ✭ 233 (+242.65%)
TextanalyzerA text analyzer which is based on machine learning,statistics and dictionaries that can analyze text. So far, it supports hot word extracting, text classification, part of speech tagging, named entity recognition, chinese word segment, extracting address, synonym, text clustering, word2vec model, edit distance, chinese word segment, sentence similarity,word sentiment tendency, name recognition, idiom recognition, placename recognition, organization recognition, traditional chinese recognition, pinyin transform.
Stars: ✭ 162 (+138.24%)
clustextEasy, fast clustering of texts
Stars: ✭ 18 (-73.53%)
Pytorch Transformers ClassificationBased on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Stars: ✭ 229 (+236.76%)