Text mining resources
Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+37.69%)
Pyss3
A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Stars: ✭ 191 (-26.54%)
Artificial Adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Stars: ✭ 348 (+33.85%)
Rmdl
RMDL: Random Multimodel Deep Learning for Classification
Stars: ✭ 375 (+44.23%)
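TextClassification
Text classification of Sina News articles implemented with scikit-learn. The dataset contains 1 million documents in 10 classes, split 1:1 into test and training sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.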
Stars: ✭ 86 (-66.92%)
algorithms
Basic algorithms and solutions
Stars: ✭ 22 (-91.54%)
anomalyDetection
An R package for implementing augmented network log anomaly detection procedures
Stars: ✭ 21 (-91.92%)
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-94.23%)
node-fasttext
Node.js binding for fastText representation and classification.
Stars: ✭ 39 (-85%)
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-91.54%)
DaDengAndHisPython
[WeChat official account: 大邓和他的python (Da Deng and His Python)] Python syntax quick start: https://www.bilibili.com/video/av44384851 ; Python web crawler quick start: https://www.bilibili.com/video/av72010301 ; my contact email: [email protected]
Stars: ✭ 59 (-77.31%)
kwx
BERT-, LDA-, and TF-IDF-based keyword extraction in Python
Stars: ✭ 33 (-87.31%)
text-classification-svm
The missing SVM-based text classification module implementing HanLP's interface
Stars: ✭ 46 (-82.31%)
HiLAP
Code for the paper "Hierarchical Text Classification with Reinforced Label Assignment" (EMNLP 2019)
Stars: ✭ 116 (-55.38%)
BTM-Java
A Java implementation of the Biterm Topic Model
Stars: ✭ 18 (-93.08%)
DeepClassifier
DeepClassifier aims to be a general-purpose library of text classification models. It makes building any text classification task easy and user-friendly.
Stars: ✭ 25 (-90.38%)
TextUnderstandingTsetlinMachine
Using the Tsetlin Machine to learn human-interpretable rules for high-accuracy text categorization with medical applications
Stars: ✭ 48 (-81.54%)
monkeylearn-java
Official Java client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Java apps.
Stars: ✭ 23 (-91.15%)
spmf-py
Python SPMF Wrapper 🐍 🎁
Stars: ✭ 35 (-86.54%)
NIDS-Intrusion-Detection
Simple implementation of a Network Intrusion Detection System using the KDD Cup '99 data set: kdd_cup_10_percent is used for training and the corrected set for testing. PCA is used for dimensionality reduction, and SVM and KNN are the supervised classification algorithms. Accuracy: 83.5% for SVM, 80% for KNN.
Stars: ✭ 45 (-82.69%)
NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
Stars: ✭ 166 (-36.15%)
diabetes use case
Sample use case for the Xavier AI in Healthcare conference: https://www.xavierhealth.org/ai-summit-day2/
Stars: ✭ 22 (-91.54%)
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (-67.31%)
augmenty
Augmenty is an augmentation library based on spaCy for augmenting texts.
Stars: ✭ 101 (-61.15%)
twitter-analytics-wrapper
A simple Python wrapper to download tweet data from the Twitter Analytics platform. Particularly useful for the impression metrics that are unavailable in the current Twitter API. Also works for video data.
Stars: ✭ 44 (-83.08%)
imgur-scraper
Retrieve years of imgur.com's data without any authentication.
Stars: ✭ 26 (-90%)
Data-mining-python-script
Contains various scripts for web crawling and data mining of the social web (RSS, Facebook, Twitter, LinkedIn)
Stars: ✭ 24 (-90.77%)
genieclust
Genie++: Fast and Robust Hierarchical Clustering with Noise Point Detection, for Python and R
Stars: ✭ 34 (-86.92%)
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-91.54%)
Lbl2Vec
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.
Stars: ✭ 25 (-90.38%)
jds
Jenesis Data Store: a dynamic, cross-platform, high-performance ORM data mapper, designed to assist in rapid development and data mining
Stars: ✭ 17 (-93.46%)
crowdsource-video-experiments-on-android
Crowdsourcing video experiments (such as collaborative benchmarking and optimization of DNN algorithms) using Collective Knowledge Framework across diverse Android devices provided by volunteers. Results are continuously aggregated in the open repository:
Stars: ✭ 29 (-88.85%)
synaptic-simple-trainer
A ready-to-go text classification trainer based on synaptic (https://github.com/cazala/synaptic)
Stars: ✭ 19 (-92.69%)
support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (-45.38%)
fake-news-detection
This repo is a collection of AWESOME things about fake news detection, including papers, code, etc.
Stars: ✭ 34 (-86.92%)
HiGitClass
HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories (ICDM'19)
Stars: ✭ 58 (-77.69%)
evine
Interactive CLI Web Crawler
Stars: ✭ 140 (-46.15%)
ebe-dataset
Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
Stars: ✭ 16 (-93.85%)
kasthack.osp
A generator of raw dumps of VK users.
Stars: ✭ 15 (-94.23%)
NewsMTSC
Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.
Stars: ✭ 54 (-79.23%)
ML2017FALL
Machine Learning (EE 5184) at NTU
Stars: ✭ 66 (-74.62%)
act
Computational synthetic biology: Predicting DNA edits for bioengineering
Stars: ✭ 67 (-74.23%)
genie
Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)
Stars: ✭ 21 (-91.92%)
Caver
Caver: a toolkit for multilabel text classification.
Stars: ✭ 38 (-85.38%)
SHAP FOLD
(Explainable AI) Learning Non-Monotonic Logic Programs From Statistical Models Using High-Utility Itemset Mining
Stars: ✭ 35 (-86.54%)