tf-idf-python: Term frequency–inverse document frequency for Chinese novels/documents, implemented in Python.
Stars: ✭ 98 (-63.43%)
Nlp In Practice: Starter code for solving real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.
Stars: ✭ 790 (+194.78%)
lda2vec: Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec, from this paper: https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-89.93%)
blueprints-text: Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"
Stars: ✭ 103 (-61.57%)
elpresidente: 🇺🇸 Search and extract corpus elements from 'The American Presidency Project'
Stars: ✭ 21 (-92.16%)
lucilla: Fast, efficient, in-memory full-text search for Kotlin
Stars: ✭ 102 (-61.94%)
gofastr: Make a DocumentTermMatrix faster
Stars: ✭ 19 (-92.91%)
snorkeling: Extracting biomedical relationships from literature with Snorkel 🏊
Stars: ✭ 56 (-79.1%)
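DaDengAndHisPython: WeChat official account 大邓和他的python (Da Deng and His Python). Quick introduction to Python syntax: https://www.bilibili.com/video/av44384851. Quick introduction to Python web scraping: https://www.bilibili.com/video/av72010301. Contact email: [email protected]
Stars: ✭ 59 (-77.99%)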
weibo-summary: Chinese Microblog (Weibo) Automatic Summary System
Stars: ✭ 28 (-89.55%)
sensim: Sentence Similarity Estimator (SenSim)
Stars: ✭ 15 (-94.4%)
support-tickets-classification: This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud to classify support tickets automatically. The project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava (http://endava.com/en).
Stars: ✭ 142 (-47.01%)
iresearch: IResearch is a cross-platform, high-performance, document-oriented search engine library written entirely in C++, with a focus on pluggability of different ranking/similarity models.
Stars: ✭ 121 (-54.85%)
tg crawler: Just a crawler for Telegram based on tg-cli. Now deprecated; please use telegram-export.
Stars: ✭ 71 (-73.51%)
ipo-miner: IPO investment via text mining.
Stars: ✭ 20 (-92.54%)
HumanOrRobot: A solution for the Kaggle competition `Human or Robot`
Stars: ✭ 16 (-94.03%)
sacred: 📖 Sacred texts in R
Stars: ✭ 19 (-92.91%)
Guten-gutter: Strips boilerplate from Project Gutenberg text files
Stars: ✭ 16 (-94.03%)
NewsSearch: Mainly uses Python and the Scrapy framework to crawl news websites
Stars: ✭ 23 (-91.42%)
fb scraper: FBLYZE is a Facebook scraping and analysis system.
Stars: ✭ 61 (-77.24%)
restaurant-finder-featureReviews: Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying the market-basket model, i.e. the Apriori algorithm, and NLP to user reviews).
Stars: ✭ 21 (-92.16%)
ruimtehol: R package to Embed All the Things! using StarSpace
Stars: ✭ 95 (-64.55%)
TextDatasetCleaner: 🔬 Cleaning junk out of datasets (normalization, preprocessing)
Stars: ✭ 27 (-89.93%)
named-entity-recognition: Notebooks for teaching Named Entity Recognition at the Cultural Heritage Data School, run by Cambridge Digital Humanities
Stars: ✭ 18 (-93.28%)
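eventextraction: Chinese compound event extraction. Recognizes patterns in text, including conditional events, sequential events, and reversal events, and can be used to analyze the logical structure of a text.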
Stars: ✭ 17 (-93.66%)
textstem: Tools for fast text stemming & lemmatization
Stars: ✭ 36 (-86.57%)
Diabetic-Retinopathy-Detection: Diagnosis of diabetic retinopathy from fundus images using SVM, KNN, and attention-based CNN models, with Grad-CAM scores for interpretability
Stars: ✭ 31 (-88.43%)
textdigester: TextDigester, a document summarization Java library
Stars: ✭ 23 (-91.42%)
AI-Project: Stock predictor using machine learning
Stars: ✭ 22 (-91.79%)
datahub: DataHub, a synthetic data library
Stars: ✭ 66 (-75.37%)
lorca: Natural Language Processing for Spanish in Node.js. Stemmer, sentiment analysis, readability, tf-idf with batteries, concordance and more!
Stars: ✭ 95 (-64.55%)
Text-Analysis: Explains textual analysis tools in Python, including preprocessing, skip-gram (word2vec), and topic modelling.
Stars: ✭ 48 (-82.09%)
SparseLSH: A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.
Stars: ✭ 127 (-52.61%)
watchman: Watchman, an open-source social-media event-detection system
Stars: ✭ 18 (-93.28%)
Attentionwalk: A PyTorch implementation of "Watch Your Step: Learning Node Embeddings via Graph Attention" (NeurIPS 2018).
Stars: ✭ 266 (-0.75%)
occupationcoder: Given a job title and job description, the algorithm assigns a standard occupational classification (SOC) code to the job.
Stars: ✭ 30 (-88.81%)
skippa: SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (-87.69%)
codeflare: Simplifying the definition, execution, scaling, and deployment of pipelines on the cloud.
Stars: ✭ 163 (-39.18%)
scikit-learn: scikit-learn in Persian, for contributors
Stars: ✭ 19 (-92.91%)
sklearn-audio-classification: An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and MLP.
Stars: ✭ 31 (-88.43%)
Kaio-machine-learning-human-face-detection: Machine learning project and case study focused on interaction with digital characters. It uses a character called "Kaio" that, based on automatic detection of facial expressions and classification of emotions, interacts with humans by classifying emotions and imitating expressions.
Stars: ✭ 18 (-93.28%)
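MachineLearning: Machine learning tutorial. It covers machine learning with NumPy, sklearn, and TensorFlow, and also covers using Spark and Flink to speed up model training. The aim is to give readers a reasonably complete introduction to machine learning.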
Stars: ✭ 23 (-91.42%)
TwEater: A Python bot for scraping conversations from Twitter
Stars: ✭ 16 (-94.03%)
text2text: Text2Text, a cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (-29.85%)