Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings and more.

Stars: ✭ 790 (+4837.5%)

Mutual labels: text-mining

Php Ml

PHP-ML - Machine Learning library for PHP

Stars: ✭ 7,900 (+49275%)

Mutual labels: data-mining

readability

Fast readability scores for text data

Stars: ✭ 22 (+37.5%)

Mutual labels: text-mining

Ai For Security Learning

A curated collection of learning materials on security scenarios, AI-based security algorithms, and security data analysis

Stars: ✭ 986 (+6062.5%)

Mutual labels: data-mining

Bigartm

Fast topic modeling platform

Stars: ✭ 563 (+3418.75%)

Mutual labels: text-mining

Mldm

Lecture course "Machine Learning and Data Mining" at the Faculty of Computational Mathematics and Cybernetics (CMC) of Lomonosov Moscow State University

Stars: ✭ 35 (+118.75%)

Mutual labels: data-mining

scikit-cycling

Tools to analyze cycling data

Stars: ✭ 25 (+56.25%)

Mutual labels: data-mining

Invoice2data

Extract structured data from PDF invoices

Stars: ✭ 943 (+5793.75%)

Mutual labels: data-mining

Listed Company News Crawl And Text Analysis

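Crawls historical news text for listed companies (individual stocks) from Sina Finance, National Business Daily, JRJ.com, China Securities Network, and Securities Times; runs text analysis and extracts feature sets; trains classifiers such as SVM and random forest; and finally runs classification predictions on newly crawled news data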

Stars: ✭ 494 (+2987.5%)

Mutual labels: text-mining

Clevercsv

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

Stars: ✭ 887 (+5443.75%)

Mutual labels: data-mining

RecommendationEngine

Source code and dataset for paper "CBMR: An optimized MapReduce for item‐based collaborative filtering recommendation algorithm with empirical analysis"

Stars: ✭ 43 (+168.75%)

Mutual labels: hadoop

Data mining

The Ruby DataMining gem is a small collection of data mining algorithms

Stars: ✭ 10 (-37.5%)

Mutual labels: data-mining

Awesome Sentiment Analysis

Repository with everything necessary for sentiment analysis and related areas

Stars: ✭ 459 (+2768.75%)

Mutual labels: text-mining

SEDTWik-Event-Detection-from-Tweets

Segmentation-based event detection from tweets. Published at NAACL SRW 2019

Stars: ✭ 58 (+262.5%)

Mutual labels: text-mining

webhdfs

Node.js WebHDFS REST API client

Stars: ✭ 88 (+450%)

Mutual labels: hadoop

Awesome Fraud Detection Papers

A curated list of data mining papers about fraud detection.

Stars: ✭ 843 (+5168.75%)

Mutual labels: data-mining

Open Semantic Search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

Stars: ✭ 386 (+2312.5%)

Mutual labels: text-mining

Social-Network-Analysis-in-Python

Social Network Facebook Analysis (Python, Networkx)

Stars: ✭ 26 (+62.5%)

Mutual labels: big-data

dh-core

Functional data science

Stars: ✭ 123 (+668.75%)

Mutual labels: data-mining

mod harbour

Apache mod for Harbour

Stars: ✭ 40 (+150%)

Mutual labels: iis

Texthero

Text preprocessing, representation and visualization from zero to hero.

Stars: ✭ 2,407 (+14943.75%)

Mutual labels: text-mining

odinson

Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.

Stars: ✭ 59 (+268.75%)

Mutual labels: text-mining

Biolitmap

Code for the paper "BIOLITMAP: a web-based geolocated and temporal visualization of the evolution of bioinformatics publications" in Oxford Bioinformatics.

Stars: ✭ 18 (+12.5%)

Mutual labels: data-mining

KaliIntelligenceSuite

Kali Intelligence Suite (KIS) shall aid in the fast, autonomous, central, and comprehensive collection of intelligence by executing standard penetration testing tools. The collected data is internally stored in a structured manner to allow the fast identification and visualisation of the collected information.

Stars: ✭ 58 (+262.5%)

Mutual labels: data-mining

Dataproofer

A proofreader for your data

Stars: ✭ 628 (+3825%)

Mutual labels: data-mining

Graphbrain

Language, Knowledge, Cognition

Stars: ✭ 294 (+1737.5%)

Mutual labels: text-mining

Elki

ELKI Data Mining Toolkit

Stars: ✭ 613 (+3731.25%)

Mutual labels: data-mining

ambari-hdp-docker

Dockerfiles and Docker Compose for HDP 2.6 with Blueprints

Stars: ✭ 23 (+43.75%)

Mutual labels: hadoop

Data Science With Ruby

Practical Data Science with Ruby based tools.

Stars: ✭ 549 (+3331.25%)

Mutual labels: data-mining

Interpretable machine learning with python

Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.

Stars: ✭ 530 (+3212.5%)

Mutual labels: data-mining

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (+106.25%)

Mutual labels: big-data

Twitter-Sentiment-Analyzer

Twitter Sentiment Analyzer

Stars: ✭ 13 (-18.75%)

Mutual labels: text-mining

Nlp profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

Stars: ✭ 181 (+1031.25%)

Mutual labels: text-mining

Udacity-Data-Analyst-Nanodegree

Repository for the projects needed to complete the Data Analyst Nanodegree.

Stars: ✭ 31 (+93.75%)

Mutual labels: text-mining

deduce

Deduce: de-identification method for Dutch medical text

Stars: ✭ 40 (+150%)

Mutual labels: text-mining

text-mined-synthesis public

Code for the text-mined solid-state reactions dataset