A simple Java framework designed to make developing Spring-based Java programs easy. Supports big data and metadata management, a common Elasticsearch query tool, and so on.

Stars: ✭ 16 (+0%)

Mutual labels: hadoop

javaer-mind

Mind maps for Java programmers' advanced learning

Stars: ✭ 66 (+312.5%)

Mutual labels: big-data

twitter-archive-reader

Full-featured TypeScript Twitter archive reader and browser

Stars: ✭ 43 (+168.75%)

Mutual labels: big-data

PaperWeeklyAI

📚「@MaiweiAI」Studying papers in the fields of computer vision, NLP, and machine learning algorithms every week.

Stars: ✭ 50 (+212.5%)

Mutual labels: data-mining

estratto

Parsing the content of fixed-width files made easy

Stars: ✭ 12 (-25%)

Mutual labels: text-mining

hierarchical-clustering

A Python implementation of divisive and hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset and dendrograms were plotted.

Stars: ✭ 62 (+287.5%)

Mutual labels: data-mining

metriql

The metrics layer for your data. Join us at https://metriql.com/slack

Stars: ✭ 227 (+1318.75%)

Mutual labels: big-data

beanszoo

Distributed Java micro-services using ZooKeeper

Stars: ✭ 12 (-25%)

Mutual labels: hadoop

orion

Management and automation platform for Stateful Distributed Systems

Stars: ✭ 77 (+381.25%)

Mutual labels: hadoop

bigdata-doc

Big data study notes, learning roadmap, and a collection of technical case studies.

Stars: ✭ 37 (+131.25%)

Mutual labels: hadoop

bullet-core

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Stars: ✭ 36 (+125%)

Mutual labels: big-data

Answerable

Recommendation system for Stack Overflow unanswered questions

Stars: ✭ 13 (-18.75%)

Mutual labels: text-mining

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (+393.75%)

Mutual labels: hadoop

openPDC

Open Source Phasor Data Concentrator

Stars: ✭ 109 (+581.25%)

Mutual labels: hadoop

Data-Mining-on-Social-Media

Python scripts to extract tweets and Facebook posts from public users.

Stars: ✭ 99 (+518.75%)

Mutual labels: data-mining

hadoop-ansible

Install a Hadoop cluster with Ansible

Stars: ✭ 35 (+118.75%)

Mutual labels: hadoop

leetspeek

Open and collaborative content from leet hackers!

Stars: ✭ 11 (-31.25%)

Mutual labels: big-data

Asclepius

Open Price Comparison for US Hospitals

Stars: ✭ 20 (+25%)

Mutual labels: data-mining

accumulo-testing

Apache Accumulo Testing

Stars: ✭ 14 (-12.5%)

Mutual labels: big-data

sciblox

sciblox - Easier Data Science and Machine Learning

Stars: ✭ 48 (+200%)

Mutual labels: data-mining

hive to es

A small tool for syncing Hive data warehouse data to Elasticsearch

Stars: ✭ 21 (+31.25%)

Mutual labels: hadoop

woolly

The Text Mining Elixir

Stars: ✭ 48 (+200%)

Mutual labels: text-mining

dislib

A distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (+143.75%)

Mutual labels: big-data

incubator-tez

Mirror of Apache Tez (Incubating)

Stars: ✭ 60 (+275%)

Mutual labels: big-data

imbalanced-ensemble

Class-imbalanced / long-tailed ensemble learning in Python. A modular, flexible, and extensible machine learning library.

Stars: ✭ 199 (+1143.75%)

Mutual labels: data-mining

TextClassification

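Text classification of Sina News articles implemented with scikit-learn. The dataset contains 1,000,000 documents across 10 categories, split 1:1 between test and training sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.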

Stars: ✭ 86 (+437.5%)

Mutual labels: data-mining

sugarcube

Monoidal data processes.

Stars: ✭ 32 (+100%)

Mutual labels: data-mining

readability

Fast readability scores for text data

Stars: ✭ 22 (+37.5%)

Mutual labels: text-mining

scikit-cycling

Tools to analyze cycling data

Stars: ✭ 25 (+56.25%)

Mutual labels: data-mining

awesome-coder-resources

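A refueling station on your programming journey! Continuously updated; stars and return visits are welcome. Contents: programming, learning, and reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more.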

Stars: ✭ 54 (+237.5%)

Mutual labels: big-data

Semantic-Bus

Object flow processing and data transformation

Stars: ✭ 49 (+206.25%)

Mutual labels: data-mining

RecommendationEngine

Source code and dataset for paper "CBMR: An optimized MapReduce for item‐based collaborative filtering recommendation algorithm with empirical analysis"

Stars: ✭ 43 (+168.75%)

Mutual labels: hadoop

webhdfs

Node.js WebHDFS REST API client

Stars: ✭ 88 (+450%)

Mutual labels: hadoop

bagri

XML/Document DB on top of distributed cache

Stars: ✭ 40 (+150%)

Mutual labels: big-data

Social-Network-Analysis-in-Python

Facebook social network analysis (Python, NetworkX)

Stars: ✭ 26 (+62.5%)

Mutual labels: big-data

KaliIntelligenceSuite

Kali Intelligence Suite (KIS) shall aid in the fast, autonomous, central, and comprehensive collection of intelligence by executing standard penetration testing tools. The collected data is internally stored in a structured manner to allow the fast identification and visualisation of the collected information.

Stars: ✭ 58 (+262.5%)

Mutual labels: data-mining

awesome-Python-data-science-books

Probably the best curated list of data science books in Python

Stars: ✭ 331 (+1968.75%)

Mutual labels: data-mining

dpkb

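A compilation of big data topics, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse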

Stars: ✭ 123 (+668.75%)

Mutual labels: hadoop

scikit-hubness

A Python package for hubness analysis and high-dimensional data mining