Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (-96%)

Mutual labels: data-wrangling, data-preparation

wrangler

Wrangler Transform: A DMD system for transforming Big Data

Stars: ✭ 63 (-95.34%)

Mutual labels: data-transformation, data-cleansing

Udacity-Data-Analyst-Nanodegree

Repository for the projects needed to complete the Data Analyst Nanodegree.

Stars: ✭ 31 (-97.71%)

Mutual labels: data-wrangling, data-cleaning

Data Forge Ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Stars: ✭ 967 (-28.42%)

Mutual labels: data-wrangling, data-cleaning

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (-0.96%)

Mutual labels: bigdata, pyspark

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-97.48%)

Mutual labels: bigdata, pyspark

bamboolib binder template

bamboolib - template for creating your own binder notebook

Stars: ✭ 19 (-98.59%)

Mutual labels: data-transformation, data-exploration

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-96.3%)

Mutual labels: bigdata, pyspark

Spark-and-Kafka IoT-Data-Processing-and-Analytics

Final Project for IoT: Big Data Processing and Analytics class. Analyzing U.S. nationwide temperature from IoT sensors in real time

Stars: ✭ 42 (-96.89%)

Mutual labels: bigdata, pyspark

Big Data Study

🐳 big data study

Stars: ✭ 141 (-89.56%)

Mutual labels: bigdata

Node Hbase

Asynchronous HBase client for NodeJs using REST

Stars: ✭ 226 (-83.27%)

Mutual labels: bigdata

Ecommercerecommendsystem

Real-time big data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark

Stars: ✭ 139 (-89.71%)

Mutual labels: bigdata

Tipdm

TipDM modeling platform, an open-source data mining tool.

Stars: ✭ 130 (-90.38%)

Mutual labels: bigdata

bigdatatutorial

Stars: ✭ 34 (-97.48%)

Mutual labels: bigdata

Flink Boot

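The Lazy Squirrel Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem, so developers can build distributed stream-processing programs using the familiar Java web development model. Lazy Squirrel aims to let developers write business code quickly at a lower onboarding cost, without needing to understand distributed computing theory or the details of the Flink framework. To make large projects more agile, the scaffold integrates the Spring framework for bean management by default, along with frameworks commonly used in microservice and web development, such as the MyBatis ORM framework, the Hibernate Validator validation framework, and the Spring Retry framework. See the scaffold features below for details.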

Stars: ✭ 209 (-84.53%)

Mutual labels: bigdata

Fpart

Sort files and pack them into partitions

Stars: ✭ 127 (-90.6%)

Mutual labels: bigdata

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (-90.67%)

Mutual labels: bigdata

Poli

An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.

Stars: ✭ 1,850 (+36.94%)

Mutual labels: bigdata

Tdengine

An open-source big data platform designed and optimized for the Internet of Things (IoT).

Stars: ✭ 17,434 (+1190.45%)

Mutual labels: bigdata

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (-89.64%)

Mutual labels: bigdata

spark-dgraph-connector

A connector for Apache Spark and PySpark to Dgraph databases.

Stars: ✭ 36 (-97.34%)

Mutual labels: pyspark

Twitwork

Monitor twitter stream

Stars: ✭ 133 (-90.16%)

Mutual labels: bigdata

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (-84.09%)

Mutual labels: bigdata

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+27.39%)

Mutual labels: bigdata

popmon

Monitor the stability of a Pandas or Spark dataframe ⚙︎

Stars: ✭ 434 (-67.88%)

Mutual labels: data-profiling

Volcano

A Cloud Native Batch System (Project under CNCF)

Stars: ✭ 2,114 (+56.48%)

Mutual labels: bigdata

Shifu

An end-to-end machine learning and data mining framework on Hadoop

Stars: ✭ 207 (-84.68%)

Mutual labels: bigdata

docker-kaggle-ko

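A Docker image dedicated to machine learning and deep learning (PyTorch, TensorFlow). Adds Korean fonts, a Korean natural language processing package (konlpy), morphological analyzers, timezone settings, and more.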

Stars: ✭ 46 (-96.6%)

Mutual labels: cudf

Liteflow

liteflow is a distributed task-flow scheduling system built on task versions

Stars: ✭ 112 (-91.71%)

Mutual labels: bigdata

Javaorbigdata Interview

A collection of interview knowledge points for Java developers and big data developers

Stars: ✭ 203 (-84.97%)

Mutual labels: bigdata

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (+14.29%)

Mutual labels: bigdata

Lambda Arch

Applying Lambda Architecture with Spark, Kafka, and Cassandra.

Stars: ✭ 111 (-91.78%)

Mutual labels: bigdata

Awesome Learning

Practice source code repository: https://github.com/jast90/bigdata . Search for Jast on WeChat and follow the official account to get the latest technical posts 😯.

Stars: ✭ 197 (-85.42%)

Mutual labels: bigdata

Books

Technical books and more

Stars: ✭ 110 (-91.86%)

Mutual labels: bigdata

Flinkstreamsql

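Built on open-source Flink, extending its real-time SQL; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax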

Stars: ✭ 1,682 (+24.5%)

Mutual labels: bigdata

LDWizard

A generic framework for simplifying the creation of linked data.

Stars: ✭ 17 (-98.74%)

Mutual labels: data-transformation

workflUX

An open-source, cloud-ready web application for simplified deployment of big data workflows.

Stars: ✭ 26 (-98.08%)

Mutual labels: bigdata

Every Single Day I Tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Stars: ✭ 249 (-81.57%)

Mutual labels: bigdata

Kotlin Spark Api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

Stars: ✭ 183 (-86.45%)

Mutual labels: bigdata

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-91.93%)

Mutual labels: bigdata

Daudit

🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!

Stars: ✭ 108 (-92.01%)

Mutual labels: bigdata

Flinkx

Based on Apache Flink. Supports data synchronization/integration and streaming SQL computation.

Stars: ✭ 2,651 (+96.23%)

Mutual labels: bigdata

Awesome Bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

Stars: ✭ 10,478 (+675.57%)

Mutual labels: bigdata

Tennis Crystal Ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction

Stars: ✭ 107 (-92.08%)

Mutual labels: bigdata

Aws Etl Orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

Stars: ✭ 245 (-81.87%)

Mutual labels: bigdata

Bigdata practice

Big data analysis and visualization practice

Stars: ✭ 166 (-87.71%)

Mutual labels: bigdata

Griddb

GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

Stars: ✭ 1,587 (+17.47%)

Mutual labels: bigdata

Flink Notes

Flink study notes

Stars: ✭ 106 (-92.15%)

Mutual labels: bigdata

Java Notes

☕️ Java basics 👫 Object-oriented thinking ✏️ Algorithms 📝 Operating systems ☁️ Networking 💾 Databases 🙊 Spring 💡 System architecture 🐘 Big data

Stars: ✭ 160 (-88.16%)

Mutual labels: bigdata

Sparktutorial

Source code for James Lee's Apache Spark with Java course

Stars: ✭ 105 (-92.23%)

Mutual labels: bigdata

Splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Stars: ✭ 105 (-92.23%)

Mutual labels: bigdata

1-60 of 401 similar projects
