AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and an AWS Lambda function running a Python (Boto3) script to terminate AWS EMR clusters that have been idle for a specified period of time.

Stars: ✭ 21 (-80.19%)

Mutual labels: bigdata

Datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.

Stars: ✭ 347 (+227.36%)

Mutual labels: bigdata

Uproot4

ROOT I/O in pure Python and NumPy.

Stars: ✭ 80 (-24.53%)

Mutual labels: bigdata

Datafaker

Datafaker is a tool for generating large-scale test data and streaming test data. It generates fake data and inserts it into various data sources.

Stars: ✭ 327 (+208.49%)

Mutual labels: bigdata

Spark Streaming Monitoring With Lightning

Plot live stats as graphs from an Apache Spark application using Lightning-viz

Stars: ✭ 15 (-85.85%)

Mutual labels: bigdata

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-18.87%)

Mutual labels: bigdata

Flinkforward201704

Flink Forward 2017-04-10 & 11 slides (PPT)

Stars: ✭ 57 (-46.23%)

Mutual labels: flink

Awesome Flink

😎 A curated list of amazingly awesome Flink and Flink ecosystem resources

Stars: ✭ 530 (+400%)

Mutual labels: flink

df data service

DataFibers Data Service

Stars: ✭ 31 (-70.75%)

Mutual labels: flink

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (+162.26%)

Mutual labels: flink

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+776.42%)

Mutual labels: bigdata

Arvados

An open source platform for managing and analyzing biomedical big data

Stars: ✭ 274 (+158.49%)

Mutual labels: bigdata

Dataspherestudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+1027.36%)

Mutual labels: flink

Larkmidtableweb

A distributed data analysis system based on Flink

Stars: ✭ 259 (+144.34%)

Mutual labels: flink

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+679.25%)

Mutual labels: flink

Docker Spark Cluster

A simple Spark standalone cluster for your testing environment

Stars: ✭ 261 (+146.23%)

Mutual labels: bigdata

Mnemonic

Apache Mnemonic - A non-volatile hybrid memory storage oriented library

Stars: ✭ 91 (-14.15%)

Mutual labels: bigdata

DetEdit

A graphical user interface for annotating and editing events detected in long-term acoustic monitoring data

Stars: ✭ 20 (-81.13%)

Mutual labels: bigdata

flink-tutorials

Flink Tutorial Project

Stars: ✭ 104 (-1.89%)

Mutual labels: flink

Kamu Cli

Next generation tool for decentralized exchange and transformation of semi-structured data

Stars: ✭ 69 (-34.91%)

Mutual labels: flink

flink-parameter-server

Parameter Server implementation in Apache Flink

Stars: ✭ 51 (-51.89%)

Mutual labels: flink

Coding Now

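Study notes, along with eBooks I have read, video resources, and blogs, websites, and tools I have collected and found useful. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.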

Stars: ✭ 750 (+607.55%)

Mutual labels: bigdata

proteic

Streaming and static data visualization for the modern web.

Stars: ✭ 37 (-65.09%)

Mutual labels: bigdata

Splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Stars: ✭ 105 (-0.94%)

Mutual labels: bigdata

mriya

Real-time ETL built with Flink that moves data from MySQL to Greenplum. It uses Canal to parse the MySQL binlog and publish it to Kafka, then uses Flink to consume from Kafka and load the data into Greenplum. More data sources and targets will be added in the future.

Stars: ✭ 65 (-38.68%)

Mutual labels: flink

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+602.83%)

Mutual labels: bigdata

np-flink

Detailed Flink learning and practice

Stars: ✭ 26 (-75.47%)

Mutual labels: flink

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-33.02%)

Mutual labels: bigdata

yuzhouwan

Code Library for My Blog

Stars: ✭ 39 (-63.21%)

Mutual labels: bigdata

Running Elasticsearch Fun Profit

A book about running Elasticsearch

Stars: ✭ 664 (+526.42%)

Mutual labels: bigdata

flink-prometheus-example

Example setup to demonstrate Prometheus integration of Apache Flink

Stars: ✭ 69 (-34.91%)

Mutual labels: flink

Ignite Book Code Samples

All code samples, scripts, and more in-depth examples for the book "High Performance in-memory computing with Apache Ignite". Please use the repository "the-apache-ignite-book" for Ignite version 2.6 or above.

Stars: ✭ 86 (-18.87%)

Mutual labels: bigdata

centurion

Kotlin Bigdata Toolkit

Stars: ✭ 320 (+201.89%)

Mutual labels: bigdata

Flink Forward China 2018

Flink Forward China 2018 Slides

Stars: ✭ 583 (+450%)

Mutual labels: flink

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (-81.13%)

Mutual labels: flink

Flink Shaded

Apache Flink shaded artifacts repository

Stars: ✭ 67 (-36.79%)

Mutual labels: flink

LarkMidTable

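LarkMidTable is a one-stop, open-source data middle platform. It provides the platform's infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization, efficiently empowering data front ends and delivering data services.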

Stars: ✭ 873 (+723.58%)

Mutual labels: flink

Streaming Readings

Papers and reading material related to streaming systems

Stars: ✭ 554 (+422.64%)

Mutual labels: flink

2018-flink-forward-china

Records of the first Flink Forward China (2018): video records | document records | More than streaming

Stars: ✭ 25 (-76.42%)

Mutual labels: flink

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+1162.26%)

Mutual labels: bigdata

v6.dooring.public

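A big-screen data visualization solution that provides a visual editing engine, helping individuals or enterprises easily customize their own big-screen visualization applications.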

Stars: ✭ 323 (+204.72%)

Mutual labels: bigdata

Cds

Data syncing in golang for ClickHouse.

Stars: ✭ 501 (+372.64%)

Mutual labels: bigdata

pulsar-user-group-loc-cn

Workspace for China local user group.

Stars: ✭ 19 (-82.08%)

Mutual labels: bigdata

fb scraper

FBLYZE is a Facebook scraping and analysis system.

Stars: ✭ 61 (-42.45%)

Mutual labels: flink

Model Serving Tutorial

Code and presentation for Strata Model Serving tutorial

Stars: ✭ 57 (-46.23%)

Mutual labels: flink

Apache Flink Docs Zh Translation

Chinese translation project for the official Apache Flink documentation

Stars: ✭ 485 (+357.55%)

Mutual labels: flink

datasphere-service

An open source dataworks platform

Stars: ✭ 20 (-81.13%)

Mutual labels: bigdata

room-renting

Scrapes property listings from Anjuke with Python and visualizes them with Amap (Gaode Maps)

Stars: ✭ 16 (-84.91%)

Mutual labels: bigdata

Yauaa

Yet Another UserAgent Analyzer

Stars: ✭ 472 (+345.28%)

Mutual labels: flink

ETL-Starter-Kit

📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.

Stars: ✭ 21 (-80.19%)

Mutual labels: bigdata

FlinkTutorial

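FlinkTutorial focuses on Flink stream processing for big data. It covers getting started, concepts, principles, hands-on practice, performance tuning, and source code analysis. Developed in Java, with some core code in Scala. Feel free to follow my blog and GitHub.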

Stars: ✭ 46 (-56.6%)

Mutual labels: flink

Mlsql

The Programming Language Designed For Big Data and AI

Stars: ✭ 1,262 (+1090.57%)

Mutual labels: bigdata

Pulsar Spark

When Apache Pulsar meets Apache Spark