Hops Examples: Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-20.75%)
Autocrawler: Google, Naver multiprocess image web crawler (Selenium)
Stars: ✭ 957 (+802.83%)
Sidekick: High Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (+245.28%)
Eagle: Real-time data processing system based on Flink and CEP
Stars: ✭ 95 (-10.38%)
Sylph: Stream computing platform for big data
Stars: ✭ 362 (+241.51%)
Aws Auto Terminate Idle Emr: AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-80.19%)
Datawave: DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Stars: ✭ 347 (+227.36%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-24.53%)
Datafaker: Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into various data sources.
Stars: ✭ 327 (+208.49%)
Bigdata File Viewer: A cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-18.87%)
Awesome Flink: 😎 A curated list of amazingly awesome Flink and Flink ecosystem resources
Stars: ✭ 530 (+400%)
Cloudflow: Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (+162.26%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+776.42%)
Arvados: An open source platform for managing and analyzing biomedical big data
Stars: ✭ 274 (+158.49%)
Dataspherestudio: DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1027.36%)
Docker Spark Cluster: A simple Spark standalone cluster for your testing environment purposes
Stars: ✭ 261 (+146.23%)
Mnemonic: Apache Mnemonic - A non-volatile hybrid memory storage oriented library
Stars: ✭ 91 (-14.15%)
DetEdit: A graphical user interface for annotating and editing events detected in long-term acoustic monitoring data
Stars: ✭ 20 (-81.13%)
Kamu Cli: Next generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-34.91%)
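Coding Now: Study notes, along with eBooks I have read, video resources, and blogs, websites, and tools I have collected and found useful. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.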
Stars: ✭ 750 (+607.55%)
proteic: Streaming and static data visualization for the modern web.
Stars: ✭ 37 (-65.09%)
Splash: Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-0.94%)
mriya: Real-time ETL built with Flink, moving data from MySQL to Greenplum. It uses canal to parse the MySQL binlog and write it to Kafka, then uses Flink to consume from Kafka and load the data into Greenplum. More data sources and targets will be added in the future.
Stars: ✭ 65 (-38.68%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+602.83%)
np-flink: Detailed Flink learning and practice
Stars: ✭ 26 (-75.47%)
yuzhouwan: Code Library for My Blog
Stars: ✭ 39 (-63.21%)
Ignite Book Code Samples: All code samples, scripts, and more in-depth examples for the book "High Performance In-Memory Computing with Apache Ignite". Please use the repository "the-apache-ignite-book" for Ignite version 2.6 or above.
Stars: ✭ 86 (-18.87%)
centurion: Kotlin Bigdata Toolkit
Stars: ✭ 320 (+201.89%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-81.13%)
Flink Shaded: Apache Flink shaded artifacts repository
Stars: ✭ 67 (-36.79%)
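LarkMidTable: LarkMidTable is a one-stop open-source data middle platform covering infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization, designed to efficiently empower front-end data applications and provide data services.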
Stars: ✭ 873 (+723.58%)
2018-flink-forward-china: Records of the first Flink Forward China (2018): video recordings | documents | More than streaming
Stars: ✭ 25 (-76.42%)
Spark Py Notebooks: Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1162.26%)
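v6.dooring.public: A large-screen data visualization dashboard solution that provides a visual editing engine, helping individuals and companies easily build their own dashboard applications.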
Stars: ✭ 323 (+204.72%)
Cds: Data syncing in golang for ClickHouse.
Stars: ✭ 501 (+372.64%)
fb scraper: FBLYZE is a Facebook scraping and analysis system.
Stars: ✭ 61 (-42.45%)
room-renting: Scrapes housing listings from Anjuke with Python and visualizes them on Amap (Gaode Maps)
Stars: ✭ 16 (-84.91%)
Yauaa: Yet Another UserAgent Analyzer
Stars: ✭ 472 (+345.28%)
ETL-Starter-Kit: 📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL-related work.
Stars: ✭ 21 (-80.19%)
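FlinkTutorial: FlinkTutorial focuses on Flink stream processing for big data, covering the basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Written in Java, with some core code in Scala. You are welcome to follow my blog and GitHub.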
Stars: ✭ 46 (-56.6%)
Mlsql: The Programming Language Designed For Big Data and AI
Stars: ✭ 1,262 (+1090.57%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-48.11%)
Bigslice: A serverless cluster computing system for the Go programming language
Stars: ✭ 469 (+342.45%)