macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+4267.19%)

Mutual labels: spark

Airflow Tutorial

Airflow basics tutorial

Stars: ✭ 305 (+138.28%)

Mutual labels: airflow

Spark Examples

Spark examples

Stars: ✭ 41 (-67.97%)

Mutual labels: spark

Awesome Ada

A curated list of awesome resources related to the Ada and SPARK programming languages

Stars: ✭ 299 (+133.59%)

Mutual labels: spark

Spark python ml examples

Spark 2.0 Python Machine Learning examples

Stars: ✭ 87 (-32.03%)

Mutual labels: spark

Azure Kusto Spark

Apache Spark Connector for Azure Kusto

Stars: ✭ 40 (-68.75%)

Mutual labels: spark

Spark Hbase Connector

Connect Spark to HBase for reading and writing data with ease

Stars: ✭ 299 (+133.59%)

Mutual labels: spark

Whirl

Fast iterative local development and testing of Apache Airflow workflows

Stars: ✭ 111 (-13.28%)

Mutual labels: airflow

Behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Stars: ✭ 286 (+123.44%)

Mutual labels: hadoop

Pixiedust

Python helper library for Jupyter Notebooks

Stars: ✭ 998 (+679.69%)

Mutual labels: spark

Spark Druid Olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Stars: ✭ 282 (+120.31%)

Mutual labels: spark

Laravel Spark Google2fa

Google Authenticator support for Laravel Spark

Stars: ✭ 86 (-32.81%)

Mutual labels: spark

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+3478.91%)

Mutual labels: hadoop

Snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

Stars: ✭ 995 (+677.34%)

Mutual labels: spark

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (+117.19%)

Mutual labels: spark

Teddy

Spark Streaming monitoring platform, supporting job deployment, alerting, and auto-start

Stars: ✭ 120 (-6.25%)

Mutual labels: spark

Datavec

ETL Library for Machine Learning - data pipelines, data munging and wrangling

Stars: ✭ 272 (+112.5%)

Mutual labels: spark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+670.31%)

Mutual labels: spark

Helk

The Hunting ELK

Stars: ✭ 3,097 (+2319.53%)

Mutual labels: spark

Flint

Webex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)

Stars: ✭ 85 (-33.59%)

Mutual labels: spark

Introtohadoopandmr udacity course

🐘 Source code for assignments of Udacity course "Introduction to Hadoop and MapReduce"

Stars: ✭ 110 (-14.06%)

Mutual labels: hadoop

Big Data Rosetta Code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Stars: ✭ 254 (+98.44%)

Mutual labels: spark

Jsr203 Hadoop

A Java NIO file system provider for HDFS

Stars: ✭ 35 (-72.66%)

Mutual labels: hadoop

laravel-spark-camera

Profile Photo Camera support for Laravel Spark

Stars: ✭ 30 (-76.56%)

Mutual labels: spark

Airflow Training

Airflow training for the Crunch conference

Stars: ✭ 83 (-35.16%)

Mutual labels: airflow

Book

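This project collects good books I have read or heard about over the years. While organizing my files I came across them; deleting them felt like a waste, yet keeping them took up too much space. In the spirit of sharing, I am making them available, partly to provide these books to readers who need them and partly as a kind of accumulated knowledge base.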

Stars: ✭ 47 (-63.28%)

Mutual labels: spark

Objinsync

Continuously synchronize directories from remote object store to local filesystem

Stars: ✭ 29 (-77.34%)

Mutual labels: airflow

Spark Bigquery Connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

Stars: ✭ 126 (-1.56%)

Mutual labels: spark

Thingsboard

Open-source IoT Platform - Device management, data collection, processing and visualization.

Stars: ✭ 10,526 (+8123.44%)

Mutual labels: spark

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+378.91%)

Mutual labels: hadoop

Akkeeper

An easy way to deploy your Akka services to a distributed environment.

Stars: ✭ 30 (-76.56%)

Mutual labels: hadoop

dllib

dllib is a distributed deep learning library running on Apache Spark

Stars: ✭ 32 (-75%)

Mutual labels: spark

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+377.34%)

Mutual labels: spark

pulse

phData Pulse application log aggregation and monitoring

Stars: ✭ 13 (-89.84%)

Mutual labels: hadoop

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+645.31%)

Mutual labels: spark

dbt-on-airflow

No description or website provided.

Stars: ✭ 30 (-76.56%)

Mutual labels: airflow

Java learning practice

The road to advanced Java: frequently asked interview algorithms, Akka, multithreading, NIO, Netty, SpringBoot, Spark&&Flink, and more

Stars: ✭ 110 (-14.06%)

Mutual labels: spark

Spark Ffm

FFM (Field-aware Factorization Machine) on Spark

Stars: ✭ 101 (-21.09%)

Mutual labels: spark

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-49.22%)

Mutual labels: spark

Incubator Dolphinscheduler

Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available out of box.

Stars: ✭ 6,916 (+5303.13%)

Mutual labels: airflow

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+4207.03%)

Mutual labels: spark

Spark Bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.

Stars: ✭ 65 (-49.22%)

Mutual labels: spark

Mongo Spark

The MongoDB Spark Connector

Stars: ✭ 588 (+359.38%)

Mutual labels: spark

Spark Lucenerdd

Spark RDD with Lucene's query and entity linkage capabilities

Stars: ✭ 114 (-10.94%)

Mutual labels: spark

Jumbune

Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,

Stars: ✭ 64 (-50%)

Mutual labels: hadoop

Hadoop study

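Regularly updated documentation on commonly used big data components in the Hadoop ecosystem, focusing, in order, on: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos (the project includes Hadoop mind maps and Evernote notes, simple Scala demos, common utility classes, and anonymized training code; continuously updated!!!)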

Stars: ✭ 567 (+342.97%)

Mutual labels: hadoop
