The purpose of this tiny project is to put together the know-how I learned from the Big Data Expert course at formacionhadoop.com. The idea is to show how to work with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.

Stars: ✭ 47 (-14.55%)

Mutual labels: spark

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+1185.45%)

Mutual labels: spark

Sparkling Titanic

Training models with Apache Spark and PySpark for the Titanic Kaggle competition

Stars: ✭ 12 (-78.18%)

Mutual labels: spark

Pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

Stars: ✭ 647 (+1076.36%)

Mutual labels: dataframe

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+1692.73%)

Mutual labels: spark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+1050.91%)

Mutual labels: spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-80%)

Mutual labels: spark

Dev Setup

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+10063.64%)

Mutual labels: spark

Spark Submit Ui

A UI based on Play Framework for submitting Spark apps

Stars: ✭ 53 (-3.64%)

Mutual labels: spark

Bigdata Interview

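🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, along with my own summarized answers. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.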

Stars: ✭ 857 (+1458.18%)

Mutual labels: spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+9923.64%)

Mutual labels: spark

Weblogsanalysissystem

A big data platform for analyzing web access logs

Stars: ✭ 37 (-32.73%)

Mutual labels: spark

Pdpipe

Easy pipelines for pandas DataFrames.

Stars: ✭ 590 (+972.73%)

Mutual labels: dataframe

Tiledb Vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-52.73%)

Mutual labels: spark

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+9680%)

Mutual labels: spark

Spark Tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.

Stars: ✭ 45 (-18.18%)

Mutual labels: spark

Big data architect skills

Skills that a big data architect should master

Stars: ✭ 400 (+627.27%)

Mutual labels: spark

Heracles

High performance HBase / Spark SQL engine

Stars: ✭ 27 (-50.91%)

Mutual labels: spark

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+1341.82%)

Mutual labels: spark

Arquero

Query processing and transformation of array-backed data tables.

Stars: ✭ 384 (+598.18%)

Mutual labels: dataframe

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (+878.18%)

Mutual labels: spark

Pandas Ta

Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 130+ Indicators

Stars: ✭ 962 (+1649.09%)

Mutual labels: dataframe

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (+832.73%)

Mutual labels: spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-58.18%)

Mutual labels: spark

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+821.82%)

Mutual labels: spark

Docker Hadoop

A Docker container with a full Hadoop cluster setup with Spark and Zeppelin

Stars: ✭ 54 (-1.82%)

Mutual labels: spark

Pointblank

Data validation and organization of metadata for data frames and database tables

Stars: ✭ 480 (+772.73%)

Mutual labels: spark

Boltzmannclean

Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines

Stars: ✭ 23 (-58.18%)

Mutual labels: dataframe

Spark

Cross-platform real-time collaboration client optimized for business and organizations.

Stars: ✭ 471 (+756.36%)

Mutual labels: spark

Spark Summit East 2017

Stars: ✭ 33 (-40%)

Mutual labels: spark

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+39987.27%)

Mutual labels: spark

Spark Scala Tutorial

A free tutorial for Apache Spark.

Stars: ✭ 907 (+1549.09%)

Mutual labels: spark

God Of Bigdata

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+10823.64%)

Mutual labels: spark

Spark Examples

Spark examples

Stars: ✭ 41 (-25.45%)

Mutual labels: spark

Yanagishima

Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL

Stars: ✭ 424 (+670.91%)

Mutual labels: spark

Yandex Big Data Engineering

Stars: ✭ 17 (-69.09%)

Mutual labels: spark

Moonbox

Moonbox is a DVtaaS (Data Virtualization as a Service) Platform

Stars: ✭ 424 (+670.91%)

Mutual labels: spark

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+1634.55%)

Mutual labels: spark

Learningspark

Scala examples for learning to use Spark

Stars: ✭ 421 (+665.45%)

Mutual labels: spark

Big Data Scala Spark

Coursera's big data course with Scala and Spark

Stars: ✭ 16 (-70.91%)

Mutual labels: spark

Sparkle

Haskell on Apache Spark.

Stars: ✭ 419 (+661.82%)

Mutual labels: spark

Spark Nkp

Natural Korean Processor for Apache Spark

Stars: ✭ 50 (-9.09%)

Mutual labels: spark

Enterprise gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.

Stars: ✭ 412 (+649.09%)

Mutual labels: spark

Dataframe

C++ DataFrame for statistical, financial, and ML analysis, written in modern C++ using native types and continuous memory storage, with no pointers involved

Stars: ✭ 828 (+1405.45%)

Mutual labels: dataframe

Spark Solr

Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.

Stars: ✭ 411 (+647.27%)

Mutual labels: spark

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+1625.45%)

Mutual labels: spark

Tutorial

A summary of the Java full-stack knowledge architecture