Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-66.38%)

Mutual labels: pyspark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+4775.86%)

Mutual labels: spark

flask-spark-docker

Just a boilerplate for PySpark and Flask

Stars: ✭ 32 (-72.41%)

Mutual labels: pyspark

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (-48.28%)

Mutual labels: spark

pyspark-algorithms

PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2

Stars: ✭ 72 (-37.93%)

Mutual labels: pyspark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+4652.59%)

Mutual labels: spark

spark-twitter-sentiment-analysis

Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

Stars: ✭ 55 (-52.59%)

Mutual labels: pyspark

Pyspark Tutorial

PySpark Code for Hands-on Learners

Stars: ✭ 91 (-21.55%)

Mutual labels: pyspark

soda-spark

Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.

Stars: ✭ 58 (-50%)

Mutual labels: pyspark

Alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

Stars: ✭ 5,379 (+4537.07%)

Mutual labels: spark

isarn-sketches-spark

Routines and data structures for using isarn-sketches idiomatically in Apache Spark

Stars: ✭ 28 (-75.86%)

Mutual labels: pyspark

Zemberek Nlp Server

REST Docker server built on the Zemberek Turkish NLP Java library

Stars: ✭ 60 (-48.28%)

Mutual labels: spark

spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …

Stars: ✭ 23 (-80.17%)

Mutual labels: pyspark

Spark Daria

Essential Spark extensions and helper methods ✨😲

Stars: ✭ 553 (+376.72%)

Mutual labels: spark

Xlearning Xdml

Extremely distributed machine learning

Stars: ✭ 113 (-2.59%)

Mutual labels: spark

Bigdata Notebook

Stars: ✭ 100 (-13.79%)

Mutual labels: spark

Spark Website

Apache Spark Website

Stars: ✭ 75 (-35.34%)

Mutual labels: spark

Spark

Apache Spark - A unified analytics engine for large-scale data processing

Stars: ✭ 31,618 (+27156.9%)

Mutual labels: spark

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (+121.55%)

Mutual labels: spark

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+2524.14%)

Mutual labels: spark

Lopq

Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.

Stars: ✭ 530 (+356.9%)

Mutual labels: spark

Every Single Day I Tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Stars: ✭ 249 (+114.66%)

Mutual labels: spark

Pyspark Examples

Code examples for Apache Spark using Python

Stars: ✭ 58 (-50%)

Mutual labels: spark

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+112.93%)

Mutual labels: spark

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (+338.79%)

Mutual labels: spark

Neo4j Spark Connector

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

Stars: ✭ 245 (+111.21%)

Mutual labels: spark

Udacity Data Engineering

Udacity Data Engineering Nano Degree (DEND)