
536 open-source projects that are alternatives to or similar to pyspark-cheatsheet

osm-parquetizer
A converter from OSM PBF files to Parquet files
Stars: ✭ 71 (-38.26%)
Mutual labels:  apache-spark
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+1307.83%)
Mutual labels:  big-data
Springboard-Data-Science-Immersive
No description or website provided.
Stars: ✭ 52 (-54.78%)
Mutual labels:  pyspark
Cmak
CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+9068.7%)
Mutual labels:  big-data
cloud-integration
Spark cloud integration: tests, cloud committers and more
Stars: ✭ 20 (-82.61%)
Mutual labels:  apache-spark
sparklygraphs
Old repo for R interface for GraphFrames
Stars: ✭ 13 (-88.7%)
Mutual labels:  apache-spark
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (+0%)
Mutual labels:  big-data
Orc
An ORC file format reader and writer for Go.
Stars: ✭ 97 (-15.65%)
Mutual labels:  big-data
dislib
The distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-66.09%)
Mutual labels:  big-data
siembol
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+33.04%)
Mutual labels:  big-data
Asakusafw
Asakusa Framework
Stars: ✭ 114 (-0.87%)
Mutual labels:  big-data
Treeviz
Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (-17.39%)
Mutual labels:  big-data
phoenix-queryserver
Apache Phoenix Query Server
Stars: ✭ 33 (-71.3%)
Mutual labels:  big-data
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+26.96%)
Mutual labels:  apache-spark
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1213.91%)
Mutual labels:  big-data
Smart Array To Tree
Quickly convert large data arrays into trees
Stars: ✭ 91 (-20.87%)
Mutual labels:  big-data
streamsx.kafka
Repository for integration with Apache Kafka
Stars: ✭ 13 (-88.7%)
Mutual labels:  apache-spark
Dataengineeringproject
Example end-to-end data engineering project.
Stars: ✭ 82 (-28.7%)
Mutual labels:  big-data
airavata-php-gateway
Mirror of Apache Airavata PHP Gateway
Stars: ✭ 15 (-86.96%)
Mutual labels:  big-data
Uproot4
ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-30.43%)
Mutual labels:  big-data
Pythondata
Repo for code published on pythondata.com
Stars: ✭ 113 (-1.74%)
Mutual labels:  big-data
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-31.3%)
Mutual labels:  big-data
net.jgp.books.spark.ch01
Spark in Action, 2nd edition - chapter 1 - Introduction
Stars: ✭ 72 (-37.39%)
Mutual labels:  apache-spark
Spark Website
Apache Spark Website
Stars: ✭ 75 (-34.78%)
Mutual labels:  big-data
Location-based-Restaurants-Recommendation-System
Big Data Management and Analysis Final Project
Stars: ✭ 44 (-61.74%)
Mutual labels:  apache-spark
Bookkeeper
Apache Bookkeeper
Stars: ✭ 1,178 (+924.35%)
Mutual labels:  big-data
azure-big-data-starter
A boilerplate project for Azure Big Data PaaS services
Stars: ✭ 13 (-88.7%)
Mutual labels:  big-data
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-38.26%)
Mutual labels:  big-data
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (-49.57%)
Mutual labels:  pyspark
Countly Sdk Cordova
Countly Product Analytics SDK for Cordova, Icenium and PhoneGap
Stars: ✭ 69 (-40%)
Mutual labels:  big-data
spark-utils
Basic framework utilities to quickly start writing production-ready Apache Spark applications
Stars: ✭ 25 (-78.26%)
Mutual labels:  apache-spark
Hazelcast Cpp Client
Hazelcast IMDG C++ Client
Stars: ✭ 67 (-41.74%)
Mutual labels:  big-data
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (+10.43%)
Mutual labels:  big-data
proxima-platform
The Proxima platform.
Stars: ✭ 17 (-85.22%)
Mutual labels:  apache-spark
Ambari
Mirror of Apache Ambari
Stars: ✭ 1,576 (+1270.43%)
Mutual labels:  big-data
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-43.48%)
Mutual labels:  big-data
beam-site
Apache Beam Site
Stars: ✭ 28 (-75.65%)
Mutual labels:  big-data
Spark Doc Zh
Chinese translation of the official Apache Spark documentation
Stars: ✭ 1,126 (+879.13%)
Mutual labels:  big-data
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+9.57%)
Mutual labels:  big-data
DataEngineering
This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (-59.13%)
Mutual labels:  pyspark
Attic Lens
Mirror of Apache Lens
Stars: ✭ 58 (-49.57%)
Mutual labels:  big-data
predictionio-sdk-python
PredictionIO Python SDK
Stars: ✭ 199 (+73.04%)
Mutual labels:  big-data
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-50.43%)
Mutual labels:  big-data
ceja
PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-79.13%)
Mutual labels:  pyspark
Lifion Kinesis
A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-53.04%)
Mutual labels:  big-data
fink-broker
Astronomy broker based on Apache Spark
Stars: ✭ 18 (-84.35%)
Mutual labels:  apache-spark
Oodt
Mirror of Apache OODT
Stars: ✭ 52 (-54.78%)
Mutual labels:  big-data
Trck
Query engine for TrailDB
Stars: ✭ 48 (-58.26%)
Mutual labels:  big-data
bullet-core
Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.
Stars: ✭ 36 (-68.7%)
Mutual labels:  big-data
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+791.3%)
Mutual labels:  big-data
scarf
Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (-53.04%)
Mutual labels:  big-data
Genie
Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1242.61%)
Mutual labels:  big-data
big-data-engineering-indonesia
A curated list of big data engineering tools, resources and communities.
Stars: ✭ 26 (-77.39%)
Mutual labels:  big-data
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-4.35%)
Mutual labels:  big-data
spark
Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being archived, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
Stars: ✭ 609 (+429.57%)
Mutual labels:  apache-spark
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-62.61%)
Mutual labels:  big-data
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-5.22%)
Mutual labels:  big-data
Attic Predictionio Sdk Java
PredictionIO Java SDK
Stars: ✭ 107 (-6.96%)
Mutual labels:  big-data
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-6.96%)
Mutual labels:  big-data
merkle-db
High-scalability analytics database built on immutable Merkle trees
Stars: ✭ 44 (-61.74%)
Mutual labels:  big-data