
536 open-source projects that are alternatives to or similar to pyspark-cheatsheet

net.jgp.books.spark.ch01
Spark in Action, 2nd edition - chapter 1 - Introduction
Stars: ✭ 72 (-37.39%)
Mutual labels:  apache-spark
azure-big-data-starter
A boilerplate project for Azure Big Data PaaS services
Stars: ✭ 13 (-88.7%)
Mutual labels:  big-data
spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
Stars: ✭ 25 (-78.26%)
Mutual labels:  apache-spark
DataEngineering
This repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (-59.13%)
Mutual labels:  pyspark
RemoteShuffleService
Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+127.83%)
Mutual labels:  big-data
ceja
PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-79.13%)
Mutual labels:  pyspark
v6.dooring.public
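A large-screen data visualization solution that provides a visual editing engine, helping individuals and businesses easily build their own large-screen visualization applications.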
Stars: ✭ 323 (+180.87%)
Mutual labels:  big-data
spark-sql-internals
The Internals of Spark SQL
Stars: ✭ 331 (+187.83%)
Mutual labels:  apache-spark
SparkTwitterAnalysis
An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool(SBT) for building the project.
Stars: ✭ 29 (-74.78%)
Mutual labels:  apache-spark
terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
Stars: ✭ 25 (-78.26%)
Mutual labels:  big-data
scarf
Toolkit for highly memory efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas scale datasets with millions of cells on laptop.
Stars: ✭ 54 (-53.04%)
Mutual labels:  big-data
net.jgp.books.spark.ch07
Spark in Action, 2nd edition - chapter 7 - Ingestion from files
Stars: ✭ 13 (-88.7%)
Mutual labels:  apache-spark
parquet-dotnet
🐬 Apache Parquet for modern .Net
Stars: ✭ 199 (+73.04%)
Mutual labels:  apache-spark
IoT-system-PLC-data-to-InfluxDB
This project aims to provide free software to fetch data from PLCs (Siemens S7-300/400/1200/1500) and store it. The stack used is completely open source. I used InfluxDB as data storage, so the application principle follows the Big Data paradigm.
Stars: ✭ 26 (-77.39%)
Mutual labels:  big-data
bftkv
A distributed key-value storage that's tolerant to Byzantine fault.
Stars: ✭ 27 (-76.52%)
Mutual labels:  big-data
DaFlow
Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-79.13%)
Mutual labels:  apache-spark
pyspark-for-data-processing
Code for my presentation: Using PySpark to Process Boat Loads of Data
Stars: ✭ 20 (-82.61%)
Mutual labels:  pyspark
spark-root
Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-75.65%)
Mutual labels:  big-data
pulsar-adapters
Apache Pulsar Adapters
Stars: ✭ 18 (-84.35%)
Mutual labels:  apache-spark
MLBD
Materials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-82.61%)
Mutual labels:  big-data
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-78.26%)
Mutual labels:  big-data
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+7026.96%)
Mutual labels:  big-data
ByteSlice
"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-79.13%)
Mutual labels:  big-data
img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Stars: ✭ 1,173 (+920%)
Mutual labels:  big-data
Real Time Social Media Mining
DevOps pipeline for Real Time Social/Web Mining
Stars: ✭ 22 (-80.87%)
Mutual labels:  big-data
falcon
Mirror of Apache Falcon
Stars: ✭ 95 (-17.39%)
Mutual labels:  big-data
PysparkCheatsheet
PySpark Cheatsheet
Stars: ✭ 25 (-78.26%)
Mutual labels:  apache-spark
Big-Data-Demo
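A data visualization showcase project based on Vue, three.js, and echarts, with features including 3D model import and interaction and 3D model annotation.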
Stars: ✭ 146 (+26.96%)
Mutual labels:  big-data
GDLibrary
Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-56.52%)
Mutual labels:  big-data
airavata-django-portal
Mirror of Apache Airavata Django Portal
Stars: ✭ 20 (-82.61%)
Mutual labels:  big-data
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-59.13%)
Mutual labels:  big-data
lcbo-api
A crawler and API server for Liquor Control Board of Ontario retail data
Stars: ✭ 152 (+32.17%)
Mutual labels:  big-data
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-78.26%)
Mutual labels:  pyspark
talaria
TalariaDB is a distributed, highly available, and low latency time-series database for Presto
Stars: ✭ 148 (+28.7%)
Mutual labels:  big-data
oshinko-s2i
This is a place to put s2i images and utilities for spark application builders for openshift
Stars: ✭ 16 (-86.09%)
Mutual labels:  pyspark
meetups-archivos
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to disseminate and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (-47.83%)
Mutual labels:  big-data
anovos
Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
Stars: ✭ 77 (-33.04%)
Mutual labels:  pyspark
automile-php
Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 28 (-75.65%)
Mutual labels:  big-data
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-78.26%)
Mutual labels:  pyspark
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-85.22%)
Mutual labels:  pyspark
kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+312.17%)
Mutual labels:  pyspark
couchdb-mango
Mirror of Apache CouchDB Mango
Stars: ✭ 34 (-70.43%)
Mutual labels:  big-data
xcast
A High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (-75.65%)
Mutual labels:  big-data
FlameStream
Distributed stream processing model and its implementation
Stars: ✭ 14 (-87.83%)
Mutual labels:  big-data
lubeck
High level linear algebra library for Dlang
Stars: ✭ 57 (-50.43%)
Mutual labels:  big-data
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-85.22%)
Mutual labels:  big-data
flask-spark-docker
Just a boilerplate for PySpark and Flask
Stars: ✭ 32 (-72.17%)
Mutual labels:  pyspark
ngm
swissgeol.ch gives you insight into geoscientific data, above and below the surface.
Stars: ✭ 23 (-80%)
Mutual labels:  big-data
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (-45.22%)
Mutual labels:  big-data
hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (-73.04%)
Mutual labels:  apache-spark
nifi
Deploy a secured, clustered, auto-scaling NiFi service in AWS.
Stars: ✭ 37 (-67.83%)
Mutual labels:  big-data
automile-net
Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 24 (-79.13%)
Mutual labels:  big-data
big-data-upf
RECSM-UPF Summer School: Social Media and Big Data Research
Stars: ✭ 21 (-81.74%)
Mutual labels:  big-data
iis
Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-86.09%)
Mutual labels:  big-data
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (-66.09%)
Mutual labels:  apache-spark
couchdb-couch-plugins
Mirror of Apache CouchDB
Stars: ✭ 14 (-87.83%)
Mutual labels:  big-data
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+0%)
Mutual labels:  pyspark
OSCI
Open Source Contributor Index
Stars: ✭ 107 (-6.96%)
Mutual labels:  pyspark
predictionio-template-ecom-recommender
PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)
Stars: ✭ 73 (-36.52%)
Mutual labels:  big-data
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+1952.17%)
Mutual labels:  big-data