536 open source projects that are alternatives to or similar to pyspark-cheatsheet

learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+26.96%)
Mutual labels:  apache-spark
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1213.91%)
Mutual labels:  big-data
Sylph
Stream computing platform for bigdata
Stars: ✭ 362 (+214.78%)
Mutual labels:  big-data
Vespa
The open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+3158.26%)
Mutual labels:  big-data
Calcite
Apache Calcite
Stars: ✭ 2,816 (+2348.7%)
Mutual labels:  big-data
Attic Apex Core
Mirror of Apache Apex core
Stars: ✭ 346 (+200.87%)
Mutual labels:  big-data
meetups-archivos
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to disseminate and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (-47.83%)
Mutual labels:  big-data
Parquet Cpp
Apache Parquet
Stars: ✭ 339 (+194.78%)
Mutual labels:  big-data
Couchdb Docker
Semi-official Apache CouchDB Docker images
Stars: ✭ 194 (+68.7%)
Mutual labels:  big-data
Grouparoo
🦘 The Grouparoo Monorepo - open source customer data sync framework
Stars: ✭ 334 (+190.43%)
Mutual labels:  big-data
automile-php
Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 28 (-75.65%)
Mutual labels:  big-data
Tez
Apache Tez
Stars: ✭ 313 (+172.17%)
Mutual labels:  big-data
Attic Predictionio Sdk Ruby
PredictionIO Ruby SDK
Stars: ✭ 192 (+66.96%)
Mutual labels:  big-data
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-78.26%)
Mutual labels:  pyspark
Pythondata
repo for code published on pythondata.com
Stars: ✭ 113 (-1.74%)
Mutual labels:  big-data
Fluid
Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud
Stars: ✭ 265 (+130.43%)
Mutual labels:  big-data
Presto Go Client
A Presto client for the Go programming language.
Stars: ✭ 183 (+59.13%)
Mutual labels:  big-data
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (+10.43%)
Mutual labels:  big-data
proxima-platform
The Proxima platform.
Stars: ✭ 17 (-85.22%)
Mutual labels:  apache-spark
Ambari
Mirror of Apache Ambari
Stars: ✭ 1,576 (+1270.43%)
Mutual labels:  big-data
lubeck
High level linear algebra library for Dlang
Stars: ✭ 57 (-50.43%)
Mutual labels:  big-data
Couchdb Fauxton
Apache CouchDB
Stars: ✭ 295 (+156.52%)
Mutual labels:  big-data
Smooks
An extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (+154.78%)
Mutual labels:  big-data
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-85.22%)
Mutual labels:  big-data
Flink
Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.
Stars: ✭ 17,781 (+15361.74%)
Mutual labels:  big-data
Keyvi
Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (+48.7%)
Mutual labels:  big-data
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+3883.48%)
Mutual labels:  big-data
ngm
swissgeol.ch gives you insight into geoscientific data - above and below the surface.
Stars: ✭ 23 (-80%)
Mutual labels:  big-data
Geopyspark
GeoTrellis for PySpark
Stars: ✭ 167 (+45.22%)
Mutual labels:  big-data
Datahub
The Metadata Platform for the Modern Data Stack
Stars: ✭ 4,232 (+3580%)
Mutual labels:  big-data
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (-45.22%)
Mutual labels:  big-data
Genie
Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1242.61%)
Mutual labels:  big-data
Fluo
Apache Fluo
Stars: ✭ 159 (+38.26%)
Mutual labels:  big-data
bigstatsr
R package for statistical tools with big matrices stored on disk.
Stars: ✭ 139 (+20.87%)
Mutual labels:  big-data
automile-net
Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 24 (-79.13%)
Mutual labels:  big-data
big-data-engineering-indonesia
A curated list of big data engineering tools, resources and communities.
Stars: ✭ 26 (-77.39%)
Mutual labels:  big-data
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-4.35%)
Mutual labels:  big-data
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+32.17%)
Mutual labels:  big-data
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-87.83%)
Mutual labels:  big-data
spark
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (+429.57%)
Mutual labels:  apache-spark
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-62.61%)
Mutual labels:  big-data
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-5.22%)
Mutual labels:  big-data
Attic Predictionio Sdk Java
PredictionIO Java SDK
Stars: ✭ 107 (-6.96%)
Mutual labels:  big-data
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (+33.04%)
Mutual labels:  big-data
iis
Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-86.09%)
Mutual labels:  big-data
ibmpairs
open source tools for interaction with IBM PAIRS:
Stars: ✭ 23 (-80%)
Mutual labels:  big-data
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (-66.09%)
Mutual labels:  apache-spark
predictionio-template-attribute-based-classifier
PredictionIO Classification Engine Template (Scala-based parallelized engine)
Stars: ✭ 38 (-66.96%)
Mutual labels:  big-data
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (+26.96%)
Mutual labels:  big-data
predictionio-template-ecom-recommender
PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)
Stars: ✭ 73 (-36.52%)
Mutual labels:  big-data
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-6.96%)
Mutual labels:  big-data
merkle-db
High-scalability analytics database built on immutable merkle-trees
Stars: ✭ 44 (-61.74%)
Mutual labels:  big-data
Mysql perf analyzer
MySQL performance monitoring and analysis.
Stars: ✭ 1,423 (+1137.39%)
Mutual labels:  big-data
Maha
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-12.17%)
Mutual labels:  big-data
javaer-mind
Mind maps for Java programmers' advanced learning
Stars: ✭ 66 (-42.61%)
Mutual labels:  big-data
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-13.04%)
Mutual labels:  big-data
Graph sampling
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Stars: ✭ 99 (-13.91%)
Mutual labels:  big-data
metriql
The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (+97.39%)
Mutual labels:  big-data
Samza Hello Samza
Mirror of Apache Samza
Stars: ✭ 99 (-13.91%)
Mutual labels:  big-data
Metamodel
Mirror of Apache Metamodel
Stars: ✭ 143 (+24.35%)
Mutual labels:  big-data