536 open source projects that are alternatives to or similar to pyspark-cheatsheet

Pythondata
repo for code published on pythondata.com
Stars: ✭ 113 (-1.74%)
Mutual labels:  big-data
K8s Ingress Claim
An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Stars: ✭ 14 (-87.83%)
Mutual labels:  big-data
parquet-dotnet
🐬 Apache Parquet for modern .Net
Stars: ✭ 199 (+73.04%)
Mutual labels:  apache-spark
Dremio Oss
Dremio - the missing link in modern data
Stars: ✭ 862 (+649.57%)
Mutual labels:  big-data
Social-Network-Analysis-in-Python
Social Network Facebook Analysis (Python, Networkx)
Stars: ✭ 26 (-77.39%)
Mutual labels:  big-data
Accumulo
Apache Accumulo
Stars: ✭ 857 (+645.22%)
Mutual labels:  big-data
IoT-system-PLC-data-to-InfluxDB
This project aims to provide free software to fetch data from PLCs (Siemens S7-300/400/1200/1500) and store it. The stack used is completely open source. I used InfluxDB as data storage, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-77.39%)
Mutual labels:  big-data
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+642.61%)
Mutual labels:  big-data
predictionio-sdk-ruby
PredictionIO Ruby SDK
Stars: ✭ 192 (+66.96%)
Mutual labels:  big-data
bftkv
A distributed key-value storage that's tolerant to Byzantine fault.
Stars: ✭ 27 (-76.52%)
Mutual labels:  big-data
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (+10.43%)
Mutual labels:  big-data
proxima-platform
The Proxima platform.
Stars: ✭ 17 (-85.22%)
Mutual labels:  apache-spark
Ambari
Mirror of Apache Ambari
Stars: ✭ 1,576 (+1270.43%)
Mutual labels:  big-data
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-83.48%)
Mutual labels:  big-data
spark-connector
A connector for Apache Spark to access Exasol
Stars: ✭ 13 (-88.7%)
Mutual labels:  apache-spark
Sqoop
Mirror of Apache Sqoop
Stars: ✭ 817 (+610.43%)
Mutual labels:  big-data
pyspark-for-data-processing
Code for my presentation: Using PySpark to Process Boat Loads of Data
Stars: ✭ 20 (-82.61%)
Mutual labels:  pyspark
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (+584.35%)
Mutual labels:  big-data
masc
Microsoft's contributions for Spark with Apache Accumulo
Stars: ✭ 20 (-82.61%)
Mutual labels:  big-data
DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-79.13%)
Mutual labels:  apache-spark
Cython
The most widely used Python to C compiler
Stars: ✭ 6,588 (+5628.7%)
Mutual labels:  big-data
Samza
Mirror of Apache Samza
Stars: ✭ 676 (+487.83%)
Mutual labels:  big-data
spark-root
Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-75.65%)
Mutual labels:  big-data
Sdc
Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (+441.74%)
Mutual labels:  big-data
spark-dgraph-connector
A connector for Apache Spark and PySpark to Dgraph databases.
Stars: ✭ 36 (-68.7%)
Mutual labels:  pyspark
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+4818.26%)
Mutual labels:  big-data
pulsar-adapters
Apache Pulsar Adapters
Stars: ✭ 18 (-84.35%)
Mutual labels:  apache-spark
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+4693.91%)
Mutual labels:  big-data
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+2546.96%)
Mutual labels:  big-data
Scanner
Efficient video analysis at scale
Stars: ✭ 569 (+394.78%)
Mutual labels:  big-data
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+7026.96%)
Mutual labels:  big-data
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (+384.35%)
Mutual labels:  big-data
Cboard
An easy to use, self-service open BI reporting and BI dashboard platform.
Stars: ✭ 2,795 (+2330.43%)
Mutual labels:  big-data
ByteSlice
"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-79.13%)
Mutual labels:  big-data
Genie
Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1242.61%)
Mutual labels:  big-data
Beam
Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+4377.39%)
Mutual labels:  big-data
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+113.91%)
Mutual labels:  big-data
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+340.87%)
Mutual labels:  big-data
Real Time Social Media Mining
DevOps pipeline for Real Time Social/Web Mining
Stars: ✭ 22 (-80.87%)
Mutual labels:  big-data
Stream Framework
Stream Framework is a Python library, which allows you to build news feed, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stars: ✭ 4,576 (+3879.13%)
Mutual labels:  big-data
Trafodion
Apache Trafodion
Stars: ✭ 242 (+110.43%)
Mutual labels:  big-data
Redislite
Redis in a python module.
Stars: ✭ 464 (+303.48%)
Mutual labels:  big-data
falcon
Mirror of Apache Falcon
Stars: ✭ 95 (-17.39%)
Mutual labels:  big-data
Courses
Quiz & Assignment of Coursera
Stars: ✭ 454 (+294.78%)
Mutual labels:  big-data
Selinon
An advanced distributed task flow management on top of Celery
Stars: ✭ 237 (+106.09%)
Mutual labels:  big-data
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+19072.17%)
Mutual labels:  big-data
airavata-django-portal
Mirror of Apache Airavata Django Portal
Stars: ✭ 20 (-82.61%)
Mutual labels:  big-data
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+270.43%)
Mutual labels:  big-data
Books
A collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.
Stars: ✭ 222 (+93.04%)
Mutual labels:  big-data
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+260%)
Mutual labels:  big-data
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-59.13%)
Mutual labels:  big-data
big-data-engineering-indonesia
A curated list of big data engineering tools, resources and communities.
Stars: ✭ 26 (-77.39%)
Mutual labels:  big-data
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-4.35%)
Mutual labels:  big-data
spark
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (+429.57%)
Mutual labels:  apache-spark
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-62.61%)
Mutual labels:  big-data
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-5.22%)
Mutual labels:  big-data
Attic Predictionio Sdk Java
PredictionIO Java SDK
Stars: ✭ 107 (-6.96%)
Mutual labels:  big-data
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-6.96%)
Mutual labels:  big-data
merkle-db
High-scalability analytics database built on immutable merkle-trees
Stars: ✭ 44 (-61.74%)
Mutual labels:  big-data
Mysql perf analyzer
MySQL performance monitoring and analysis.
Stars: ✭ 1,423 (+1137.39%)
Mutual labels:  big-data