536 open source projects that are alternatives to or similar to pyspark-cheatsheet

phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+0%)
Mutual labels:  pyspark
OSCI
Open Source Contributor Index
Stars: ✭ 107 (-6.96%)
Mutual labels:  pyspark
dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-66.09%)
Mutual labels:  big-data
siembol
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+33.04%)
Mutual labels:  big-data
classifai
🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (-13.91%)
Mutual labels:  big-data
streamsx.kafka
Repository for integration with Apache Kafka
Stars: ✭ 13 (-88.7%)
Mutual labels:  apache-spark
airavata-php-gateway
Mirror of Apache Airavata PHP Gateway
Stars: ✭ 15 (-86.96%)
Mutual labels:  big-data
net.jgp.books.spark.ch01
Spark in Action, 2nd edition - chapter 1 - Introduction
Stars: ✭ 72 (-37.39%)
Mutual labels:  apache-spark
Location-based-Restaurants-Recommendation-System
Big Data Management and Analysis Final Project
Stars: ✭ 44 (-61.74%)
Mutual labels:  apache-spark
azure-big-data-starter
A boilerplate project for Azure Big Data PaaS services
Stars: ✭ 13 (-88.7%)
Mutual labels:  big-data
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (-49.57%)
Mutual labels:  pyspark
spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
Stars: ✭ 25 (-78.26%)
Mutual labels:  apache-spark
predictionio-template-ecom-recommender
PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)
Stars: ✭ 73 (-36.52%)
Mutual labels:  big-data
beam-site
Apache Beam Site
Stars: ✭ 28 (-75.65%)
Mutual labels:  big-data
DataEngineering
This repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (-59.13%)
Mutual labels:  pyspark
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+1952.17%)
Mutual labels:  big-data
FIW KRT
Families In the Wild: A Kinship Recognition Toolbox.
Stars: ✭ 18 (-84.35%)
Mutual labels:  big-data
predictionio-sdk-python
PredictionIO Python SDK
Stars: ✭ 199 (+73.04%)
Mutual labels:  big-data
ceja
PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-79.13%)
Mutual labels:  pyspark
fink-broker
Astronomy Broker based on Apache Spark
Stars: ✭ 18 (-84.35%)
Mutual labels:  apache-spark
bullet-core
Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.
Stars: ✭ 36 (-68.7%)
Mutual labels:  big-data
scarf
Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (-53.04%)
Mutual labels:  big-data
accumulo-testing
Apache Accumulo Testing
Stars: ✭ 14 (-87.83%)
Mutual labels:  big-data
predictionio-sdk-java
PredictionIO Java SDK
Stars: ✭ 107 (-6.96%)
Mutual labels:  big-data
predictionio
PredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,510 (+10778.26%)
Mutual labels:  big-data
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-70.43%)
Mutual labels:  big-data
shifting
A privacy-focused list of alternatives to mainstream services to help the competition.
Stars: ✭ 31 (-73.04%)
Mutual labels:  big-data
net.jgp.books.spark.ch07
Spark in Action, 2nd edition - chapter 7 - Ingestion from files
Stars: ✭ 13 (-88.7%)
Mutual labels:  apache-spark
parquet-dotnet
🐬 Apache Parquet for modern .Net
Stars: ✭ 199 (+73.04%)
Mutual labels:  apache-spark
Social-Network-Analysis-in-Python
Social Network Facebook Analysis (Python, Networkx)
Stars: ✭ 26 (-77.39%)
Mutual labels:  big-data
IoT-system-PLC-data-to-InfluxDB
This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack is completely open source. I used InfluxDB as data storage, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-77.39%)
Mutual labels:  big-data
predictionio-sdk-ruby
PredictionIO Ruby SDK
Stars: ✭ 192 (+66.96%)
Mutual labels:  big-data
bftkv
A distributed key-value store that tolerates Byzantine faults.
Stars: ✭ 27 (-76.52%)
Mutual labels:  big-data
spark-connector
A connector for Apache Spark to access Exasol
Stars: ✭ 13 (-88.7%)
Mutual labels:  apache-spark
pyspark-for-data-processing
Code for my presentation: Using PySpark to Process Boat Loads of Data
Stars: ✭ 20 (-82.61%)
Mutual labels:  pyspark
DaFlow
Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-79.13%)
Mutual labels:  apache-spark
yildiz
🦄🌟 Graph Database layer on top of Google Bigtable
Stars: ✭ 24 (-79.13%)
Mutual labels:  big-data
spark-root
Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-75.65%)
Mutual labels:  big-data
spark-dgraph-connector
A connector for Apache Spark and PySpark to Dgraph databases.
Stars: ✭ 36 (-68.7%)
Mutual labels:  pyspark
pulsar-adapters
Apache Pulsar Adapters
Stars: ✭ 18 (-84.35%)
Mutual labels:  apache-spark
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+2546.96%)
Mutual labels:  big-data
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+7026.96%)
Mutual labels:  big-data
ByteSlice
"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-79.13%)
Mutual labels:  big-data
pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
Stars: ✭ 24 (-79.13%)
Mutual labels:  pyspark
HadoopDedup
🍉 Large-scale deduplication of massive datasets based on Hadoop and HBase
Stars: ✭ 27 (-76.52%)
Mutual labels:  big-data
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+113.91%)
Mutual labels:  big-data
Real Time Social Media Mining
DevOps pipeline for Real Time Social/Web Mining
Stars: ✭ 22 (-80.87%)
Mutual labels:  big-data
Trafodion
Apache Trafodion
Stars: ✭ 242 (+110.43%)
Mutual labels:  big-data
falcon
Mirror of Apache Falcon
Stars: ✭ 95 (-17.39%)
Mutual labels:  big-data
Selinon
An advanced distributed task flow management on top of Celery
Stars: ✭ 237 (+106.09%)
Mutual labels:  big-data
airavata-django-portal
Mirror of Apache Airavata Django Portal
Stars: ✭ 20 (-82.61%)
Mutual labels:  big-data
Books
A collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.
Stars: ✭ 222 (+93.04%)
Mutual labels:  big-data
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-59.13%)
Mutual labels:  big-data
big-data-lite
Samples for the Oracle Big Data Lite VM
Stars: ✭ 41 (-64.35%)
Mutual labels:  big-data
spark-operator
Operator for managing the Spark clusters on Kubernetes and OpenShift.
Stars: ✭ 129 (+12.17%)
Mutual labels:  apache-spark
data-viz-utils
Functions for easily making publication-quality figures with matplotlib.
Stars: ✭ 16 (-86.09%)
Mutual labels:  big-data
SGDLibrary
MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
Stars: ✭ 165 (+43.48%)
Mutual labels:  big-data
hazelcast-csharp-client
Hazelcast .NET Client
Stars: ✭ 98 (-14.78%)
Mutual labels:  big-data
insightedge
InsightEdge Core
Stars: ✭ 22 (-80.87%)
Mutual labels:  big-data
geospark
bring sf to spark in production
Stars: ✭ 53 (-53.91%)
Mutual labels:  apache-spark