536 open source projects that are alternatives to or similar to pyspark-cheatsheet

Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English.

Stars: ✭ 115 (+0%)

Mutual labels: pyspark

OSCI

Open Source Contributor Index

Stars: ✭ 107 (-6.96%)

Mutual labels: pyspark

dislib

The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (-66.09%)

Mutual labels: big-data

siembol

An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.

Stars: ✭ 153 (+33.04%)

Mutual labels: big-data

classifai

🔥 One of the most comprehensive open-source data annotation platforms.

Stars: ✭ 99 (-13.91%)

Mutual labels: big-data

streamsx.kafka

Repository for integration with Apache Kafka

Stars: ✭ 13 (-88.7%)

Mutual labels: apache-spark

airavata-php-gateway

Mirror of Apache Airavata PHP Gateway

Stars: ✭ 15 (-86.96%)

Mutual labels: big-data

net.jgp.books.spark.ch01

Spark in Action, 2nd edition - chapter 1 - Introduction

Stars: ✭ 72 (-37.39%)

Mutual labels: apache-spark

Location-based-Restaurants-Recommendation-System

Big Data Management and Analysis Final Project

Stars: ✭ 44 (-61.74%)

Mutual labels: apache-spark

azure-big-data-starter

A boilerplate project for Azure Big Data PaaS services

Stars: ✭ 13 (-88.7%)

Mutual labels: big-data

soda-spark

Soda Spark is a PySpark library that helps you test your data in Spark DataFrames

Stars: ✭ 58 (-49.57%)

Mutual labels: pyspark

spark-utils

Basic framework utilities to quickly start writing production-ready Apache Spark applications

Stars: ✭ 25 (-78.26%)

Mutual labels: apache-spark

predictionio-template-ecom-recommender

PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 73 (-36.52%)

Mutual labels: big-data

beam-site

Apache Beam Site

Stars: ✭ 28 (-75.65%)

Mutual labels: big-data

DataEngineering

This repo contains commands that data engineers use in day-to-day work.

Stars: ✭ 47 (-59.13%)

Mutual labels: pyspark

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+1952.17%)

Mutual labels: big-data

FIW KRT

Families In the Wild: A Kinship Recognition Toolbox.

Stars: ✭ 18 (-84.35%)

Mutual labels: big-data

predictionio-sdk-python

PredictionIO Python SDK

Stars: ✭ 199 (+73.04%)

Mutual labels: big-data

ceja

PySpark phonetic and string matching algorithms

Stars: ✭ 24 (-79.13%)

Mutual labels: pyspark

fink-broker

Astronomy Broker based on Apache Spark

Stars: ✭ 18 (-84.35%)

Mutual labels: apache-spark

bullet-core

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Stars: ✭ 36 (-68.7%)

Mutual labels: big-data

scarf

Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.

Stars: ✭ 54 (-53.04%)

Mutual labels: big-data

accumulo-testing

Apache Accumulo Testing

Stars: ✭ 14 (-87.83%)

Mutual labels: big-data

predictionio-sdk-java

PredictionIO Java SDK

Stars: ✭ 107 (-6.96%)

Mutual labels: big-data

predictionio

PredictionIO, a machine learning server for developers and ML engineers.

Stars: ✭ 12,510 (+10778.26%)

Mutual labels: big-data

LoL-Match-Prediction

Win probability predictions for League of Legends matches using neural networks

Stars: ✭ 34 (-70.43%)

Mutual labels: big-data

shifting

A privacy-focused list of alternatives to mainstream services to help the competition.

Stars: ✭ 31 (-73.04%)

Mutual labels: big-data

net.jgp.books.spark.ch07

Spark in Action, 2nd edition - chapter 7 - Ingestion from files

Stars: ✭ 13 (-88.7%)

Mutual labels: apache-spark

parquet-dotnet

🐬 Apache Parquet for modern .Net

Stars: ✭ 199 (+73.04%)

Mutual labels: apache-spark

Social-Network-Analysis-in-Python

Social Network Facebook Analysis (Python, Networkx)

Stars: ✭ 26 (-77.39%)

Mutual labels: big-data

IoT-system-PLC-data-to-InfluxDB

The aim of this project is to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack used is completely open source. I used InfluxDB for data storage, so the application follows the Big Data paradigm.

Stars: ✭ 26 (-77.39%)

Mutual labels: big-data

predictionio-sdk-ruby

PredictionIO Ruby SDK

Stars: ✭ 192 (+66.96%)

Mutual labels: big-data

bftkv

A distributed key-value store that tolerates Byzantine faults.

Stars: ✭ 27 (-76.52%)

Mutual labels: big-data

spark-connector

A connector for Apache Spark to access Exasol

Stars: ✭ 13 (-88.7%)

Mutual labels: apache-spark

pyspark-for-data-processing

Code for my presentation: Using PySpark to Process Boat Loads of Data

Stars: ✭ 20 (-82.61%)

Mutual labels: pyspark

DaFlow

Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-79.13%)

Mutual labels: apache-spark

yildiz

🦄🌟 Graph Database layer on top of Google Bigtable

Stars: ✭ 24 (-79.13%)

Mutual labels: big-data

spark-root

Apache Spark Data Source for ROOT File Format

Stars: ✭ 28 (-75.65%)

Mutual labels: big-data

spark-dgraph-connector

A connector for Apache Spark and PySpark to Dgraph databases.