154 open source projects that are alternatives to or similar to Petastorm

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (-63.36%)

Mutual labels: pyspark, parquet

databricks-notebooks

Collection of Databricks and Jupyter Notebooks

Stars: ✭ 19 (-98.29%)

Mutual labels: pyspark, parquet

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-89.98%)

Mutual labels: pyspark

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (-69.04%)

Mutual labels: parquet

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-97.65%)

Mutual labels: pyspark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-96.93%)

Mutual labels: pyspark

pyspark-asyncactions

Asynchronous actions for PySpark

Stars: ✭ 30 (-97.29%)

Mutual labels: pyspark

Spark-and-Kafka IoT-Data-Processing-and-Analytics

Final project for the IoT: Big Data Processing and Analytics class. Analyzing nationwide U.S. temperature data from IoT sensors in real time

Stars: ✭ 42 (-96.21%)

Mutual labels: pyspark

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (-97.83%)

Mutual labels: pyspark

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (-89.62%)

Mutual labels: pyspark

Spark Gotchas

Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks

Stars: ✭ 308 (-72.2%)

Mutual labels: pyspark

lineage

Generate beautiful documentation for your data pipelines in markdown format

Stars: ✭ 16 (-98.56%)

Mutual labels: pyspark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-97.74%)

Mutual labels: pyspark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (-42.87%)

Mutual labels: pyspark

dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

Stars: ✭ 30 (-97.29%)

Mutual labels: parquet

Live log analyzer spark

Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.

Stars: ✭ 14 (-98.74%)

Mutual labels: pyspark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-97.11%)

Mutual labels: pyspark

Skale

High performance distributed data processing engine

Stars: ✭ 390 (-64.8%)

Mutual labels: parquet

ai-deployment

Focuses on bringing AI models online and on model deployment

Stars: ✭ 149 (-86.55%)

Mutual labels: pyspark

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (-9.12%)

Mutual labels: parquet

meepo

Data migration across heterogeneous storage systems

Stars: ✭ 29 (-97.38%)

Mutual labels: parquet

Pystore

Fast data store for Pandas time-series data

Stars: ✭ 325 (-70.67%)

Mutual labels: parquet

dlsa

Distributed least squares approximation (dlsa) implemented with Apache Spark

Stars: ✭ 25 (-97.74%)

Mutual labels: pyspark

Cluster Pack

A library on top of either pex or conda-pack to make your Python code easily available on a cluster

Stars: ✭ 23 (-97.92%)

Mutual labels: pyspark

kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…

Stars: ✭ 474 (-57.22%)

Mutual labels: pyspark

Ratatool

A tool for data sampling, data generation, and data diffing

Stars: ✭ 279 (-74.82%)

Mutual labels: parquet

Springboard-Data-Science-Immersive

No description or website provided.

Stars: ✭ 52 (-95.31%)

Mutual labels: pyspark

Roapi

Create full-fledged APIs for static datasets without writing a single line of code.

Stars: ✭ 253 (-77.17%)

Mutual labels: parquet

Scriptis

Scriptis is for interactive data analysis, with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.

Stars: ✭ 696 (-37.18%)

Mutual labels: pyspark

mmtf-workshop-2018

Structural Bioinformatics Training Workshop & Hackathon 2018

Stars: ✭ 50 (-95.49%)

Mutual labels: pyspark

Pucket

Bucketing and partitioning system for Parquet

Stars: ✭ 29 (-97.38%)

Mutual labels: parquet

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-97.74%)

Mutual labels: pyspark

Spark Syntax

This is a repo documenting the best practices in PySpark.

Stars: ✭ 412 (-62.82%)

Mutual labels: pyspark

Node Parquet

NodeJS module to access apache parquet format files

Stars: ✭ 46 (-95.85%)

Mutual labels: parquet

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+121.93%)

Mutual labels: pyspark

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (-64.53%)

Mutual labels: parquet

HybridBackend

Efficient training of deep recommenders on cloud.

Stars: ✭ 30 (-97.29%)

Mutual labels: parquet

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-98.92%)

Mutual labels: pyspark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-95.49%)

Mutual labels: pyspark

Choetl

ETL framework for .NET / C# (parser / writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)

Stars: ✭ 372 (-66.43%)

Mutual labels: parquet

Azure-Databricks-NYC-Taxi-Workshop

An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset

Stars: ✭ 71 (-93.59%)

Mutual labels: pyspark

Gcs Tools

GCS support for avro-tools, parquet-tools and protobuf

Stars: ✭ 57 (-94.86%)

Mutual labels: parquet

centurion

Kotlin Bigdata Toolkit

Stars: ✭ 320 (-71.12%)

Mutual labels: parquet

Parquet Cpp

Apache Parquet

Stars: ✭ 339 (-69.4%)

Mutual labels: parquet

experiments

Code examples for my blog posts

Stars: ✭ 21 (-98.1%)

Mutual labels: parquet

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-97.92%)

Mutual labels: pyspark

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-96.93%)

Mutual labels: pyspark

Pyspark Boilerplate

A boilerplate for writing PySpark Jobs

Stars: ✭ 318 (-71.3%)

Mutual labels: pyspark

machine-learning-course

Machine Learning Course @ Santa Clara University

Stars: ✭ 17 (-98.47%)

Mutual labels: pyspark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (-11.01%)

Mutual labels: pyspark

DataEngineering

This repo contains commands that data engineers use in day to day work.

Stars: ✭ 47 (-95.76%)

Mutual labels: pyspark

Elasticsearch loader

A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch

Stars: ✭ 300 (-72.92%)

Mutual labels: parquet

graphique

GraphQL service for arrow tables and parquet data sets.

Stars: ✭ 28 (-97.47%)

Mutual labels: parquet

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-98.56%)

Mutual labels: parquet

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-98.47%)

Mutual labels: pyspark

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (-75.09%)

Mutual labels: parquet

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-94.77%)

Mutual labels: parquet

Awesome Spark

A curated list of awesome Apache Spark packages and resources.

Stars: ✭ 1,061 (-4.24%)

Mutual labels: pyspark

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (-13.9%)

Mutual labels: pyspark

Parquet Format

Apache Parquet

Stars: ✭ 800 (-27.8%)

Mutual labels: parquet

1-60 of 154 similar projects
