
154 open-source projects that are alternatives to or similar to Petastorm

Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-63.36%)
Mutual labels:  pyspark, parquet
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-98.29%)
Mutual labels:  pyspark, parquet
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-89.98%)
Mutual labels:  pyspark
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-69.04%)
Mutual labels:  parquet
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-97.65%)
Mutual labels:  pyspark
data-algorithms-with-spark
O'Reilly book: Data Algorithms with Spark, by Mahmoud Parsian
Stars: ✭ 34 (-96.93%)
Mutual labels:  pyspark
pyspark-asyncactions
Asynchronous actions for PySpark
Stars: ✭ 30 (-97.29%)
Mutual labels:  pyspark
Spark-and-Kafka IoT-Data-Processing-and-Analytics
Final project for the IoT: Big Data Processing and Analytics class. Analyzes U.S. nationwide temperature data from IoT sensors in real time.
Stars: ✭ 42 (-96.21%)
Mutual labels:  pyspark
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-97.83%)
Mutual labels:  pyspark
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (-89.62%)
Mutual labels:  pyspark
Spark Gotchas
Spark Gotchas: a subjective compilation of Apache Spark tips and tricks
Stars: ✭ 308 (-72.2%)
Mutual labels:  pyspark
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-98.56%)
Mutual labels:  pyspark
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-97.74%)
Mutual labels:  pyspark
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-42.87%)
Mutual labels:  pyspark
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (-97.29%)
Mutual labels:  parquet
Live log analyzer spark
Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-98.74%)
Mutual labels:  pyspark
kafka-compose
🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-97.11%)
Mutual labels:  pyspark
Skale
High performance distributed data processing engine
Stars: ✭ 390 (-64.8%)
Mutual labels:  parquet
ai-deployment
Focused on putting AI models into production and model deployment
Stars: ✭ 149 (-86.55%)
Mutual labels:  pyspark
Quilt
Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (-9.12%)
Mutual labels:  parquet
meepo
Data migration between heterogeneous storage systems
Stars: ✭ 29 (-97.38%)
Mutual labels:  parquet
Pystore
Fast data store for Pandas time-series data
Stars: ✭ 325 (-70.67%)
Mutual labels:  parquet
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-97.74%)
Mutual labels:  pyspark
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-97.92%)
Mutual labels:  pyspark
kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (-57.22%)
Mutual labels:  pyspark
Ratatool
A tool for data sampling, data generation, and data diffing
Stars: ✭ 279 (-74.82%)
Mutual labels:  parquet
Springboard-Data-Science-Immersive
No description or website provided.
Stars: ✭ 52 (-95.31%)
Mutual labels:  pyspark
Roapi
Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (-77.17%)
Mutual labels:  parquet
Scriptis
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDFs, functions, resource management, and intelligent diagnosis.
Stars: ✭ 696 (-37.18%)
Mutual labels:  pyspark
mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-95.49%)
Mutual labels:  pyspark
Pucket
Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-97.38%)
Mutual labels:  parquet
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-97.74%)
Mutual labels:  pyspark
Spark Syntax
This is a repo documenting the best practices in PySpark.
Stars: ✭ 412 (-62.82%)
Mutual labels:  pyspark
Node Parquet
Node.js module for accessing Apache Parquet format files
Stars: ✭ 46 (-95.85%)
Mutual labels:  parquet
incubator-linkis
Linkis helps you easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), and offers multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+121.93%)
Mutual labels:  pyspark
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (-64.53%)
Mutual labels:  parquet
HybridBackend
Efficient training of deep recommenders on cloud.
Stars: ✭ 30 (-97.29%)
Mutual labels:  parquet
Sparkling Titanic
Training models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-98.92%)
Mutual labels:  pyspark
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (-95.49%)
Mutual labels:  pyspark
Choetl
ETL framework for .NET / C# (parser/writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)
Stars: ✭ 372 (-66.43%)
Mutual labels:  parquet
Azure-Databricks-NYC-Taxi-Workshop
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Stars: ✭ 71 (-93.59%)
Mutual labels:  pyspark
Gcs Tools
GCS support for avro-tools, parquet-tools and protobuf
Stars: ✭ 57 (-94.86%)
Mutual labels:  parquet
centurion
Kotlin Bigdata Toolkit
Stars: ✭ 320 (-71.12%)
Mutual labels:  parquet
Parquet Cpp
Apache Parquet
Stars: ✭ 339 (-69.4%)
Mutual labels:  parquet
experiments
Code examples for my blog posts
Stars: ✭ 21 (-98.1%)
Mutual labels:  parquet
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (-97.92%)
Mutual labels:  pyspark
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-96.93%)
Mutual labels:  pyspark
Pyspark Boilerplate
A boilerplate for writing PySpark Jobs
Stars: ✭ 318 (-71.3%)
Mutual labels:  pyspark
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-98.47%)
Mutual labels:  pyspark
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (-11.01%)
Mutual labels:  pyspark
DataEngineering
This repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (-95.76%)
Mutual labels:  pyspark
Elasticsearch loader
A tool for batch loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch
Stars: ✭ 300 (-72.92%)
Mutual labels:  parquet
graphique
GraphQL service for Arrow tables and Parquet datasets.
Stars: ✭ 28 (-97.47%)
Mutual labels:  parquet
Parquet Generator
Parquet file generator
Stars: ✭ 16 (-98.56%)
Mutual labels:  parquet
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-98.47%)
Mutual labels:  pyspark
Parquet Dotnet
🏐 Apache Parquet for modern .NET
Stars: ✭ 276 (-75.09%)
Mutual labels:  parquet
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-94.77%)
Mutual labels:  parquet
Awesome Spark
A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (-4.24%)
Mutual labels:  pyspark
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (-13.9%)
Mutual labels:  pyspark
Parquet Format
Apache Parquet
Stars: ✭ 800 (-27.8%)
Mutual labels:  parquet