
154 open-source projects that are alternatives to or similar to Petastorm

Pyspark Learning
Updated repository
Stars: ✭ 147 (-86.73%)
Mutual labels:  pyspark
Repo 2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Stars: ✭ 133 (-88%)
Mutual labels:  pyspark
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-88.63%)
Mutual labels:  pyspark
Eat pyspark in 10 days
pyspark🍒🥭 is delicious, just eat it!😋😋
Stars: ✭ 116 (-89.53%)
Mutual labels:  pyspark
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-90.25%)
Mutual labels:  pyspark
Hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-90.25%)
Mutual labels:  pyspark
Pyspark Stubs
Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (-91.16%)
Mutual labels:  pyspark
Relation extraction
Relation extraction using deep learning (CNN)
Stars: ✭ 96 (-91.34%)
Mutual labels:  pyspark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+20.76%)
Mutual labels:  pyspark
Pyspark Tutorial
PySpark Code for Hands-on Learners
Stars: ✭ 91 (-91.79%)
Mutual labels:  pyspark
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-91.79%)
Mutual labels:  pyspark
Spark python ml examples
Spark 2.0 Python Machine Learning examples
Stars: ✭ 87 (-92.15%)
Mutual labels:  pyspark
W2v
Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-94.22%)
Mutual labels:  pyspark
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-94.31%)
Mutual labels:  pyspark
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (-77.89%)
Mutual labels:  parquet
Awkward 0.x
Manipulate arrays of complex data structures as easily as Numpy.
Stars: ✭ 216 (-80.51%)
Mutual labels:  parquet
Parquetjs
Fully asynchronous, pure JavaScript implementation of the Parquet file format
Stars: ✭ 200 (-81.95%)
Mutual labels:  parquet
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL
Stars: ✭ 177 (-84.03%)
Mutual labels:  parquet
Sqlite Parquet Vtable
A SQLite vtable extension to read Parquet files
Stars: ✭ 167 (-84.93%)
Mutual labels:  parquet
Parquetviewer
Simple Windows desktop application for viewing and querying Apache Parquet files
Stars: ✭ 145 (-86.91%)
Mutual labels:  parquet
Parquet Rs
Apache Parquet implementation in Rust
Stars: ✭ 144 (-87%)
Mutual labels:  parquet
Kartothek
A consistent table management library in python
Stars: ✭ 144 (-87%)
Mutual labels:  parquet
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-87.36%)
Mutual labels:  parquet
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+48.19%)
Mutual labels:  parquet
Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-88.72%)
Mutual labels:  parquet
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-89.62%)
Mutual labels:  parquet
Parquet Go
Go package to read and write Parquet files. Parquet is a file format that stores nested data structures in a flat columnar layout. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-89.71%)
Mutual labels:  parquet
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (-90.16%)
Mutual labels:  parquet
Kglab
Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.
Stars: ✭ 98 (-91.16%)
Mutual labels:  parquet
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-91.25%)
Mutual labels:  parquet
Parquet Mr
Apache Parquet
Stars: ✭ 1,278 (+15.34%)
Mutual labels:  parquet
Bigdata File Viewer
A cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats like Parquet, ORC, Avro, etc. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-92.24%)
Mutual labels:  parquet
Sparksql Protobuf
Read Spark SQL Parquet files as RDD[Protobuf]
Stars: ✭ 82 (-92.6%)
Mutual labels:  parquet
Drill
Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+46.12%)
Mutual labels:  parquet