420 open source projects that are alternatives to or similar to Parquet Format

terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
Stars: ✭ 25 (-96.87%)
Mutual labels:  big-data, parquet
Parquetviewer
Simple Windows desktop application for viewing and querying Apache Parquet files
Stars: ✭ 145 (-81.87%)
Mutual labels:  big-data, parquet
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+105.25%)
Mutual labels:  big-data, parquet
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-82.5%)
Mutual labels:  big-data, parquet
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-85.62%)
Mutual labels:  big-data, parquet
Parquet Cpp
Apache Parquet
Stars: ✭ 339 (-57.62%)
Mutual labels:  big-data, parquet
Parquet Mr
Apache Parquet
Stars: ✭ 1,278 (+59.75%)
Mutual labels:  big-data, parquet
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-77.88%)
Mutual labels:  big-data, parquet
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+102.38%)
Mutual labels:  big-data, parquet
Awkward 0.x
Manipulate arrays of complex data structures as easily as Numpy.
Stars: ✭ 216 (-73%)
Mutual labels:  big-data, parquet
Parquet Dotnet
🏐 Apache Parquet for modern .NET
Stars: ✭ 276 (-65.5%)
Mutual labels:  big-data, parquet
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-30.37%)
Mutual labels:  big-data
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-46.75%)
Mutual labels:  big-data
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (-48.25%)
Mutual labels:  big-data
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-49.25%)
Mutual labels:  parquet
Sdc
Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (-22.12%)
Mutual labels:  big-data
Thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (-34%)
Mutual labels:  big-data
Mockneat
MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.
Stars: ✭ 410 (-48.75%)
Mutual labels:  big-data
Kafka Connect Hdfs
Kafka Connect HDFS connector
Stars: ✭ 400 (-50%)
Mutual labels:  big-data
Beam
Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+543.63%)
Mutual labels:  big-data
Orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-51.37%)
Mutual labels:  big-data
Ignite
Apache Ignite
Stars: ✭ 4,027 (+403.38%)
Mutual labels:  big-data
Cython
The most widely used Python to C compiler
Stars: ✭ 6,588 (+723.5%)
Mutual labels:  big-data
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+607%)
Mutual labels:  big-data
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (-36.62%)
Mutual labels:  big-data
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Stars: ✭ 372 (-53.5%)
Mutual labels:  parquet
Circosjs
d3 library to build circular graphs
Stars: ✭ 436 (-45.5%)
Mutual labels:  big-data
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+563.13%)
Mutual labels:  big-data
Listenbrainz Server
Server for the ListenBrainz project
Stars: ✭ 420 (-47.5%)
Mutual labels:  big-data
Data Science Career
Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (-21.25%)
Mutual labels:  big-data
Opendata.cern.ch
Source code for the CERN Open Data portal
Stars: ✭ 411 (-48.62%)
Mutual labels:  big-data
Couchdb
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Stars: ✭ 5,166 (+545.75%)
Mutual labels:  big-data
Cogcomp Nlp
CogComp's Natural Language Processing libraries and demos
Stars: ✭ 410 (-48.75%)
Mutual labels:  big-data
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-6.87%)
Mutual labels:  big-data
Decentralized Internet
An SDK/library for decentralized web and distributed computing projects
Stars: ✭ 406 (-49.25%)
Mutual labels:  big-data
Arkime
Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (+524.25%)
Mutual labels:  big-data
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (-50.87%)
Mutual labels:  parquet
Kafka Streams
equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (-23.37%)
Mutual labels:  big-data
Skale
High performance distributed data processing engine
Stars: ✭ 390 (-51.25%)
Mutual labels:  parquet
Onlinestats.jl
Single-pass algorithms for statistics
Stars: ✭ 507 (-36.62%)
Mutual labels:  big-data
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+376.63%)
Mutual labels:  big-data
Rakam Api
📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Stars: ✭ 772 (-3.5%)
Mutual labels:  big-data
Pgm Index
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-37.62%)
Mutual labels:  big-data
Hive
Apache Hive
Stars: ✭ 4,031 (+403.88%)
Mutual labels:  big-data
Halodb
A fast, log structured key-value store.
Stars: ✭ 370 (-53.75%)
Mutual labels:  big-data
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-54.87%)
Mutual labels:  big-data
Oozie
Mirror of Apache Oozie
Stars: ✭ 602 (-24.75%)
Mutual labels:  big-data
Stream Framework
Stream Framework is a Python library that allows you to build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.
Stars: ✭ 4,576 (+472%)
Mutual labels:  big-data
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-54.75%)
Mutual labels:  big-data
Sylph
Stream computing platform for big data
Stars: ✭ 362 (-54.75%)
Mutual labels:  big-data
Fit Sne
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (-39.37%)
Mutual labels:  big-data
Bigtop
Mirror of Apache Bigtop
Stars: ✭ 356 (-55.5%)
Mutual labels:  big-data
Vespa
The open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+368.38%)
Mutual labels:  big-data
Sciblog support
Support content for my blog
Stars: ✭ 694 (-13.25%)
Mutual labels:  big-data
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+589.13%)
Mutual labels:  big-data
Redislite
Redis in a Python module.
Stars: ✭ 464 (-42%)
Mutual labels:  big-data
Devops Roadmap
DevOps methodology & roadmap for a devops developer in 2019. Interesting books to learn new technologies.
Stars: ✭ 349 (-56.37%)
Mutual labels:  big-data
Attic Apex Core
Mirror of Apache Apex core
Stars: ✭ 346 (-56.75%)
Mutual labels:  big-data
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+482.75%)
Mutual labels:  big-data
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-57.12%)
Mutual labels:  parquet
1-60 of 420 similar projects