420 open source projects that are alternatives to or similar to Parquet Format

terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

Stars: ✭ 25 (-96.87%)

Mutual labels: big-data, parquet

Parquetviewer

Simple Windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (-81.87%)

Mutual labels: big-data, parquet

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+105.25%)

Mutual labels: big-data, parquet

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (-82.5%)

Mutual labels: big-data, parquet

Amazon S3 Find And Forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

Stars: ✭ 115 (-85.62%)

Mutual labels: big-data, parquet

Parquet Cpp

Apache Parquet

Stars: ✭ 339 (-57.62%)

Mutual labels: big-data, parquet

Parquet Mr

Apache Parquet

Stars: ✭ 1,278 (+59.75%)

Mutual labels: big-data, parquet

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (-77.88%)

Mutual labels: big-data, parquet

Drill

Apache Drill is a distributed MPP query layer for self-describing data

Stars: ✭ 1,619 (+102.38%)

Mutual labels: big-data, parquet

Awkward 0.x

Manipulate arrays of complex data structures as easily as Numpy.

Stars: ✭ 216 (-73%)

Mutual labels: big-data, parquet

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (-65.5%)

Mutual labels: big-data, parquet

Nipype

Workflows and interfaces for neuroimaging packages

Stars: ✭ 557 (-30.37%)

Mutual labels: big-data

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (-46.75%)

Mutual labels: big-data

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (-48.25%)

Mutual labels: big-data

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (-49.25%)

Mutual labels: parquet

Sdc

Intel® Scalable Dataframe Compiler for Pandas*

Stars: ✭ 623 (-22.12%)

Mutual labels: big-data

Thrill

Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Stars: ✭ 528 (-34%)

Mutual labels: big-data

Mockneat

MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.

Stars: ✭ 410 (-48.75%)

Mutual labels: big-data

Kafka Connect Hdfs

Kafka Connect HDFS connector

Stars: ✭ 400 (-50%)

Mutual labels: big-data

Beam

Apache Beam is a unified programming model for batch and streaming

Stars: ✭ 5,149 (+543.63%)

Mutual labels: big-data

Orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (-51.37%)

Mutual labels: big-data

Ignite

Apache Ignite

Stars: ✭ 4,027 (+403.38%)

Mutual labels: big-data

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+723.5%)

Mutual labels: big-data

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+607%)

Mutual labels: big-data

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (-36.62%)

Mutual labels: big-data

Choetl

ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

Stars: ✭ 372 (-53.5%)

Mutual labels: parquet

Circosjs

D3 library to build circular graphs

Stars: ✭ 436 (-45.5%)

Mutual labels: big-data

Pachyderm

Reproducible Data Science at Scale!

Stars: ✭ 5,305 (+563.13%)

Mutual labels: big-data

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (-47.5%)

Mutual labels: big-data

Data Science Career

Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository

Stars: ✭ 630 (-21.25%)

Mutual labels: big-data

Opendata.cern.ch

Source code for the CERN Open Data portal

Stars: ✭ 411 (-48.62%)

Mutual labels: big-data

Couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability

Stars: ✭ 5,166 (+545.75%)

Mutual labels: big-data

Cogcomp Nlp

CogComp's Natural Language Processing libraries and demos

Stars: ✭ 410 (-48.75%)

Mutual labels: big-data

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (-6.87%)

Mutual labels: big-data

Decentralized Internet

An SDK/library for decentralized web and distributed computing projects

Stars: ✭ 406 (-49.25%)

Mutual labels: big-data

Arkime

Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.

Stars: ✭ 4,994 (+524.25%)

Mutual labels: big-data

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (-50.87%)

Mutual labels: parquet

Kafka Streams

Equivalent to kafka-streams 🐙 for Node.js ✨🐢🚀✨

Stars: ✭ 613 (-23.37%)

Mutual labels: big-data

Skale

High performance distributed data processing engine

Stars: ✭ 390 (-51.25%)

Mutual labels: parquet

Onlinestats.jl

Single-pass algorithms for statistics

Stars: ✭ 507 (-36.62%)

Mutual labels: big-data

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+376.63%)

Mutual labels: big-data

Rakam Api

📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)

Stars: ✭ 772 (-3.5%)

Mutual labels: big-data

Pgm Index

🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

Stars: ✭ 499 (-37.62%)

Mutual labels: big-data

Hive

Apache Hive

Stars: ✭ 4,031 (+403.88%)

Mutual labels: big-data

Halodb

A fast, log structured key-value store.

Stars: ✭ 370 (-53.75%)

Mutual labels: big-data

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (-54.87%)

Mutual labels: big-data

Oozie

Mirror of Apache Oozie

Stars: ✭ 602 (-24.75%)

Mutual labels: big-data

Stream Framework

Stream Framework is a Python library that lets you build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.

Stars: ✭ 4,576 (+472%)

Mutual labels: big-data

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (-54.75%)

Mutual labels: big-data

Sylph

Stream computing platform for big data

Stars: ✭ 362 (-54.75%)

Mutual labels: big-data

Fit Sne

Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)