420 open source projects that are alternatives to or similar to Parquet Mr

Apache Parquet

Stars: ✭ 339 (-73.47%)

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL

Stars: ✭ 177 (-86.15%)

Mutual labels: big-data, parquet

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (-89.05%)

Mutual labels: big-data, parquet

Parquetviewer

Simple Windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (-88.65%)

Mutual labels: big-data, parquet

Drill

Apache Drill is a distributed MPP query layer for self-describing data

Stars: ✭ 1,619 (+26.68%)

Mutual labels: big-data, parquet

terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

Stars: ✭ 25 (-98.04%)

Mutual labels: big-data, parquet

Amazon S3 Find And Forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

Stars: ✭ 115 (-91%)

Mutual labels: big-data, parquet

Awkward 0.x

Manipulate arrays of complex data structures as easily as NumPy.

Stars: ✭ 216 (-83.1%)

Mutual labels: big-data, parquet

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+28.48%)

Mutual labels: big-data, parquet

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (-78.4%)

Mutual labels: big-data, parquet

Parquet Format

Apache Parquet

Stars: ✭ 800 (-37.4%)

Mutual labels: big-data, parquet

Carbondata

Mirror of Apache CarbonData

Stars: ✭ 1,158 (-9.39%)

Mutual labels: big-data

Lifion Kinesis

A native Node.js producer and consumer library for Amazon Kinesis Data Streams

Stars: ✭ 54 (-95.77%)

Mutual labels: big-data

Oodt

Mirror of Apache OODT

Stars: ✭ 52 (-95.93%)

Mutual labels: big-data

Trck

Query engine for TrailDB

Stars: ✭ 48 (-96.24%)

Mutual labels: big-data

Spark Website

Apache Spark Website

Stars: ✭ 75 (-94.13%)

Mutual labels: big-data

Flink Shaded

Apache Flink shaded artifacts repository

Stars: ✭ 67 (-94.76%)

Mutual labels: big-data

Traildb

TrailDB is an efficient tool for storing and querying series of events

Stars: ✭ 1,029 (-19.48%)

Mutual labels: big-data

Couchdb Couch

Mirror of Apache CouchDB

Stars: ✭ 43 (-96.64%)

Mutual labels: big-data

Cloud Volume

Read and write Neuroglancer datasets programmatically.

Stars: ✭ 63 (-95.07%)

Mutual labels: big-data

Attaca

Robust, distributed version control for large files.

Stars: ✭ 41 (-96.79%)

Mutual labels: big-data

Analysispreservation.cern.ch

Source code for the CERN Analysis Preservation portal

Stars: ✭ 37 (-97.1%)

Mutual labels: big-data

Uproot4

ROOT I/O in pure Python and NumPy.

Stars: ✭ 80 (-93.74%)

Mutual labels: big-data

Labs

Research on distributed systems

Stars: ✭ 73 (-94.29%)

Mutual labels: big-data

Warp

Convert and analyze large data sets at light speed, on Mac and iOS.

Stars: ✭ 62 (-95.15%)

Mutual labels: big-data

Metrics

Measure behavior of Java applications

Stars: ✭ 35 (-97.26%)

Mutual labels: big-data

Kibble 1

Apache Kibble - a tool to collect, aggregate and visualize data about any software project

Stars: ✭ 54 (-95.77%)

Mutual labels: big-data

Countly Sdk Cordova

Countly Product Analytics SDK for Cordova, Icenium and Phonegap

Stars: ✭ 69 (-94.6%)

Mutual labels: big-data

Macro ml

Course Website on Macroeconomic Analysis with Machine Learning and Big Data

Stars: ✭ 53 (-95.85%)

Mutual labels: big-data

Attic Predictionio Template Recommender

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 78 (-93.9%)

Mutual labels: big-data

Datumbox Framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Stars: ✭ 1,063 (-16.82%)

Mutual labels: big-data

Hazelcast Cpp Client

Hazelcast IMDG C++ Client

Stars: ✭ 67 (-94.76%)

Mutual labels: big-data

Node Parquet

Node.js module to access Apache Parquet format files

Stars: ✭ 46 (-96.4%)

Mutual labels: parquet

Panoptes

A Global Scale Network Telemetry Ecosystem

Stars: ✭ 80 (-93.74%)

Mutual labels: big-data

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (-19.8%)

Mutual labels: big-data

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-94.91%)

Mutual labels: big-data

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (-21.21%)

Mutual labels: parquet

Cookbook

The Data Engineering Cookbook

Stars: ✭ 9,829 (+669.09%)

Mutual labels: big-data

Egads

A Java package to automatically detect anomalies in large scale time-series data

Stars: ✭ 997 (-21.99%)

Mutual labels: big-data

Spark Doc Zh

Chinese edition of the official Apache Spark documentation

Stars: ✭ 1,126 (-11.89%)

Mutual labels: big-data

Esper Tv

Esper instance for TV news analysis

Stars: ✭ 37 (-97.1%)

Mutual labels: big-data

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-93.27%)

Mutual labels: parquet

Nabhash

An extremely fast, non-crypto-safe, AES-based hash algorithm for big data

Stars: ✭ 62 (-95.15%)

Mutual labels: big-data

Predictionio Template Text Classifier

Text Classification Engine

Stars: ✭ 30 (-97.65%)

Mutual labels: big-data

Pucket

Bucketing and partitioning system for Parquet

Stars: ✭ 29 (-97.73%)

Mutual labels: parquet

Skymap

High-throughput gene-to-knowledge mapping through massive integration of public sequencing data.

Stars: ✭ 29 (-97.73%)

Mutual labels: big-data

Bookkeeper

Apache BookKeeper

Stars: ✭ 1,178 (-7.82%)

Mutual labels: big-data

Petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Stars: ✭ 1,108 (-13.3%)

Mutual labels: parquet

Qcportal

A client interface to the QCArchive Project (read-only image of QCFractal)

Stars: ✭ 29 (-97.73%)

Mutual labels: big-data

Awesome Scalability

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

Stars: ✭ 36,688 (+2770.74%)

Mutual labels: big-data

Verticapy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-95.38%)

Mutual labels: big-data

Spark

Apache Spark - A unified analytics engine for large-scale data processing

Stars: ✭ 31,618 (+2374.02%)

Mutual labels: big-data

K8s Ingress Claim

An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.

Stars: ✭ 14 (-98.9%)

Mutual labels: big-data

Iotdb

Apache IoTDB