osm-parquetizer: A converter for OSM PBFs to Parquet files
Stars: ✭ 71 (-38.26%)
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+1307.83%)
Cmak: CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+9068.7%)
cloud-integration: Spark cloud integration: tests, cloud committers and more
Stars: ✭ 20 (-82.61%)
sparklygraphs: Old repo for the R interface for GraphFrames
Stars: ✭ 13 (-88.7%)
Amazon S3 Find And Forget: Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (+0%)
Orc: An ORC file format reader and writer for Go.
Stars: ✭ 97 (-15.65%)
dislib: The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-66.09%)
siembol: An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+33.04%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-0.87%)
Treeviz: Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (-17.39%)
learning-hadoop-and-spark: Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+26.96%)
Just Dashboard: 📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1213.91%)
streamsx.kafka: Repository for integration with Apache Kafka
Stars: ✭ 13 (-88.7%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-30.43%)
Pythondata: Repo for code published on pythondata.com
Stars: ✭ 113 (-1.74%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-31.3%)
Bookkeeper: Apache BookKeeper
Stars: ✭ 1,178 (+924.35%)
soda-spark: Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (-49.57%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-40%)
spark-utils: Basic framework utilities to quickly start writing production-ready Apache Spark applications
Stars: ✭ 25 (-78.26%)
nebula: A distributed block-based data storage and compute engine
Stars: ✭ 127 (+10.43%)
Ambari: Mirror of Apache Ambari
Stars: ✭ 1,576 (+1270.43%)
Rsparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-43.48%)
beam-site: Apache Beam Site
Stars: ✭ 28 (-75.65%)
Clustering4Ever: C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
Stars: ✭ 126 (+9.57%)
DataEngineering: This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (-59.13%)
Attic Lens: Mirror of Apache Lens
Stars: ✭ 58 (-49.57%)
ceja: PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-79.13%)
Lifion Kinesis: A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-53.04%)
fink-broker: Astronomy Broker based on Apache Spark
Stars: ✭ 18 (-84.35%)
Oodt: Mirror of Apache OODT
Stars: ✭ 52 (-54.78%)
Trck: Query engine for TrailDB
Stars: ✭ 48 (-58.26%)
bullet-core: Bullet is a streaming query engine that can be plugged into any singular data stream using a stream processing framework like Apache Storm, Spark or Flink.
Stars: ✭ 36 (-68.7%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+791.3%)
scarf: Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (-53.04%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1242.61%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-4.35%)
spark: Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (+429.57%)
beekeeper: Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-62.61%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-5.22%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-6.96%)
merkle-db: High-scalability analytics database built on immutable Merkle trees
Stars: ✭ 44 (-61.74%)