Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (-80.67%)

Mutual labels: big-data

bigtable

TypeScript Bigtable Client with 🔋🔋 included.

Stars: ✭ 13 (-98.98%)

Mutual labels: big-data

Aws Etl Orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

Stars: ✭ 245 (-80.83%)

Mutual labels: big-data

Panoptes

A Global Scale Network Telemetry Ecosystem

Stars: ✭ 80 (-93.74%)

Mutual labels: big-data

Kafka Ui

Open-Source Web GUI for Apache Kafka Management

Stars: ✭ 230 (-82%)

Mutual labels: big-data

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-98.9%)

Mutual labels: big-data

Eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Stars: ✭ 235 (-81.61%)

Mutual labels: big-data

Oozie

Mirror of Apache Oozie

Stars: ✭ 602 (-52.9%)

Mutual labels: big-data

Lite Virtual List

Virtual list component library based on Vue, with support for waterfall flow layouts

Stars: ✭ 223 (-82.55%)

Mutual labels: big-data

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-91.31%)

Mutual labels: big-data

Usql

U-SQL Examples and Issue Tracking

Stars: ✭ 221 (-82.71%)

Mutual labels: big-data

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (-19.8%)

Mutual labels: big-data

Real Time Social Media Mining

DevOps pipeline for Real Time Social/Web Mining

Stars: ✭ 22 (-98.28%)

Mutual labels: big-data

predictionio-template-java-ecom-recommender

PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine)

Stars: ✭ 36 (-97.18%)

Mutual labels: big-data

Helicalinsight

Helical Insight software is the world’s first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.

Stars: ✭ 214 (-83.26%)

Mutual labels: big-data

Giraph

Mirror of Apache Giraph

Stars: ✭ 569 (-55.48%)

Mutual labels: big-data

Attic Predictionio Sdk Python

PredictionIO Python SDK

Stars: ✭ 196 (-84.66%)

Mutual labels: big-data

ibmpairs

Open source tools for interaction with IBM PAIRS

Stars: ✭ 23 (-98.2%)

Mutual labels: big-data

Data Science Live Book

An open source book to learn data science, data analysis and machine learning, suitable for all ages!

Stars: ✭ 193 (-84.9%)

Mutual labels: big-data

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-94.91%)

Mutual labels: big-data

Gun

An open source cybersecurity protocol for syncing decentralized graph data.

Stars: ✭ 15,172 (+1087.17%)

Mutual labels: big-data

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (-92.88%)

Mutual labels: big-data

Pachyderm

Reproducible Data Science at Scale!

Stars: ✭ 5,305 (+315.1%)

Mutual labels: big-data

Pretzel

JavaScript full-stack framework for Big Data visualisation and analysis

Stars: ✭ 26 (-97.97%)

Mutual labels: big-data

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (-73.16%)

Mutual labels: parquet

GDLibrary

MATLAB library for gradient descent algorithms: Version 1.0.1

Stars: ✭ 50 (-96.09%)

Mutual labels: big-data

Dvid

Distributed, Versioned, Image-oriented Dataservice

Stars: ✭ 174 (-86.38%)

Mutual labels: big-data

AverageShiftedHistograms.jl

⚡ Lightning fast density estimation in Julia ⚡

Stars: ✭ 52 (-95.93%)

Mutual labels: big-data

Attic Predictionio

PredictionIO, a machine learning server for developers and ML engineers.

Stars: ✭ 12,522 (+879.81%)

Mutual labels: big-data

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (-21.21%)

Mutual labels: parquet

Keyvi

Keyvi - the key value index. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

Stars: ✭ 161 (-87.4%)

Mutual labels: big-data

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-98.67%)

Mutual labels: big-data

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+913.85%)

Mutual labels: big-data

Couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability

Stars: ✭ 5,166 (+304.23%)

Mutual labels: big-data

Spark.jl

Julia binding for Apache Spark

Stars: ✭ 153 (-88.03%)

Mutual labels: big-data

alluxio-py

Alluxio Python client - Access Any Data Source with Python

Stars: ✭ 18 (-98.59%)

Mutual labels: big-data

Fili

Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL Databases.

Stars: ✭ 151 (-88.18%)

Mutual labels: big-data

Cookbook

The Data Engineering Cookbook

Stars: ✭ 9,829 (+669.09%)

Mutual labels: big-data

centurion

Kotlin Bigdata Toolkit

Stars: ✭ 320 (-74.96%)

Mutual labels: parquet

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (-88.73%)

Mutual labels: big-data

Arkime

Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.

Stars: ✭ 4,994 (+290.77%)

Mutual labels: big-data

Storm Doc Zh

Chinese translation of the official Apache Storm documentation

Stars: ✭ 142 (-88.89%)

Mutual labels: big-data

meepo

Data migration across heterogeneous storage systems

Stars: ✭ 29 (-97.73%)

Mutual labels: parquet

Egads

A Java package to automatically detect anomalies in large scale time-series data

Stars: ✭ 997 (-21.99%)

Mutual labels: big-data

airavata-django-portal

Mirror of Apache Airavata Django Portal

Stars: ✭ 20 (-98.44%)

Mutual labels: big-data

Stroom

Stroom is a highly scalable data storage, processing and analysis platform.

Stars: ✭ 344 (-73.08%)

Mutual labels: big-data

lcbo-api

A crawler and API server for Liquor Control Board of Ontario retail data

Stars: ✭ 152 (-88.11%)

Mutual labels: big-data

hotmap

WebGL Heatmap Viewer for Big Data and Bioinformatics

Stars: ✭ 13 (-98.98%)

Mutual labels: big-data

Dataengineeringproject

Example end-to-end data engineering project.

Stars: ✭ 82 (-93.58%)

Mutual labels: big-data

Sparksql Protobuf

Read SparkSQL parquet file as RDD[Protobuf]

Stars: ✭ 82 (-93.58%)

Mutual labels: parquet

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-93.82%)

Mutual labels: big-data

Appdocs

Application Performance Optimization Summary

Stars: ✭ 1,169 (-8.53%)

Mutual labels: big-data

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-95.54%)

Mutual labels: big-data

301-360 of 420 similar projects
