Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.

Stars: ✭ 17,781 (+42.13%)

Mutual labels: big-data

ytpriv

YT metadata exporter

Stars: ✭ 28 (-99.78%)

Mutual labels: big-data

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (-63.38%)

Mutual labels: big-data

Hazelcast Nodejs Client

Hazelcast IMDG Node.js Client

Stars: ✭ 124 (-99.01%)

Mutual labels: big-data

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (-97.79%)

Mutual labels: big-data

img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Stars: ✭ 1,173 (-90.62%)

Mutual labels: big-data

Datahub

The Metadata Platform for the Modern Data Stack

Stars: ✭ 4,232 (-66.17%)

Mutual labels: big-data

Scala Spark Tutorial

Project for James' Apache Spark with Scala course

Stars: ✭ 121 (-99.03%)

Mutual labels: big-data

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (-76.83%)

Mutual labels: big-data

scikit-learn-intelex

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

Stars: ✭ 887 (-92.91%)

Mutual labels: big-data

bigstatsr

R package for statistical tools with big matrices stored on disk.

Stars: ✭ 139 (-98.89%)

Mutual labels: big-data

Hdfs Shell

HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS

Stars: ✭ 117 (-99.06%)

Mutual labels: big-data

insightedge

InsightEdge Core

Stars: ✭ 22 (-99.82%)

Mutual labels: big-data

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-99.89%)

Mutual labels: big-data

Cmak

CMAK is a tool for managing Apache Kafka clusters

Stars: ✭ 10,544 (-15.72%)

Mutual labels: big-data

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-99.11%)

Mutual labels: big-data

accumulo-docker

Apache Accumulo Docker

Stars: ✭ 17 (-99.86%)

Mutual labels: big-data

Asakusafw

Asakusa Framework

Stars: ✭ 114 (-99.09%)

Mutual labels: big-data

ibmpairs

Open source tools for interaction with IBM PAIRS

Stars: ✭ 23 (-99.82%)

Mutual labels: big-data

GDLibrary

Matlab library for gradient descent algorithms: Version 1.0.1

Stars: ✭ 50 (-99.6%)

Mutual labels: big-data

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (-99.27%)

Mutual labels: big-data

Pythondata

repo for code published on pythondata.com

Stars: ✭ 113 (-99.1%)

Mutual labels: big-data

Sqoop

Mirror of Apache Sqoop

Stars: ✭ 817 (-93.47%)

Mutual labels: big-data

bullet-core

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Stars: ✭ 36 (-99.71%)

Mutual labels: big-data

vxquery

Mirror of Apache VXQuery

Stars: ✭ 19 (-99.85%)

Mutual labels: big-data

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (-87.66%)

Mutual labels: big-data

ByteSlice

"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)

Stars: ✭ 24 (-99.81%)

Mutual labels: big-data

Keyvi

Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

Stars: ✭ 171 (-98.63%)

Mutual labels: big-data

Parquet Format

Apache Parquet

Stars: ✭ 800 (-93.61%)

Mutual labels: big-data

hotmap

WebGL Heatmap Viewer for Big Data and Bioinformatics

Stars: ✭ 13 (-99.9%)

Mutual labels: big-data

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks