Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Stars: ✭ 36 (-84.14%)

Mutual labels: big-data

Trafodion

Apache Trafodion

Stars: ✭ 242 (+6.61%)

Mutual labels: big-data

Books

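A collection of books covering C & C++, Git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.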

Stars: ✭ 222 (-2.2%)

Mutual labels: big-data

ytpriv

YT metadata exporter

Stars: ✭ 28 (-87.67%)

Mutual labels: big-data

Social-Network-Analysis-in-Python

Social Network Facebook Analysis (Python, Networkx)

Stars: ✭ 26 (-88.55%)

Mutual labels: big-data

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (-4.85%)

Mutual labels: big-data

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+1240.97%)

Mutual labels: big-data

twitter-archive-reader

Full featured TypeScript Twitter archive reader and browser

Stars: ✭ 43 (-81.06%)

Mutual labels: big-data

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+8.37%)

Mutual labels: big-data

sgd

An R package for large scale estimation with stochastic gradient descent

Stars: ✭ 55 (-75.77%)

Mutual labels: big-data

Selinon

Advanced distributed task flow management on top of Celery

Stars: ✭ 237 (+4.41%)

Mutual labels: big-data

incubator-tez

Mirror of Apache Tez (Incubating)

Stars: ✭ 60 (-73.57%)

Mutual labels: big-data

Nakedtensor

Bare bone examples of machine learning in TensorFlow

Stars: ✭ 2,443 (+976.21%)

Mutual labels: big-data

dbt-spotify-analytics

Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase

Stars: ✭ 92 (-59.47%)

Mutual labels: dbt

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (-5.29%)

Mutual labels: big-data

predictionio-sdk-ruby

PredictionIO Ruby SDK

Stars: ✭ 192 (-15.42%)

Mutual labels: big-data

Calcite

Apache Calcite

Stars: ✭ 2,816 (+1140.53%)

Mutual labels: big-data

Couchdb Docker

Semi-official Apache CouchDB Docker images

Stars: ✭ 194 (-14.54%)

Mutual labels: big-data

scikit-learn-intelex

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

Stars: ✭ 887 (+290.75%)

Mutual labels: big-data

masc

Microsoft's contributions for Spark with Apache Accumulo

Stars: ✭ 20 (-91.19%)

Mutual labels: big-data

Attic Predictionio Sdk Ruby

PredictionIO Ruby SDK

Stars: ✭ 192 (-15.42%)

Mutual labels: big-data

PyRasgo

Helper code to interact with Rasgo via our SDK, PyRasgo

Stars: ✭ 39 (-82.82%)

Mutual labels: dbt

Quantitative-Big-Imaging-2018

(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018

Stars: ✭ 50 (-77.97%)

Mutual labels: big-data

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+8.81%)

Mutual labels: big-data

dbt-databricks

A dbt adapter for Databricks.

Stars: ✭ 115 (-49.34%)

Mutual labels: dbt

Aws Etl Orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

Stars: ✭ 245 (+7.93%)

Mutual labels: big-data

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (-85.46%)

Mutual labels: big-data

Kafka Ui

Open-Source Web GUI for Apache Kafka Management

Stars: ✭ 230 (+1.32%)

Mutual labels: big-data

accumulo-testing

Apache Accumulo Testing

Stars: ✭ 14 (-93.83%)

Mutual labels: big-data

Eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Stars: ✭ 235 (+3.52%)

Mutual labels: big-data

mmtf-spark

Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.

Stars: ✭ 20 (-91.19%)

Mutual labels: big-data

Lite Virtual List

Virtual list component library supporting waterfall flow, based on Vue

Stars: ✭ 223 (-1.76%)

Mutual labels: big-data

bagri

XML/Document DB on top of distributed cache

Stars: ✭ 40 (-82.38%)

Mutual labels: big-data

Usql

U-SQL Examples and Issue Tracking

Stars: ✭ 221 (-2.64%)

Mutual labels: big-data

dislib

The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (-82.82%)

Mutual labels: big-data

Awkward 0.x

Manipulate arrays of complex data structures as easily as Numpy.

Stars: ✭ 216 (-4.85%)

Mutual labels: big-data

Presto Go Client

A Presto client for the Go programming language.

Stars: ✭ 183 (-19.38%)

Mutual labels: big-data

Helicalinsight

Helical Insight software is the world’s first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.

Stars: ✭ 214 (-5.73%)

Mutual labels: big-data

Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Stars: ✭ 126 (-44.49%)

Mutual labels: big-data

Attic Predictionio Sdk Python

PredictionIO Python SDK

Stars: ✭ 196 (-13.66%)

Mutual labels: big-data

awesome-dbt

A curated list of awesome dbt resources

Stars: ✭ 520 (+129.07%)

Mutual labels: dbt

Data Science Live Book

An open source book to learn data science, data analysis and machine learning, suitable for all ages!

Stars: ✭ 193 (-14.98%)

Mutual labels: big-data

couchdb-pkg

Apache CouchDB Packaging support files

Stars: ✭ 24 (-89.43%)

Mutual labels: big-data

Gun

An open source cybersecurity protocol for syncing decentralized graph data.

Stars: ✭ 15,172 (+6583.7%)

Mutual labels: big-data

acousticbrainz-server

The server components for the AcousticBrainz project

Stars: ✭ 128 (-43.61%)

Mutual labels: big-data

predictionio-sdk-python

PredictionIO Python SDK

Stars: ✭ 199 (-12.33%)

Mutual labels: big-data

Flume

Mirror of Apache Flume

Stars: ✭ 2,200 (+869.16%)

Mutual labels: big-data

Detecting-Malicious-URL-Machine-Learning

No description or website provided.

Stars: ✭ 47 (-79.3%)

Mutual labels: big-data

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (-22.03%)

Mutual labels: big-data

Dvid

Distributed, Versioned, Image-oriented Dataservice

Stars: ✭ 174 (-23.35%)

Mutual labels: big-data

predictionio-template-recommender

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 80 (-64.76%)

Mutual labels: big-data

leetspeek

Open and collaborative content from leet hackers!