Data Accelerator for Apache Spark simplifies onboarding to big data streaming. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+850%)

Mutual labels: big-data

awesome-dbt

A curated list of awesome dbt resources

Stars: ✭ 520 (+1900%)

Mutual labels: data-engineering

Clustering4Ever

C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.

Stars: ✭ 126 (+384.62%)

Mutual labels: big-data

predictionio-template-recommender

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 80 (+207.69%)

Mutual labels: big-data

couchdb-pkg

Apache CouchDB Packaging support files

Stars: ✭ 24 (-7.69%)

Mutual labels: big-data

Vue Virtual Scroll List

⚡️ A Vue component that renders very large data lists efficiently and with high performance.

Stars: ✭ 3,201 (+12211.54%)

Mutual labels: big-data

qsv

CSVs sliced, diced & analyzed.

Stars: ✭ 438 (+1584.62%)

Mutual labels: data-engineering

leetspeek

Open and collaborative content from leet hackers!

Stars: ✭ 11 (-57.69%)

Mutual labels: big-data

Aws Etl Orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

Stars: ✭ 245 (+842.31%)

Mutual labels: big-data

lrmr

Less-Resilient MapReduce framework for Go

Stars: ✭ 32 (+23.08%)

Mutual labels: data-engineering

twitter-archive-reader

Full featured TypeScript Twitter archive reader and browser

Stars: ✭ 43 (+65.38%)

Mutual labels: big-data

Kafka Ui

Open-Source Web GUI for Apache Kafka Management

Stars: ✭ 230 (+784.62%)

Mutual labels: big-data

Eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Stars: ✭ 235 (+803.85%)

Mutual labels: big-data

hive-metastore-client

A client for connecting to and running DDLs on the Hive Metastore.

Stars: ✭ 37 (+42.31%)

Mutual labels: data-engineering

Lite Virtual List

A Vue-based virtual list component library that supports waterfall (masonry) layouts

Stars: ✭ 223 (+757.69%)

Mutual labels: big-data

Usql

U-SQL Examples and Issue Tracking

Stars: ✭ 221 (+750%)

Mutual labels: big-data

predictionio-sdk-ruby

PredictionIO Ruby SDK

Stars: ✭ 192 (+638.46%)

Mutual labels: big-data

ytpriv

YT metadata exporter

Stars: ✭ 28 (+7.69%)

Mutual labels: big-data

masc

Microsoft's contributions for Spark with Apache Accumulo

Stars: ✭ 20 (-23.08%)

Mutual labels: big-data

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (+26.92%)

Mutual labels: big-data

Detecting-Malicious-URL-Machine-Learning

No description or website provided.

Stars: ✭ 47 (+80.77%)

Mutual labels: big-data

scikit-learn-intelex

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

Stars: ✭ 887 (+3311.54%)

Mutual labels: big-data

airflow-dbt-python

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.

Stars: ✭ 111 (+326.92%)

Mutual labels: data-engineering

metriql

The metrics layer for your data. Join us at https://metriql.com/slack

Stars: ✭ 227 (+773.08%)

Mutual labels: big-data

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+11607.69%)

Mutual labels: big-data

etl

[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library

Stars: ✭ 279 (+973.08%)

Mutual labels: data-engineering

Cboard

An easy to use, self-service open BI reporting and BI dashboard platform.

Stars: ✭ 2,795 (+10650%)

Mutual labels: big-data

cdp-service

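A CDP (customer data platform) that helps enterprises fully understand their customers and deliver personalized, precisely targeted marketing.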

Stars: ✭ 30 (+15.38%)

Mutual labels: big-data

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+846.15%)

Mutual labels: big-data

accumulo-docker

Apache Accumulo Docker

Stars: ✭ 17 (-34.62%)

Mutual labels: big-data

Trafodion

Apache Trafodion

Stars: ✭ 242 (+830.77%)

Mutual labels: big-data

merkle-db

High-scalability analytics database built on immutable merkle-trees

Stars: ✭ 44 (+69.23%)

Mutual labels: big-data

Selinon

Advanced distributed task-flow management on top of Celery

Stars: ✭ 237 (+811.54%)

Mutual labels: big-data

bullet-core

Bullet is a streaming query engine that can be plugged into any single data stream using a stream processing framework such as Apache Storm, Spark, or Flink.

Stars: ✭ 36 (+38.46%)

Mutual labels: big-data

Books

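A curated collection of books covering C and C++, Git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.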

Stars: ✭ 222 (+753.85%)

Mutual labels: big-data

Quantitative-Big-Imaging-2018

(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018

Stars: ✭ 50 (+92.31%)

Mutual labels: big-data

Nakedtensor

Bare-bones examples of machine learning in TensorFlow

Stars: ✭ 2,443 (+9296.15%)

Mutual labels: big-data

accumulo-testing

Apache Accumulo Testing

Stars: ✭ 14 (-46.15%)

Mutual labels: big-data

dislib

A distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (+50%)

Mutual labels: big-data

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+730.77%)

Mutual labels: big-data

soda-spark

Soda Spark is a PySpark library that helps you test your data in Spark DataFrames

Stars: ✭ 58 (+123.08%)

Mutual labels: data-engineering

incubator-tez

Mirror of Apache Tez (Incubating)

Stars: ✭ 60 (+130.77%)

Mutual labels: big-data

Awkward 0.x

Manipulate arrays of complex data structures as easily as NumPy.

Stars: ✭ 216 (+730.77%)

Mutual labels: big-data

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+726.92%)

Mutual labels: big-data

AirflowETL

Blog post on ETL pipelines with Airflow

Stars: ✭ 20 (-23.08%)

Mutual labels: data-engineering

Helicalinsight

Helical Insight describes itself as the world’s first open-source business intelligence framework, one that helps you make sense of your data and make well-informed decisions.

Stars: ✭ 214 (+723.08%)

Mutual labels: big-data

Calcite

Apache Calcite

Stars: ✭ 2,816 (+10730.77%)

Mutual labels: big-data

sgd

An R package for large scale estimation with stochastic gradient descent

Stars: ✭ 55 (+111.54%)

Mutual labels: big-data

bagri

XML/document DB on top of a distributed cache