Helical Insight is the world’s first open source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.

Stars: ✭ 214 (+197.22%)

Mutual labels: big-data, nosql

databricks-notebooks

Collection of Databricks and Jupyter Notebooks

Stars: ✭ 19 (-73.61%)

Mutual labels: pyspark, graphframes

Archived-SANSA-Query

SANSA Query Layer

Stars: ✭ 31 (-56.94%)

Mutual labels: distributed-computing, partitioning

dislib

The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (-45.83%)

Mutual labels: big-data, distributed-computing

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+3177.78%)

Mutual labels: big-data, dataframe

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-59.72%)

Mutual labels: big-data, pyspark

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (+59.72%)

Mutual labels: big-data, pyspark

javaer-mind

Mind maps for Java programmers' advanced learning

Stars: ✭ 66 (-8.33%)

Mutual labels: big-data, nosql

mmtf-workshop-2018

Structural Bioinformatics Training Workshop & Hackathon 2018

Stars: ✭ 50 (-30.56%)

Mutual labels: big-data, pyspark

pyspark-ML-in-Colab

PySpark in Google Colab: A simple machine learning (Linear Regression) model

Stars: ✭ 32 (-55.56%)

Mutual labels: pyspark, rdd

Eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Stars: ✭ 235 (+226.39%)

Mutual labels: big-data, dataframe

HadoopDedup

🍉 Large-scale deduplication of massive data sets based on Hadoop and HBase

Stars: ✭ 27 (-62.5%)

Mutual labels: big-data, mapreduce

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (-45.83%)

Mutual labels: big-data, pyspark

SynapseML

Simple and Distributed Machine Learning

Stars: ✭ 3,355 (+4559.72%)

Mutual labels: big-data, pyspark

dlsa

Distributed least squares approximation (dlsa) implemented with Apache Spark

Stars: ✭ 25 (-65.28%)

Mutual labels: distributed-computing, pyspark

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+30522.22%)

Mutual labels: big-data, mapreduce

Hazelcast

Open-source distributed computation and storage platform

Stars: ✭ 4,662 (+6375%)

Mutual labels: big-data, distributed-computing

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+7556.94%)

Mutual labels: big-data, nosql

Iotdb

Apache IoTDB

Stars: ✭ 1,221 (+1595.83%)

Mutual labels: big-data, nosql

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (-66.67%)

Mutual labels: big-data, pyspark

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (+26.39%)

Mutual labels: big-data, pyspark

Bigdata Notes

Getting started guide to big data ⭐

Stars: ✭ 10,991 (+15165.28%)

Mutual labels: big-data, mapreduce

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+1758.33%)

Mutual labels: big-data, pyspark

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (+1323.61%)

Mutual labels: big-data, distributed-computing

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-1.39%)

Mutual labels: big-data, mapreduce

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+1190.28%)

Mutual labels: mapreduce, dataframe

ParallelUtilities.jl

Fast and easy parallel mapreduce on HPC clusters

Stars: ✭ 28 (-61.11%)

Mutual labels: distributed-computing, mapreduce

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+3926.39%)

Mutual labels: big-data, pyspark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+200%)

Mutual labels: big-data, pyspark

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+1218.06%)

Mutual labels: distributed-computing, mapreduce

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (+76.39%)

Mutual labels: big-data, distributed-computing

Nakedtensor

Bare-bones examples of machine learning in TensorFlow

Stars: ✭ 2,443 (+3293.06%)

Mutual labels: big-data, distributed-computing

Selinon

Advanced distributed task flow management on top of Celery

Stars: ✭ 237 (+229.17%)

Mutual labels: big-data, distributed-computing

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-72.22%)

Mutual labels: big-data, mapreduce

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-34.72%)

Mutual labels: big-data, rdd

Thrill

Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Stars: ✭ 528 (+633.33%)

Mutual labels: big-data, distributed-computing

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+401.39%)

Mutual labels: big-data, distributed-computing

Beeva Best Practices

Best Practices and Style Guides in BEEVA

Stars: ✭ 335 (+365.28%)

Mutual labels: big-data, nosql

isarn-sketches-spark

Routines and data structures for using isarn-sketches idiomatically in Apache Spark

Stars: ✭ 28 (-61.11%)

Mutual labels: pyspark, dataframe

Asakusafw

Asakusa Framework

Stars: ✭ 114 (+58.33%)

Mutual labels: big-data, mapreduce

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+4127.78%)

Mutual labels: big-data, dataframe

merkle-db

High-scalability analytics database built on immutable merkle-trees

Stars: ✭ 44 (-38.89%)

Mutual labels: big-data, nosql

learn-by-examples

Real-world Spark pipeline examples

Stars: ✭ 84 (+16.67%)

Mutual labels: pyspark

cdp-service

CDP data platform that helps enterprises fully understand their customers and deliver precisely targeted, personalized marketing.

Stars: ✭ 30 (-58.33%)

Mutual labels: big-data

mesos-pinspider

A framework called "pinspider" on Apache Mesos that retrieves basic user information from a user's Pinterest page.

Stars: ✭ 18 (-75%)

Mutual labels: distributed-computing

dynamodb-onetable

DynamoDB access and management for one table designs with NodeJS

Stars: ✭ 508 (+605.56%)

Mutual labels: nosql

elearning

elearning linux/mac/db/cache/server/tools/artificial intelligence

Stars: ✭ 72 (+0%)

Mutual labels: nosql

Quantitative-Big-Imaging-2018

(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018

Stars: ✭ 50 (-30.56%)

Mutual labels: big-data

soda-spark

Soda Spark is a PySpark library that helps you test your data in Spark DataFrames

Stars: ✭ 58 (-19.44%)

Mutual labels: pyspark

metriql

The metrics layer for your data. Join us at https://metriql.com/slack

Stars: ✭ 227 (+215.28%)

Mutual labels: big-data

sgd

An R package for large scale estimation with stochastic gradient descent