This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack used is completely open source. I used InfluxDB as data storage, so the application follows the Big Data paradigm.

Stars: ✭ 26 (-77.39%)

Mutual labels: big-data

Dataflowjavasdk

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Stars: ✭ 854 (+642.61%)

Mutual labels: big-data

predictionio-sdk-ruby

PredictionIO Ruby SDK

Stars: ✭ 192 (+66.96%)

Mutual labels: big-data

bftkv

A distributed key-value store that tolerates Byzantine faults.

Stars: ✭ 27 (-76.52%)

Mutual labels: big-data

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (+10.43%)

Mutual labels: big-data

proxima-platform

The Proxima platform.

Stars: ✭ 17 (-85.22%)

Mutual labels: apache-spark

Ambari

Mirror of Apache Ambari

Stars: ✭ 1,576 (+1270.43%)

Mutual labels: big-data

Bandar Log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 19 (-83.48%)

Mutual labels: big-data

spark-connector

A connector for Apache Spark to access Exasol

Stars: ✭ 13 (-88.7%)

Mutual labels: apache-spark

Sqoop

Mirror of Apache Sqoop

Stars: ✭ 817 (+610.43%)

Mutual labels: big-data

pyspark-for-data-processing

Code for my presentation: Using PySpark to Process Boat Loads of Data

Stars: ✭ 20 (-82.61%)

Mutual labels: pyspark

Titanoboa

Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.

Stars: ✭ 787 (+584.35%)

Mutual labels: big-data

masc

Microsoft's contributions for Spark with Apache Accumulo

Stars: ✭ 20 (-82.61%)

Mutual labels: big-data

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-79.13%)

Mutual labels: apache-spark

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+5628.7%)

Mutual labels: big-data

Samza

Mirror of Apache Samza

Stars: ✭ 676 (+487.83%)

Mutual labels: big-data

spark-root

Apache Spark Data Source for ROOT File Format

Stars: ✭ 28 (-75.65%)

Mutual labels: big-data

Sdc

Intel® Scalable Dataframe Compiler for Pandas*

Stars: ✭ 623 (+441.74%)

Mutual labels: big-data

spark-dgraph-connector

A connector for Apache Spark and PySpark to Dgraph databases.

Stars: ✭ 36 (-68.7%)

Mutual labels: pyspark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+4818.26%)

Mutual labels: big-data

pulsar-adapters

Apache Pulsar Adapters

Stars: ✭ 18 (-84.35%)

Mutual labels: apache-spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+4693.91%)

Mutual labels: big-data

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+2546.96%)

Mutual labels: big-data

Scanner

Efficient video analysis at scale

Stars: ✭ 569 (+394.78%)

Mutual labels: big-data

nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

Stars: ✭ 8,196 (+7026.96%)

Mutual labels: big-data

Nipype

Workflows and interfaces for neuroimaging packages

Stars: ✭ 557 (+384.35%)

Mutual labels: big-data

Cboard

An easy to use, self-service open BI reporting and BI dashboard platform.

Stars: ✭ 2,795 (+2330.43%)

Mutual labels: big-data

ByteSlice

"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)

Stars: ✭ 24 (-79.13%)

Mutual labels: big-data

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (+1242.61%)

Mutual labels: big-data

Beam

Apache Beam is a unified programming model for Batch and Streaming

Stars: ✭ 5,149 (+4377.39%)

Mutual labels: big-data

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+113.91%)

Mutual labels: big-data

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+340.87%)

Mutual labels: big-data

Real Time Social Media Mining

DevOps pipeline for Real Time Social/Web Mining

Stars: ✭ 22 (-80.87%)

Mutual labels: big-data

Stream Framework

Stream Framework is a Python library that allows you to build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.

Stars: ✭ 4,576 (+3879.13%)

Mutual labels: big-data

Trafodion

Apache Trafodion

Stars: ✭ 242 (+110.43%)

Mutual labels: big-data

Redislite

Redis in a python module.

Stars: ✭ 464 (+303.48%)

Mutual labels: big-data

falcon

Mirror of Apache Falcon

Stars: ✭ 95 (-17.39%)

Mutual labels: big-data

Courses

Quizzes and assignments from Coursera

Stars: ✭ 454 (+294.78%)

Mutual labels: big-data

Selinon

Advanced distributed task flow management on top of Celery

Stars: ✭ 237 (+106.09%)

Mutual labels: big-data

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+19072.17%)

Mutual labels: big-data

airavata-django-portal

Mirror of Apache Airavata Django Portal

Stars: ✭ 20 (-82.61%)

Mutual labels: big-data

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (+270.43%)

Mutual labels: big-data

Books

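A collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.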

Stars: ✭ 222 (+93.04%)

Mutual labels: big-data

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (+260%)

Mutual labels: big-data

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-59.13%)

Mutual labels: big-data

big-data-engineering-indonesia

A curated list of big data engineering tools, resources and communities.

Stars: ✭ 26 (-77.39%)

Mutual labels: big-data

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (-4.35%)

Mutual labels: big-data

spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now at https://github.com/apache/spark/

Stars: ✭ 609 (+429.57%)

Mutual labels: apache-spark

beekeeper

Service for automatically managing and cleaning up unreferenced data

Stars: ✭ 43 (-62.61%)

Mutual labels: big-data

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-5.22%)

Mutual labels: big-data

Attic Predictionio Sdk Java

PredictionIO Java SDK