Stream Framework is a Python library for building news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:

Stars: ✭ 4,576 (+1582.35%)

Mutual labels: big-data

bandar-log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 20 (-92.65%)

Mutual labels: big-data

Redislite

Redis in a python module.

Stars: ✭ 464 (+70.59%)

Mutual labels: big-data

Attic Predictionio Sdk Ruby

PredictionIO Ruby SDK

Stars: ✭ 192 (-29.41%)

Mutual labels: big-data

Courses

Quizzes and assignments from Coursera

Stars: ✭ 454 (+66.91%)

Mutual labels: big-data

img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

Stars: ✭ 1,173 (+331.25%)

Mutual labels: big-data

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+8005.88%)

Mutual labels: big-data

Presto Go Client

A Presto client for the Go programming language.

Stars: ✭ 183 (-32.72%)

Mutual labels: big-data

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (+56.62%)

Mutual labels: big-data

predictionio

PredictionIO, a machine learning server for developers and ML engineers.

Stars: ✭ 12,510 (+4499.26%)

Mutual labels: big-data

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (+52.21%)

Mutual labels: big-data

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (-34.93%)

Mutual labels: big-data

Cogcomp Nlp

CogComp's Natural Language Processing libraries and Demos:

Stars: ✭ 410 (+50.74%)

Mutual labels: big-data

GDLibrary

Matlab library for gradient descent algorithms: Version 1.0.1

Stars: ✭ 50 (-81.62%)

Mutual labels: big-data

Decentralized Internet

An SDK/library for decentralized web and distributed computing projects

Stars: ✭ 406 (+49.26%)

Mutual labels: big-data

Keyvi

Keyvi - a key-value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

Stars: ✭ 171 (-37.13%)

Mutual labels: big-data

Orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (+43.01%)

Mutual labels: big-data

alluxio-py

Alluxio Python client - Access Any Data Source with Python

Stars: ✭ 18 (-93.38%)

Mutual labels: big-data

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+1301.84%)

Mutual labels: big-data

Geopyspark

GeoTrellis for PySpark

Stars: ✭ 167 (-38.6%)

Mutual labels: big-data

Halodb

A fast, log structured key-value store.

Stars: ✭ 370 (+36.03%)

Mutual labels: big-data

lcbo-api

A crawler and API server for Liquor Control Board of Ontario retail data

Stars: ✭ 152 (-44.12%)

Mutual labels: big-data

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+33.09%)

Mutual labels: big-data

Fluo

Apache Fluo

Stars: ✭ 159 (-41.54%)

Mutual labels: big-data

Bigtop

Mirror of Apache Bigtop

Stars: ✭ 356 (+30.88%)

Mutual labels: big-data

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-89.34%)

Mutual labels: big-data

Devops Roadmap

DevOps methodology & roadmap for a DevOps developer in 2019. Interesting books to learn new technologies.

Stars: ✭ 349 (+28.31%)

Mutual labels: big-data

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (-44.12%)

Mutual labels: big-data

Stroom

Stroom is a highly scalable data storage, processing and analysis platform.

Stars: ✭ 344 (+26.47%)

Mutual labels: big-data

gan deeplearning4j

Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-93.01%)

Mutual labels: big-data

Ozone

Scalable, redundant, and distributed object store for Apache Hadoop

Stars: ✭ 330 (+21.32%)

Mutual labels: big-data

Datasciencevm

Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

Stars: ✭ 153 (-43.75%)

Mutual labels: big-data

Beeva Best Practices

Best Practices and Style Guides in BEEVA

Stars: ✭ 335 (+23.16%)

Mutual labels: big-data

NiFi-Rule-engine-processor

Drools processor for Apache NiFi

Stars: ✭ 34 (-87.5%)

Mutual labels: big-data

Uproot3

ROOT I/O in pure Python and NumPy.

Stars: ✭ 312 (+14.71%)

Mutual labels: big-data

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (-44.85%)

Mutual labels: big-data

Mist

Serverless proxy for Spark cluster

Stars: ✭ 309 (+13.6%)

Mutual labels: big-data

FlameStream

Distributed stream processing model and its implementation

Stars: ✭ 14 (-94.85%)

Mutual labels: big-data

Helix

Mirror of Apache Helix

Stars: ✭ 304 (+11.76%)

Mutual labels: big-data

100daysofmlcode

My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.

Stars: ✭ 146 (-46.32%)

Mutual labels: big-data

Cloudbreak

A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.

Stars: ✭ 301 (+10.66%)

Mutual labels: big-data

classifai

🔥 One of the most comprehensive open-source data annotation platforms.

Stars: ✭ 99 (-63.6%)

Mutual labels: big-data

Metamodel

Mirror of Apache Metamodel

Stars: ✭ 143 (-47.43%)

Mutual labels: big-data

Datahub

The Metadata Platform for the Modern Data Stack