Stream Framework is a Python library for building news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.

Stars: ✭ 4,576 (+3447.29%)

Mutual labels: big-data

img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.

Stars: ✭ 1,173 (+809.3%)

Mutual labels: big-data

dxram

A distributed in-memory key-value storage for billions of small objects.

Stars: ✭ 25 (-80.62%)

Mutual labels: big-data

Redislite

Redis in a Python module.

Stars: ✭ 464 (+259.69%)

Mutual labels: big-data

GDLibrary

Matlab library for gradient descent algorithms: Version 1.0.1

Stars: ✭ 50 (-61.24%)

Mutual labels: big-data

Nabhash

An extremely fast Non-crypto-safe AES Based Hash algorithm for Big Data

Stars: ✭ 62 (-51.94%)

Mutual labels: big-data

lcbo-api

A crawler and API server for Liquor Control Board of Ontario retail data

Stars: ✭ 152 (+17.83%)

Mutual labels: big-data

Courses

Quizzes and assignments from Coursera

Stars: ✭ 454 (+251.94%)

Mutual labels: big-data

gan deeplearning4j

Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-85.27%)

Mutual labels: big-data

Kudu

Mirror of Apache Kudu

Stars: ✭ 1,360 (+954.26%)

Mutual labels: big-data

FlameStream

Distributed stream processing model and its implementation

Stars: ✭ 14 (-89.15%)

Mutual labels: big-data

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+16991.47%)

Mutual labels: big-data

ngm

swissgeol.ch gives you insight into geoscientific data, above and below the surface.

Stars: ✭ 23 (-82.17%)

Mutual labels: big-data

Attic Lens

Mirror of Apache Lens

Stars: ✭ 58 (-55.04%)

Mutual labels: big-data

automile-net

Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.

Stars: ✭ 24 (-81.4%)

Mutual labels: big-data

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (+230.23%)

Mutual labels: big-data

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-87.6%)

Mutual labels: big-data

Amazon S3 Find And Forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

Stars: ✭ 115 (-10.85%)

Mutual labels: big-data

FIW KRT

Families In the Wild: A Kinship Recognition Toolbox.

Stars: ✭ 18 (-86.05%)

Mutual labels: big-data

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (+220.93%)

Mutual labels: big-data

shifting

A privacy-focused list of alternatives to mainstream services to help the competition.

Stars: ✭ 31 (-75.97%)

Mutual labels: big-data

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-55.81%)

Mutual labels: big-data

HadoopDedup

🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase

Stars: ✭ 27 (-79.07%)

Mutual labels: big-data

Cogcomp Nlp

CogComp's Natural Language Processing libraries and demos

Stars: ✭ 410 (+217.83%)

Mutual labels: big-data

hazelcast-csharp-client

Hazelcast .NET Client

Stars: ✭ 98 (-24.03%)

Mutual labels: big-data

Logisland

Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.

Stars: ✭ 97 (-24.81%)

Mutual labels: big-data

corpusexplorer2.0

Corpus linguistics has never been so easy...

Stars: ✭ 16 (-87.6%)

Mutual labels: big-data

Decentralized Internet

An SDK/library for decentralized web and distributed computing projects

Stars: ✭ 406 (+214.73%)

Mutual labels: big-data

big-data-engineering-indonesia

A curated list of big data engineering tools, resources and communities.

Stars: ✭ 26 (-79.84%)

Mutual labels: big-data

Lifion Kinesis

A native Node.js producer and consumer library for Amazon Kinesis Data Streams

Stars: ✭ 54 (-58.14%)

Mutual labels: big-data

merkle-db

High-scalability analytics database built on immutable merkle-trees

Stars: ✭ 44 (-65.89%)

Mutual labels: big-data

Orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (+201.55%)

Mutual labels: big-data

metriql

The metrics layer for your data. Join us at https://metriql.com/slack

Stars: ✭ 227 (+75.97%)

Mutual labels: big-data

Mobydq

🐳 Tool to automate data quality checks on data pipelines

Stars: ✭ 123 (-4.65%)

Mutual labels: big-data

dislib

The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (-69.77%)

Mutual labels: big-data

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+2855.81%)

Mutual labels: big-data

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (-74.42%)

Mutual labels: big-data

Oodt

Mirror of Apache OODT

Stars: ✭ 52 (-59.69%)

Mutual labels: big-data

cdp-service

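CDP data platform that helps enterprises fully understand their customers and deliver precisely targeted, individually personalized marketing.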

Stars: ✭ 30 (-76.74%)

Mutual labels: big-data

Halodb

A fast, log structured key-value store.

Stars: ✭ 370 (+186.82%)

Mutual labels: big-data

sgd

An R package for large scale estimation with stochastic gradient descent

Stars: ✭ 55 (-57.36%)

Mutual labels: big-data

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+937.21%)

Mutual labels: big-data

ytpriv

YT metadata exporter

Stars: ✭ 28 (-78.29%)

Mutual labels: big-data

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+180.62%)

Mutual labels: big-data

scikit-learn-intelex

Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

Stars: ✭ 887 (+587.6%)

Mutual labels: big-data

Trck

Query engine for TrailDB

Stars: ✭ 48 (-62.79%)

Mutual labels: big-data

Bigtop

Mirror of Apache Bigtop

Stars: ✭ 356 (+175.97%)

Mutual labels: big-data

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+1172.87%)

Mutual labels: big-data

Tajo

Mirror of Apache Tajo

Stars: ✭ 128 (-0.78%)

Mutual labels: big-data

Richdem

High-performance Terrain and Hydrology Analysis

Stars: ✭ 127 (-1.55%)

Mutual labels: big-data

Cmak

CMAK is a tool for managing Apache Kafka clusters

Stars: ✭ 10,544 (+8073.64%)

Mutual labels: big-data

Bigdata Notes

A beginner's guide to big data ⭐

Stars: ✭ 10,991 (+8420.16%)

Mutual labels: big-data

Warp

Convert and analyze large data sets at light speed, on Mac and iOS.

Stars: ✭ 62 (-51.94%)

Mutual labels: big-data

Fit Sne

Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)

Stars: ✭ 485 (+275.97%)

Mutual labels: big-data

nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

Stars: ✭ 8,196 (+6253.49%)

Mutual labels: big-data

241-300 of 369 similar projects