storm-ml: An online learning algorithm library for Storm
Stars: ✭ 18 (-99.71%)
Decentralized Internet: An SDK/library for decentralized web and distributed computing projects
Stars: ✭ 406 (-93.55%)
Datascience Ai Machinelearning Resources: Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (-93.43%)
Arkime: Arkime (formerly Moloch) is an open-source, large-scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (-20.69%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (-12.45%)
Bigdl: Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (-39.45%)
Pgm Index: 🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-92.08%)
Sparkler: Spark-Crawler, an Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-94.25%)
Devops Roadmap: DevOps methodology & roadmap for a DevOps developer in 2019. Interesting books to learn new technologies.
Stars: ✭ 349 (-94.46%)
Cortx: CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-93.23%)
Couchdb: Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Stars: ✭ 5,166 (-17.96%)
Cogcomp Nlp: CogComp's Natural Language Processing libraries and demos
Stars: ✭ 410 (-93.49%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (-10.18%)
Orc: Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-93.82%)
Onlinestats.jl: Single-pass algorithms for statistics
Stars: ✭ 507 (-91.95%)
Halodb: A fast, log-structured key-value store.
Stars: ✭ 370 (-94.12%)
Samza: Mirror of Apache Samza
Stars: ✭ 676 (-89.26%)
Bigtop: Mirror of Apache Bigtop
Stars: ✭ 356 (-94.35%)
Fit Sne: Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (-92.3%)
Scanner: Efficient video analysis at scale
Stars: ✭ 569 (-90.96%)
Stroom: Stroom is a highly scalable data storage, processing and analysis platform.
Stars: ✭ 344 (-94.54%)
Hazelcast: Open-source distributed computation and storage platform
Stars: ✭ 4,662 (-25.96%)
Ozone: Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (-94.76%)
Circosjs: d3 library to build circular graphs
Stars: ✭ 436 (-93.08%)
Kafka Streams: equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (-90.27%)
Opendata.cern.ch: Source code for the CERN Open Data portal
Stars: ✭ 411 (-93.47%)
Thrill: An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (-91.62%)
Mockneat: MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.
Stars: ✭ 410 (-93.49%)
Beam: Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (-18.23%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (-36.05%)
Oozie: Mirror of Apache Oozie
Stars: ✭ 602 (-90.44%)
Hive: Apache Hive
Stars: ✭ 4,031 (-35.99%)
Magellan: Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (-91.95%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-94.27%)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (-88.44%)
Sylph: Stream computing platform for big data
Stars: ✭ 362 (-94.25%)
Stream Framework: Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.
Stars: ✭ 4,576 (-27.33%)
Vespa: The open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (-40.5%)
Giraph: Mirror of Apache Giraph
Stars: ✭ 569 (-90.96%)
Redislite: Redis in a Python module.
Stars: ✭ 464 (-92.63%)
Data Science Career: Repository of career resources for Data Science, Machine Learning, Big Data and Business Analytics
Stars: ✭ 630 (-90%)
Grouparoo: 🦘 The Grouparoo Monorepo - open source customer data sync framework
Stars: ✭ 334 (-94.7%)
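Bdp Dataplatform: Data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, an online store, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.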
Stars: ✭ 456 (-92.76%)
Pachyderm: Reproducible Data Science at Scale!
Stars: ✭ 5,305 (-15.75%)
Wirbelsturm: Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-94.73%)
Courses: Quizzes and assignments from Coursera
Stars: ✭ 454 (-92.79%)
Tez: Apache Tez
Stars: ✭ 313 (-95.03%)
Uproot3: ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (-95.05%)
Conjure Up: Deploying complex solutions, magically.
Stars: ✭ 454 (-92.79%)
Spark Movie Lens: An online movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-88.17%)
Cython: The most widely used Python to C compiler
Stars: ✭ 6,588 (+4.62%)