Hazelcast Jet: Distributed Stream and Batch Processing
Stars: ✭ 855 (+345.31%)
Pyspark Setup Demo: Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-87.5%)
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-97.4%)
Orc: An ORC file format reader and writer for Go.
Stars: ✭ 97 (-49.48%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-21.87%)
Rakam Api: 📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Stars: ✭ 772 (+302.08%)
Streamx: kafka-connect-s3, ingest data from Kafka to object stores (S3)
Stars: ✭ 96 (-50%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+288.02%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+755.21%)
Treeviz: Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (-50.52%)
Data Science Career: Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (+228.13%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (-13.02%)
Kafka Streams: equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (+219.27%)
Oozie: Mirror of Apache Oozie
Stars: ✭ 602 (+213.54%)
Tajo: Mirror of Apache Tajo
Stars: ✭ 128 (-33.33%)
Giraph: Mirror of Apache Giraph
Stars: ✭ 569 (+196.35%)
Pachyderm: Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+2663.02%)
100daysofmlcode: My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (-23.96%)
Couchdb: Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Stars: ✭ 5,166 (+2590.63%)
Arkime: Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (+2501.04%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+1241.67%)
Onlinestats.jl: Single-pass algorithms for statistics
Stars: ✭ 507 (+164.06%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-58.33%)
Pgm Index: 🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (+159.9%)
Bigdata Playground: A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-7.81%)
Fit Sne: Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (+152.6%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-58.85%)
Hazelcast: Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+2328.13%)
Richdem: High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (-33.85%)
Conjure Up: Deploying complex solutions, magically.
Stars: ✭ 454 (+136.46%)
Circosjs: d3 library to build circular graphs
Stars: ✭ 436 (+127.08%)
Metamodel: Mirror of Apache Metamodel
Stars: ✭ 143 (-25.52%)
Labs: Research on distributed systems
Stars: ✭ 73 (-61.98%)
Opendata.cern.ch: Source code for the CERN Open Data portal
Stars: ✭ 411 (+114.06%)
Mockneat: MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.
Stars: ✭ 410 (+113.54%)
Fluo: Apache Fluo
Stars: ✭ 159 (-17.19%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (+1997.4%)
Appdocs: Application Performance Optimization Summary
Stars: ✭ 1,169 (+508.85%)
Hive: Apache Hive
Stars: ✭ 4,031 (+1999.48%)
Carbondata: Mirror of Apache CarbonData
Stars: ✭ 1,158 (+503.13%)
Gun: An open source cybersecurity protocol for syncing decentralized graph data.
Stars: ✭ 15,172 (+7802.08%)
Flume: Mirror of Apache Flume
Stars: ✭ 2,200 (+1045.83%)
Attic Predictionio: PredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,522 (+6421.88%)
Fili: Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL Databases.
Stars: ✭ 151 (-21.35%)
Dataflowjavasdk: Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+344.79%)