Mockneat
MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.
Stars: ✭ 410 (-49.82%)
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-55.81%)
Onlinestats.jl
Single-pass algorithms for statistics
Stars: ✭ 507 (-37.94%)
Couchdb
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Stars: ✭ 5,166 (+532.31%)
Ignite
Apache Ignite
Stars: ✭ 4,027 (+392.9%)
Kafka Streams
equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (-24.97%)
Vespa
The open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+358.63%)
Fit Sne
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (-40.64%)
Circosjs
d3 library to build circular graphs
Stars: ✭ 436 (-46.63%)
Tez
Apache Tez
Stars: ✭ 313 (-61.69%)
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+549.33%)
Opendata.cern.ch
Source code for the CERN Open Data portal
Stars: ✭ 411 (-49.69%)
Data Science Career
Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (-22.89%)
Arkime
Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (+511.26%)
Hive
Apache Hive
Stars: ✭ 4,031 (+393.39%)
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-8.81%)
Sylph
Stream computing platform for bigdata
Stars: ✭ 362 (-55.69%)
Pgm Index
🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-38.92%)
Oozie
Mirror of Apache Oozie
Stars: ✭ 602 (-26.32%)
Grouparoo
🦘 The Grouparoo Monorepo - open source customer data sync framework
Stars: ✭ 334 (-59.12%)
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+470.62%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+2598.65%)
Uproot3
ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (-61.81%)
Scanner
Efficient video analysis at scale
Stars: ✭ 569 (-30.35%)
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-47.86%)
Samza
Mirror of Apache Samza
Stars: ✭ 676 (-17.26%)
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (-49.33%)
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-31.82%)
Cogcomp Nlp
CogComp's Natural Language Processing libraries and demos.
Stars: ✭ 410 (-49.82%)
Storm
Mirror of Apache Storm
Stars: ✭ 6,297 (+670.75%)
Decentralized Internet
An SDK/library for decentralized web and distributed computing projects
Stars: ✭ 406 (-50.31%)
Thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (-35.37%)
Orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-52.39%)
Sdc
Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (-23.75%)
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+366.71%)
Beam
Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+530.23%)
Halodb
A fast, log structured key-value store.
Stars: ✭ 370 (-54.71%)
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (-3.67%)
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-55.69%)
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (-37.94%)
Bigtop
Mirror of Apache Bigtop
Stars: ✭ 356 (-56.43%)
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+592.29%)
Devops Roadmap
DevOps methodology & roadmap for a devops developer in 2019. Interesting books to learn new technologies.
Stars: ✭ 349 (-57.28%)
Stream Framework
Stream Framework is a Python library that allows you to build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.
Stars: ✭ 4,576 (+460.1%)
Stroom
Stroom is a highly scalable data storage, processing and analysis platform.
Stars: ✭ 344 (-57.89%)
Cython
The most widely used Python to C compiler
Stars: ✭ 6,588 (+706.36%)
Ozone
Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (-59.61%)
Redislite
Redis in a python module.
Stars: ✭ 464 (-43.21%)
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+574.79%)
Courses
Quiz & Assignment of Coursera
Stars: ✭ 454 (-44.43%)
Rakam Api
📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Stars: ✭ 772 (-5.51%)
Giraph
Mirror of Apache Giraph
Stars: ✭ 569 (-30.35%)
Conjure Up
Deploying complex solutions, magically.
Stars: ✭ 454 (-44.43%)