Verticapy: VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Ymcache: YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.
Kibble 1: Apache Kibble - a tool to collect, aggregate and visualize data about any software project
Lifion Kinesis: A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Macro ml: Course Website on Macroeconomic Analysis with Machine Learning and Big Data
Oodt: Mirror of Apache OODT
Datumbox Framework: Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Trck: Query engine for TrailDB
Traildb: TrailDB is an efficient tool for storing and querying series of events
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Attaca: Robust, distributed version control for large files.
Egads: A Java package to automatically detect anomalies in large-scale time-series data
Esper Tv: Esper instance for TV news analysis
Metrics: Measure behavior of Java applications
Skymap: High-throughput gene to knowledge mapping through massive integration of public sequencing data.
Qcportal: A client interface to the QCArchive Project (read-only image of QCFractal)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
K8s Ingress Claim: An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Dremio Oss: Dremio - the missing link in modern data
Sparkjni: A heterogeneous Apache Spark framework.
Dataflowjavasdk: Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Autodl: Automated Deep Learning without ANY human intervention. 1st solution for AutoDL.
Pretzel: JavaScript full-stack framework for Big Data visualisation and analysis
Bandar Log: Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Sqoop: Mirror of Apache Sqoop
Titanoboa: Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Rakam Api: 📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Storm: Mirror of Apache Storm
Spark Movie Lens: An online movie recommender using Spark, Python Flask, and the MovieLens dataset
Cython: The most widely used Python to C compiler
Samza: Mirror of Apache Samza
Data Science Career: Repository of career resources for Data Science, Machine Learning, Big Data and Business Analytics
Sdc: Intel® Scalable Dataframe Compiler for Pandas*
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Oozie: Mirror of Apache Oozie
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Scanner: Efficient video analysis at scale
Nipype: Workflows and interfaces for neuroimaging packages
Couchdb: Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Thrill: Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Arkime: Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Beam: Apache Beam is a unified programming model for Batch and Streaming
Magellan: Geospatial Data Analytics on Spark