GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-91.61%)
SynapseMLSimple and Distributed Machine Learning
Stars: ✭ 3,355 (+30.24%)
LabsResearch on distributed systems
Stars: ✭ 73 (-97.17%)
incubator-liminalApache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (-95.46%)
SparkApache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+1127.41%)
LogislandScalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-96.23%)
hamiltonA scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (-76.24%)
HubDataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+55.4%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-89.91%)
neptune-client📒 Experiment tracking tool and model registry
Stars: ✭ 348 (-86.49%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-48.06%)
Goodreads etl pipelineAn end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (-69.22%)
BentomlModel Serving Made Easy
Stars: ✭ 3,064 (+18.94%)
SparklearningLearning Apache Spark, including code and data. Most parts can run locally.
Stars: ✭ 558 (-78.34%)
DatasciencevmTools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-94.06%)
Sparkling GraphSparklingGraph provides an easy-to-use set of features that gives you the ability to process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-94.6%)
cliPolyaxon Core Client & CLI to streamline MLOps
Stars: ✭ 18 (-99.3%)
MagellanGeo Spatial Data Analytics on Spark
Stars: ✭ 507 (-80.32%)
HomeApacheCN open-source organization: announcements, introduction, members, events, and contact channels
Stars: ✭ 1,199 (-53.45%)
WaterdropProduction Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (-27.95%)
Auto ml[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (-39.48%)
Mobydq🐳 Tool to automate data quality checks on data pipelines
Stars: ✭ 123 (-95.23%)
Ros2learnROS 2 enabled Machine Learning algorithms
Stars: ✭ 119 (-95.38%)
Responsible Ai WidgetsThis project provides responsible AI user interfaces for Fairlearn, interpret-community, and Error Analysis, as well as foundational building blocks that they rely on.
Stars: ✭ 107 (-95.85%)
Spark R Notebooks R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-95.77%)
Ml Dl ScriptsThe repository provides useful Python scripts for ML and data analysis
Stars: ✭ 119 (-95.38%)
Parquet IndexSpark SQL index for Parquet tables
Stars: ✭ 109 (-95.77%)
AnndotnetANNdotNET - deep learning tool on .NET Platform.
Stars: ✭ 109 (-95.77%)
ButterfreeA tool for building feature stores.
Stars: ✭ 126 (-95.11%)
Hdfs ShellHDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-95.46%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-95.81%)
DrillApache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (-37.15%)
HnswlibJava library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-95.81%)
SupersetApache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+1555.05%)
Spark Infotheoretic Feature SelectionThis package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (-95.23%)
D6t PythonAccelerate data science
Stars: ✭ 118 (-95.42%)
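Flink LearningFlink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, as well as case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization" is welcome.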
Stars: ✭ 11,378 (+341.69%)
Yolov5YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 19,914 (+673.06%)
DeephyperDeepHyper: Scalable Asynchronous Neural Architecture and Hyperparameter Search for Deep Neural Networks
Stars: ✭ 117 (-95.46%)
Seldon ServerMachine Learning Platform and Recommendation Engine built on Kubernetes
Stars: ✭ 1,435 (-44.29%)
LiftThe LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-95.07%)
Cape PythonCollaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-95.15%)
ReportAutomated, configurable reporting platform. Demo: http://58.87.112.247/report (account: visitor, password: 123456)
Stars: ✭ 123 (-95.23%)
Tennis Crystal BallUltimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-95.85%)
LogigskA Linux-based software package to control LEDs on Logitech G910, G810, G610 and G410.
Stars: ✭ 107 (-95.85%)
Dna2vecdna2vec: Consistent vector representations of variable-length k-mers
Stars: ✭ 117 (-95.46%)
Wishlist For RFeatures and tweaks to R that I and others would love to see - feel free to add yours!
Stars: ✭ 106 (-95.89%)
Datasist A Python library for easy data analysis, visualization, exploration and modeling
Stars: ✭ 123 (-95.23%)
KubeflowMachine Learning Toolkit for Kubernetes
Stars: ✭ 11,028 (+328.11%)
GpflowGaussian processes in TensorFlow
Stars: ✭ 1,547 (-39.95%)