Pyfunctional
Python library for creating data pipelines with chain functional programming
Stars: ✭ 1,943 (+838.65%)
Flinkx
Based on Apache Flink. Supports data synchronization/integration and streaming SQL computation.
Stars: ✭ 2,651 (+1180.68%)
Rangeless
C++ LINQ-like library of higher-order functions for data manipulation
Stars: ✭ 148 (-28.5%)
Zumis
zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
Stars: ✭ 178 (-14.01%)
Pipcook
Machine learning platform for Web developers
Stars: ✭ 2,186 (+956.04%)
Dolphinbeat
A server that pulls and parses MySQL binlog and pushes change data into different sinks like Kafka.
Stars: ✭ 164 (-20.77%)
Hadoop
Apache Hadoop
Stars: ✭ 12,177 (+5782.61%)
Drone Cache
A Drone plugin for caching current workspace files between builds to reduce your build times
Stars: ✭ 194 (-6.28%)
Parquet Rs
Apache Parquet implementation in Rust
Stars: ✭ 144 (-30.43%)
Big Whale
Scheduling of offline tasks such as Spark and Flink, and monitoring of real-time tasks
Stars: ✭ 163 (-21.26%)
Proposal Smart Pipelines
Old archived draft proposal for smart pipelines. Go to the new Hack-pipes proposal at js-choi/proposal-hack-pipes.
Stars: ✭ 177 (-14.49%)
Awesome Decision Tree Papers
A collection of research papers on decision, classification and regression trees with implementations.
Stars: ✭ 1,908 (+821.74%)
Core
The safe post-production pipeline - https://getavalon.github.io/2.0
Stars: ✭ 162 (-21.74%)
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-32.37%)
Jenkinsdocs
Jenkins practice documentation. Latest site address: http://www.idevops.site
Stars: ✭ 200 (-3.38%)
Go spider
[Crawler framework (golang)] An awesome Go concurrent crawler (spider) framework. The crawler is flexible and modular. It can easily be expanded into an individualized crawler, or you can use only the default crawl components.
Stars: ✭ 1,745 (+743%)
Machine Learning Models
Decision Trees, Random Forest, Dynamic Time Warping, Naive Bayes, KNN, Linear Regression, Logistic Regression, Mixture Of Gaussian, Neural Network, PCA, SVD, Gaussian Naive Bayes, Fitting Data to Gaussian, K-Means
Stars: ✭ 160 (-22.71%)
Xlearning
AI on Hadoop
Stars: ✭ 1,709 (+725.6%)
Chefboost
A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting (GBDT, GBRT, GBM), Random Forest and Adaboost w/categorical features support for Python
Stars: ✭ 176 (-14.98%)
Karton
Distributed malware processing framework based on Python, Redis and MinIO.
Stars: ✭ 134 (-35.27%)
Mara Pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (+789.37%)
Spacy Wordnet
spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface
Stars: ✭ 156 (-24.64%)
Randomforestexplainer
A set of tools to understand what is happening inside a Random Forest
Stars: ✭ 175 (-15.46%)
Tipdm
TipDM modeling platform, an open-source data mining tool.
Stars: ✭ 130 (-37.2%)
Ects
Elastic Crontab System: a simple, easy-to-use distributed scheduled task management system
Stars: ✭ 156 (-24.64%)
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+731.4%)
Scrapy demo
All kinds of Scrapy demos
Stars: ✭ 128 (-38.16%)
Fluids
Fluid dynamics component of Chemical Engineering Design Library (ChEDL)
Stars: ✭ 154 (-25.6%)
Airflow Pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-38.16%)
Spydra
Ephemeral Hadoop clusters using Google Compute Platform
Stars: ✭ 128 (-38.16%)
Pypyr
pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.
Stars: ✭ 173 (-16.43%)
Nmflibrary
MATLAB library for non-negative matrix factorization (NMF): Version 1.8.1
Stars: ✭ 153 (-26.09%)
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-38.16%)
Fpart
Sort files and pack them into partitions
Stars: ✭ 127 (-38.65%)
Volcano
A Cloud Native Batch System (Project under CNCF)
Stars: ✭ 2,114 (+921.26%)
Ssh Steps Plugin
Jenkins pipeline steps that provide SSH facilities such as command execution or file transfer for continuous delivery.
Stars: ✭ 183 (-11.59%)
Faas Flow
Function Composition for OpenFaaS
Stars: ✭ 172 (-16.91%)
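Javainterview
The most complete collection of Java technical knowledge points, plus Java source code analysis. Doing my part to contribute to open source.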
Stars: ✭ 154 (-25.6%)
Ml Projects
ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python
Stars: ✭ 127 (-38.65%)
Squeezemeta
A complete pipeline for metagenomic analysis
Stars: ✭ 128 (-38.16%)
Emlearn
Machine Learning inference engine for Microcontrollers and Embedded devices
Stars: ✭ 154 (-25.6%)
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-38.65%)
Rnaseq Workflow
A repository for setting up an RNAseq workflow
Stars: ✭ 170 (-17.87%)
Pipeline Live
Pipeline Extension for Live Trading
Stars: ✭ 154 (-25.6%)
Semsegpipeline
A simpler way of reading and augmenting image segmentation data into TensorFlow
Stars: ✭ 126 (-39.13%)
Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-39.61%)
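Movie recommend
A Spark-based movie recommendation system, including a crawler project, a website, a back-office management system, and a Spark recommendation system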
Stars: ✭ 2,092 (+910.63%)
Sarek
Detect germline or somatic variants from normal or tumour/normal whole-genome or targeted sequencing
Stars: ✭ 124 (-40.1%)
Kotlin Spark Api
This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-11.59%)
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+5830.92%)
Metl
mito ETL tool
Stars: ✭ 153 (-26.09%)