cmip6 preprocessing
Analysis-ready CMIP6 data in Python, the easy way, with Pangeo tools.
Stars: ✭ 126 (+641.18%)
bookmarks
A PySide2-based file and asset manager for animation and CG productions.
Stars: ✭ 33 (+94.12%)
timit-preprocessor
Extract MFCC vectors and phones from the TIMIT dataset.
Stars: ✭ 14 (-17.65%)
perke
A keyphrase extractor for Persian.
Stars: ✭ 60 (+252.94%)
wikirepo
Python-based Wikidata framework for easy dataframe extraction.
Stars: ✭ 33 (+94.12%)
krawler
A minimalist (geospatial) ETL.
Stars: ✭ 51 (+200%)
nanoflow
🔬 De novo assembly of nanopore reads using Nextflow.
Stars: ✭ 20 (+17.65%)
nodejs-docker-example
An example of how to run a Node.js project in Docker in a Buildkite pipeline.
Stars: ✭ 39 (+129.41%)
lightflow
A lightweight, distributed workflow system.
Stars: ✭ 67 (+294.12%)
jgit-spark-connector
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Stars: ✭ 71 (+317.65%)
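modelbox
A high-performance, highly extensible, easy-to-use framework for AI applications. It gives AI application developers a unified, high-performance, easy-to-use programming framework for quickly building industry AI applications that span device, edge, and cloud, on top of full-stack AI services.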
Stars: ✭ 48 (+182.35%)
nemesyst
Generalised and highly customisable, hybrid-parallelism, database-based deep learning framework.
Stars: ✭ 17 (+0%)
NBi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests! Nor do you need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (+500%)
eventkit
Event-driven data pipelines.
Stars: ✭ 94 (+452.94%)
AirflowETL
Blog post on ETL pipelines with Airflow.
Stars: ✭ 20 (+17.65%)
towhee
Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
Stars: ✭ 821 (+4729.41%)
lncpipe
UNDER DEVELOPMENT: Analysis of long non-coding RNAs from RNA-seq datasets.
Stars: ✭ 24 (+41.18%)
krsh
A declarative Kubeflow management tool.
Stars: ✭ 127 (+647.06%)
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark.
Stars: ✭ 28 (+64.71%)
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (+35.29%)
ECG analysis
No description or website provided.
Stars: ✭ 32 (+88.24%)
makepipe
Tools for constructing simple make-like pipelines in R.
Stars: ✭ 23 (+35.29%)
wrangle
A data transformation package for deep learning with Autonomio, Keras and TensorFlow.
Stars: ✭ 15 (-11.76%)
pipe
Functional pipeline in Go.
Stars: ✭ 30 (+76.47%)
CVparser
CVparser is software for parsing or extracting data out of CVs/resumes.
Stars: ✭ 28 (+64.71%)
optimus
🚚 Agile data preparation workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark.
Stars: ✭ 1,351 (+7847.06%)
traceml
Engine for ML/data tracking, visualization, dashboards, and model UI for Polyaxon.
Stars: ✭ 445 (+2517.65%)
nextNEOpi
nextNEOpi: a comprehensive pipeline for computational neoantigen prediction.
Stars: ✭ 42 (+147.06%)
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+176.47%)
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+123.53%)
assume-role-arn
🤖🎩 assume-role-arn allows you to easily assume an AWS IAM role in your CI/CD pipelines without worrying about external dependencies.
Stars: ✭ 54 (+217.65%)
django-calaccess-raw-data
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database.
Stars: ✭ 61 (+258.82%)
frizzle
The magic message bus.
Stars: ✭ 14 (-17.65%)
jobAnalytics and search
The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (+47.06%)
thain
Thain is a distributed flow scheduling platform.
Stars: ✭ 81 (+376.47%)
modelscript
REPO MOVED TO https://github.com/repetere/jsonstack-data. Data science and machine learning in JavaScript.
Stars: ✭ 40 (+135.29%)
html-pipeline
HTML processing filters and utilities, Go version.
Stars: ✭ 18 (+5.88%)
redundans
Redundans is a pipeline that assists the assembly of heterozygous/polymorphic genomes.
Stars: ✭ 90 (+429.41%)
Automlpipeline.jl
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Stars: ✭ 223 (+1211.76%)
get phylomarkers
A pipeline to select optimal markers for microbial phylogenomics and species tree estimation using coalescent and concatenation approaches.
Stars: ✭ 34 (+100%)
Redispipe
High-throughput Redis client for Go with implicit pipelining.
Stars: ✭ 215 (+1164.71%)
NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep-learning-based recommender systems.
Stars: ✭ 797 (+4588.24%)
csvplus
csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+294.12%)
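DataX-src
DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between diverse heterogeneous data sources, including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.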
Stars: ✭ 21 (+23.53%)
langx-java
Java tools, helpers, and common utilities. A replacement for Guava, Apache Commons, and Hutool.
Stars: ✭ 50 (+194.12%)
cubetl
CubETL: Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED).
Stars: ✭ 21 (+23.53%)
hic
Analysis of Chromosome Conformation Capture data (Hi-C).
Stars: ✭ 45 (+164.71%)
chronicle-etl
📜 A CLI toolkit for extracting and working with your digital history.
Stars: ✭ 78 (+358.82%)
dmriprep
dMRIPrep is a robust and easy-to-use pipeline for preprocessing diverse dMRI data. The transparent workflow dispenses with manual intervention, thereby ensuring the reproducibility of the results.
Stars: ✭ 55 (+223.53%)