Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+229.43%)
isarn-sketches-spark: Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-90.07%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-60.64%)
Pdpipe: Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+109.22%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), with code examples
Stars: ✭ 150 (-46.81%)
hyperdrive: Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (-89.01%)
spark-streaming-visualize: A simple demonstration of how to build a complex real-time machine learning visualization tool.
Stars: ✭ 16 (-94.33%)
ctdna-pipeline: A simplified pipeline for ctDNA sequencing data analysis
Stars: ✭ 29 (-89.72%)
JT1078Gateway: A JT1078Gateway built on Pipeline, supporting TCP/UDP; it currently supports only three stream-pulling methods: http-flv, ws-flv, and hls.
Stars: ✭ 50 (-82.27%)
Dominando-Pandas: This repository is dedicated to learning the Pandas library.
Stars: ✭ 22 (-92.2%)
pyrealtime: Realtime data processing and plotting pipelines in Python
Stars: ✭ 62 (-78.01%)
pipeline-editor: Cloud Pipelines Editor is a web app that lets users build and run machine learning pipelines without having to set up a development environment.
Stars: ✭ 22 (-92.2%)
Credit: An example project that predicts the risk of credit card default using a logistic regression classifier and a 30,000-sample dataset.
Stars: ✭ 18 (-93.62%)
godot-exporter: Godot Engine automation pipeline for Android, iOS, Linux, macOS, Windows, HTML5, and Itch.io.
Stars: ✭ 54 (-80.85%)
dropEst: Pipeline for initial analysis of droplet-based single-cell RNA-seq data
Stars: ✭ 71 (-74.82%)
kedro: A Python framework for creating reproducible, maintainable and modular data science code.
Stars: ✭ 6,068 (+2051.77%)
bistro: A library to build and execute typed scientific workflows
Stars: ✭ 43 (-84.75%)
connector-x: Fastest library to load data from a DB into DataFrames in Rust and Python
Stars: ✭ 550 (+95.04%)
Rust Dataframe: A Rust DataFrame implementation, built on Apache Arrow
Stars: ✭ 271 (-3.9%)
spot-termination-exporter: Prometheus spot instance exporter to monitor AWS instance termination with Hollowtrees
Stars: ✭ 30 (-89.36%)
metagraf: metaGraf is an opinionated specification for describing a software component and its requirements from the runtime environment. The mg command turns metaGraf specifications into Kubernetes resources, supporting CI, CD, and GitOps software delivery.
Stars: ✭ 15 (-94.68%)
cli-property-manager: Use this Property Manager CLI to automate Akamai property changes and deployments across many environments.
Stars: ✭ 22 (-92.2%)
HAR: Recognizes one of six human activities, such as standing, sitting, and walking, using a softmax classifier trained on mobile phone sensor data.
Stars: ✭ 18 (-93.62%)
Spark Jupyter Aws: A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-8.16%)
coronavirus-stats: Automatically scrapes data and statistics on the coronavirus and makes them easily accessible in CSV format
Stars: ✭ 47 (-83.33%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-91.13%)
pywedge: Makes interactive chart widgets, cleans raw data, runs baseline models, and supports interactive hyperparameter tuning and tracking
Stars: ✭ 49 (-82.62%)
Datavec: ETL library for machine learning (data pipelines, data munging, and wrangling)
Stars: ✭ 272 (-3.55%)
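leaflet heatmap: A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap-drawing step is moved offline for computation and analysis. After Apache Spark computes the data in parallel, Apache Spark also draws the heatmap, and leafletjs then loads an OpenStreetMap layer and the heatmap layer to provide good interactivity. With drawing currently implemented in Apache Spark, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because the algorithm is not well designed. The Apache Spark heatmap drawing and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git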
Stars: ✭ 13 (-95.39%)
mmtf-workshop-2018: Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-82.27%)
germline-DNA: A BioWDL variant-calling pipeline for germline DNA data, starting with FASTQ files to produce VCF files. Category: Multi-Sample
Stars: ✭ 21 (-92.55%)
re-mote: Re-mote operations using SSH and Re-gent
Stars: ✭ 61 (-78.37%)
skippa: SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (-88.3%)
etl: M-Lab ingestion pipeline
Stars: ✭ 15 (-94.68%)
prose: A Python framework to process FITS images, built for astronomy.
Stars: ✭ 21 (-92.55%)
TACTIC-Handler: PySide-based TACTIC client for Maya, Nuke, 3ds Max, Houdini, etc.
Stars: ✭ 67 (-76.24%)
bifrost: A stream processing framework for high-throughput applications.
Stars: ✭ 48 (-82.98%)
DNAscan: DNAscan is a fast and efficient bioinformatics pipeline for analyzing DNA next-generation sequencing data, requiring very little computational effort and memory.
Stars: ✭ 36 (-87.23%)
connected-component: MapReduce implementation of connected components on Apache Spark
Stars: ✭ 68 (-75.89%)
Nimdata: DataFrame API written in Nim, enabling fast out-of-core data processing
Stars: ✭ 261 (-7.45%)
raccoon: Python DataFrame with fast inserts and appends
Stars: ✭ 64 (-77.3%)
companion: This repository has been archived; the currently maintained version is at https://github.com/iii-companion/companion
Stars: ✭ 21 (-92.55%)
ploio: Safe, Reliable, and Fast Production Deployments for Kubernetes
Stars: ✭ 11 (-96.1%)
RNASeq: RNASeq pipeline
Stars: ✭ 30 (-89.36%)
sfpowerscripts: A build system for modular development in Salesforce, delivered as an sfdx plugin that can be integrated into any CI/CD system of choice
Stars: ✭ 121 (-57.09%)
only-pipe: A non-intrusive Python pipeline.
Stars: ✭ 19 (-93.26%)