744 Open source projects that are alternatives of or similar to sparklanes

Example of an ETL Pipeline using Airflow

Stars: ✭ 24 (+41.18%)

Mutual labels: etl

A hands-on DevOps course covering the culture, methods and repeated practices of modern software development involving Packer, Vagrant, VirtualBox, Ansible, Kubernetes, K3s, MetalLB, Traefik, Docker-Compose, Docker, Taiga, GitLab, Drone CI, SonarQube, Selenium, InSpec, Alpine 3.10, Ubuntu-bionic, CentOS 7...

Stars: ✭ 196 (+1052.94%)

Mutual labels: pipeline

cpp-can-isotp

C++ implementation of CAN ISO 15765-2 also known as CAN ISO transport protocol. CPP CAN isotp.

Stars: ✭ 14 (-17.65%)

Mutual labels: etl

rec-core

Data pipelining service

Stars: ✭ 19 (+11.76%)

Mutual labels: data-processing

arthur-redshift-etl

ELT Code for your Data Warehouse

Stars: ✭ 22 (+29.41%)

Mutual labels: etl

Pipeline.rs

☔️ => ⛅️ => ☀️

Stars: ✭ 188 (+1005.88%)

Mutual labels: pipeline

open-semantic-desktop-search

Virtual Machine for Desktop Search with Open Semantic Search

Stars: ✭ 22 (+29.41%)

Mutual labels: etl

architect big data solutions with spark

code, labs and lectures for the course

Stars: ✭ 40 (+135.29%)

Mutual labels: etl

gamechanger-data

GAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements

Stars: ✭ 17 (+0%)

Mutual labels: etl

Zumis

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs

Stars: ✭ 178 (+947.06%)

Mutual labels: pipeline

cardano-py

Python3 lib and cli for operating a Cardano Passive Node and using the APIs. (PRE-ALPHA)

Stars: ✭ 17 (+0%)

Mutual labels: etl

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (+235.29%)

Mutual labels: etl

spdr-etf-holdings

ETL for the SPDR ETF holdings XLS documents

Stars: ✭ 14 (-17.65%)

Mutual labels: etl

Pypyr

pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.

Stars: ✭ 173 (+917.65%)

Mutual labels: pipeline

TEAM

The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.

Stars: ✭ 27 (+58.82%)

Mutual labels: etl

stargate

An Apache Pulsar client written in Elixir

Stars: ✭ 33 (+94.12%)

Mutual labels: data-processing

koza

Data transformation framework for LinkML data models

Stars: ✭ 21 (+23.53%)

Mutual labels: etl

Rnaseq Workflow

A repository for setting up an RNAseq workflow

Stars: ✭ 170 (+900%)

Mutual labels: pipeline

oesophagus

Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)

Stars: ✭ 12 (-29.41%)

Mutual labels: etl

pyspark-ML-in-Colab

Pyspark in Google Colab: A simple machine learning (Linear Regression) model

Stars: ✭ 32 (+88.24%)

Mutual labels: pyspark

DataXServer

Adds remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) to DataX (https://github.com/alibaba/DataX)

Stars: ✭ 130 (+664.71%)

Mutual labels: etl

Plex

Open Source Pipeline for Maya, Houdini, 3ds Max and Nuke.

Stars: ✭ 170 (+900%)

Mutual labels: pipeline

MIPS-pipeline-processor

A pipelined implementation of the MIPS processor featuring hazard detection as well as forwarding

Stars: ✭ 92 (+441.18%)

Mutual labels: pipeline

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+1076.47%)

Mutual labels: pyspark

Unity resources

A list of resources and tutorials for those doing programming in Unity.

Stars: ✭ 170 (+900%)

Mutual labels: pipeline

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+876.47%)

Mutual labels: pyspark

Sparkora

Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟

Stars: ✭ 51 (+200%)

Mutual labels: pyspark

Linkis

Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+13564.71%)

Mutual labels: pyspark

Cloud Dev

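Cloud development (云研发) is a closed-loop, everything-as-code approach to software development born in the cloud. It lets business, development, and operations staff collaborate transparently in the same cloud environment across the whole software lifecycle (requirements, design, coding, build, deployment, operations), rather than working in isolation or relying on several separate tools to get the work done.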

Stars: ✭ 164 (+864.71%)

Mutual labels: pipeline

bacannot

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.

Stars: ✭ 51 (+200%)

Mutual labels: pipeline

Cc Pyspark

Process Common Crawl data with Python and Spark

Stars: ✭ 147 (+764.71%)

Mutual labels: pyspark

Operator

Kubernetes operator to manage installation, updating and uninstallation of tektoncd projects (pipeline, …)

Stars: ✭ 161 (+847.06%)

Mutual labels: pipeline

Repo 2019

BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics

Stars: ✭ 133 (+682.35%)

Mutual labels: pyspark

ngs pipeline

Exome/Capture/RNASeq Pipeline Implementation using snakemake

Stars: ✭ 40 (+135.29%)

Mutual labels: pipeline

Pyspark Cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

Stars: ✭ 108 (+535.29%)

Mutual labels: pyspark

Spacy Wordnet

spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface

Stars: ✭ 156 (+817.65%)

Mutual labels: pipeline

Pyspark Stubs

Apache (Py)Spark type annotations (stub files).

Stars: ✭ 98 (+476.47%)

Mutual labels: pyspark

Apos.Content

Content builder library for MonoGame.

Stars: ✭ 14 (-17.65%)

Mutual labels: pipeline

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+7770.59%)

Mutual labels: pyspark

Ects

Elastic Crontab System: a simple, easy-to-use distributed scheduled-task management system

Stars: ✭ 156 (+817.65%)

Mutual labels: pipeline

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (+435.29%)

Mutual labels: pyspark

ImcSegmentationPipeline

A pixel classification based multiplexed image segmentation pipeline

Stars: ✭ 62 (+264.71%)

Mutual labels: pipeline

W2v

Word2Vec models with Twitter data using Spark. Blog:

Stars: ✭ 64 (+276.47%)

Mutual labels: pyspark

Open Solution Toxic Comments

Open solution to the Toxic Comment Classification Challenge

Stars: ✭ 154 (+805.88%)

Mutual labels: pipeline

Petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Stars: ✭ 1,108 (+6417.65%)

Mutual labels: pyspark

MTBseq source

MTBseq is an automated pipeline for mapping, variant calling and detection of resistance mediating and phylogenetic variants from illumina whole genome sequence data of Mycobacterium tuberculosis complex isolates.

Stars: ✭ 26 (+52.94%)

Mutual labels: pipeline

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+5700%)

Mutual labels: pyspark

STOCK-RETURN-PREDICTION-USING-KNN-SVM-GUASSIAN-PROCESS-ADABOOST-TREE-REGRESSION-AND-QDA

Forecast stock prices using a machine learning approach and time series analysis. Uses predictive modeling in machine learning to forecast stock returns, an approach used by hedge funds to select tradeable stocks.

Stars: ✭ 94 (+452.94%)

Mutual labels: pipeline

proposal-hack-pipes

Old specification for Hack pipes in JavaScript. Please go to the new specification.

Stars: ✭ 87 (+411.76%)

Mutual labels: pipeline

Live log analyzer spark

Spark application that analyzes Apache access logs and detects anomalies, with an accompanying Medium article.

Stars: ✭ 14 (-17.65%)

Mutual labels: pyspark

needlestack

Multi-sample somatic variant caller

Stars: ✭ 45 (+164.71%)

Mutual labels: pipeline

katana-skipper

Simple and flexible ML workflow engine

Stars: ✭ 234 (+1276.47%)

Mutual labels: pipeline

Motorway

Cloud ready pure-python streaming data pipeline library

Stars: ✭ 150 (+782.35%)

Mutual labels: pipeline

flamingo

FreeCAD - flamingo workbench

Stars: ✭ 30 (+76.47%)

Mutual labels: pipeline

kafka-connect-datagen

A Kafka Connect source connector that generates data for tests

Stars: ✭ 27 (+58.82%)

Mutual labels: etl

rivery cli

Rivery CLI

Stars: ✭ 16 (-5.88%)

Mutual labels: etl

mech

🦾 Main repository for the Mech programming language. Start here!

Stars: ✭ 135 (+694.12%)

Mutual labels: data-processing

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-17.65%)

Mutual labels: etl

DaFlow

Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (+41.18%)

Mutual labels: etl

prospectr

R package: Misc. Functions for Processing and Sample Selection of Spectroscopic Data

Stars: ✭ 26 (+52.94%)

Mutual labels: preprocessing

601-660 of 744 similar projects