407 open source projects that are alternatives to or similar to gallia-core

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+1290.91%)

Mutual labels: etl, data-engineering, feature-engineering

FIFA-2019-Analysis

This project is based on the FIFA World Cup 2019 and analyzes the performance and efficiency of teams, players, countries, and other related things using data analysis and data visualizations.

Stars: ✭ 28 (-36.36%)

Mutual labels: data-manipulation, feature-engineering

Butterfree

A tool for building feature stores.

Stars: ✭ 126 (+186.36%)

Mutual labels: etl, data-engineering

naas

⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ Production environment

Stars: ✭ 219 (+397.73%)

Mutual labels: etl, data-transformation

uptasticsearch

An Elasticsearch client tailored to data science workflows.

Stars: ✭ 47 (+6.82%)

Mutual labels: etl, data-engineering

AirflowETL

Blog post on ETL pipelines with Airflow

Stars: ✭ 20 (-54.55%)

Mutual labels: etl, data-engineering

pangeo-forge-recipes

Python library for building Pangeo Forge recipes.

Stars: ✭ 64 (+45.45%)

Mutual labels: etl, data-engineering

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+1338.64%)

Mutual labels: etl, data-engineering

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+5320.45%)

Mutual labels: etl, data-engineering

Benthos

Fancy stream processing made operationally mundane

Stars: ✭ 3,705 (+8320.45%)

Mutual labels: etl, data-engineering

versatile-data-kit

Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.

Stars: ✭ 144 (+227.27%)

Mutual labels: etl, data-engineering

fastverse

An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R

Stars: ✭ 123 (+179.55%)

Mutual labels: data-transformation, data-manipulation

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (+29.55%)

Mutual labels: etl, data-engineering

Sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Stars: ✭ 79 (+79.55%)

Mutual labels: etl, data-engineering

Aws Serverless Data Lake Framework

Enterprise-grade, production-hardened, serverless data lake on AWS

Stars: ✭ 179 (+306.82%)

Mutual labels: etl, data-engineering

beneath

Beneath is a serverless real-time data platform ⚡️

Stars: ✭ 65 (+47.73%)

Mutual labels: etl, data-engineering

arthur-redshift-etl

ELT Code for your Data Warehouse

Stars: ✭ 22 (-50%)

Mutual labels: etl, data-engineering

Feast

Feature Store for Machine Learning

Stars: ✭ 2,576 (+5754.55%)

Mutual labels: data-engineering, feature-engineering

zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Stars: ✭ 655 (+1388.64%)

Mutual labels: etl, data-transformation

AirflowDataPipeline

Example of an ETL Pipeline using Airflow

Stars: ✭ 24 (-45.45%)

Mutual labels: etl, data-engineering

etl manager

A Python package to create a database on the platform using our moj data warehousing framework

Stars: ✭ 14 (-68.18%)

Mutual labels: etl, data-engineering

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (+79.55%)

Mutual labels: etl, data-engineering

Dataform

Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift

Stars: ✭ 342 (+677.27%)

Mutual labels: etl, data-engineering

hive-metastore-client

A client for connecting to and running DDLs on Hive Metastore.

Stars: ✭ 37 (-15.91%)

Mutual labels: etl, data-engineering

etl

[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library

Stars: ✭ 279 (+534.09%)

Mutual labels: etl, data-engineering

Airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Stars: ✭ 4,919 (+11079.55%)

Mutual labels: etl, data-engineering

morph-kgc

Powerful RDF Knowledge Graph Generation with [R2]RML Mappings

Stars: ✭ 77 (+75%)

Mutual labels: etl, data-engineering

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (+20.45%)

Mutual labels: etl, data-engineering

sql-to-redis

🔄 Simple tool for ETL. From SQL to Redis.

Stars: ✭ 18 (-59.09%)

Mutual labels: etl

cubetl

CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)

Stars: ✭ 21 (-52.27%)

Mutual labels: etl

Feature-Engineering-for-Fraud-Detection

Implementation of feature engineering from "Feature engineering strategies for credit card fraud"

Stars: ✭ 31 (-29.55%)

Mutual labels: feature-engineering

PDAP-Scrapers

Code relating to scraping public police data.

Stars: ✭ 72 (+63.64%)

Mutual labels: etl

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types as well as multiple categories of transformation rules.

Stars: ✭ 24 (-45.45%)

Mutual labels: etl

DQCS

Data quality control system

Stars: ✭ 34 (-22.73%)

Mutual labels: etl

h4sci-course

ETH PhD Program course

Stars: ✭ 19 (-56.82%)

Mutual labels: data-engineering

neon-workshop

A Pachyderm deep learning tutorial for conference workshops

Stars: ✭ 19 (-56.82%)

Mutual labels: data-engineering

50-days-of-Statistics-for-Data-Science

This repository consists of a 50-day program. All the statistics required for a complete understanding of data science will be uploaded to this repository.

Stars: ✭ 19 (-56.82%)

Mutual labels: feature-engineering

nasdaq-symbols

ETL for the NASDAQ symbol file

Stars: ✭ 13 (-70.45%)

Mutual labels: etl

autoencoders tensorflow

Automatic feature engineering using deep learning and Bayesian inference using TensorFlow.

Stars: ✭ 66 (+50%)

Mutual labels: feature-engineering

hrv-analysis

Package for Heart Rate Variability analysis in Python

Stars: ✭ 225 (+411.36%)

Mutual labels: feature-engineering

wrangle

A data transformation package for deep learning with Autonomio, Keras and TensorFlow.

Stars: ✭ 15 (-65.91%)

Mutual labels: etl

pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

Stars: ✭ 970 (+2104.55%)

Mutual labels: data-engineering

tutorials

Short programming tutorials pertaining to data analysis.

Stars: ✭ 14 (-68.18%)

Mutual labels: data-transformation

Quora-Paraphrase-Question-Identification

Paraphrase question identification using Feature Fusion Network (FFN).

Stars: ✭ 19 (-56.82%)

Mutual labels: feature-engineering

covid-19

Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.

Stars: ✭ 14 (-68.18%)

Mutual labels: etl

viewflow

Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

Stars: ✭ 110 (+150%)

Mutual labels: data-engineering

mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).

Stars: ✭ 32 (-27.27%)

Mutual labels: etl

OpenKettleWebUI

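A kettle-based web platform for scheduling and controlling data processing. It supports file repositories and database repositories, controls kettle data transformations through the web platform, and can be integrated into existing systems as middleware.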

Stars: ✭ 138 (+213.64%)

Mutual labels: etl

dominance-analysis

This package can be used for dominance analysis or Shapley Value Regression for finding relative importance of predictors on given dataset. This library can be used for key driver analysis or marginal resource allocation models.

Stars: ✭ 111 (+152.27%)

Mutual labels: feature-engineering

etlflow

EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various tasks and jobs on GCP and AWS.

Stars: ✭ 38 (-13.64%)

Mutual labels: etl

csvplus

csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

Stars: ✭ 67 (+52.27%)

Mutual labels: etl

python mozetl

ETL jobs for Firefox Telemetry