Koop🔮 Transform, query, and download geospatial data on the web.
Stars: ✭ 505 (+1478.13%)
DataspherestudioDataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+3634.38%)
Linq2dbLinq to database provider.
Stars: ✭ 2,211 (+6809.38%)
Luigi WarehouseA luigi powered analytics / warehouse stack
Stars: ✭ 72 (+125%)
BigsliceA serverless cluster computing system for the Go programming language
Stars: ✭ 469 (+1365.63%)
SmartcodeSmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Stars: ✭ 464 (+1350%)
Usaspending ApiServer application to serve U.S. federal spending data via a RESTful API
Stars: ✭ 166 (+418.75%)
Aws Etl OrchestratorA serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+665.63%)
DiscreetlyETLy is an add-on dashboard service on top of Apache Airflow.
Stars: ✭ 60 (+87.5%)
Etl unicorn数据可视化, 数据挖掘, 数据处理 ETL
Stars: ✭ 156 (+387.5%)
Bentools EtlPHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.
Stars: ✭ 45 (+40.63%)
DataX-srcDataX 是异构数据广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、OTS、ODPS 等各种异构数据源之间高效的数据同步功能。
Stars: ✭ 21 (-34.37%)
PglogicalLogical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Stars: ✭ 455 (+1321.88%)
DatacleanerThe premier open source Data Quality solution
Stars: ✭ 391 (+1121.88%)
Aws Data WranglerPandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+7353.13%)
AbcPower of appbase.io via CLI, with nifty imports from your favorite data sources
Stars: ✭ 375 (+1071.88%)
Ethereum EtlPython scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+2887.5%)
ElandPython Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+634.38%)
Aws Auto Terminate Idle EmrAWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS based solution using AWS CloudWatch and AWS Lambda using a Python script that is using Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-34.37%)
Dswarm Backoffice WebThe backoffice web application of d:swarm (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 11 (-65.62%)
Bandar LogMonitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-40.62%)
Mara PipelinesA lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (+5653.13%)
Etl2pcapngUtility that converts an .etl file containing a Windows network packet capture into .pcapng format.
Stars: ✭ 228 (+612.5%)
React CsvReact components to build CSV files on the fly basing on Array/literal object of data
Stars: ✭ 732 (+2187.5%)
Reddit DetectivePlay detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Stars: ✭ 129 (+303.13%)
Go StreamsA lightweight stream processing library for Go
Stars: ✭ 615 (+1821.88%)
wikirepoPython based Wikidata framework for easy dataframe extraction
Stars: ✭ 33 (+3.13%)
Ananas DesktopA hackable data integration & analysis tool to enable non technical users to edit data processing jobs and visualise data on demand.
Stars: ✭ 551 (+1621.88%)
AistoreAIStore: scalable storage for AI applications
Stars: ✭ 367 (+1046.88%)
Bulk WriterProvides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entites to table columns.
Stars: ✭ 210 (+556.25%)
KibaData processing & ETL framework for Ruby
Stars: ✭ 1,618 (+4956.25%)
WedatasphereWeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+1062.5%)
Webkettle基于web版kettle开发的一套分布式综合调度,管理,ETL开发的用户专业版B/S架构工具
Stars: ✭ 334 (+943.75%)
AirflowETLBlog post on ETL pipelines with Airflow
Stars: ✭ 20 (-37.5%)
openmrs-fhir-analyticsA collection of tools for extracting FHIR resources and analytics services on top of that data.
Stars: ✭ 55 (+71.88%)
ExtractA cross-platform command line tool for parallelised content extraction and analysis.
Stars: ✭ 188 (+487.5%)
DataxDataX is an open source universal ETL tool that support Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (+262.5%)
DagsterAn orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+12709.38%)
RikoA Python stream processing engine modeled after Yahoo! Pipes
Stars: ✭ 1,571 (+4809.38%)
CqlCategorical Query Language IDE
Stars: ✭ 196 (+512.5%)
DataformDataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+968.75%)
Sentinel CrawlerXenomorph Crawler, a Concise, Declarative and Observable Distributed Crawler(Node / Go / Java / Rust) For Web, RDB, OS, also can act as a Monitor(with Prometheus) or ETL for Infrastructure 💫 多语言执行器,分布式爬虫
Stars: ✭ 118 (+268.75%)
SmooksAn extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (+815.63%)
BenthosFancy stream processing made operationally mundane
Stars: ✭ 3,705 (+11478.13%)
DatavecETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+750%)
Mongo EsA MongoDB to Elasticsearch connector
Stars: ✭ 185 (+478.13%)
Aws Ecs AirflowRun Airflow in AWS ECS(Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (+234.38%)
etl managerA python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-56.25%)
bandar-logMonitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-37.5%)
Kafka Connectequivalent to kafka-connect 🔧 for nodejs ✨🐢🚀✨
Stars: ✭ 102 (+218.75%)
grateA Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.
Stars: ✭ 98 (+206.25%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-21.87%)
id3cData logistics system enabling real-time pathogen surveillance. Built for the Seattle Flu Study.
Stars: ✭ 21 (-34.37%)
MetlMetl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file based Extract/Transform/Load (ETL), and remote procedure invocation via Web Services. Read more at www.jumpmind.com/products/metl/overview
Stars: ✭ 185 (+478.13%)
Csv2dbThe CSV to database command line loader
Stars: ✭ 102 (+218.75%)
ETW2JSONTool and library to convert ETW logs to JSON files
Stars: ✭ 66 (+106.25%)