Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+96795.45%)
Extract
A cross-platform command line tool for parallelised content extraction and analysis.
Stars: ✭ 188 (+327.27%)
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (+36.36%)
Dbt Sqlserver
dbt adapter for SQL Server and Azure SQL
Stars: ✭ 41 (-6.82%)
Metl
Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file based Extract/Transform/Load (ETL), and remote procedure invocation via Web Services. Read more at www.jumpmind.com/products/metl/overview
Stars: ✭ 185 (+320.45%)
prefect-saturn
Python client for using Prefect Cloud with Saturn Cloud
Stars: ✭ 15 (-65.91%)
Prefect
The easiest way to automate your data
Stars: ✭ 7,956 (+17981.82%)
Grafter
Linked Data & RDF Manufacturing Tools in Clojure
Stars: ✭ 174 (+295.45%)
Pointblank
Data validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+990.91%)
autoencoders tensorflow
Automatic feature engineering using deep learning and Bayesian inference using TensorFlow.
Stars: ✭ 66 (+50%)
Udacity Data Engineering Projects
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
Stars: ✭ 458 (+940.91%)
Bender
Bender - Serverless ETL Framework
Stars: ✭ 171 (+288.64%)
Active workflow
Turn complex requirements into workflows without leaving the comfort of your technology stack.
Stars: ✭ 413 (+838.64%)
Learn Something Every Day
📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->
Stars: ✭ 362 (+722.73%)
sync-addons
Odoo Integration Addons
Stars: ✭ 69 (+56.82%)
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (-31.82%)
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+22238.64%)
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (+275%)
growthbook
Open Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (+5222.73%)
OpenOmics
A bioinformatics API and web-app to integrate multi-omics datasets & interface with public databases.
Stars: ✭ 22 (-50%)
Mara Example Project 2
An example mini data warehouse for python project stats, template for new projects
Stars: ✭ 154 (+250%)
DataEngineering
This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (+6.82%)
tutorials
Short programming tutorials pertaining to data analysis.
Stars: ✭ 14 (-68.18%)
Omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Stars: ✭ 148 (+236.36%)
Cyberchef
The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
Stars: ✭ 13,674 (+30977.27%)
google-sheets-etl
Live import all your Google Sheets to your data warehouse
Stars: ✭ 15 (-65.91%)
brain-brew
Automated Anki flashcard creation and extraction to/from CSV
Stars: ✭ 55 (+25%)
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+218.18%)
learning R
List of resources for learning R
Stars: ✭ 32 (-27.27%)
covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-68.18%)
database
Aplus Framework Database Library
Stars: ✭ 147 (+234.09%)
Kettle Web
Calls Kettle from Java code, built on Spring Boot
Stars: ✭ 128 (+190.91%)
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (+395.45%)
Tsfel
An intuitive library to extract features from time series
Stars: ✭ 202 (+359.09%)
Etl.net
Mass processing data with a complete ETL for .net developers
Stars: ✭ 129 (+193.18%)
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+150%)
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-54.55%)
cpp-can-isotp
C++ implementation of CAN ISO 15765-2, also known as the CAN ISO transport protocol. CPP CAN isotp.
Stars: ✭ 14 (-68.18%)
Hanzi char featurizer
A Chinese character feature extractor (featurizer), which extracts the features of Chinese characters (pronunciation features, glyph features) for use as features in deep learning
Stars: ✭ 187 (+325%)
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (+184.09%)
YaEtl
Yet Another ETL in PHP
Stars: ✭ 60 (+36.36%)
Transmogrifai
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+4636.36%)
Machine Learning Workflow With Python
A comprehensive walkthrough of ML techniques with Python: Define the Problem, Specify Inputs & Outputs, Data Collection, Exploratory Data Analysis, Data Preprocessing, Model Design, Training, Evaluation
Stars: ✭ 157 (+256.82%)
dominance-analysis
This package can be used for dominance analysis or Shapley Value Regression to find the relative importance of predictors on a given dataset. This library can be used for key driver analysis or marginal resource allocation models.
Stars: ✭ 111 (+152.27%)
Kiba
Data processing & ETL framework for Ruby
Stars: ✭ 1,618 (+3577.27%)
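DataCon
🏆 DataCon Big Data Security Analysis Competition: champion source code for 2019 Track 2 (malicious code detection) and third-place source code for 2020 Track 5 (malicious code analysis)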
Stars: ✭ 69 (+56.82%)
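mydataharbor
🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other data source. It helps users reliably, quickly, and stably perform near-real-time incremental synchronization or scheduled full synchronization of massive data sets. It is primarily positioned to serve real-time transaction systems, and can also be used for big-data synchronization (the ETL domain).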
Stars: ✭ 28 (-36.36%)
cobrix
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Stars: ✭ 109 (+147.73%)
Table-Extractor-From-Image
This repository contains the code that extracts a table from an image and exports it to an Excel file.
Stars: ✭ 46 (+4.55%)
krawler
A minimalist (geospatial) ETL
Stars: ✭ 51 (+15.91%)