jobAnalytics and search: JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-77.27%)
Mutual labels: data-engineering, data-pipeline
Data Engineering Howto: A list of useful resources to learn Data Engineering from scratch
Stars: ✭ 2,056 (+1769.09%)
Mutual labels: data-engineering, data-pipeline
AirflowETL: Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-81.82%)
Mutual labels: data-engineering, data-pipeline
soda-spark: Soda Spark is a PySpark library that helps you with testing your data in Spark DataFrames
Stars: ✭ 58 (-47.27%)
Mutual labels: data-engineering
big-data-engineering-indonesia: A curated list of big data engineering tools, resources and communities.
Stars: ✭ 26 (-76.36%)
Mutual labels: data-engineering
deordie-meetups: DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.
Stars: ✭ 48 (-56.36%)
Mutual labels: data-engineering
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-64.55%)
Mutual labels: data-pipeline
qsv: CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+298.18%)
Mutual labels: data-engineering
Azure-Certification-DP-200: Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution
Stars: ✭ 54 (-50.91%)
Mutual labels: data-engineering
ob bulkstash: Bulk Stash is a docker rclone service to sync, or copy, files between different storage services. For example, you can copy files either to or from a remote storage service like Amazon S3 to Google Cloud Storage, or locally from your laptop to a remote storage.
Stars: ✭ 113 (+2.73%)
Mutual labels: data-pipeline
get smarties: Dummy variable generation with fit/transform capabilities
Stars: ✭ 23 (-79.09%)
Mutual labels: data-engineering
saisoku: Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Stars: ✭ 40 (-63.64%)
Mutual labels: data-pipeline
awesome-bigquery-views: Useful SQL queries for Blockchain ETL datasets in BigQuery.
Stars: ✭ 325 (+195.45%)
Mutual labels: data-engineering
lrmr: Less-Resilient MapReduce framework for Go
Stars: ✭ 32 (-70.91%)
Mutual labels: data-engineering
datart: Datart is a next generation Data Visualization Open Platform
Stars: ✭ 1,042 (+847.27%)
Mutual labels: data-engineering
etl: [READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+153.64%)
Mutual labels: data-engineering
morph-kgc: Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (-30%)
Mutual labels: data-engineering
contessa: Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-84.55%)
Mutual labels: data-engineering
Everything-Tech: A collection of online resources to help you on your Tech journey.
Stars: ✭ 396 (+260%)
Mutual labels: data-engineering
machine-learning-data-pipeline: Pipeline module for parallel real-time data processing for machine learning models development and production purposes.
Stars: ✭ 22 (-80%)
Mutual labels: data-pipeline