sspaeti-com / practical-data-engineering

Licence: other
Real estate dagster pipeline

Programming Languages

Jupyter Notebook
python

Projects that are alternatives of or similar to practical-data-engineering

jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-77.27%)
Mutual labels:  data-engineering, data-pipeline
Data Engineering Howto
A list of useful resources to learn Data Engineering from scratch
Stars: ✭ 2,056 (+1769.09%)
Mutual labels:  data-engineering, data-pipeline
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-81.82%)
Mutual labels:  data-engineering, data-pipeline
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-47.27%)
Mutual labels:  data-engineering
big-data-engineering-indonesia
A curated list of big data engineering tools, resources and communities.
Stars: ✭ 26 (-76.36%)
Mutual labels:  data-engineering
deordie-meetups
DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.
Stars: ✭ 48 (-56.36%)
Mutual labels:  data-engineering
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-64.55%)
Mutual labels:  data-pipeline
qsv
CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+298.18%)
Mutual labels:  data-engineering
Azure-Certification-DP-200
Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution
Stars: ✭ 54 (-50.91%)
Mutual labels:  data-engineering
ob bulkstash
Bulk Stash is a docker rclone service to sync, or copy, files between different storage services. For example, you can copy files either to or from a remote storage services like Amazon S3 to Google Cloud Storage, or locally from your laptop to a remote storage.
Stars: ✭ 113 (+2.73%)
Mutual labels:  data-pipeline
get smarties
Dummy variable generation with fit/transform capabilities
Stars: ✭ 23 (-79.09%)
Mutual labels:  data-engineering
saisoku
Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Stars: ✭ 40 (-63.64%)
Mutual labels:  data-pipeline
awesome-bigquery-views
Useful SQL queries for Blockchain ETL datasets in BigQuery.
Stars: ✭ 325 (+195.45%)
Mutual labels:  data-engineering
lrmr
Less-Resilient MapReduce framework for Go
Stars: ✭ 32 (-70.91%)
Mutual labels:  data-engineering
datart
Datart is a next generation Data Visualization Open Platform
Stars: ✭ 1,042 (+847.27%)
Mutual labels:  data-engineering
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+153.64%)
Mutual labels:  data-engineering
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (-30%)
Mutual labels:  data-engineering
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-84.55%)
Mutual labels:  data-engineering
Everything-Tech
A collection of online resources to help you on your Tech journey.
Stars: ✭ 396 (+260%)
Mutual labels:  data-engineering
machine-learning-data-pipeline
Pipeline module for parallel real-time data processing for machine learning models development and production purposes.
Stars: ✭ 22 (-80%)
Mutual labels:  data-pipeline

Practical Data Engineering Project

This is a practical example of a data engineering project built on real-estate data. The accompanying blog post, Building a Data Engineering Project in 20 Minutes, is available on my website. Topics are:


The status of the project can be found here.

Starting Dagster

To get MinIO, Spark, Kubernetes, etc. ready, check the respective folder here. Before starting Dagster, make sure the following are in place:

  1. MinIO started
  2. Kubernetes ready
  3. Spark image, role, and namespaces ready
  4. cd src/pipelines/real-estate and start Dagit with the dagit command
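
The checklist above could be partly automated with a small pre-flight script. The following is only a minimal sketch, not part of the project: the MinIO address (localhost:9000), the tool names checked on PATH, and all function names are assumptions made for illustration. It does not verify that the Kubernetes cluster or the Spark image, role, and namespaces are actually ready; it only checks that a MinIO endpoint accepts connections and that the command-line tools are installed.

```python
import shutil
import socket
import subprocess
from pathlib import Path

# Assumed values for illustration only; adjust to your own setup.
MINIO_HOST = "localhost"
MINIO_PORT = 9000  # assumption: MinIO listening on its usual API port
PIPELINE_DIR = Path("src/pipelines/real-estate")  # directory named in step 4


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_tools(tools):
    """Return the command names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def preflight(host=MINIO_HOST, port=MINIO_PORT, tools=("kubectl", "dagit")):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not port_open(host, port):
        problems.append(f"MinIO not reachable at {host}:{port}")
    for tool in missing_tools(tools):
        problems.append(f"'{tool}' not found on PATH")
    return problems


def start_dagit():
    """Run the checks, then launch dagit from the pipeline directory."""
    problems = preflight()
    if problems:
        raise SystemExit("\n".join(problems))
    # Equivalent to: cd src/pipelines/real-estate && dagit
    subprocess.run(["dagit"], cwd=PIPELINE_DIR, check=True)
```

Calling start_dagit() from the repository root would run the checks and then start Dagit; calling preflight() alone only reports what is missing without launching anything.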