ftupas / dbt-spotify-analytics

Licence: MIT license
Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase

Programming Languages: Python, Dockerfile, Makefile, Shell

Spotify User Analytics

Introduction

In this project, we analyze our Spotify listening history, top tracks and artists, and genres. These are the tools we will be using:

  • Python - Fetches data from the Spotify API endpoints and saves it to CSV files
  • Postgres - The database where the data is stored and queried
  • dbt (Data Build Tool) - Data modeling tool that transforms the staging data into fact tables, dimension tables, and views
  • Metabase - Dashboarding tool to analyze our data
  • Docker - Containerizes our applications, i.e. Postgres, dbt, and Metabase

Project Files

  • app
    • main.py - Our main ETL script that fetches data from the Spotify API endpoints and saves them to CSV
    • util.py - Utility helper file that contains a custom class SpotifyUtil
    • config_template.py - This is where we will store our credentials
  • dbt
    • models - Contains the SQL scripts and schema.yml files that are used when we run our transformations
    • dbt_entrypoint.sh - Script that serves as the entrypoint when running the dbt container
    • Dockerfile - Contains the commands to create the custom Docker image
    • dbt_project.yml - YAML file to configure dbt
    • packages.yml - YAML file for test dependencies
    • profiles.yml - YAML file to configure the connection from dbt to Postgres
  • metabase
    • metabase.db - Metadata database of Metabase for the dashboard
  • docker-compose.yml - YAML file that orchestrates the Docker containers
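
To illustrate the "save to CSV" half of main.py, here is a minimal sketch using only the standard library. It is not the project's actual code: the function name write_rows and the sample field names are assumptions made for this example.

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write a list of dicts as CSV, using the first row's keys as the header.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    if not rows:
        return 0
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# An in-memory buffer stands in for a file in the data folder.
buffer = io.StringIO()
write_rows([{"track": "Track A", "artist": "Artist A"}], buffer)
```

Passing a file object rather than a path keeps the helper easy to try out without touching the disk.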

Workflow

The diagram below illustrates the system design and the overall workflow.

[Image: system design diagram]

Let's break this down into major steps:

  • Setup
  • Get Spotify data
  • Build Docker containers
  • Transform, model, and load data to Postgres DB using dbt
  • Serve to Metabase dashboard

Setup

  • cd into the project's root directory

  • Open a terminal and create a Python virtual environment using:

    Windows
    > python -m venv venv
    
    Mac/Linux
    $ make build
    
    

    then activate it by executing

    Windows:
    > venv\Scripts\activate.bat
    

    (For Windows) Install dependencies using:

    > python -m pip install -r requirements.txt
    
  • While the dependencies are being installed, go to the Spotify Developer page and log in

  • Create an app and note down the Client ID and Client Secret. Make sure to add a redirect URI in Settings, i.e. http://localhost:8888/callback/

  • Fill in the details in config_template.py and rename it to config.py
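
For orientation, a filled-in config.py might look like the sketch below. The variable names (CLIENT_ID, CLIENT_SECRET, REDIRECT_URI) and the missing_settings helper are assumptions for illustration; keep whatever names config_template.py actually defines.

```python
# config.py (hypothetical layout; the real names come from config_template.py)
CLIENT_ID = "your-client-id"          # from the Spotify Developer dashboard
CLIENT_SECRET = "your-client-secret"  # keep this out of version control
REDIRECT_URI = "http://localhost:8888/callback/"  # must match the app's Settings


def missing_settings(settings):
    """Return the names of settings that are empty or still placeholders."""
    return [name for name, value in settings.items()
            if not value or value.startswith("your-")]
```

A quick check like missing_settings can catch a forgotten credential before the ETL script starts the login flow.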

Get Spotify data

  • Run the main Python script to fetch the data from Spotify using:

    Windows
    > python app\main.py
    
    Mac/Linux
    $ make run
    
  • While the script is running, it will open a web page that looks like the one below. Just click AGREE

    [Image: Spotify authorization page]

    p.s. follow me for nice tunes! 😁
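
The page with the AGREE button is Spotify's OAuth consent screen. As a rough sketch of what happens at this step (not necessarily how util.py does it), the authorization URL can be assembled from the client ID, the redirect URI, and the requested scopes. The function name build_authorize_url and the choice of scopes are assumptions picked to match the data this project analyzes.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://accounts.spotify.com/authorize"


def build_authorize_url(client_id, redirect_uri, scopes):
    """Build the URL the user is sent to in order to grant access."""
    params = {
        "client_id": client_id,
        "response_type": "code",       # authorization code flow
        "redirect_uri": redirect_uri,  # must match the URI registered in Settings
        "scope": " ".join(scopes),     # scopes are space-separated
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


url = build_authorize_url(
    "your-client-id",  # placeholder, not a real credential
    "http://localhost:8888/callback/",
    ["user-read-recently-played", "user-top-read"],
)
```

After the user agrees, Spotify redirects the browser to the redirect URI with a code parameter, which the application then exchanges for an access token.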

Build Docker containers

Now that we have the CSV files in the data folder, we can build our Docker containers using this command:

docker-compose up

This command builds our dbt, Postgres, and Metabase containers. It also runs our data loading, transformations, and modeling in the background.
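
The containers can take a while to come up. A small, optional sketch for checking whether the web-facing services are reachable is shown below; it only tests that a TCP port accepts connections. The ports are the ones this README uses for the dbt docs (8080) and Metabase (3000), while the function names are assumptions for this example.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from this README: dbt docs on 8080, Metabase on 3000.
SERVICES = {"dbt docs": 8080, "metabase": 3000}


def service_status(host="localhost"):
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

An open port does not guarantee that the service has finished initializing, so treat a True result as "probably ready".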

Transform, model, and load data to Postgres DB using dbt

While docker-compose up is running, dbt runs the following commands:

  • dbt init spotify_analytics: Creates the project folder
  • dbt debug: Checks the connection with the Postgres database
  • dbt deps: Installs the test dependencies
  • dbt seed: Loads the CSV files into staging tables in the Postgres database
  • dbt run: Runs the transformations and loads the data into the database
  • dbt docs generate: Generates the documentation of the dbt project
  • dbt docs serve: Serves the documentation on a webserver
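
The order of these commands matters: the connection check, dependencies, and seeds come before the models run, and the documentation is generated last. In the project this sequencing lives in dbt_entrypoint.sh; the Python sketch below only illustrates the idea, and the names DBT_STEPS and run_steps, as well as the injectable runner, are assumptions for this example.

```python
import subprocess

# The steps listed above, in order, as argument lists.
DBT_STEPS = [
    ["dbt", "init", "spotify_analytics"],
    ["dbt", "debug"],
    ["dbt", "deps"],
    ["dbt", "seed"],
    ["dbt", "run"],
    ["dbt", "docs", "generate"],
    ["dbt", "docs", "serve"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order and stop at the first failure.

    `runner` is injectable so the sequence can be dry-run without dbt installed.
    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in steps:
        result = runner(step)
        if result.returncode != 0:
            break  # later steps depend on earlier ones, so do not continue
        completed.append(step)
    return completed
```

Note that dbt docs serve keeps running as a web server, so with the default runner the final step does not return until the server is stopped.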

Navigate to http://localhost:8080 to view the documentation, which includes the lineage graph, a DAG (Directed Acyclic Graph).

[Image: dbt lineage graph (DAG)]

This shows us how the CSV files have been transformed into the fact tables, dimension tables, and views.
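
To make the idea of a lineage DAG concrete, the sketch below models a few dependencies as a graph and derives a valid build order with Python's standard graphlib module (Python 3.9+). The node names are hypothetical and are not taken from this project's models folder.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set (hypothetical names).
lineage = {
    "stg_tracks": {"tracks_csv"},
    "stg_artists": {"artists_csv"},
    "dim_artists": {"stg_artists"},
    "fct_listening_history": {"stg_tracks", "dim_artists"},
}

# static_order() yields every node after all of its dependencies.
build_order = list(TopologicalSorter(lineage).static_order())
```

Several orders can be valid for the same graph; the only guarantee is that every node appears after everything it depends on. Because the graph is acyclic, such an order always exists.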

Serve to Metabase dashboard

Now that the data is loaded and transformed in our database, we can view it at http://localhost:3000. You may need to log in; the credentials are:

email: [email protected]
password: password1

[Image: Metabase login page]

Then you can navigate through, play around, and analyze your data.

Questions

  • What are the most common tracks in my playlists?
  • What is the average length of my playlists?
  • What are my favourite (most listened, top 5) genres in my playlists?
  • What are my favourite (most listened, top 10) artists in my playlists?
  • Was I born in the right decade? (most common release years of tracks in my playlists)
  • Which two keys please me the most? (2 most common keys of tracks in my playlists)
  • How much of a hipster am I? (average popularity of tracks in my playlists)
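
Several of these questions reduce to simple aggregations. As an illustration only, the sketch below answers two of them in plain Python over a few made-up rows. The field names (release_year, popularity) and the sample values are assumptions, not data from this project, where such questions are explored in Metabase.

```python
from collections import Counter
from statistics import mean

# Made-up sample rows, for illustration only.
tracks = [
    {"name": "Track A", "release_year": 1994, "popularity": 62},
    {"name": "Track B", "release_year": 1998, "popularity": 40},
    {"name": "Track C", "release_year": 2015, "popularity": 81},
]


def most_common_decade(rows):
    """Return the decade (e.g. 1990) with the most tracks."""
    decades = Counter(row["release_year"] // 10 * 10 for row in rows)
    return decades.most_common(1)[0][0]


def average_popularity(rows):
    """Return the mean popularity score across all tracks."""
    return mean(row["popularity"] for row in rows)
```

With the sample rows above, most_common_decade(tracks) gives 1990 and average_popularity(tracks) gives 61.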