data science template
this repo provides a default template for ds/ml/dl and similar projects
how to use
before creating a new project from this template, you need to install the following dependencies
cookiecutter
brew install cookiecutter
or
pip install cookiecutter
gh
macos
install
brew install gh
upgrade
brew upgrade gh
linux
see the gh installation instructions for linux
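to confirm that both dependencies are available before generating a project, a small check like the one below may help. it is only a sketch: the names `REQUIRED_TOOLS` and `missing_tools` are hypothetical and not part of this template, and the check only looks for the executables on `PATH`.

```python
import shutil

# tools this readme asks for before generating a project
# (hypothetical constant, not part of the template)
REQUIRED_TOOLS = ("cookiecutter", "gh")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("missing tools:", ", ".join(missing))
else:
    print("all required tools are installed")
```

if a tool is reported as missing, install it with the commands listed above and run the check again.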
then go to the directory where you want to create your project and run
cookiecutter gh:vtrokhymenko/dst
follow the prompts
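if you prefer to script the generation step, for example in a setup script, the command can be assembled in python. this is a minimal sketch, not part of the template: `build_command` is a hypothetical helper, and it assumes the standard cookiecutter cli flags `--output-dir` (where to create the project) and `--no-input` (accept the template defaults instead of answering prompts).

```python
import shlex

TEMPLATE = "gh:vtrokhymenko/dst"  # template reference from this readme


def build_command(output_dir=None, no_input=False):
    """Build the cookiecutter cli call as an argument list (hypothetical helper)."""
    command = ["cookiecutter", TEMPLATE]
    if output_dir is not None:
        # create the project somewhere other than the current directory
        command += ["--output-dir", str(output_dir)]
    if no_input:
        # skip the interactive prompts and use the template defaults
        command.append("--no-input")
    return command


# show the command that would be executed
print(shlex.join(build_command(output_dir="projects", no_input=True)))
```

to actually run it, pass the list to `subprocess.run(build_command(), check=True)`; this requires cookiecutter to be installed and network access to github.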
the generated project has the following structure
├── .github <- some actions
│ ├── workflows
│ │ └── ci.yml
│ └── dependabot.yml
│
├── LICENSE <- created only if you choose a license
├── README.md <- the main readme
│
├── config <- configuration, usually yaml files with parameters
│
├── data
│ ├── external <- data from third party sources
│ ├── interim <- intermediate data that has been transformed
│ ├── processed <- the final, canonical data sets for modeling
│ ├── raw <- the original, immutable data dump
│ ├── features <- another
│ └── README.md
│
├── docs <- a default sphinx project (see sphinx-doc.org for details)
│
├── experiments <- for any experiments
│ └── README.md
│
├── models <- trained & serialized models, model predictions, or model summaries
│ └── README.md
│
├── notebooks <- notebooks for research
│ naming convention is a number (for ordering), the creator's initials, and a short `-`
│ delimited description, eg `1.0-jqp-initial-data-exploration`
│
├── references <- data dictionaries, manuals, and all other explanatory materials
│ └── README.md
│
├── tests <- tests for the project
│
├── {{ cookiecutter.repo_name }} <- source code
│ ├── __init__.py <- makes the source directory a python module; it can be generated with `mkinit`
│ │
│ ├── data <- scripts to download or generate data
│ │
│ ├── models <- scripts to train models and then use trained models to make predictions
│ │
│ └── visualization <- scripts to create exploratory and results oriented visualizations
│
├── .gitignore <- default for python
│
└── .pre-commit-config.yaml <- custom pre-commit config with `reorder_python_imports`, `black`, `flake8`, `pre-commit-pyright`, `pre-commit-hooks`
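the notebook naming convention from the structure above (a number for ordering, the creator's initials, and a `-` delimited description) can be checked with a regular expression. the sketch below is one possible reading of that convention, not something shipped with the template: `NOTEBOOK_NAME` and `parse_notebook_name` are hypothetical names, and it assumes lowercase initials, lowercase letters and digits in the description, and an optional `.ipynb` extension.

```python
import re

# number (for ordering), creator's initials, then a `-` delimited description
# assumption: lowercase only, e.g. `1.0-jqp-initial-data-exploration`
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)"
    r"-(?P<initials>[a-z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_notebook_name(name):
    """Return the parts of a notebook name, or None if it breaks the convention."""
    # drop the optional notebook extension before matching
    stem = name[: -len(".ipynb")] if name.endswith(".ipynb") else name
    match = NOTEBOOK_NAME.match(stem)
    return match.groupdict() if match else None


print(parse_notebook_name("1.0-jqp-initial-data-exploration.ipynb"))
print(parse_notebook_name("exploration.ipynb"))
```

a check like this could run over the `notebooks` directory in a pre-commit hook or a test, so that badly named notebooks are caught early.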
other similar templates
suggested tools
- gh – github on the terminal
- dvc – open-source version control system for ds projects
- cml – continuous machine learning | ci/cd for ml/dl
- renovate – automated dependency updates
- hydra – a framework for configuring complex applications
- pipreqs – autogenerate pip requirements
- pre-commit – framework for managing & maintaining multi-language pre-commit hooks
- code style/review/formatters/type checkers
  - codefactor
  - snyk
  - deepsource
  - prettier
  - pycodestyle
  - pyre-check
  - pyright
  - restyled (autopep8, black, isort, prettier-markdown, reorder-python-imports, yapf)
  - super-linter (pylint, flake8, awesome-flake8-extensions, black)
  - yapf
  - vulture
- tests
- profiler/debugger
- spellcheckers
citation
@misc{dst,
author = {trokhymenko viktor},
title = {data science template},
year = {2020},
publisher = {github},
howpublished = {\url{https://github.com/vtrokhymenko/dst}}
}