A cookiecutter template for Python data science projects
Pre-requisite
You need to have the cli cookiecutter
available in your Python environment.
Please see its installation instructions
here.
Usage
To generate a directory structure for a new data science project, you can run the following commands in your Python environment.
cookiecutter https://github.com/aws-samples/python-data-science-template
Alternatively, you can also clone this repository to use a local template:
# Clone to a local repository in the current directory.
git clone https://github.com/aws-samples/python-data-science-template.git
# The above command creates python-data-science-template/ in the current dir.
# Use the local repo to generate project structure
cookiecutter python-data-science-template
Project Structure
By using this template, your data science project is auto-generated as follows:
.
|-- bin/
|-- notebooks # A directory to place all notebooks files.
| |-- *.ipynb
| `-- my_nb_path.py # Imported by *.ipynb to treat src/ as PYTHONPATH
|-- requirements/
|-- src
| |-- my_custom_module # Your custom module
| |-- my_nb_color.py # Imported by *.ipynb to colorize their outputs
| `-- source_dir # Additional codes such as SageMaker source dir
|-- tests/ # Unit tests
|-- MANIFEST.in # Required by setup.py (if module name specified)
|-- setup.py # To pip install your Python module (if module name specified)
# These sample configuration files are auto-generated too:
|-- .editorconfig # Sample editor config (for IDE / editor that supports this)
|-- .gitattributes # Sample .gitattributes
|-- .gitleaks.toml # Sample Gitleaks config (if pre_commit is advanced)
|-- .gitignore # Sample .gitignore
|-- .pre-commit-config.yaml # Sample precommit hooks
|-- LICENSE # Boilperplate (auto-generated)
|-- README.md # Template for you to customize
|-- pyproject.toml # Sample configurations for Python toolchains
`-- tox.ini # Sample configurations for Python toolchains
This structure has been used in a few other places as well, e.g.,
aws-samples/sagemaker-rl-energy-storage-system
and aws-samples/amazon-sagemaker-gluonts-entrypoint.
Feel free to look at those repositories and observe the project structure
documented in their README.md
.
Related Projects
Ready to start your new data science project on AWS? If so, you may want to check on these related samples.
-
Do you like to work on EC2 instances? Then why don't you check out these simple template to setup basic Vim, Tmux, Zsh for the Deep Learning AMI Amazon Linux 2 for data scientsts.
-
Do you like to work on SageMaker classic notebook instances? Then why don't you check out the one-liner customization command that quickly applies common tweaks on a fresh (i.e., newly created or rebooted) SageMaker classic notebook instance, to make the notebook instance a little bit more ergonomic for prolonged usage.
-
Are you loooking for a quickstart to accelerate the delivery of custom ML solutions to production, without having to make too many design choices? Then why don't you check out the ML Max repo which includes templates for four pillars: training pipeline, inference pipeline, development environment and data management/ETL.
-
Are you tired of repeatedly writing the same boilerplate codes for common, tactical data science tasks? Then why don't you check on the SageMaker meta-entrypoint utilities, and the smallmatter library.
Security
See CONTRIBUTING for more information.
License
This library is licensed under the MIT-0 License. See the LICENSE file.