coiled / coiled-resources

Licence: other
Notebooks that support blog posts and tech talks on Dask / Coiled.

Programming language: Jupyter Notebook

Projects that are alternatives of or similar to coiled-resources

dvc dask use case
A use case of a reproducible machine learning pipeline using Dask, DVC, and MLflow.
Stars: ✭ 22 (-33.33%)
Mutual labels:  dask
framequery
SQL on dataframes - pandas and dask
Stars: ✭ 63 (+90.91%)
Mutual labels:  dask
Stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
Stars: ✭ 2,019 (+6018.18%)
Mutual labels:  dask
arboreto
A scalable python-based framework for gene regulatory network inference using tree-based ensemble regressors.
Stars: ✭ 33 (+0%)
Mutual labels:  dask
dask-rasterio
Read and write rasters in parallel using Rasterio and Dask
Stars: ✭ 82 (+148.48%)
Mutual labels:  dask
knit
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Stars: ✭ 53 (+60.61%)
Mutual labels:  dask
mloperator
Machine Learning Operator & Controller for Kubernetes
Stars: ✭ 85 (+157.58%)
Mutual labels:  dask
graphchain
⚡️ An efficient cache for the execution of dask graphs.
Stars: ✭ 63 (+90.91%)
Mutual labels:  dask
flox
Fast & furious GroupBy operations for dask.array
Stars: ✭ 42 (+27.27%)
Mutual labels:  dask
Swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Stars: ✭ 1,844 (+5487.88%)
Mutual labels:  dask
madpy-dask
MadPy Dask talk materials
Stars: ✭ 33 (+0%)
Mutual labels:  dask
gaia
Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.
Stars: ✭ 29 (-12.12%)
Mutual labels:  dask
dask-ec2
Start a cluster in EC2 for dask.distributed
Stars: ✭ 103 (+212.12%)
Mutual labels:  dask
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+1169.7%)
Mutual labels:  dask
Xarray
N-D labeled arrays and datasets in Python
Stars: ✭ 2,353 (+7030.3%)
Mutual labels:  dask
esmlab
Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️
Stars: ✭ 23 (-30.3%)
Mutual labels:  dask
HyperGBM
A full pipeline AutoML tool for tabular data
Stars: ✭ 172 (+421.21%)
Mutual labels:  dask
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+3993.94%)
Mutual labels:  dask
Mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (+6893.94%)
Mutual labels:  dask
Dask
Parallel computing with task scheduling
Stars: ✭ 9,309 (+28109.09%)
Mutual labels:  dask

coiled-resources

Notebooks that support content like blogs and videos.

Project goals

  • Make it easy for you to reproduce any computations covered in blogs or videos
  • Show best practices for organizing a repo with hundreds of notebooks & tens of environments

Here are some notable blog posts that are backed by notebooks in this repo:

Blog post                                     Notebook link
Speed up pandas query 10x                     Notebook
Convert Dask DataFrame to pandas DataFrame    Notebook
Convert Parquet to CSV                        Notebook

Repo organization

This repo contains notebooks that are used in blogs and other content. Each notebook is named after the blog post it supports, so you can easily find the notebook for a given post. For example, the blogs/save-numpy-dask-array-to-zarr.ipynb notebook corresponds to the coiled.io/blog/save-numpy-dask-array-to-zarr/ blog post. Notice how the notebook name matches the last segment of the blog post URL.
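The naming convention can be expressed as a small helper. This is only a sketch: the function name notebook_to_blog_url is hypothetical and not part of the repo, and it assumes every notebook under blogs/ follows the pattern shown in the example above.

```python
from pathlib import PurePosixPath

# Base URL taken from the example in this README.
BLOG_BASE = "coiled.io/blog"


def notebook_to_blog_url(notebook_path: str) -> str:
    """Map 'blogs/<slug>.ipynb' to 'coiled.io/blog/<slug>/'.

    Hypothetical helper: assumes the notebook file name (without
    the .ipynb suffix) is exactly the blog post slug.
    """
    path = PurePosixPath(notebook_path)
    if path.suffix != ".ipynb":
        raise ValueError(f"not a notebook: {notebook_path}")
    return f"{BLOG_BASE}/{path.stem}/"


print(notebook_to_blog_url("blogs/save-numpy-dask-array-to-zarr.ipynb"))
```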

The instructions for creating an environment to run each notebook are at the top of every notebook. The following setup instructions will work for most of the notebooks.

Setting up your machine

You can install the dependencies on your local machine to run these notebooks by creating a conda environment:

conda env create -f envs/crt-004.yml

crt stands for coiled-runtime, which pins a set of Dask runtime dependencies that are known to happily coexist.

Activate the environment with conda activate crt-004.

Open the project in your browser with jupyter lab.

Create Coiled software environments

To create a Coiled software environment that matches your local environment, run a command like this: coiled env create -n crt-004 --conda envs/crt-004.yml.

Your Coiled software environment should always match your local environment exactly.
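One way to keep the two environments in step is to derive both commands from the same YAML file. The sketch below only builds the command lines shown in this README and does not run them. The helper env_commands is hypothetical, and it assumes the environment name equals the file name without its extension (as with envs/crt-004.yml and crt-004).

```python
from pathlib import PurePosixPath


def env_commands(yml_path: str) -> dict:
    """Build the local and Coiled env-creation commands from one file.

    Assumption: the environment name matches the YAML file stem,
    e.g. 'envs/crt-004.yml' -> 'crt-004'.
    """
    name = PurePosixPath(yml_path).stem
    return {
        "local": ["conda", "env", "create", "-f", yml_path],
        "coiled": ["coiled", "env", "create", "-n", name, "--conda", yml_path],
    }


# Print both commands so they can be copied into a shell.
for target, cmd in env_commands("envs/crt-004.yml").items():
    print(f"{target}: {' '.join(cmd)}")
```

Because both commands point at the same file, a change to the YAML is picked up by both environments the next time they are rebuilt.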

Here's how to create a cluster that uses the coiled-runtime software environment: cluster = coiled.Cluster(name="powers-crt-004", software="crt-004", n_workers=5).
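In a notebook, the cluster is typically paired with a Dask client. The following is a minimal sketch, assuming the coiled and dask.distributed packages are installed and you are logged in to Coiled. The helper names cluster_options and start_cluster, and the "powers" name prefix, are illustrative only.

```python
def cluster_options(software: str, n_workers: int = 5, prefix: str = "powers") -> dict:
    """Collect the coiled.Cluster arguments used in this README.

    The cluster name is built as '<prefix>-<software>'; that naming
    scheme is an assumption based on the single example above.
    """
    return {
        "name": f"{prefix}-{software}",
        "software": software,
        "n_workers": n_workers,
    }


def start_cluster(software: str = "crt-004"):
    """Create the cluster and connect a Dask client to it.

    Imports are deferred so this file can be loaded without coiled
    installed; calling this function starts real cloud machines.
    """
    import coiled
    from dask.distributed import Client

    cluster = coiled.Cluster(**cluster_options(software))
    client = Client(cluster)
    return cluster, client


print(cluster_options("crt-004"))
```

Call start_cluster() only when you actually want a cluster, and close the client and cluster when you are finished so you do not keep using credits.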

Notebooks

Some of the notebooks are designed to run locally and others run on cloud machines via Coiled.

You can follow the Coiled getting started guide to set up your machine. Coiled gives you some free credits, so you can easily try out the platform.

Some notebooks in this repo require conda environments with additional customization. You can find environment.yml files to build those environments in the respective directories.

Contributing

We welcome community contributions, especially MCVE (minimal, complete, verifiable example) analyses that others will find useful.

Feel free to create an issue and we'll be happy to brainstorm contributions.
