
marl / Audiosetdl

Scripts for downloading AudioSet

audiosetdl

Modules and scripts for downloading Google's AudioSet dataset, a dataset of ~2.1 million annotated segments from YouTube videos.

Setup

  • Clone the repository onto your machine.

  • If you would like to get started right away with a standalone (mini)conda environment, run setup.sh in the project directory. This installs a local Miniconda installation in <PROJECT DIR>/bin/miniconda. You can find a Python executable at <PROJECT DIR>/bin/miniconda/bin/python.

    • Example: ./setup.sh
  • If you would rather use your existing environment, it must satisfy the following requirements:

    • Python 3 and dependencies
      • On Mac, can be installed with brew install python3
      • On Ubuntu/Debian, can be installed with apt-get install python3
      • Dependencies can be installed with pip install -r <PROJECT DIR>/requirements.txt
    • ffmpeg
      • On Mac, can be installed with brew install ffmpeg
      • On Ubuntu/Debian, can be installed with apt-get install ffmpeg
    • sox
      • On Mac, can be installed with brew install sox
      • On Ubuntu/Debian, can be installed with apt-get install sox
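If you use an existing environment, a quick way to confirm that the command-line tools listed above are visible is sketched below. This is a minimal Python sketch, not part of this repository: the function name missing_requirements is hypothetical, and it only checks the Python version and that ffmpeg and sox are on PATH. It does not verify the pip dependencies from requirements.txt.

```python
import shutil
import sys
from typing import Iterable, List, Tuple


def missing_requirements(tools: Iterable[str] = ("ffmpeg", "sox"),
                         min_python: Tuple[int, ...] = (3,)) -> List[str]:
    """Hypothetical helper: return unmet requirements (empty list = all found)."""
    missing = []
    # sys.version_info compares element-wise against a plain tuple
    if sys.version_info < min_python:
        missing.append("python>=" + ".".join(map(str, min_python)))
    for tool in tools:
        # shutil.which returns None when no executable with that name is on PATH
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing
```

Calling missing_requirements() from a Python 3 interpreter returns an empty list when both ffmpeg and sox are installed; otherwise it names the missing tools.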

Running

As a single script

  • Run python download_audioset.py
    • If you are using the local standalone conda installation, either activate the conda virtual environment, or use the python executable found in the local conda installation.
    • The script will automatically download the AudioSet subset files into your data directory if they do not already exist, and then start downloading the audio and video for all of the segments in parallel.
    • You can configure how the downloading and processing are done, for example:
      • URL/path to dataset subset files
      • Audio/video format and codec
      • Different strategies for obtaining video
      • Number of multiprocessing pool workers used
      • Path to logging
    • Run python download_audioset.py -h for a full list of arguments

SLURM

The download can also be run as a batch of SLURM jobs:

  • Run download_subset_files.sh

    • Sets up the data directory structure in the given folder (which will be created) and downloads the AudioSet subset files to that directory. If the --split <N> option is used, the script splits each file into N parts, each with a job ID suffix, e.g. eval_segments.csv.01.
    • Example: ./download_subset_files.sh --split 10 /home/user/audiosetdl/data
  • Use sbatch to run the audiosetdl-job-array.s job array script

    • SLURM job array script to be submitted with sbatch. Be sure to edit it to set the location of the repository ($SRCDIR) and the data directory ($DATADIR). Update any other settings, such as email notifications and memory usage, to fit your use case.
    • Example: sbatch --array=1-10 audiosetdl-job-array.s
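The relationship between the split subset files and the job array can be sketched in Python as follows. This is a hypothetical illustration, not the code in download_subset_files.sh or audiosetdl-job-array.s. It assumes part suffixes start at 01 so that they line up with sbatch --array=1-N, and it reads the task index from SLURM's SLURM_ARRAY_TASK_ID environment variable. Dropping '#' header lines before splitting is also an assumption of this sketch, and all function names are made up.

```python
import os
from typing import List


def split_lines(lines: List[str], n: int) -> List[List[str]]:
    """Split lines into n contiguous parts whose sizes differ by at most one."""
    size, extra = divmod(len(lines), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(lines[start:end])
        start = end
    return parts


def part_filename(base: str, index: int) -> str:
    """1-based, zero-padded suffix, e.g. eval_segments.csv.01."""
    return f"{base}.{index:02d}"


def write_parts(path: str, n: int) -> List[str]:
    """Write path.01 ... path.NN next to `path` and return their names."""
    with open(path) as f:
        # Assumption of this sketch: '#' header lines are dropped, not copied
        lines = [ln for ln in f if not ln.startswith("#")]
    names = []
    for i, chunk in enumerate(split_lines(lines, n), start=1):
        name = part_filename(path, i)
        with open(name, "w") as out:
            out.writelines(chunk)
        names.append(name)
    return names


def part_for_this_task(base: str) -> str:
    """Select the part matching this job's index under sbatch --array=1-N."""
    task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    return part_filename(base, task_id)
```

For example, after splitting into 10 parts, the array task with SLURM_ARRAY_TASK_ID=7 would pick eval_segments.csv.07.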

Examples

Examples can be found in the notebooks directory of this repository.

Cases where videos cannot be downloaded

  • Video removed
  • User account deleted
  • Not available in country
  • Need to sign in to view
  • Video no longer exists
  • Copyright takedown

Kinetics Dataset

This script can also be used to download the Kinetics dataset. Running kinetics/filter_subset.sh <filter_list> <kinetics_subset_csv> <output_file> filters the given Kinetics subset .csv file so that it contains only the classes in the given filter list, and writes it in a format that is compatible with this script. kinetics/filter_classes.txt is provided as an example; it approximates the set of classes used in Look, Listen and Learn (Arandjelović, R., Zisserman, A. 2017). Once you have run it on all of the subset .csv files, you can provide the filtered files to the scripts instead of the AudioSet .csv files. Note that because the test labels are withheld, the filtered .csv for the test set will be empty.
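The filtering step can be sketched in Python as below. This is an assumption-laden illustration rather than a port of kinetics/filter_subset.sh. It assumes Kinetics CSVs with a header row containing label, youtube_id, time_start, and time_end columns, and it writes the class name as a quoted label in an AudioSet-like `YTID, start, end, "label"` line; the real output format may differ. The function names load_filter and filter_kinetics are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Set


def load_filter(lines: Iterable[str]) -> Set[str]:
    """Read one class name per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


def filter_kinetics(csv_text: str, classes: Set[str]) -> List[str]:
    """Keep rows whose label is in `classes`, emitted as AudioSet-style lines."""
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row.get("label")
        # Assumed: test-split CSVs have no label column, so label is None here
        if label not in classes:
            continue
        start, end = float(row["time_start"]), float(row["time_end"])
        out.append(f'{row["youtube_id"]}, {start:.3f}, {end:.3f}, "{label}"')
    return out
```

Because a CSV without a label column makes every row fail the filter, this sketch also produces an empty result for the test split.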
