AMI Tools
Python 3 scripts and classes to help manage bags of NYPL AMI files
Installation and Updates
Production use
Run the following from your terminal
pip3 install --user 'ami-tools @ git+https://github.com/NYPL/ami-tools'
If you are using a virtual environment, omit the --user flag.
Development use
If you want a version that you can edit and run separately from the production install, clone this repo and then install it to a virtual environment.
cd /path/to/repo
pyenv virtualenv amitools-dev
pyenv local amitools-dev
pip install -e .
Whenever you run any portion of the ami-tools package while in /path/to/repo, it will use this working version of the package.
Tools
Installing the package makes the following tools available from the command line. All scripts include a help dialog.
script_name.py -h
Data collection
survey_drive.py
Generate the following from a mounted drive (or any folder): report of all files, report of all bags, directory with a copy of all presumed metadata (JSON and Excel)
Usage: Survey a drive mounted on a Mac
survey_drive.py -d /Volumes/drive-name -o path/to/dir/for/reports
Validation Tools
validate_ami_bags.py
Check bag Oxums, bag completeness, bag hashes, directory structure, filenames, and metadata (only implemented for Excel)
Usage: Check a directory of bags (the default check does not look at metadata or checksums)
validate_ami_bags.py -d path/to/dir/of/bags
Usage: Check a single bag, including metadata and checksums
validate_ami_bags.py -b path/to/bag --metadata --slow
validate_ami_excel.py
Check whether an Excel file adheres to the expectations of media ingest
Usage: Check a single Excel file
validate_ami_excel.py -e path/to/excel/file
validate_bags.py
Check bag Oxums, bag completeness, and bag hashes (if requested). The default is similar to bagit.py --validate --fast, except that it includes a completeness check. Less strict than validate_ami_bags.py.
Usage: Check a single bag
validate_bags.py -b path/to/bag --slow
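For readers unfamiliar with the Oxum check mentioned above, here is a minimal sketch of the idea, using only the standard library. A BagIt Payload-Oxum is the total byte count and file count of the data/ directory, written as "octets.streams". The function names payload_oxum and recorded_oxum are hypothetical and are not part of ami-tools; the package's own scripts may compute this differently.

```python
import os


def payload_oxum(bag_dir):
    """Compute "octets.streams" for every file under bag_dir/data."""
    total_bytes = 0
    total_files = 0
    data_dir = os.path.join(bag_dir, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
            total_files += 1
    return f"{total_bytes}.{total_files}"


def recorded_oxum(bag_dir):
    """Read the Payload-Oxum value from bag-info.txt, or None if absent."""
    info_path = os.path.join(bag_dir, "bag-info.txt")
    with open(info_path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("Payload-Oxum:"):
                return line.split(":", 1)[1].strip()
    return None


def oxum_matches(bag_dir):
    """True when the recorded Oxum equals the Oxum computed from data/."""
    return recorded_oxum(bag_dir) == payload_oxum(bag_dir)
```

A mismatch here is cheap to detect because no file contents are read, which is why an Oxum check is a common fast first pass before full checksum validation.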
Bag Management Tools
fix_baginfo.py
Update Oxum in bag-info.txt to match actual Oxum
Usage: Check and repair a directory of bag Oxums
fix_baginfo.py -d path/to/dir/of/bags
repair_bags.py (in development)
Manage files that are in the bag payload but not in the manifest, either adding them to the manifest or deleting them.
Usage: Add all untracked files to manifest and Oxum
repair_bags.py -b path/to/bag --addfiles
Usage: Delete all untracked files from the data/ directory. By default, only the following system files will be deleted: Thumbs.db files, DS_Store files, AppleDouble files, and Icon files
repair_bags.py -b path/to/bag --deletefiles
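To make "untracked" concrete, the following is a rough sketch of how files present in data/ but absent from the manifest could be listed with the standard library. It assumes the usual BagIt manifest layout (a checksum, whitespace, then a path relative to the bag root). The function names manifest_paths and untracked_files are hypothetical; this is not how repair_bags.py is necessarily implemented.

```python
import os


def manifest_paths(bag_dir):
    """Collect payload paths listed in any manifest-<algorithm>.txt file."""
    listed = set()
    for name in os.listdir(bag_dir):
        if name.startswith("manifest-") and name.endswith(".txt"):
            with open(os.path.join(bag_dir, name), encoding="utf-8") as f:
                for line in f:
                    # Each line is "<checksum> <path>"; the path may contain spaces.
                    parts = line.strip().split(None, 1)
                    if len(parts) == 2:
                        listed.add(parts[1])
    return listed


def untracked_files(bag_dir):
    """Return sorted payload paths that exist on disk but are in no manifest."""
    listed = manifest_paths(bag_dir)
    found = set()
    data_dir = os.path.join(bag_dir, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), bag_dir)
            # Manifests always use forward slashes, whatever the host OS uses.
            found.add(rel.replace(os.sep, "/"))
    return sorted(found - listed)
```

Listing the untracked files before choosing between adding and deleting them is a low-risk way to see what a repair would touch.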
convert_excelbag_to_jsonbag.py (in development)
Convert a bag that meets the rules for AMI Excel bags to a bag that meets the rules for AMI JSON bags
Usage: Convert a single bag from Excel to JSON
convert_excelbag_to_jsonbag.py -b path/to/bag
Classes
The package also contains classes for implementing further tools
ami_bag.ami_bag
Extension of the bagit-python Bag class with methods for validation and classification of bags according to NYPL AMI rules
ami_md.ami_excel
Classes and methods for Excel workbooks and sheets storing metadata about preservation masters, edit masters, and no transfers
Usage: Validate the contents of the preservation master sheet against the ingest business rules
import ami_md.ami_excel
excel_file = ami_md.ami_excel("path/to/excel.xlsx")
excel_file.pres_sheet.validate_worksheet()
ami_md.ami_json
Methods for loading and manipulating AMI JSON data.
Usage: Convert a valid AMI JSON file to a flat key-value dict
import ami_md.ami_json
json_file = ami_md.ami_json(filepath = "path/to/file.json")
new_dict = json_file.convert_nestedDictToDotKey(json_file)
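To illustrate what a "flat key-value dict" with dot keys looks like, here is a small standalone sketch. The function flatten_to_dot_keys and the sample field names are hypothetical and chosen only for illustration; this is not the package's convert_nestedDictToDotKey implementation, which may treat lists or other edge cases differently.

```python
def flatten_to_dot_keys(nested, prefix=""):
    """Flatten nested dicts so each leaf is keyed by its dotted path."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the path so far as the prefix.
            flat.update(flatten_to_dot_keys(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Hypothetical sample record, not taken from a real AMI JSON file.
sample = {
    "asset": {"fileRole": "pm"},
    "technical": {"fileSize": {"measure": 10, "unit": "B"}},
}
flat = flatten_to_dot_keys(sample)
# flat == {"asset.fileRole": "pm",
#          "technical.fileSize.measure": 10,
#          "technical.fileSize.unit": "B"}
```

A flat form like this is convenient when the data needs to go into a single spreadsheet row or be compared key by key.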
ami_md.ami_md_constants
Constants used for validating, normalizing, and enhancing metadata, mostly through methods in ami_excel.
Shell scripts
The package also includes a handful of scripts for utility functions. To install these scripts, make each one executable with chmod +x and create an appropriate alias for it.
bin/collect_metadata.sh
Copy xlsx and json files from bags to another directory for manipulation and analysis
validate_bags.sh
Validate a directory of bags after network transfer (superseded by validate_ami_bags.py)