

coco example

Visipedia Annotation Toolkit

This repository contains a collection of tools for editing and creating COCO style datasets.

These web-based annotation tools are built on top of Leaflet.js and Leaflet.draw.

Capabilities:

  • Load and visualize a COCO style dataset
  • Edit Class Labels
  • Edit Bounding Boxes
  • Edit Keypoints
  • Export a COCO style dataset
  • Bounding Box Tasks for Amazon Mechanical Turk

Not Implemented:

  • Edit Segmentations
  • Keypoint tasks for Amazon Mechanical Turk
  • Class label tasks for Amazon Mechanical Turk
  • Segmentation tasks for Amazon Mechanical Turk

Requirements and Environments

This code base was developed using Python 2.7.10 on Ubuntu 16.04 and OS X 10.11. You need to have MongoDB installed and running.

The tools are primarily tested using the Chrome web browser.

Quick Start

Make sure that MongoDB is installed and running (e.g. for Ubuntu 16.04 see here).

Clone the repo:

$ git clone https://github.com/visipedia/annotation_tools.git
$ cd annotation_tools

Install the python dependencies:

$ pip install -r requirements.txt

Start the annotation tool web server:

$ python run.py --port 8008

Download the COCO Dataset annotation file:

$ cd ~/Downloads
$ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
$ unzip annotations_trainval2017.zip

Import the validation annotations into the annotation tool:

# From the annotation_tools repo
$ python -m annotation_tools.db_dataset_utils --action load \
--dataset ~/Downloads/annotations/person_keypoints_val2017.json \
--normalize

If you get an error here, then please make sure MongoDB is installed and running.
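
A quick way to confirm that MongoDB is reachable is to ping it from Python. This is only a sketch; it assumes pymongo is installed and that MongoDB is listening on the default localhost:27017:

# Connectivity check (sketch). Assumes pymongo is installed and MongoDB
# is listening on the default localhost:27017.
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

client = MongoClient('localhost', 27017, serverSelectionTimeoutMS=2000)
try:
    client.admin.command('ping')
    print('MongoDB is up.')
except ServerSelectionTimeoutError:
    print('Could not reach MongoDB -- is mongod running?')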

Go to http://localhost:8008/edit_image/100238 to edit the annotations for the validation image with id=100238.

Go to http://localhost:8008/edit_task/?start=0&end=100 to edit the first 100 images, where the images have been sorted by their ids.

Go to http://localhost:8008/edit_task/?category_id=1 to edit all images that have annotations whose category_id=1.

Export the modified dataset:

$ python -m annotation_tools.db_dataset_utils --action export \
--output ~/Downloads/annotations/updated_person_keypoints_val2017.json \
--denormalize

Clear the annotation tool database:

$ python -m annotation_tools.db_dataset_utils --action drop

Development Setup

To modify and develop this code base you will need to have Node.js and npm installed.

Clone the repo:

$ git clone https://github.com/visipedia/annotation_tools.git
$ cd annotation_tools

Install python packages:

$ pip install -r requirements.txt

Install node modules (both production and development):

$ npm install

Watch for JavaScript changes and recompile the app (this generates app.bundle.js in annotation_tools/static):

$ npm run watch

Start the web server:

$ python run.py \
--port 8008 \
--debug

Dataset Format

We use a slightly modified COCO dataset format:

{
  "images" : [image],
  "annotations" : [annotation],
  "categories" : [category],
  "licenses" : [license]
}

image{
  "id" : str,
  "width" : int,
  "height" : int,
  "file_name" : str,
  "license" : str,
  "rights_holder" : str,
  "url" : str,
  "date_captured" : datetime (str)
}

annotation{
  "id" : str,
  "image_id" : str,
  "category_id" : str,
  "segmentation" : RLE or [polygon],
  "area" : float,
  "bbox" : [x,y,width,height],
  "iscrowd" : 0 or 1,
  "keypoints" : [x, y, v, ...],
  "num_keypoints" : int
}

category{
  "id" : str,
  "name" : str,
  "supercategory" : str,
  "keypoints" : [str, ...],
  "keypoints_style" : [str, ...],
}

license{
  "id" : str,
  "name" : str,
  "url" : str,
}
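
For concreteness, here is a tiny made-up dataset in this format, written as a Python dict purely for illustration (every value below is a placeholder):

# Sketch: a minimal dataset in the modified COCO format. All values are made up.
dataset = {
    'images': [{
        'id': '397133', 'width': 640, 'height': 480,
        'file_name': '397133.jpg', 'license': '1',
        'rights_holder': 'Jane Doe',
        'url': 'http://example.com/397133.jpg',
        'date_captured': '2013-11-14 17:02:52'
    }],
    'annotations': [{
        'id': '1', 'image_id': '397133', 'category_id': '1',
        'bbox': [0.25, 0.30, 0.50, 0.40],  # normalized, see the note below
        'iscrowd': 0
    }],
    'categories': [{
        'id': '1', 'name': 'person', 'supercategory': 'person',
        'keypoints': ['nose'], 'keypoints_style': ['#46f0f0']
    }],
    'licenses': [{
        'id': '1', 'name': 'CC BY 2.0',
        'url': 'https://creativecommons.org/licenses/by/2.0/'
    }]
}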

The biggest change that we have made is storing the annotations in normalized coordinates (each x value is divided by the width of the image, and each y value is divided by the height of the image). This is more convenient for rendering the annotations on resized images. We also use strings to store the ids rather than integers.

coco_url & flickr_url have been remapped to url.

rights_holder is a string that can hold the photographer's name.

keypoints_style is an array of css color values for the different keypoints of the class (e.g. '#46f0f0').
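
To make the normalized coordinate convention concrete, here is a small illustrative sketch (not the tool's own code) for converting a bounding box between pixel and normalized coordinates:

def normalize_bbox(bbox, image_width, image_height):
    # [x, y, w, h] in pixels -> fractions of the image size
    x, y, w, h = bbox
    return [x / float(image_width), y / float(image_height),
            w / float(image_width), h / float(image_height)]

def denormalize_bbox(bbox, image_width, image_height):
    # fractions of the image size -> [x, y, w, h] in pixels
    x, y, w, h = bbox
    return [x * image_width, y * image_height,
            w * image_width, h * image_height]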

Dataset Loading and Exporting

We use the modified COCO dataset format as the "schema" for the MongoDB database. Loading a dataset will create 4 collections: category, image, annotation, and license.

We can load the original COCO dataset out of the box. However, we need to tell the code to normalize the annotations by passing the --normalize command line argument. Further, the code will check to see if coco_url is present and will create a url field with the same value.

Load a dataset:

python -m annotation_tools.db_dataset_utils --action load \
--dataset ~/Downloads/annotations/person_keypoints_val2017.json \
--normalize

After we have edited the dataset, we can export it. This will produce a json file that can be used as a dataset file to train a computer vision model. By default, the code will export normalized annotations; we can export denormalized coordinates by passing the --denormalize command line argument.

Export a dataset:

python -m annotation_tools.db_dataset_utils --action export \
--output ~/Downloads/annotations/updated_person_keypoints_val2017.json \
--denormalize

We provide a convenience function to clear the collections that have been created when loading a dataset:

python -m annotation_tools.db_dataset_utils --action drop

Hosting Images Locally

The images you want to edit might be on your local machine and not accessible via a url. In this case, you can use Python's SimpleHTTPServer to start a local web server that serves the images directly from your machine. If the images are located in /home/gvanhorn/images, then you can:

cd /home/gvanhorn
python -m SimpleHTTPServer 8007

This starts a webserver on port 8007 that can serve files from the /home/gvanhorn directory. You can now access images via the browser by going to localhost:8007/images/397133.jpg, where 397133.jpg is an image file in /home/gvanhorn/images. Now you can create a json dataset file that has localhost:8007/images/397133.jpg in the url field for the image with id 397133. As this technique makes all files in the directory /home/gvanhorn accessible, this should be used with caution.
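
(On Python 3, the equivalent server is python -m http.server 8007.) To point an existing dataset at the locally served files, you can rewrite each image's url field before loading the dataset into the tool. The sketch below is illustrative only: the input and output file names are placeholders, and it assumes the file_name values in your dataset match the files under /home/gvanhorn/images:

import json

# Sketch: point every image url at the local web server started above.
# Assumes file_name matches the files served from /home/gvanhorn/images.
with open('dataset.json') as f:
    dataset = json.load(f)

for image in dataset['images']:
    image['url'] = 'http://localhost:8007/images/' + image['file_name']

with open('dataset_local.json', 'w') as f:
    json.dump(dataset, f)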

Editing an Image

The edit tool is meant to be used by a "super user." It is a convenient tool to visualize and edit all annotations on an image. All changes overwrite the annotations in the database. To edit a specific image, use the image id (which you specified in the dataset file that you loaded in the previous section) and go to the url localhost:8008/edit_image/397133, where the image id is 397133 in this case. Make any modifications you need and save the annotations. Note that saving directly overwrites the previous version of the annotations.

We currently support editing the class labels, bounding boxes, and keypoints. Editing segmentations is not currently supported.

Editing an Image Sequence

You can use a url constructed like localhost:8008/edit_task/?start=0&end=100 to edit the first 100 images in the dataset, where the images are sorted by their ids. You can additionally specify a category id to edit only images that have labels with that category: localhost:8008/edit_task/?start=0&end=100&category_id=1.

Collecting Bounding Boxes

We support creating bounding box tasks, where each task is composed of a group of images that need to be annotated with bounding boxes for a single category. Each task has a specific id and is accessible via localhost:8008/bbox_task/0a95f07a, where 0a95f07a is the task id. Similar to datasets, you'll need to create a json file that specifies the bounding box tasks and then load that file into the tool.

Data format:

{
  'instructions' : [bbox_task_instructions],
  'tasks' : [bbox_task]
}

bbox_task_instructions{
  id : str
  title : str
  description : str
  instructions: url
  examples: [url]
}

bbox_task{
  id : str
  image_ids : [str]
  instructions_id : str,
  category_id : str
}

bbox_task_instructions contains the fields that hold the instruction information shown to the worker. The examples list should contain urls to example images; these images should have a height of 500px and will be rendered on the task start screen. instructions should point to an external page with detailed information for your task. For example, you can use Google Slides to describe the task in detail and provide more examples.

bbox_task contains a list of image ids (image_ids) that should be annotated with bounding boxes. The instructions_id field should be a valid bbox_task_instructions id. The category_id should be a valid category id that was created when loading a dataset. The workers will be asked to draw boxes around that category for each image in the task.
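
A minimal tasks file, built here from Python purely for illustration (all ids, urls, and titles below are made-up placeholders, except that instructions_id must match an instructions entry and category_id must match a loaded category):

import json
import os

# Sketch: build a bounding box tasks file. Every value here is a placeholder.
tasks_file = {
    'instructions': [{
        'id': 'inst-1',
        'title': 'Draw boxes around birds',
        'description': 'Draw one tight box around every bird in the image.',
        'instructions': 'https://example.com/detailed-instructions',
        'examples': ['https://example.com/bbox_example_1.jpg']
    }],
    'tasks': [{
        'id': '0a95f07a',
        'image_ids': ['397133', '100238'],
        'instructions_id': 'inst-1',
        'category_id': '1'
    }]
}

with open(os.path.expanduser('~/Desktop/bbox_tasks.json'), 'w') as f:
    json.dump(tasks_file, f)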

Once you have created a json file you can load it:

python -m annotation_tools.db_bbox_utils --action load \
--tasks ~/Desktop/bbox_tasks.json

The task can be accessed by going to the url localhost:8008/bbox_task/0a95f07a, where 0a95f07a is a bbox_task id that you specified in the json file that was loaded.

When a worker finishes a task, the following result structure will be saved in the database:

bbox_task_result{
  time : float
  task_id : str
  date : str
  worker_id : str
  results : [bbox_result]
}

bbox_result{
  time : float
  annotations : [annotation]
  image : image
}

Where annotation is defined above.

These results can be exported to a json file with:

python -m annotation_tools.db_bbox_utils --action export \
--output ~/Desktop/bbox_task_results.json \
--denormalize

If you only want to export a specific set of results, you can pass in the bounding box task file that contains the tasks you want results for:

python -m annotation_tools.db_bbox_utils --action export \
--tasks ~/Desktop/bbox_tasks.json \
--output ~/Desktop/bbox_task_results.json \
--denormalize
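
Once exported, the results can be inspected with ordinary json tooling. A rough sketch (it assumes the export file is a json list of bbox_task_result objects as described above) that tallies the redundant boxes collected per image:

import json
import os
from collections import defaultdict

# Sketch: count the redundant boxes collected for each image.
# Assumes the export file is a json list of bbox_task_result objects.
with open(os.path.expanduser('~/Desktop/bbox_task_results.json')) as f:
    task_results = json.load(f)

boxes_per_image = defaultdict(list)
for task_result in task_results:
    for result in task_result['results']:
        boxes_per_image[result['image']['id']].extend(result['annotations'])

for image_id, boxes in sorted(boxes_per_image.items()):
    print('{}: {} boxes'.format(image_id, len(boxes)))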

To merge these redundant box annotations into a final dataset, you can use the Crowdsourcing repo. See here for an example.

We provide a convenience function to clear all collections associated with the bounding boxes tasks:

python -m annotation_tools.db_bbox_utils --action drop