• 1 Resources
  • 2 Ideal Labeler Qualities
  • 3 Potential Labelers
    • 3.1 Microsoft Visual Object Tagging Tool
    • 3.2 Labelbox
    • 3.3 UltimateLabeling
    • 3.4 cvat
    • 3.5 ViPER
    • 3.6 Anvil
    • 3.7 sloth
    • 3.8 labelme
    • 3.9 SimpleVideoAnnotation
    • 3.10 video_annotation
    • 3.11 label-v
    • 3.12 scalabel
    • 3.13 vatic.js
  • 4 Best leads
  • 5 Conclusions and Thoughts
    • 5.1 Labeler Considerations

2 Ideal Labeler Qualities

The labeler needs to have the following attributes:

  • Ability to handle both images and video
  • Import/export annotations
  • Some sort of layered annotations for comparisons

The labeler will preferably have the following:

  • Import/export multiple types of annotations (e.g., .csv, .json)
  • Add metadata
  • Toggle number of videos present

3 Potential Labelers

3.1 Microsoft Visual Object Tagging Tool

Visual Object Tagging Tool: An electron app for building end to end Object Detection Models from Images and Videos.

  • Can run locally or in a web app
    • Web app cannot access local file system
    • Web app is available as a Docker image
  • No way to import external data

3.2 Labelbox

Labelbox is the fastest way to annotate data to build and ship computer vision applications.

3.3 UltimateLabeling

A multi-purpose Video Labeling GUI in Python with integrated SOTA detector and tracker.

  • Ability to import pre-existing files:

Import labels: To import existing .CSV labels, hit Cmd+I (or Ctrl+I). UltimateLabeling expects to read one .CSV file per frame, in the format: “class_id”, “xc”, “yc”, “w”, “h”.

  • csv file import/export
  • Can connect to remote server via ssh
  • YOLO integration
  • Seems to be under active development (i.e., missing some features)
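
To make the import format quoted above concrete, here is a minimal sketch of writing one .CSV file per frame with the columns class_id, xc, yc, w, h. The file naming scheme, the absence of a header row, and the use of pixel (rather than normalized) coordinates are all assumptions on my part and should be checked against UltimateLabeling itself; `corners_to_center` and `write_frame_labels` are hypothetical helper names.

```python
import csv
from pathlib import Path


def corners_to_center(x_min, y_min, x_max, y_max):
    """Convert a corner-format box to center format (xc, yc, w, h)."""
    w = x_max - x_min
    h = y_max - y_min
    return x_min + w / 2, y_min + h / 2, w, h


def write_frame_labels(out_dir, frame_index, boxes):
    """Write the labels of a single frame to its own CSV file.

    boxes is a list of (class_id, x_min, y_min, x_max, y_max) tuples.
    Each output row is: class_id, xc, yc, w, h (no header row; assumption).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded file name per frame; this naming scheme is an assumption.
    path = out_dir / f"{frame_index:06d}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        for class_id, x_min, y_min, x_max, y_max in boxes:
            writer.writerow([class_id, *corners_to_center(x_min, y_min, x_max, y_max)])
    return path
```

Labels produced by another tool in corner format (x_min, y_min, x_max, y_max) could be converted this way before trying the import.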

3.4 cvat

Powerful and efficient Computer Vision Annotation Tool (CVAT)

  • XML file output
  • Can only be run in Google Chrome
  • Runs via Docker Compose (i.e., multiple containers), which is incompatible with Singularity (as far as I know)

3.5 ViPER

  • Written in Java (cross platform)
  • Loads annotations via XML
  • Relatively old (2015)

3.6 Anvil

  • Written in Java
  • Relatively old (2015)

3.7 sloth

Sloth is a tool for labeling image and video data for computer vision research.

Installation guide

3.8 labelme

Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).

3.9 SimpleVideoAnnotation

A simple video annotation tool made with Python + OpenCV for detection in YOLOv2 format

3.10 video_annotation

Annotation tool for videos

  • Written in Python
  • Run as a server

3.11 label-v

Semi-automatic video annotation tool

  • Written in Python
  • Runs in browser
  • Outputs json

3.12 scalabel

Quantify computer vision performance in human terms

  • Runs in browser
  • Can upload image lists and annotations as json

3.13 vatic.js

vatic.js: A pure JavaScript video annotation tool

  • Runs in browser
  • Outputs XML

4 Best leads

Based on an initial skim of the available annotation software, the following appear to be the most viable for the current project. None is a perfect fit for the project, but each may be useful nonetheless.

  1. UltimateLabeling
    • Runs in a dedicated window (as opposed to through a web browser); not ideal for running on ACI or in a container. Doable, but I can’t get it to work on my local machine yet
    • A little buggy, but in active development
    • Can’t get labels other than bounding boxes to work
  2. scalabel
    • Packaged Dockerfile runs on Ubuntu 18.04 (not compatible with ACI)
    • Complicated label file structure
  3. labelme
    • Doesn’t run in browser
    • Uncomplicated JSON label file
    • Support for multiple label types
  4. label-v
    • Bounding box labels only
  5. cvat
    • Stretch goal. Very pretty and feature-rich, but I don’t think I can get it working on ACI since it relies on docker-compose
    • Complicated project setup

5 Conclusions and Thoughts

There are a lot of labeling tools, but all seem to have been created either as a project starting point, or to address a specific issue that the creator was facing (which probably explains why there are so many!).

None of the labelers reviewed here is perfect or has all of the ideal labeler qualities. The best lead is labelme. Not only does it support multiple label types, but it can also import modified JSON files. It doesn’t run in a browser, but it can be containerized and run on any system with X11 support.
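
Since importing modified JSON files is the main draw of labelme, here is a minimal sketch of building such a file from externally produced bounding boxes. The key names (`shapes`, `points`, `shape_type`, `imagePath`, and so on) reflect my understanding of the JSON that labelme writes and are an assumption; the safest check is to label one image by hand and compare the saved file. `make_labelme_record` is a hypothetical helper, and the version string is a placeholder.

```python
import json


def make_labelme_record(image_path, width, height, rectangles):
    """Build a labelme-style annotation dictionary.

    rectangles is a list of (label, x_min, y_min, x_max, y_max) in pixels.
    The schema used here is an assumption; verify against a file saved by labelme.
    """
    shapes = [
        {
            "label": label,
            # A rectangle is stored as two opposite corner points.
            "points": [[x_min, y_min], [x_max, y_max]],
            "group_id": None,
            "shape_type": "rectangle",
            "flags": {},
        }
        for label, x_min, y_min, x_max, y_max in rectangles
    ]
    return {
        "version": "0.0.0",  # placeholder; use the installed labelme version
        "flags": {},
        "shapes": shapes,
        "imagePath": image_path,
        "imageData": None,  # image is loaded from imagePath instead of embedded
        "imageHeight": height,
        "imageWidth": width,
    }


def save_labelme_record(record, json_path):
    """Write the annotation dictionary next to the image as a JSON file."""
    with open(json_path, "w") as f:
        json.dump(record, f, indent=2)
```

A model's predictions could be written out this way and then opened in labelme for correction, which would cover the "layered annotations for comparisons" requirement only loosely.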

5.1 Labeler Considerations

  • No labeler that I have come across does online (on-the-fly) video transformation; all expect the video to be pre-processed into images (frames)

  • Because frames are pre-processed, we need to consider the framerate of the original video, the sample rate of the classification model, and the frame extraction rate for the labelers (a few seem to default to extracting at 5 fps)
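
The relationship between these rates can be sketched as follows: given the framerate of the original video and a target extraction rate, work out which source frames to keep and what timestamp each one corresponds to. This is only an illustration of the arithmetic, not how any of the labelers above does it; `frames_to_extract` and `frame_timestamp` are hypothetical helper names.

```python
def frames_to_extract(source_fps, target_fps, total_frames):
    """Return the indices of source frames to keep when sampling at target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps > source_fps:
        raise ValueError("cannot extract more frames per second than the source has")
    step = source_fps / target_fps  # source frames per extracted frame
    indices = []
    k = 0
    while True:
        idx = int(k * step)  # nearest earlier source frame
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def frame_timestamp(frame_index, source_fps):
    """Time in seconds of a source frame, used to map labels back to the video."""
    return frame_index / source_fps


# One second of 30 fps video extracted at 5 fps keeps every sixth frame.
print(frames_to_extract(30, 5, 30))  # [0, 6, 12, 18, 24]
```

If the classification model samples at a different rate than the labeler extracts, the same index calculation can be run for both rates to find which labeled frames line up with model inputs.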