Object detection with YOLO and extreme clicking in a semi-automatic combination

Daniel Gacon
Published in GoPenAI
12 min read · Mar 28, 2023


We just unlocked the secrets of semi-automatic annotation with a touch of coolness! By combining extreme clicking with a powerful tiny YOLO v4, we made annotation a breeze. Get ready to revolutionize your annotation game!

Get ready to save your time for the work that really matters!

manual extreme clicking and a bounding box from tiny YOLO v4 (open image database)

Extreme clicking is a powerful tool that can save you time on machine learning tasks [1].
To test further optimizations, I combined this tool with a tiny YOLO v4. The results show that the semi-automatic method saves even more time. The method was tested on a dataset from Open Images with 443 images containing 689 objects in five different classes.

Extreme Clicking

With extreme clicking, objects are marked by setting points, and the resulting bounding boxes are calculated from the point coordinates. To calculate the frame around an object, all maxima of the object must be covered by the set points. These maxima are the left, upper, right and lower borders of the object. At least two points are needed to calculate a frame.
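To make the calculation concrete, here is a minimal sketch of my own (not the LOST implementation) of how a box can be derived from clicked points given as (x, y) pixel coordinates:

from typing import List, Tuple

def box_from_extreme_points(points: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    # Axis-aligned bounding box (x_min, y_min, x_max, y_max) from clicked points.
    # The clicks must cover the leftmost, topmost, rightmost and bottommost extent of the object.
    if len(points) < 2:
        raise ValueError("At least two points are needed to span a box.")
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

# Example: three clicks on an airplane (nose, wing tip, tail)
print(box_from_extreme_points([(120, 80), (40, 150), (260, 200)]))  # (40, 80, 260, 200)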

On the left, the picture shows the annotation of an airplane with three points. A similar aircraft is shown on the right, marked with six points, although only three would be needed to calculate the bounding box. The point coordinates describe the position and size of the red frame, which represents the resulting box. As the example on the right shows, we don't always have to aim for the lowest number of points: points can be placed intuitively, which may save the time otherwise spent searching for the ideal locations.

A benefit of the extreme clicking used here is that the points are joined by lines, which makes it easy to see which points belong to which object.

Annotation Process

The annotation procedure is shown below. The Nvidia Triton Inference Server is used to store the trained model versions. We use the annotation software LOST for manual annotation by extreme clicking.

scheme of semi-automatic annotation process

First, we manually annotate a small dataset with LOST; it should contain examples of all object classes in the whole dataset. Then we calculate the bounding boxes for the annotated objects from the point coordinates of the extreme clicks. With this initial set of images, we train the tiny YOLO v4 for the first time and store it on an Nvidia Triton Inference Server. The rest of the data is then labeled in a loop: after each model training, we correct the model's bounding box predictions or use extreme clicking to mark missing objects, as sketched below.
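As a rough, runnable skeleton of this loop (the helper functions below are placeholders standing in for LOST annotation, YOLO training and Triton deployment, not real APIs):

def annotate_manually(images):                      # extreme clicking in LOST
    return [{"image": img, "boxes": []} for img in images]

def train_model(labeled):                           # tiny YOLO v4 training
    return {"trained_on": len(labeled)}

def deploy(model):                                  # push to the Triton model repository
    print(f"deployed model trained on {model['trained_on']} images")

def predict(model, images):                         # box proposals from the current model
    return [{"image": img, "boxes": ["proposal"]} for img in images]

def review(proposals):                              # correct boxes / click missing objects
    return proposals

def semi_automatic_loop(all_images, batch_size=100):
    labeled = annotate_manually(all_images[:batch_size])       # initial manual batch
    model = train_model(labeled)
    deploy(model)
    for start in range(batch_size, len(all_images), batch_size):
        batch = all_images[start:start + batch_size]
        labeled += review(predict(model, batch))                # correct predictions, add missing objects
        model = train_model(labeled)                            # retrain on everything annotated so far
        deploy(model)
    return labeled

semi_automatic_loop([f"img_{i}.jpg" for i in range(443)])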

Results

The results show that, compared to a manual approach using bounding boxes, 56% of the time can be saved. For 689 objects, the time needed was reduced from 191 to 84 minutes. At the same time, the annotation data produced by semi-automatic extreme clicking reached a mAP of 76%. Furthermore, 579 of the annotated objects achieved an IoU of more than 70%.

Let’s examine the results from the semi-automatic extreme clicking experiment. We have a look at the trends in processing time, number of actions and annotation speed. In the last part, we look at the quality of the annotation data.

Processing time

The processing time is the time needed to label a batch of 100 images. The trend shows that repeated model training causes this time to decrease steadily: the more accurately the model predicts the objects, the less time is needed per batch of 100 images. According to the measurements, the first batch takes 25 minutes without model support; in later iterations this drops to 12 minutes. The fifth iteration shows a marginal increase, which could be due to the evaluation procedure: the final batch contains only 43 images, and its measured time is extrapolated to 100 images.
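The extrapolation for that last batch is simple proportional scaling; as a small sketch (the 6-minute figure below is a made-up example, not a measured value):

# Scale the measured time for the final 43 images up to a full batch of 100.
def extrapolate_to_full_batch(measured_minutes, images_in_batch, full_batch=100):
    return measured_minutes * full_batch / images_in_batch

print(round(extrapolate_to_full_batch(6, 43), 1))  # 14.0 minutes per 100 images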

Number of actions

Following that, we examine the annotator’s actions. The actions listed below have been tracked.

  • Add points
  • Edit points or model boxes
  • Delete points or model boxes
  • Move points or model boxes
  • Assign object classes

Without the assistance of a model, 445 actions are carried out for 100 images. With the model's predictions, this number is reduced to 269 actions. Over the course of all runs, the number of necessary actions decreases. As with the processing time, the evaluation procedure has an effect on the last run.

Annotation speed

Finally, we look at the resulting annotation speed, i.e. how long it takes to annotate a single object. Only the time in which the annotator actually performs an action is measured; it does not depend on how long someone looks at an image. The speed curve reveals a decrease in time per object from 3.3 to 1.1 seconds, an improvement of roughly 70%.

mAP annotation data

The AP (average precision) describes the resulting quality of the annotation data. It reflects both how reliably objects are recognized correctly and how many of them are found at all. The mAP is the average of the AP over all classes of a detection task.
In our example, the mAP refers to the recognition of 5 classes.
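As a small numeric sketch (the per-class AP values below are invented placeholders, not measured results):

# mAP = mean of the per-class average precision (AP) values.
ap_per_class = {"bus": 0.81, "fox": 0.74, "person": 0.70, "airplane": 0.83, "bicycle": 0.72}
mAP = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP = {mAP:.2f}")  # 0.76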

The quality of the annotation data drops from a mAP of 90% for the first 100 images to 69% in the fourth iteration. In the last run, the quality trends upward again and reaches 80% mAP. It can be assumed that further runs beyond the fifth would lead to a further increase in quality.

The reason for the falling quality is the procedure used during the experiment. The first 100 images are annotated manually, and this batch is used to train the model for the first time. From the second iteration on, predictions come from the model. If a suggested box covers at least 70% of the object, it is not touched by the annotator. Since the model is not yet very good in the first runs, this leads to poorer annotation quality at the beginning, which improves as the model training continues.

Are you interested in providing your own results for my experiment set-up? Then follow my step-by-step guide for semi-automatic, extreme clicking annotation below!

1. Semi-automatic extreme clicking step-by-step

This section describes how to use semi-automatic extreme clicking. First, we install the necessary software. Then we use an example to demonstrate the use of semi-automatic extreme clicking. Finally, we evaluate the performance of our annotation.

1.1 Installation

In this section, we will set up everything we need for our first example. All installation steps refer to Ubuntu 20.04.5 LTS with Python 3.8, and all given paths correspond to my system.

PIP

sudo apt install python3-pip

Docker

  • For LOST and Nvidia Triton we need a Docker installation
  • And the Docker Compose Plugin
sudo apt-get update
sudo apt-get install docker-compose-plugin

LOST

  • Clone LOST
git clone https://github.com/l3p-cv/lost.git
  • Install the cryptography package in your python environment
pip install cryptography boto3
  • Run quick setup script
cd lost/docker/quick_setup/
python3 quick_setup.py ~/ml_software/lost --release 2.0.0-alpha.29
  • Run LOST
cd ~/ml_software/lost/docker; docker-compose up
  • Open the browser and enter this url
http://localhost/
  • Login LOST
username: admin
password: admin
  • Import Pipeline
* LOST -> Admin Area -> Pipeline Projects -> Import Pipeline Project
  • Import/update the URL of the semi-automatic pipeline
Git Url = https://github.com/l3p-cv/lost_experiments.git
  • Check that everything works
* LOST -> Start Pipelines -> look for tiny_yolo_triton_sia_loop_exp

Everything is ready. You can see the pipeline tiny_yolo_triton_sia_loop_exp in the overview.

2. Example

Once everything is installed, we can test semi-automatic extreme clicking on an example. We will run the example with my file paths and settings; you will need to adapt them to your system. The images come from the Open Images database. We will annotate 443 images with 689 objects. The dataset contains the 5 classes bus, fox, person, airplane and bicycle.

2.1 Download the images from open image

# For the user in the paths below, choose your own system user

mkdir ~/semi_tutorial/
cd ~/semi_tutorial/

wget https://raw.githubusercontent.com/openimages/dataset/master/downloader.py
wget https://raw.githubusercontent.com/l3p-cv/lost_experiments/main/data/image_list.txt

mkdir ~/semi_tutorial/images

python3 downloader.py image_list.txt --download_folder=/home/$USER/semi_tutorial/images --num_processes=5

2.2 Start semi automatic pipeline in LOST

  • Download label tree of the 5 classes
cd ~/semi_tutorial/
wget https://raw.githubusercontent.com/l3p-cv/lost_experiments/main/data/label.csv
  • Start LOST
http://localhost/login

username: admin
password: admin
  • Import label tree in LOST
* LOST -> Labels -> Import -> choose label.csv
  • Import images in LOST
* LOST -> Datasources -> admin -> click Browse
* Choose Create folder

* Enter folder name: semi_tutorial

* Open the folder semi_tutorial

* Upload files to this folder by dragging and dropping the images downloaded in
step 2.1 onto the field marked with the red box in the picture below

* Click upload
  • Start the pipeline
* LOST -> Start Pipeline -> choose tiny_yolo_triton_sia_loop_exp -> start
  • Now we configure the pipeline shown below as follows
tiny_yolo_triton_sia_loop_exp pipeline
  • Configure Datasource Block
* Click the "Datasource" Block
* Select Datasource -> admin -> mark semi_tutorial -> okay
* Now "Datasource" Block is green
  • Configure Script Block
* Click "Script" Block
* model_name = semi_tutorial
* url = 192.168.1.23 (ip of your system)
* img_batch = 100
  • Configure Annotation Block
* Click "Annotation Task" Block

* Info
* Name = semi tutorial task
* Instruction = no more time is lost to me

* Annotators
* Choose admin

* Label tree
* Choose open image

* Labels
* Click on open image in the tree

* Configuration
* Activate Line
* Deactivate Bbox, Polygon, Point

* Click Okay

* Now "Annotation Task" Block is green
Configuration in Annotation Task block
  • Edit pipeline information
* Pipeline Name = semi tutorial pipeline
* Pipeline Description = we have configured our first pipeline in LOST
  • Start pipeline
* Click "Start Pipe"

2.3 Start annotation

  • This is the initial image set to train our tiny YOLO v4 for the first time
  • It may take a moment for the annotation task to be displayed
* LOST -> Annotation -> semi tutorial task -> "Annotate"
  • We annotate the objects of the first 100 images by extreme clicking
  • Objects to annotate: fox, airplane, person, bus and bicycle
  • How extreme clicking works
* The image below shows an example of how objects
are annotated by "extreme clicking"
* Choose the line option (red box)
* Set points by right mouse click
* To finish press "enter" and choose a class
* It is also possible to finish with a double right-click

* To calculate a bounding box, all maxima of an object
must be represented by the set points
* In the simplest case, these maxima are the upper-left and lower-right corners of the object
* At least two points must be set for this
* It is not necessarily important to set as few points as possible.
Sometimes it is faster to intuitively place many points than to
explicitly search for only the two crucial ones
* In the example picture, the four points represent the maxima of the object
  • Finish the annotation task
* In the picture below, part of the bicycle is annotated with 3 points
* To finish the task, click the paper plane icon in the red box

2.4 Export annotation data

* LOST -> Pipelines -> semi tutorial pipeline -> open
* Click "Data Export" block
* Download LOST_Annotation_0.parquet
* Download labels_loop_0.json
  • Save the downloaded annotation data
mkdir ~/semi_tutorial/anno_data
mv ~/Downloads/LOST_Annotation_0.parquet ~/semi_tutorial/anno_data/
mv ~/Downloads/labels_loop_0.json ~/semi_tutorial/anno_data/
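To sanity-check the export, the parquet file can be opened with pandas (requires pyarrow or fastparquet); the exact column names depend on the LOST export format, so treat this as a hedged sketch:

import os
import pandas as pd

# Load the exported annotation data and take a quick look.
# Column names depend on the LOST export format; inspect them via df.columns.
path = os.path.expanduser("~/semi_tutorial/anno_data/LOST_Annotation_0.parquet")
df = pd.read_parquet(path)
print(df.shape)    # number of rows (annotations) and columns
print(df.columns)  # available fields
print(df.head())   # first few annotations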

2.5 Model training configuration

  • Clone model train repository
cd ~/semi_tutorial
git clone https://github.com/l3p-cv/lost_yolov3_tf2.git
pip install -r ~/semi_tutorial/lost_yolov3_tf2/requirements.txt
  • I edit configs.py with VS Code
cd ~/semi_tutorial/lost_yolov3_tf2/yolov3
code configs.py
  • Edit the configs.py
Line 40: TRAIN_CLASSES = os.path.expanduser("~/semi_tutorial/anno_data/labels_loop_0.json")
Line 41: TRAIN_ANNOT_PATH = "model_data/model_train.txt"
Line 42: TRAIN_IMG_PATH = os.path.expanduser("~/semi_tutorial/images")
Line 43: TRAIN_ANNO_DATA_PATH = os.path.expanduser("~/semi_tutorial/anno_data/")
Line 61: TEST_ANNOT_PATH = "model_data/model_test.txt"
Line 76: MODEL_PATH = os.path.expanduser("~/semi_tutorial/model_repo/semi_tutorial")
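After these edits, the relevant lines of configs.py should look roughly like this (everything else stays at its defaults):

import os

# Excerpt of the edited settings in yolov3/configs.py (line numbers as in the repo)
TRAIN_CLASSES        = os.path.expanduser("~/semi_tutorial/anno_data/labels_loop_0.json")  # line 40
TRAIN_ANNOT_PATH     = "model_data/model_train.txt"                                        # line 41
TRAIN_IMG_PATH       = os.path.expanduser("~/semi_tutorial/images")                        # line 42
TRAIN_ANNO_DATA_PATH = os.path.expanduser("~/semi_tutorial/anno_data/")                    # line 43
TEST_ANNOT_PATH      = "model_data/model_test.txt"                                         # line 61
MODEL_PATH           = os.path.expanduser("~/semi_tutorial/model_repo/semi_tutorial")      # line 76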

2.6 Model training

  • The model training is based on the PyLessons tutorial [2]
  • Convert parquet file to train and test files
cd ~/semi_tutorial/lost_yolov3_tf2
python3 convert_parquet_to_yolo.py
  • Start model training
python3 train.py
  • When the training is finished after 200 epochs, we convert the model and save it as a TensorFlow model.
python3 convert_to_pb.py
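Triton generally expects each model in its own subfolder of the model repository, with numbered version directories (model_repo/<model_name>/<version>/...). A quick way to inspect what convert_to_pb.py actually produced; the exact layout depends on that script:

import os

# Print the directory tree of the Triton model repository.
repo = os.path.expanduser("~/semi_tutorial/model_repo")
for root, dirs, files in os.walk(repo):
    depth = root[len(repo):].count(os.sep)
    print("  " * depth + os.path.basename(root) + "/")
    for name in files:
        print("  " * (depth + 1) + name)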

2.7 Start Triton Inference Server

  • Run Server with GPU
docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ~/semi_tutorial/model_repo:/models nvcr.io/nvidia/tritonserver:22.09-py3 tritonserver --model-repository=/models --model-control-mode=poll --repository-poll-secs=60
  • If Triton can’t run with GPU, use this command
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ~/semi_tutorial/model_repo:/models nvcr.io/nvidia/tritonserver:22.09-py3 tritonserver --model-repository=/models --model-control-mode=poll --repository-poll-secs=60

The server checks for new models every 60 seconds. Nothing else needs to be done.
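Once the container is up, you can verify that both the server and the model are ready via Triton's standard HTTP readiness endpoints; a small check using Python's requests package (the model name matches the one configured above):

import requests

# Triton's HTTP API (port 8000) exposes readiness endpoints.
base = "http://localhost:8000"
print(requests.get(f"{base}/v2/health/ready").status_code)                # 200 when the server is ready
print(requests.get(f"{base}/v2/models/semi_tutorial/ready").status_code)  # 200 once the model is loaded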

  • Edit config for next training
cd ~/semi_tutorial/lost_yolov3_tf2/yolov3
code configs.py

Line 51: False
Line 52: True

2.8 Annotate next image package

* LOST -> Annotation -> semi tutorial task -> "Annotate"
  • We get bounding box predictions from the trained model
  • We can edit or delete these boxes
  • Edit = move, resize or change the class
  • In my experiment, I only processed a model prediction if the bounding box marked less than 70% of the object or a wrong class was assigned.
  • We use extreme clicking, as described in step 2.3, to annotate objects that are not suggested by the model.
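The 70% criterion can be made concrete with an intersection-over-union check. A minimal sketch of my own (boxes given as (x_min, y_min, x_max, y_max)):

def iou(box_a, box_b):
    # Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max).
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Keep the model's proposal only if it overlaps the object well enough (70% rule).
prediction, reference_box = (40, 80, 260, 200), (50, 85, 255, 210)
print("accept" if iou(prediction, reference_box) >= 0.7 else "correct it")  # accept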

2.9 Export annotation data

* LOST -> Pipelines -> semi tutorial pipeline -> open
* Click "Data Export" block
* Download LOST_Annotation_1.parquet
  • Save the downloaded annotation data
mv ~/Downloads/LOST_Annotation_*.parquet ~/semi_tutorial/anno_data/

2.10 We are in the loop

  • We repeat steps 2.6, 2.8 and 2.9 until the entire dataset of 443 images is annotated.
  • After the last iteration, run convert_parquet_to_yolo.py one last time
  • The reason is to convert the points to bounding boxes for the evaluation in the last chapter of this blog
cd ~/semi_tutorial/lost_yolov3_tf2
python3 convert_parquet_to_yolo.py

Finish

Congratulations, we have annotated our first dataset with a semi-automatic extreme clicking pipeline. If you are interested, experiment a little with the pipelines and scripts yourself. You can also check the other pipelines in LOST. In the next chapter, we will look at our performance results from the annotation.

3. Evaluation

In this chapter, we check the performance of the semi-automatic extreme clicking. We evaluate the course of the number of actions, the working time and the processing time. Furthermore, we look at the quality of our annotation data compared to the ground truth. First, we check the mAP of the trained model and then the mAP of the annotation data. Finally, we look at how many of our annotations achieve an IoU of at least 70% with the ground truth. For the evaluation, we use a Jupyter notebook.

  • Download model data
* LOST -> Pipelines -> semi tutorial pipeline -> click "open" -> "Data Export" block

* download all Model_Annotation_*.parquet (there are 4 model parquets)

mkdir ~/semi_tutorial/model_data
mv ~/Downloads/Model_Annotation_*.parquet ~/semi_tutorial/model_data/
  • Download files
cd ~/semi_tutorial/
wget -O ground_truth_open_image.parquet https://github.com/l3p-cv/lost_experiments/blob/add_evaluation/data/ground_truth_open_image.parquet?raw=true
wget https://raw.githubusercontent.com/l3p-cv/lost_experiments/add_evaluation/requirements.txt
wget https://raw.githubusercontent.com/l3p-cv/lost_experiments/add_evaluation/evaluation/evaluation.ipynb
pip install -r requirements.txt
  • Open evaluation notebook
jupyter notebook evaluation.ipynb
  • We run all cells at once or one by one
  • The notebook includes scripts to evaluate time, actions, model mAP, annotation data mAP and IoU
  • For example, the picture shows the course of our processing time. Every iteration represents the annotation of 100 images.

Some final words

The process and the results come from my internship and bachelor thesis at the company L3bm from Fulda.

References:

[1] Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller and Vittorio Ferrari. Extreme clicking for efficient object annotation, 2017

[2] Rokas Liuberskis, Training YOLO Mnist Object Detection with TensorFlow 2 https://pylessons.com/YOLOv3-TF2-custrom-train
