Getting Started with YOLO

Introduction

Remember the object detection approaches we covered earlier? YOLO revolutionized this field when it was introduced in 2015. Instead of using complex pipelines or scanning an image multiple times, YOLO takes a refreshingly simple approach: it looks at the image just once (hence the name "You Only Look Once") to detect all objects.

Historical Context

When YOLO was first released, object detection systems were complex multi-stage pipelines. The original paper titled "You Only Look Once: Unified, Real-Time Object Detection" by Joseph Redmon et al. introduced a radically different approach that would change the field forever.

The YOLO Approach

Let's break down how YOLO works in simple terms:

Grid Division: YOLO first divides your image into a grid (say 13x13).
Grid Cell Predictions: Each cell in the grid is responsible for predicting objects whose center falls inside that cell. Each cell predicts a fixed number of bounding boxes and one confidence score per box. The confidence score reflects how confident the model is that the box contains an object and how accurately the box fits it.
Bounding Box Parameters: Each bounding box consists of five predictions: x, y, w, h, and the confidence score. The (x, y) coordinates give the center of the box relative to the bounds of the grid cell. The width (w) and height (h) are predicted relative to the whole image.
Class Predictions: In addition to predicting bounding boxes, each cell also predicts class probabilities. These probabilities are conditioned on the grid cell containing an object.
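Combining Predictions: The bounding box predictions and class predictions are combined to create a complete detection: multiplying a box's confidence score by the cell's class probabilities gives a class-specific score for that box. If the box confidence and the predicted class probability are both high, it's a strong detection.
Non-Max Suppression: Since YOLO predicts multiple boxes for each grid cell, several boxes often cover the same object. YOLO uses a technique called non-max suppression to keep the highest-scoring box and discard lower-scoring boxes that overlap it heavily.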
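
To make the steps above concrete, here is a minimal sketch in plain Python of how one cell's prediction could be decoded into pixel coordinates and scored. The function names (`responsible_cell`, `decode_box`, `class_scores`), the 13x13 grid, and the 416x416 image size are illustrative assumptions, not part of any YOLO library. The encoding follows the description above: the box center is relative to the cell, and the width and height are relative to the whole image.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=13):
    """Return (row, col) of the grid cell that contains an object's center (in pixels)."""
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col


def decode_box(row, col, x, y, w, h, img_w, img_h, grid=13):
    """Convert one cell-relative prediction into pixel corners (x1, y1, x2, y2).

    x, y: box center as an offset in [0, 1] within the cell.
    w, h: box size as a fraction of the whole image.
    """
    center_x = (col + x) / grid * img_w
    center_y = (row + y) / grid * img_h
    box_w, box_h = w * img_w, h * img_h
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)


def class_scores(confidence, class_probs):
    """Multiply the box confidence by the cell's conditional class probabilities."""
    return [confidence * p for p in class_probs]


# An object centered in a 416x416 image (hypothetical values).
row, col = responsible_cell(208, 208, 416, 416)
box = decode_box(row, col, 0.5, 0.5, 0.25, 0.5, 416, 416)
scores = class_scores(0.9, [0.1, 0.8, 0.1])
print(row, col, box, scores)
```

In this example the object's center lands in cell (6, 6), and the second class gets the highest combined score because both the box confidence and its class probability are high.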

Here's a visualization of how YOLO divides an image and makes predictions:

YOLO Grid System (Source: Jonathan Hui on Medium)
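
The non-max suppression step listed above can be sketched in a few lines of plain Python. This is a simplified greedy version under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples in pixels, the 0.5 overlap threshold is an arbitrary example value, and the helper names `iou` and `non_max_suppression` are hypothetical. Real implementations usually run this once per class and use optimized library routines.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap heavily with a box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes on one object, plus one separate box.
boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap anything and survives.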

Task: YOLO Approach

Watch the following video about YOLO and answer the questions below:

Basics:

In basic object detection, what two main things does YOLO need to determine about an object?
What happens when no object is detected in a grid cell?
True or False: YOLO must always use a 4x4 grid to divide images.
Which is faster at detecting objects: YOLO or older methods like R-CNN?
What are the components of YOLO's 7-dimensional output vector for a single grid cell prediction?

Advantages and Disadvantages

Let's examine what makes YOLO special and where it might not be the best choice:

Advantages
- Speed: YOLO is fast enough for real-time use; the original paper reports 45 frames per second for the base model on a Titan X GPU
- Accuracy: Despite its speed, it maintains good detection accuracy
- Global Context: By looking at the entire image at once, YOLO understands context better than sliding window approaches
- Generalization: YOLO learns generalizable representations of objects
Disadvantages
- Small Objects: YOLO can struggle with detecting small objects, especially in groups
- Unusual Aspect Ratios: Objects with unusual aspect ratios or configurations might be missed
- Precision: While fast, it might not be as precise as two-stage detectors for some applications

YOLO Versions

The YOLO family has evolved significantly since the original paper appeared (posted to arXiv in 2015 and published at CVPR in 2016). Each version brought important improvements:

YOLOv1-v3 (2016-2018)

YOLOv1: First version, introduced the grid-based approach
YOLOv2/YOLO9000: Added anchor boxes, batch normalization
YOLOv3: Added predictions at three scales (in the style of feature pyramid networks), better backbone (Darknet-53)

YOLOv4-v5 (2020-2021)

YOLOv4: Introduced Mosaic augmentation, CSPNet backbone
YOLOv5: Brought PyTorch implementation, improved training methods

Latest Versions (2022-2024)

YOLOv6: Released by Meituan, optimized for industrial applications
YOLOv7: Improved architecture design and training strategies
YOLOv8: Ultralytics' flagship model with multi-task capabilities
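YOLOv9: Introduced Programmable Gradient Information (PGI) and the GELAN architecture
YOLOv10: Removed the need for non-max suppression at inference through consistent dual assignments during training
YOLOv11: Ultralytics' 2024 release (named "YOLO11" in the Ultralytics documentation), the latest iteration covered here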

YOLO Version Comparison (Source: Ultralytics)

Installation and Setup

Getting started with YOLO is straightforward. You can use the Ultralytics implementation of YOLOv11, which offers a user-friendly API and excellent documentation.

First, create a new project folder, then create and activate a virtual environment inside it. The project layout should look like this:

📁 computer_vision/
├── 📁 .venv/
├── 📁 pics/
└── 📄 your_files.ipynb

Install the required packages:
```
pip install ultralytics
```

Verify your installation:

```
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # load a pretrained YOLO11 nano model (weights download on first use)
```

Installation Tips for Pros

If you're using a GPU, make sure your PyTorch build matches the CUDA version supported by your driver. Otherwise PyTorch will run on the CPU.
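
A quick way to check whether your setup can actually see the GPU is to ask PyTorch directly. The sketch below assumes PyTorch was installed as a dependency of `ultralytics`; the helper name `pick_device` is hypothetical and only for illustration. It falls back to the CPU if PyTorch is missing or no CUDA device is found.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption: PyTorch normally arrives with ultralytics; without it, report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build probably does not match your CUDA setup.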

Reading the Docs

The Ultralytics documentation is your best friend when working with YOLO. Here are the key resources you should bookmark:

Documentation Best Practices

Keep the API reference handy for specific function documentation
Check the examples section for common use cases
Join the Ultralytics Discord community for help and updates

What's Next?

In the upcoming sections, we'll dive deeper into:

Working with pretrained models
Analyzing images and videos
Preparing custom datasets
Training YOLO on your own data