Upskill Yourself in Minutes with YOLO: Learn AI Object Detection with Examples
Implementing AI for object detection isn't hard. With YOLO, we can learn how AI is used in practice and set up object detection with ease. In this article, we will set up table detection using the YOLOv8 model. Follow the tutorial at the end to get it working in practice.
Breaking the Myth: Object Detection Isn't as Hard as You Think
Hearing about AI and object detection can create an illusion among developers that doing such things is far beyond the reach of traditionally trained programmers.
But that's not the case. Object detection is easy to set up and only requires a few minutes of your time.
It's a computer vision technique that works to identify and locate objects within an image or video. For example, traffic surveillance systems, self-driving cars, and facial recognition systems all employ this technology to track down vehicles, faces, and other objects of interest.
This article will use YOLO (You Only Look Once) for performing the object detection tasks.
Object Detection with YOLO
Introduction to YOLOv8
YOLOv8 is an open-source computer vision model released by Ultralytics on January 10th, 2023. The name YOLO (You Only Look Once) reflects that it detects everything inside an image in a single pass. The new version can perform object detection, classification, instance segmentation, tracking, and pose estimation tasks.
Compared with earlier versions, v8 offers better performance and flexibility. Its detection, segmentation, and pose models are pre-trained on COCO (Common Objects in Context), while its classification models are pre-trained on ImageNet.
Using YOLO: An Example
YOLO can be used for a wide variety of applications and use cases. Here is an example of borderless table detection. A detailed section on implementation is presented at the end of the article.
The Evolution of YOLO
At the time of writing, YOLO has eight major versions, each building on its predecessors. Initially introduced in 2015, it has since become one of the most popular object detection algorithms worldwide.
YOLO V1
Released: 2015
Paper title: You Only Look Once: Unified, Real-Time Object Detection
Authors: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
YOLO V1 used a single convolutional neural network (CNN) to predict bounding boxes and class probabilities for objects in the image. It was very fast compared to other detectors of its time.
Drawbacks:
- Not as accurate as other object detection algorithms at the time: it made more localization errors and struggled with small objects that appear in groups.
YOLO V2
Released: 2016
Paper title: YOLO9000: Better, Faster, Stronger
Authors: Joseph Redmon, Ali Farhadi
It introduced several improvements, such as batch normalization, anchor boxes, and multi-scale training, which made YOLO V2 more accurate and faster than YOLO V1.
Drawbacks:
- It still had difficulty detecting smaller objects.
YOLO V3
Released: 2018
Paper title: YOLOv3: An Incremental Improvement
Authors: Joseph Redmon, Ali Farhadi
It introduced several more improvements over YOLO V2, including predictions at multiple scales. The number of false positive detections was reduced and accuracy improved.
Drawbacks:
- May not be ideal for niche use cases where large datasets are hard to obtain.
YOLO V4
Released: 2020
Paper title: YOLOv4: Optimal Speed and Accuracy of Object Detection
Authors: Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao
It introduced several new features, which made YOLO V4 more accurate and faster than YOLO V3.
Drawbacks:
- YOLOv4 models are generally larger. This can lead to higher memory consumption and slower inference speeds.
YOLO V5
Released: 2020
YOLOv5 was specifically designed for high scalability, making it adept at deployment on diverse devices, ranging from powerful GPUs to even low-power mobile devices.
Drawbacks:
- While YOLOv5 offers different sized models with varying accuracy levels, the most accurate models (e.g., YOLOv5x) can be computationally expensive and require powerful hardware for real-time inference.
YOLO V6
Released: 2022
Paper Title: YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications
Authors: Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, Yiduo Li, Bo Zhang, Yufei Liang, Linyuan Zhou, Xiaoming Xu, Xiangxiang Chu, Xiaoming Wei, Xiaolin Wei
YOLOv6 focused on optimizing the architecture for efficient deployment on hardware.
Drawbacks:
- Less flexibility: it is harder to customize and fine-tune for specific tasks.
YOLO V7
Released: 2022
Paper title: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Authors: Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao
YOLO V7 introduced support for pose estimation, trained on the COCO dataset, which makes it more versatile than YOLO V6.
Drawbacks:
- Slower than YOLOv8.
YOLO V8
Released: 2023
YOLO V8 is a cutting-edge model that builds upon the success of previous YOLO versions and introduces new features and improvements for enhanced performance, flexibility, and efficiency. YOLOv8 supports a full range of vision AI tasks, including detection, segmentation, pose estimation, tracking, and classification.
More on YOLOv8: Exploring Its Internals and Features
Inner Workings of YOLOv8
Anchor-Free Detection
One of the key features of YOLOv8 is its use of anchor-free detection.
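Anchor boxes are predefined boxes of fixed sizes and aspect ratios that earlier detectors use as starting points when predicting object boxes.
YOLOv8 does not use anchor boxes. Instead, it predicts the centre, dimensions, and class of an object directly from the image. This reduces the number of box predictions and speeds up post-processing, which helps make YOLOv8 faster and more accurate than previous YOLO versions.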
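To make the "centre and dimensions" idea concrete, here is a minimal sketch of the geometry only: turning a predicted centre, width, and height into the corner coordinates used for drawing and cropping. The function name `xywh_to_xyxy` and the sample numbers are illustrative assumptions, not taken from the YOLOv8 code base, and the real detection head encodes its outputs in a more involved way.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert rows of (cx, cy, w, h) into rows of (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, dtype=float)
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    # Move half the width/height away from the centre in each direction.
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

# One hypothetical prediction: centre (50, 40), width 20, height 10.
corners = xywh_to_xyxy([[50, 40, 20, 10]])
print(corners)  # [[40. 35. 60. 45.]]
```

No predefined anchor shapes appear anywhere in this conversion; the box is described entirely by the values predicted for that object.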
Mosaic data augmentation
Mosaic data augmentation is a technique used to increase the size and diversity of a training dataset for object detection models. It involves randomly selecting four images from the training set and stitching them together into a single composite image. This composite image is used to train the object detection model.
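The stitching step described above can be sketched with NumPy alone. This is a simplified illustration rather than the augmentation code YOLOv8 actually uses: the function name `simple_mosaic`, the fixed 2x2 layout, and the tile size are assumptions, and a real pipeline would also rescale the images, pick a random centre point, and remap the bounding-box labels.

```python
import numpy as np

def simple_mosaic(images, tile_size=320):
    """Stitch four HxWx3 uint8 images into one 2x2 composite image."""
    if len(images) != 4:
        raise ValueError("mosaic needs exactly four images")
    canvas = np.zeros((2 * tile_size, 2 * tile_size, 3), dtype=np.uint8)
    for idx, img in enumerate(images):
        row, col = divmod(idx, 2)           # position in the 2x2 grid
        tile = img[:tile_size, :tile_size]  # crop to at most tile_size
        h, w = tile.shape[:2]
        y0, x0 = row * tile_size, col * tile_size
        canvas[y0:y0 + h, x0:x0 + w] = tile
    return canvas

# Stand-in "training set": five flat-coloured images instead of real photos.
rng = np.random.default_rng(0)
pool = [np.full((320, 320, 3), v, dtype=np.uint8) for v in (10, 60, 120, 200, 250)]
chosen = [pool[i] for i in rng.choice(len(pool), size=4, replace=False)]

mosaic = simple_mosaic(chosen)
print(mosaic.shape)  # (640, 640, 3)
```

Because each composite mixes four scenes, the model sees objects in new contexts and at new positions without any additional labelled images.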
Like most modern object detectors, YOLOv8 consists of three parts:
Backbone: The backbone is a convolutional neural network (CNN) that extracts features from an input image. YOLOv8 comes in several sizes (n, s, m, l, x) whose backbones differ in depth and width.
Neck: The neck is a series of layers that combine features from different levels of the backbone. This allows YOLOv8 to detect objects of different sizes at different locations in the image.
Head: The head is a series of layers that predict bounding boxes for objects in the image. YOLOv8 uses a new anchor-free detection head, which eliminates the need for anchor boxes and improves the accuracy of object detection.
YOLOv8 capabilities
Classify: Image classification assigns an entire image to one of a set of predefined classes. For example, an image might be assigned to one of the classes "Person", "Tripod", or "Safety Vest".
Detection: Detection is a task that involves identifying the location and class of objects in an image or video stream.
Segment: Segmentation goes a step further than object detection and involves identifying individual objects in an image and segmenting them from the rest of the image.
Track: Tracking the movement of objects over time in a sequence of images or video frames.
Pose: Identifying the location of keypoints, or landmarks, on an object. Imagine you have a picture of a person striking a pose. Pose estimation can analyze this image and identify the precise location of keypoints like elbows, knees, and wrists.
Next, we can dive into an example implementation of detection. Follow the tutorial locally or in Google Colab (recommended) to try it quickly.
Set Up Table Detection Using YOLOv8: A Detailed Guide
To extract tabular data, you must first determine the table's location within the document. We do this with the Python ultralytics library, which can download lightweight YOLO models that support a range of tasks such as object detection, classification, and segmentation.
Install Dependencies
Install pytesseract, a Python wrapper for the Tesseract OCR (Optical Character Recognition) engine, along with its dependencies. We will use it to extract text from an image. The leading "!" runs each command as a shell command from a notebook such as Google Colab.
!sudo apt install tesseract-ocr
!pip install pytesseract transformers ultralyticsplus==0.0.23 ultralytics==8.0.21
Initialize the Imports
We use numpy for the arrays that represent the table coordinates, pytesseract for OCR, and ultralyticsplus for calling YOLO.
import numpy as np
import pytesseract
from pytesseract import Output
from ultralyticsplus import YOLO, render_result
from PIL import Image

# Load the initial image
image = './borderless_table.jpg'
img = Image.open(image)
img
This will display the initial document that we will be using for table detection.
Initial Document
Load the YOLOv8-table-extraction model
This model was taken from awesome-yolov8-models, authored by @_keremberke. Big thanks for making this model open-source. It was created by fine-tuning YOLOv8 on a custom dataset of similar tables.
model = YOLO('keremberke/yolov8m-table-extraction')

# set model parameters
model.overrides['conf'] = 0.25  # NMS confidence threshold
model.overrides['iou'] = 0.45  # NMS IoU threshold
model.overrides['agnostic_nms'] = False  # NMS class-agnostic
model.overrides['max_det'] = 1000  # maximum number of detections per image
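The conf, iou, and max_det settings all control non-maximum suppression (NMS), the post-processing step that removes duplicate boxes for the same object. The sketch below is a simplified, single-class illustration of that idea, not the library's implementation; the helper names `iou` and `nms` and the sample boxes are assumptions made up for this example.

```python
import numpy as np

def iou(box, others):
    """Intersection over union between one (x1, y1, x2, y2) box and many."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, conf=0.25, iou_thres=0.45, max_det=1000):
    """Return indices of the boxes that survive suppression."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # 1. Drop low-confidence boxes, then sort the rest by descending score.
    candidates = np.flatnonzero(scores >= conf)
    order = candidates[np.argsort(-scores[candidates])]
    keep = []
    while order.size > 0 and len(keep) < max_det:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # 2. Discard boxes that overlap the kept box by more than iou_thres.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thres]
    return keep

# Hypothetical detections: two near-duplicates, one distinct, one weak.
boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300], [0, 0, 50, 50]]
scores = [0.90, 0.80, 0.70, 0.10]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

Raising conf removes more weak detections, while lowering the IoU threshold merges overlapping boxes more aggressively.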
Detect the Table
Call the predict method to get the borderless table marked using a bounding box.
results = model.predict(img)

# observe results
print('Boxes: ', results[0].boxes)
render = render_result(model=model, image=img, result=results[0])
render
This code renders the result obtained after table detection.
Result of Table Detection
Next, we can try to get the table contents as text using OCR.
Cropping the image and performing OCR
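boxes_data = results[0].boxes.data.cpu().numpy()
# Each row holds (x1, y1, x2, y2, confidence, class); use the first detected box
x1, y1, x2, y2, _, _ = tuple(int(item) for item in boxes_data[0])
# Crop the table region from the original image
img = np.array(Image.open(image))
cropped_image = img[y1:y2, x1:x2]
cropped_image = Image.fromarray(cropped_image)
cropped_image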
This is the cropped version of the detected table.
Cropped Table
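The cropping step assumes that at least one table was found and that the box lies inside the image. A slightly more defensive variant might look like the sketch below. The helper name `crop_best_box` and the sample detections are hypothetical, and the row layout (x1, y1, x2, y2, confidence, class) is assumed from the way the boxes are unpacked in this tutorial.

```python
import numpy as np

def crop_best_box(image_array, boxes_data):
    """Crop the highest-confidence box, or return None if nothing was found.

    Assumes each row is (x1, y1, x2, y2, confidence, class_id).
    """
    boxes_data = np.asarray(boxes_data, dtype=float)
    if boxes_data.size == 0:
        return None
    best = boxes_data[np.argmax(boxes_data[:, 4])]
    height, width = image_array.shape[:2]
    # Clamp the corners so the slice never leaves the image.
    x1, x2 = np.clip(best[[0, 2]], 0, width).astype(int)
    y1, y2 = np.clip(best[[1, 3]], 0, height).astype(int)
    return image_array[y1:y2, x1:x2]

# Blank stand-in page (100 px high, 200 px wide) and two made-up detections.
page = np.zeros((100, 200, 3), dtype=np.uint8)
detections = [
    [10.4, 20.6, 150.2, 90.9, 0.62, 0],
    [-5.0, 30.0, 260.0, 120.0, 0.91, 0],  # partly outside the page
]
crop = crop_best_box(page, detections)
print(crop.shape)  # (70, 200, 3)
```

Returning None for an empty result lets the calling code skip OCR instead of failing with an IndexError.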
# conversion to text
text = pytesseract.image_to_string(cropped_image)
print(text)
OCR will return the following results.
Results of OCR
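Since the OCR output is a plain string, a natural next step is to split it into rows and cells. The sketch below assumes that columns are separated by two or more spaces, which depends on the OCR settings and the document layout and is not guaranteed. The helper name `text_to_rows` and the sample text are made up for illustration and are not the output of the document used above.

```python
import re

def text_to_rows(ocr_text):
    """Split OCR text into rows of cells, treating 2+ spaces as a column gap."""
    rows = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        rows.append(re.split(r"\s{2,}", line))
    return rows

# Hypothetical OCR output for a small borderless table.
sample_text = "Name   Qty   Price\n\nWidget   2   9.99\nSpare part   10   1.50\n"
rows = text_to_rows(sample_text)
for row in rows:
    print(row)
```

If the columns in your own output are separated by single spaces only, this heuristic will not work, and a position-based approach using the word coordinates returned by the OCR engine is likely a better fit.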
Conclusion
The world of AI is vast, but you can easily get started and push your boundaries, as we have shown in this article.
Getting started takes only a few minutes, but the insights gained will fuel your desire for more knowledge.
YOLO isn't the only option, but it's a good starting point for developers who want to get involved in this technology. Bigger and more complex projects are possible; the demo here is just the beginning.