MobileNet SSD Object Detection using OpenCV 3.4.1 DNN module

In this post, we demonstrate how to use the OpenCV 3.4.1 deep learning module with the MobileNet-SSD network for object detection.

As of OpenCV 3.4, the deep neural network (dnn) module is officially included. The dnn module allows loading pre-trained models from the most popular deep learning frameworks, including TensorFlow, Caffe, Darknet, and Torch. Besides MobileNet-SSD, other architectures are compatible with OpenCV 3.4.1:

  • GoogLeNet
  • YOLO
  • SqueezeNet
  • Faster R-CNN
  • ResNet

This API is compatible with both C++ and Python. :-)

Code description

In this section, we'll create the Python script for object detection and explain how to load our deep neural network with OpenCV 3.4, how to pass an image to the network, and how to make a prediction with MobileNet-SSD using the dnn module in OpenCV.

We use a pre-trained MobileNet-SSD model taken from https://github.com/chuanqi305/MobileNet-SSD/ that was trained with the Caffe-SSD framework. This model can detect 20 classes.

Load and predict with deep neural network module

First, create a new Python file, mobilenet_ssd_python.py, and put in the following code. Here we import the libraries:

# Import the necessary libraries
import numpy as np
import argparse
import cv2


Next, add the command-line argument parser:

# Construct the argument parser
parser = argparse.ArgumentParser(
    description='Script to run MobileNet-SSD object detection network')
parser.add_argument("--video",
                    help="path to video file. If empty, the camera stream will be used")
parser.add_argument("--prototxt", default="MobileNetSSD_deploy.prototxt",
                    help='Path to the network definition file: '
                         'MobileNetSSD_deploy.prototxt for the Caffe model')
parser.add_argument("--weights", default="MobileNetSSD_deploy.caffemodel",
                    help='Path to the weights file: '
                         'MobileNetSSD_deploy.caffemodel for the Caffe model')
parser.add_argument("--thr", default=0.2, type=float,
                    help="confidence threshold to filter out weak detections")
args = parser.parse_args()

The lines above establish the following arguments:

  • --video: path to the video file.
  • --prototxt: path to the network definition file (.prototxt).
  • --weights: path to the network weights file (.caffemodel).
  • --thr: confidence threshold.
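As a quick sanity check, the parser can be exercised programmatically; parse_args accepts an explicit argument list, which is handy for verifying the defaults (this is a trimmed-down copy of the parser above, kept only for illustration):

```python
import argparse

# Trimmed-down copy of the parser defined above
parser = argparse.ArgumentParser(
    description='Script to run MobileNet-SSD object detection network')
parser.add_argument("--video", help="path to video file")
parser.add_argument("--prototxt", default="MobileNetSSD_deploy.prototxt")
parser.add_argument("--weights", default="MobileNetSSD_deploy.caffemodel")
parser.add_argument("--thr", default=0.2, type=float)

# With no arguments we get the defaults
args = parser.parse_args([])
print(args.thr)        # 0.2
print(args.prototxt)   # MobileNetSSD_deploy.prototxt

# --thr is converted to float automatically
args = parser.parse_args(["--thr", "0.5"])
print(args.thr)        # 0.5
```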

Next, we define the labels for the classes of our MobileNet-SSD network.

#Labels of network.
classNames = { 0: 'background',
    1: 'aeroplane', 2: 'bicycle', 3: 'bird', 4: 'boat',
    5: 'bottle', 6: 'bus', 7: 'car', 8: 'cat', 9: 'chair',
    10: 'cow', 11: 'diningtable', 12: 'dog', 13: 'horse',
    14: 'motorbike', 15: 'person', 16: 'pottedplant',
    17: 'sheep', 18: 'sofa', 19: 'train', 20: 'tvmonitor' }
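Looking up a predicted class id is then a plain dictionary access; using dict.get with a fallback avoids a KeyError if the network ever returns an id outside the table:

```python
classNames = { 0: 'background',
    1: 'aeroplane', 2: 'bicycle', 3: 'bird', 4: 'boat',
    5: 'bottle', 6: 'bus', 7: 'car', 8: 'cat', 9: 'chair',
    10: 'cow', 11: 'diningtable', 12: 'dog', 13: 'horse',
    14: 'motorbike', 15: 'person', 16: 'pottedplant',
    17: 'sheep', 18: 'sofa', 19: 'train', 20: 'tvmonitor' }

print(classNames.get(12, 'unknown'))  # dog
print(classNames.get(42, 'unknown'))  # unknown
```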

Next, open the video file or the capture device, depending on the arguments, and load the Caffe model.

# Open video file or capture device.
if args.video:
    cap = cv2.VideoCapture(args.video)
else:
    cap = cv2.VideoCapture(0)

#Load the Caffe model 
net = cv2.dnn.readNetFromCaffe(args.prototxt, args.weights)

We pass the prototxt and weights arguments to cv2.dnn.readNetFromCaffe; after this call the network is loaded and ready to use.

Next, we read the video frame by frame and pass each frame to the network for detection. With the dnn module it is easy to use our deep learning network in OpenCV and make predictions.

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    frame_resized = cv2.resize(frame,(300,300)) # resize frame for prediction

We read a frame from the video and resize it to 300×300, because that is the input image size defined for the MobileNet-SSD model.

    # MobileNet requires fixed dimensions for the input image(s),
    # so we have to ensure that it is resized to 300x300 pixels.
    # We apply a scale factor of 0.007843 (about 1/127.5) and a mean
    # subtraction of (127.5, 127.5, 127.5) to normalize the input;
    # after executing this call our "blob" has the shape (1, 3, 300, 300).
    blob = cv2.dnn.blobFromImage(frame_resized, 0.007843, (300, 300), (127.5, 127.5, 127.5), False)
    # Set the blob as the network input
    net.setInput(blob)
    # Run a forward pass to obtain the prediction
    detections = net.forward()

After the above lines, we obtain the network's prediction. It takes just three basic steps:

  • Load an image
  • Pre-process the image
  • Set the image as input of network and obtain the prediction result.
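The pre-processing step can be sketched in plain NumPy to see what blobFromImage does with the parameters used above (scale factor 0.007843 ≈ 1/127.5 and mean 127.5); make_blob is a hypothetical helper for illustration, not an OpenCV function:

```python
import numpy as np

def make_blob(frame_resized, scale=0.007843, mean=127.5):
    # Subtract the mean and scale, matching the blobFromImage arithmetic
    normalized = (frame_resized.astype(np.float32) - mean) * scale
    # HWC -> NCHW: move channels first and add a batch dimension
    return normalized.transpose(2, 0, 1)[np.newaxis, ...]

frame = np.zeros((300, 300, 3), dtype=np.uint8)  # dummy black frame
blob = make_blob(frame)
print(blob.shape)  # (1, 3, 300, 300)
```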

The usage of the dnn module is essentially the same for other networks and architectures, so we can replicate this for our own trained models.

Visualize object detection and prediction confidence

After the previous steps, new questions arise: how do we get the object's location with MobileNet, how do we know the predicted class, and how do we get the confidence of the prediction? Let's go!

We must read the detections array to get the prediction data from the neural network; the following code does this:

    # Size of the resized frame (300x300)
    cols = frame_resized.shape[1]
    rows = frame_resized.shape[0]

    # To get the class and location of each detected object,
    # there is a fixed index for the class, location and
    # confidence values in the detections array.
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2] #Confidence of prediction 
        if confidence > args.thr: # Filter prediction 
            class_id = int(detections[0, 0, i, 1]) # Class label

            # Object location 
            xLeftBottom = int(detections[0, 0, i, 3] * cols) 
            yLeftBottom = int(detections[0, 0, i, 4] * rows)
            xRightTop   = int(detections[0, 0, i, 5] * cols)
            yRightTop   = int(detections[0, 0, i, 6] * rows)

We loop over the detections to read the values. For each detection we get the confidence of the prediction, filter it against the threshold, read the class label, and then compute the corners of the object's bounding box.
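Decoding a single detection row can also be sketched in isolation; each row of the detections array has the layout [image_id, class_id, confidence, x1, y1, x2, y2] with normalized coordinates, as used above (the sample values below are made up, and decode_detection is a hypothetical helper):

```python
def decode_detection(det, cols, rows):
    # det layout: [image_id, class_id, confidence, x1, y1, x2, y2]
    class_id = int(det[1])
    confidence = det[2]
    # Scale the normalized coordinates to the resized frame
    box = (int(det[3] * cols), int(det[4] * rows),
           int(det[5] * cols), int(det[6] * rows))
    return class_id, confidence, box

# Hypothetical detection: class 12 ('dog') at 85% confidence
class_id, confidence, box = decode_detection([0, 12, 0.85, 0.1, 0.2, 0.5, 0.6], 300, 300)
print(class_id, box)  # 12 (30, 60, 150, 180)
```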

With all the information about the predicted object, the last step is to display the results. The next code draws the detected object and displays its label and confidence on the frame.

            # Factor for scale to original size of frame
            heightFactor = frame.shape[0]/300.0  
            widthFactor = frame.shape[1]/300.0 
            # Scale object detection to frame
            xLeftBottom = int(widthFactor * xLeftBottom) 
            yLeftBottom = int(heightFactor * yLeftBottom)
            xRightTop   = int(widthFactor * xRightTop)
            yRightTop   = int(heightFactor * yRightTop)
            # Draw location of object  
            cv2.rectangle(frame, (xLeftBottom, yLeftBottom), (xRightTop, yRightTop),
                          (0, 255, 0))

            # Draw label and confidence of prediction in frame resized
            if class_id in classNames:
                label = classNames[class_id] + ": " + str(confidence)
                labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)

                yLeftBottom = max(yLeftBottom, labelSize[1])
                cv2.rectangle(frame, (xLeftBottom, yLeftBottom - labelSize[1]),
                                     (xLeftBottom + labelSize[0], yLeftBottom + baseLine),
                                     (255, 255, 255), cv2.FILLED)
                cv2.putText(frame, label, (xLeftBottom, yLeftBottom),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0))

                print(label)  # print class and confidence

    cv2.namedWindow("frame", cv2.WINDOW_NORMAL)
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) >= 0:  # Break on any key press
        break

Finally, we display the annotated frame in a resizable window and exit when a key is pressed.
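The rescaling from the 300×300 network space back to the original frame can be isolated into a small helper (scale_box is a hypothetical function; the frame size in the example is made up):

```python
def scale_box(box, frame_w, frame_h, net_size=300):
    # Boxes were computed on the 300x300 resized frame;
    # rescale them to the original frame dimensions.
    width_factor = frame_w / float(net_size)
    height_factor = frame_h / float(net_size)
    x1, y1, x2, y2 = box
    return (int(width_factor * x1), int(height_factor * y1),
            int(width_factor * x2), int(height_factor * y2))

# A box found on the 300x300 input, mapped to a 600x450 frame
print(scale_box((30, 60, 150, 180), 600, 450))  # (60, 90, 300, 270)
```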


The code and the trained MobileNet model can be downloaded from:


  • Michael

    Amazing work guys!

    • Edgar Florez

      thanks so much Mr Michael

    • Edgar

      thanks so much Michael

  • iraq2010

    thanks for tutorial and sharing your knowledge ….

  • Ali Taheri

    thank you for sharing these great possibilities.
    I’m new to DNN.
    I’m trying to create my own pre-trained model and use it in Python & OpenCV.
    Can anyone help me with how to train my own DNN model on my dataset?
    thank you.

    • Thanks! 🙂 In the repo https://github.com/djmv/MobilNet_SSD_opencv there is an issue that answers your question.

      • Ali Taheri

        Thank you for your attention.
        but I could not find anything helpful or related information to my issue in that link.

    • Arpit Gupta

      The CV2 DNN module is not meant for training networks. For training, use TensorFlow/PyTorch or Caffe.

      • Ali Taheri

        you are right.
        Do you know any straightforward application that can take the pictures (targets and non-targets) and train the network?
        I’m trying to do it with TensorFlow, but it mostly ends in errors that are very difficult to figure out.
        Thank you

  • Ankur Mahtani

    Thanks for sharing your great work!!
    I’m using your program for real time recognition with a real sense camera.

    But I need to use my own custom classes for recognition…
    Do you know a way to add classes (and remove others) ?

    Changing “classNames” list values doesn’t work.
    Thank you 🙂

  • Karthik Devaraj

    Hello friend, very nice tutorial. I have tried your code and achieved the same. Now I would like to count the number of detections; I tried several things but couldn’t achieve that. Your help means a lot to me.

  • Sohib

    could you share your practices about training your own dataset on mobilenetssd? As I can see the comments I think it’d be helpful to everyone. Thanks

  • mehio hatab

    How do I load my own mobilenet H5 file?
    In this code the network and weights are in separate files.
    I already have a pre trained mobilenet H5 file that i want to use with real time tracking and classification or labelling.

    Thank you!

  • Abhimanyu Aryan

    is Tiny YOLO also compatible with the OpenCV dnn module?