Table of Contents
- 1 Introduction
- 2 Object Detection
- 3 Building Object Detection Systems
- 4 Object Detection using Microcontrollers
- 5 Edge Impulse
- 6 ESP32 Camera Boards
- 7 Capturing Images (Webcam & Edge Impulse)
- 8 Processing with Edge Impulse
- 9 Loading to ESP32-CAM
- 10 Using ESP32-CAM for Image Capture
- 11 Conclusion
If you’re after a simple and inexpensive Object Detection system, consider using an ESP32-CAM. This 9-dollar camera board is probably already in your parts drawer, and with the right training, it can be used for simple Object Detection.
Today we’ll see how to use Edge Impulse to train an ESP32-CAM board to detect objects.
When trying to keep pace with the dizzying speed of technology, one thing that never ceases to amaze me is how a cutting-edge application can go from costing millions of dollars to costing pennies, all in the space of a few years.
One such technology is Object Detection, the ability of a machine to recognize an object (or several objects) using a video camera. Only a few years ago, “playing” with applications like this demanded huge computers, often with big (and expensive) graphics cards.
We have also used SBCs like the NVIDIA Jetson Nano, which is essentially a small computer built around an AI-capable GPU, for these tasks. Until recently, this sort of hardware was required to run the neural networks needed for machine learning.
But now we have a new generation of microcontrollers that can run a small version of machine learning, something called Embedded ML or Tiny ML. We’ve already seen how well the Kendryte K210 performs in the DFRobot Husky Lens using this technique.
Today we’ll see that even the ESP32 is capable of simple object detection. If your needs are simple, you may be able to fulfill them with an inexpensive ESP32-CAM board.
With the reduction in size that comes with microcontrollers also comes a reduction in price, so it’s now possible to use object detection as part of an “edge computing” application.
Edge computing is a model of computing that involves processing data closer to its source or at the “edge” of the network, rather than relying on a central location like a main computer or cloud-based service.
By using low-cost cameras and microcontrollers, we can implement a system that can recognize specific objects and report their position back to a central controller or react directly to their presence.
ESP32 vs. Other Solutions
I have already mentioned the DFRobot Husky Lens camera, which was the subject of an earlier article and video. We have also looked at an older camera, the Pixie Lens, which was an early hobbyist-level object detection camera.
These commercial cameras are more powerful than the solution that we are going to implement today with the ESP32, and they offer more features and functions.
The setup we are looking at today using the ESP32-CAM can work well if you are willing to put the effort into training it. It is capable of recognizing quite a few objects, especially if you pay attention to the camera mounting, focus, and lighting.
But if you need a more full-featured or more robust solution for your application, then I suggest the Husky Lens instead. Today’s project is more of an exercise for those interested in learning about building object detection systems from scratch using inexpensive hardware.
Object detection is a field of computer vision that focuses on identifying and locating objects within images or video frames. It involves identifying the presence, location, and type of one or more objects in an image.
Object Detection is actually a combination of two imaging technologies:
- Object Classification
- Object Localization
These technologies are intertwined in many ways.
Object classification refers to the ability of a computer program to identify and categorize different types of objects in an image.
This is done by ‘training’ the computer with numerous labeled images, each identified as a particular object, like a bike, a tree, a person, etc. After this training, the computer can analyze a new image and correctly identify these objects.
Object Classification plays a fundamental role in many applications, including robotics, where machines need to understand their environment to operate effectively, or in healthcare, where it helps analyze medical images.
Object Localization takes Object Classification one step further and provides location coordinates for the target object.
Object Localization systems typically draw a “bounding box” around the target object. They are capable of recognizing several objects concurrently.
This technology is used in self-driving cars, robotics, security systems, and several other applications.
Building Object Detection Systems
Object detection systems work by building up a “model” of the object(s) they are designed to detect. They then analyze live video for patterns that match this model.
In order to construct an object detection system, you’ll need to build a model and deploy it. You’ll have to go through the following steps:
1 – Gather Data
You’ll need pictures, lots of pictures, of the object you are trying to detect. You want to train the model on different views of the object from different angles, different distances, and under different lighting.
Some object detection systems are trained with thousands of images, but we can get pretty good results and keep it under a hundred, which is more manageable.
Not only will you need to gather pictures, but you’ll also need to go through each one and apply a label. This involves tracing out the object using a bounding box.
As you might expect, data gathering is the most time-consuming part of building a custom object detection system. Remember that the more time and effort you apply here, the better your system will work.
2 – Build a Model
After you have gathered and organized your data, you will need to run it through a neural network to build your model.
A Neural Network is a computing model that’s inspired by the structure of the human brain. It’s built with layers of nodes, commonly called neurons, which can be understood like the logic gates in electronics: they take one or more inputs, perform some computation, and generate an output.
The real power of Neural Networks, though, is in the connections between these neurons and how they’re weighted or prioritized. These networks are capable of learning from data and adapting their connections, which makes them excellent for complex tasks like pattern recognition or data prediction.
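To make the neuron idea concrete, here is a minimal Python sketch of a single artificial neuron. It multiplies each input by a weight, adds a bias, and passes the result through a sigmoid activation. The function name, the example values, and the choice of sigmoid are illustrative assumptions; this is not how Edge Impulse implements its networks.

```python
import math


def neuron(inputs, weights, bias):
    """Return the output of one artificial neuron (illustrative only)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum: each input is scaled by the strength of its connection.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the sum into the range 0..1.
    return 1.0 / (1.0 + math.exp(-total))
```

Training a network amounts to adjusting the weights and biases of many such neurons until their outputs line up with the labels you supplied.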
As you might imagine, a neural network takes a fair bit of computing resources. Many times, GPUs are used to take advantage of their incredible parallel-processing abilities.
There are also many cloud-based services that we can use to build our model, and we will be using one in a bit.
3 – Deploy the Model
Once the model is trained and built, it will need to be deployed so that it can be put to use.
The target system can be either the system the model was built on or another system with suitable video capabilities.
If the model is found to be inaccurate, it can be trained further using additional images and different neural network parameters.
Object Detection using Microcontrollers
Running neural networks and building complex object models isn’t something a 32-bit microcontroller like the ESP32 is designed to do. But we don’t need to run these operations on the microcontroller.
Instead, we will train our model using a cloud-based service, one that allows us to easily implement a powerful neural network to build and test our model. Once we are satisfied that our model is working, we can deploy it to our target – the ESP32-CAM board.
Machine Learning & Tiny ML
The model we are creating is coded for use with Machine Learning (ML), or more specifically, TinyML.
Machine Learning, a key branch of Artificial Intelligence, involves algorithms and statistical models that enable computer systems to improve their performance on tasks over time without being explicitly programmed to do so.
These systems “learn” from the data they encounter, adapting and optimizing their actions based on patterns and insights they glean. In effect, they “learn” similarly to humans.
TinyML, or Tiny Machine Learning, is an emerging field that brings machine learning capabilities to ultra-low power microcontroller-based devices. This would include the ESP32.
TinyML is sometimes referred to as Embedded ML. It allows machine learning workloads to run on small microcontrollers, performing edge computation and IoT tasks.
Using TinyML, we can fit our model onto an ESP32-CAM and use it for object detection.
The product that we will be using to create our TinyML model is called Edge Impulse.
Edge Impulse is a web-based application that will allow us to collect and label all of our images and then run them through a neural network to build a model. The application has a number of tools that we can use to train and test our model, and it is designed to be easy to use.
One great thing about Edge Impulse is that it can export our model to an Arduino Library, complete with sample code. You can just install the library into your Arduino IDE and load the code to an ESP32-CAM and test it out.
Edge Impulse Account
You’ll need an account in order to use Edge Impulse, but not to worry; a free account is enough to perform all of these experiments. They also have an Enterprise account for those who do this for a living.
Head over to Edge Impulse and set up your free account.
We have quite a few tasks to go through in order to deploy an object detection model to the ESP32-CAM. It would probably be a good idea to look at the workflow, illustrated below, to keep track of what we need to do:
Here are the steps in a bit more detail:
- Collect Images – You’ll need to take pictures of your target object from different angles and views. For accurate object detection, you might need 50 or more images for each object.
- Label Images – This is the process of drawing bounding boxes around the objects and labeling them. It is a bit time-consuming, but it is an essential step.
- Train Model – Now that you have your images labeled, you can train a model using them. You’ll be setting up a neural network and running your data through it.
- Export Model – Once you have your model properly trained, you can export it to a format that is compatible with the ESP32-CAM board.
- Run Model – Run your model on the ESP32-CAM board and see if it works! You can always redo some of the training if the results are not what you had hoped for.
Keep this workflow in mind as you proceed through the steps.
ESP32 Camera Boards
Several ESP32 boards have integrated cameras, and you should be able to use most of them with the techniques described here. ESP32 camera boards are manufactured by Espressif, M5, and others.
Let’s take a look at two of the most popular boards.
By far, the most popular and probably the most inexpensive board is the ESP32-CAM board.
This board has been around for several years. It’s an open-source design, so it is sold under many different names and with varying build quality.
One disadvantage of this low-cost board is that it has no USB connection, so you will need to use either an FTDI adapter or a USB adapter board. The latter is much easier; your ESP32-CAM board just snaps onto it. These are often sold in conjunction with ESP32-CAM boards.
If you need more information about the ESP32-CAM board, check out the article I wrote about it and its accompanying video.
Espressif manufactures the ESP-EYE, and it’s a pretty high-quality build. In addition to the camera, it also has a MEMS microphone. It has a MicroUSB connector, so it is a lot easier to use than the ESP32-CAM board. It’s also smaller.
One disadvantage of this board, however, is that it doesn’t really expose any of the ESP32’s GPIO pins. The ESP32-CAM board, in contrast, gives you access to several GPIO pins.
Capturing Images (Webcam & Edge Impulse)
The first step in any object detection project is to get the data, specifically the images of the object you wish to identify. You can accomplish this in several ways:
- Using your Phone
- Using an external camera
- Using a Webcam
- Using the ESP32-CAM board
To begin, we will use the Webcam option, as it is easy to set up and it gives good results. We will also need to log into Edge Impulse, as we will use one of its utilities to control the webcam.
Webcam & Edge Impulse Setup
Any webcam that works with your computer should work with Edge Impulse.
The key to success in any object detection project is to get good, clear images, without shadows or reflections, and lots of them. The camera setup can make or break the project, and the lighting and background matter more than the camera specifications.
If you have video lights, now is the time to break them out, but you can do a decent job with ordinary room lights. You’ll want to play with the lighting to get your subject well-lit with minimal shadows. It’s also a good idea to find an area with a neutral background, such as a wall.
If you’re lucky enough to have a webcam that accepts a standard ¼ inch tripod mount, you can use a small photo tripod to hold your webcam securely. Otherwise, you may have to improvise. Ensure the camera is securely mounted and can’t be easily moved.
Of course, you will also need an object or two to use for your experiments. Two objects are a good start; it is a good idea to start small and add more objects as you get used to the process.
Image Capture with Edge Impulse
Edge Impulse lets you use a variety of sources for your image data, including live images from connected devices. As we have a USB web camera connected, we can use live data to grab some images.
Set up your webcam “shooting area” as described above; you can fine-tune it once you have connected to Edge Impulse. Put your first object in the camera’s field of view.
Now log in to Edge Impulse.
The first time you log in, you will be greeted by a wizard that offers to assist you in building an object detection project. Although it’s a tempting offer, you should decline it, as we are going to do everything manually.
You’ll be on the main screen, which is your dashboard.
In the center of the screen is a group of boxes in a “Getting Started” area. One of them is labeled “Collect New Data.” Click it, and a pop-up window will appear.
The pop-up window gives you the choice of scanning a QR code for your phone, connecting to your computer, or connecting directly to your device. We will use a webcam attached to our computer, so select “Connect to your computer.”
Assuming that your webcam is connected correctly, you should get a box asking for permission to use it. Give your consent, and a new window will open, and the webcam will connect. You’ll see the image on the screen, along with a few labels and buttons.
The Label button will say “unknown”. Click on it, and a box will pop up. Give the image a label; in my case, I’ll be photographing a robot, so I’ll name mine “robot”.
You can leave the Category setting as it is, a split between Training and Testing data.
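As a rough illustration of what a Training/Testing split means, the Python sketch below shuffles a list of image names and holds back a portion for testing. The 80/20 ratio, the function name, and the file names are assumptions for illustration only; Edge Impulse performs its own split for you.

```python
import random


def split_dataset(items, test_fraction=0.2, seed=42):
    """Shuffle items and split them into (training, testing) lists."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A fixed seed keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    test_count = round(len(shuffled) * test_fraction)
    return shuffled[test_count:], shuffled[:test_count]
```

The testing images are never shown to the network during training, so they give an honest check of how well the model handles pictures it has not seen.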
Now hit the Capture button. An image will be taken and labeled.
You’ll want to repeat this process with your object many times; I suggest at least 40 images. Try to photograph your object from different positions and angles. The more images you can take, the more data you will have to train your model later.
When you have finished with your first object, you can move on to the second one. Make sure to click the Label button to rename the label for the new object.
When you have finished taking images, you can close the window. Edge Impulse will still be in its original window, displaying a list of the images you have taken.
Processing with Edge Impulse
Now that we have a good selection of images, we need to work with them in Edge Impulse.
We will label our images and use them to build an “impulse,” which is Edge Impulse’s way of saying we build an ML model.
We’ll start by labeling our images.
If you’ve been following along, you should now be in the Data Acquisition screen with a collection of images. At the top of the screen are some statistics, along with three links – Dataset (currently active), Data Sources, and Labelling Queue. The Labelling Queue link will have a number beside it, which right now equals the number of images you have taken.
These are the images that require labeling. Yes, I know you set a label for them earlier, but that label was simply the image name. The label that we are working with now is the actual object label with a bounding box.
Click on the Labelling Queue link to go to the queue. It will open with the first image, along with instructions for moving your mouse to draw a bounding box.
Follow the instructions and draw a bounding box around the actual object in the image. You can use the arrow icon in the box to change its position.
Once you have the bounding box drawn and let go of your mouse button, a window will pop up for you to assign a label for the image. This is the object name; enter an appropriate one and click Set label.
Now click the Save labels button. The label will be saved, and you’ll advance to the next image. You will note that the count of the items left in the queue will also decrement.
Follow the same procedure for all the images. It can be time-consuming, but some diligent work here will pay off with good object detection performance.
When you come to an image of another object, use the “garbage can” icon on the bounding box to clear it. Draw a new bounding box, and you’ll be prompted to give it a new label.
Continue until you have labeled all the images in the queue.
Now that you have labeled all of your images, we can use them to build an “impulse”. This is Edge Impulse terminology for creating an ML model.
You’ll note that there is an “Impulse Design” section on the left side of your Edge Impulse dashboard. It will have at least one subsection named “Create Impulse” (there may or may not be other subsections). Click on Create Impulse.
The Impulse Design screen has three blocks. The first one is filled in for you; it is the images that we are using to train our model. The settings here are probably OK as they are, with an image width and height of 96px and a resize mode using the shortest axis.
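The arithmetic behind that resize setting can be sketched in Python. This assumes that fitting the shortest axis means cropping the longer side to a centered square and then scaling that square down to 96 by 96 pixels; the function names are hypothetical and the exact behavior inside Edge Impulse may differ.

```python
def center_square_crop(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    side = min(width, height)  # the shortest axis sets the square size
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def scale_factor(width, height, target=96):
    """Factor that shrinks the cropped square down to target x target pixels."""
    return target / min(width, height)
```

For a 640 by 480 webcam frame, this keeps the middle 480 by 480 region, so it pays to keep your object near the center of the picture.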
Click Add Processing Block. A window will pop up, suggesting the Image processing block. This is exactly what you want, so select it by clicking the Add button. Save your selection.
Now click the Add Learning Block box. This time you will have two choices. Select Object Detection (Images) by clicking Add, and then save your selection.
Now click Save Impulse. You’ll note that the Create Impulse submenu item now has a green indicator, and that you have now moved to a new subsection called Image.
The only thing you need to set here is the Color Depth. Change it to Grayscale and click the Save Parameters button.
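Switching to grayscale reduces each pixel from three values to one, which cuts the amount of data the model has to handle. The Python sketch below shows one common way to do the conversion, using the standard BT.601 luminance weights. Whether Edge Impulse uses these exact weights is an assumption, and the function names are hypothetical.

```python
def rgb_to_gray(r, g, b):
    """Convert one 8-bit RGB pixel to an 8-bit gray level (BT.601 weights)."""
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Round to the nearest integer and clamp to the valid 0..255 range.
    return max(0, min(255, round(gray)))


def image_to_gray(pixels):
    """Convert rows of (r, g, b) tuples into rows of gray levels."""
    return [[rgb_to_gray(*px) for px in row] for row in pixels]
```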
Now you will move on to the Generate Features screen. Click the Generate Features button; this process will take a few minutes as it analyzes and extracts the features from all the images.
Once this is done, you will see the Feature Explorer. Each of your objects should form its own distinct group of dots; the more tightly a group is clustered, the more easily the object can be identified. An object may be split into multiple clusters, which is fine. But hopefully, different objects won’t be mixed within a cluster.
Please note that the Image submenu is now green, and you have moved on to a third subsection, Object Detection.
This is where you will actually build the model using a neural network. You have a few parameters you can set for the neural network, but I would suggest leaving them as they are the first time you use this. You can always go back and repeat these steps with new settings to improve your model.
You do need to choose a different object model, however, as the default one is too large for an ESP32-CAM.
Click on Choose a different model and choose “FOMO (Faster Objects, More Objects) MobileNetV2 0.1”. This is the only model, as of this writing, that will work with the ESP32-CAM boards.
Now click the Start Training button.
Your model will be run through the neural network. The training may take several minutes, during which time you’ll get a running tab of statistics.
When the training has finished, you’ll get a “Confusion Matrix” showing how successful your model is. A score of 100% is what you are aiming for, but it will work with a lesser score.
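If you want a feel for how a confusion matrix turns into a score, here is a small Python sketch. It assumes each row holds the actual label and each column the predicted label, and it computes plain accuracy plus a per-label hit rate. The function names and numbers are made up for illustration, and the score Edge Impulse reports may be calculated differently.

```python
def accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Diagonal entries are the cases where prediction matched the actual label.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix, labels):
    """Map each label to the share of its samples that were predicted correctly."""
    return {
        label: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0)
        for i, label in enumerate(labels)
    }
```

Large numbers off the diagonal tell you which objects the model confuses with each other, which is a hint about where more or better images are needed.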
You have just trained an ML model using the images you took and Edge Impulse!
Export to Arduino Library
The final step with Edge Impulse is to export our ML model as an Arduino Library. We can then use this library with the Arduino IDE to program our ESP32-CAM board.
Look for the Deployment menu item on the left side menu, and click on it.
You’ll need to select a deployment device. We want Arduino Library, so select that. If there is already a selection, then click the search box to find the Arduino Library to replace it.
Now scroll down to the bottom of the screen and click Build.
After a few minutes, a ZIP file will be downloaded (you may or may not be prompted, depending upon your browser settings).
This file is your Arduino Library, ready to use with the Arduino IDE!
Loading to ESP32-CAM
If you have followed along up to this point, you should now have a ZIP file with a name similar to your project name. Keep track of this file and proceed to the next step.
You’ll need the Arduino IDE for the next steps. I’m using IDE Version 2 as an example, but you should also be able to use the older version 1.8 if you wish.
I assume you have the ESP32 Boards Manager installed. I selected the AI Thinker ESP32-CAM board for both the ESP32-CAM and ESP-EYE, and it worked properly. If this paragraph hasn’t made much sense to you so far, you should probably check out my article on the ESP32-CAM and ESP32, as you need to know how to work with these boards in the Arduino IDE.
The ZIP file we download from Edge Impulse is an Arduino Library, and it can be installed as you would any ZIP library in the Arduino IDE:
- Open the Arduino IDE.
- Go to the Sketch entry in the top menu.
- Select Include Library. A submenu will open.
- Select Add ZIP Library.
- A file dialog box will open. Use it to navigate to the location where you stored the ZIP file that you downloaded earlier from Edge Impulse.
- The library should install into your IDE. You’ll get a message when it is done.
We are now ready to deploy the file to our ESP32-CAM board.
Testing with ESP32-CAM
With our library installed, we are ready to test our model on the ESP32.
Ensure you have an ESP32-CAM or ESP-EYE board connected to the computer and selected in the IDE. You can use the “AI Thinker” board for either camera board.
The library we installed included some sample code, and one of the samples is exactly what we need to test everything out.
Go to the top menu and select File. Select Examples from the File menu, and a submenu will open with code samples.
Navigate to the Examples from Custom Libraries section and scroll down until you see an entry with the same name as your Edge Impulse project. Now navigate to ESP32 and then ESP32 Camera.
Open the sketch. Scroll down to about line 32 and make sure that the camera selected matches yours; change it if necessary.
Now upload the sketch to your ESP32 camera board.
Once it is uploaded, try it out using the objects you trained the model on earlier. You can see the results on the Serial Monitor.
The results will show the “confidence level” as a percentage. This indicates how “sure” the AI is about the object it detects. It will also display the top-left coordinates of the bounding box, as well as its size.
The angle you hold the board and the lighting will make a big difference. With the proper setup and focus, you should be able to recognize multiple objects.
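If you plan to act on these results, a common first step is to discard low-confidence detections and work out the center of each bounding box. The Python sketch below shows that logic on the receiving side. The Detection class and its field names are hypothetical, the 0.5 threshold is an arbitrary assumption, and confidence is expressed here as a value from 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object; the field names are hypothetical."""
    label: str
    confidence: float  # 0.0 to 1.0
    x: int             # left edge of the bounding box, in pixels
    y: int             # top edge of the bounding box, in pixels
    width: int
    height: int

    def center(self):
        """Return the (x, y) center of the bounding box."""
        return self.x + self.width / 2, self.y + self.height / 2


def confident(detections, threshold=0.5):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= threshold]
```

Raising the threshold reduces false detections at the cost of missing objects in poor lighting, so it is worth tuning against your own setup.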
Using ESP32-CAM for Image Capture
It is also possible to use the ESP32-CAM camera to collect images of your object. This can make a lot of sense, as you would be using the same hardware for both training and production. If you kept the camera in the same position for both training and operation (and kept the lighting constant), it would work very well.
Edge Impulse does have a procedure for getting live video from the ESP-EYE board; the procedure is a bit involved and uses the Edge Impulse CLI (command-line interface). We won’t be doing that today; if you are interested, you can check out their documentation for the details.
Another, and perhaps easier, workflow is to use the ESP32-CAM or ESP-EYE to capture images of your target object and save them. After that, the images can be uploaded to Edge Impulse.
A library from EloquentArduino makes this very easy.
EloquentArduino makes a few interesting libraries for the ESP32-CAM, including ones designed specifically for Edge Impulse data. You should check out their website for more information.
We will use the EloquentESP32CAM Library, which simplifies working with the ESP32-CAM board and other ESP32 cameras. You can install this library directly from the Library Manager in the Arduino IDE.
Collect Images Sketch
The EloquentESP32CAM Library comes with an impressive selection of sample sketches, and one of them is precisely what we need to gather our images.
After you install the library, open the File menu and select Examples. Scroll down to the Examples from Custom Libraries section and look for EloquentESP32CAM.
Within the EloquentESP32CAM submenu, look for 26_Collect_Images. This is the sketch we will use to get the images to build our model.
You’ll need to add the SSID and Password for your WiFi network, as we will be using WiFi to create a web page to control the camera. Line 37 is the line you need to edit.
The only other modification you may need to do to the sketch is to change the camera type. The sketch supports several ESP32 camera boards. On line 28, you can change the sketch to match the one you have.
After modifying the sketch, upload it to your ESP32 camera board. Open your Serial Monitor and note the IP address assigned to the ESP32; you will need it for the next step.
Capturing Images with ESP32-CAM
With the ESP32-CAM running the Collect Images sketch, go to a web browser on a computer or tablet that shares the same WiFi network as the ESP32. Type in the IP address you noted earlier – you may also be able to use http://esp32cam.local instead.
You should get a screen with a camera display and a few buttons. The display may lag a bit, but that is normal.
Aim the camera at the object you want to detect and press the Start Collecting button. The camera will collect images until you hit Pause.
It is a good idea to collect about 20 images of just the background.
When you have a collection of images of one object, press Pause and then press Download. A ZIP file of all the images will download.
Repeat the process for every object, saving a ZIP file for each one. I would recommend at least 40 images for each object. Remember that you’ll eventually need to draw a bounding box on each image, so don’t capture too many images.
Extract all the ZIP files when you are done photographing objects. You can put them all into one folder, or into separate folders if you wish.
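If you end up with a lot of ZIP files, you can extract them in one go rather than by hand. The Python sketch below unpacks every ZIP in a folder into its own subfolder named after the archive, using only the standard library. The function name and the folder names in the usage note are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def extract_all(zip_dir, out_dir):
    """Extract every ZIP in zip_dir into its own subfolder of out_dir."""
    out_root = Path(out_dir)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        target = out_root / archive.stem  # e.g. robot.zip -> out_dir/robot
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

For example, calling extract_all("downloads", "images") on a folder containing robot.zip would create images/robot with the captured pictures inside. Only use this on ZIP files you downloaded from your own camera.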
Import to Edge Impulse
Now it’s time to get the images into Edge Impulse.
You’ll probably want to open a new project first. Open Edge Impulse and click on your avatar icon to open a menu. Choose Projects.
You’ll see a list of projects, which probably has only one item in it. Click Create new project, and a window will pop up. Give your project a name, keep the Developer selection, and click Create new project.
You’ll now be in the dashboard of your new project. Click Add Existing Data and in the pop-up window, select Upload Data. Navigate to where you extracted the images and upload them.
Now you will need to go into the Labelling Queue and label all the images. This and the rest of the steps are the same as before, so I won’t repeat them. Follow them through and create another Arduino Library file.
Testing with ESP32-CAM
Once you have downloaded the ZIP file from Edge Impulse, install it in your Arduino IDE as you did before. Then run the same ESP32 Camera sketch, making sure you use the one from the new library.
If you do everything correctly, you’ll get similar results. I found that the webcam did create a more accurate system, but the ESP32-CAM model also worked quite well. And it’s certainly very easy.
Being able to turn an inexpensive ESP32-CAM board into an object detection sensor will open up a lot of possibilities. Edge Impulse makes experimenting with this technology very easy.
There are also a number of other sketches installed with the EloquentArduino library that you can experiment with. One of them, 27_EdgeImpulse_FOMO, can be used with the model you created instead of the sketch provided by Edge Impulse. You’ll find the EloquentArduino one a lot easier to modify for your own purposes. The video accompanying this article has more details about this.
Now go and grab an ESP32-CAM and create a free Edge Impulse account. It’s time to get started with Object Detection!