At ML6 we regularly develop custom machine learning models to help customers automate computer vision tasks such as detection, recognition, and quality estimation. Often the hardware setup is already in place and we start from existing images. However, we increasingly receive requests to advise on the camera setup as well. This gives us the opportunity to make sure that the captured images contain the right level of detail for the machine learning model to derive the necessary features from.
Picking the optimal camera setup for your computer vision project may feel like a daunting challenge. Often, that feeling is not unjustified. Fortunately, understanding a few key camera characteristics and features can help you recognize the factors you need to take into consideration.
The first decision to make is the choice of camera sensor and lens. Reduced to its essence, the question is: what size of scene are we trying to capture at what level of detail and at what distance?
The figure below depicts a digital camera sensor. During exposure, incident light rays hit a sensor’s pixels and the resulting electric charge is read out per pixel to produce an image. Clearly the number of pixels on a sensor, or resolution, places an upper bound on the level of detail that can be captured.
You may now be tempted to just get the highest-resolution sensor that your budget allows, but that is probably not a good idea. The reason is the three-way balancing act every camera sensor must strike between resolution, pixel size and sensor size. As the resolution increases for a fixed sensor size, the size of each individual pixel decreases. The resulting smaller pixels are more susceptible to noise and perform worse in low-light conditions. Increasing the total sensor size could solve this problem, but the extra silicon comes at a higher price tag. Hence, a balancing act. Lists of common sensor sizes are readily available online.
Now that we all agree on what camera resolution means, how do you decide whether it is high enough? A simple approximation is to ask yourself what the smallest detail is that must be identifiable in your image. Consider the example below. Our object to capture is a gentleman who is 2 meters tall (incl. top hat) and we are interested in distinguishing individual hairs in his moustache, each 1 millimeter thick.
The minimum resolution is then the object size divided by the size of the smallest detail: 2 m / 1 mm = 2000 pixels. To get that minimum level of detail we need at least 2000 vertical pixel rows, or 4 megapixels for a square sensor.
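This back-of-the-envelope calculation is easy to script. The sketch below (the function name `min_resolution` is our own, illustrative choice) simply divides the object size by the size of the smallest detail that must remain identifiable:

```python
import math

def min_resolution(object_size_m: float, smallest_detail_m: float) -> int:
    """Pixels needed along one axis so the smallest detail spans at least one pixel."""
    return math.ceil(object_size_m / smallest_detail_m)

rows = min_resolution(2.0, 0.001)   # 2 m gentleman, 1 mm moustache hairs -> 2000 rows
print(rows * rows)                  # 4,000,000 pixels for a square sensor, i.e. 4 MP
```

Note that this is a lower bound: in practice you will want a margin, since details that span exactly one pixel are easily lost to noise or unfortunate alignment.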
However, the previous example did not do justice to the massive contribution of the lens to the resulting image. We simply assumed that our object was in focus and projected exactly onto the sensor. In reality, the field of view is determined by both the size of the sensor and the focal length of the lens. The latter is the distance from the lens to the point where incoming light rays parallel to the optical axis converge.
The image below gives an indication of how sensor size and focal length affect the field of view: the field of view increases with decreasing focal length and with increasing sensor size. For simplicity, we assume a large working distance, which allows us to set the distance between lens and sensor equal to the focal length.
Keep in mind that if your field of view is too large for your target object, the object will appear smaller in the image and will be captured in less detail. A quick Google search for “Field of View Calculator” will offer a multitude of tools, ranging from simple to more advanced, to help you select the sensor size and focal length you need, so there is no need to brush up on your trigonometry and optics.
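If you prefer code over online calculators, the angular field of view follows directly from the geometry described above. The sketch below uses the same simplification (large working distance, no lens distortion); the example numbers — an 8.8 mm sensor width, roughly a 2/3" sensor, and a 25 mm lens — are purely illustrative:

```python
import math

def field_of_view_deg(sensor_dim_mm: float, focal_length_mm: float) -> float:
    """Angular field of view along one sensor dimension (thin-lens approximation)."""
    return math.degrees(2 * math.atan(sensor_dim_mm / (2 * focal_length_mm)))

def scene_coverage_m(fov_deg: float, working_distance_m: float) -> float:
    """Width (or height) of the scene covered at a given working distance."""
    return 2 * working_distance_m * math.tan(math.radians(fov_deg) / 2)

fov = field_of_view_deg(sensor_dim_mm=8.8, focal_length_mm=25.0)  # ~20 degrees
width = scene_coverage_m(fov, working_distance_m=5.0)             # ~1.76 m wide scene
```

Running the same numbers per axis for your sensor quickly tells you whether your target object fills the frame.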
By now, you should already have some idea of what the resolution and size of your camera sensor should be and what lens to combine it with. However, we’re not quite there yet. Several questions remain, mostly pertaining to technical characteristics.
The two main types of electronic image sensors are the charge-coupled device (CCD) and the active-pixel sensor, typically built with CMOS technology. Both types work like the typical sensor we described earlier; the difference lies in the way each pixel value is read. For a CCD sensor, pixel values can only be read on a per-row basis: each row of pixels is shifted, one by one, into a readout register. Conversely, for a CMOS sensor each pixel can be read out individually.
Our advice here is to choose a CMOS sensor whenever your situation allows it. It is cheaper and consumes less energy without sacrificing image quality in most cases, and it can achieve higher frame rates thanks to the parallel readout of pixel values. However, there are some specific scenarios in which CCD sensors still prevail, for example when long exposures and very low-noise images are required, as in astronomy.
A global shutter exposes each pixel to incoming light at the exact same time, whereas a rolling shutter exposes the pixel rows in a certain order, e.g. top to bottom.
The biggest advantage of the global shutter over the rolling shutter is that it does not suffer from the same distortion effects. Consider the example below where an image is captured of a spinning fan. The fast motion of the blades results in a very noticeable distortion with the rolling shutter. This effect is most apparent when large objects are moving at high velocity. In contrast, the global shutter has a perfect temporal correlation between all parts of the image. Another major advantage of a global shutter is that it is much easier to sync with peripheral devices, because there is a single point in time when exposure starts.
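The rolling shutter distortion is easy to reproduce in a toy simulation. In the sketch below (entirely illustrative, not real camera code), a one-pixel-wide vertical bar moves horizontally during readout; a global shutter samples every row at the same instant, while a rolling shutter samples row r at time r, so the bar comes out slanted:

```python
def capture(shutter: str, height=8, width=16, line_delay=1, bar_speed=1):
    """Toy exposure of a 1-pixel-wide vertical bar moving right at bar_speed px/tick.

    A 'global' shutter samples all rows at t=0; a 'rolling' shutter samples
    row r at t = r * line_delay, so each row sees the bar a bit further right.
    """
    img = [[0] * width for _ in range(height)]
    for r in range(height):
        t = 0 if shutter == "global" else r * line_delay
        col = (bar_speed * t) % width   # bar position when this row is read out
        img[r][col] = 1
    return img

straight = capture("global")   # bar stays in column 0: no distortion
slanted = capture("rolling")   # bar drifts one column per row: the classic skew
```

The faster the object moves relative to the row readout time, the stronger the skew — exactly the fan-blade effect described above.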
The downside of a global shutter is that it is typically more costly. Originally, global shutters were only available on the more expensive CCD sensors whereas CMOS sensors used rolling shutters. However, global shutters are nowadays also available on some CMOS sensors. Another point in favor of the rolling shutter is that it allows for a higher acquisition frame rate.
The aperture is the opening that controls how much light passes through a lens. A large aperture allows a lot of light to pass and vice versa for a small aperture. Much like in a human eye, the aperture is controlled by the iris. Our initial discussion on camera sensors and lenses was focused on how to get a detailed image of an object with a certain size, at a certain distance. We then purposefully omitted the effect of aperture on our camera system to not overcomplicate things. However, it plays a crucial role in capturing sharp and high-contrast images.
Most notable is the influence aperture has on the depth of field. There can only be one image plane truly in focus, but objects in a region close to this focal plane might still appear more or less in focus. The size of this region is called the depth of field. A large aperture results in a shallow depth of field, while a small aperture creates a larger depth of field. However, choosing your aperture too small could require longer exposure times and potentially lead to less sharp images.
Consider the example below, where our original object has now brought a friend. Even though the focus is still on our original object, the depth of field is large enough to also capture his friend with reasonable detail and contrast. Having an optimal depth of field can be crucial for your machine learning project. Suppose that you want to analyze a football game. To accurately detect and track the players, they must all be captured in detail. Therefore, the depth of field should ideally cover the entire football field.
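To get a feel for the numbers, the depth of field can be approximated with the standard thin-lens formulas built on the hyperfocal distance. The sketch below is illustrative rather than a substitute for a proper DoF calculator, and the circle-of-confusion value (0.005 mm) is an assumption you should adapt to your sensor:

```python
def depth_of_field_m(focal_mm, f_number, subject_dist_m, coc_mm=0.005):
    """Approximate near/far limits of acceptable focus (thin-lens model).

    coc_mm is the assumed circle of confusion: the largest blur spot on the
    sensor that still counts as 'in focus'.
    """
    f = focal_mm
    s = subject_dist_m * 1000.0                 # work in millimeters
    hyperfocal = f * f / (f_number * coc_mm) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2 * f)
    far = s * (hyperfocal - f) / (hyperfocal - s) if s < hyperfocal else float("inf")
    return near / 1000.0, far / 1000.0

# A smaller f-number means a larger aperture and a shallower depth of field:
near_f8, far_f8 = depth_of_field_m(25.0, 8.0, 3.0)   # f/8: wider in-focus zone
near_f2, far_f2 = depth_of_field_m(25.0, 2.0, 3.0)   # f/2: shallower zone
```

Plugging in your own focal length, f-number and working distance quickly shows whether, say, an entire football field can plausibly sit inside the in-focus zone.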
Knowing the function of the iris and the effect of the aperture, what are your options and which one should you choose? The most basic variant is the manual iris. It has to be set by hand and does not dynamically adapt to lighting conditions. In controlled environments where lighting is constant, this may perfectly fit your needs. Other, more dynamic environments call for an automatic solution such as the auto iris or precise iris (P-iris). The main difference between both is that an auto iris only responds to changes in the lighting levels, whereas the P-iris actively communicates with the camera software and directly tries to optimize image quality. On top of that, the P-iris uses hardware that allows for much more precise control. Naturally, the more advanced P-iris comes with a higher price tag.
For specific use cases where color information provides no additional value, a monochrome camera should be considered. It offers greater sensitivity to light and a higher effective spatial resolution at a given sensor size and sensor resolution.
To understand why, it’s essential to talk about how cameras “see” color. The most popular method is the Bayer filter: each pixel on the sensor is fitted with either a red, green or blue filter, and as a result that pixel is only sensitive to its filter’s color. This also means that, for example, the green value at a pixel with a blue filter can only be approximated from the values of the surrounding pixels with green filters. Hence, spatial resolution is lost.
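A minimal sketch of this interpolation, known as demosaicing, is shown below for an RGGB Bayer pattern (the function names are our own, and real demosaicing algorithms are considerably more sophisticated than this bilinear average):

```python
def bayer_color(row, col):
    """Color filter at each pixel of an RGGB Bayer mosaic."""
    return [["R", "G"], ["G", "B"]][row % 2][col % 2]

def green_at(raw, row, col):
    """Green intensity at (row, col): measured directly at green sites,
    estimated from the green 4-neighbors elsewhere (bilinear interpolation)."""
    if bayer_color(row, col) == "G":
        return raw[row][col]
    neighbors = [raw[r][c]
                 for r, c in ((row - 1, col), (row + 1, col),
                              (row, col - 1), (row, col + 1))
                 if 0 <= r < len(raw) and 0 <= c < len(raw[0])]
    return sum(neighbors) / len(neighbors)
```

In an RGGB mosaic every red and blue site is surrounded by green sites, so the estimate always averages measured green values — but it remains an estimate, which is exactly the spatial resolution a monochrome sensor does not give up.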
Area scan cameras are what most people would refer to as regular cameras. They capture a 2D image in a single exposure cycle by exposing multiple rows of pixels, either through a global or a rolling shutter. A line scan camera, on the other hand, has a sensor that consists of just one row of pixels. It must therefore scan an object line by line, by either moving the camera along the object or moving the object past the camera.
So do you need a line scan camera? Probably not. Area scan cameras are easy to set up, widely applicable and result in straightforward processing of discrete images. In contrast, line scan cameras need careful synchronization with the movement of the object relative to the camera. However, when applicable, a high-resolution single-line sensor can sample at high frequencies to produce very high-resolution images. A line scan camera is commonly used in manufacturing for the inspection of items on a conveyor belt.
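The line-by-line acquisition, and why synchronization matters, can be sketched in a few lines (a toy model, not real camera code): if the belt advances more than one object row per exposure, lines are skipped and the reconstructed image is compressed:

```python
def line_scan(scene, belt_step=1):
    """Toy line scan camera: captures one row of the moving object per exposure.

    scene is the full 2D object; belt_step is how many object rows pass the
    one-row sensor between exposures. belt_step != 1 models a belt that is
    not synchronized with the line rate, which distorts the result.
    """
    image = []
    offset = 0
    while offset < len(scene):
        image.append(list(scene[offset]))  # one exposure = one object line
        offset += belt_step
    return image

perfect = line_scan([[0, 0], [1, 1], [2, 2], [3, 3]])            # faithful copy
squashed = line_scan([[0, 0], [1, 1], [2, 2], [3, 3]], belt_step=2)  # every other line
```

This is why line scan setups typically trigger the sensor from an encoder on the conveyor: the line rate must track the belt speed exactly.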
If, after reading this article, you are left with more questions than before, do not worry. The important thing is that you are now better equipped to identify the main variables. Picking the right camera setup is complex and this article knowingly left a lot of concepts and details untouched (e.g. the exposure triangle, illumination techniques, multispectral cameras…). There are numerous online resources where you can find additional information. Also, please feel free to reach out to us at ML6 if you have any questions.