Computer Vision: A Force Multiplier for Machine Learning Adoption

Computer vision is the science of applying machine learning to analyse and derive insights from images and video. As late as a decade ago, computer vision merited no more than cursory treatment in computer science curricula because its broad practical relevance was still blurry. But in recent years, especially in the post-pandemic world, the influence of computer vision on daily life has grown sharply; it is one of the vehicles through which Artificial Intelligence (AI) and Machine Learning (ML) address people-centric problems.

AI, ML and predictive analytics have existed for decades. Neural networks were first proposed in the 1940s; the core discoveries behind statistical models and distributions are even older. But their adoption exploded only a few years ago, thanks primarily to three factors: first, the new-found ease of consumption that came when popular programming languages gained mature ML libraries; second, the ability to deliver these technologies through affordable, on-demand cloud processing power reachable over high-speed Internet connectivity; and third, the arrival of novel use cases triggered by the profusion of smartphones, edge devices and computer vision. In the rest of this column, we will look at how AI-powered advances in computer vision – or machine vision – are changing lives and industries.

With computer vision, trained neural networks run inference logic – often on chipsets near the eye of a smart camera – and derive insights in real time. Popular neural networks can detect and identify objects ranging from people and animals to vehicles and equipment; they can identify the gender and age group of a detected person, or conclude whether a recognized vehicle is dirty or has its door open. They can track moving objects by latching on to them once identified. They can also perform activity detection, for example, identifying the scoring of a goal during a football match or the opening of a water tap. A minimal sketch of the detection step follows.
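
To make the detection step concrete, here is a small Python sketch using an off-the-shelf detector pretrained on the COCO dataset via torchvision. The file name frame.jpg and the 0.8 confidence threshold are illustrative; a production deployment would run an optimized model on the camera's edge chipset rather than a stock Faster R-CNN.

```python
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a detector pretrained on COCO (everyday classes such as person, car, dog).
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
labels = weights.meta["categories"]

frame = Image.open("frame.jpg").convert("RGB")  # one frame from a camera feed
with torch.no_grad():
    pred = model([to_tensor(frame)])[0]

# Keep confident detections only; each has a class, a score and a bounding box.
for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.8:
        print(f"{labels[int(label)]}: {float(score):.2f} at {box.tolist()}")
```

Tracking then reduces to associating such boxes across consecutive frames, which is how a network "latches on" to a moving object.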

Let us look at exemplar scenarios from several industries, starting with two sectors that are defining what AI for public good means: healthcare and agriculture.

Medical imaging enables us to see inside the body. AI applied to image scans produced by X-rays and magnetic resonance imaging (MRI) assists doctors in detecting abnormalities and diagnosing diseases. Consider the case of detecting Alzheimer’s disease. Convolutional neural networks (CNNs) are trained on publicly available medical datasets to automatically segment parts of the brain – such as the hippocampus – from a patient’s MRI scan and compute features such as organ volumes. Deep learning based classification algorithms then correlate the extracted features with the presence of Alzheimer’s. Similarly, conditions such as COVID-19 and pneumonia are diagnosed by running ML models on chest X-rays.
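
A toy sketch of the feature-extraction and classification stages described above, assuming the binary hippocampus mask has already been produced by an upstream segmentation CNN. The training volumes and labels below are made up purely for illustration; a real study would use many features and validated clinical data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def hippocampus_volume_mm3(mask: np.ndarray,
                           voxel_dims_mm=(1.0, 1.0, 1.0)) -> float:
    """Volume implied by a 3D binary segmentation mask (0/1 voxels),
    scaled by the scanner's voxel spacing in millimetres."""
    voxel_volume = float(np.prod(voxel_dims_mm))
    return float(mask.sum()) * voxel_volume

# Illustrative only: one feature (hippocampal volume in mm^3) per patient,
# with invented labels (1 = clinical Alzheimer's diagnosis).
X_train = np.array([[3100.0], [2400.0], [3300.0], [2250.0]])
y_train = np.array([0, 1, 0, 1])

clf = LogisticRegression().fit(X_train, y_train)
new_patient = np.array([[2500.0]])  # volume extracted from a new scan
print("probability of Alzheimer's:", clf.predict_proba(new_patient)[0, 1])
```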

Agritech and farm management are being redefined by ML models applied to data generated by cameras and IoT sensors mounted on autonomous tractors, on drones or underneath the soil. AI in agriculture is enabling intelligent farming: plant diseases can be identified by running classification models on close-up photographs of leaves (a sketch follows), and fertilizer and pesticide delivery can be controlled at spot-level granularity based on soil quality inferred from multiple sensors.
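
As an illustration of the leaf-classification idea, the sketch below scores one photograph with a hypothetical fine-tuned network. The model file leaf_model.pt and the disease classes are placeholders of my own, not a published model; any real deployment would train on a labelled agronomic dataset.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Hypothetical: a ResNet-18 fine-tuned on labelled leaf photographs.
classes = ["healthy", "early_blight", "late_blight", "leaf_rust"]
model = models.resnet18(num_classes=len(classes))
model.load_state_dict(torch.load("leaf_model.pt", map_location="cpu"))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # match the training resolution
    transforms.ToTensor(),
])

leaf = preprocess(Image.open("leaf.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(leaf), dim=1)[0]
print(classes[int(probs.argmax())], f"{float(probs.max()):.2f}")
```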

Retailers use computer vision to perceive customer disposition towards a product category and construct personalised campaigns in real time. As shoppers move and dwell along the aisles of a supermarket, video analytics can learn their preferences by the time they reach the checkout counter; the retailer can monetize gathered insights by creating customized offers.

Machine vision is also transforming the manufacturing sector. One example is the new-found ability to automatically control the quality of products as they roll off assembly lines. By applying deep learning to video streams from smart cameras mounted on the factory shopfloor, virtual inspectors can classify and reject defective products with high accuracy and without manual intervention.
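
A minimal sketch of such an inspection loop, assuming a binary pass/fail model has already been trained and exported with TorchScript. The RTSP URL, model file and 0.9 threshold are placeholders; in practice the threshold is tuned per production line.

```python
import cv2
import torch

# Hypothetical TorchScript model that outputs one defect logit per frame.
model = torch.jit.load("defect_classifier.pt").eval()
stream = cv2.VideoCapture("rtsp://factory-cam-07/line3")  # placeholder feed

while stream.isOpened():
    ok, frame = stream.read()
    if not ok:
        break
    # BGR (OpenCV) -> normalized RGB tensor of shape (1, 3, 224, 224)
    rgb = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        p_defect = torch.sigmoid(model(x)).item()
    if p_defect > 0.9:
        print("defect detected -> signal the rejection actuator")

stream.release()
```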

Another popular domain that is thriving with the influx of machine vision is site and worker safety. Suitably trained ML models employed on camera feeds can instantly detect the presence or absence of wearable safety gear such as helmets, goggles, masks, gloves, jackets and boots. Within a factory or construction site, computer vision can continuously monitor vehicle speeds, enforce access restrictions to earmarked areas at pre-specified times, discover violations pertaining to smoking and littering, or detect hazardous objects discarded on the shopfloor.

AI inferencing on feeds from cameras mounted along highways can help manage traffic, identify the location of potholes, and optimize the energy expended on road lighting. It can render cities smarter by injecting intelligence and automation into waste management, traffic congestion governance, and crowd control. It can track inventory in warehouses and improve space utilization in offices.

Drone technology has generated tailwinds for computer vision by providing wings rather than static mounts to digital video cameras. Consumer drones can identify obstacles in their path and track objects of interest in their field of vision by applying deep learning on video feeds from built-in high-definition cameras. Many drones leverage AI to react to facial expressions and hand gestures.

Machine vision delivered through drones is helping the telecom and power sectors as well. Consider the problem of periodically inspecting cellular towers for physical cracks and other anomalies. Rather than using manual labour to visually examine towers after shutting them down, ML-powered video analytics can be applied to images streaming from stereo cameras mounted on drones, delivering a safer, faster and more accurate outcome. Similar scenarios apply to power lines, gas pipes, or even roads and rivers.

There is a growing demand for robotic vision in defence and law enforcement. Use cases range from detecting intrusions along international borders to controlling autonomous weapons. Counter-terrorism operations can spot abnormal behaviour using emotion detection and body printing; they can discern suspicious vehicles by programmatically comparing the results of automatic number plate recognition (ANPR) against databases of interest.
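
The plate-matching step can be as simple as the sketch below, which assumes an upstream ANPR engine has already produced a raw text read. The watchlist entries are invented; a real system would query a live law-enforcement database and handle OCR noise far more carefully.

```python
import re
from difflib import get_close_matches

# Invented watchlist standing in for a database of vehicles of interest.
WATCHLIST = {"KA01AB1234", "MH12XY9876", "DL8CAF5031"}

def check_plate(ocr_text: str):
    """Compare an ANPR read against the watchlist. OCR output is noisy,
    so normalize it and also accept near matches (one misread character)
    rather than exact hits only."""
    plate = re.sub(r"[^A-Z0-9]", "", ocr_text.upper())
    if plate in WATCHLIST:
        return plate
    close = get_close_matches(plate, WATCHLIST, n=1, cutoff=0.9)
    return close[0] if close else None

print(check_plate("KA 01 AB 1234"))  # exact match after normalization
print(check_plate("KA01A81234"))     # near match: '8' misread as 'B'
```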

Light Detection And Ranging (LiDAR) is a technology that complements computer vision by bringing accurate spatial and range information to detected objects. LiDAR sensors can perform 360-degree scans and generate 3D models of a visual field that is changing either because the sensor is located on a moving mount (like a self-driving car or an autonomous drone) or because it is monitoring moving objects such as airplanes on a taxiway, or a river in spate. Computer vision integrated with LiDAR is opening up new possibilities for advancing safety-critical systems.
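
One common fusion pattern is to project LiDAR points into the camera image and attach a range to each visually detected object. The sketch below assumes the point cloud has already been transformed into the camera's coordinate frame; the intrinsic matrix, points and bounding box are toy values chosen for illustration.

```python
import numpy as np

def project_points(points_xyz: np.ndarray, K: np.ndarray):
    """Project LiDAR points (already in the camera frame) onto the image.
    points_xyz is (N, 3); K is the 3x3 camera intrinsic matrix. Returns
    pixel coordinates (M, 2) and each kept point's depth along the axis."""
    in_front = points_xyz[:, 2] > 0          # keep points ahead of the camera
    pts = points_xyz[in_front]
    uvw = (K @ pts.T).T                      # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]            # divide by depth
    return uv, pts[:, 2]

def range_of_detection(uv, depths, box):
    """Median depth of projected points inside a detector's bounding box."""
    x1, y1, x2, y2 = box
    inside = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
              (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
    return float(np.median(depths[inside])) if inside.any() else None

# Toy values: a plausible intrinsic matrix, three points, one detection box.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
points = np.array([[0.1, 0.0, 12.0], [0.2, 0.1, 12.4], [5.0, 1.0, 40.0]])
uv, depths = project_points(points, K)
print(range_of_detection(uv, depths, box=(280, 200, 360, 280)))  # ~12.2 m
```

This is how a detected object in the video frame acquires an accurate range: the vision model supplies the "what" and the LiDAR return supplies the "how far".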

Computer vision is now influencing life in ways that were not imaginable just a decade ago. Yet we have only scratched the surface of what this technology can do. From driverless cars that detect road lanes, to aviation technology that monitors airport runways for intrusions, the coming of age of applied computer vision is accelerating the mass adoption of AI.

Written by

Sreekrishnan Venkateswaran, CTO, Kyndryl India

Disclaimer: The views expressed in this article are those of the author and do not necessarily reflect the views of ET Edge Insights, its management, or its members.
