AI & Computer Vision at Retail: A Closer Look

May 6, 2024
By Richard Schwartz, President and CEO, Pensa Systems

Video-based CVAI at work in the aisle. Credit: Pensa Systems.

Artificial intelligence is a prevalent topic in the consumer packaged goods (CPG) industry (and pretty much every other industry), with good reason. AI promises to automate manual tasks, help people perform tasks faster and better, enable better business decisions and even automate decision-making processes without human intervention.

With that said, it’s commonplace to see the AI label on many different technologies and approaches, with varying levels of capability and benefit. It is good practice to peel back the layers behind that label to separate hype from commercial reality.

Expectations for Computer Vision + AI

CVAI refers to using a combination of computer vision and AI to rapidly assess CPG inventory on the physical retail shelf. CVAI can be used to “see” and interpret how the retail shelf is organized and automatically determine what is running low or out of stock. In this sense, it replaces tedious tasks that otherwise consume time for store workers, brand teams and third-party merchandisers. The commercial benefit is clear: teams can spend more time fixing inventory or merchandising problems to drive incremental growth, rather than staring at the shelf to identify if and where those problems exist or basing decisions on trailing indicators such as point-of-sale (POS) data.

Not all approaches are the same. They differ both in data capture techniques and in the AI used to interpret the visual input. Those differences translate to speed of capture, accuracy of the outputs, and depth and breadth of the AI takeaways.

Let’s take the data capture process first.

Legacy image-based approaches essentially try to train the person tasked with data capture to play robot: taking one step down the aisle, snapping a photo, very slowly sidestepping another pace down the aisle, snapping again, and so on. This approach then requires the software behind the scenes to carefully “stitch” the photos of a long aisle together into one continuous panorama.

Video-based approaches to data capture, rather than collecting one image at a time, instead capture full-motion video during a quick, normal walk down the aisle.

This approach is designed to be quite fast and to provide immediate labor efficiency over historical manual shelf inspection. Click on the video below to see a comparison of data capture approaches.

Comparison of shelf inventory data capture approaches. Credit: Pensa Systems.

Now to the AI to process the vision input – let’s look behind the curtain.

AI for legacy image-based capture generally employs algorithms that operate on a single image, typically the stitched panorama “pasted” together from the independent images gathered by the user sidestepping down the aisle. Because it only has to analyze a single panoramic photo of the entire aisle, the AI behind the capture can be simpler technology. The downsides are the tedium of the collection (which hurts the business case), degraded recognition accuracy, and longer AI training cycles.

AI for video-based capture is designed to process native moving frames from a natural walk down the aisle. While the capture is dramatically faster, extracting a digital understanding from it requires substantially more AI. Without more advanced AI behind video capture, the digital takeaways are quite limited. With it, they are much richer.

Advanced AI works in concert with video-based data capture. For example, from the fast walk down the aisle, Pensa’s AI processes motion input from the camera sensor in a way very similar to the AI used in autonomous driving technology. In effect, from the video frames, the AI evaluates each item on the shelf from many angles, triangulating as it moves down the aisle just like an autonomous car navigating the streets.
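To make the idea of triangulating from a moving camera concrete, here is a minimal sketch in Python. It is not Pensa's implementation, which the article does not describe in detail. It assumes a simplified top-down 2D view of the aisle, that the camera position is already known for each frame, and that each frame yields a bearing (angle) toward the same shelf item. The function name triangulate_2d and the sample coordinates are hypothetical. The sketch finds the point that is closest, in the least-squares sense, to all of the viewing rays.

```python
import math


def triangulate_2d(observations):
    """Estimate an item's (x, y) position from several camera views.

    observations: list of (camera_x, camera_y, bearing_radians), where
    bearing is the direction from the camera toward the item.
    Hypothetical helper for illustration only.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for cx, cy, bearing in observations:
        dx, dy = math.cos(bearing), math.sin(bearing)
        # (I - d d^T) measures distance perpendicular to the viewing ray.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * cx + m12 * cy
        b2 += m12 * cx + m22 * cy

    # Solve the 2x2 normal equations A p = b with Cramer's rule.
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel; views from different positions are needed")
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


if __name__ == "__main__":
    # Made-up example: an item at (2.0, 1.5) seen from three points along the aisle.
    item = (2.0, 1.5)
    views = [(cx, 0.0, math.atan2(item[1], item[0] - cx)) for cx in (0.0, 1.0, 3.0)]
    print(triangulate_2d(views))
```

A single view cannot fix the item's distance from the camera, which is why the function rejects it. Each additional view from a different position along the aisle adds a constraint, which is the intuition behind evaluating items "from many angles."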

Visual AI evaluates each item from many perspectives. Credit: Pensa Systems.

Pensa’s AI then localizes items and placement across an entire aisle, building a full three-dimensional internal digital model of the shelf and how the shelf space is organized and managed.

This 3D model is not a photograph. It’s AI-generated. Credit: Pensa Systems.

Why is this AI combination of triangulation and 3D reconstruction so important?

First, it can deliver dramatically higher accuracy in identifying the items on the shelf, down to small differences such as low-sodium versus regular or organic chicken broth.

Second, it can learn and train automatically in-situ, or in-place, from what it sees on the shelf – recognizing new products and packaging changes and learning the catalog automatically from the natural turns and placements on the shelves.  And when AI like this detects a new product or packaging change once, it should recognize it globally going forward. This approach is many times faster and more robust than matching images captured at the shelf against a reference database.

Third, and most important, advanced AI can understand and interpret how video frames relate to each other in the context of the shelf. For example, Pensa’s AI-generated 3D digital model of the entire aisle enables our analytics to answer a rich set of questions about shelf conditions in near real time: whether a product is running low or out of stock, whether a new item is on the shelf and in the right place, whether the correct number of “facings” are on the shelf, how the aisle is organized, and so on.
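As a rough illustration of the kinds of questions listed above, the following Python sketch computes facings, share of shelf and stock status once a shelf has been digitized. Everything in it is an assumption made for illustration: the function name shelf_metrics, the input format (one product label per detected facing), the planogram dictionary of expected facings, and the status labels. It does not reflect Pensa's data model or analytics.

```python
from collections import Counter


def shelf_metrics(detected_facings, planogram):
    """Summarize shelf conditions from a list of detected facings.

    detected_facings: list of product labels, one entry per facing seen.
    planogram: dict mapping product label -> expected number of facings.
    Both structures are hypothetical, for illustration only.
    """
    counts = Counter(detected_facings)
    total = sum(counts.values())
    report = {}
    for product, expected in planogram.items():
        seen = counts.get(product, 0)
        if seen == 0:
            status = "out_of_stock"
        elif seen < expected:
            status = "under_faced"
        else:
            status = "ok"
        report[product] = {
            "facings": seen,
            "expected": expected,
            "share_of_shelf": seen / total if total else 0.0,
            "status": status,
        }
    # Products seen on the shelf that the planogram does not list,
    # for example a new item or a misplaced one.
    unexpected = sorted(set(counts) - set(planogram))
    return report, unexpected


if __name__ == "__main__":
    detected = ["broth_regular"] * 3 + ["broth_low_sodium", "mystery_item"]
    expected = {"broth_regular": 3, "broth_low_sodium": 2, "broth_organic": 2}
    report, unexpected = shelf_metrics(detected, expected)
    for product, row in report.items():
        print(product, row)
    print("not in planogram:", unexpected)
```

The point of the sketch is that these metrics are simple to compute once every facing has been identified and placed. The hard part, as the article argues, is producing that reliable item-level model from video in the first place.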

Without advanced AI, a video taken of the shelf is only a jumble of images: a series of independent snapshots, in essence a one-dimensional view. Such a view may be able to recognize whether a product is present in the aisle, but it may not be able to assess stockouts, share of shelf, number of facings, product positions or de facto planograms for how the shelves are organized. These are all critical benefits expected from CVAI to digitize the shelf and automate otherwise tedious manual activity.

Putting it Together

CVAI is a critically important use case for digitizing the shelf at retail, holding the promise of increasing labor efficiency, reducing tedious manual activity, more accurately and quickly assessing store conditions, and then identifying highest priority areas and actions to improve, both in the moment and as trends over time.

CVAI can be a game-changer. But the speed of the data capture and the AI behind it are the magic that determines the real capabilities that lead to growth and business transformation.

This content is adapted from a previously published Pensa Systems blog post.