
Visual Search Technology and Its Applications

Hamish Ogilvy
May 19, 2021

Over half the human brain is dedicated to vision and visual processing, so it’s no surprise that this is also how people like to search! But a search box is traditionally a text input, so how should it work for visual search?

“A picture tells a thousand words”
- Fred R. Barnard

There are a few different ways to search visually. One is to use an actual image as the query; another is to use AI to describe and classify images into structured data (text, numbers, categories) that can be searched using regular search. In this article, I’ll share how visual search works and how we think it can best be applied to a site today.

Unsplash has a visual search feature; drop images into the search bar to find similar images.

Images as queries

Have you ever used the Google Lens or Pinterest Lens (“shop the look”) apps to take a picture of something and search to find out what it is? Microsoft also offers its own flavor in Bing Visual Search. This amazing visual search technology works right from your phone, at your fingertips!


The image above is from Google Lens, a mobile application that allows you to search visually, right from your phone’s camera.


This type of search is desired by many but still fairly limited in usage outside of public search engines and some big e-commerce use cases.



The world has become much more visual in the last 10 years. Facebook updates were once mostly text; now they are far more visual. Instagram and Pinterest both took off as visual social platforms at amazing speed. Expect to see more of this kind of visual search in the next few years as the technology finds its way into smaller businesses.

How does visual search work? 

Images are just a big grid of pixels. If you’ve ever looked at a TV up close, you might have seen the individual primary-coloured pixels: red, green and blue (RGB). To make an image, you assemble millions of these in a grid, and depending on which pixels are lit, and how strongly, you see different images! Visual search essentially feeds this grid/matrix of numbers into a neural network that can be trained to interpret the input.
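To make that concrete, here’s a minimal sketch (using NumPy and a toy 2x2 image rather than a real photo) of the matrix a network actually consumes:

```python
import numpy as np

# A toy 2x2 "image": every pixel is three numbers, the red, green and
# blue intensities (0-255). A real photo is the same thing, just with
# millions of pixels.
image = np.array([
    [[255, 0, 0],   [0, 255, 0]],      # red pixel,  green pixel
    [[0, 0, 255],   [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

# Networks typically take normalized floats, so the grid of pixels
# becomes a (height, width, channels) tensor of values in [0, 1].
x = image.astype(np.float32) / 255.0
print(x.shape)  # (2, 2, 3) -- this matrix is the network's input
```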



Finding similar images is done by training the network to map visually similar inputs to nearby points in a vector space. The same approach can be used to identify specific objects, like faces.
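As a rough sketch of how “similar” is scored in practice, assuming the network has already turned each image into an embedding vector (the vectors and product IDs below are made up):

```python
import numpy as np

def cosine_similarity(a, b):
    # Two images are "similar" when their embedding vectors point the
    # same way, i.e. their cosine similarity is close to 1.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings produced by a trained network.
query = np.array([0.12, 0.85, 0.33])
catalog = {
    "dress_001": np.array([0.10, 0.80, 0.40]),
    "shoe_042":  np.array([0.90, 0.05, 0.10]),
}

# Rank catalog images by similarity to the query image.
ranked = sorted(catalog, key=lambda k: cosine_similarity(query, catalog[k]), reverse=True)
print(ranked)  # ['dress_001', 'shoe_042']
```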

In order to use text input for visual search, a similar trick is done where the text is transformed into mathematics (vectors). In this case, text (such as image alt text) and the associated images are represented mathematically together during training and search optimization. The concepts in the text are encoded in a vector space and, with training, the model learns their relation to the images. Google has used this for both web and Google Photos image search, which is neatly explained here.
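Here’s a rough sketch of the idea using the open-source sentence-transformers CLIP wrapper; this is our stand-in, not the specific model Google describes, and dress.jpg is just an example file:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# A model trained to embed text and images into the same vector space.
model = SentenceTransformer("clip-ViT-B-32")

img_emb = model.encode(Image.open("dress.jpg"))   # image -> vector
txt_emb = model.encode("a blue polka dot dress")  # text  -> vector

# Because both live in one space, their similarity is directly comparable,
# so a text query can rank images (and vice versa).
print(util.cos_sim(img_emb, txt_emb))
```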

Neural networks are now incredibly powerful machine learning tools for analysing and extracting visual information from images. The ability of artificial intelligence (AI) models to describe what is happening in images, and to classify them into distinct groups for visual search results, is now highly advanced.

This type of search is available in Bing, Google Images and Google Search, and DuckDuckGo (via Bing). The images tab shown below draws on information extracted from millions of images and their context in articles and blogs around the internet to learn what each image is and deliver similar-item results.


Visual search for e-commerce can be a powerful real-world application for sellers and buyers.


The models used by public search engines are also now available for anyone else to build upon. Online APIs such as the Google Vision API show what is possible today:

The Google Vision API analyzes an image to generate useful metadata for search.
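For instance, assuming Google Cloud credentials are configured and product.jpg is a local example file, label detection with the Vision API’s Python client looks roughly like this:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Load a local image; the API also accepts Google Cloud Storage URIs.
with open("product.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Ask the API to describe the image with labels and confidence scores.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 2))
```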


There are several advantages to extracting structured data from images like this:

  • The image data can be combined with other structured information such as text
  • It works with existing text search interfaces
  • It works well with voice-based queries

The first of these advantages is huge. Converting images to regular text and structured data fits nicely into existing search technology and can easily be extended with filters, facets and more. And, as the second point notes, this is a search interface people are already familiar with.

There are also distinct advantages over using an actual image as the query. For example, if someone wants to search for “a blue polka dot dress under $50,” image analysis would be fantastic for identifying dresses, their color and their material patterns, but the image itself is not useful for working out the price. In this case, extracting the information from the image makes a lot more sense, as it can be combined with the price field (part relevance, part filtering on price).
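To illustrate, here’s a hypothetical request shape (the field names and filter syntax are our own, not any specific engine’s) showing image-derived fields sitting alongside ordinary ones like price:

```python
# Hypothetical search request: image-derived fields ("category",
# "dominant_color") are filtered exactly like any other field.
query = {
    "q": "polka dot dress",
    "filters": {
        "category": "dress",       # from image classification
        "dominant_color": "blue",  # from color extraction
        "price": {"lt": 50},       # from the product record, not the image
    },
}
```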

Image from HowtoGeek. Visual search and voice search influence how marketers need to think about search engine optimization (SEO). 


The third advantage is voice queries. Although voice may initially sound strange in an article about visual search, voice recognition and image analysis are a very powerful combination. The rise of Siri, Alexa and other voice-driven technologies has pushed voice search mainstream. Combined with AI-based visual analysis, it means you can now search for “a house surrounded by redwood trees with a view” and (hopefully) get exactly what you expected.

Visual search uses

One study found that consumers are 50% more likely to be influenced to buy from visual search results. For retailers, that’s one big reason to invest in visual search. E-commerce and home decor are arguably the best use cases for visual search.  

Entire visual search engine companies have sprung up to offer AI-powered “computer vision,” particularly for e-commerce use cases. Amazon has also entered the visual search platform game, offering visual search on its own catalogue and through AWS services.

And, it’s not just for selling products on websites. Augmented Reality (AR) is expected to generate billions of dollars in the next few years for commerce-related sales, particularly among Millennials and Gen Z, and visual search accounts for a big chunk of that. Even Snapchat has gotten into the game with visual search optimized for AR. 

Putting visual search into practice

The future of search isn’t visual, voice, or text. It’s all of the above, and just wait till we get to science-fiction-like brain-powered search! Today, visual search is a powerful addendum to text- and voice-based search functionality, especially for certain use cases such as e-commerce search.

So how can you put visual search into practice? Here’s one example of a real-world use case using Sajari pipelines.

We’ve used the Google Vision API to add visual search capabilities to an e-commerce database. With the API, we can automatically extract color and other metadata from images as they’re being indexed, which can then be used to design search filters and facets. Now, anytime new products are added to the site, the API will automatically extract the color data for use in filters.
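Here’s a sketch of that indexing step using the Vision API’s Python client; the helper name, threshold, and field name are illustrative choices, not the exact pipeline we run:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

def dominant_colors(image_bytes, min_fraction=0.05):
    """Return the prominent RGB colors in an image, for use as a filter field."""
    image = vision.Image(content=image_bytes)
    response = client.image_properties(image=image)
    colors = response.image_properties_annotation.dominant_colors.colors
    return [
        (int(c.color.red), int(c.color.green), int(c.color.blue))
        for c in colors
        if c.pixel_fraction >= min_fraction  # skip colors covering little of the image
    ]

# At indexing time, attach the result to the product record, e.g.:
# record["dominant_colors"] = dominant_colors(open("product.jpg", "rb").read())
```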


Using color extraction to automatically generate image metadata that can be used for visual search.


See exactly how we built this e-commerce visual search feature and the final working demo. This is one example of visual search in action, but Sajari’s REST API allows you to hook into other platforms and design bespoke visual search solutions to offer your own customers.

Similar articles

Engineering

We're Using Kubernetes and Here's Why You Should Too

Engineering

BigQuery, a Serverless Data Warehouse

User Experience (UX)
Engineering

Introducing the New Open Source React Search SDK