Google Search is so magical that you sometimes forget how much technology is involved. Your query immediately yields the right results, no matter how cryptically you describe what you're looking for!
Funnily enough, searching for the latest sales report or that particular legal clause in your company database doesn't work that way. The big difference? Semantic search!
In this blog post, we'll discuss semantic search from a business point of view and how it can bring value to different types of companies. We'll answer when and why you would need semantic search and how a semantic search engine can be implemented and used.
You'll also find examples of various industries, such as legal, recruitment, media and medical & life sciences, that use semantic search to support their business processes.
Let’s start easy and define what a search engine is. In essence, it is a tool that allows you to search through data quickly and intuitively using natural language. Probably stating the obvious here, but the best-known examples are websites like Google, Yahoo, Yandex, Baidu, Bing, and DuckDuckGo. Search engines like these make it easy to find the right website or publicly available information for a specific search. We can all agree that they work pretty efficiently, and most of the time, it doesn’t take long to find the information you’re looking for.
However, things become a lot harder when it comes to finding information stored on your computer or in your company databases. If these have a clear folder structure, you might quickly find the information you are looking for. If not, you will probably fall back on the easiest option available, like the integrated search tool from your operating system or cloud provider. Most of these tools require you to type in one or multiple words, and then you get a wide range of results: documents in which those specific words appear often. And let’s be honest, how often is the file you need one of the first ones proposed? Not very often, indeed. Luckily, we have an answer that speeds up your search.
So why is it so much easier to find information on Google or Bing compared to finding something in your company sales folder? There are a couple of reasons for that. This blog post will dig deeper into one of the most important ones: the use of semantic search.
Semantic search adds context to both your search query and the files you scan. Now, what do we mean by the word context here? It is often crucial to understand the exact meaning of the word or words used, as many languages work with many synonyms, homonyms, and figures of speech. When you are looking for ‘python’, do you mean the animal, or are you talking about the programming language? When searching for a ‘Python manual’, you expect results explaining how to program in Python rather than some guidebook on how to handle the snake. You can only know this if you also understand the context of the sentence in which you use these words.
To give another example, if your search query is ‘last year’s sales results’, you don’t want files that merely contain the words ‘last’, ‘year’, ‘sales’, and ‘results’ many times. You want files that date from last year, or that frequently mention dates from last year, AND that include text about sales results.
How does semantic search do this? In short, it assigns numerical values (embeddings) to text so that similarity or dissimilarity can be quantified, which makes it possible to determine which texts are closely related and which ones aren’t.
Defining those numerical values (embeddings) is first done for individual words. To give an easy example, the words ‘king’ and ‘queen’ will, in one way, be closely related due to the royalty aspect but have opposite gender values. The embeddings for these words will numerically reflect these different aspects.
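To make the ‘king’ and ‘queen’ example a bit more tangible, here is a minimal sketch in Python. The three-dimensional vectors and their values are invented purely for illustration (real embeddings are learned by a model and typically have hundreds of dimensions), and cosine similarity is used here as one common way to quantify how related two embeddings are.

```python
import math

# Hand-made toy embeddings: [royalty, gender, food].
# The numbers are assumptions chosen for illustration, not learned values.
EMBEDDINGS = {
    "king":   [0.9,  0.3, 0.0],
    "queen":  [0.9, -0.3, 0.0],
    "banana": [0.0,  0.0, 1.0],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# 'king' and 'queen' share the royalty aspect, so they score high (about 0.8),
# even though their gender values point in opposite directions.
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))

# 'king' and 'banana' have nothing in common in this toy space.
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["banana"]))
```

The point is only that "related" becomes a number you can compare and sort on, which is what a search engine needs.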
Next, embeddings are computed for whole sentences and paragraphs. To continue using the example of ‘last year’s sales’, it’s essential to know if a whole file is talking about last year’s sales results or only a small paragraph. It is also important to know if last year’s sales results are the paragraph's main topic or just something mentioned in passing.
If you would like to understand the deeper technicalities of how semantic search works, I’ll gladly refer you to our technical blog post about semantic search, which you can find here.
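The difference between a main topic and a passing mention can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes a paragraph embedding is just the average of invented two-dimensional word vectors, whereas real semantic search engines use trained language models. All names and values below are hypothetical.

```python
import math

# Invented word vectors: [sales-related, HR-related]. Illustration only.
WORD_VECTORS = {
    "sales": [1.0, 0.0],
    "revenue": [0.9, 0.0],
    "results": [0.7, 0.1],
    "hiring": [0.0, 1.0],
    "onboarding": [0.0, 0.9],
    "team": [0.1, 0.6],
}


def embed_text(text):
    """Average the vectors of all known words; unknown words are ignored."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]


def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_paragraphs(query, paragraphs):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    query_vector = embed_text(query)
    scored = [
        (name, cosine_similarity(query_vector, embed_text(text)))
        for name, text in paragraphs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


QUERY = "sales results"
PARAGRAPHS = {
    "main_topic": "sales revenue and sales results",
    "passing_mention": "hiring onboarding team hiring and sales",
}

for name, score in rank_paragraphs(QUERY, PARAGRAPHS):
    print(name, round(score, 2))
```

Both paragraphs contain the word ‘sales’, but the one that is mostly about hiring ends up with a vector pointing in a different direction, so it ranks lower for this query.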
Why would you want to build a semantic search engine in your organisation? The key aspect remains context. Suppose you are primarily looking for files or documents based on exact keywords or how often a specific keyword is present in a document. In that case, semantic search might be overkill, and you will probably already obtain excellent results using a traditional lexical search engine. Semantic search would probably still add some value, but not enough to justify the costs.
However, if you are often looking for documents that are related to each other and would like search tools that cast a much wider net, then a semantic search engine could make a lot more sense value-wise. The next question to ask yourself is how much specific jargon you’re dealing with. A general semantic search engine will already add some value but is not trained to deal with very specific terminology. In that case, I would advise developing a custom engine, fully based on your needs and sector.
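For comparison, the keyword-counting approach behind a simple lexical search fits in a few lines of Python. This is a minimal sketch under stated assumptions: the document names and contents are made up, and real lexical engines use more refined scoring than raw counts.

```python
def keyword_score(query, document):
    """Count how many times the query words occur in the document."""
    words = document.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def lexical_search(query, documents):
    """Return (name, score) pairs for documents containing at least one query word."""
    scored = [(name, keyword_score(query, text)) for name, text in documents.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-archive, for illustration only.
DOCUMENTS = {
    "invoice_2021": "invoice invoice payment due invoice number",
    "meeting_notes": "notes from the weekly meeting about the invoice",
}

print(lexical_search("invoice", DOCUMENTS))  # exact keyword: works well
print(lexical_search("bill", DOCUMENTS))     # synonym: finds nothing
```

When you know the exact keyword, this is cheap and effective. The second query shows where it breaks down: a synonym that never literally appears in the text returns no results at all.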
Semantic search can be used in most situations where you have a large amount of textual data, but let’s dig deeper into some concrete examples of how exactly you could use it.
Few sectors deal with more paperwork than the legal sector. The sheer number of documents that need to be searched through when preparing a court case is unfathomable for most of us not active in the field. We see a similar story when you need to prepare for an audit, draft the necessary documents and contracts, …
This results in a significant loss of time spent finding the necessary documents, and in multiple people doing the same work because they couldn't find similar documents to use as a basis for their own.
Media companies can use semantic search to make it easier to find the right article in a digital archive. A semantic search engine on top of such an archive can help two audiences.
First, a media company could offer it to its subscribers so that they can find the articles they are looking for more easily. Next to that, it makes journalists’ lives easier by helping them retrieve the correct information internally (e.g. unpublished work, old notes from (ex-)colleagues, …).
As a side note: AI can also help a lot when building these digital archives by making standard OCR tools (Optical Character Recognition, a technique used to recognise text within a digital image) cope better with a lot of technical jargon.
In the medical sector, there is a case for both semantic search and more traditional keyword filters, depending on the situation. When medical practitioners are looking for files that mention a very specific disease, body part, … then searching for the exact keyword is the way to go, as a semantic search would also return results that are merely similar.
However, when you’re trying to find or gather information from notes made by medical practitioners and/or research papers, a semantic search makes a lot more sense. This could add a lot of value in speeding up academic research, clinical trials, drug development, …
The recruitment sector is notorious for being very focused on keywords. A hiring manager or client is looking for a set of skills or specific expertise and asks the recruiter to look for people who match the request. This then results in a search through a huge volume of CVs coming from everywhere (internal database, job boards, LinkedIn, …) where a recruiter tries to find potential candidates with specific keywords in their CV.
This is where a semantic search engine could make a huge difference, because it will also surface candidates whose CVs list relevant experience without containing any of the exact keywords.
At a very high level, you first need to identify whether semantic search will benefit you. If so, check what kind of data you have. Is everything digital, or do you also need to start OCR’ing paper documents? The next step is to see whether you have a large enough volume of data: tens of thousands of documents are definitely better than a couple of thousand.
Curious to find out more about semantic search from a more technical point of view? Look at our NLP expert Mathias Leys' blog post on this topic.
Do you want to see if ML6 can help you get more value out of your data by implementing search engines in your organisation? Get in touch!