How does the search work?

When we talk or think about search today, the first names that come to mind are Google, Yahoo, or Bing, and for good reason. Google is the most used search engine in the world because it gives the most relevant results for what searchers ask. There are many other search engines, such as DuckDuckGo, AOL, Excite, Baidu, and many more.

Search may seem very simple, but it is immensely complicated; if it were not, the whole practice of SEO (Search Engine Optimization) would not exist.

The objective of search:

The main objective of search is to find the most relevant page or document on the internet that contains your search term and best matches the meaning of the search string. Sounds simple, right? It is not quite that simple, so let's look at what happens behind the scenes.

For search to work, a search engine needs to do the following:

Discovering the document - The search engine first has to know that the document exists.

Indexing the document - Once the search engine finds a document, it needs to index it so that the terms in the document can be looked up later. We will go through indexing in detail below.

Ranking the document - The search engine then needs to determine how relevant the document is to the search term being searched. We will go through this in detail below as well.

Retrieving the document - Finally, the search engine has to retrieve the documents and display them by rank, i.e. the most relevant at the top, and so on.

In short, a search engine has to discover the document, index it, rank it, and finally retrieve it and display it in the search results.

For each of these steps, there are specialized technologies available on the market. For example:

Web Crawler:

A web crawler, also commonly known as a web spider, is a kind of bot, or simply a computer program, designed to automatically discover web pages across the internet. It follows links as it encounters them and builds a massive corpus of the documents that exist, which can later be searched for the terms users enter. So when you launch your website, you can simply publish it and let the powerful web crawler handle the rest.
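To make the crawling idea concrete, here is a minimal Python sketch of a breadth-first crawl. It assumes a hypothetical in-memory link graph (`TOY_WEB`) instead of real HTTP requests, and `crawl` and `max_pages` are illustrative names, not the API of any real crawler. Production crawlers also fetch pages over the network, parse links out of HTML, respect robots.txt, and throttle their requests, none of which is shown here.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would download each page and extract the links from its HTML.
TOY_WEB = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/post-1", "https://example.org/"],
    "https://example.com/blog/post-1": [],
    "https://example.org/": ["https://example.com/blog/post-1"],
}


def crawl(seed, web, max_pages=100):
    """Breadth-first crawl: visit pages in the order their links are discovered."""
    visited = []              # pages discovered, in crawl order
    seen = {seed}             # URLs already queued, so no page is visited twice
    frontier = deque([seed])  # URLs waiting to be visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Pages missing from the toy web are treated as having no outgoing links.
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    for page in crawl("https://example.com/", TOY_WEB):
        print(page)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as the home page and the about page do here.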

Indexing:

Every document that the web crawler finds is then indexed. Put simply, indexing means that each document is parsed and tokenized, and each individual term in the document is extracted and stored in a data structure called an inverted index.
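Parsing and tokenizing can be sketched in a few lines of Python. This is a simplified assumption, not how any particular engine does it: the hypothetical `tokenize` helper lowercases the text and treats anything that is not a letter, digit, or underscore as a separator, so, for example, "don't" becomes two tokens. Real search engines use much more careful, language-aware tokenizers.

```python
import re


def tokenize(text):
    """Split raw text into lowercase terms, dropping punctuation."""
    # \w+ matches runs of letters, digits and underscores; everything else
    # (spaces, commas, full stops, ...) acts as a separator between terms.
    return re.findall(r"\w+", text.lower())


print(tokenize("Search engines index the Web, page by page."))
# ['search', 'engines', 'index', 'the', 'web', 'page', 'by', 'page']
```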

Inverted index:

An inverted index maps each word or number to the documents in which it is found. It is called inverted because, instead of going from a document to the words it contains, it goes from a search term to the documents that contain it. Let us understand this with an example.
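The "inverted" direction is easiest to see by flipping a forward index (document to words) into an inverted one (word to documents). The sketch below uses made-up document IDs and terms, and `invert` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

# Hypothetical forward index: document ID -> the terms it contains.
forward_index = {
    "doc1": ["search", "engine", "index"],
    "doc2": ["web", "crawler", "index"],
    "doc3": ["search", "ranking"],
}


def invert(forward):
    """Turn a document -> terms mapping into a term -> documents mapping."""
    inverted = defaultdict(set)
    for doc_id, terms in forward.items():
        for term in terms:
            inverted[term].add(doc_id)
    return dict(inverted)  # plain dict, so looking up an unknown term does not add it


inverted_index = invert(forward_index)
print(sorted(inverted_index["index"]))   # ['doc1', 'doc2']
print(sorted(inverted_index["search"]))  # ['doc1', 'doc3']
```

Answering a one-word query is now a single dictionary lookup; `inverted_index.get(term, set())` returns an empty set for words that appear in no document.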

Suppose we have the following three documents collected by the crawler:

From these three documents, the words are extracted and normalized by lowercasing each word and removing punctuation. They are then stored in a dictionary that records each word and how often it appears across the documents, and the dictionary is sorted by frequency. Lastly, each word is mapped to the documents in which it is found.
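The procedure just described (normalize, count frequencies, sort by frequency, and record which documents each word comes from) can be sketched in Python as follows. The original three documents are not reproduced here, so the sketch uses hypothetical stand-in documents, and `build_index` is an illustrative name. Breaking frequency ties alphabetically is an assumption of this sketch.

```python
import re
from collections import Counter, defaultdict


def build_index(documents):
    """Build a term dictionary sorted by frequency, with postings per term.

    Returns a list of (term, total_frequency, sorted_doc_ids) tuples,
    most frequent term first; ties are broken alphabetically.
    """
    frequency = Counter()        # term -> total occurrences across all documents
    postings = defaultdict(set)  # term -> IDs of the documents containing it
    for doc_id, text in documents.items():
        # Normalize: lowercase, then keep only runs of word characters (drops punctuation).
        for term in re.findall(r"\w+", text.lower()):
            frequency[term] += 1
            postings[term].add(doc_id)
    ordered = sorted(frequency, key=lambda t: (-frequency[t], t))
    return [(t, frequency[t], sorted(postings[t])) for t in ordered]


# Hypothetical documents standing in for pages returned by the crawler.
docs = {
    "doc1": "Search engines crawl the web.",
    "doc2": "The crawler follows links on the web.",
    "doc3": "Search results are ranked.",
}

for term, freq, doc_ids in build_index(docs)[:3]:
    print(term, freq, doc_ids)
# the 3 ['doc1', 'doc2']
# search 2 ['doc1', 'doc3']
# web 2 ['doc1', 'doc2']
```

With these documents, "the" tops the list; in practice, very common words like this are often filtered out as stop words before indexing.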

In the resulting dictionary, you can see that the word "patients" appears 2 times and is found in two documents, PCU and pathway.

Because we start from a word and locate the web pages that contain it, this kind of index is called an "inverted index". Now let's move on.

Ranking:

Each document or web page is ranked after being indexed. The more relevant a document or web page is to the search term, the higher it ranks.

In simple words, each document gets a score based on what the user searched for.
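Real ranking algorithms are proprietary and combine many signals, so the following is not how any particular search engine scores pages. It is a sketch of TF-IDF, a classic textbook scoring technique swapped in here for illustration: a document scores higher when it contains the query terms often (term frequency) and when those terms are rare across the collection (inverse document frequency). The documents and the `tf_idf_scores` helper are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def tf_idf_scores(query, documents):
    """Score each document against the query with a basic TF-IDF sum.

    score(d) = sum over query terms t of tf(t, d) * idf(t),
    where tf(t, d) is the raw count of t in d and idf(t) = log(N / df(t)).
    """
    n_docs = len(documents)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            if df[term]:  # skip query terms that appear in no document
                score += counts[term] * math.log(n_docs / df[term])
        scores[doc_id] = score
    return scores


docs = {
    "doc1": "How to do laundry: sort, wash, and dry your clothes.",
    "doc2": "Laundry machines on sale this week.",
    "doc3": "Fresh bread recipes.",
}

scores = tf_idf_scores("how to do laundry", docs)
for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{doc_id}: {score:.3f}")
```

Here doc1 scores highest because it contains every query term. Note that pure term matching like this does not understand intent: doc2 still earns a small score just for containing "laundry", which is why the intent analysis described next matters.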

How does ranking work? Many factors are involved in ranking a document. First, the algorithm needs to understand the intent behind the user's query. Understanding here means grasping both the language of the query and the sentiment behind it, which is a critical aspect of search.

For example, when you search for "How to do laundry", the user may want to watch a video on how to do laundry, it may also help to show laundry shops near the user, and, most obviously, the results should include a step-by-step guide to doing laundry if any such document exists. This search should not return shops selling laundry machines, as that makes no sense for this query. Based on the intent of the search query and some Natural Language Processing (NLP), each document is ranked, or in other words, given a score.

Scoring is necessary so that, when documents are returned as search results, the highest-scoring documents appear first. Results are displayed in order of their ranking or score, from high to low.

Ranking algorithms keep evolving and changing, so a page that ranks 7th for a search term today could move to the top of the results a few hours later, depending on how relevant readers found it, or it could drop to the 3rd page of results if the algorithm decides it is irrelevant.

Retrieval and display of search results:

Finally, once the documents have been ranked, the search engine retrieves them and displays the results with the most relevant document at the top, followed by the rest in order of relevance.
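Retrieval then comes down to ordering documents by score and returning the first page of results. Here is a minimal Python sketch, assuming the scores have already been computed; the score values and the `top_k` helper are hypothetical. `heapq.nlargest` from the standard library picks the k best items without the caller having to sort the whole collection.

```python
import heapq


def top_k(scores, k=10):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    # Only the first page of results is needed, so ask for the k largest items.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical scores, e.g. produced by the ranking step.
scores = {"doc1": 3.7, "doc2": 0.4, "doc3": 0.0, "doc4": 1.2}

for rank, (doc_id, score) in enumerate(top_k(scores, k=3), start=1):
    print(f"{rank}. {doc_id} (score {score})")
# 1. doc1 (score 3.7)
# 2. doc4 (score 1.2)
# 3. doc2 (score 0.4)
```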

Conclusion:

Real-world search algorithms are far more complex than what we have covered so far. Along the way, they also have to deal with spammers, queries with malicious intent, queries that could cause harm to people, offensive words, the user's location, and so on.

But the four steps above form the core of any search technology, and I hope this has given you a clear understanding of "how does the search work?".

If you need a search solution for your business, visit http://s-matrixsoftware.com/ or contact us at info@s-matrixsoftware.com.

TechBlog

Monday, April 18, 2022
