- Discovering the document - Before anything else, the search engine has to know that the document exists.
- Indexing the document - Once the search engine finds the document, it indexes it, recording the terms the document contains so they can be looked up later. We will go through indexing in detail below.
- Ranking the document - The search engine then determines how relevant the document is to the search term being searched. We will also cover this in detail below.
- Retrieving the document - Finally, the search engine retrieves the matching documents and displays them in order of rank, i.e. the most relevant on top and so on.
Web Crawler:
A web crawler, also commonly known as a web spider, is a kind of bot, or simply a computer program, designed to look for web pages across the internet automatically. It follows the links it finds on each page and, page by page, builds a massive corpus of documents in which the terms a user searches for can later be looked up. So when you launch your website, you can simply publish it and let the web crawler handle the rest.
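Following links from page to page is essentially a graph traversal. Below is a minimal breadth-first sketch; the `LINKS` dictionary is a hypothetical stand-in for the web, and `crawl` and its `max_pages` limit are illustrative names, not part of any real crawler. A real crawler would also fetch each page over HTTP, extract its links from the HTML, and respect each site's robots.txt rules.

```python
from collections import deque

# Hypothetical link graph standing in for the web: each URL maps to the
# URLs it links to. A real crawler would download each page and parse
# its HTML to find these links.
LINKS = {
    "a.example/home": ["a.example/about", "b.example/blog"],
    "a.example/about": ["a.example/home"],
    "b.example/blog": ["b.example/post-1", "c.example/"],
    "b.example/post-1": [],
    "c.example/": ["a.example/home"],
}


def crawl(seed_urls, max_pages=100):
    """Breadth-first crawl from seed_urls; returns URLs in discovery order."""
    seen = set(seed_urls)
    frontier = deque(seed_urls)
    discovered = []
    while frontier and len(discovered) < max_pages:
        url = frontier.popleft()
        discovered.append(url)  # the engine now "knows" this page exists
        for link in LINKS.get(url, []):
            if link not in seen:  # pages link back to each other, so skip repeats
                seen.add(link)
                frontier.append(link)
    return discovered


print(crawl(["a.example/home"]))
```

The `seen` set is what keeps the crawler from looping forever between pages that link to each other, and breadth-first order reaches pages close to the starting point first.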
Indexing:
Every document that the web crawler finds is then indexed. More precisely, indexing means that each document is parsed and tokenized, and each individual term in the document is extracted and stored in a data structure called an inverted index.
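A minimal sketch of this indexing step, assuming a deliberately simple tokenizer that lowercases the text and keeps runs of letters and digits. The documents and the `tokenize` and `build_inverted_index` names are hypothetical; real engines usually add further processing such as stemming and stop-word handling.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


# Hypothetical documents, keyed by a made-up document ID.
docs = {
    1: "Search engines crawl the web.",
    2: "A crawler follows links on the web.",
    3: "An index maps terms to documents.",
}

index = build_inverted_index(docs)
print(sorted(index["web"]))  # [1, 2]
```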
Inverted Index:
An inverted index is a mapping from a word or number to the documents in which that word or number is found. It is called inverted because, instead of going from a document to the terms it contains, it starts from a search term and leads to the documents that contain that term. Let us understand this with an example.
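To make the "inverted" direction concrete, here is a sketch of answering a query from such an index. The index is hand-written and hypothetical, and treating a multi-word query as "documents containing all of the terms" (an AND query) is an assumption made for illustration; an engine could just as well return documents containing any of them.

```python
# A small, hand-written (hypothetical) inverted index: term -> IDs of the
# documents in which that term appears.
INDEX = {
    "search": {1, 3},
    "engine": {1, 2, 3},
    "crawler": {2},
    "ranking": {3},
}


def lookup(index, query):
    """Return IDs of documents containing every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first term's document set, then intersect with the others;
    # a term missing from the index matches no documents.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


print(sorted(lookup(INDEX, "search engine")))  # [1, 3]
```

Because each term points straight to its documents, answering a query never requires scanning document text; the engine only combines a few precomputed sets.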
Suppose we have the following three documents collected by the crawler:
Ranking:
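As outlined in the list of steps at the start, ranking measures how relevant each document is to the search term. Ranking methods vary widely; TF-IDF, assumed here purely for illustration, is one classic scheme: a term counts more when it appears often in a document (term frequency) and when it is rare across the collection (inverse document frequency). The tokenized documents and the `tf_idf_scores` name are hypothetical, and real engines combine many more signals than this.

```python
import math
from collections import Counter

# Hypothetical, already-tokenized documents: doc ID -> list of terms.
DOCS = {
    1: ["web", "search", "engine", "search"],
    2: ["web", "crawler", "follows", "links"],
    3: ["search", "results", "ranking"],
}


def tf_idf_scores(docs, query_terms):
    """Score each document by the sum of the TF-IDF weights of the query terms."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # term absent from this document contributes nothing
            tf = counts[term] / len(tokens)    # share of this document's terms
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            score += tf * idf
        scores[doc_id] = score
    return scores


scores = tf_idf_scores(DOCS, ["search"])
print(sorted(scores, key=scores.get, reverse=True))  # [1, 3, 2]
```

Document 1 ranks first because "search" makes up half of its terms, while document 2 scores zero because it does not contain the term at all.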
Retrieval and display of search results:
Finally, once the matching documents have been ranked, the search engine displays them as the search results, with the most relevant document on top, followed by the next most relevant, and so on.
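Displaying results is then mostly a matter of ordering. A minimal sketch, assuming the ranking step has produced a hypothetical dictionary of scores and that only the top `k` results are shown at once (as on a first results page); `top_results` is an illustrative name:

```python
import heapq

# Hypothetical relevance scores produced by the ranking step: doc ID -> score.
scores = {"doc-a": 0.12, "doc-b": 0.87, "doc-c": 0.0, "doc-d": 0.45}


def top_results(scores, k=10):
    """Return up to k (doc_id, score) pairs, most relevant first, skipping zero scores."""
    matching = ((doc_id, s) for doc_id, s in scores.items() if s > 0)
    return heapq.nlargest(k, matching, key=lambda pair: pair[1])


for rank, (doc_id, score) in enumerate(top_results(scores, k=3), start=1):
    print(f"{rank}. {doc_id} (score {score:.2f})")
```

`heapq.nlargest` avoids fully sorting every matching document when only a handful of results are displayed.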
Conclusion:
The search algorithms used in the real world are far more complex than what we have covered so far: along the way they also have to deal with spammers, queries with malicious intent, queries that could cause real-world harm, offensive words, the location of the user, and so on.
But the four steps above form the core of any search technology, and I hope you now have a clear understanding of "how does search work?".
If you need a search solution for your business, visit http://s-matrixsoftware.com/ or contact us at info@s-matrixsoftware.com
