Apache Solr uses Lucene’s inverted index. Most search engines use an inverted index data structure to achieve better search performance. In an inverted index, each term is associated with the ids of the documents that contain it. When a user issues a query, the engine looks up the query terms and retrieves the associated documents directly, which is what makes searches fast.
With a forward index, where each document is associated with the terms it contains, the engine would have to iterate over every document to find those matching the query, which leads to poor search performance. Now let us see an example of what Solr’s inverted index looks like. Let us define some documents with the following content.
Document Id | Content to index |
Doc1 | dc mens shoes |
Doc2 | clarks shoes mens boat shoes |
Doc3 | basketball mens shoes |
Doc4 | mens watches |
Doc5 | jordan shoes |
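For contrast, the forward-index lookup described earlier can be sketched over these documents. This is only an illustration, not how Solr stores anything: `forward_index` and `scan_forward_index` are hypothetical names, and terms are assumed to be plain whitespace-separated words.

```python
# Forward index: each document id maps to the terms it contains.
forward_index = {
    "Doc1": ["dc", "mens", "shoes"],
    "Doc2": ["clarks", "shoes", "mens", "boat", "shoes"],
    "Doc3": ["basketball", "mens", "shoes"],
    "Doc4": ["mens", "watches"],
    "Doc5": ["jordan", "shoes"],
}


def scan_forward_index(index, term):
    """Find documents containing `term` by checking every document in turn."""
    return [doc_id for doc_id, terms in index.items() if term in terms]


watches_docs = scan_forward_index(forward_index, "watches")
print(watches_docs)  # ['Doc4']
```

Even though only one document matches, the lookup still visits all five, so the cost grows with the size of the collection.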
Let us construct the “Inverted Index”. To do so, we first split each document's content into terms, then sort the unique terms in ascending lexicographical order and record, for each term, the documents that contain it.
Terms | Document Ids |
basketball | Doc3 |
boat | Doc2 |
clarks | Doc2 |
dc | Doc1 |
jordan | Doc5 |
mens | Doc1, Doc2, Doc3, Doc4 |
shoes | Doc1, Doc2, Doc3, Doc5 |
watches | Doc4 |
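The construction above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Lucene's actual implementation: terms are split on whitespace only (Solr's analyzers do considerably more), and `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict

# The sample documents from the first table.
docs = {
    "Doc1": "dc mens shoes",
    "Doc2": "clarks shoes mens boat shoes",
    "Doc3": "basketball mens shoes",
    "Doc4": "mens watches",
    "Doc5": "jordan shoes",
}


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, content in documents.items():
        # Assumption: whitespace tokenization stands in for a real analyzer.
        for term in content.split():
            postings[term].add(doc_id)
    # Sort the terms lexicographically, as in the table above.
    return {term: sorted(postings[term]) for term in sorted(postings)}


inverted_index = build_inverted_index(docs)
for term, doc_ids in inverted_index.items():
    print(f"{term} | {', '.join(doc_ids)}")
```

Using a set per term means that a word repeated within one document, such as "shoes" in Doc2, is recorded only once for that document.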
Let us perform some queries.
- If a user searches for mens AND shoes, Solr returns the intersection of the two document sets: Doc1, Doc2, and Doc3. The diagram below depicts this.
- If a user searches for mens OR shoes, Solr returns the union of the two document sets: Doc1, Doc2, Doc3, Doc4, and Doc5. The diagram below depicts this.
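The two queries can be reproduced with ordinary set operations. The sketch below is a rough illustration rather than Solr's query execution: the index is written out by hand from the table above, and `search_and` and `search_or` are hypothetical helper names.

```python
# Inverted index from the table above, with each posting list held as a set.
inverted_index = {
    "basketball": {"Doc3"},
    "boat": {"Doc2"},
    "clarks": {"Doc2"},
    "dc": {"Doc1"},
    "jordan": {"Doc5"},
    "mens": {"Doc1", "Doc2", "Doc3", "Doc4"},
    "shoes": {"Doc1", "Doc2", "Doc3", "Doc5"},
    "watches": {"Doc4"},
}


def search_and(index, *terms):
    """Documents containing every term (set intersection)."""
    posting_sets = [index.get(term, set()) for term in terms]
    if not posting_sets:
        return []
    return sorted(set.intersection(*posting_sets))


def search_or(index, *terms):
    """Documents containing at least one term (set union)."""
    return sorted(set().union(*(index.get(term, set()) for term in terms)))


and_result = search_and(inverted_index, "mens", "shoes")
or_result = search_or(inverted_index, "mens", "shoes")
print(and_result)  # ['Doc1', 'Doc2', 'Doc3']
print(or_result)   # ['Doc1', 'Doc2', 'Doc3', 'Doc4', 'Doc5']
```

Each query touches only the posting sets of its own terms, which is why no scan over the full document collection is needed.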
In the coming articles, we will see a couple more Solr features. Happy learning!