In the last post, we have seen how to set up Solr. In this post, we will see how to set up the data import handler to import the data from the database. Here, I am using a sample product catalog database for demonstration. The table structure is given below.
The data dump file is available https://github.com/2013techsmarts/prod_cat_dump
Step 1 :
Now, we will prepare a document structure for Solr. The configuration of db-data-config.xml is given below.
The db-data-config.xml file will have a root element as <dataConfig>. Here, the dataSource element is having all the database related details. In this example, I am using MySql as a database. So, I configured MySql driver name, connection URL, user name and password. Copy the MySql driver jar file under solr_home lib directory.
In the Solr point of view, the document is nothing but, a row in a database table. Here, we are going to collect the data from various tables. In this example, I am collecting the data from products and retailers tables. We need to wrap the document under the document element. Every SQL query will become as an entity under the document element. Here, the entity element has a name and query attributes. For the name attribute, we can give any value. In this example, I have given the product as value.
The query attribute is having SQL query to collect the necessary columns. All the columns information is given under field elements. For each column, there is an associated field defined. Field has column and name attributes. For column, attribute give the same value which is there in the SQL query. For the name attribute, we can give any suitable name.
Step 2 :
Now, we will configure the schema.xml file. This file contains all the field types information with the analysis portion.
Solr supports for various field types. A field type is similar to the data type. The sample field types are given below,
<!-- boolean type: "true" or "false" --> <!-- A text field that only splits on whitespace for exact matching of words -->
After configuring the field types, we need to specify the fields which we are going to use for indexing.
The configuration of fields is given below.
Here, the name of the field should match with the name which we have given in db-data-config.xml. The explanation of attributes is given below.
type → the field type(One of the field type which we have defined earlier)
indexed → the fields which are indexed=true are searchable
stored → the fields which are stored=true are stored as part of the index and those fields will be returned as part of search response
required → the fields which are having required=true will be mandatory fields (Same as a primary key in a database)
Step 3 :
Now, we will configure solrConfig.xml. Here, we are going to configure data import handler to import the data, request handler and the response handler.
The configuration of the data import handler is given below.
db-data-config.xml
In the above configuration, you can notice the file name is matched which we have given in Step 1.
The request handler configuration to take the search requests is given below.
<!--Make sure your system has some authentication before enabling remote streaming! --> <!-- Set HTTP caching related parameters (for proxy caches and clients). To get the behaviour of Solr 1.2 (ie: no caching related headers) use the never304="true" option and do not specify a value for --> <!--httpCaching lastModifiedFrom="openTime" etagSeed="Solr"--> <!-- lastModFrom="openTime" is the default, the Last-Modified value (and validation against If-Modified-Since requests) will all be relative to when the current Searcher was opened. You can change it to lastModFrom="dirLastMod" if you want the value to exactly corrispond to when the physical index was last modified. etagSeed="..." is an option you can change to force the ETag header (and validation against If-None-Match requests) to be differnet even if the index has not changed (ie: when making significant changes to your config file) lastModifiedFrom and etagSeed are both ignored if you use the never304="true" option. --> <!-- If you include a directive, it will be used to generate a Cache-Control header, as well as an Expires header if the value contains "max-age=" By default, no Cache-Control header is generated. You can use the option even if you have set never304="true" --> <!-- max-age=30, public --> <!-- requestHandler plugins... incoming queries will be dispatched to the correct handler based on the path or the 'qt' param. Names starting with a '/' are accessed with the a path equal to the registered name. Names without a leading '/' are accessed with: http://host/app/select?qt=name If no qt is defined, the requestHandler that declares default="true" will be used. --> <!-- default values for query parameters --> explicit <!-- 10 * 2.1 -->
The query response writer configuration is given below.
If we are not going to configure any response writers in solrConfig.xml, by default XML response writer is enabled. During the search request, we can specify the response writer using “wt” query parameter. Based on this, the search response will be returned in the respective data format.
Now, all the necessary configuration is done. Its time to index the data by using the data import handler. We can invoke the data import handler from the admin console.
or we can send HTTP request to the data import handler by using the below URL
<http>://<host>:<port>/solr/sampleCatalog/dataimport?command=full-import&clean=true
Once the index is completed, we can search for the products based on any indexed field. For example, we can search based on the product name
<http>://<host>:<port>/solr/sampleCatalog/select?q=prodName:Harley%20Davidson&wt=xml&indent=true
Here, I am searching for Harley Davidson keyword and we got 2 results from the search response.
0 500 true prodName:Harley Davidson xml 48.81 95.7 S10_1678 This replica features working kickstand, front suspension, gear-shift lever, footbrake lever, drive chain, wheels and steering. All parts are particularly delicate due to their precise scale and require special care and attention. 1969 Harley Davidson Ultimate Chopper Min Lin Diecast 127.0 127.0 7933 24.23 60.57 S18_2625 Intricately detailed with chrome accents and trim, official die-struck logos and baked enamel finish. 1936 Harley Davidson El Knucklehead Welly Diecast Productions 162.0 162.0 4357
Let us run the SQL query to verify the results.
So, the results from the SOLR and the Database are in sync.
Now, we will see, how we can get the Solr response in other formats by applying “wt” parameter. We will try to get json as response.
<http>://<host>:<port>/solr/sampleCatalog/select?q=prodName:Harley%20Davidson&wt=json&indent=true
{ "responseHeader": { "status": 0, "QTime": 0, "params": { "indent": "true", "q": "prodName:Harley Davidson", "wt": "json" } }, "response": { "numFound": 2, "start": 0, "docs": [ { "buyPrice": 48.81, "listPrice": 95.7, "prodId": "S10_1678", "prodDesc": "This replica features working kickstand, front suspension, gear-shift lever, footbrake lever, drive chain, wheels and steering. All parts are particularly delicate due to their precise scale and require special care and attention.", "prodName": "1969 Harley Davidson Ultimate Chopper", "retailerName": "Min Lin Diecast", "salePrice": 127, "salePriceDisp": 127, "quantity": 7933 }, { "buyPrice": 24.23, "listPrice": 60.57, "prodId": "S18_2625", "prodDesc": "Intricately detailed with chrome accents and trim, official die-struck logos and baked enamel finish.", "prodName": "1936 Harley Davidson El Knucklehead", "retailerName": "Welly Diecast Productions", "salePrice": 162, "salePriceDisp": 162, "quantity": 4357 } ] } }
Now, we will sort the results based on the “buyPrice” field.For that we need sort=<field name> <sort option either ASC/DESC>.
<http>://<host>:<port>/solr/sampleCatalog/select?q=prodName:Harley%20Davidson&wt=json&indent=true&sort=buyPrice%20ASC
0 0 buyPrice ASC true prodName:Harley Davidson xml 24.23 60.57 S18_2625 Intricately detailed with chrome accents and trim, official die-struck logos and baked enamel finish. 1936 Harley Davidson El Knucklehead Welly Diecast Productions 162.0 162.0 4357 48.81 95.7 S10_1678 This replica features working kickstand, front suspension, gear-shift lever, footbrake lever, drive chain, wheels and steering. All parts are particularly delicate due to their precise scale and require special care and attention. 1969 Harley Davidson Ultimate Chopper Min Lin Diecast 127.0 127.0 7933
From this post, we have learnt how to configure Solr and how to search etc..
The entire set up, you can find at https://github.com/2013techsmarts/Solr-Data-Impot-Handler-Conf
In the coming post, we will see some more Solr features and configurations.
Stay tuned …
In the fetching of data from database i got the following error :
collection1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load config for solrconfig.xml…
how to solve that issue……
Thanks..
mahesh
The exception indicates that there is a problem in the solrconfig.xml.