Chapter 6. Indexing Data Using Apache Tika
In previous chapters, we saw how to use the Data Import Handler provided by Solr to index data from various data sources (JDBC and file data sources). In this chapter, we'll see how to index data stored in various file formats, such as MS Word, Excel, PDF, and many more. We'll cover the following topics:
Introducing Apache Tika
Configuring Apache Tika in Solr
Indexing PDF and Word documents