Apache Solr for Indexing Data

More Information
Learn
  • Get to know the basic features of Solr indexing and the analyzers/tokenizers available
  • Index XML/JSON data in Solr using the HTTP Post tool and the curl command
  • Work with Data Import Handler to index data from a database
  • Use Apache Tika with Solr to index Word documents, PDFs, and much more
  • Utilize Apache Nutch and Solr integration to index crawled data from web pages
  • Update indexes in real time from data feeds
  • Discover techniques to index multi-language and distributed data in Solr
  • Combine the various indexing techniques into a real-life working example of an online shopping web application
About

Apache Solr is a widely used, open source enterprise search server that delivers powerful indexing and searching features. These features help fetch relevant information from various sources and documentation. Solr also combines with other open source tools such as Apache Tika and Apache Nutch to provide more powerful features.

This fast-paced guide starts by helping you set up Solr and get acquainted with its basic building blocks, to give you a better understanding of Solr indexing. You’ll quickly move on to indexing text and index-time boosting. Next, you’ll focus on basic indexing techniques, various index handlers designed to modify documents, and indexing a structured data source through Data Import Handler.

Moving on, you will learn techniques to perform real-time indexing and atomic updates, as well as more advanced indexing techniques such as de-duplication. Later on, we’ll help you set up a cluster of Solr servers that combine fault tolerance and high availability. You will also gain insights into working scenarios of different aspects of Solr and how to use Solr with e-commerce data.

By the end of the book, you will be competent and confident working with Solr indexing and will have a solid foundation for efficiently programming your own indexing solutions.

Features
  • Learn about distributed indexing and real-time optimization to change index data on the fly
  • Index data from various sources and web crawlers using built-in analyzers and tokenizers
  • This step-by-step guide is packed with real-life examples on indexing data
Page Count 160
Course Length 4 hours 48 minutes
ISBN 9781783553235
Date Of Publication 28 Dec 2015

Authors

Sachin Handiekar

Sachin Handiekar is a senior software developer with over 5 years of experience in Java EE development. He graduated in computer science from the University of Greenwich, London, and currently works for a global consulting company, developing enterprise applications using various open source technologies, such as Apache Camel, ServiceMix, ActiveMQ, and ZooKeeper.

He has a keen interest in open source projects, has contributed code to Apache Camel, and has developed plugins for Spring Social, which can be found on GitHub at https://github.com/sachin-handiekar.

He also actively writes about enterprise application development on his blog (http://www.sachinhandiekar.com/).

Anshul Johri

Anshul Johri has more than 10 years of technical experience in software engineering. He earned his master’s degree in computer science from the University of Pune. Anshul has always had a start-up mindset, working on fast-paced development with cutting-edge technologies and handling multiple things at a time. His core strength has always been search technology, and Solr has played an important role in his career. Anshul started using Solr around 9 years ago and has never looked back. He has grown steadily with Solr, both as a user and as a contributor to the open source search community, and has used it extensively across various projects in all his organizations.

Because of this start-up mindset, he has worked with many start-ups in his career so far, both early-stage and mid-size, including Ibibo.com, Asklaila.com, and Bookadda.com. His last company was Amazon, where he spent around 2 years building scalable systems for Amazon Prime (a global product). Anshul recently founded his own company in India, http://www.rentomo.com/, with a friend from Amazon; it is built on a unique concept: a peer-to-peer sharing platform within a trusted community. He heads the technology and other core pillars of his own start-up.

Anshul was the technical reviewer of the book Indexing with Solr, published by Packt Publishing.