Learn to perform web crawling and apply data mining in applications with Apache Nutch using Packt's new book

January 2014 | Open Source, Web Development

Packt is pleased to announce the release of its new book Web Crawling and Data Mining with Apache Nutch, a concise and user-friendly guide that covers the necessary steps and examples for web crawling and data mining with Apache Nutch. The book is now available in all popular formats, including eBook, Kindle, and select library formats. It has 136 pages and is priced at $34.99, while the eBook is priced at $17.84.

About the Authors: 

Dr. Zakir Laliwala

Dr. Zakir Laliwala is an entrepreneur, open source specialist, and hands-on CTO at Attune Infocom. He explores new enterprise open source technologies and defines architectures, roadmaps, and best practices. He has provided consulting and training to corporations around the world on open source technologies such as Mule ESB, Activiti BPM, JBoss jBPM and Drools, Liferay Portal, Alfresco ECM, JBoss SOA, and cloud computing. He has published many research papers on web services, SOA, grid computing, and the semantic web in IEEE publications, and has participated in ACM international conferences. He serves as a reviewer for various international conferences and journals. He co-authored Packt's Mule ESB Cookbook and Activiti Business Process Management Beginner's Guide.

Abdulbasit Shaikh

Abdulbasit Shaikh has more than two years of experience in the IT industry, much of it with open source technologies such as Apache Hadoop, Apache Solr, Apache ZooKeeper, Apache Mahout, Apache Nutch, and Liferay. He has provided training on Apache Nutch, Apache Hadoop, Apache Mahout, and AWS architecture, and has delivered projects and training on other open source technologies. He is currently working with OpenStack. He has a strong knowledge of cloud computing platforms such as AWS and Microsoft Azure, having successfully delivered many cloud computing projects.

Apache Nutch helps in creating search engines and customizing them to one's needs. It integrates easily with components such as Apache Hadoop, Eclipse, and MySQL. Web Crawling and Data Mining with Apache Nutch starts with the basics of crawling webpages for applications. Readers will learn to deploy Apache Solr on servers holding data crawled by Apache Nutch, and to perform sharding with Apache Nutch using Apache Solr. Applications can then be integrated with databases such as MySQL, HBase, and Accumulo, as well as with Apache Solr, which serves as the searcher. The book covers all the steps required to crawl webpages for applications and to use the crawled data to make searching within those applications more efficient.

Web Crawling and Data Mining with Apache Nutch covers the following topics:

Chapter 1: Getting Started with Apache Nutch
Chapter 2: Deployment, Sharding, and AJAX Solr with Apache Nutch
Chapter 3: Integration of Apache Nutch with Apache Hadoop and Eclipse
Chapter 4: Apache Nutch with Gora, Accumulo, and MySQL 

This book is aimed at data analysts, application developers, web mining engineers, and data scientists. It is a good starting point for those who want to learn how web crawling and data mining are applied in today's business world.


Web Crawling and Data Mining with Apache Nutch
Perform web crawling and data mining in your application

For more information, please visit the book page.
