Exploring Data with RapidMiner

  • See how to import, parse, and structure your data quickly and effectively
  • Understand the visualization possibilities and be inspired to use these with your own data
  • Structured in a modular way to adhere to standard industry processes

Book Details

Language : English
Paperback : 162 pages [ 235mm x 191mm ]
Release Date : November 2013
ISBN : 1782169334
ISBN 13 : 9781782169338
Author(s) : Andrew Chisholm
Topics and Technologies : All Books, Big Data and Business Intelligence, Open Source

Table of Contents

Chapter 1: Setting the Scene
Chapter 2: Loading Data
Chapter 3: Visualizing Data
Chapter 4: Parsing and Converting Attributes
Chapter 5: Outliers
Chapter 6: Missing Values
Chapter 7: Transforming Data
Chapter 8: Reducing Data Size
Chapter 9: Resource Constraints
Chapter 10: Debugging
Chapter 11: Taking Stock
  • Chapter 1: Setting the Scene
    • A process framework
    • Data volume and velocity
    • Data variety, formats, and meanings
    • Missing data
    • Cleaning data
    • Visualizing data
    • Resource constraints
    • Terminology
    • Accompanying material
    • Summary
  • Chapter 2: Loading Data
    • Reading files
      • Alternative delimiters
      • Reading complete lines
      • Reading large numbers of attributes
      • Splitting files into smaller pieces
    • Databases
      • The Read Database operator
      • Large datasets
    • Using macros
    • Summary
  • Chapter 3: Visualizing Data
    • Getting started
    • Statistical summaries
    • Relationships between attributes
      • Scatter plots
      • Scatter 3D color
      • Parallel and deviation
      • Quartile color
    • Time series data
      • Plotting series
      • Using the survey plotter
    • Relations between examples
      • Using histograms
      • Using block plots
    • Summary
  • Chapter 4: Parsing and Converting Attributes
    • Generating attributes
      • Date functions
      • Regular expression functions
      • Generating extracts
      • Regular expressions
      • XPath
    • Renaming attributes
      • Searching and replacing attribute values
      • Using the Map operator
      • Using the Replace operator
      • Using the Replace (Dictionary) operator
    • Summary
  • Chapter 5: Outliers
    • Manual inspection
      • Increasing the data volume
      • Rules for handling outliers
    • Automated detection of example outliers
      • The Detect Outlier (Distances) operator
      • The Detect Outlier (Densities) operator
      • The Detect Outlier (LOF) operator
      • The Detect Outliers (COF) operator
    • Summary
  • Chapter 6: Missing Values
    • Missing or empty?
    • Types of missing data
      • Missing completely at random
      • Missing at random
      • Not missing at random
    • Categorizing missing data
      • Finding MCAR data
      • Finding MAR data
      • Finding NMAR data
      • A cautionary note
    • Effect of missing data
    • Options for handling missing data
      • Returning to the root cause
      • Ignoring it
      • Manual editing
      • Deletion of examples
      • Deletion of attributes
      • Imputation with single values
      • Modeling
    • Summary
  • Chapter 8: Reducing Data Size
    • Removing examples using sampling
    • Removing attributes
      • Removing useless attributes
      • Weighting attributes
      • Selecting attributes using models
    • Summary
  • Chapter 9: Resource Constraints
    • Measuring and estimating performance
      • Measuring performance
    • Adding memory
    • Parallel processing
    • Restructuring processes
    • Summary
  • Chapter 10: Debugging
    • Breakpoints in RapidMiner Studio
    • Logging data in RapidMiner Studio
    • RapidMiner Studio console printing
    • Groovy scripts
      • Outputting macros example
      • Console logging with Groovy
    • Regex tools
    • Using XPath effectively
    • Summary
  • Chapter 11: Taking Stock
    • Exploring new techniques
      • Time series
      • Web mining
      • Using R
      • Java or Groovy
      • Third-party components
      • RapidMiner Server
    • Where to go next

                        Andrew Chisholm

                        Andrew Chisholm completed his degree in Physics at Oxford University nearly thirty years ago. This coincided with the growth in software engineering and led him to a career in the IT industry. For the last decade he has been closely involved in mobile telecommunications, where he is currently a product manager for a market-leading test and monitoring solution used by many mobile operators worldwide. Throughout his career, he has maintained an active interest in all aspects of data. In particular, he has always enjoyed finding ways to extract value from data and presenting it in compelling ways to help others meet their objectives. Recently, he completed a Master's in Data Mining and Business Intelligence with first class honors. He is a certified RapidMiner expert and has been using the product to solve real problems for several years. He maintains a blog where he shares practical advice on how to get the best out of RapidMiner. He approaches problems from a practical perspective and has a great deal of relevant hands-on experience with real data. This book draws that experience together in the context of exploring data, the first and most important step in a data mining process. He has published conference papers on unsupervised clustering and cluster validity measures, and contributed a chapter called "Visualizing cluster validity measures" to an upcoming book entitled RapidMiner: Use Cases and Business Analytics Applications (Chapman & Hall/CRC).

                        What you will learn from this book

                        • Import real data from files in multiple formats and from databases
                        • Extract features from structured and unstructured data
                        • Restructure, reduce, and summarize data to help you understand it more easily and process it more quickly
                        • Visualize data in new ways to help you understand it
                        • Detect outliers and learn methods to handle them
                        • Detect missing data and implement ways to handle it
                        • Understand resource constraints and what to do about them

                        In Detail

                        Data is everywhere, and its volume is growing so fast that the gap between what is available and what people can understand keeps widening. There is huge value in data, but much of this value lies untapped. 80% of data mining is about understanding data, exploring it, cleaning it, and structuring it so that it can be mined. RapidMiner is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications.

                        Exploring Data with RapidMiner is packed with practical examples to help practitioners get to grips with their own data. The chapters within this book are arranged within an overall framework and can additionally be consulted on an ad-hoc basis. It provides simple to intermediate examples showing modeling, visualization, and more using RapidMiner.

                        Exploring Data with RapidMiner is a helpful guide that presents the important steps in a logical order. This book starts with importing data and then leads you through cleaning, handling missing values, visualizing, and extracting additional information, as well as understanding the time constraints that real data places on getting a result. The book uses real examples to help you understand how to set up processes quickly.

                        This book will give you a solid understanding of the possibilities that RapidMiner gives for exploring data and you will be inspired to use it for your own work.


                        A step-by-step tutorial style using examples so that users of different levels will benefit from the facilities offered by RapidMiner.

                        Who this book is for

                        If you are a computer scientist or an engineer who has real data from which you want to extract value, this book is ideal for you. You will need to have at least a basic awareness of data mining techniques and some exposure to RapidMiner.
