Exploring Data with RapidMiner

Andrew Chisholm

RapidMiner is a highly versatile tool that can make data work harder for you. This book will show you how to import, parse, and structure your data with remarkable speed and efficiency. It’s data mining made accessible.


Book Details

ISBN 13: 9781782169338
Paperback: 162 pages

About This Book

  • See how to import, parse, and structure your data quickly and effectively
  • Understand the visualization possibilities and be inspired to use these with your own data
  • Structured in a modular way to adhere to standard industry processes

Who This Book Is For

If you are a computer scientist or an engineer who has real data from which you want to extract value, this book is ideal for you. You will need to have at least a basic awareness of data mining techniques and some exposure to RapidMiner.

Table of Contents

Chapter 1: Setting the Scene
A process framework
Data volume and velocity
Data variety, formats, and meanings
Missing data
Cleaning data
Visualizing data
Resource constraints
Accompanying material
Chapter 2: Loading Data
Reading files
Using macros
Chapter 3: Visualizing Data
Getting started
Statistical summaries
Relationships between attributes
Time series data
Relations between examples
Chapter 4: Parsing and Converting Attributes
Generating attributes
Renaming attributes
Chapter 5: Outliers
Manual inspection
Automated detection of example outliers
Chapter 6: Missing Values
Missing or empty?
Types of missing data
Categorizing missing data
Effect of missing data
Options for handling missing data
Chapter 7: Transforming Data
Creating new attributes
Using pivoting
Using de-pivoting
Windowing data
Chapter 8: Reducing Data Size
Removing examples using sampling
Removing attributes
Chapter 9: Resource Constraints
Measuring and estimating performance
Adding memory
Parallel processing
Restructuring processes
Chapter 10: Debugging
Breakpoints in RapidMiner Studio
Logging data in RapidMiner Studio
RapidMiner Studio console printing
Groovy scripts
Regex tools
Using XPath effectively
Chapter 11: Taking Stock
Exploring new techniques
Where to go next

What You Will Learn

  • Import real data from files in multiple formats and from databases
  • Extract features from structured and unstructured data
  • Restructure, reduce, and summarize data to help you understand it more easily and process it more quickly
  • Visualize data in new ways to help you understand it
  • Detect outliers and apply methods to handle them
  • Detect missing data and implement ways to handle it
  • Understand resource constraints and what to do about them

In Detail

Data is everywhere, and the amount is increasing so fast that the gap between what people can understand and what is available is widening relentlessly. There is huge value in data, but much of this value lies untapped. 80% of data mining is about understanding data, exploring it, cleaning it, and structuring it so that it can be mined. RapidMiner is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications.

Exploring Data with RapidMiner is packed with practical examples to help practitioners get to grips with their own data. The chapters within this book are arranged within an overall framework and can additionally be consulted on an ad-hoc basis. It provides simple to intermediate examples showing modeling, visualization, and more using RapidMiner.

Exploring Data with RapidMiner is a helpful guide that presents the important steps in a logical order. This book starts with importing data and then leads you through cleaning, handling missing values, visualizing, and extracting additional information, as well as understanding the time constraints that real data places on getting a result. The book uses real examples to help you understand how to set up processes quickly.

This book will give you a solid understanding of the possibilities that RapidMiner offers for exploring data, and you will be inspired to use it for your own work.

