Reader small image

You're reading from  Practical Predictive Analytics

Product typeBook
Published inJun 2017
Reading LevelIntermediate
PublisherPackt
ISBN-139781785886188
Edition1st Edition
Languages
Tools
Right arrow
Author (1)
Ralph Winters
Ralph Winters
author image
Ralph Winters

Ralph Winters started his career as a database researcher for a music performing rights organization (he composed as well!), and then branched out into healthcare survey research, finally landing in the Analytics and Information technology world. He has provided his statistical and analytics expertise to many large fortune 500 companies in the financial, direct marketing, insurance, healthcare, and pharmaceutical industries. He has worked on many diverse types of predictive analytics projects involving customerretention, anti-money laundering, voice of the customer text mining analytics, and health care risk and customer choice models. He is currently data architect for a healthcare services company working in the data and advanced analytics group. He enjoys working collaboratively with a smart team of business analysts, technologists, actuaries as well as with other data scientists. Ralph considered himself a practical person. In addition to authoring Practical Predictive Analytics for Packt Publishing, he has also contributed two tutorials illustrating the use of predictive analytics in Medicine and Healthcare in Practical Predictive Analytics and Decisioning Systems for Medicine: Miner et al., Elsevier September, 2014, and also presented Practical Text Mining with SQL using Relational Databases, at the 2013 11th Annual Text and Social Analytics Summit in Cambridge, MA. Ralph resides in New Jersey with his loving wife Katherine, amazing daughters Claire and Anna, and his four-legged friends, Bubba and Phoebe, who can be unpredictable. Ralph's web site can be found at ralphwinters.com
Read more about Ralph Winters

Right arrow

Other helpful tools


Man does not live by bread alone, so it would behave you to learn additional tools in addition to R, so as to advance your analytic skills:

  • SQL: SQL is a valuable tool to know, regardless of which language/package/environment you choose to work in. Virtually every analytics tool will have a SQL interface, and knowledge of how to optimize SQL queries will definitely speed up your productivity, especially if you are doing a lot of data extraction directly from a SQL database. Today's common thought is to do as much pre-processing as possible within the database, so if you will be doing a lot of extracting from databases such as MySQL, PostgreSQL, Oracle, or Teradata, it will be a good thing to learn how queries are optimized within their native framework. In the R language, there are several SQL packages that are useful for interfacing with various external databases. We will be using sqldf, which is a popular R package for interfacing with R dataframes. There are other packages that are specifically tailored for the specific database you will be working with.
  • Web extraction tools: Not every data source will originate from a data warehouse. Knowledge of APIs that extract data from the internet will be valuable to know. Some popular tools include Curl and Jsonlite.
  • Spreadsheets: Despite their problems, spreadsheets are often the fastest way to do quick data analysis and, more importantly, enable you to share your results with others! R offers several interfaces to spreadsheets but, again, learning standalone spreadsheet skills such as pivot tables and Virtual Basic for applications will give you an advantage if you work for corporations in which these skills are heavily used.
  • Data visualization tools: Data visualization tools are great for adding impact to an analysis, and for concisely encapsulating complex information. Native R visualization tools are great, but not every company will be using R. Learn some third-party visualization tools such as D3.js, Google Charts, Qlikview, or Tableau.
  • Big data, Spark, Hadoop, NoSQL database: It is becoming increasingly important to know a little bit about these technologies, at least from the viewpoint of having to extract and analyze data that resides within these frameworks. Many software packages have APIs that talk directly to Hadoop and can run predictive analytics directly within the native environment, or extract data and perform the analytics locally.

Past the basics

Given that the predictive analytics space is so huge, once you are past the basics, ask yourself what area of predictive analytics really interests you, and what you would like to specialize in. Learning all you can about everything concerning predictive analytics is good at the beginning, but ultimately you will be called upon because you are an expert in certain industries or techniques. This could be research, algorithmic development, or even managing analytics teams.

Data analytics/research

But, as general guidance, if you are involved in, or are oriented toward, data, the analytics or research portion of data science, I would suggest that you concentrate on data mining methodologies and specific data modeling techniques that are heavily prevalent in the specific industries that interest you.

For example, logistic regression is heavily used in the insurance industry, but social network analysis is not. Economic research is geared toward time series analysis, but not so much cluster analysis. Recommender engines are prevalent in online retail.

Data engineering

If you are involved more on the data engineering side, concentrate more on data cleaning, being able to integrate various data sources, and the tools needed to accomplish this.

Management

If you are a manager, concentrate on model development, testing and control, metadata, and presenting results to upper management in order to demonstrate value or return on investment.

Team data science

Of course, predictive analytics is becoming more of a team sport, rather than a solo endeavor, and the data science team is very much alive. There is a lot that has been written about the components of a data science team, much of which can be reduced to the three basic skills that I outlined earlier.

Two different ways to look at predictive analytics

Various industries interpret the goals of predictive analytics differently. For example, social science and marketing like to understand the factors which go into a model, and can sacrifice a bit of accuracy if a model can be explained well enough. On the other hand, a black box stock trading model is more interested in minimizing the number of bad trades, and at the end of the day tallies up the gains and losses, not really caring which parts of the trading algorithm worked. Accuracy is more important in the end.

Depending upon how you intend to approach a particular problem, look at how two different analytical mindsets can affect the predictive analytics process:

  1. Minimize prediction error goal: This is a very common use case for machine learning. The initial goal is to predict using the appropriate algorithms in order to minimize the prediction error. If done incorrectly, an algorithm will ultimately fail and it will need to be continually optimized to come up with the new best algorithm. If this is performed mechanically without regard to understanding the model, this will certainly result in failed outcomes. Certain models, especially over optimized ones with many variables, can have a very high prediction rate, but be unstable in a variety of ways. If one does not have an understanding of the model, it can be difficult to react to changes in the data input.
  2. Understanding model goal: This came out of the scientific method and is tied closely to the concept of hypothesis testing. This can be done in certain kinds of models, such as regression and decision trees, and is more difficult in other kinds of models such as Support Vector Machine (SVM) and neural networks. In the understanding model paradigm, understanding causation or impact becomes more important than optimizing correlations. Typically, understanding models has a lower prediction rate, but has the advantage of knowing more about the causations of the individual parts of the model, and how they are related. For example, industries that rely on understanding human behavior emphasize understanding model goals. A limitation to this orientation is that we might tend to discard results that are not immediately understood. It takes discipline to accept a model with lower prediction ability. However, you can also gain model stability

Of course, the previous examples illustrate two disparate approaches. Combination models, which use the best of both worlds, should be the ones we should strive for. Therefore, one goal for a final model is one which:

  • Has an acceptable prediction error
  • Is stable over time
  • Needs a minimum of maintenance
  • Is simple enough to understand and explain.

You will learn later that is this related to Bias/Variance tradeoff.

Previous PageNext Page
You have been reading a chapter from
Practical Predictive Analytics
Published in: Jun 2017Publisher: PacktISBN-13: 9781785886188
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Ralph Winters

Ralph Winters started his career as a database researcher for a music performing rights organization (he composed as well!), and then branched out into healthcare survey research, finally landing in the Analytics and Information technology world. He has provided his statistical and analytics expertise to many large fortune 500 companies in the financial, direct marketing, insurance, healthcare, and pharmaceutical industries. He has worked on many diverse types of predictive analytics projects involving customerretention, anti-money laundering, voice of the customer text mining analytics, and health care risk and customer choice models. He is currently data architect for a healthcare services company working in the data and advanced analytics group. He enjoys working collaboratively with a smart team of business analysts, technologists, actuaries as well as with other data scientists. Ralph considered himself a practical person. In addition to authoring Practical Predictive Analytics for Packt Publishing, he has also contributed two tutorials illustrating the use of predictive analytics in Medicine and Healthcare in Practical Predictive Analytics and Decisioning Systems for Medicine: Miner et al., Elsevier September, 2014, and also presented Practical Text Mining with SQL using Relational Databases, at the 2013 11th Annual Text and Social Analytics Summit in Cambridge, MA. Ralph resides in New Jersey with his loving wife Katherine, amazing daughters Claire and Anna, and his four-legged friends, Bubba and Phoebe, who can be unpredictable. Ralph's web site can be found at ralphwinters.com
Read more about Ralph Winters