Nonparametric Methods

We will cover the following recipes in this chapter:

  • The Mann-Whitney test
  • Estimating nonparametric ANOVA
  • The Spearman's rank correlation test
  • LOESS regression
  • Finding the best transformations via the acepack package
  • Nonparametric multivariate tests using the npmv package
  • Semiparametric regression with the SemiPar package

Introduction

Unfortunately, parametric methods such as the t-test or ordinary least squares (OLS) make very strong assumptions about the distribution of the data. To some extent, they still work if the distributional assumptions are relaxed, but how well they work depends on the extent to which these assumptions are violated.

Nonparametric methods do not rely on the usual parametrized distributions and are instead designed to work with almost any distribution. This gives them a distinct flexibility, and we are no longer required to check any distributional assumptions on the data. If the data does follow the distribution that the corresponding parametric method requires, nonparametric methods usually perform almost as well.

The Mann-Whitney test

We have already discussed how to compare the means of two groups when both groups follow a Gaussian distribution with the same variance. The nonparametric alternative, by contrast, requires no distributional assumption and works well in almost every situation. Of course, if both distributions really are Gaussian with the same variance, then the regular t-test is better; this follows from the fact that, under those assumptions, the t-test is the uniformly most powerful (unbiased) test.

The Mann-Whitney-Wilcoxon test is a nonparametric test of the null hypothesis that an element chosen at random from group A is equally likely to be greater or smaller than an element chosen at random from group B. Another way of posing this is as a test of whether the distributions of groups A and B are the same. The only strong assumption that this test requires is that the observations...
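
As a minimal sketch (using simulated data rather than the data from this recipe), the test is available in base R through wilcox.test():

    # Two simulated samples: one Gaussian, one skewed
    set.seed(10)
    group_a <- rnorm(30, mean = 5, sd = 2)
    group_b <- rexp(30, rate = 1/5)
    # wilcox.test() performs the Mann-Whitney-Wilcoxon rank-sum test
    wilcox.test(group_a, group_b)

A small p-value suggests that values from one group tend to be larger than values from the other.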

Estimating nonparametric ANOVA

The Mann-Whitney-Wilcoxon test that we presented in the previous recipe can be extended to multiple groups (not just two, as before). For one-way Analysis of Variance (ANOVA), the corresponding test is the Kruskal-Wallis test; it is available through the kruskal.test() function in base R.

For nonparametric two-way ANOVA, the Scheirer-Ray-Hare test can be used; however, the documentation is scarce, and it is not frequently used.
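
As a rough illustration (on the built-in PlantGrowth data, not necessarily the dataset used in this recipe), the one-way Kruskal-Wallis test can be run directly in base R:

    # Kruskal-Wallis test: does plant weight differ across treatment groups?
    data(PlantGrowth)
    kruskal.test(weight ~ group, data = PlantGrowth)

The formula interface mirrors the one used by aov() and lm(), with the response on the left and the grouping factor on the right.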

Getting ready

In order to run this script, you need to install the FSA and dplyr packages.

How to do it...

We will work with nonparametric...

The Spearman's rank correlation test

The correlation coefficient between X and Y that we usually use is obtained by dividing the covariance of X and Y by the product of the standard deviations of X and Y. It is therefore restricted to lie between -1 and 1. When the correlation is -1, there is a perfect negative linear relationship between the variables; when it is 1, there is a perfect positive linear relationship; and when it is 0, there is no linear relationship between the variables. But there is an implicit assumption that we usually overlook: the correlation coefficient captures only linear relationships. It is easy to imagine many cases where there is a relationship, but not a linear one.

The Spearman rank statistic does not test correlation in the traditional sense (whether a greater than average value of X is associated linearly with a...
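
A minimal sketch of the contrast between the two statistics, on simulated data where Y is a monotone but nonlinear function of X:

    set.seed(42)
    x <- runif(100)
    y <- exp(5 * x) + rnorm(100, sd = 0.1)
    cor(x, y)                            # Pearson: understates the association
    cor.test(x, y, method = "spearman")  # Spearman: computed on the ranks

Because Spearman's statistic is computed on the ranks of X and Y, it captures any monotone relationship, linear or not.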

LOESS regression

When we have a scatterplot between two variables Y and X, we usually want to present a curve that relates them: firstly, because it allows us to see whether the relationship is linear (or almost linear); secondly, because interpreting scatterplots is sometimes hard; and finally, because we might want a simple model that can be used to predict Y in terms of X while capturing any nonlinear patterns.

Locally Estimated Scatterplot Smoothing (LOESS) regression works by fitting many local models, one around each point. In particular, each local model (fitted around a point (x0, y0)) is estimated using weighted least squares, where each observation is weighted by how close its regressors are to x0. The fitted values from these local models are then combined into a smooth curve. There is a user-specified parameter, called the bandwidth, which controls how much data is used in each one of these local regressions...
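
A minimal sketch using the built-in cars dataset (the data in this recipe may differ); in R's loess() function, the span argument plays the role of the bandwidth:

    # Fit a LOESS curve of stopping distance versus speed
    fit <- loess(dist ~ speed, data = cars, span = 0.75)
    plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
    lines(sort(cars$speed), predict(fit)[order(cars$speed)], col = "blue")

Smaller values of span use less data in each local fit and produce a wigglier curve; larger values produce a smoother one.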

Finding the best transformations via the acepack package

When fitting linear regression models, we always want them to fit the data as well as possible. Sometimes, we transform our variables in order to improve the model fit as much as possible. For example, we could apply several transformations (taking logarithms, squaring values, and so on) in order to improve the fit.

The acepack package implements the alternating conditional expectation (ACE) algorithm, which finds the optimal transformations that we need to apply to our data in order to maximize the R2. Another way of looking at this is: given the data that we have, what is the best R2 we could get if we found the best possible transformations? In this fashion, we obtain an upper bound on the fit of the best model that we would be able to get, assuming we can only transform the variables to capture...
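
A minimal sketch on simulated data (not this recipe's dataset), in which Y depends on X through a nonlinear link:

    library(acepack)
    set.seed(1)
    x <- runif(200, 0, 5)
    y <- exp(sin(x)) + rnorm(200, sd = 0.1)
    fit <- ace(x, y)
    fit$rsq                 # R2 attainable after the optimal transformations
    plot(x, fit$tx, main = "Estimated transformation of x")

The components tx and ty hold the estimated transformations of the predictors and the response, and rsq is the R2 achieved after applying them.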

Nonparametric multivariate tests using the npmv package

In our parametric scenario, we used the t-test to compare means across two populations, and Hotelling's T2 to compare a vector of means across two populations. We then extended these cases to ANOVA and MANOVA, respectively, when dealing with multiple populations. The underlying assumption is that the data comes from a Gaussian population in the first case and a multivariate Gaussian in the second one. In this recipe, we will use the npmv package to perform nonparametric MANOVA.

Traditional Multivariate Analysis of Variance (MANOVA) has two main problems: firstly, it depends on a multivariate Gaussian assumption that is hard to satisfy in practice; secondly, it is hard to identify which groups or variables produce the differences.

The npmv package offers a solution to both problems: it does not rely on any...
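
A minimal sketch, assuming the npmv formula interface in which several response variables are separated by | on the left-hand side; the iris dataset stands in for the data used in this recipe:

    library(npmv)
    # Compare three response variables across the three iris species
    nonpartest(Sepal.Length | Sepal.Width | Petal.Length ~ Species,
               data = iris, permreps = 1000)

The output reports several nonparametric test statistics and, by default, relative effects, which help identify the variables driving the differences.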

Semiparametric regression with the SemiPar package

Semiparametric models encompass a large family of models that combine a fully parametric part (a finite number of parameters) with a nonparametric part. In general, the parametric part is linear and the nonparametric part is treated as a nuisance, but this is not always the case. One example where a semiparametric model would be relevant is modeling ice-cream sales in terms of the weather and the price. The sales-weather relationship is likely highly nonlinear (sales are really high when the temperature is high, but low when the temperature is moderate), whereas the price-sales relationship could be quite linear. In that case, we would want to treat the price effect as linear and the weather effect as a nonparametric nuisance term.
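
A minimal sketch, assuming the SemiPar interface in which f() marks the term to be estimated nonparametrically (via penalized splines) while plain terms enter linearly; the data below are simulated stand-ins for the sales, temperature, and price variables described above:

    library(SemiPar)
    set.seed(7)
    temperature <- runif(200, 0, 35)
    price       <- runif(200, 1, 5)
    sales <- 50 / (1 + exp(-(temperature - 25))) - 4 * price + rnorm(200)
    # Nonparametric effect of temperature, linear effect of price
    fit <- spm(sales ~ f(temperature) + price)
    summary(fit)

summary(fit) reports the linear coefficient for price alongside information about the smooth component for temperature.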

Getting...
