Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
SQL for Data Analytics

You're reading from  SQL for Data Analytics

Product type Book
Published in Aug 2019
Publisher Packt
ISBN-13 9781789807356
Pages 386 pages
Edition 1st Edition
Languages
Authors (3):
Upom Malik Upom Malik
Profile icon Upom Malik
Matt Goldwasser Matt Goldwasser
Profile icon Matt Goldwasser
Benjamin Johnston Benjamin Johnston
Profile icon Benjamin Johnston
View More author details

Table of Contents (11) Chapters

Preface 1. Understanding and Describing Data 2. The Basics of SQL for Analytics 3. SQL for Data Preparation 4. Aggregate Functions for Data Analysis 5. Window Functions for Data Analysis 6. Importing and Exporting Data 7. Analytics Using Complex Data Types 8. Performant SQL 9. Using SQL to Uncover the Truth – a Case Study Appendix

Summary

Data is a powerful method by which to understand the world. The ultimate goal for analytics is to turn data into information and knowledge. To accomplish this goal, statistics can be used to better understand data, especially descriptive statistics, and statistical significance testing.

One branch of descriptive statistics, univariate analysis, can be utilized to understand a single variable of data. Univariate analysis can be used to find the distribution of data by utilizing frequency distributions and quantiles. We can also find the central tendency of a variable by calculating the mean, median, and mode of data. It can also be used to find the dispersion of data using the range, standard deviation, and IQR. Univariate analysis can also be used to find outliers.

Bivariate analysis can also be used to understand the relationship between data. Using scatterplots, we can determine trends, changes in trends, periodic behavior, and anomalous points in regard to two variables. We can also use the Pearson correlation coefficient to measure the strength of a linear trend between the two variables. The Pearson correlation coefficient, however, is subject to scrutiny due to outliers or the number of data points used to calculate the coefficient. Additionally, just because two variables have a strong correlation coefficient does not mean that one variable causes the other variable.

Statistical significance testing can also provide important information about data. Statistical significance testing allows us to determine how likely certain outcomes are to occur by chance and can help us to understand whether the changes seen between groups are of consequence.

Now that we have the basic analytical tools necessary to understand data, we will now review SQL and how we can use it to manipulate a database in the next chapter.

You have been reading a chapter from
SQL for Data Analytics
Published in: Aug 2019 Publisher: Packt ISBN-13: 9781789807356
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}