Reader small image

You're reading from  Mastering Numerical Computing with NumPy

Product typeBook
Published inJun 2018
Reading LevelIntermediate
PublisherPackt
ISBN-139781788993357
Edition1st Edition
Languages
Tools
Right arrow
Authors (3):
Umit Mert Cakmak
Umit Mert Cakmak
author image
Umit Mert Cakmak

Umit Mert Cakmak is a data scientist at IBM, where he excels at helping clients solve complex data science problems, from inception to delivery of deployable assets. His research spans multiple disciplines beyond his industry and he likes sharing his insights at conferences, universities, and meet-ups.
Read more about Umit Mert Cakmak

Tiago Antao
Tiago Antao
author image
Tiago Antao

Tiago Antao is a bioinformatician currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in Bioinformatics from the Faculty of Sciences at the University of Porto (Portugal) and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine (UK). Postdoctoral, Tiago has worked with human datasets at the University of Cambridge (UK) and with mosquito whole genome sequencing data at the University of Oxford (UK), before helping to set up the bioinformatics infrastructure at the University of Montana. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.
Read more about Tiago Antao

Mert Cuhadaroglu
Mert Cuhadaroglu
author image
Mert Cuhadaroglu

Mert Cuhadaroglu is a BI Developer in EPAM, developing E2E analytics solutions for complex business problems in various industries, mostly investment banking, FMCG, media, communication, and pharma. He consistently uses advanced statistical models and ML algorithms to provide actionable insights. Throughout his career, he has worked in several other industries, such as banking and asset management. He continues his academic research in AI for trading algorithms.
Read more about Mert Cuhadaroglu

View More author details
Right arrow

Trimmed statistics

As you will have noticed in the previous section, the distributions of our features are very dispersed. Handling the outliers in your model is a very important part of your analysis. It is also very crucial when you look at descriptive statistics. You can be easily confused and misinterpret the distribution because of these extreme values. SciPy has very extensive statistical functions for calculating your descriptive statistics in regards to trimming your data. The main idea of using the trimmed statistics is to remove the outliers (tails) in order to reduce their effect on statistical calculations. Let's see how we can use these functions and how they will affect our feature distribution:

In [58]: np.set_printoptions(suppress= True, linewidth= 125)
samples = dataset.data
CRIM = samples[:,0:1]
minimum = np.round(np.amin(CRIM), decimals...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Mastering Numerical Computing with NumPy
Published in: Jun 2018Publisher: PacktISBN-13: 9781788993357

Authors (3)

author image
Umit Mert Cakmak

Umit Mert Cakmak is a data scientist at IBM, where he excels at helping clients solve complex data science problems, from inception to delivery of deployable assets. His research spans multiple disciplines beyond his industry and he likes sharing his insights at conferences, universities, and meet-ups.
Read more about Umit Mert Cakmak

author image
Tiago Antao

Tiago Antao is a bioinformatician currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in Bioinformatics from the Faculty of Sciences at the University of Porto (Portugal) and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine (UK). Postdoctoral, Tiago has worked with human datasets at the University of Cambridge (UK) and with mosquito whole genome sequencing data at the University of Oxford (UK), before helping to set up the bioinformatics infrastructure at the University of Montana. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.
Read more about Tiago Antao

author image
Mert Cuhadaroglu

Mert Cuhadaroglu is a BI Developer in EPAM, developing E2E analytics solutions for complex business problems in various industries, mostly investment banking, FMCG, media, communication, and pharma. He consistently uses advanced statistical models and ML algorithms to provide actionable insights. Throughout his career, he has worked in several other industries, such as banking and asset management. He continues his academic research in AI for trading algorithms.
Read more about Mert Cuhadaroglu