You're reading from Cracking the Data Science Interview

Product type: Book

Published in: Feb 2024

Publisher: Packt

ISBN-13: 9781805120506

Edition: 1st Edition

Concepts

Data Science

Authors (2):

Leondra R. Gonzalez

Aaren Stubberfield


Programming with Python

Starting with this chapter, we transition to preparing you for the technical portion of data science job interviews. This second part of the book is therefore best used as a study guide and quick reference as you prepare, so feel free to skip or review chapters according to your study needs.

In each of the following chapters, we will review key concepts and provide sample problems. It is therefore important that you are at least familiar with introductory programming concepts, preferably including functional programming. These include, but are not limited to, syntax, data types, variables and assignment, control flow, and packages such as pandas and numpy for data wrangling.

By the end of this chapter in particular, you will have a handle on expected Python questions within a data science interview, and know how to tackle them logically. Additionally, you will be more comfortable and confident with thinking through...

Using variables, data types, and data structures

In Python, variables are the building blocks of any code. A variable is simply a name that refers to a value of some given type. For example, if I set a variable called x equal to 10, the variable x now holds that value (until it is changed). In short, variables are used to store data. Unlike in some other programming languages, such as Java, a variable's type does not need to be declared explicitly in Python. The type is determined automatically from the value you assign (although you can, and should, convert between data types as needed). There are several built-in data types in Python. Here are some common ones:

Numeric types: Python has several numeric data types, including int (integers), float (floating-point numbers), and complex (complex numbers). Numeric variables are used to store numerical data:
- Integers represent whole numbers without any fractional or decimal part. They can be positive...
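
As a minimal sketch of the ideas above (the variable names and values are illustrative, not taken from the book), the following shows that Python infers a variable's type from the assigned value, that a name can be rebound to a value of a different type, and that types can be converted explicitly:

```python
# Python infers the type from the assigned value; no declaration is needed.
x = 10                           # x refers to an int
initial_type = type(x).__name__

x = 2.5                          # the same name now refers to a float
rebound_type = type(x).__name__

z = 3 + 4j                       # a complex number

# Explicit conversion between types
count = int("42")                # str -> int
as_float = float(count)          # int -> float

print(initial_type, rebound_type, type(z).__name__)  # int float complex
print(count + 1, as_float)                           # 43 42.0
```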

Indexing in Python

To access values within a data object, we use indexing. Indexing is the process of accessing individual elements within a data structure. In this case, the data structure is a list, but as you will soon learn, indexing is applicable to many data structures.

Note

Each element or item within a data structure is assigned a unique index or position, starting from a specific value. In Python, this value is 0. This means that the first position in any data structure in Python is located at index 0, followed by the second position, which is located at index 1, and so on.

Indexing allows you to retrieve or manipulate specific elements within the data structure by specifying their index. It provides a way to refer to elements individually rather than accessing the entire data structure as a whole.

The basic syntax for indexing a list or tuple in Python is as follows:

list_or_tuple_name[index_position]

The list_or_tuple_name object is the name of the list...
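
The following short sketch applies this syntax to a list and a tuple. The names fruits and point are hypothetical examples. The negative index is standard Python behavior that the text above does not cover; it is included only to show that positions can also be counted from the end.

```python
fruits = ["apple", "banana", "cherry", "date"]  # a list
point = (3, 7)                                  # a tuple

first = fruits[0]    # index 0 is the first position
second = fruits[1]   # index 1 is the second position
last = fruits[-1]    # negative indices count from the end
x_coord = point[0]   # tuples are indexed the same way

# Lists are mutable, so an element can be replaced through its index
fruits[1] = "blueberry"

print(first, second, last, x_coord)  # apple banana date 3
print(fruits)  # ['apple', 'blueberry', 'cherry', 'date']
```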

Using string operations

String operations are very common when working with Python and text data. Therefore, this section will review how to initialize a string, string indexing/slicing, and some common string methods.

Note

We will not review string regular expressions, as this is a large topic with significant depth. Check out Mastering Python Regular Expressions by Victor Romero and Felix L. Luis for more instructions on this topic.

Initializing a string

Python allows for string initialization (creation) in several ways. Two ways include single quotes ('') and double quotes (""):

# Single quotes
s = 'Hello, World!'
print(s)  # prints: Hello, World!
# Double quotes
s = "Hello, World!"
print(s)  # prints: Hello, World!

Single and double quotes are basically interchangeable. The only difference comes into play when you have a quote mark (single or double) inside a string. For example, one common scenario is...
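
As an illustrative sketch (the example strings are assumptions, not taken from the book), a string containing one kind of quote mark can be wrapped in the other kind, or the inner quote can be escaped with a backslash:

```python
# An apostrophe inside a string: wrap the string in double quotes
contraction = "It's a great day!"

# Double quotes inside a string: wrap the string in single quotes
quoted = 'She said, "Hello!"'

# Alternatively, escape the inner quote with a backslash
escaped = 'It\'s a great day!'

print(contraction)             # It's a great day!
print(quoted)                  # She said, "Hello!"
print(contraction == escaped)  # True
```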

Using Python control statements, loops, and list comprehensions

Control statements are used for various tasks. For example, they’re used to filter data based on certain conditions, perform a calculation on each item in a list, iterate through rows in a dataframe, and more. Additionally, list comprehensions are widely used in data science because they are efficient and readable. They are often used in data cleaning and preprocessing tasks, feature engineering, and more.

Control statements in Python allow you to control the flow of your program’s execution based on certain conditions or loops. The main types of control statements are conditional statements (such as if, elif, and else) and loop statements (such as for and while).

Meanwhile, list comprehensions are a shorthand approach to writing loop statements. More specifically, they are a shorter, more concise syntax for creating a list based on the values of an existing list or other iterable.
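
To make the comparison concrete, here is a small sketch with made-up data. It builds the same list twice, once with a for loop and an if statement and once with a list comprehension, and then uses if, elif, and else to label each value.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: square only the even numbers
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n ** 2)

# Equivalent list comprehension
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

# Conditional statements: label each number by size
sizes = []
for n in numbers:
    if n < 3:
        sizes.append("small")
    elif n < 5:
        sizes.append("medium")
    else:
        sizes.append("large")

print(even_squares)  # [4, 16, 36]
print(sizes)  # ['small', 'small', 'medium', 'medium', 'large', 'large']
```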

Conditional statements...

Using user-defined functions

Sometimes, you may need to create your own function to perform very specific operations. This is common in the data science world, especially as it relates to data cleaning, preprocessing, and modeling activities.

In this section, we will discuss user-defined functions, which are functions created by the programmer to perform specific tasks. They are not unlike mathematical functions, which (usually) take some inputs and (often) produce some outputs. User-defined functions are designed to take 0 or more inputs, do some specific computation(s) (we’ll just call it stuff), and produce an output.

This process is especially helpful when performing repeated tasks. In fact, the rule of thumb is to write a function whenever you have to do a task more than once. In more advanced cases, user-defined functions also help with code reusability, organization, readability, and maintainability.
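
As a minimal sketch of a user-defined function for a repeated cleaning task, consider the following. The function name fill_missing and its default parameter are hypothetical, chosen only for illustration. It takes inputs, does some computation, and returns an output.

```python
def fill_missing(values, default=0):
    """Return a new list in which every None is replaced by default."""
    return [default if v is None else v for v in values]


# The same function is reused on different inputs
ages = fill_missing([34, None, 29])
scores = fill_missing([None, 0.8], default=0.5)

print(ages)    # [34, 0, 29]
print(scores)  # [0.5, 0.8]
```

Writing the logic once and calling it twice avoids repeating the replacement code for each list.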

Breaking down the user-defined function syntax

When used effectively, user...

Handling files in Python

In Python, the built-in open function is used to open a file, and it returns a file object. Once a file is opened, you can read its contents using the read method. However, an important aspect of managing files is ensuring they are closed after use so that the underlying system resources are released. One way to accomplish this is by using context managers.

A context manager is an object that manages the context of a block of code, typically through a with statement. It is particularly useful for setting up and tearing down computational resources, such as opening and closing files. In short, the with keyword automatically closes the file once the nested block of code finishes executing, even if an error occurs, which makes the code more concise and reduces the risk of a file not being properly closed.

The syntax to open files using context managers is as follows:

with open(<file_name.csv>) as file_object:
    # Code block...
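
A runnable sketch of this pattern follows. The file name example.csv and its contents are assumptions made for illustration; the file is created in a temporary directory so that the example is self-contained.

```python
import os
import tempfile

# Create a small file in a temporary directory (illustrative data only)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")

    # Write the file; it is closed automatically when the block ends
    with open(path, "w") as file_object:
        file_object.write("name,score\nAda,90\n")

    # Read the file back using the read method
    with open(path) as file_object:
        contents = file_object.read()

    # Outside the with block, the file has already been closed
    is_closed = file_object.closed

print(contents)
print(is_closed)  # True
```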

Wrangling data with pandas

Data wrangling is one of the most important topics in data science interviews. For starters, data is often not presented in an analysis-ready format, so wrangling is necessary to preprocess data for modeling and to address data quality concerns. As a result, data scientists can spend upward of 80% of their time cleaning and wrangling data [1].

Furthermore, data wrangling skills demonstrate your comfort and fluency with computer programming. Having the ability to use functions, loops, indexing, aggregation, filtering, and forming calculations will serve you well in your data science journey, enabling you to complete work quickly and efficiently. It is also fundamental for extract, transform, load (ETL) activities, querying data, data modeling, descriptive statistics, reporting, and a host of other data tasks.

In this section, we will review several common data wrangling challenges, including handling missing data, filtering data, merging, and aggregating...
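
The four tasks named above can be sketched on a toy dataset as follows. The dataframes, column names, and the choice to fill missing amounts with 0.0 are all assumptions made for illustration, not the book's own examples. The sketch uses the pandas methods fillna, boolean filtering, merge, and groupby.

```python
import pandas as pd

# Hypothetical toy data
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [20.0, None, 35.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})

# Handling missing data: replace the missing amount with 0.0
orders["amount"] = orders["amount"].fillna(0.0)

# Filtering: keep only orders of at least 30
large_orders = orders[orders["amount"] >= 30]

# Merging: attach each customer's region to their orders
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregating: total order amount per region
totals = merged.groupby("region")["amount"].sum()

print(len(large_orders))  # 2
print(totals["East"], totals["West"])  # 70.0 35.0
```

Whether to fill, drop, or impute missing values depends on the dataset; filling with 0.0 here only keeps the sketch short.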

Summary

In this chapter, we covered many Python programming fundamentals you would need for your technical interview. First, we covered Python variable data types and string operations, including string indexing. Afterward, we reviewed Python list comprehensions and control statements, including loops. Then we focused on some aspects of Python classes, indexing, merging, sorting, data aggregation, and handling missing data.

It is incredibly important to be proficient in data wrangling and manipulation, which makes up a large part of data science interviews and assessments. That emphasis is proportional to how much of a data science job is spent wrangling data.

In the next chapter, we will move our focus from Python fundamentals to data visualization and storytelling.

References

[1] A Comparative Study of Data Cleaning Tools by Chen, Z., Oni, S., Hoban, S., & Jademi, O., from International Journal of Data Warehousing and Mining (IJDWM) (2019).


Authors (2)

Leondra R. Gonzalez

Leondra R. Gonzalez is a data scientist at Microsoft and Chief Data Officer for tech startup CulTRUE, with 10 years of experience in tech, entertainment, and advertising. During her academic career, she has completed educational opportunities with Google, Amazon, NBC, and AT&T.

Aaren Stubberfield

Aaren Stubberfield is a senior data scientist for Microsoft's digital advertising business and the author of three popular courses on Datacamp. He graduated with an MS in Predictive Analytics and has over 10 years of experience in various data science and analytical roles focused on finding insights for business-related questions.