Models

Models are the newest section introduced on the platform; at the time of writing this book, the feature was less than one month old. Before it existed, users contributed models in several ways and for several purposes. Most frequently, models were saved as the output of Notebooks (Code) after being trained with custom code, usually in the context of a competition. The model could then be included in a dataset or used directly, since a notebook can take as input either a dataset or the output of another notebook. Sometimes, models built outside the platform were uploaded as datasets and then included in users' pipelines to prepare a competition solution. Meanwhile, model repositories were already available, either through a public cloud, such as Google Cloud, AWS, or Azure, or from a company specialized in such a service, like Hugging Face. With downloadable models that are ready to use or easy to fine-tune for a custom task becoming the norm, Kaggle chose to include Models in its platform...
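To make the notebook-output workflow described above concrete, here is a minimal sketch; the paths and the source notebook slug are hypothetical, and the model and data are toy placeholders rather than a real competition solution.

    import joblib
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Train a toy model; in practice this would be your competition model.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X, y)

    # Anything written to /kaggle/working becomes part of the notebook's output,
    # which other notebooks can then attach as an input source.
    joblib.dump(model, "/kaggle/working/rf_model.joblib")

    # In a downstream notebook, after adding the training notebook's output as an
    # input, the artifact appears under /kaggle/input/<source-notebook-slug>/
    # (hypothetical slug shown here):
    # loaded = joblib.load("/kaggle/input/my-training-notebook/rf_model.joblib")
    # print(loaded.predict(X[:5]))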

The content of the next chapters

We learned what Kaggle is and how we can use the resources and features of the platform. Let’s take a quick look at the content of the next chapters, which focus on how to create original, insightful, and recognizable content in the Notebooks space.

Getting ready for Kaggle environment

Here you will learn more about the Code features on Kaggle: the computing environments, how to use the online editor, how to fork and modify an existing example, and how to use the source control facilities on Kaggle to either save, or save and run, a new Notebook.

Starting our travel – how to survive on the Titanic?

Most Kagglers will start their journey on the platform with this competition. Although it uses a small and simple dataset, it holds some hidden insights that we will explore together. Here we start to build the skills that we will further develop throughout the book. We introduce some tools for data analysis in Python (pandas and numpy...
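As a first taste of these tools, a minimal sketch of loading and inspecting the Titanic training data is shown below, assuming the competition dataset is attached to the notebook at its standard input path.

    import numpy as np
    import pandas as pd

    # The Titanic competition data is mounted read-only under /kaggle/input/titanic/.
    train_df = pd.read_csv("/kaggle/input/titanic/train.csv")

    print(train_df.shape)                                       # passengers and columns
    print(train_df["Survived"].value_counts())                  # class balance of the target
    print(train_df.isnull().sum())                              # missing values per column
    print(np.round(train_df["Survived"].mean() * 100, 1))       # survival rate in percent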

Summary

In this chapter, we learned about the Kaggle platform’s resources and capabilities and introduced the content of the following chapters. It is now time to get ready for your trip. In the next chapter, you will learn how to use the full coding capacity of the platform, get familiar with the development environment, and learn how to use it to its maximum potential. Let’s get ready.

Exploring notebook capabilities

Notebooks serve as powerful tools for data exploration, model training, and running inferences. In this section, we will examine the various capabilities that Kaggle Notebooks have to offer.

We will start off with the most frequently used features of notebooks. We will go through the options to add various resources to a notebook (data and models) and to modify the execution environment. Then, we will continue with more advanced features, including setting up utility scripts, adding and using secrets, using Google Cloud services, and upgrading a notebook to a Google Cloud AI Notebook. Let’s get started!
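As a preview of one of those advanced features, the following is a minimal sketch of reading a stored secret from within a notebook, using the kaggle_secrets helper available in the Kaggle environment; the label MY_API_TOKEN is a hypothetical secret you would first define under the notebook's Add-ons menu.

    from kaggle_secrets import UserSecretsClient

    # Retrieve a secret previously stored via the notebook's Add-ons > Secrets menu.
    user_secrets = UserSecretsClient()
    api_token = user_secrets.get_secret("MY_API_TOKEN")

    # The value can now be passed to an external service client without
    # hard-coding credentials in a (potentially public) notebook.
    print(len(api_token))  # confirm it loaded without printing the secret itself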

Basic capabilities

On the right-side panel, we have quick menu actions that give access to frequently used notebook features. In the following screenshot, we take a more detailed look at these actions.

Figure 2.6: Zoomed-in view of the right-side panel with quick menus

As you can see, the first quick menu actions...

Using the Kaggle API to create, update, download, and monitor your notebooks

The Kaggle API is a powerful tool that extends the functionality available in the Kaggle user interface. You can use it for various tasks: defining, updating, and downloading datasets; submitting to competitions; defining new notebooks; pushing or pulling notebook versions; and verifying a run’s status. A short sketch of these operations follows the setup steps below.

There are just two simple steps for you to start using the Kaggle API. Let’s get started:

  1. First, you will need to create an authentication token. Navigate to your account by clicking the icon on the right side and selecting the Account menu item, then go to the API section. Here, click on the Create new API token button to download your authentication token (a file named kaggle.json). If you will be using the Kaggle API from a Windows machine, place the file at C:\Users\<your_name>\.kaggle\kaggle.json. On a Mac or Linux machine, the path should be ~/.kaggle/kaggle.json.
  2. Next, you will have to...
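With the token in place (and the kaggle package installed, for example via pip install kaggle), a minimal sketch of driving notebooks programmatically might look as follows. The notebook identifier and local folder are hypothetical, and the method names follow the package's KaggleApi client, whose exact signatures may vary slightly between versions.

    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # reads the token from ~/.kaggle/kaggle.json

    # Pull an existing notebook (code plus metadata) into a local folder.
    api.kernels_pull("username/my-notebook", path="./my-notebook", metadata=True)

    # After editing locally, push a new version back to Kaggle.
    api.kernels_push("./my-notebook")

    # Check the run status of the most recent version.
    print(api.kernels_status("username/my-notebook"))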

Summary

In this chapter, we learned what Kaggle Notebooks are, what types we can use, and which programming languages they support. We also learned how to create, run, and update notebooks. We then visited some of the basic features of notebooks, which will allow you to start using them effectively: to ingest and analyze data from datasets or competitions, to start training models, and to prepare submissions for competitions. We also reviewed some of the advanced features and introduced the Kaggle API, which further extends your usage of notebooks, allowing you to build external data and ML pipelines that integrate with your Kaggle environment.

The more advanced features give you more flexibility in using Kaggle Notebooks. With Utility scripts, you can create modular code, with specialized Python modules for ingesting data, performing statistical analysis on it, preparing visualizations, generating features, and building models. You can reuse...
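To illustrate the mechanism, suppose a utility script named titanic_utils (a hypothetical name) defines a small data-loading helper; once the script is attached to a notebook, it can be imported like any regular Python module, as sketched below.

    # Contents of the utility script "titanic_utils" (hypothetical):
    #
    #     import pandas as pd
    #
    #     def load_train(path="/kaggle/input/titanic/train.csv"):
    #         """Load the training data and add a simple engineered feature."""
    #         df = pd.read_csv(path)
    #         df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    #         return df

    # In the notebook, after attaching the utility script as an input:
    from titanic_utils import load_train

    train_df = load_train()
    print(train_df[["SibSp", "Parch", "FamilySize"]].head())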
