You're reading from Developing Kaggle Notebooks

Product type: Book

Published in: Dec 2023

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781805128519

Edition: 1st Edition

Languages

Python

Concepts

Data Analysis

Author (1)

Gabriel Preda


In this chapter, we will analyze data from an early Kaggle competition, Data Science for Good: Kiva Crowdfunding (see reference [1]). You will learn how to tell a story with data in a way that is both informative and engaging for the reader. Then, through a detailed analysis, we will test a hypothesis on a dataset from another competition, about Kaggle metadata.

All data tells a story

Approaching a new dataset sometimes resembles an archeological excavation and sometimes a police investigation. We either unearth valuable insights hidden under a pile of data or uncover elusive evidence through a systematic, sometimes arid process that resembles the technical discipline of the archeologist or the method of the detective. All data can tell a story. It is the analyst's choice whether this story is told in the style of a scientific report or in the vivid, attractive form of a detective novel. In this chapter, we will combine the techniques developed in previous chapters for analyzing tabular (numerical and categorical), text, and geospatial data, as well as for combining data from multiple sources, and show how to tell a story with data.

The background

Kiva.org is an online crowdfunding platform whose mission is to extend the benefits of financial services to poor and financially excluded people around the world. Through Kiva's services, these people can borrow small amounts of money. Kiva provides these microloans through partnerships with financial services institutions in the countries where the loan recipients reside. In the past, Kiva has provided over one billion US dollars in microloans to the targeted communities. To extend the reach of its assistance and, at the same time, to improve its understanding of the specific needs and factors that make the difference for impoverished people in different parts of the world, Kiva wanted to better understand the particular conditions of each potential borrower. Due to the diversity of the problems in different parts of the world, the specificity of each case, the multitude of influencing factors, the mission of Kiva to identify...

The data

The competition requires participants to identify or collect relevant data in addition to the data provided by the organizers. The provided data includes loan information, the Kiva global multidimensional poverty index (MPI) by region and location, loan themes, and loan themes by region. The loan information includes a unique ID, loan theme ID, loan theme type, local financial organization partner ID, funded amount (how much Kiva provided to the local partner), loan amount (how much the local partner disbursed to the borrower), the activity of the borrower, sector, use of the loan, country code, country name, region, currency, posted time, disbursed time, funded time, the duration for which the loan was disbursed, the total number of lenders that contributed to a loan, gender of borrowers, and repayment interval. The Kiva MPI by region and location data includes region or country name, ISO-3 code for the country, region, world region, MPI value, and the geolocation (latitude and longitude) of the current...
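The loan and MPI tables described above could be combined along the following lines. This is a minimal sketch, not the chapter's own code: the column names (`id`, `country`, `region`, `loan_amount`, `MPI`) are assumptions modeled on the field descriptions, the helper `summarize_loans_by_region` is hypothetical, and the rows are invented toy values rather than real Kiva data.

```python
import pandas as pd

# Toy rows standing in for the loans table.
# All values are invented for illustration; column names are assumptions.
loans = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "country": ["Kenya", "Kenya", "Peru", "Peru"],
        "region": ["Nairobi", "Nairobi", "Cusco", "Lima"],
        "sector": ["Agriculture", "Retail", "Food", "Retail"],
        "loan_amount": [300.0, 500.0, 800.0, 1000.0],
    }
)

# Toy rows standing in for the MPI-by-region table (invented MPI values).
mpi = pd.DataFrame(
    {
        "country": ["Kenya", "Peru"],
        "region": ["Nairobi", "Cusco"],
        "MPI": [0.02, 0.07],
    }
)


def summarize_loans_by_region(loans: pd.DataFrame, mpi: pd.DataFrame) -> pd.DataFrame:
    """Aggregate loans per (country, region) and attach the regional MPI."""
    summary = loans.groupby(["country", "region"], as_index=False).agg(
        loan_count=("id", "count"),
        mean_loan_amount=("loan_amount", "mean"),
    )
    # A left join keeps regions that have loans but no MPI entry (MPI is NaN).
    return summary.merge(mpi, on=["country", "region"], how="left")


summary = summarize_loans_by_region(loans, mpi)
print(summary)
```

The left join is a deliberate choice in this sketch: regions with loans but no matching MPI record stay in the result with a missing value, which makes gaps between the two sources visible instead of silently dropping rows.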

What is a good solution to an analytics competition?

It is important to stress from the start that a good solution for an analytics competition is not necessarily a complete exploratory data analysis. Judging from experience with several analytics competitions and from the highest-ranked solutions, sometimes it is quite the opposite. The criteria for scoring a solution in an analytics competition change over time, but some are adopted repeatedly. The evaluators frequently look at the originality of the approach, the composition, and the documentation. To score highly on all these criteria, authors have to prepare very well. An extensive exploration of the data is still necessary, so that all the results presented can be well documented. While useful for the research, this exploration doesn’t need to be fully included in the narrative of the solution Notebook. In fact, the author could select and discuss, in their story, a small part of the data, as long as the narrative...


Author (1)

Gabriel Preda

Dr. Gabriel Preda is a Principal Data Scientist for Endava, a major software services company. He has worked on projects in various industries, including financial services, banking, portfolio management, telecom, and healthcare, developing machine learning solutions for various business problems, including risk prediction, churn analysis, anomaly detection, task recommendations, and document information extraction. In addition, he is very active in competitive machine learning, currently holds the title of three-time Kaggle Grandmaster, and is well known for his Kaggle Notebooks.