You're reading from Data Ingestion with Python Cookbook

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781837632602
Edition: 1st Edition
Author (1)
Gláucia Esppenchutz

Gláucia Esppenchutz is a data engineer with expertise in managing data pipelines and vast amounts of data using cloud and on-premises technologies. She has worked at companies such as Globo, BMW Group, and Cloudera. Currently, she works at AiFi, specializing in the field of data operations for autonomous systems. She comes from the biomedical field and shifted her career ten years ago to chase the dream of working closely with technology and data. She is in constant contact with the open source community, mentoring people and helping to manage projects, and has collaborated with the Apache, PyLadies group, FreeCodeCamp, Udacity, and MentorColor communities.

Inserting formatted SparkSession logs to facilitate your work

A commonly underestimated best practice is creating valuable logs. Applications, and even small code files, that log information can save a significant amount of debugging time. This is also true when ingesting or processing data.

This recipe covers the best practice of logging events in our PySpark scripts. The examples here give a generic overview that can be applied to any other piece of code and that will be used again later in this book.

Getting ready

We will use the listings.csv file to execute the read method from Spark. You can find this dataset inside the GitHub repository for this book. Make sure your SparkSession is up and running.
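
As a rough illustration of the read this recipe relies on, the following minimal sketch wraps the call in a helper. The helper name `read_listings`, the default path, and the `header` and `inferSchema` options are assumptions made for this example, not details taken from the book; adjust them to match your copy of the dataset.

```python
def read_listings(spark, path="listings.csv"):
    """Read the listings CSV into a DataFrame.

    `spark` is assumed to be an already running SparkSession.
    The header/inferSchema options are assumptions about the file layout.
    """
    # DataFrameReader.csv accepts the path plus CSV options as keyword arguments.
    return spark.read.csv(path, header=True, inferSchema=True)
```

With an active session, for instance one obtained through `SparkSession.builder.getOrCreate()`, calling `read_listings(spark)` would return a DataFrame that the logging steps can then report on.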

How to do it…

Here are the steps to perform this recipe:

  1. Setting the log level: Using sparkContext, we will assign the log level:
    spark.sparkContext.setLogLevel("INFO")
  2. Instantiating the log4j logger: The next step is to create...
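
The two steps named above can be sketched together as follows. This is only one common pattern, not necessarily the recipe's own code: the helper name `get_spark_logger` and the logger name `"ingestion"` are hypothetical, and reaching log4j through `sparkContext._jvm` depends on a private PySpark attribute and on the log4j 1.x style API being available in the driver JVM.

```python
def get_spark_logger(spark, name="ingestion", level="INFO"):
    """Set the Spark log level and return a log4j logger from the driver JVM.

    `spark` is assumed to be an already running SparkSession.
    """
    sc = spark.sparkContext
    # Step 1: the same call the recipe shows, with the level made configurable.
    sc.setLogLevel(level)
    # Step 2 (assumed pattern): `_jvm` is PySpark's private Py4J view of the JVM,
    # used here to reach the log4j LogManager.
    log4j = sc._jvm.org.apache.log4j
    return log4j.LogManager.getLogger(name)
```

A call such as `logger = get_spark_logger(spark)` followed by `logger.info("Reading listings.csv")` would then write the message through Spark's own logging configuration, so it appears alongside the framework's log lines.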