Time Series Analysis with Spark

Time Series Analysis with Spark: A practical guide to processing, modeling, and forecasting time series with Apache Spark

eBook
$28.79 $31.99
Paperback
$39.99
Subscription
Free Trial
Renews at $19.99 per month

What do you get with the eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM-free: read whenever, wherever, and however you want
  • AI Assistant (beta) to help accelerate your learning


What Are Time Series?

“Time is the wisest counselor of all.” – Pericles

History is fascinating. It offers a profound narrative of our origins, the journey we are on, and the destination we strive toward. History equips us with learnings from the past to better face the future.

Let’s take, for example, the impact of meteorological data on history. Disruptions in weather patterns, starting in the Middle Ages and worsened by the Laki volcanic eruption in 1783, caused widespread hardship in France. This climatic upheaval contributed to the social unrest that ultimately led to the French Revolution in 1789. (Find out more about this in the Further reading section.)

Time series embody this narrative with numbers echoing our past. They are history quantified, a numerical narrative of our collective past, with lessons for the future.

This book takes you on a comprehensive journey with time series, starting with foundational concepts, guiding you through...

Technical requirements

In the first part of the book, which sets the foundations, you can follow along without working through the hands-on examples, although doing so is recommended. The latter part of the book is more practice-driven. If you want to get hands-on from the beginning, you can find the code for this chapter in the GitHub repository of this book at:

https://github.com/PacktPublishing/Time-Series-Analysis-with-Spark/tree/main/ch1

Note

Refer to this GitHub repository for the latest revisions of the code, which will be commented on if updated post-publication. The updated code (if any) might differ from what is presented in the book's code sections.

The following hands-on sections will give you further details to get started with time series analysis.

Introduction to time series

In this section, we will develop an understanding of what time series are and some related terms. This will be illustrated by hands-on examples to visualize time series. We will look at different types of time series and what characterizes them. This knowledge of the nature of time series is necessary for us to choose the appropriate time series analysis approach in the upcoming chapters.

Let’s start with an example of a time series: the average yearly temperature in Mauritius since 1950. A short sample of the data is shown in Table 1.1.

Year    Average temperature (°C)
1950    22.66
1951    22.35
1952    22.50
1953    22.71
1954    ...

Hands-on: Loading and visualizing time series

Let’s go through the hands-on exercise to load a time series dataset and visualize it. We will try to create the visual representation we’ve already seen in Figure 1.1.
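Before any Spark or plotting code, the core idea of loading a time series can be sketched in plain Python. This is a minimal sketch, not the book's notebook code: it assumes a hypothetical CSV extract with columns named `year` and `avg_temp_c`, holding the four sample rows from Table 1.1 in shuffled order. It parses the rows, restores chronological order, and checks that the observations are evenly spaced one year apart.

```python
import csv
import io

# Hypothetical CSV extract with the four sample rows of Table 1.1, deliberately
# out of order. The column names "year" and "avg_temp_c" are assumptions.
CSV_TEXT = """year,avg_temp_c
1952,22.50
1950,22.66
1953,22.71
1951,22.35
"""


def load_series(text):
    """Parse CSV text into (year, temperature) tuples sorted by year."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [(int(row["year"]), float(row["avg_temp_c"])) for row in reader]
    # Restore chronological order, whatever the order in the file.
    rows.sort(key=lambda row: row[0])
    return rows


def is_regular(series, step=1):
    """Return True if consecutive observations are exactly `step` years apart."""
    return all(later[0] - earlier[0] == step
               for earlier, later in zip(series, series[1:]))


series = load_series(CSV_TEXT)
print(series)
print(is_regular(series))
```

Running it prints the four observations sorted from 1950 to 1953, followed by `True`. The helper names `load_series` and `is_regular` are illustrative only; in the book, loading and plotting are done in the chapter's notebook.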

Development environment

In order to run the code, you will need a Python development environment where you can install Apache Spark and other required libraries. Specific libraries will be detailed, together with installation instructions, in the corresponding chapters when required.

PaaS

An easy way to meet these requirements is to use Databricks Community Edition, which is free. It comes with a notebook-based development interface, as well as compute with Spark and some other libraries pre-installed.

The instructions to sign up for Databricks Community Edition can be found here:

https://docs.databricks.com/en/getting-started/community-edition.html

Community Edition’s compute size is limited as it is a free cloud-based...

Breaking a time series down into its components

This section aims to further your understanding of time series by analyzing their components and detailing several terms introduced so far. This will prepare you to choose the right methods, throughout the rest of the book, based on the nature of the time series you are analyzing.

Time series models can be broken down into three main components: trend, seasonality, and residuals:

$$\text{Time Series} = \text{Trend} + \text{Seasonality} + \text{Residuals}$$

Note

The mathematical representations in this book follow a simplified English notation, to suit a broad audience. For the mathematical formulations, refer to this excellent resource on time series: Forecasting: Principles and Practice: https://otexts.com/fpp3/.

As you will see in the next hands-on section, this breakdown into components is derived from the model fitted to the time series data. For most real-life datasets, the breakdown is only an approximation of reality by the model. As such, each model will come up with its own identification...
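One common way to estimate these components is classical additive decomposition: a centered moving average over one seasonal period estimates the trend, the detrended values are averaged at each position in the cycle to estimate seasonality, and whatever remains is the residual. The sketch below assumes this technique, which may differ from the model used in the book's hands-on section, and applies it to a hypothetical quarterly series (period 4) built from a known linear trend and seasonal pattern with no noise, so the recovered components can be checked against the true ones.

```python
def centered_moving_average(values, period):
    """Estimate the trend with a centered moving average over one period.

    For an even period this is a 2 x period moving average: the window spans
    period + 1 points and the two end points get half weight.
    Positions without a full window on both sides are left as None.
    """
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def additive_decompose(values, period):
    """Split values into trend, seasonal and residual lists (classical additive method)."""
    trend = centered_moving_average(values, period)

    # Average the detrended values at each position in the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(values, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [sum(b) / len(b) for b in buckets]

    # Center the seasonal pattern so that it sums to zero over one cycle.
    offset = sum(raw) / period
    pattern = [s - offset for s in raw]
    seasonal = [pattern[t % period] for t in range(len(values))]

    # Residuals: whatever the trend and seasonality do not explain.
    residuals = [None if tr is None else y - tr - s
                 for y, tr, s in zip(values, trend, seasonal)]
    return trend, seasonal, residuals


# Hypothetical quarterly series: linear trend + fixed seasonal pattern, no noise.
period = 4
true_pattern = [2.0, -1.0, -3.0, 2.0]          # sums to zero over one cycle
values = [10 + 0.5 * t + true_pattern[t % period] for t in range(16)]

trend, seasonal, residuals = additive_decompose(values, period)
print(trend[2:6])        # recovered trend in the middle of the series
print(seasonal[:4])      # recovered seasonal pattern
```

Because the moving average needs `period // 2` observations on each side, the trend and residuals are undefined (`None`) at both ends of the series, which is one concrete way in which such a breakdown is only an approximation.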

Multiple overlapping seasonalities

We will be going through the code to create the data visualization in Figure 1.13. The code for this section is in the notebook file named ts-spark_ch1_3.dbc.

The location URL is as follows: https://github.com/PacktPublishing/Time-Series-Analysis-with-Spark/raw/main/ch1/ts-spark_ch1_3.dbc

The dataset is synthetic and generated as three different sine curves representing three overlapping seasonalities.

The following code is an extract from the notebook. Let’s look at it at a high level:

  1. The import statements add libraries for numerical calculations and for drawing graphs:
    import numpy as np
    from plotly.subplots import make_subplots

    NumPy is an open source Python library for scientific computing that is significantly more efficient, in both computation and memory use, than standard Python data structures. We will use it here for its mathematical functions.

  2. We then generate a number of sine curves, using np.sin, to represent different seasonalities...
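The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the daily time index, the weekly, monthly, and yearly periods (7, 30, and 365 days), and the amplitudes are hypothetical values chosen to show how three sine curves combine into one series with overlapping seasonalities. Plotting, which the notebook does with `make_subplots`, is left out.

```python
import numpy as np

# Hypothetical setup: two years of daily time steps.
t = np.arange(730)

# Hypothetical weekly, monthly and yearly cycles (periods in days) and amplitudes.
periods = [7, 30, 365]
amplitudes = [1.0, 2.0, 5.0]

# One sine curve per seasonality: amplitude * sin(2 * pi * t / period).
components = np.array([a * np.sin(2 * np.pi * t / p)
                       for a, p in zip(amplitudes, periods)])

# The observed series is the sum of the overlapping seasonalities.
series = components.sum(axis=0)

print(components.shape, series.shape)
```

Keeping each component as a separate row makes it easy to plot them individually alongside their sum, which is what a multi-panel figure of overlapping seasonalities needs.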

Additional considerations with time series analysis

This section is probably the most important in this early part of the book. In the introductory section, we mentioned some key considerations for time series, such as the preservation of chronological order, regularity, and stationarity. Here, we map out the key challenges and additional considerations when analyzing time series in real-life projects. This lets you plan your learning and practice accordingly, with guidance in the relevant sections of this book as well as in further reading.
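Preserving chronological order has a direct consequence when evaluating models: the data cannot be shuffled into training and test sets as is common with non-temporal data, otherwise the model gets to "see the future". A minimal sketch in plain Python, with a hypothetical helper named `chronological_split` operating on `(timestamp, value)` pairs, keeps the most recent observations for testing:

```python
def chronological_split(series, test_fraction=0.2):
    """Split (timestamp, value) pairs into train and test sets without shuffling.

    The most recent observations form the test set, so a model is always
    evaluated on data that comes after the data it was trained on.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    ordered = sorted(series, key=lambda pair: pair[0])   # enforce time order
    cut = round(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]


# Hypothetical observations, received out of order.
observations = [(t, 1.5 * t) for t in [5, 0, 9, 3, 1, 8, 2, 7, 4, 6]]
train, test = chronological_split(observations, test_fraction=0.3)
print([t for t, _ in train])
print([t for t, _ in test])
```

The split point is a position in time rather than a random sample, so every training timestamp precedes every test timestamp.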

According to Hidden Technical Debt in Machine Learning Systems, a well-known paper published in 2015, the code accounts for only a small fraction of the effort in advanced analytics projects. The rest of the time is mostly spent on other considerations, such as data preparation and infrastructure.

The solutions to these challenges are very specific to your context. The aim in this chapter is to bring these considerations, as summarized in...

Summary

Time series are everywhere, and this chapter gave us an introduction to what they are, their components, and the challenges in working with them. We started with some simple code to explore time series, setting the foundation for further practice in upcoming chapters. The concepts discussed in this first chapter will be built upon to get us to the point of analyzing time series at scale by the end of this book.

Now that you understand the “what” for time series, in the next chapter, we will be looking at the “why,” which will pave the way to applications in various domains.

Further reading

This section serves as a repository of sources that can help you build on your understanding of the topic:


Key benefits

  • Quickly get started with your first models and explore the potential of Generative AI
  • Learn how to use Apache Spark and Databricks for scalable time series solutions
  • Establish best practices to ensure success from development to production and beyond
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

Written by Databricks Senior Solutions Architect Yoni Ramaswami, whose expertise in Data and AI has shaped innovative digital transformations across industries, this comprehensive guide bridges foundational concepts of time series analysis with the Spark framework and Databricks, preparing you to tackle real-world challenges with confidence. From preparing and processing large-scale time series datasets to building reliable models, this book offers practical techniques that scale effortlessly for big data environments. You’ll explore advanced topics such as scaling your analyses, deploying time series models into production, Generative AI, and leveraging Spark's latest features for cutting-edge applications across industries.

Packed with hands-on examples and industry-relevant use cases, this guide is perfect for data engineers, ML engineers, data scientists, and analysts looking to enhance their expertise in handling large-scale time series data. By the end of this book, you’ll have the skills to design and deploy robust, scalable time series models tailored to your unique project needs, equipping you to excel in the rapidly evolving world of big data analytics.

*Email sign-up and proof of purchase required

Who is this book for?

If you are a data engineer, ML engineer, data scientist, or analyst looking to enhance your skills in time series analysis with Apache Spark and Databricks, this book is for you. Whether you’re new to time series or an experienced practitioner, this guide provides valuable insights and techniques to improve your data processing capabilities. A basic understanding of Apache Spark is helpful, but no prior experience with time series analysis is required.

What you will learn

  • Understand the core concepts and architectures of Apache Spark
  • Clean and organize time series data
  • Choose the most suitable modeling approach for your use case
  • Gain expertise in building and training a variety of time series models
  • Explore ways to leverage Apache Spark and Databricks to scale your models
  • Deploy time series models in production
  • Integrate your time series solutions with big data tools for enhanced analytics
  • Leverage GenAI to enhance predictions and uncover patterns

Product Details

Publication date : Mar 28, 2025
Length: 302 pages
Edition : 1st
Language : English
ISBN-13 : 9781803247175



Table of Contents

17 Chapters
Part 1: Introduction to Time Series and Apache Spark
Chapter 1: What Are Time Series?
Chapter 2: Why Time Series Analysis?
Chapter 3: Introduction to Apache Spark
Part 2: From Data to Models
Chapter 4: End-to-End View of a Time Series Analysis Project
Chapter 5: Data Preparation
Chapter 6: Exploratory Data Analysis
Chapter 7: Building and Testing Models
Part 3: Scaling to Production and Beyond
Chapter 8: Going at Scale
Chapter 9: Going to Production
Chapter 10: Going Further with Apache Spark
Chapter 11: Recent Developments in Time Series Analysis
Chapter 12: Unlock Your Exclusive Benefits
Index
Other Books You May Enjoy

Customer reviews

Average rating: 3 out of 5 (1 rating)
Rating distribution
5 star 0%
4 star 0%
3 star 100%
2 star 0%
1 star 0%
Daniel Nov 22, 2025
Rating: 3 out of 5
The percentage of reading isn't right and I can't see the entire chapter; it's starting from the middle
Subscriber review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, clicking the link will download and open the PDF file directly. If you don't, save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance your need, as the reader, for a usable eBook with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the print book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made by credit card, debit card, or PayPal).
Where can I get support for an eBook?
  • If you experience a problem with using or installing Adobe Reader, contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.
