
You're reading from  Redis Stack for Application Modernization

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781837638185
Edition: 1st
Authors (2):
Luigi Fugaro

Luigi Fugaro's first encounter with computers was in the early 80s when he was a kid. He started with a Commodore Vic-20, passing through a Sinclair, a Commodore 64, and an Atari ST 1040, where he spent days and nights giving breath mints to Otis. In 1998, he started his career as a webmaster doing HTML, JavaScript, applets, and some graphics with Paint Shop Pro. He then switched to Delphi and Visual Basic before moving on to Java projects. He has developed all kinds of web applications, dealing with both backend and frontend frameworks. In 2012, he started working for Red Hat, where he is now an architect in the EMEA Middleware team. He has authored WildFly Cookbook and Mastering JBoss Enterprise Application Platform 7 for Packt Publishing.

Mirko Ortensi

Mirko Ortensi earned a degree in Electronic Engineering and a Master's degree in Software Engineering. Mirko's career has spanned several roles from Software Engineering to Customer Support, particularly centered around distributed database systems. As a Senior Technical Enablement Architect at Redis, Mirko shares technical knowledge about Redis's products and services.


Understanding Probabilistic Data Structures

The probabilistic data structures of Redis Stack are packed into a set of capabilities also known as Bloom filters. Such structures owe their name to Burton Howard Bloom, the computer scientist who introduced the concept of a probabilistic data structure in 1970 to address the problem of verifying whether an item belongs to a set. By using hashed representations of the data, a sufficiently good approximation can be achieved: false positives are allowed (a lookup may report that an item belongs to the set when it does not), but false negatives are not (when a lookup reports that an item is absent, it is definitely absent).

The Bloom filter has since become a widely used data structure in computer science. It is used in various applications, such as spell-checking, network routing, content filtering, and DNA sequence analysis.

Probabilistic data structures process large volumes of data in real time with minimal memory requirements. This chapter covers several types of probabilistic...

Technical requirements

To follow along with the examples in this chapter, you will need to install Redis Stack Server 7.2 or later in your development environment. Alternatively, you can create a Redis Cloud subscription with the free plan and use a managed Redis Stack database.

HyperLogLog

HyperLogLog was the first probabilistic data structure available in Redis. It builds on the LogLog algorithm published in 2003 by Marianne Durand and Philippe Flajolet, and was introduced in 2007 by Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Jérémie Meunier. HyperLogLog is an efficient solution for counting the number of unique occurrences in a set, which, in practice, is the cardinality of the set. This is especially useful when perfect accuracy is not required: the count is probabilistic but has a very low error. HyperLogLog does not store the items themselves: everything that is added passes through a hash function, so it is not possible to inspect or remove elements once they have been counted.

HyperLogLog is the simplest of the probabilistic data structures, and using it is as simple as adding elements with the PFADD command:

PFADD mortensi.com:1hnsn9n6kb:012023 IGaJ9c5KqYHFUEogCQWc
PFADD mortensi.com:1hnsn9n6kb:012023 XUq9br38-SYFOVpfN-vq
PFADD mortensi.com:1sneb4qq3t:012023 bmGteHMoGpAKbtk4X-9Y
PFADD mortensi.com:1sneb4qq3t:022023 IGaJ9c5KqYHFUEogCQWc

Using the former commands, we can track the...
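To make the mechanics concrete, here is a minimal Python sketch of the HyperLogLog idea: a hash splits each item into a register index and a bit pattern, each register keeps the maximum rank (position of the first 1-bit) it has seen, and the cardinality is estimated from the harmonic mean of the registers. This is an illustration only, not RedisBloom's implementation; the class name, the choice of MD5, and the parameter p=10 are all assumptions made for the example.

```python
import hashlib
import math

class ToyHyperLogLog:
    """Toy HyperLogLog with m = 2**p registers (illustrative only)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int(hashlib.md5(item.encode()).hexdigest(), 16)
        idx = h & (self.m - 1)      # low p bits select a register
        rest = h >> self.p          # remaining bits feed the rank
        rank = 1                    # position of the first 1-bit
        while rest & 1 == 0 and rank < 64:
            rank += 1
            rest >>= 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias correction, m >= 128
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            # small-range correction: fall back to linear counting
            est = self.m * math.log(self.m / zeros)
        return round(est)

hll = ToyHyperLogLog()
for i in range(1000):
    hll.add(f"user:{i}")
    hll.add(f"user:{i}")   # duplicates do not change the registers
```

Note that adding the same item twice leaves the registers untouched, which is exactly why HyperLogLog counts unique occurrences: the estimate stays close to 1,000 despite 2,000 additions.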

Bloom filter

A Bloom filter is one of the probabilistic data structures supported by Redis Stack and is used to test whether an item is a member of a set. It is valuable as a data deduplication solution – that is, for filtering duplicated data out of a set. It is a memory-efficient and fast data structure that uses a bit array and a set of hash functions to determine whether an item is in the set. Testing for membership can return “possibly in the set” or “definitely not in the set,” which means that false positives are possible, but false negatives are not. Approximation is part of everyday life, and digital computing is no different: think of the lossy compression algorithms for images (JPEG) or audio files (MP3) – we can still enjoy the media without even noticing the loss of quality. A Bloom filter simplifies the management and speed of solutions that require the existence...
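The bit array and hash functions can be sketched in a few lines of Python. This is a conceptual illustration under assumed parameters (a 1,024-bit array, 4 probe positions derived from one MD5 digest via double hashing), not RedisBloom's implementation:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash-derived positions per item."""

    def __init__(self, size=1024, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive k positions from one digest via double hashing
        digest = hashlib.md5(item.encode()).hexdigest()
        h1 = int(digest[:16], 16)
        h2 = int(digest[16:], 16)
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly in the set"; False means "definitely not"
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for word in ["redis", "bloom", "filter"]:
    bf.add(word)
```

Because adding an item only ever sets bits, an added item can never be reported absent – that is the "no false negatives" guarantee; a never-added item can, rarely, hit only set bits and be reported present.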

Cuckoo filters

Cuckoo filters, published in 2014, are an evolution of Bloom filters that addresses their limitations, especially around collision handling. The filter inherits its name from the cuckoo bird, famous for laying its eggs in the nests of other bird species and leaving them to be raised by the host bird; in doing so, the cuckoo pushes the other eggs or chicks out of the nest. This eviction behavior inspired the implementation of Cuckoo filters. Unlike Bloom filters, Cuckoo filters use a fingerprint-based technique that handles collisions quickly and efficiently and reduces the rate of false positives while maintaining space requirements comparable to Bloom filters; they also support deleting items.

Instead of storing the hashed version of an item as Bloom filters do, Cuckoo filters use multiple locations, or buckets, to store the fingerprint representation of the item. When a new item is added to the filter and a collision occurs at a candidate bucket, the existing item...
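The bucket-and-eviction scheme can be sketched as follows. This is a simplified, illustrative partial-key cuckoo filter (all names and parameters are assumptions for the example; the XOR trick for the alternate bucket requires a power-of-two bucket count), not RedisBloom's code:

```python
import hashlib
import random

def _hash(value):
    """Stable integer hash of a string."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class CuckooFilter:
    """Toy cuckoo filter: items are stored as short fingerprints in one of
    two candidate buckets; num_buckets must be a power of two."""

    def __init__(self, num_buckets=128, bucket_size=4, max_kicks=500):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item):
        return _hash("fp:" + item) % 65535 + 1   # 16-bit, non-zero

    def _indices(self, item, fp):
        i1 = _hash(item) % self.num_buckets
        i2 = (i1 ^ _hash(str(fp))) % self.num_buckets  # partial-key hashing
        return i1, i2

    def add(self, item):
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a resident fingerprint (the "cuckoo" step)
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = (i ^ _hash(str(fp))) % self.num_buckets  # evicted fp's other bucket
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is considered full

    def contains(self, item):
        fp = self._fingerprint(item)
        i1, i2 = self._indices(item, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        fp = self._fingerprint(item)
        for i in self._indices(item, fp):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Because the alternate bucket is derived from the fingerprint alone, a stored fingerprint can be relocated without knowing the original item – which is what makes both eviction and deletion possible.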

Count-Min sketch

The Count-Min sketch probabilistic data structure, like HyperLogLog, counts the items that have been added, with the difference that the Count-Min sketch counts the number of times specific items have been added – that is, their frequency.

When using a Count-Min sketch data structure, any frequency counts below a predetermined threshold (established by the error rate) should be disregarded. The Count-Min sketch serves as a valuable tool for counting element frequencies in a data stream, especially when dealing with higher counts. Nevertheless, very low counts are often perceived as noise and are typically discarded in this context. To start using the data structure, we have the option to initialize it either based on the probabilities to be maintained or on the desired dimensions. It is important to note that the dimensions of the Count-Min sketch play a significant role because to merge two Count-Min sketches, they must have identical dimensions.

We can...
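The core idea – a table of depth rows, one hash per row, and a minimum taken across rows at query time – fits in a short sketch. This is an illustrative implementation under assumed dimensions (width 1,024, depth 4), not RedisBloom's code:

```python
import hashlib

class CountMinSketch:
    """Toy Count-Min sketch: depth rows of width counters; each row hashes
    the item to one counter, and a query takes the minimum across rows."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def query(self, item):
        # The minimum across rows bounds the overestimate caused by collisions
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))

cms = CountMinSketch()
for _ in range(5):
    cms.add("apple")
cms.add("banana", 2)
```

Collisions can only inflate a counter, never decrease it, so the sketch never underestimates a frequency – which is why low counts are the ones most distorted by noise.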

Top-K

The Top-K data structure is used to keep track of the items with the highest rank, such as the top players in a leaderboard. The ranking, or score, is often based on a count of how many times an item appears in the data source (such as a stream), making the data structure ideal for identifying the elements with the highest frequency. Common use cases include leaderboards, trending entities in a system, and detecting network anomalies and DDoS attacks, where the Top-K data structure can help answer questions such as “Which addresses or IPs have the highest surge in the flow of requests?”

Let’s dive into an example of using the Top-K data structure and insert a few items into it. First, we must initialize it using the following command:

TOPK.RESERVE key topk [width depth decay]

In addition to key, which specifies the Top-K name, topk indicates the number of top items we want to keep track of, and width indicates the number...
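As a conceptual stand-in for how a bounded set of counters can surface heavy hitters from a stream, here is the classic Misra-Gries summary in Python. Note this is not the algorithm RedisBloom's Top-K uses internally; it is a simpler illustration of the same goal, with a useful guarantee: any item occurring more than n/k times in a stream of n items is certain to survive in the result.

```python
def heavy_hitters(stream, k):
    """Misra-Gries summary: keeps at most k-1 counters. Any item whose
    frequency exceeds len(stream)/k is guaranteed to appear in the result."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that hit zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Hypothetical request stream: two "hot" IPs among many one-off clients
stream = (["10.0.0.1"] * 50 + ["10.0.0.2"] * 30
          + [f"10.0.0.{i}" for i in range(3, 23)])
top = heavy_hitters(stream, 4)
```

With k=4 and 100 stream entries, any IP seen more than 25 times must appear in the summary, so the two hot IPs are guaranteed to be reported.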

t-digest

t-digest is a data structure for estimating quantiles from a data stream or a large dataset using a compact sketch.

The t-digest data structure enables the resolution of various inquiries, such as “What proportion of values in the data stream is less than a specific value?” and “How many values in the data stream are below a given threshold?” To better understand t-digest, we need to define quantiles and percentiles.

A quantile is a value, or cut point, that divides a dataset into intervals with equal proportions or frequencies of observations. The median, for example, is a quantile, as it divides the dataset in half (that is, 50% of observations fall below it and 50% above).

A percentile represents a specific position within a dataset, where a certain percentage of the data falls below that position. For example, if a value is at the 75th percentile of a dataset, it means that 75% of the data falls below that value. Percentiles are...
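To ground these definitions, exact quantiles on a small dataset can be computed with Python's standard library (t-digest approximates the same answers on streams too large to sort). The sample values here are arbitrary:

```python
from statistics import median, quantiles

data = [12, 7, 3, 41, 25, 18, 33, 9, 27, 15]

# The median is the 0.5 quantile: half the observations fall below it
print(median(data))          # 16.5

# Quartiles are the cut points dividing the data into four equal-frequency
# intervals; q3 is the 75th percentile
q1, q2, q3 = quantiles(data, n=4)
```

The second quartile coincides with the median by definition, which is a handy sanity check when working with any quantile estimator.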

Summary

In this chapter, we introduced probabilistic data structures, also known as “sketches.” We explained their strengths in different areas, such as fraud detection, online gaming, network and device analysis, and social media trend analysis. We also learned how to choose the right data structure for each use case and saw that such solutions are often an acceptable compromise between accuracy, performance, and resource usage, especially when dealing with exceptionally large datasets. In addition, we discussed how accuracy can be fine-tuned and what guarantees hold when an item is tested against a specific sketch. To learn more about probabilistic data structures and how to achieve acceptable accuracy, refer to the Sizing section in the documentation at https://redis.io/docs/data-types/probabilistic/.

In Chapter 9, The Programmability of Redis Stack, we will dive into the strengths of Redis Stack...
