DataPro is a weekly, expert-curated newsletter trusted by 120k+ global data professionals. Built by data practitioners, it blends first-hand industry experience with practical insights and peer-driven learning.
Introduction
Large Language Models (LLMs) have transformed the landscape of artificial intelligence, redefining how machines understand and generate human language. From early statistical methods to the breakthrough of Transformer architectures, LLMs such as GPT, BERT, and T5 have unlocked unprecedented capabilities in natural language processing (NLP). As organizations increasingly rely on both structured and unstructured data, a powerful new paradigm is emerging: the integration of LLMs with Graph Machine Learning (GraphML). This combination enables systems to leverage both deep contextual language understanding and rich relational data, paving the way for more accurate, scalable, and intelligent AI applications across domains like search, recommendation systems, and knowledge graphs.
LLMs: an overview
In the rapidly evolving field of artificial intelligence, LLMs have significantly advanced natural language processing (NLP) and understanding. These models, characterized by their extensive number of parameters and trained on large datasets, have demonstrated remarkable capabilities across a wide set of language-related tasks.
The journey of language models began with statistical approaches that relied on probabilistic methods to predict word sequences. These early models, while laying the foundations, were limited by their reliance on fixed-size context windows and their inability to capture long-range dependencies. However, as we also discussed in Chapter 4, Unsupervised Graph Learning, with the advent of neural networks, the field underwent a significant shift, introducing models capable of learning word embeddings. To improve the ability to capture long-range dependencies, the initial neural network models were based on the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures, which are forms of Recurrent Neural Networks (RNNs). However, a pivotal moment occurred with the introduction of the Transformer architecture by Vaswani et al. in 2017. Unlike its predecessors, the Transformer model utilized self-attention mechanisms, enabling it to consider the entire context of a sentence without the sequential constraints inherent in RNNs. This innovation facilitated the development of models capable of processing and generating text in a more coherent and fluent way.
Building upon the Transformer architecture, researchers scaled models to unprecedented sizes, leading to the emergence of LLMs such as Google’s BERT and T5 and OpenAI’s GPT series, including more recent models such as GPT-3 and GPT-4.
In a nutshell, training LLMs involves optimizing a large number of parameters on very large datasets. This process, known as pretraining, typically employs unsupervised learning objectives, such as predicting missing words in a sentence (masked language modeling) or forecasting subsequent words (causal language modeling). As a side effect, the pre-training phase lets the model learn and “understand” a language, resulting in a remarkable ability to generalize across various tasks, often achieving state-of-the-art performance. LLMs have demonstrated proficiency in a diverse array of applications, reflecting their versatility and depth of language understanding. Key areas include text generation, language translation, question answering, and summarization, among many others.
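To make the two pretraining objectives mentioned above concrete, here is a toy sketch (not from the book) of how training examples are constructed for causal language modeling versus masked language modeling. The tokenization and mask token are illustrative assumptions:

```python
MASK = "[MASK]"

def causal_lm_pairs(tokens):
    """Causal LM: predict each next token from the prefix before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_pair(tokens, mask_index):
    """Masked LM: hide one token and ask the model to recover it."""
    corrupted = tokens[:mask_index] + [MASK] + tokens[mask_index + 1:]
    return corrupted, tokens[mask_index]

sentence = ["the", "cat", "sat", "on", "the", "mat"]

# Causal: (['the'], 'cat'), (['the', 'cat'], 'sat'), ...
print(causal_lm_pairs(sentence)[1])
# Masked: (['the', 'cat', '[MASK]', 'on', 'the', 'mat'], 'sat')
print(masked_lm_pair(sentence, 2))
```

A real pretraining pipeline builds millions of such (input, target) pairs and optimizes the model to assign high probability to each target.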
Given the strengths of LLMs in unstructured text processing and generative tasks, an exciting frontier emerges when we consider their integration with graphs. While LLMs excel in understanding and generating natural language, graphs are particularly powerful for representing and analyzing structured relationships between entities. In the rest of the book, we
will see examples of how we can take advantage of both.
Why combine GraphML with LLMs?
As we have learned throughout this book, GraphML excels at representing and analyzing structured data such as knowledge graphs, social networks, chemical structures, and so on. It is extremely useful in situations where exploiting relationships between entities is crucial for achieving good performance. LLMs, on the other hand, are particularly good at interpreting unstructured text, offering generative skills, reasoning, and profound contextual awareness. They excel at language-based activities such as content creation, question answering, and summarization.
Despite their impressive capabilities, LLMs are not without limitations. One of the most significant challenges is the problem of hallucination, where an LLM generates factually incorrect or misleading information that appears plausible. This is particularly problematic in domains requiring high factual accuracy, such as healthcare, finance, and legal applications. To mitigate hallucinations and enhance the reliability of LLM outputs, Retrieval-Augmented Generation (RAG) has emerged as a powerful technique. RAG works by dynamically retrieving relevant information from an external knowledge source (such as a knowledge graph) at inference time, rather than just relying on pre-trained knowledge. This approach ensures that the model has access to up-to-date and accurate data, grounding answers in verified information rather than generating content purely from its internal representations.
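The RAG loop described above can be sketched in a few lines. In this toy version (an assumption, not the book's implementation), a keyword-overlap retriever stands in for a real vector store or knowledge-graph lookup, and the final prompt would be passed to an actual LLM:

```python
# Tiny external knowledge source standing in for a knowledge graph or
# document store (illustrative content only).
KNOWLEDGE_BASE = [
    "Aspirin is used to reduce pain, fever, and inflammation.",
    "The Transformer architecture was introduced by Vaswani et al. in 2017.",
]

def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, corpus):
    """Ground the model by prepending retrieved context to the question."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("When was the Transformer architecture introduced?",
                      KNOWLEDGE_BASE)
print(prompt)
```

Because the answer is generated from the retrieved context rather than from the model's parameters alone, hallucinations are less likely, and the knowledge source can be updated without retraining the model.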
Recent advancements highlight how integrating GraphML with LLMs can drive significant innovation, enabling the development of applications that require both rich semantic understanding and relational analysis. For instance:
- Graph-Augmented Question Answering: LLMs can leverage knowledge graphs to answer domain-specific questions with factual accuracy.
- Node Embedding Generation: State-of-the-art frameworks such as GraphGPT use LLMs to generate node embeddings directly from textual data, enabling seamless integration with graph structures.
- Knowledge Graph Construction and Enhancement: Recent applications have shown how LLMs can be used to enrich knowledge graphs, where LLMs are used to extract semantic relationships and entities from text to enhance existing graph data.
Therefore, by bridging the gap between structured knowledge and natural language understanding, the synergy between GraphML and LLMs paves the way for more accurate, explainable, and intelligent systems.
In the next section, we will explore the state-of-the-art trends in combining GraphML and LLMs, as well as the current challenges.
State-of-the-art trends and challenges
Before diving into specific examples, it is crucial to understand the current landscape of
GraphML and LLM integration. According to a recent survey by Jin et al. (https://arxiv.org/abs/2312.02783, 2024), the applications can be grouped into three main scenarios:
- Pure Graphs: These are graphs that lack associated textual information. Examples include social networks, traffic networks, and protein interaction networks. In such cases, the focus is on leveraging LLMs to process and analyze the structural aspects of the graph data.
- Text-Attributed Graphs: In these graphs, nodes or edges are enriched with textual attributes. For instance, in academic networks, papers (nodes) come with titles and abstracts, while authors (nodes) have profiles. E-commerce networks also fall into this category, where products (nodes) have descriptions, and user interactions (edges) may include reviews. The challenge here is to effectively combine the textual content with the graph’s structural information.
- Text-Paired Graphs: This scenario involves graphs that are paired with separate textual descriptions or documents. Unlike text-attributed graphs, where text is embedded within the graph as attributes, text-paired graphs treat the graph and text as distinct but related entities. A pertinent example is molecular graphs accompanied by detailed textual descriptions of their properties. The objective is to align and integrate the information from both the graph structure and the associated text to enhance understanding and analysis.
To effectively utilize LLMs in these scenarios, three primary techniques can be used: LLMs as predictors, LLMs as encoders, and LLMs as aligners. Let’s see these approaches one by one.
LLMs as predictors
The simplest and most direct approach is to use LLMs as predictors. In this paradigm, the LLM operates as a tool to infer outcomes directly from graph data. Imagine a scenario where textual information is either minimal or entirely absent (pure graphs). In this case, you can transform the graph data into a format that the LLM can process, such as converting graph structures into sequences or textual descriptions.
For instance, consider a simple social network graph where nodes represent people and edges indicate friendships (Figure 12.1). This structure can be converted into a textual narrative, such as Alice is linked with Bob. An LLM can then process this narrative to predict new relationships or infer additional attributes about the nodes, such as professional interests or potential connections.

Figure 12.1: Examples of how graphs can be converted to text narratives
Once the data is prepared, the LLM can be fine-tuned or prompted to perform specific tasks. These might include predicting node classifications, such as identifying the role of individuals in a social network, or link predictions, such as forecasting interactions between entities. In molecular research, LLMs as predictors can help determine the properties of chemical compounds based solely on their structural representations.
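The graph-to-text step that underpins this predictor approach can be sketched as follows. The names and relation wording are illustrative assumptions, not the book's exact graph2text formalism:

```python
# Edge list of a toy friendship graph (illustrative names).
friendships = [("Alice", "Bob"), ("Bob", "Carol")]

def graph_to_narrative(edges, relation="is linked with"):
    """Serialize each edge of a graph into a plain-language sentence."""
    return [f"{src} {relation} {dst}." for src, dst in edges]

narrative = " ".join(graph_to_narrative(friendships))
print(narrative)  # Alice is linked with Bob. Bob is linked with Carol.

# The narrative plus a task instruction forms the prompt for the LLM.
prompt = narrative + " Based on these friendships, who might Alice meet next?"
```

Note how the serialized text grows linearly with the number of edges, which is exactly the scalability concern discussed below for large graphs.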
One advantage of this approach is its simplicity: LLMs can be applied directly to graph data without requiring extensive preprocessing or specialized models. However, this simplicity can also be a limitation. Purely structural information might not always be sufficient for complex tasks, particularly when additional contextual or textual data is available but not leveraged. Moreover, scalability and cost must be considered: encoding entire graphs as text can lead to an explosion of sentences, making inference expensive, potentially inefficient, and sometimes impossible (for example, if the maximum number of words an LLM can process at once is too small to contain the whole graph). Performance may also be limited, as this approach is similar to providing an LLM with a structured dataset and expecting accurate predictions without tailored adaptations. For this reason, more complex graph2text formalisms can be designed, incorporating node/edge descriptions into textual narratives while balancing efficiency and accuracy.
LLMs as encoders
When graphs are enriched with textual attributes, the LLM as encoder approach becomes particularly powerful. Here, the LLM is tasked with processing and encoding the textual information associated with nodes or edges, producing meaningful representations that can be integrated with the graph’s structural features. These embeddings are then integrated into the graph through proper algorithms such as graph neural networks, which process the combined representation to perform downstream tasks.
This hybrid representation combines the strengths of both modalities, capturing the nuances of text alongside the relationships encoded in the graph. As depicted in Figure 12.2, each node could have attributes, such as a name and a brief bio for a node representing a person, while the edges might be annotated with information about the nature of the relation, e.g., close friend or colleague for a graph representing social networks. These features can be converted into a textual narrative, such as Alice, a software engineer, is close friends with Bob, a data scientist.

Figure 12.2: Examples of how LLMs can be used as encoders for node attributes
Other examples include academic citation networks, where papers (nodes) come with titles, abstracts, and keywords. An LLM can process these textual attributes to generate embeddings that encapsulate their semantic content. These embeddings are then combined with graph-specific features, such as the citation relationships between papers, to create a unified representation. Similarly, in e-commerce platforms, product descriptions and user reviews can be encoded by LLMs to enhance product similarity graphs or user behavior analysis.
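The encoder pipeline can be sketched end to end as follows. As a stated assumption, a deterministic hashing "embedder" stands in for a real LLM encoder, and a single neighbor-averaging step stands in for a trained GNN; in practice you would use an actual language model for the embeddings and a library such as a GNN framework for the structural part:

```python
import numpy as np

def embed_text(text, dim=8):
    """Toy bag-of-words hashing embedding (stand-in for an LLM encoder)."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def message_pass(node_feats, edges):
    """One GNN-style step: average each node's features with its neighbors'."""
    gathered = {n: [v] for n, v in node_feats.items()}
    for a, b in edges:
        gathered[a].append(node_feats[b])
        gathered[b].append(node_feats[a])
    return {n: np.mean(vs, axis=0) for n, vs in gathered.items()}

# Toy citation network: paper titles (text attributes) plus citation edges.
texts = {"p1": "graph neural networks survey",
         "p2": "attention is all you need",
         "p3": "graph attention networks"}
citations = [("p1", "p3"), ("p2", "p3")]

feats = {n: embed_text(t) for n, t in texts.items()}   # text encoding
fused = message_pass(feats, citations)                  # text + structure
```

The resulting `fused` vectors mix each paper's semantic content with that of the papers it cites, which is the essence of the hybrid representation described above.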
It is worth noting that the process of using LLMs as encoders typically involves fine-tuning the LLM on domain-specific textual data to ensure that the embeddings accurately reflect the requirements of the task.
This encoder approach offers significant benefits. By leveraging textual data, it captures context and nuances that purely structural methods might miss. It is particularly effective in scenarios where textual attributes can provide critical insights, such as identifying the themes of academic papers or understanding user preferences in recommendation systems.
LLMs as aligners
The goal here is to align and integrate the information from both structure and textual descriptions (or accompanying documents, in the case of text-paired graphs), enabling a comprehensive analysis that leverages the strengths of each. This can be achieved, for example, by finding a shared latent space or a semantic mapping that connects the two modalities. Such an approach might involve designing models that jointly optimize both modalities or using attention mechanisms to focus on the most relevant parts of each input.
In more detail, the synergy between textual encoding (handled by the LLM) and graph structure encoding (handled by, for example, a GNN) can typically be achieved in two ways:
1. Prediction Alignment: Iterative training where LLMs and GNNs generate pseudo-labels to guide each other’s learning
2. Latent Space Alignment: Contrastive learning to align the latent representation of the text and the graph structure in a shared space (e.g., Figure 12.3)

Figure 12.3: Graphs and associated texts can be embedded in a shared latent space
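The latent space alignment idea can be illustrated with an InfoNCE-style contrastive loss. As a stated assumption, random vectors stand in for the LLM text embeddings and the GNN graph embeddings, and only the loss computation is shown, not the training loop that would minimize it:

```python
import numpy as np

def info_nce_loss(text_emb, graph_emb, temperature=0.1):
    """Contrastive loss pulling matched (text, graph) pairs together.

    Row i of `text_emb` is paired with row i of `graph_emb`; every other
    row in the batch acts as a negative example.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    logits = t @ g.T / temperature                       # pairwise similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # positives on diagonal

rng = np.random.default_rng(0)
text = rng.normal(size=(4, 16))                 # stand-in LLM embeddings
graph = text + 0.01 * rng.normal(size=(4, 16))  # nearly aligned GNN embeddings

aligned_loss = info_nce_loss(text, graph)
random_loss = info_nce_loss(text, rng.normal(size=(4, 16)))
```

Training the two encoders to minimize this loss pushes each graph embedding toward the embedding of its paired text, producing the shared latent space depicted in Figure 12.3, so `aligned_loss` is much smaller than `random_loss` here.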
For example, in molecular research, a molecular graph might represent the structure of a compound, while a textual description provides information about its properties, synthesis, or applications. In this context, an LLM can be used to process the text to extract relevant features and align these with the structural characteristics of the graph, enabling tasks such as property prediction or drug discovery.
As you can imagine, this approach is particularly powerful in interdisciplinary fields where graphs and text provide complementary points of view. In computational social science, for instance, social graphs representing interactions between individuals can be aligned with news articles, social media posts, or other textual data to study the spread of information or public sentiment. Similarly, in e-commerce, user behavior graphs can be integrated with textual reviews to improve personalized recommendations.
Now that we have a clearer understanding of the LLM and graph landscape, let’s dive into a practical example of how this integration works. We will explore this in the next section.
Conclusion
In summary, the convergence of LLMs and GraphML represents a major step forward in building next-generation AI systems that combine natural language intelligence with graph-based relational reasoning. By addressing challenges such as hallucination through techniques like Retrieval-Augmented Generation (RAG), and by applying frameworks where LLMs act as predictors, encoders, and aligners, this hybrid approach opens new possibilities for question answering, recommendation systems, knowledge graph enhancement, and explainable AI.
This article is an excerpt from the book Graph Machine Learning – Second Edition. Readers who want to explore these concepts in greater depth, along with practical examples and broader GraphML applications, can continue reading in the book.

Author Bio
Aldo Marzullo received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2016. During his studies, he developed a solid background in several areas, including algorithm design, graph theory, and machine learning. In January 2020, he received his joint Ph.D. from the University of Calabria and Université Claude Bernard Lyon 1 (Lyon, France), with a thesis titled Deep Learning and Graph Theory for Brain Connectivity Analysis in Multiple Sclerosis. He is currently a postdoctoral researcher and collaborates with several international institutions.
Enrico Deusebio is currently working as an engineering manager at Canonical, the publisher of Ubuntu, to promote open source technologies in the data and AI space and to make them more accessible to everyone. He has been working with data and distributed computing for over 15 years, both in an academic and industrial context, helping organizations implement data-driven strategies and build AI-powered solutions. He has collaborated and worked with top-tier universities, such as the University of Cambridge, the University of Turin, and the Royal Institute of Technology (KTH) in Stockholm, where he obtained a Ph.D. in 2014. He holds a B.Sc. and an M.Sc. degree in aerospace engineering from Politecnico di Torino.
Claudio Stamile received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2013 and, in September 2017, he received his joint Ph.D. from KU Leuven (Leuven, Belgium) and Université Claude Bernard Lyon 1 (Lyon, France). During his career, he developed a solid background in AI, graph theory, and machine learning with a focus on the biomedical field.