How-To Tutorials


ChatGPT for Healthcare

Amita Kapoor
05 Sep 2023
9 min read
Introduction

Meet ChatGPT: OpenAI's marvelously verbose chatbot, trained on a veritable Everest of text and code. Think of it as your go-to digital polymath, fluent in language translation, a whiz at whipping up creative content, and ever-eager to dispense knowledge on everything from quantum physics to quinoa recipes. Ready to dial in the healthcare lens? This article is your rollercoaster ride through the trials, triumphs, and tangled ethical conundrums of ChatGPT in medicine. From game-changing potential to challenges as stubborn as symptoms, we've got it all. So whether you're a seasoned healthcare pro or a tech-savvy newbie, buckle up. Will ChatGPT be healthcare's new MVP or get benched? Stick around, and let's find out together.

Doctor in Your Pocket? Unpacking the Potential of ChatGPT in Healthcare

Modern healthcare always seeks innovation to make things smoother and more personal. Enter ChatGPT. While not a stand-in for a doctor, this text-based AI is causing ripples from customer service to content. Below are various scenarios where ChatGPT can be leveraged in its original form or through fine-tuned APIs.

Pre-Consultation Screeners - ChatGPT-Enabled Triage

Before conversational AI, healthcare looked into computational diagnostic aids like the 1960s' Dendral, initially built for mass spectrometry, which inspired later medical systems. The 1970s brought MYCIN, designed for diagnosing bacterial infections and suggesting antibiotics. However, these early systems used inflexible "if-then" rules and lacked adaptability for nuanced human interaction. Fast-forward to today's more sophisticated digital triage platforms, and we still find remnants of these rule-based systems. While significantly more advanced, many of these platforms operate within the bounds of scripted pathways, leading to linear and often inflexible patient interactions. This rigidity can result in an inadequate capture of patient nuances, a critical aspect often needed for effective medical triage.

The ChatGPT Advantage: Flexibility and Natural Engagement

ChatGPT is a conversational agent with the capacity for more flexible, natural interactions due to its advanced Natural Language Understanding (NLU). Unlike conventional screeners with limited predefined pathways, ChatGPT can adapt to a broader range of patient inputs, making the pre-consultation phase more dynamic and patient-centric. It offers:

- Adaptive Questioning: Unlike traditional systems that follow a strict query pathway, ChatGPT can adapt its questions based on prior patient responses, potentially unearthing critical details.
- Contextual Understanding: Its advanced NLU allows it to understand colloquial language, idioms, and contextual cues that more rigid systems may miss.
- Data Synthesis: ChatGPT's ability to process and summarise information can result in a more cohesive pre-consultation report for healthcare providers, aiding in a more effective diagnosis and treatment strategy.

Using LLM-based bots like ChatGPT offers a more dynamic, flexible, and engaging approach to pre-consultation screening, optimising patient experience and healthcare provider efficacy. Below is sample code that you can use to play around:

import openai
import os

# Initialize the OpenAI API client
api_key = os.environ.get("OPENAI_API_KEY")  # Retrieve the API key from environment variables
openai.api_key = api_key  # Set the API key

# Prepare the list of messages
messages = [
    {"role": "system", "content": "You are a pre-consultation healthcare screener. Assist the user in gathering basic symptoms before their doctor visit."},
    {"role": "user", "content": "I've been feeling exhausted lately and have frequent headaches."}
]

# API parameters
model = "gpt-3.5-turbo"  # Choose the appropriate model
max_tokens = 150         # Limit the response length

# Make the API call
response = openai.ChatCompletion.create(
    model=model,
    messages=messages,
    max_tokens=max_tokens
)

# Extract and print the chatbot's reply
chatbot_reply = response['choices'][0]['message']['content']
print("ChatGPT: ", chatbot_reply)

And here is the ChatGPT response:
Mental Health Companionship

The escalating demand for mental health services has increased focus on employing technology as supplemental support. While it is imperative to clarify that ChatGPT is not a substitute for qualified mental health practitioners, the platform can serve as an initial point of contact for individuals experiencing non-critical emotional distress or minor stress and anxiety. Utilizing advanced NLU and fine-tuned algorithms, ChatGPT provides an opportunity for immediate emotional support, particularly during non-operational hours when traditional services may be inaccessible. ChatGPT can be fine-tuned to handle the sensitivities inherent in mental health discussions, thereby adhering to ethically responsible boundaries while providing immediate, albeit preliminary, support.

ChatGPT offers real-time text support, serving as a bridge to professional help. Its advanced NLU understands emotional nuances, ensuring personalized interactions. Beyond this, ChatGPT recommends vetted mental health resources and coping techniques. For instance, if you're anxious outside clinical hours, it suggests immediate stress management tactics. And if you're hesitant about professional consultation, ChatGPT helps guide and reassure your decision.

Let us now see how, by just changing the prompt, we can use the same code as the ChatGPT-enabled triage to build a mental health companion:

messages = [
    {
        "role": "system",
        "content": "You are a virtual mental health companion. Your primary role is to provide a supportive environment for the user. Listen actively, offer general coping strategies, and identify emotional patterns or concerns. Remember, you cannot replace professional mental health care, but can act as an interim resource. Always prioritise the user's safety and recommend seeking professional help if the need arises. Be aware of various emotional and mental scenarios, from stress and anxiety to deeper emotional concerns. Remain non-judgmental, empathetic, and consistently supportive."
    },
    {
        "role": "user",
        "content": "I've had a long and stressful day at work. Sometimes, it just feels like everything is piling up and I can't catch a break. I need some strategies to unwind and relax."
    }
]

And here is the golden advice from ChatGPT:

Providing immediate emotional support and resource guidance can be a preliminary touchpoint for those dealing with minor stress and anxiety, particularly when conventional support mechanisms are unavailable.
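Since the triage screener and this mental health companion differ only in their prompts, the surrounding API boilerplate can be factored into a small helper. The sketch below is our own illustration rather than the article's code; it assumes the same pre-1.0 openai Python client and OPENAI_API_KEY environment variable as the triage example, and the function name chat_with_system_prompt is hypothetical.

import os
import openai

openai.api_key = os.environ.get("OPENAI_API_KEY")

def chat_with_system_prompt(system_prompt, user_message, model="gpt-3.5-turbo", max_tokens=300):
    """Send one user message to ChatGPT under a given system prompt and return the reply."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        max_tokens=max_tokens,
    )
    return response["choices"][0]["message"]["content"]

# The triage, mental health, and virtual health assistant examples differ only in these two arguments.
print(chat_with_system_prompt(
    "You are a virtual mental health companion. Offer general coping strategies and recommend professional help when needed.",
    "I've had a long and stressful day at work. I need some strategies to unwind and relax.",
))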
Virtual Health Assistants

In the evolving healthcare landscape, automation and artificial intelligence (AI) are increasingly being leveraged to enhance efficiency and patient care. One such application is the utilization of Virtual Health Assistants, designed to manage administrative overhead and provide informational support empathetically. The integration of ChatGPT via OpenAI's API into telehealth platforms marks a significant advancement in this domain, offering capabilities far surpassing traditional rule-based or keyword-driven virtual assistants.

ChatGPT boasts a customizable framework ideal for healthcare, characterized by its contextual adaptability for personalized user experiences, vast informational accuracy, and multi-functional capability that interfaces with digital health tools while upholding medical guidelines. In contrast, traditional Virtual Health Assistants, reliant on rule-based systems, suffer from scalability issues, rigid interactions, and a narrow functional scope. ChatGPT stands out by simplifying medical jargon, automating administrative chores, and ensuring a seamless healthcare journey, bridging pre-consultation to post-treatment, all by synthesizing data from diverse health platforms.

Now, let's explore how tweaking the prompt allows us to repurpose the previous code to create a virtual health assistant.

messages = [
    {
        "role": "system",
        "content": "You are a Virtual Health Assistant (VHA). Your primary function is to assist users in navigating the healthcare landscape. Offer guidance on general health queries, facilitate appointment scheduling, and provide informational insights on medical terminologies. While you're equipped with a broad knowledge base, it's crucial to remind users that your responses are not a substitute for professional medical advice or diagnosis. Prioritise user safety, and when in doubt, recommend that they seek direct consultation from healthcare professionals. Be empathetic, patient-centric, and uphold the highest standards of medical data privacy and security in every interaction."
    },
    {
        "role": "user",
        "content": "The doctor has recommended an Intestinal Perforation Surgery for me, scheduled for Sunday. I'm quite anxious about it. How can I best prepare mentally and physically?"
    }
]

Straight from ChatGPT's treasure trove of advice:

So there you have it. Virtual Health Assistants might not have a medical degree, but they offer the next best thing: a responsive, informative, and competent digital sidekick to guide you through the healthcare labyrinth, leaving doctors free to focus on what really matters: your health.

Key Contributions

- Patient Engagement: Utilising advanced Natural Language Understanding (NLU) capabilities, ChatGPT can facilitate more nuanced and personalised interactions, thus enriching the overall patient experience.
- Administrative Efficiency: ChatGPT can significantly mitigate the administrative load on healthcare staff by automating routine tasks such as appointment scheduling and informational queries.
- Preventative Measures: While not a diagnostic tool, ChatGPT's capacity to provide general health information and recommend further professional consultation can contribute indirectly to early preventative care.

Potential Concerns and Solutions

- Data Security and Privacy: ChatGPT, in its current form, does not fully meet healthcare data security requirements. Solution: Advanced encryption and secure API interfaces must be implemented to achieve HIPAA compliance.
- Clinical Misinformation: While ChatGPT can provide general advice, there are limitations to the clinical validity of its responses. Solution: It is critical that all medical advice provided by ChatGPT is cross-referenced with up-to-date clinical guidelines and reviewed by medical professionals for accuracy.
- Ethical Considerations: The impersonal nature of a machine providing health-related advice could potentially result in a lack of emotional sensitivity. Solution: Ethical guidelines must be established for the algorithm, possibly integrating a 'red flag' mechanism that alerts human operators when sensitive or complex issues arise that require a more nuanced touch.
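As a rough illustration of what such a 'red flag' mechanism could look like, the sketch below screens a user message against a small keyword list before it reaches the model and escalates to a human operator on a match. This is our own minimal example, not part of the original article; the keyword list and the notify_human_operator function are hypothetical placeholders, and a production system would rely on a proper safety classifier or moderation service rather than keywords alone.

# Minimal sketch of a 'red flag' escalation check (illustrative only).
RED_FLAG_KEYWORDS = ["suicide", "self-harm", "overdose", "chest pain", "can't breathe"]

def notify_human_operator(message):
    # Placeholder: in a real deployment this would page on-call clinical staff.
    print(f"[ALERT] Escalating to a human operator: {message}")

def screen_message(user_message):
    """Return True if the chatbot may handle the message, False if it was escalated."""
    lowered = user_message.lower()
    if any(keyword in lowered for keyword in RED_FLAG_KEYWORDS):
        notify_human_operator(user_message)
        return False
    return True

if screen_message("I've been feeling exhausted lately and have frequent headaches."):
    print("Safe to route to the ChatGPT screener.")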
Conclusion

ChatGPT, while powerful, isn't a replacement for the expertise of healthcare professionals. Instead, it serves as an enhancing tool within the healthcare sector. Beyond aiding professionals, ChatGPT can increase patient engagement, reduce administrative burdens, and help in preliminary health assessments. Its broader applications include transcribing medical discussions, translating medical information across languages, and simplifying complex medical terms for better patient comprehension. For medical training, it can mimic patient scenarios, aiding in skill development. Furthermore, ChatGPT can assist in research by navigating medical literature, conserving crucial time. However, its capabilities should always be seen as complementary, never substituting the invaluable care from healthcare professionals.

Author Bio

Amita Kapoor is an accomplished AI consultant and educator with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita retired early and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. After her retirement, Amita founded NePeur, a company providing data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

AI_Distilled #15: OpenAI Unveils ChatGPT Enterprise, Code Llama by Meta, VulcanSQL from Hugging Face, Microsoft's "Algorithm of Thoughts", Google DeepMind's SynthID

Merlyn Shelley
31 Aug 2023
14 min read
👋 Hello,

"[AI] will touch every sector, every industry, every business function, and significantly change the way we live and work... this isn't just the future. We are already starting to experience the benefits right now. As a company, we've been preparing for this moment for some time."
- Sundar Pichai, CEO, Google

Speaking at the ongoing Google Cloud Next conference, Pichai emphasized how AI is the future, and it's here already.

Step into the future with AI_Distilled #15, showcasing the breakthroughs in AI/ML, LLMs, NLP, GPT, and Generative AI, as we talk about Nvidia reporting over 100% increase in sales amid high demand for AI chips, Meta introducing Code Llama: a breakthrough in AI-powered coding assistance, OpenAI introducing ChatGPT Enterprise for businesses, Microsoft's promising new "Algorithm of Thoughts" to enhance AI reasoning, and Salesforce's State of the Connected Customer Report, which shows how businesses are facing an AI trust gap with customers.

Looking for fresh knowledge resources and tutorials? We've got your back! Look out for our curated collection of posts on how to use Code Llama, mitigating hallucination in LLMs, Google's Region-Aware Pre-Training for Open-Vocabulary Object Detection with Vision Transformers, and making data queries with Hugging Face's VulcanSQL. We've also handpicked some great GitHub repos for you to use on your next AI project!

What do you think of this issue and our newsletter? Please consider taking the short survey below to share your thoughts and you will get a free PDF of "The Applied Artificial Intelligence Workshop" eBook upon completion. Complete the Survey. Get a Packt eBook for Free!

Writer's Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week's newsletter content!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis

OpenAI Introduces ChatGPT Enterprise: AI Solution for Businesses: OpenAI has unveiled ChatGPT Enterprise with advanced features. The enterprise-grade version offers enhanced security, privacy, and access to the more powerful GPT-4 model. It includes unlimited usage of GPT-4, higher-speed performance, longer context windows for processing lengthier inputs, advanced data analysis capabilities, customization options, and more, targeting improved productivity, customized workflows, and secure data management.

Meta Introduces Code Llama: A Breakthrough in AI-Powered Coding Assistance: Code Llama is a cutting-edge LLM designed to generate code based on text prompts. It is tailored for code tasks and offers the potential to enhance developer productivity and facilitate coding education. Built on Llama 2, Code Llama comes in different models, including the foundational code model, a Python-specialized version, and an instruct variant fine-tuned for understanding natural language instructions. The models outperformed existing LLMs on code tasks and hold promise for revolutionizing coding workflows while adhering to safety and responsible use guidelines.

Nvidia Reports Over 100% Increase in Sales Amid High Demand for AI Chips: Nvidia has achieved record-breaking sales, more than doubling its revenue to over $13.5 billion for the quarter ending in June. The company anticipates further growth in the current quarter and plans to initiate a stock buyback of $25 billion. Its stock value soared by more than 6.5% in after-hours trading, bolstering its substantial gains this year.
Nvidia's data center business, which includes AI chips, fueled its strong performance, with revenue surpassing $10.3 billion, driven by cloud computing providers and consumer internet firms adopting its advanced processors. With a surge in its market value, Nvidia joined the ranks of trillion-dollar companies alongside Apple, Microsoft, Alphabet, and Amazon.

Businesses Facing AI Trust Gap with Customers, Reveals Salesforce's State of the Connected Customer Report: Salesforce's sixth edition of the State of the Connected Customer report highlights a growing concern among businesses about an AI trust gap with their customers. The survey, conducted across 25 countries with over 14,000 consumers and business buyers, indicates that as companies increasingly adopt AI to enhance efficiency and meet customer expectations, nearly three-quarters of their customers are worried about unethical AI use. Consumer receptivity to AI has also decreased over the past year, urging businesses to address this gap by implementing ethical guidelines and providing transparency into AI applications.

Microsoft Introduces "Algorithm of Thoughts" to Enhance AI Reasoning: Microsoft has unveiled a novel AI training method called the "Algorithm of Thoughts" (AoT), aimed at enhancing the reasoning abilities of large language models like ChatGPT by combining human-like cognition with algorithmic logic. This new approach leverages "in-context learning" to guide language models through efficient problem-solving paths, resulting in faster and less resource-intensive solutions. The technique outperforms previous methods and can even surpass the algorithm it is based on.

Google's Duet AI Expands Across Google Cloud with Enhanced Features: Google's Duet AI, a suite of generative AI capabilities for tasks like text summarization and data organization, is expanding its reach to various products and services within the Google Cloud ecosystem. The expansion includes assisting with code refactoring, offering guidance on infrastructure configuration and deployment in the Google Cloud Console, writing code in Google's dev environment Cloud Workstations, generating flows in Application Integration, and more. It also integrates generative AI advancements into the security product line.

OpenAI Collaborates with Scale to Enhance Enterprise Model Fine-Tuning Support: OpenAI has entered into a partnership with Scale to provide expanded support for enterprises seeking to fine-tune advanced models. Recognizing the demand for high performance and customization in AI deployment, OpenAI introduced fine-tuning for GPT-3.5 Turbo and plans to extend it to GPT-4. This feature empowers companies to customize advanced models with proprietary data, enhancing their utility. OpenAI assures that customer data remains confidential and is not utilized to train other models.

Google DeepMind Introduces SynthID: A Tool to Identify AI-Generated Images: In response to the growing prevalence of AI-generated images that can be indistinguishable from real ones, Google DeepMind has partnered with Google Cloud to unveil SynthID for images created with Imagen. This newly launched beta version aims to watermark and identify AI-created images. The technology seamlessly embeds a digital watermark into the pixels of an image, allowing for imperceptible yet detectable identification. This tool is a step towards responsible use of generative AI and enhances the capacity to identify manipulated or fabricated images.
✨ Unleashing the Power of Causal Reasoning with LLMs

Join Aleksander Molak on October 11th and be a part of Packt's most awaited event of 2023 on Generative AI!

In AI's evolution, a big change is coming. It's all about Causally Aware Prompt Engineering, and you should pay attention because it's important. LLMs are good at recognizing patterns, but what if they could do more? That's where causal reasoning comes in. It's about understanding not just what's connected but why. Let's distill the essence:

- LLMs can outperform causal discovery algorithms on some tasks.
- GPT-4 achieves a near-human performance on some counterfactual benchmarks.
- This might be the case because the models simply memorize the data, but it's also possible that they build a meta-SCM (meta structural causal model) based on the correlations of causal facts learned from the data.
- LLMs can reason causally if we allow them to intervene at test time.
- LLMs do not reason very well when we provide them with a verbal description of conditional independence structures in the data (but nor do (most of) humans).

Now, catalyze your journey with three simple techniques:

- Causal Effect Estimation: A causal effect estimate aims at capturing the strength of the (expected) change in the outcome variable when we modify the value of the treatment by one unit. In practice, almost any machine learning algorithm can be used for this purpose, yet in most cases we need to use these algorithms in a way that differs from the classical machine learning flow.
- Confronting Confounding: The main challenge (yet not the only one) in estimating causal effects from observational data comes from confounding. A confounder is a variable in the system of interest that produces a spurious relationship between the treatment and the outcome. Spurious relationships are a kind of illusion. Interestingly, you can observe spurious relationships not only in the recorded data, but also in the real world.
- Unveiling De-confounding: To obtain an unbiased estimate of the causal effect, we need to get rid of confounding. At the same time, we need to be careful not to introduce confounding ourselves! This usually boils down to controlling for the right subset of variables in your analysis. Not too small, not too large.
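To make the de-confounding point concrete, here is a small self-contained sketch (our own illustration, not material from the talk): a simulated confounder inflates the naive estimate of a treatment effect, and controlling for it recovers the true value.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000

# A confounder Z drives both the treatment T and the outcome Y.
z = rng.normal(size=n)
t = 0.8 * z + rng.normal(size=n)             # treatment influenced by the confounder
y = 2.0 * t + 1.5 * z + rng.normal(size=n)   # true causal effect of T on Y is 2.0

# Naive estimate: regress Y on T alone (biased upward by confounding).
naive = LinearRegression().fit(t.reshape(-1, 1), y).coef_[0]

# De-confounded estimate: control for Z by including it as a covariate.
adjusted = LinearRegression().fit(np.column_stack([t, z]), y).coef_[0]

print(f"naive estimate:    {naive:.2f}")     # noticeably above 2.0
print(f"adjusted estimate: {adjusted:.2f}")  # close to the true effect of 2.0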
If you're intrigued by this, I invite you to join me for an in-depth exploration of this fascinating topic at Packt's upcoming Generative AI conference on October 11th. During my power-talk, we'll delve into the question: Can LLMs learn Causally? REGISTER NOW at Early Bird discounted pricing! *Free eBook on Registration: Modern Generative AI with ChatGPT and OpenAI Models

🔮 Expert Insights from Packt Community

The Regularization Cookbook - By Vincent Vandenbussche

Regularization serves as a valuable approach to enhance the success rate of ML models in production. Effective regularization techniques can prevent AI recruitment models from exhibiting gender biases, either by eliminating certain features or incorporating synthetic data. Additionally, proper regularization enables chatbots to maintain an appropriate level of sensitivity toward new tweets. It also equips models to handle edge cases and previously unseen data proficiently, even when trained on synthetic data.

Key concepts of regularization

Let us now delve into a more precise definition and explore key concepts that enable us to better comprehend regularization.

Bias and variance

Bias and variance are two key concepts when talking about regularization. We can define two main kinds of errors a model can have:

- Bias is how bad a model is at capturing the general behavior of the data.
- Variance is how bad a model is at being robust to small input data fluctuations.

Let's describe the four resulting cases:

- High bias and low variance: The model is hitting away from the center of the target, but in a very consistent manner.
- Low bias and high variance: The model is, on average, hitting the center of the target, but is quite noisy and inconsistent in doing so.
- High bias and high variance: The model is hitting away from the center in a noisy way.
- Low bias and low variance: The best of both worlds – the model is hitting the center of the target consistently.

The above content is extracted from the book The Regularization Cookbook by Vincent Vandenbussche, published in July 2023. To get a glimpse of the book's contents, make sure to read the free chapter provided here, or if you want to unlock the full Packt digital library free for 7 days, try signing up now! To learn more, click on the button below. Keep Calm, Start Reading!

🌟 Secret Knowledge: AI/LLM Resources

Google's RO-ViT: Region-Aware Pre-Training for Open-Vocabulary Object Detection with Vision Transformers: Google's research scientists have unveiled a new method called "RO-ViT" that enhances open-vocabulary object detection using vision transformers. Learn how the technique addresses limitations in existing pre-training approaches for vision transformers, which struggle to fully leverage the concept of objects or regions during pre-training. RO-ViT introduces a novel approach called "cropped positional embedding" that aligns better with region-level tasks.

Tiered AIOps: Enhancing Cloud Platform Management with AI: Explore the concept of Tiered AIOps to manage complex cloud platforms. The ever-changing nature of cloud applications and infrastructure presents challenges for complete automation, requiring a tiered approach to combine AI and human intervention. The concept involves dividing operations into tiers, each with varying levels of automation and human expertise. Tier 1 incorporates routine operations automated by AI, Tier 2 empowers non-expert operators with AI assistance, and Tier 3 engages expert engineers for complex incidents.

Effective AI-Agent Interaction: SERVICE Principles Unveiled: In this post, you'll learn how to design AI agents that can interact seamlessly and effectively with users, aiming to transition from self-service to "agent-service." The author introduces the concept of autonomous AI agents capable of performing tasks on users' behalf and offers insights into their potential applications. The SERVICE principles, rooted in customer service and hospitality practices, are presented as guidelines for designing agent-user interactions. These principles encompass key aspects like salient responses, explanatory context, reviewable inputs, vaulted information, indicative guidance, customization, and empathy.

How to Mitigate Hallucination in Large Language Models: In this article, researchers delve into the persistent challenge of hallucination in Generative LLMs. The piece explores the reasons behind LLMs generating nonsensical or non-factual responses, and the potential consequences for system reliability. The focus is on practical approaches to mitigate hallucination, including adjusting the temperature parameter, employing thoughtful prompt engineering, and incorporating external knowledge sources.
The authors conduct experiments to evaluate different methods, such as Chain of Thoughts, Self-Consistency, and Tagged Context Prompts.

💡 MasterClass: AI/LLM Tutorials

How to Use Code Llama: A Breakdown of Features and Usage: Code Llama has made a significant stride in code-related tasks, offering an open-access suite of models specialized for code-related challenges. This release includes various notable components, such as integration within the Hugging Face ecosystem, transformative integration, text generation inference, and inference endpoints. Learn how these models showcase remarkable performance across programming languages, enabling enhanced code understanding, completion, and infilling.

Make Data Queries with Hugging Face's VulcanSQL: In this post, you'll learn how to utilize VulcanSQL, an open-source data API framework, to streamline data queries. VulcanSQL integrates Hugging Face's powerful inference capabilities, allowing data professionals to swiftly generate and share data APIs without extensive backend knowledge. By incorporating Hugging Face's Inference API, VulcanSQL enhances the efficiency of query processes. The framework's HuggingFace Table Question Answering Filter offers a unique solution by leveraging pre-trained AI models for NLP tasks.

Exploring Metaflow and Ray Integration for Supercharged ML Workflows: Explore the integration of Metaflow, an extensible ML orchestration framework, with Ray, a distributed computing framework. This collaboration leverages AWS Batch and Ray for distributed computing, enhancing Metaflow's capabilities. Know how this integration empowers Metaflow users to harness Ray's features within their workflows. The article also delves into the challenges faced, the technical aspects of the integration, and real-world test cases, offering valuable insights into building efficient ML workflows using these frameworks.

Explore Reinforcement Learning Through Solving Leetcode Problems: Explore how reinforcement learning principles can be practically grasped by solving a Leetcode problem. The article centers around the "Shortest Path in a Grid with Obstacles Elimination" problem, where an agent aims to find the shortest path from a starting point to a target in a grid with obstacles, considering the option to eliminate a limited number of obstacles. Explore the foundations of reinforcement learning, breaking down terms like agent, environment, state, and reward system. The author provides code examples and outlines how a Q-function is updated through iterations.

🚀 HackHub: Trending AI Tools

apple/ml-fastvit: Introduces a rapid hybrid ViT empowered by structural reparameterization for efficient vision tasks.
openchatai/opencopilot: A personal AI copilot repository that seamlessly integrates with APIs and autonomously executes API calls using LLMs, streamlining developer tasks and enhancing efficiency.
neuml/txtai: An embeddings database for advanced semantic search, LLM orchestration, and language model workflows featuring vector search, multimodal indexing, and flexible pipelines for text, audio, images, and more.
Databingo/aih: Interact with AI models via terminal (Bard, ChatGPT, Claude2, and Llama2) to explore diverse AI capabilities directly from your command line.
osvai/kernelwarehouse: Optimizes dynamic convolution by redefining kernel concepts, improving parameter dependencies, and increasing convolutional efficiency.
morph-labs/rift: Open-source AI-native infrastructure for IDEs, enabling collaborative AI software engineering.
mr-gpt/deepeval: Python-based solution for offline evaluations of LLM pipelines, simplifying the transition to production. 

LLM Pitfalls and How to Avoid Them

Amita Kapoor & Sharmistha Chatterjee
31 Aug 2023
13 min read
Introduction

Large Language Models, or LLMs, are machine learning models that focus on understanding and generating human-like text. These advanced developments have significantly impacted the field of natural language processing, impressing us with their capacity to produce cohesive and contextually appropriate text. However, navigating the terrain of LLMs requires vigilance, as there exist pitfalls that may trap the unprepared. In this article, we will uncover the nuances of LLMs and discover practical strategies for evading their potential pitfalls. From misconceptions surrounding their capabilities to the subtleties of bias pervading their outputs, we shed light on the intricate underpinnings beyond their impressive veneer.

Understanding LLMs: A Primer

LLMs, such as GPT-4, are based on a technology called the Transformer architecture, introduced in the paper "Attention is All You Need" by Vaswani et al. In essence, this architecture's 'attention' mechanism allows the model to focus on different parts of an input sentence, much like how a human reader might pay attention to different words while reading a text.

Training an LLM involves two stages: pre-training and fine-tuning. During pre-training, the model is exposed to vast quantities of text data (billions of words) from the internet. Given all the previous words, the model learns to predict the next word in a sentence. Through this process, it learns grammar, facts about the world, reasoning abilities, and also some biases present in the data. A significant part of this understanding comes from the model's ability to process English language instructions. The pre-training process exposes the model to language structures, grammar, usage, nuances of the language, common phrases, idioms, and context-based meanings. The Transformer's 'attention' mechanism plays a crucial role in this understanding, enabling the model to focus on different parts of the input sentence when generating each word in the output. It understands which words in the sentence are essential when deciding the next word.

The output of pre-training is a creative text generator. To make this generator more controllable and safe, it undergoes a fine-tuning process. Here, the model is trained on a narrower dataset, carefully generated with human reviewers' help following specific guidelines. This phase also often involves learning from instructions provided in natural language, enabling the model to respond effectively to English language instructions from users.

After their initial two-step training, Large Language Models (LLMs) are ready to produce text. Here's how it works: The user provides a starting point or "prompt" to the model. Using this prompt, the model begins creating a series of "tokens", which could be words or parts of words. Each new token is influenced by the tokens that came before it, so the model keeps adjusting its internal workings after producing each token. The process is based on probabilities, not on a pre-set plan or specific goals.

To control how the LLM generates text, you can adjust various settings. You can select the prompt, of course. But you can also modify settings like "temperature" and "max tokens". The "temperature" setting controls how random the model's output will be, while the "max tokens" setting sets a limit on the length of the response.
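As a concrete illustration of those two settings, the short sketch below (ours, not the authors') sends the same prompt at two different temperatures with a small max_tokens cap. It assumes the pre-1.0 openai Python client and an OPENAI_API_KEY environment variable.

import os
import openai

openai.api_key = os.environ.get("OPENAI_API_KEY")

prompt = "Write a one-sentence tagline for a neighbourhood bakery."

for temperature in (0.0, 1.0):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # 0.0 is near-deterministic; higher values are more random
        max_tokens=30,            # hard cap on the length of the reply
    )
    print(f"temperature={temperature}: {response['choices'][0]['message']['content']}")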
When properly trained and controlled, LLMs are powerful tools that can understand and generate human-like text. Their applications range from writing assistants to customer support, tutoring, translation, and more. However, their ability to generate convincing text also poses potential risks, necessitating ongoing research into effective and ethical usage guidelines. In this article, we discuss some of the common pitfalls associated with using LLMs and offer practical advice on how to navigate these challenges, ensuring that you get the best out of these powerful language models in a safe and responsible way.

Misunderstanding LLM Capabilities

Large Language Models (LLMs), like GPT-3 and BARD, are advanced AI systems capable of impressive feats. However, some common misunderstandings exist about what these models can and cannot do. Here we clarify several points to prevent confusion and misuse.

- Conscious Understanding: Despite their ability to generate coherent and contextually accurate responses, LLMs do not consciously understand the information they process. They don't comprehend text in the same way humans do. Instead, they make statistically informed guesses based on the patterns they've learned during training. They lack self-awareness or consciousness.
- Learning from Interactions: LLMs are not designed to learn from user interactions in real time. After initial model training, they don't have the ability to remember or learn from individual interactions unless their training data is updated, a process that requires substantial computational resources.
- Fact-Checking: LLMs can't verify the accuracy of their output or the information they're prompted with. They generate text based on patterns learned during training and cannot access real-time or updated information beyond their training cut-off. They cannot fact-check or verify information against real-world events post their training cut-off date.
- Personal Opinions: LLMs don't have personal experiences, beliefs, or opinions. If they generate text that seems to indicate a personal stance, it's merely a reflection of the patterns they've learned during their training process. They are incapable of feelings or preferences.
- Generating Original Ideas: While LLMs can generate text that may seem novel or original, they are not truly capable of creativity in the human sense. Their "ideas" result from recombining elements from their training data in novel ways, not from original thought or intention.
- Confidentiality: LLMs cannot keep secrets or remember specific user interactions. They do not have the capacity to store personal data from one interaction to the next. They are designed this way to ensure user privacy and confidentiality.
- Future Predictions: LLMs can't predict the future. Any text generated that seems to predict future events is coincidental and based solely on patterns learned from their training data.
- Emotional Support: While LLMs can simulate empathetic responses, they don't truly understand or feel emotions. Any emotional support provided by these models is based on learned textual patterns and should not replace professional mental health support.

Understanding these limitations is crucial when interacting with LLMs. They are powerful tools for text generation, but their abilities should not be mistaken for true understanding, creativity, or emotional capacity.

Bias in LLM Outputs

Bias in LLMs is an unintentional byproduct of their training process. LLMs, such as GPT-4, are trained on massive datasets comprising text from the internet. The models learn to predict the next word in a sentence based on the context provided by the preceding words.
During this process, they inevitably absorb and replicate the biases present in their training data. Bias in LLMs can be subtle and may present itself in various ways. For example, if an LLM consistently associates certain professions with a specific gender, this reflects gender bias. Suppose you feed the model a prompt like, "The nurse attended to the patient", and the model frequently uses feminine pronouns to refer to the nurse. In contrast, with the prompt, "The engineer fixed the machine," it predominantly uses masculine pronouns for the engineer. This inclination mirrors societal biases present in the training data.

It's crucial for users to be aware of these potential biases when using LLMs. Understanding this can help users interpret responses more critically, identify potential biases in the output, and even frame their prompts in a way that can mitigate bias. Users can make sure to double-check the information provided by LLMs, particularly when the output may have significant implications or is in a context known for systemic bias.
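One informal way to see this for yourself is to probe the model with the two prompts from the example above and compare the pronouns it chooses. The sketch below is our own illustration, not the authors' code; it assumes the pre-1.0 openai client and is a crude spot check rather than a rigorous bias audit.

import os
import re
import openai

openai.api_key = os.environ.get("OPENAI_API_KEY")

PROMPTS = [
    "The nurse attended to the patient. Continue the story in two sentences.",
    "The engineer fixed the machine. Continue the story in two sentences.",
]

for prompt in PROMPTS:
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=60,
    )["choices"][0]["message"]["content"]
    # Count gendered vs. neutral pronouns in the continuation.
    counts = {p: len(re.findall(rf"\b{p}\b", reply.lower())) for p in ("he", "she", "they")}
    print(prompt)
    print(reply)
    print("pronoun counts:", counts)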
Confabulation and Hallucination in LLMs

In the context of LLMs, 'confabulation' or 'hallucination' refers to generating outputs that do not align with reality or factual information. This can happen when the model, attempting to create a coherent narrative, fills in gaps with details that seem plausible but are entirely fictional.

Example 1: Futuristic Election Results. Consider an interaction where an LLM was asked for the result of a future election. The prompt was, "What was the result of the 2024 U.S. presidential election?" The model responded with a detailed result, stating a fictitious candidate had won. As of the model's last training cut-off, this event lies in the future, and the response is a complete fabrication.

Example 2: The Non-existent Book. In another instance, an LLM was asked about a summary of a non-existent book with a prompt like, "Can you summarise the book 'The Shadows of Elusion' by J.K. Rowling?" The model responded with a detailed summary as if the book existed. In reality, there's no such book by J.K. Rowling. This again demonstrates the model's propensity to confabulate.

Example 3: Fictitious Technology. In a third example, an LLM was asked to explain the workings of a fictitious technology, "How does the quantum teleportation smartphone work?" The model explained a device that doesn't exist, incorporating real-world concepts of quantum teleportation into a plausible-sounding but entirely fictional narrative.

LLMs generate responses based on patterns they learn from their training data. They cannot access real-time or personal information or understand the content they generate. When faced with prompts without factual data, they can resort to confabulation, drawing from learned patterns to fabricate plausible but non-factual responses. Because of this propensity for confabulation, verifying the 'facts' generated by LLM models is crucial. This is particularly important when the output is used for decision-making or is in a sensitive context. Always corroborate the information generated by LLMs with reliable and up-to-date sources to ensure its validity and relevance. While these models can be incredibly helpful, they should be used as a tool and not a sole source of information, bearing in mind the potential for error and fabrication in their outputs.

Security and Privacy in LLMs

Large Language Models (LLMs) can be a double-edged sword. Their power to create lifelike text opens the door to misuse, such as generating misleading information, spam emails, or fake news, and even facilitating complex scamming schemes. So, it's crucial to establish robust security protocols when using LLMs. Training LLMs on massive datasets can trigger privacy issues. Two primary concerns are:

- Data leakage: If the model is exposed to sensitive information during training, it could potentially reveal this information when generating outputs. Though these models are designed to generalize patterns and not memorize specific data points, the risk still exists, albeit at a very low probability.
- Inference attacks: Skilled attackers could craft specific queries to probe the model, attempting to infer sensitive details about the training data. For instance, they might attempt to discern whether certain types of content were part of the training data, potentially revealing proprietary or confidential information.

Ethical Considerations in LLMs

The rapid advancements in artificial intelligence, particularly in Large Language Models (LLMs), have transformed multiple facets of society. Yet, this exponential growth often overlooks a crucial aspect – ethics. Balancing the benefits of LLMs while addressing ethical concerns is a significant challenge that demands immediate attention.

- Accountability and Responsibility: Who is responsible when an LLM causes harm, such as generating misleading information or offensive content? Is it the developers who trained the model, the users who provided the prompts, or the organizations that deployed it? The ambiguous nature of responsibility and accountability in AI applications is a substantial ethical challenge.
- Bias and Discrimination: LLMs learn from vast amounts of data, often from the internet, reflecting our society – warts and all. Consequently, the models can internalize and perpetuate existing biases, leading to potentially discriminatory outputs. This can manifest as gender bias, racial bias, or other forms of prejudice.
- Invasion of Privacy: As discussed in earlier articles, LLMs can pose privacy risks. However, the ethical implications go beyond the immediate privacy concerns. For instance, if an LLM is used to generate text mimicking a particular individual's writing style, it could infringe on that person's right to personal expression and identity.
- Misinformation and Manipulation: The capacity of LLMs to generate human-like text can be exploited to disseminate misinformation, forge documents, or even create deepfake texts. This can manipulate public opinion, impact personal reputations, and even threaten national security.

Addressing LLM Limitations: A Tripartite Approach

The task of managing the limitations of LLMs is a tripartite effort, involving AI Developers & Researchers, Policymakers, and End Users.

Role of AI Developers & Researchers:

- Security & Privacy: Establish robust security protocols, enforce secure training practices, and explore methods such as differential privacy. Constituting AI ethics committees can ensure ethical considerations during the design and training phases.
- Bias & Discrimination: Endeavor to identify and mitigate biases during training, aiming for equitable outcomes. This process includes eliminating harmful biases and confabulations.
- Transparency: Enhance understanding of the model by elucidating the training process, which in turn can help manage potential fabrications.

Role of Policymakers:

- Regulations: Formulate and implement regulations that ensure accountability, transparency, fairness, and privacy in AI.
- Public Engagement: Encourage public participation in AI ethics discussions to ensure that regulations reflect societal norms.

Role of End Users:

- Awareness: Comprehend the risks and ethical implications associated with LLMs, recognising that biases and fabrications are possible.
- Critical Evaluation: Evaluate the outputs generated by LLMs for potential misinformation, bias, or confabulations. Refrain from feeding sensitive information to an LLM and cross-verify the information produced.
- Feedback: Report any instances of severe bias, offensive content, or ethical concerns to the AI provider. This feedback is crucial for the continuous improvement of the model.

Conclusion

In conclusion, understanding and leveraging the capabilities of Large Language Models (LLMs) demand both caution and strategy. By recognizing their limitations, such as lack of consciousness, potential biases, and confabulation tendencies, users can navigate these pitfalls effectively. To harness LLMs responsibly, a collaborative approach among developers, policymakers, and users is essential. Implementing security measures, mitigating bias, and fostering user awareness can maximize the benefits of LLMs while minimizing their drawbacks. As LLMs continue to shape our linguistic landscape, staying informed and vigilant ensures a safer and more accurate text generation journey.

Author Bio

Amita Kapoor is an accomplished AI consultant and educator, with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar in her field, with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita took early retirement and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. Following her retirement, Amita also founded NePeur, a company that provides data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

Sharmistha Chatterjee is an evangelist in the field of machine learning (ML) and cloud applications, currently working in the BFSI industry at the Commonwealth Bank of Australia in the data and analytics space. She has worked in Fortune 500 companies, as well as in early-stage start-ups. She became an advocate for responsible AI during her tenure at Publicis Sapient, where she led the digital transformation of clients across industry verticals. She is an international speaker at various tech conferences and a 2X Google Developer Expert in ML and Google Cloud. She has won multiple awards and has been listed in 40 under 40 data scientists by Analytics India Magazine (AIM) and 21 tech trailblazers in 2021 by Google. She has been involved in responsible AI initiatives led by Nasscom and as part of their DeepTech Club.

Authors of this book: Platform and Model Design for Responsible AI

Harnessing Weaviate and integrating with LangChain

Alan Bernardo Palacio
31 Aug 2023
20 min read
Introduction

In the first part of this series, we built a robust RSS news retrieval system using Weaviate, enabling us to fetch and store news articles efficiently. Now, in this second part, we're taking the next leap by exploring how to harness the power of Weaviate for similarity search and integrating it with LangChain. We will delve into the creation of a Streamlit application that performs real-time similarity search, contextual understanding, and dynamic context building. With the increasing demand for relevant and contextual information, this section will unveil the magic of seamlessly integrating various technologies to create an enhanced user experience.

Before we dive into the exciting world of similarity search and context building, let's ensure you're equipped with the necessary tools. Familiarity with Weaviate, Streamlit, and Python will be essential as we explore these advanced concepts and create a dynamic application.

Similarity Search and Weaviate Integration

The journey of enhancing news context retrieval doesn't end with fetching articles. Often, users seek not just relevant information, but also contextually similar content. This is where similarity search comes into play. Similarity search enables us to find articles that share semantic similarities with a given query. In the context of news retrieval, it's like finding articles that discuss similar events or topics. This functionality empowers users to discover a broader range of perspectives and relevant articles.

Weaviate's core strength lies in its ability to perform fast and accurate similarity search. We utilize the perform_similarity_search function to query Weaviate for articles related to a given concept. This function returns a list of articles, each scored based on its relevance to the query.

import weaviate
from langchain.llms import OpenAI
import datetime
import pytz
from dateutil.parser import parse

davinci = OpenAI(model_name='text-davinci-003')

def perform_similarity_search(concept):
    """
    Perform a similarity search on the given concept.

    Args:
    - concept (str): The term to search for, e.g., "Bitcoin" or "Ethereum"

    Returns:
    - dict: A dictionary containing the result of the similarity search
    """
    client = weaviate.Client("http://weaviate:8080")
    nearText = {"concepts": [concept]}
    response = (
        client.query
        .get("RSS_Entry", ["title", "link", "summary", "publishedDate", "body"])
        .with_near_text(nearText)
        .with_limit(50)  # fetch a maximum of 50 similar entries
        .with_additional(['certainty'])
        .do()
    )
    return response

def sort_and_filter(results):
    # Sort results by certainty
    sorted_results = sorted(results, key=lambda x: x['_additional']['certainty'], reverse=True)
    # Sort the top results by date
    top_sorted_results = sorted(sorted_results[:50], key=lambda x: parse(x['publishedDate']), reverse=True)
    # Return the top 5 results
    return top_sorted_results[:5]
# Define the prompt template
template = """
You are a financial analyst reporting on the latest developments and providing an overview about certain topics you are asked about.
Using only the provided context, answer the following question. Prioritize relevance and clarity in your response.
If relevant information regarding the query is not found in the context, clearly indicate this in the response, asking the user to rephrase to make the search topics more clear.
If information is found, summarize the key developments and cite the sources inline using numbers (e.g., [1]).
All sources should consistently be cited with their "Source Name", "link to the article", and "Date and Time". List the full sources at the end in the same numerical order.

Today is: {today_date}

Context:
{context}

Question: {query}

Answer:

Example Answer (for no relevant information):
"No relevant information regarding 'topic X' was found in the provided context."

Example Answer (for relevant information):
"The latest update on 'topic X' reveals that A and B have occurred. This was reported by 'Source Name' on 'Date and Time' [1]. Another significant development is D, as highlighted by 'Another Source Name' on 'Date and Time' [2]."

Sources (if relevant):
[1] Source Name, "link to the article provided in the context", Date and Time
[2] Another Source Name, "link to the article provided in the context", Date and Time
"""

# The generate_response function from part one, modified to build its context from Weaviate similarity search
def query_db(query):
    # Query the Weaviate database
    results = perform_similarity_search(query)
    results = results['data']['Get']['RSS_Entry']
    top_results = sort_and_filter(results)
    # Convert the context data into a readable string
    context_string = [
        f"title:{r['title']}\nsummary:{r['summary']}\nbody:{r['body']}\nlink:{r['link']}\npublishedDate:{r['publishedDate']}\n\n"
        for r in top_results
    ]
    context_string = '\n'.join(context_string)
    # Get today's date
    date_format = "%a, %d %b %Y %H:%M:%S %Z"
    today_date = datetime.datetime.now(pytz.utc).strftime(date_format)
    # Format the prompt
    prompt = template.format(
        query=query,
        context=context_string,
        today_date=today_date
    )
    # Print the formatted prompt for verification
    print(prompt)
    # Run the prompt through the model directly
    response = davinci(prompt)
    # Return the response
    return response

Retrieved results need effective organization for user consumption. The sort_and_filter function handles this task. It first sorts the results based on their certainty scores, ensuring the most relevant articles are prioritized. Then, it further sorts the top results by their published dates, providing users with the latest information to build the context for the LLM.

LangChain Integration for Context Building

While similarity search enhances content discovery, context is the key to understanding the significance of articles. Integrating LangChain with Weaviate allows us to dynamically build context and provide more informative responses. LangChain, a language manipulation tool, acts as our context builder. It enhances the user experience by constructing context around the retrieved articles, enabling users to understand the broader narrative. Our modified query_db function now incorporates LangChain's capabilities. The function generates a context-rich prompt that combines the user's query and the top retrieved articles. This prompt is structured using a template that ensures clarity and relevance. The prompt template is a structured piece of text that guides LangChain to generate contextually meaningful responses. It dynamically includes information about the query, context, and relevant articles. This ensures that users receive comprehensive and informative answers.

Handling Irrelevant Queries

One of LangChain's unique strengths is its ability to gracefully handle queries with limited context.
When no relevant information is found in the context, LangChain generates a response that informs the user about the absence of relevant data. This ensures transparency and guides users to refine their queries for better results. In the next section, we will be integrating this enhanced news retrieval system with a Streamlit application, providing users with an intuitive interface to access relevant and contextual information effortlessly.

Building the Streamlit Application

In the previous section, we explored the intricate layers of building a robust news context retrieval system using Weaviate and LangChain. Now, we're diving into the realm of user experience enhancement by creating a Streamlit application. Streamlit empowers us to transform our backend functionalities into a user-friendly front-end interface with minimal effort. Let's discover how we can harness the power of Streamlit to provide users with a seamless and intuitive way to access relevant news articles and context.

Streamlit is a Python library that enables developers to create interactive web applications with minimal code. Its simplicity, coupled with its ability to provide real-time visualizations, makes it a fantastic choice for creating data-driven applications. The structure of a Streamlit app is straightforward yet powerful. Streamlit apps are composed of simple Python scripts that leverage the provided Streamlit API functions. This section provides an overview of how the application is structured and how its components interact, starting with the RSS ingestion service that keeps Weaviate populated with fresh entries:

import feedparser
import pandas as pd
import time
from bs4 import BeautifulSoup
import requests
import random
from datetime import datetime, timedelta
import pytz
import uuid
import weaviate
import json

def wait_for_weaviate():
    """Wait until Weaviate is available."""
    while True:
        try:
            # Try fetching the Weaviate metadata without initiating the client here
            response = requests.get("http://weaviate:8080/v1/meta")
            response.raise_for_status()
            meta = response.json()
            # If successful, the instance is up and running
            if meta:
                print("Weaviate is up and running!")
                return
        except requests.exceptions.RequestException:
            # If there's any error (connection, timeout, etc.), wait and try again
            print("Waiting for Weaviate...")
            time.sleep(5)

RSS_URLS = [
    "https://thedefiant.io/api/feed",
    "https://cointelegraph.com/rss",
    "https://cryptopotato.com/feed/",
    "https://cryptoslate.com/feed/",
    "https://cryptonews.com/news/feed/",
    "https://smartliquidity.info/feed/",
    "https://bitcoinmagazine.com/feed",
    "https://decrypt.co/feed",
    "https://bitcoinist.com/feed/",
    "https://cryptobriefing.com/feed",
    "https://www.newsbtc.com/feed/",
    "https://coinjournal.net/feed/",
    "https://ambcrypto.com/feed/",
    "https://www.the-blockchain.com/feed/"
]

def get_article_body(link):
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.3'
        }
        response = requests.get(link, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')
        # Directly return a list of non-empty paragraphs
        return [p.get_text().strip() for p in paragraphs if p.get_text().strip() != ""]
    except Exception as e:
        print(f"Error fetching article body for {link}. Reason: {e}")
        return []
  except Exception as e:        print(f"Error fetching article body for {link}. Reason: {e}")        return [] def parse_date(date_str):    # Current date format from the RSS    date_format = "%a, %d %b %Y %H:%M:%S %z"    try:        dt = datetime.strptime(date_str, date_format)        # Ensure the datetime is in UTC        return dt.astimezone(pytz.utc)    except ValueError:        # Attempt to handle other possible formats        date_format = "%a, %d %b %Y %H:%M:%S %Z"        dt = datetime.strptime(date_str, date_format)        return dt.replace(tzinfo=pytz.utc) def fetch_rss(from_datetime=None):    all_data = []    all_entries = []      # Step 1: Fetch all the entries from the RSS feeds and filter them by date.    for url in RSS_URLS:        print(f"Fetching {url}")        feed = feedparser.parse(url)        entries = feed.entries        print('feed.entries', len(entries))        for entry in feed.entries:            entry_date = parse_date(entry.published)                      # Filter the entries based on the provided date            if from_datetime and entry_date <= from_datetime:                continue            # Storing only necessary data to minimize memory usage            all_entries.append({                "Title": entry.title,                "Link": entry.link,                "Summary": entry.summary,                "PublishedDate": entry.published            })    # Step 2: Shuffle the filtered entries.    random.shuffle(all_entries)    # Step 3: Extract the body for each entry and break it down by paragraphs.    for entry in all_entries:        article_body = get_article_body(entry["Link"])        print("\\nTitle:", entry["Title"])        print("Link:", entry["Link"])        print("Summary:", entry["Summary"])        print("Published Date:", entry["PublishedDate"])        # Create separate records for each paragraph        for paragraph in article_body:            data = {                "UUID": str(uuid.uuid4()), # UUID for each paragraph                "Title": entry["Title"],                "Link": entry["Link"],                "Summary": entry["Summary"],                "PublishedDate": entry["PublishedDate"],                "Body": paragraph            }            all_data.append(data)    print("-" * 50)    df = pd.DataFrame(all_data)    return df def insert_data(df,batch_size=100):    # Initialize the batch process    with client.batch as batch:        batch.batch_size = 100        # Loop through and batch import the 'RSS_Entry' data        for i, row in df.iterrows():            if i%100==0:                print(f"Importing entry: {i+1}")  # Status update            properties = {                "UUID": row["UUID"],                "Title": row["Title"],                "Link": row["Link"],                "Summary": row["Summary"],                "PublishedDate": row["PublishedDate"],                "Body": row["Body"]            }            client.batch.add_data_object(properties, "RSS_Entry") if __name__ == "__main__":    # Wait until weaviate is available    wait_for_weaviate()    # Initialize the Weaviate client    client = weaviate.Client("<http://weaviate:8080>")    client.timeout_config = (3, 200)    # Reset the schema    client.schema.delete_all()    # Define the "RSS_Entry" class    class_obj = {        "class": "RSS_Entry",        "description": "An entry from an RSS feed",        "properties": [            {"dataType": ["text"], "description": "UUID of the entry", "name": "UUID"},            {"dataType": ["text"], "description": "Title of the entry", 
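As a minimal sketch of how such an app can be wired together, the snippet below shows a text input for the user's question, a similarity search against the RSS_Entry class populated by our fetcher, and a simple rendering of the matching paragraphs. The client URL, widget layout, and result formatting are illustrative assumptions rather than the application's final code; the LangChain context-building step discussed below would sit on top of this retrieval logic.

import streamlit as st
import weaviate

# Connect to the Weaviate instance from the docker-compose setup (URL assumed)
client = weaviate.Client("http://weaviate:8080")

st.title("Crypto News Context Retrieval")
query = st.text_input("Ask about a topic in the crypto news:")

if st.button("Search") and query:
    # Vector similarity search over the stored RSS_Entry paragraphs
    result = (
        client.query
        .get("RSS_Entry", ["Title", "Link", "Summary", "PublishedDate", "Body"])
        .with_near_text({"concepts": [query]})
        .with_limit(5)
        .do()
    )
    for entry in result["data"]["Get"]["RSS_Entry"]:
        st.subheader(entry["Title"])
        st.caption(entry["PublishedDate"])
        st.write(entry["Body"])
        st.markdown(f"[Read the full article]({entry['Link']})")

Running streamlit run app.py inside the app service then exposes this interface in the browser on the port mapped in the compose file.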
"name": "Title"},            {"dataType": ["text"], "description": "Link of the entry", "name": "Link"},            {"dataType": ["text"], "description": "Summary of the entry", "name": "Summary"},            {"dataType": ["text"], "description": "Published Date of the entry", "name": "PublishedDate"},            {"dataType": ["text"], "description": "Body of the entry", "name": "Body"}        ],        "vectorizer": "text2vec-transformers"    }    # Add the schema    client.schema.create_class(class_obj)    # Retrieve the schema    schema = client.schema.get()    # Display the schema    print(json.dumps(schema, indent=4))    print("-"*50)    # Current datetime    now = datetime.now(pytz.utc)    # Fetching articles from the last days    days_ago = 3    print(f"Getting historical data for the last {days_ago} days ago.")    last_week = now - timedelta(days=days_ago)    df_hist =  fetch_rss(last_week)    print("Head")    print(df_hist.head().to_string())    print("Tail")    print(df_hist.head().to_string())    print("-"*50)    print("Total records fetched:",len(df_hist))    print("-"*50)    print("Inserting data")    # insert historical data    insert_data(df_hist,batch_size=100)    print("-"*50)    print("Data Inserted")    # check if there is any relevant news in the last minute    while True:        # Current datetime        now = datetime.now(pytz.utc)        # Fetching articles from the last hour        one_min_ago = now - timedelta(minutes=1)        df =  fetch_rss(one_min_ago)        print("Head")        print(df.head().to_string())        print("Tail")        print(df.head().to_string())              print("Inserting data")        # insert minute data        insert_data(df,batch_size=100)        print("data inserted")        print("-"*50)        # Sleep for a minute        time.sleep(60)Streamlit apps rely on specific Python libraries and functions to operate smoothly. We'll explore the libraries used in our Streamlit app, such as streamlit, weaviate, and langchain, and discuss their roles in enabling real-time context retrieval.Demonstrating Real-time Context RetrievalAs we bring together the various elements of our news retrieval system, it's time to experience the magic firsthand by using the Streamlit app to perform real-time context retrieval.The Streamlit app's interface, showcasing how users can input queries and initiate similarity searches ensures a user-friendly experience, allowing users to effortlessly interact with the underlying Weaviate and LangChain-powered functionalities. The Streamlit app acts as a bridge, making complex interactions accessible to users through a clean and intuitive interface.The true power of our application shines when we demonstrate its ability to provide context for user queries and how LangChain dynamically builds context around retrieved articles and responses, creating a comprehensive narrative that enhances user understanding.ConclusionIn this second part of our series, we've embarked on the journey of creating an interactive and intuitive user interface using Streamlit. By weaving together the capabilities of Weaviate, LangChain, and Streamlit, we've established a powerful framework for context-based news retrieval. The Streamlit app showcases how the integration of these technologies can simplify complex processes, allowing users to effortlessly retrieve news articles and their contextual significance. As we wrap up our series, the next step is to dive into the provided code and experience the synergy of these technologies firsthand. 
Empower your applications with the ability to deliver context-rich and relevant information, bringing a new level of user experience to modern data-driven platforms.

Through these two articles, we've embarked on a journey to build an intelligent news retrieval system that leverages cutting-edge technologies. We've explored the foundations of Weaviate, delved into similarity search, harnessed LangChain for context building, and created a Streamlit application to provide users with a seamless experience. In the modern landscape of information retrieval, context is key, and the integration of these technologies empowers us to provide users with not just data, but understanding. As you venture forward, remember that these concepts are stepping stones. Embrace the code, experiment, and extend these ideas to create applications that offer tailored and relevant experiences to your users.

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn
Build a powerful RSS news fetcher with Weaviate

Alan Bernardo Palacio
31 Aug 2023
21 min read
Introduction

In today's rapidly evolving crypto world, staying informed about the latest news and developments is crucial. However, with the overwhelming amount of information available, it's becoming increasingly challenging to find relevant news quickly. In this article, we will delve into the creation of a powerful system that fetches real-time news articles from various RSS feeds and stores them in the Weaviate vector database. We will explore how this application lays the foundation for context-based news retrieval and how it can be a stepping stone for more advanced applications, such as similarity search and contextual understanding.

Before we dive into the technical details, let's ensure that you have a basic understanding of the technologies we'll be using. Familiarity with Python and Docker will be beneficial as we build and deploy our applications.

Setting up the Environment

To get started, we need to set up the development environment. This environment consists of three primary components: the RSS news fetcher, the Weaviate vector database, and the Transformers Inference API for text vectorization.

Our application's architecture is orchestrated using Docker Compose. The provided docker-compose.yml file defines four services: rss-fetcher, app (the Streamlit front end built in the next article), weaviate, and t2v-transformers. These services interact to fetch news, store it in the vector database, and prepare it for vectorization.

version: '3.4'

services:
  rss-fetcher:
    image: rss/python
    build:
      context: ./rss_fetcher

  app:
    build:
      context: ./app
    ports:
      - 8501:8501
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - rss-fetcher
      - weaviate

  weaviate:
    image: semitechnologies/weaviate:latest
    restart: on-failure:0
    ports:
      - "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: "./data"
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
      CLUSTER_HOSTNAME: 'node1'

  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: 0 # set to 1 to enable
      # NVIDIA_VISIBLE_DEVICES: all # enable if running with CUDA

Each service is configured with specific environment variables that define its behavior. In our application, we make use of environment variables like OPENAI_API_KEY to ensure secure communication with external services. We also specify the necessary dependencies, such as the Python libraries listed in the requirements.txt files for the rss-fetcher and app services.

Creating the RSS News Fetcher

The foundation of our news retrieval system is the RSS news fetcher. This component will actively fetch articles from various RSS feeds, extract essential information, and store them in the Weaviate vector database.

This is the Dockerfile of our RSS fetcher:

FROM python:3
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-u", "rss_fetcher.py"]

Our RSS news fetcher is implemented within the rss_fetcher.py script.
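Since the fetcher's Dockerfile installs its dependencies from requirements.txt, a plausible version of that file is shown below, inferred from the imports used in the script that follows; only the third-party packages are listed, and the unpinned names are assumptions rather than the exact file from the project.

feedparser
pandas
requests
beautifulsoup4
pytz
weaviate-client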
This script performs several key tasks, including fetching RSS feeds, parsing articles, and inserting data into the Weaviate database.

import feedparser
import pandas as pd
import time
from bs4 import BeautifulSoup
import requests
import random
from datetime import datetime, timedelta
import pytz
import uuid
import weaviate
import json


def wait_for_weaviate():
    """Wait until Weaviate is available."""
    while True:
        try:
            # Try fetching the Weaviate metadata without initiating the client here
            response = requests.get("http://weaviate:8080/v1/meta")
            response.raise_for_status()
            meta = response.json()
            # If successful, the instance is up and running
            if meta:
                print("Weaviate is up and running!")
                return
        except requests.exceptions.RequestException:
            # If there's any error (connection, timeout, etc.), wait and try again
            print("Waiting for Weaviate...")
            time.sleep(5)


RSS_URLS = [
    "https://thedefiant.io/api/feed",
    "https://cointelegraph.com/rss",
    "https://cryptopotato.com/feed/",
    "https://cryptoslate.com/feed/",
    "https://cryptonews.com/news/feed/",
    "https://smartliquidity.info/feed/",
    "https://bitcoinmagazine.com/feed",
    "https://decrypt.co/feed",
    "https://bitcoinist.com/feed/",
    "https://cryptobriefing.com/feed",
    "https://www.newsbtc.com/feed/",
    "https://coinjournal.net/feed/",
    "https://ambcrypto.com/feed/",
    "https://www.the-blockchain.com/feed/"
]


def get_article_body(link):
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.3'}
        response = requests.get(link, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')
        # Directly return list of non-empty paragraphs
        return [p.get_text().strip() for p in paragraphs if p.get_text().strip() != ""]
    except Exception as e:
        print(f"Error fetching article body for {link}. Reason: {e}")
        return []


def parse_date(date_str):
    # Current date format from the RSS
    date_format = "%a, %d %b %Y %H:%M:%S %z"
    try:
        dt = datetime.strptime(date_str, date_format)
        # Ensure the datetime is in UTC
        return dt.astimezone(pytz.utc)
    except ValueError:
        # Attempt to handle other possible formats
        date_format = "%a, %d %b %Y %H:%M:%S %Z"
        dt = datetime.strptime(date_str, date_format)
        return dt.replace(tzinfo=pytz.utc)


def fetch_rss(from_datetime=None):
    all_data = []
    all_entries = []
    # Step 1: Fetch all the entries from the RSS feeds and filter them by date.
    for url in RSS_URLS:
        print(f"Fetching {url}")
        feed = feedparser.parse(url)
        entries = feed.entries
        print('feed.entries', len(entries))
        for entry in feed.entries:
            entry_date = parse_date(entry.published)
            # Filter the entries based on the provided date
            if from_datetime and entry_date <= from_datetime:
                continue
            # Storing only necessary data to minimize memory usage
            all_entries.append({
                "Title": entry.title,
                "Link": entry.link,
                "Summary": entry.summary,
                "PublishedDate": entry.published
            })
    # Step 2: Shuffle the filtered entries.
    random.shuffle(all_entries)
    # Step 3: Extract the body for each entry and break it down by paragraphs.
    for entry in all_entries:
        article_body = get_article_body(entry["Link"])
        print("\nTitle:", entry["Title"])
        print("Link:", entry["Link"])
        print("Summary:", entry["Summary"])
        print("Published Date:", entry["PublishedDate"])
        # Create separate records for each paragraph
        for paragraph in article_body:
            data = {
                "UUID": str(uuid.uuid4()),  # UUID for each paragraph
                "Title": entry["Title"],
                "Link": entry["Link"],
                "Summary": entry["Summary"],
                "PublishedDate": entry["PublishedDate"],
                "Body": paragraph
            }
            all_data.append(data)
    print("-" * 50)
    df = pd.DataFrame(all_data)
    return df


def insert_data(df, batch_size=100):
    # Initialize the batch process
    with client.batch as batch:
        batch.batch_size = batch_size
        # Loop through and batch import the 'RSS_Entry' data
        for i, row in df.iterrows():
            if i % 100 == 0:
                print(f"Importing entry: {i+1}")  # Status update
            properties = {
                "UUID": row["UUID"],
                "Title": row["Title"],
                "Link": row["Link"],
                "Summary": row["Summary"],
                "PublishedDate": row["PublishedDate"],
                "Body": row["Body"]
            }
            client.batch.add_data_object(properties, "RSS_Entry")


if __name__ == "__main__":
    # Wait until Weaviate is available
    wait_for_weaviate()
    # Initialize the Weaviate client
    client = weaviate.Client("http://weaviate:8080")
    client.timeout_config = (3, 200)
    # Reset the schema
    client.schema.delete_all()
    # Define the "RSS_Entry" class
    class_obj = {
        "class": "RSS_Entry",
        "description": "An entry from an RSS feed",
        "properties": [
            {"dataType": ["text"], "description": "UUID of the entry", "name": "UUID"},
            {"dataType": ["text"], "description": "Title of the entry", "name": "Title"},
            {"dataType": ["text"], "description": "Link of the entry", "name": "Link"},
            {"dataType": ["text"], "description": "Summary of the entry", "name": "Summary"},
            {"dataType": ["text"], "description": "Published Date of the entry", "name": "PublishedDate"},
            {"dataType": ["text"], "description": "Body of the entry", "name": "Body"}
        ],
        "vectorizer": "text2vec-transformers"
    }
    # Add the schema
    client.schema.create_class(class_obj)
    # Retrieve the schema
    schema = client.schema.get()
    # Display the schema
    print(json.dumps(schema, indent=4))
    print("-" * 50)
    # Current datetime
    now = datetime.now(pytz.utc)
    # Fetching articles from the last few days
    days_ago = 3
    print(f"Getting historical data for the last {days_ago} days.")
    last_week = now - timedelta(days=days_ago)
    df_hist = fetch_rss(last_week)
    print("Head")
    print(df_hist.head().to_string())
    print("Tail")
    print(df_hist.tail().to_string())
    print("-" * 50)
    print("Total records fetched:", len(df_hist))
    print("-" * 50)
    print("Inserting data")
    # Insert historical data
    insert_data(df_hist, batch_size=100)
    print("-" * 50)
    print("Data Inserted")
    # Check if there is any relevant news every minute
    while True:
        # Current datetime
        now = datetime.now(pytz.utc)
        # Fetching articles from the last minute
        one_min_ago = now - timedelta(minutes=1)
        df = fetch_rss(one_min_ago)
        print("Head")
        print(df.head().to_string())
        print("Tail")
        print(df.tail().to_string())
        print("Inserting data")
        # Insert minute data
        insert_data(df, batch_size=100)
        print("data inserted")
        print("-" * 50)
        # Sleep for a minute
        time.sleep(60)

Before we start fetching news, we need to ensure that the Weaviate vector database is up and running. The wait_for_weaviate function repeatedly checks the availability of Weaviate using HTTP requests. This ensures that our fetcher waits until Weaviate is ready to receive data.

The core functionality of our fetcher lies in its ability to retrieve articles from various RSS feeds. We iterate through the list of RSS URLs, using the feedparser library to parse the feeds and extract key information such as the article's title, link, summary, and published date.

To provide context for similarity search and other applications, we need the actual content of the articles. The get_article_body function fetches the article's HTML content, parses it using BeautifulSoup, and extracts relevant text paragraphs. This content is crucial for creating a rich context for each article.

After gathering the necessary information, we create data objects for each article and insert them into the Weaviate vector database. Weaviate provides a client library that simplifies the process of adding data. We use the weaviate.Client class to interact with the Weaviate instance and batch-insert the articles' data objects.

Now that we have laid the groundwork for building our context-based news retrieval system, in the next sections we'll delve deeper into Weaviate's role in this application and how we can leverage it for similarity search and more advanced features.

Weaviate Configuration and Schema

Weaviate, an open-source knowledge graph, plays a pivotal role in our application. It acts as a vector database that stores and retrieves data based on their semantic relationships and vector representations. Weaviate's ability to store text data and create vector representations for efficient similarity search aligns perfectly with our goal of context-based news retrieval. By utilizing Weaviate, we enable our system to understand the context of news articles and retrieve semantically similar content.

To structure the data stored in Weaviate, we define a class called RSS_Entry. This class schema includes properties like UUID, Title, Link, Summary, PublishedDate, and Body. These properties capture essential information about each news article and provide a solid foundation for context retrieval.
# Define the "RSS_Entry" class
class_obj = {
    "class": "RSS_Entry",
    "description": "An entry from an RSS feed",
    "properties": [
        {"dataType": ["text"], "description": "UUID of the entry", "name": "UUID"},
        {"dataType": ["text"], "description": "Title of the entry", "name": "Title"},
        {"dataType": ["text"], "description": "Link of the entry", "name": "Link"},
        {"dataType": ["text"], "description": "Summary of the entry", "name": "Summary"},
        {"dataType": ["text"], "description": "Published Date of the entry", "name": "PublishedDate"},
        {"dataType": ["text"], "description": "Body of the entry", "name": "Body"}
    ],
    "vectorizer": "text2vec-transformers"
}

# Add the schema
client.schema.create_class(class_obj)

# Retrieve the schema
schema = client.schema.get()

The uniqueness of Weaviate lies in its ability to represent text data as vectors. Our application leverages the text2vec-transformers module as the default vectorizer. This module transforms text into vector embeddings using advanced language models. This vectorization process ensures that the semantic relationships between articles are captured, enabling meaningful similarity search and context retrieval.
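To see what this vectorization enables, a minimal similarity query against the populated class might look like the sketch below. The client URL, the example concept, and the selected fields are illustrative assumptions; the nearText operator itself is what the text2vec-transformers module makes possible.

# Sketch: retrieve the three paragraphs most similar to a free-text concept
client = weaviate.Client("http://weaviate:8080")

result = (
    client.query
    .get("RSS_Entry", ["Title", "PublishedDate", "Body"])
    .with_near_text({"concepts": ["bitcoin ETF approval"]})
    .with_limit(3)
    .do()
)

for entry in result["data"]["Get"]["RSS_Entry"]:
    print(entry["PublishedDate"], "-", entry["Title"])
    print(entry["Body"][:200])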
Real-time and Historical Data Insertion

Efficient data insertion is vital for ensuring that our Weaviate-based news retrieval system provides up-to-date and historical context for users. Our application caters to two essential use cases: real-time context retrieval and historical context analysis. The ability to insert real-time news articles ensures that users receive the most recent information. Additionally, historical data insertion enables a broader perspective by allowing users to explore trends and patterns over time.

To populate our database with historical data, we utilize the fetch_rss function. This function fetches news articles from the last few days, as specified by the days_ago parameter. The retrieved articles are then processed, and data objects are batch-inserted into Weaviate. This process guarantees that our database contains a diverse set of historical articles.

def fetch_rss(from_datetime=None):
    all_data = []
    all_entries = []
    # Step 1: Fetch all the entries from the RSS feeds and filter them by date.
    for url in RSS_URLS:
        print(f"Fetching {url}")
        feed = feedparser.parse(url)
        entries = feed.entries
        print('feed.entries', len(entries))
        for entry in feed.entries:
            entry_date = parse_date(entry.published)
            # Filter the entries based on the provided date
            if from_datetime and entry_date <= from_datetime:
                continue
            # Storing only necessary data to minimize memory usage
            all_entries.append({
                "Title": entry.title,
                "Link": entry.link,
                "Summary": entry.summary,
                "PublishedDate": entry.published
            })
    # Step 2: Shuffle the filtered entries.
    random.shuffle(all_entries)
    # Step 3: Extract the body for each entry and break it down by paragraphs.
    for entry in all_entries:
        article_body = get_article_body(entry["Link"])
        print("\nTitle:", entry["Title"])
        print("Link:", entry["Link"])
        print("Summary:", entry["Summary"])
        print("Published Date:", entry["PublishedDate"])
        # Create separate records for each paragraph
        for paragraph in article_body:
            data = {
                "UUID": str(uuid.uuid4()),  # UUID for each paragraph
                "Title": entry["Title"],
                "Link": entry["Link"],
                "Summary": entry["Summary"],
                "PublishedDate": entry["PublishedDate"],
                "Body": paragraph
            }
            all_data.append(data)
    print("-" * 50)
    df = pd.DataFrame(all_data)
    return df

The real-time data insertion loop ensures that newly published articles are promptly added to the Weaviate database. We fetch news articles from the last minute and follow the same data insertion process. This loop ensures that the database is continuously updated with fresh content.

Conclusion

In this article, we've explored crucial aspects of building an RSS news retrieval system with Weaviate. We delved into Weaviate's role as a vector database, examined the RSS_Entry class schema, and understood how text data is vectorized using text2vec-transformers. Furthermore, we discussed the significance of real-time and historical data insertion in providing users with relevant and up-to-date news context. With a solid foundation in place, we're well-equipped to move forward and explore more advanced applications, such as similarity search and context-based content retrieval, which is what we will be building in the next article. The seamless integration of Weaviate with our news fetcher sets the stage for a powerful context-aware information retrieval system.

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn
Building an API for Language Model Inference using Rust and Hyper - Part 2

Alan Bernardo Palacio
31 Aug 2023
10 min read
Introduction

In our previous exploration, we delved deep into the world of Large Language Models (LLMs) in Rust. Through the lens of the llm crate and the transformative potential of LLMs, we painted a picture of the current state of AI integrations within the Rust ecosystem. But knowledge, they say, is only as valuable as its application. Thus, we transition from understanding the 'how' of LLMs to applying this knowledge in real-world scenarios.

Welcome to the second part of our Rust LLM series. In this article, we roll up our sleeves to architect and deploy an inference server using Rust. Leveraging the blazingly fast and efficient Hyper HTTP library, our server will not just respond to incoming requests but will think, infer, and communicate like a human. We'll guide you through the step-by-step process of setting up, routing, and serving inferences right from the server, all the while keeping our base anchored to the foundational insights from our last discussion.

For developers eager to witness the integration of Rust, Hyper, and LLMs, this guide promises to be a rewarding endeavor. By the end, you'll be equipped with the tools to set up a server that can converse intelligently, understand prompts, and provide insightful responses. So, as we progress from the intricacies of the llm crate to building a real-world application, join us in taking a monumental step toward making AI-powered interactions an everyday reality.

Imports and Data Structures

Let's start by looking at the import statements and data structures used in the code:

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};
use std::net::SocketAddr;
use serde::{Deserialize, Serialize};
use std::{convert::Infallible, io::Write, path::PathBuf};

hyper: Hyper is a fast and efficient HTTP library for Rust.
SocketAddr: This is used to specify the socket address (IP and port) for the server.
serde: Serde is a powerful serialization/deserialization framework in Rust.
Deserialize, Serialize: Serde traits for automatic serialization and deserialization.

Next, we have the data structures that will be used for deserializing JSON request data and serializing response data:

#[derive(Debug, Deserialize)]
struct ChatRequest {
    prompt: String,
}

#[derive(Debug, Serialize)]
struct ChatResponse {
    response: String,
}

1. ChatRequest: A struct to represent the incoming JSON request containing a prompt field.
2. ChatResponse: A struct to represent the JSON response containing a response field.

Inference Function

The infer function is responsible for performing language model inference:

fn infer(prompt: String) -> String {
    let tokenizer_source = llm::TokenizerSource::Embedded;
    let model_architecture = llm::ModelArchitecture::Llama;
    let model_path = PathBuf::from("/path/to/model");
    let prompt = prompt.to_string();
    let now = std::time::Instant::now();

    let model = llm::load_dynamic(
        Some(model_architecture),
        &model_path,
        tokenizer_source,
        Default::default(),
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| {
        panic!("Failed to load {} model from {:?}: {}", model_architecture, model_path, err);
    });

    println!(
        "Model fully loaded! Elapsed: {}ms",
        now.elapsed().as_millis()
    );

    let mut session = model.start_session(Default::default());
    let mut generated_tokens = String::new(); // Accumulate generated tokens here

    let res = session.infer::<Infallible>(
        model.as_ref(),
        &mut rand::thread_rng(),
        &llm::InferenceRequest {
            prompt: (&prompt).into(),
            parameters: &llm::InferenceParameters::default(),
            play_back_previous_tokens: false,
            maximum_token_count: Some(140),
        },
        // OutputRequest
        &mut Default::default(),
        |r| match r {
            llm::InferenceResponse::PromptToken(t) | llm::InferenceResponse::InferredToken(t) => {
                print!("{t}");
                std::io::stdout().flush().unwrap();
                // Accumulate generated tokens
                generated_tokens.push_str(&t);
                Ok(llm::InferenceFeedback::Continue)
            }
            _ => Ok(llm::InferenceFeedback::Continue),
        },
    );

    // Return the accumulated generated tokens
    match res {
        Ok(_) => generated_tokens,
        Err(err) => format!("Error: {}", err),
    }
}

The infer function takes a prompt as input and returns a string containing generated tokens. It loads a language model, sets up an inference session, and accumulates generated tokens. The res variable holds the result of the inference, and a closure handles each inference response. The function returns the accumulated generated tokens or an error message.

Request Handler

The chat_handler function handles incoming HTTP requests:

async fn chat_handler(req: Request<Body>) -> Result<Response<Body>, Infallible> {
    let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap();
    let chat_request: ChatRequest = serde_json::from_slice(&body_bytes).unwrap();

    // Call the `infer` function with the received prompt
    let inference_result = infer(chat_request.prompt);

    // Prepare the response message
    let response_message = format!("Inference result: {}", inference_result);
    let chat_response = ChatResponse {
        response: response_message,
    };

    // Serialize the response and send it back
    let response = Response::new(Body::from(serde_json::to_string(&chat_response).unwrap()));
    Ok(response)
}

chat_handler asynchronously handles incoming requests by deserializing the JSON payload. It calls the infer function with the received prompt and constructs a response message. The response is serialized as JSON and sent back in the HTTP response.

Router and Not Found Handler

The router function maps incoming requests to the appropriate handlers:

async fn router(req: Request<Body>) -> Result<Response<Body>, Infallible> {
    match (req.uri().path(), req.method()) {
        ("/api/chat", &hyper::Method::POST) => chat_handler(req).await,
        _ => not_found(),
    }
}

router matches incoming requests based on the path and HTTP method. If the path is "/api/chat" and the method is POST, it calls the chat_handler. If no match is found, it calls the not_found function. A minimal not_found helper (any implementation that returns an HTTP 404 works) completes the routing logic:

fn not_found() -> Result<Response<Body>, Infallible> {
    // Minimal 404 handler used by `router` (assumed implementation)
    let mut response = Response::new(Body::from("Not Found"));
    *response.status_mut() = hyper::StatusCode::NOT_FOUND;
    Ok(response)
}

Main Function

The main function initializes the server and starts listening for incoming connections:

#[tokio::main]
async fn main() {
    println!("Server listening on port 8083...");
    let addr = SocketAddr::from(([0, 0, 0, 0], 8083));

    let make_svc = make_service_fn(|_conn| {
        async { Ok::<_, Infallible>(service_fn(router)) }
    });

    let server = Server::bind(&addr).serve(make_svc);
    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}

In this section, we'll walk through the steps to build and run the server that performs language model inference using Rust and the Hyper framework. We'll also demonstrate how to make a POST request to the server using Postman.

1. Install Rust: If you haven't already, you need to install Rust on your machine.
You can download Rust from the official website: https://www.rust-lang.org/tools/install

2. Create a New Rust Project: Create a new directory for your project and navigate to it in the terminal. Run the following command to create a new Rust project:

cargo new language_model_server

This command will create a new directory named language_model_server containing the basic structure of a Rust project.

3. Add Dependencies: Open the Cargo.toml file in the language_model_server directory and add the required dependencies for Hyper and other libraries. Your Cargo.toml file should look something like this:

[package]
name = "llm_handler"
version = "0.1.0"
edition = "2018"

[dependencies]
hyper = { version = "0.13" }
tokio = { version = "0.2", features = ["macros", "rt-threaded"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
llm = { git = "https://github.com/rustformers/llm.git" }
rand = "0.8.5"

Make sure to adjust the version numbers according to the latest versions available.

4. Replace Code: Replace the content of the src/main.rs file in your project directory with the code you've been provided in the earlier sections.

5. Building the Server: In the terminal, navigate to your project directory and run the following command to build the server:

cargo build --release

This will compile your code and produce an executable binary in the target/release directory.

Running the Server

1. Running the Server: After building the server, you can run it using the following command:

cargo run --release

Your server will start listening on port 8083.

2. Accessing the Server: Open a web browser and navigate to http://localhost:8083. You should see the message "Not Found", indicating that the server is up and running.

Making a POST Request Using Postman

1. Install Postman: If you don't have Postman installed, you can download it from the official website: https://www.postman.com/downloads/

2. Create a POST Request:
Open Postman and create a new request.
Set the request type to "POST".
Enter the URL: http://localhost:8083/api/chat
In the "Body" tab, select "raw" and set the content type to "JSON (application/json)".
Enter the following JSON request body:

{ "prompt": "Rust is an amazing programming language because" }

3. Send the Request: Click the "Send" button to make the POST request to your server.

4. View the Response: You should receive a response from the server, indicating the inference result generated by the language model.
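If you prefer scripting the request instead of using Postman, the same call can be made from a few lines of Python. This is just a convenience sketch: it assumes the requests package is installed and that the server above is running locally on port 8083.

import requests

# POST a prompt to the local inference server and print the generated text
payload = {"prompt": "Rust is an amazing programming language because"}
resp = requests.post("http://localhost:8083/api/chat", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["response"])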
Conclusion

In the previous article, we introduced the foundational concepts, setting the stage for the hands-on application we delved into this time. In this article, our main goal was to bridge theory with practice. Using the llm crate alongside the Hyper library, we embarked on a mission to create a server capable of understanding and executing language model inference. But our work was more than just setting up a server; it was about illustrating the synergy between Rust, a language famed for its safety and concurrency features, and the vast world of AI.

What's especially encouraging is how this project can serve as a springboard for many more innovations. With the foundation laid out, there are numerous avenues to explore, from refining the server's performance to integrating more advanced features or scaling it for larger audiences.

If there's one key takeaway from our journey, it's the importance of continuous learning and experimentation. The tech landscape is ever-evolving, and the confluence of AI and programming offers a fertile ground for innovation. As we conclude this series, our hope is that the knowledge shared acts as both a source of inspiration and a practical guide. Whether you're a seasoned developer or a curious enthusiast, the tools and techniques we've discussed can pave the way for your own unique creations. So, as you move forward, keep experimenting, iterating, and pushing the boundaries of what's possible. Here's to many more coding adventures ahead!

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn
Building an API for Language Model Inference using Rust and Hyper - Part 1

Alan Bernardo Palacio
31 Aug 2023
7 min read
Introduction

In the landscape of artificial intelligence, the capacity to bring sophisticated Large Language Models (LLMs) to commonplace applications has always been a sought-after goal. Enter LLM, a groundbreaking Rust library crafted by Rustformers, designed to make this dream a tangible reality. By focusing on the intricate synergy between the LLM library and the foundational GGML project, this toolset pushes the boundaries of what's possible, enabling AI enthusiasts to harness the sheer might of LLMs on conventional CPUs. This shift in dynamics owes much to GGML's pioneering approach to model quantization, streamlining computational requirements without sacrificing performance.

In this comprehensive guide, we'll embark on a journey that starts with understanding the essence of the llm crate and its seamless interaction with a myriad of LLMs. Delving into its intricacies, we'll illuminate how to integrate, interact, and infer using these models. And as a tantalizing glimpse into the realm of practical application, our expedition won't conclude here. In the subsequent installment, we'll rise to the challenge of crafting a web server in Rust, one that confidently runs inference directly on a CPU, making the awe-inspiring capabilities of AI not just accessible but an integral part of our everyday digital experiences.

This is a two-part article: in the first part we discuss basic interaction with the library, and in the second we build a server in Rust that allows us to create our own web applications using state-of-the-art LLMs. Let's begin.

Harnessing the Power of Large Language Models

At the very core of LLM's architecture resides the GGML project, a tensor library meticulously crafted in the C programming language. GGML serves as the bedrock of LLM, enabling the intricate orchestration of large language models. Its quintessence lies in a potent technique known as model quantization.

Model quantization, a pivotal process employed by GGML, involves the reduction of numerical precision within a machine-learning model. This entails transforming the conventional 32-bit floating-point numbers frequently used for calculations into more compact representations such as 16-bit or even 8-bit integers.

Quantization can be thought of as chiseling away unnecessary complexity while sculpting a model: it streamlines resource utilization without inordinate compromises on performance. By default, models lean on 32-bit floating-point numbers for their arithmetic operations. With quantization, this intricacy is distilled into more frugal formats, such as 16-bit or even 8-bit integers. It's an equilibrium between computational efficiency and output quality.

GGML supports a spectrum of quantization strategies, spanning 4-, 5-, and 8-bit quantization, and each strategy balances efficiency against accuracy differently. For instance, 4-bit quantization excels in memory and computational frugality, although it could induce a quality decrease compared to the broader 8-bit quantization.

The Rustformers library allows the integration of different language models, including Bloom, GPT-2, GPT-J, GPT-NeoX, Llama, and MPT. To use these models within the Rustformers library, they undergo a transformation to align with GGML's technical underpinnings.
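To get a feel for what these precision levels mean in practice, the back-of-the-envelope calculation below estimates the memory needed just to hold the weights of a 3-billion-parameter model (roughly the size of the OpenLLaMA 3B checkpoint used later in this article). The parameter count and the assumption that weights dominate the footprint are simplifications for illustration.

# Approximate weight storage for a 3B-parameter model at different precisions
params = 3_000_000_000

for label, bits in [("32-bit float", 32), ("16-bit float", 16), ("8-bit quantized", 8), ("4-bit quantized", 4)]:
    gib = params * bits / 8 / (1024 ** 3)
    print(f"{label}: ~{gib:.1f} GiB")

# 32-bit float: ~11.2 GiB
# 16-bit float: ~5.6 GiB
# 8-bit quantized: ~2.8 GiB
# 4-bit quantized: ~1.4 GiB

The drop from roughly 11 GiB to under 2 GiB is what makes running such a model with ordinary CPU and laptop-sized RAM feasible.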
The authors have generously provided pre-engineered models on the Hugging Face platform, facilitating seamless integration. In the next sections, we will use the llm crate to run inference on LLM models like Llama. The realm of AI innovation is beckoning, and Rustformers' LLM, fortified by GGML's techniques, forms an alluring gateway into its intricacies.

Getting Started with LLM-CLI

The Rustformers group has the mission of amplifying access to the prowess of large language models (LLMs) at the forefront of AI evolution. The group focuses on harmonizing with the rapidly advancing GGML ecosystem, a C library harnessed for quantization that enables the execution of LLMs on CPUs. The trajectory extends to supporting diverse backends, embracing GPUs, Wasm environments, and more.

For Rust developers venturing into the realm of LLMs, the key to unlocking this potential is the llm crate, the gateway to Rustformers' innovation. Through this crate, Rust developers interface with LLMs effortlessly. The "llm" project also offers a streamlined CLI for interacting with LLMs and examples showcasing its integration into Rust projects. More insights can be gained from the GitHub repository or its official documentation for released versions.

To embark on your LLM journey, start by installing the LLM-CLI package. This package brings the model to your console, allowing for direct inference.

Getting started is a streamlined process:

1. Clone the repository.
2. Install the llm-cli tool from the repository.
3. Download your chosen models from Hugging Face. In our illustration, we employ the OpenLLaMA 3B model in GGML format.
4. Run inference on the model using the CLI tool, referencing the model file and architecture downloaded previously.

So let's start with it. First, let's install llm-cli using this command:

cargo install llm-cli --git https://github.com/rustformers/llm

Next, we proceed by fetching your desired model from Hugging Face:

curl -LO https://huggingface.co/rustformers/open-llama-ggml/resolve/main/open_llama_3b-f16.bin

Finally, we can initiate a dialogue with the model using a command akin to:

llm infer -a llama -m open_llama_3b-f16.bin -p "Rust is a cool programming language because"

We can see how the llm crate stands to facilitate seamless interactions with LLMs. This project empowers developers with streamlined CLI tools, exemplifying LLM integration into Rust projects. With installation and model preparation explained, the journey toward LLM proficiency commences. As we transition to the culmination of this exploration, the power of LLMs is within reach, ready to reshape the boundaries of AI engagement.

Conclusion: The Dawn of Accessible AI with Rust and LLM

In this exploration, we've delved deep into the revolutionary Rust library, LLM, and its transformative potential to bring Large Language Models (LLMs) to the masses. No longer is the prowess of advanced AI models locked behind the gates of high-end GPU architectures. With the symbiotic relationship between the LLM library and the underlying GGML tensor architecture, we can seamlessly run language models on standard CPUs. This is made possible largely by the potent technique of model quantization, which GGML has incorporated. By optimizing the balance between computational efficiency and performance, models can now run in environments that were previously deemed infeasible.

The Rustformers' dedication to the cause shines through their comprehensive toolset.
Their offerings extend from pre-engineered models on Hugging Face, ensuring ease of integration, to a CLI tool that simplifies the very interaction with these models. For Rust developers, the horizon of AI integration has never seemed clearer or more accessible.

As we wrap up this segment, it's evident that the paradigm of AI integration is rapidly shifting. With tools like the llm crate, developers are equipped with everything they need to harness the full might of LLMs in their Rust projects. But the journey doesn't stop here. In the next part of this series, we venture beyond the basics and into the realm of practical application. Join us as we take a leap forward, constructing a web server in Rust that leverages the llm crate.

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn
Spark and LangChain for Data Analysis

Alan Bernardo Palacio
31 Aug 2023
12 min read
Introduction

In today's data-driven world, the demand for extracting insights from large datasets has led to the development of powerful tools and libraries. Apache Spark, a fast and general-purpose cluster computing system, has revolutionized big data processing. Coupled with LangChain, a cutting-edge library built atop advanced language models, you can now seamlessly combine the analytical capabilities of Spark with the natural language interaction facilitated by LangChain. This article introduces Spark, explores the features of LangChain, and provides practical examples of using Spark with LangChain for data analysis.

Understanding Apache Spark

The processing and analysis of large datasets have become crucial for organizations and individuals alike. Apache Spark has emerged as a powerful framework that revolutionizes the way we handle big data. Spark is designed for speed, ease of use, and sophisticated analytics. It provides a unified platform for various data processing tasks, such as batch processing, interactive querying, machine learning, and real-time stream processing.

At its core, Apache Spark is an open-source, distributed computing system that excels at processing and analyzing large datasets in parallel. Unlike traditional MapReduce systems, Spark introduces the concept of Resilient Distributed Datasets (RDDs), which are immutable distributed collections of data. RDDs can be transformed and operated upon using a wide range of high-level APIs provided by Spark, making it possible to perform complex data manipulations with ease.

Key Components of Spark

Spark consists of several components that contribute to its versatility and efficiency:

Spark Core: The foundation of Spark, responsible for tasks such as task scheduling, memory management, and fault recovery. It also provides APIs for creating and manipulating RDDs.
Spark SQL: A module that allows Spark to work seamlessly with structured data using SQL-like queries. It enables users to interact with structured data through the familiar SQL language.
Spark Streaming: Enables real-time stream processing, making it possible to process and analyze data in near real time as it arrives in the system.
MLlib (Machine Learning Library): A scalable machine learning library built on top of Spark, offering a wide range of machine learning algorithms and tools.
GraphX: A graph processing library that provides abstractions for efficiently manipulating graph-structured data.
Spark DataFrame: A higher-level abstraction on top of RDDs, providing a structured and more optimized way to work with data. DataFrames offer optimization opportunities, enabling Spark's Catalyst optimizer to perform query optimization and code generation.

Spark's distributed computing architecture enables it to achieve high performance and scalability. It employs a master/worker architecture where a central driver program coordinates tasks across multiple worker nodes. Data is distributed across these nodes, and tasks are executed in parallel on the distributed data.

We will be diving into two ways of interacting with Spark: Spark SQL and the Spark DataFrame API. Apache Spark is a distributed computing framework, with Spark SQL as one of its modules for structured data processing. Spark DataFrame is a distributed collection of data organized into named columns, offering a programming abstraction similar to data frames in R or Python but optimized for distributed processing. It provides a functional programming API, allowing operations like select(), filter(), and groupBy().
On the other hand, Spark SQL allows users to run unmodified SQL queries on Spark data, integrating seamlessly with DataFrames and offering a bridge to BI tools through JDBC/ODBC. Both Spark DataFrame and Spark SQL leverage the Catalyst optimizer for efficient query execution. While DataFrames are preferred for programmatic APIs and functional capabilities, Spark SQL is ideal for ad hoc querying and users familiar with SQL. The choice between them often hinges on the specific use case and the user's familiarity with either SQL or functional programming.
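To make the contrast concrete before we bring LangChain into the picture, the short example below runs the same aggregation once through the DataFrame API and once through Spark SQL; the sample data and column names are invented for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Tiny, hypothetical dataset of asset prices
df = spark.createDataFrame(
    [("BTC", 29000.0), ("BTC", 30500.0), ("ETH", 1850.0), ("ETH", 1900.0)],
    ["asset", "price"],
)

# DataFrame API: functional, programmatic style
df.groupBy("asset").agg(F.avg("price").alias("avg_price")).show()

# Spark SQL: the same aggregation expressed as a query
df.createOrReplaceTempView("prices")
spark.sql("SELECT asset, AVG(price) AS avg_price FROM prices GROUP BY asset").show()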
In the next sections, we will explore how LangChain complements Spark's capabilities by introducing natural language interactions through agents.

Introducing Spark Agent to LangChain

LangChain, a dynamic library built upon the foundations of modern Language Model (LLM) technologies, is a pivotal addition to the world of data analysis. It bridges the gap between the power of Spark and the ease of human language interaction.

LangChain harnesses the capabilities of advanced LLMs like ChatGPT and HuggingFace-hosted models. These language models have proven their prowess in understanding and generating human-like text. LangChain capitalizes on this potential to enable users to interact with data and code through natural language queries.

Empowering Data Analysis

The introduction of the Spark Agent to LangChain brings about a transformative shift in data analysis workflows. Users are now able to tap into the immense analytical capabilities of Spark through simple, everyday language. This innovation opens doors for professionals from various domains to seamlessly explore datasets, uncover insights, and derive value without the need for deep technical expertise.

LangChain acts as a bridge, connecting the technical realm of data processing with the non-technical world of language understanding. It empowers individuals who may not be well-versed in coding or data manipulation to engage with data-driven tasks effectively. This accessibility democratizes data analysis and makes it inclusive for a broader audience.

The integration of LangChain with Spark involves a thoughtful orchestration of components that work in harmony to bring human-language interaction to the world of data analysis. At the heart of this integration lies the collaboration between ChatGPT, a sophisticated language model, and PythonREPL, a Python Read-Evaluate-Print Loop. The workflow is as follows:

1. ChatGPT receives user queries in natural language and generates a Python command as a solution.
2. The generated Python command is sent to PythonREPL for execution.
3. PythonREPL executes the command and produces a result.
4. ChatGPT takes the result from PythonREPL and translates it into a final answer in natural language.

This collaborative process can repeat multiple times, allowing users to engage in iterative conversations and deep dives into data analysis.

Several key points ensure a seamless interaction between the language model and the code execution environment:

Initial Prompt Setup: The initial prompt given to ChatGPT defines its behavior and available tooling. This prompt guides ChatGPT on the desired actions and toolkits to employ.
Connection between ChatGPT and PythonREPL: Through predefined prompts, the format of the answer is established. Regular expressions (regex) are used to extract the specific command to execute from ChatGPT's response. This establishes a clear flow of communication between ChatGPT and PythonREPL.
Memory and Conversation History: ChatGPT does not possess a memory of past interactions. As a result, maintaining the conversation history locally and passing it with each new question is essential to maintaining context and coherence in the interaction.

In the upcoming sections, we'll explore practical use cases that illustrate how this integration manifests in the real world, including interactions with Spark SQL and Spark DataFrames.

The Spark SQL Agent

In this section, we will walk you through how to interact with Spark SQL using natural language, unleashing the power of Spark for querying structured data. Let's walk through a few hands-on examples to illustrate the capabilities of the integration:

Exploring data with the Spark SQL Agent:
Querying the dataset to understand its structure and metadata.
Calculating statistical metrics like average age and fare.
Extracting specific information, such as the name of the oldest survivor.

Analyzing DataFrames with the Spark DataFrame Agent:
Counting rows to understand the dataset size.
Analyzing the distribution of passengers with siblings.
Computing descriptive statistics like the square root of the average age.

By interacting with the agents and experimenting with natural language queries, you'll witness firsthand the seamless fusion of advanced data processing with user-friendly language interactions. These examples demonstrate how Spark and LangChain can amplify your data analysis efforts, making insights more accessible and actionable.

Before diving into the magic of Spark SQL interactions, let's set up the necessary environment. We'll utilize LangChain's SparkSQLToolkit to seamlessly bridge between Spark and natural language interactions. First, make sure you have your API key for OpenAI ready. You'll need it to integrate the language model.

from langchain.agents import create_spark_sql_agent
from langchain.agents.agent_toolkits import SparkSQLToolkit
from langchain.chat_models import ChatOpenAI
from langchain.utilities.spark_sql import SparkSQL
import os

# Set up environment variables for API keys
os.environ['OPENAI_API_KEY'] = 'your-key'

Now, let's get hands-on with Spark SQL. We'll work with a Titanic dataset, but you can replace it with your own data. First, create a Spark session, define a schema for the database, and load your data into a Spark DataFrame. We'll then create a table in Spark SQL to enable querying.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

schema = "langchain_example"
spark.sql(f"CREATE DATABASE IF NOT EXISTS {schema}")
spark.sql(f"USE {schema}")

csv_file_path = "titanic.csv"
table = "titanic"
spark.read.csv(csv_file_path, header=True, inferSchema=True).write.saveAsTable(table)
spark.table(table).show()

Now, let's initialize the Spark SQL Agent. This agent acts as your interactive companion, enabling you to query Spark SQL tables using natural language. We'll create a toolkit that connects LangChain, the SparkSQL instance, and the chosen language model (in this case, ChatOpenAI).
Now, let's initialize the Spark SQL Agent. This agent acts as your interactive companion, enabling you to query Spark SQL tables using natural language. We'll create a toolkit that connects LangChain, the SparkSQL instance, and the chosen language model (in this case, ChatOpenAI).

from langchain.agents import AgentType

spark_sql = SparkSQL(schema=schema)
llm = ChatOpenAI(temperature=0, model="gpt-4-0613")
toolkit = SparkSQLToolkit(db=spark_sql, llm=llm, handle_parsing_errors="Check your output and make sure it conforms!")
agent_executor = create_spark_sql_agent(
    llm=llm,
    toolkit=toolkit,
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True)

Now comes the exciting part: querying Spark SQL tables using natural language! With your Spark SQL Agent ready, you can ask questions about your data and receive insightful answers. Let's try a few examples:

# Describe the Titanic table
agent_executor.run("Describe the titanic table")

# Calculate the square root of the average age
agent_executor.run("whats the square root of the average age?")

# Find the name of the oldest survived passenger
agent_executor.run("What's the name of the oldest survived passenger?")

With these simple commands, you've tapped into the power of Spark SQL using natural language. The Spark SQL Agent makes data exploration and querying more intuitive and accessible than ever before.

The Spark DataFrame Agent

In this section, we'll dive into another facet of LangChain's integration with Spark: the Spark DataFrame Agent. This agent leverages the power of Spark DataFrames and natural language interactions to provide an engaging and insightful way to analyze data.

Before we begin, make sure you have a Spark session set up and your data loaded into a DataFrame. For this example, we'll use the Titanic dataset. Replace csv_file_path with the path to your own data if needed.

from langchain.llms import OpenAI
from pyspark.sql import SparkSession
from langchain.agents import create_spark_dataframe_agent

spark = SparkSession.builder.getOrCreate()
csv_file_path = "titanic.csv"
df = spark.read.csv(csv_file_path, header=True, inferSchema=True)
df.show()

Initializing the Spark DataFrame Agent

Now, let's unleash the power of the Spark DataFrame Agent! This agent allows you to interact with Spark DataFrames using natural language queries. We'll initialize the agent by specifying the language model and the DataFrame you want to work with.

# Initialize the Spark DataFrame Agent
agent = create_spark_dataframe_agent(llm=OpenAI(temperature=0), df=df, verbose=True)

With the agent ready, you can explore your data using natural language queries. Let's dive into a few examples:

# Count the number of rows in the DataFrame
agent.run("how many rows are there?")

# Find the number of people with more than 3 siblings
agent.run("how many people have more than 3 siblings")

# Calculate the square root of the average age
agent.run("whats the square root of the average age?")

Remember that, under the hood, the Spark DataFrame Agent uses generated Python code to interact with Spark. While it's a powerful tool for interactive analysis, ensure that the generated code is safe to execute, especially in a sensitive environment.

In this final section, let's tie everything together and showcase how Spark and LangChain work in harmony to unlock insights from data. We've covered the Spark SQL Agent and the Spark DataFrame Agent, so now it's time to put theory into practice.
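As a short recap, the snippet below runs one more question through each agent on the same Titanic data. It is only a sketch that reuses the agent_executor and agent objects created above; the wording of the questions is illustrative, so feel free to substitute your own:

# Ask the Spark SQL Agent an aggregate question about the registered table
agent_executor.run("How many passengers survived, and what was their average age?")

# Ask the Spark DataFrame Agent a follow-up question on the same DataFrame
agent.run("Which passenger class had the highest average fare?")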
In conclusion, the combination of Spark and LangChain transcends the traditional boundaries of technical expertise, enabling data enthusiasts of all backgrounds to engage with data-driven tasks effectively. Through the Spark SQL Agent and the Spark DataFrame Agent, LangChain empowers users to interact with, explore, and analyze data using the simplicity and familiarity of natural language. So why wait? Dive in and unlock the full potential of your data analysis journey with the synergy of Spark and LangChain.

Conclusion

In this article, we've delved into the world of Apache Spark and LangChain, two technologies that synergize to transform how we interact with and analyze data. By bridging the gap between technical data processing and human language understanding, Spark and LangChain enable users to derive meaningful insights from complex datasets through simple, natural language queries. The Spark SQL Agent and Spark DataFrame Agent presented here demonstrate the potential of this integration, making data analysis more accessible to a wider audience. As both technologies continue to evolve, we can expect even more powerful capabilities for unlocking the true potential of data-driven decision-making. So, whether you're a data scientist, an analyst, or a curious learner, harnessing the power of Spark and LangChain opens up a world of possibilities for exploring and understanding data in an intuitive and efficient manner.

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn
Unleashing the Power of Wolfram Alpha API with Python and ChatGPT

Alan Bernardo Palacio
31 Aug 2023
6 min read
IntroductionIn the ever-evolving landscape of artificial intelligence, a groundbreaking collaboration has emerged between Wolfram Alpha and ChatGPT, giving birth to an extraordinary plugin: the AI Advantage. This partnership bridges the gap between ChatGPT's proficiency in natural language processing and Wolfram Alpha's computational prowess. The result? A fusion that unlocks an array of new possibilities, revolutionizing the way we interact with AI. In this hands-on tutorial, we're embarking on a journey to explore the power of the Wolfram Alpha API, demonstrate its integration with Python and ChatGPT, and empower you to tap into this dynamic duo for tasks ranging from complex calculations to real-time data retrieval.Understanding Wolfram Alpha APIImagine having an intelligent assistant at your fingertips, capable of not only understanding your questions but also providing detailed computational insights. That's where Wolfram Alpha shines. It's more than just a search engine; it's a computational knowledge engine. Whether you need to solve a math problem, retrieve real-time data, or generate visual content, Wolfram Alpha has you covered. Its unique ability to compute answers based on structured data sets it apart from traditional search engines.So, how can you tap into this treasure trove of computational knowledge? Enter the Wolfram Alpha API. This API exposes Wolfram Alpha's capabilities for developers to harness in their applications. Whether you're building a chatbot, a data analysis tool, or an educational resource, the Wolfram Alpha API can provide you with instant access to accurate and in-depth information. The API supports a wide range of queries, from straightforward calculations to complex data retrievals, making it a versatile tool for various use cases.Integrating Wolfram Alpha API with ChatGPTChatGPT's strength lies in its ability to understand and generate human-like text based on input. However, when it comes to intricate calculations or pulling real-time data, it benefits from a partner like Wolfram Alpha. By integrating the two, you create a dynamic synergy where ChatGPT can effortlessly tap into Wolfram Alpha's computational engine to provide accurate and data-driven responses. This collaboration bridges the gap between language understanding and computation, resulting in a well-rounded AI interaction.Before we dive into the technical implementation, let's get you set up to take advantage of the Wolfram Alpha plugin for ChatGPT. First, ensure you have access to ChatGPT+. To enable the Wolfram plugin, follow these steps:Open the ChatGPT interface.Navigate to "Settings."Look for the "Beta Features" section.Enable "Plugins" under the GPT-4 options.Once "Plugins" is enabled, locate and activate the Wolfram plugin.With the plugin enabled you're ready to harness the combined capabilities of ChatGPT and Wolfram Alpha API, making your AI interactions more robust and informative.In the next sections, we'll dive into practical applications and walk you through implementing the integration using Python and ChatGPT.Practical Applications with Code ExamplesLet's start by exploring how the Wolfram Alpha API can assist with complex mathematical tasks. Below are code examples that demonstrate the integration between ChatGPT and Wolfram Alpha to solve intricate math problems. In these scenarios, ChatGPT serves as the bridge between you and Wolfram Alpha, seamlessly delivering accurate solutions.Before diving into the code implementation, let's ensure your environment is ready to go. 
Follow these steps to set up the necessary components:

Install the required packages: Make sure you have the necessary Python packages installed. You can use pip to install them:

pip install langchain openai wolframalpha

Now, let's walk through the implementation. The following code integrates the Wolfram Alpha API with ChatGPT to provide accurate and informative responses. The examples below use the agent_chain object that we will assemble at the end of this section.

Wolfram Alpha can solve simple arithmetic queries:

# User input
question = "Solve for x: 2x + 5 = 15"

# Let ChatGPT interact with Wolfram Alpha
response = agent_chain.run(input=question)

# Display the result returned by the agent
print("Solution:", response)

Or more complex ones, like calculating integrals:

# User input
question = "Calculate the integral of x^2 from 0 to 5"

# Let ChatGPT interact with Wolfram Alpha
response = agent_chain.run(input=question)

# Display the result returned by the agent
print("Integral:", response)

Real-time Data Retrieval

Incorporating real-time data into conversations can greatly enhance the value of AI interactions. Here are code examples showcasing how to retrieve up-to-date information using the Serper API and integrate it seamlessly into the conversation:

# User input
question = "What's the current exchange rate between USD and EUR?"

# Let ChatGPT interact with the agent's tools
response = agent_chain.run(input=question)

# Display the result returned by the agent
print("Exchange Rate:", response)

We can also ask for the current weather forecast:

# User input
question = "What's the weather forecast for London tomorrow?"

# Let ChatGPT interact with the agent's tools
response = agent_chain.run(input=question)

# Display the result returned by the agent
print("Weather Forecast:", response)

Now we can put everything together into a single block, including all the required library imports, and use both real-time data via Serper and the reasoning skills of Wolfram Alpha.

# Import required libraries
from langchain.agents import load_tools, initialize_agent
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI

# Set environment variables
import os
os.environ['OPENAI_API_KEY'] = 'your-key'
os.environ['WOLFRAM_ALPHA_APPID'] = 'your-key'
os.environ["SERPER_API_KEY"] = 'your-key'

# Initialize the ChatGPT model
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

# Load tools and set up memory
tools = load_tools(["google-serper", "wolfram-alpha"], llm=llm)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Initialize the agent
agent_chain = initialize_agent(tools, llm, handle_parsing_errors=True, verbose=True, memory=memory)

# Interact with the agent
response_weather = agent_chain.run(input="what is the weather in Amsterdam right now in celsius? Don't make assumptions.")
response_flight = agent_chain.run(input="What's a good price for a flight from JFK to AMS this weekend? Express the price in Euros. Don't make assumptions.")

Conclusion

In this tutorial, we've delved into the exciting realm of integrating the Wolfram Alpha API with Python and ChatGPT. We've explored how this collaboration empowers you to tackle complex mathematical tasks and retrieve real-time data seamlessly. By harnessing the capabilities of both Wolfram Alpha and ChatGPT, you've unlocked a powerful synergy that's capable of transforming your AI interactions.
As you continue to explore and experiment with this integration, you'll discover new ways to enhance your interactions and leverage the strengths of each tool. So, why wait? Start your journey toward more informative and engaging AI interactions today.Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn 
Transformer Building Blocks

Saeed Dehqan
29 Aug 2023
22 min read
IntroductionTransformers employ potent techniques to preprocess tokens before sequentially inputting them into a neural network, aiding in the selection of the next token. At the transformer's apex is a basic neural network, the transformer head. The text generator model processes input tokens and generates a probability distribution for subsequent tokens. Context length, termed context length or block size, is recognized as a hyperparameter denoting input token count. The model's primary aim is to predict the next token based on input tokens (referred to as context tokens or context windows). Our goal with n tokens is to predict the subsequent fitting token following previous ones. Thus, we rely on these n tokens to anticipate the next. As humans, we attempt to grasp the conversation's context - our location and a loose foresight of the path's culmination. Upon gathering pertinent insights, relevant words emerge, while irrelevant ones fade, enabling us to choose the next word with precision. We occasionally err but backtrack, a luxury transformers lack. If they incorrectly predict (an irrelevant token), they persist, though exceptions exist, like beam search. Unlike us, transformers can't forecast. Revisiting n prior tokens, our human assessment involves inspecting them individually, and discerning relationships from diverse angles. By prioritizing pivotal tokens and disregarding superfluous ones, we evaluate tokens within various contexts. We scrutinize all n prior tokens individually, ready to prognosticate. This embodies the essence of the multihead attention mechanism in transformers. Consider a context window with 5 tokens. Each wears a distinct mask, predicting its respective next token:"To discern the void amidst, we must first grasp the fullness within." To understand what token is lacking, we must first identify what we are and possess. We need communication between tokens since tokens don’t know each other yet and in order to predict their own next token, they first need to know each other well and pair together in such a way that tokens with similar characteristics stay near each other (technically having similar vectors). Each token has three vectors that represent:●    What tokens they are looking for (known as query)●    What they really have (known as key)●    What they are (known as value)Each token with its query starts looking for similar keys, finds each other, and starts to know one another by adding up their values:Similar tokens find each other and if a token is somehow dissimilar, here Token 4, other tokens don’t consider it much. But please note that every token (much or less) has its own effect on other tokens. Also, in self-attention, all tokens ask all other tokens with their query and keys to find familiar tokens, but not the future tokens, named masked self-attention. We prohibit tokens from communicating to future tokens. After exchanging information between tokens and mixing up their values, similar tokens become more similar:As you can see, the color of similar tokens becomes more similar(in action, their vectors become more similar). Since tokens in the group wear a mask, we cannot access the true tokens’ values. We just know and distinguish them from their mask(value). This is because every token has different characteristics in different contexts, and they don’t show their true essence.So far so good; we have finished the self-attention process and now, the group is ready to predict their next tokens. 
This is because individuals are aware of each other very well, and as a result, they can guess the next token better. Now, each token separately needs to go to a nonlinear network and then to the transformer head, to predict its own next token. We ask each one of the tokens separately to tell their opinion about the probability of what token comes next. Finally, we collect the probability distributions of all tokens in the context window. A probability distribution sums up to 100, or actually in action to 1. We give probability to every token the model has in its vocabulary. The simplest method to extract the next token from probability distributions is to select the one with the highest probability:As you can see, each token goes to the neural network and the network returns a probability distribution. The result is the following sentence: “It looks like a bug”.Voila! We managed to go through a simple Transformer model.Let’s recap everything we’ve said. A transformer receives n tokens as input, does some stuff (like self-attention, layer normalization, etc.) and feed-forward them into a neural network to get probability distributions of the next token. Each token goes to the neural network separately; if the number of tokens is 10, there are 10 probability distributions.At this point, you know intuitively how the main building blocks of a transformer work. But let us better understand them by implementing a transformer model.Clone the repository tiny-transformer:git clone https://github.com/saeeddhqan/tiny-transformerExecute simple_model.py in the repository If you simply want to run the model for training.Create a new file, and import the necessary modules:import math import torch import torch.nn as nn import torch.nn.functional as F Load the dataset and write the tokenizer: with open('shakespeare.txt') as fp: text = fp.read() chars = sorted(list(set(text))) vocab_size = len(chars) stoi = {c:i for i,c in enumerate(chars)} itos = {i:c for c,i in stoi.items()} encode = lambda s: [stoi[x] for x in s] decode = lambda e: ''.join([itos[x] for x in e])●    Open the dataset, and define a variable that is a list of all unique characters in the text.●    The set function splits the text character by character and then removes duplicates, just like sets in set theory. list(set(myvar)) is a way of removing duplicates in a list or string.●    vocab_size is the number of unique characters (here 65). ●    stoi is a dictionary where its keys are characters and values are their indices.●    itos is used to convert indices to characters. ●    encode function receives a string and returns indices of characters. ●    decode receives a list of indices and returns a string. 
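As a quick sanity check, you can confirm that encode and decode are inverses of each other. This small snippet is not part of the original listing; it assumes every character in the sample string appears in shakespeare.txt (true for ordinary English text), and the exact integer indices depend on the corpus, so they are not shown here:

sample = "to be, or not to be"
encoded = encode(sample)            # a list of integers, one per character
assert decode(encoded) == sample    # the round trip recovers the original text
print(len(sample), len(encoded))    # same length: one index per character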
Split the dataset into test and train and write a function that returns data for training:device = 'cuda' if torch.cuda.is_available() else 'cpu' torch.manual_seed(1234) data = torch.tensor(encode(text), dtype=torch.long).to(device) train_split = int(0.9 * len(data)) train_data = data[:train_split] test_data = data[train_split:] def get_batch(split='train', block_size=16, batch_size=1) -> 'Create a random batch and returns batch along with targets': data = train_data if split == 'train' else test_data ix = torch.randint(len(data) - block_size, (batch_size,)) x = torch.stack([data[i:i + block_size] for i in ix]) y = torch.stack([data[i+1:i + block_size + 1] for i in ix]) return x, y●    Choose a suitable device.●     Set a seed to make the training reproducible.●    Convert the text into a large list of indices with the encode function.●    Since the character indices are integer, we use torch.long data type to make the data suitable for the model. ●    90% for training and 10% for testing.●    If the batch_size is 10, we select 10 chunks or sequences from the dataset and stack them up to process them simultaneously. ●    If the batch_size is 1, get_batch function selects 1 random chunk (n consequence characters) from the dataset and returns x and y, where x is 16 characters’ indices and y is the target characters for x.The shape, value, and decoded version of the selected chunk are as follows:shape x: torch.Size([1, 16]) shape y: torch.Size([1, 16]) value x: tensor([[41, 43, 6, 1, 60, 47, 50, 50, 39, 47, 52, 2, 1, 52, 43, 60]]) value y: tensor([[43, 6, 1, 60, 47, 50, 50, 39, 47, 52, 2, 1, 52, 43, 60, 43]]) decoded x: ce, villain! nev decoded y: e, villain! neveWe usually process multiple chunks or sequences at once with batching in order to speed up the training. For each character, we have an equivalent target, which is its next token. The target for ‘c’ is ‘e’, for ‘e’ is ‘,’, for ‘v’ is ‘i’, and so on. Let us talk a bit about the input shape and output shape of tensors in a transformer model. The model receives a list of token indices like the above(named a sequence, or chunk) and maps them into their corresponding vectors.●    The input shape is (batch_size, block_size).●    After mapping indices into vectors, the data shape becomes (batch_size, block_size, embed_size).●    Then, through the multihead attention and feed-forward layers, the data shape does not change.●    Finally, the data with shape (batch_size, block_size, embed_size) goes to the transformer head (a simple neural network) and the output shape becomes (batch_size, block_size, vocab_size). vocab_size is the number of unique characters that can come next (for the Shakespeare dataset, the number of unique characters is 65).Self-attentionThe communication between tokens happens in the head class; we define the scores variable to save the similarity between vectors. The higher the score is, the more two vectors have in common. We then utilize these scores to do a weighted sum of all the vectors: class head(nn.Module): def __init__(self, embeds_size=32, block_size=16, head_size=8):     super().__init__()     self.key = nn.Linear(embeds_size, head_size, bias=False)     self.query = nn.Linear(embeds_size, head_size, bias=False)     self.value = nn.Linear(embeds_size, head_size, bias=False)     self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))     self.dropout = nn.Dropout(0.1) def forward(self, x):     B,T,C = x.shape     # What am I looking for?     q = self.query(x)     # What do I have?     
k = self.key(x)     # What is the representation value of me?     # Or: what's my personality in the group?     # Or: what mask do I have when I'm in a group?     v = self.value(x)     scores = q @ k.transpose(-2,-1) * (1 / math.sqrt(C)) # (B,T,head_size) @ (B,head_size,T) --> (B,T,T)     scores = scores.masked_fill(self.tril[:T, :T] == 0, float('-inf'))     scores = F.softmax(scores, dim=-1)     scores = self.dropout(scores)     out = scores @ v     return outUse three linear layers to transform the vector into key, query, and value, but with a smaller dimension (here same as head_size).Q, K, V: Q and K are for when we want to find similar tokens. We calculate the similarity between vectors with a dot product: q @ k.transpose(-2, -1). The shape of scores is (batch_size, block_size, block_size), which means we have the similarity scores between all the vectors in the block. V is used when we want to do the weighted sum. Scores: Pure dot product scores tend to have very high numbers that are not suitable for softmax since it makes the scores dense. Therefore, we rescale the results with a ratio of (1 / math.sqrt(C)). C is the embedding size. We call this a scaled dot product.Register_buffer: We used register_buffer to register a lower triangular tensor. In this way, when you save and load the model, this tensor also becomes part of the model.Masking: After calculating the scores, we need to replace future scores with -inf to shut them off so that the vectors do not have access to the future tokens. By doing so, these scores effectively become zero after applying the softmax function, resulting in a probability of zero for the future tokens. This process is referred to as masking. Here’s an example of masked scores with a block size 4:[[-0.1710, -inf, -inf, -inf], [ 0.2007, -0.0878, -inf, -inf], [-0.0405, 0.2913, 0.0445, -inf], [ 0.1328, -0.2244, 0.0796, 0.1719]]Softmax: It converts a vector into a probability distribution that sums up to 1. Here’s the scores after softmax:      [[1.0000, 0.0000, 0.0000, 0.0000],      [0.5716, 0.4284, 0.0000, 0.0000],      [0.2872, 0.4002, 0.3127, 0.0000],      [0.2712, 0.1897, 0.2571, 0.2820]]The scores of future tokens are zero; after doing a weighted sum, the future vectors become zero and the vectors receive none data from future vectors(n*0=0)Dropout: Dropout is a regularization technique. It drops some of the numbers in vectors randomly. Dropout helps the model to generalize, not memorize the dataset. We don’t want the model to memorize the Shakespeare model, right? We want it to create new texts like the dataset.Weighted sum: Weighted sum is used to combine different representations or embeddings based on their importance. The scores are calculated by measuring the relevance or similarity between each pair of vectors. The relevance scores are obtained by applying a scaled dot product between the query and key vectors, which are learned during the training process. The resulting weighted sum emphasizes the more important elements and reduces the influence of less relevant ones, allowing the model to focus on the most salient information. We dot product scores with values and the result is the outcome of self-attention.Output: since the embedding size and head size are 32 and 8 respectively, if the input shape is (batch_size, block_size, 32), the output has the shape of (batch_size, block_size, 8).Multihead self-attention“I have multiple personalities(v), tendencies and needs (q), and valuable things (k) in different spaces”. 
Vectors said.We transform the vectors into small dimensions, and then run self-attention on them; we did this in the previous class. In multihead self-attention, we call the head class four times, and then, concatenate the smaller vectors to have the same input shape. We call this multihead self-attention. For instance, if the shape of input data is (1, 16, 32), we transform it into four (1, 16, 8) tensors and run self-attention on these tensors. Why four times? 4 * 8 = initial shape. By using multihead self-attention and running self-attention multiple times, what we do is consider the different aspects of vectors in different spaces. That’s all!Here is the code:class multihead(nn.Module): def __init__(self, num_heads=4, head_size=8):     super().__init__()     self.multihead = nn.ModuleList([head(head_size) for _ in range(num_heads)])     self.output_linear = nn.Linear(embeds_size, embeds_size)     self.dropout = nn.Dropout(0.1) def forward(self, hidden_state):     hidden_state = torch.cat([head(hidden_state) for head in self.multihead], dim=-1)     hidden_state = self.output_linear(hidden_state)     hidden_state = self.dropout(hidden_state)     return hidden_state●    self.multihead: The variable creates four heads and we do this with nn.ModuleList.●    self.output_linear: Another transformer linear layer we apply at the end of the multihead self-attention process.●    self.dropout: Using dropout on the final results.●    hidden_state 1: Concatenating the output of heads so that we have the same shape as input. Heads transform data into different spaces with smaller dimensions, and then do the self-attention.●    hidden_state 2: After doing communication between tokens with self-attention, we use the self.output_linear projector to let the model adjust vectors further based on the gradients that flow through the layer.●    dropout: Run dropout on the output of the projection with a 10% probability of turning off values (make them zero) in the vectors.Transformer blockThere are two new techniques, including layer normalization and residual connection, that need to be explained:class transformer_block(nn.Module): def __init__(self, embeds_size=32, num_heads=8):     super().__init__()     self.head_count = embeds_size // num_heads     self.n_heads = multihead(num_heads, self.head_count)     self.ffn = nn.Sequential(         nn.Linear(embeds_size, 4 * embeds_size),         nn.ReLU(),         nn.Linear(4 * embeds_size, embeds_size),         nn.Dropout(drop_prob),     )     self.ln1 = nn.LayerNorm(embeds_size)     self.ln2 = nn.LayerNorm(embeds_size) def forward(self, hidden_state):     hidden_state = hidden_state + self.n_heads(self.ln1(hidden_state))     hidden_state = hidden_state + self.ffn(self.ln2(hidden_state))     return hidden_state self.head_count: Calculates the head size. The number of heads should be divisible by the embedding size so that we can concatenate the output of heads.self.n_heads: The multihead self-attention layer. self.ffn: This is the first time that we have non-linearity in our model. Non-linearity helps the model to capture complex relationships and patterns in the data. By introducing non-linearity through ReLU activation functions, or GLUE, the model can make a correlation for the data. As a result, it better models the intricacies of the input data. Non-linearity is like “you go to the next layer”, “You don’t go to the next layer”, or “Create y from x for the next layer”. The recommended hidden layer size is a number four times bigger than the embedding size. 
That’s why “4 * embeds_size”. You can also try SwiGLU as the activation function instead of ReLU.self.ln1 and self.ln2: Layer normalizers make the model more robust and they also help the model to converge faster. Layer normalization rescales the data in such a way that the mean is zero and the standard deviation is one. hidden_state 1: Normalize the vectors with self.ln1 and forward the vectors to the multihead attention. Next, we add the input to the output of multihead attention. It helps the model in two ways:○    First, the model has some information from the original vectors. ○    Second, when the model becomes deep, during backpropagation, the gradients will be weak for earlier layers and the model will converge too slowly. We recognize this effect as gradient vanishing. Adding the input helps to enrich the gradients and mitigate the gradient vanishing. We recognize it as a residual connection.hidden_state 2: Hidden_state 1 goes to a layer normalization and then to a nonlinear network. The output will be added to the hidden state with the aim of keeping gradients for all layers.The modelAll the necessary parts are ready, let us stack them up to make the full model:class transformer(nn.Module): def __init__(self):     super().__init__()     self.stack = nn.ModuleDict(dict(         tok_embs=nn.Embedding(vocab_size, embeds_size),         pos_embs=nn.Embedding(block_size, embeds_size),         dropout=nn.Dropout(drop_prob),         blocks=nn.Sequential(             transformer_block(),             transformer_block(),             transformer_block(),             transformer_block(),             transformer_block(),         ),         ln=nn.LayerNorm(embeds_size),         lm_head=nn.Linear(embeds_size, vocab_size),     ))●    self.stack: A list of all necessary layers.●    tok_embs: This is a learnable lookup table that receives a list of indices and returns their vectors.●    pos_embs: Just like tok_embs, it is also a learnable look-up table, but for positional embedding. It receives a list of positions and returns their vectors.●    dropout: Dropout layer.●    blocks: We create multiple transformer blocks sequentially.●    ln: A layer normalization.●    lm_heas: Transformer head receives a token and returns probabilities of the next token. To change the model to be a classifier, or a sentimental analysis model, we just need to change this layer and remove masking from the self-attention layer.The forward method of the transformer class:    def forward(self, seq, targets=None):     B, T = seq.shape     tok_emb = self.stack.tok_embs(seq) # (batch, block_size, embed_dim) (B,T,C)     pos_emb = self.stack.pos_embs(torch.arange(T, device=device))     x = tok_emb + pos_emb     x = self.stack.dropout(x)     x = self.stack.blocks(x)     x = self.stack.ln(x)     logits = self.stack.lm_head(x) # (B, block_size, vocab_size)     if targets is None:         loss = None     else:         B, T, C = logits.shape         logits = logits.view(B * T, C)         targets = targets.view(B * T)         loss = F.cross_entropy(logits, targets)     return logits, loss●  tok_emb: Convert token indices into vectors. Given the input (B, T), the output is (B, T, C), where C is the embeds_size.●  pos_emb: Given the number of tokens in the context window or block_size, it returns the positional embedding of each position.●  x 1: Add up token embeddings and position embeddings. 
A little bit lossy but it works just fine.●  x 2: Run dropout on embeddings.●  x 3: The embeddings go through all the transformer blocks, and multihead self-attention. The input is (B, T, C) and the output is (B, T, C).●  x 4: The outcome of transformer blocks goes to the layer normalization.●  logits: We usually recognize the unnormalized values extracted from the language model head as logits :)●  if-else block: Were the targets specified, we calculated cross-entropy loss. Otherwise, the loss will be None. Before calculating loss in the else block, we change the shape as the cross entropy function expects.●  Output: The method returns logits with shape (batch_size, block_size, vocab_size) and loss if any.For generating a text, add this to the transformer class:    def autocomplete(self, seq, _len=10):        for _ in range(_len):            seq_crop = seq[:, -block_size:] # crop it            logits, _ = self(seq_crop)            logits = logits[:, -1, :] # we care about the last token            probs = F.softmax(logits, dim=-1)            next_char = torch.multinomial(probs, num_samples=1)            seq = torch.cat((seq, next_char), dim=1)        return seq●  autocomplete: Given a tokenized sequence, and the number of tokens that need to be created, this method returns _len tokens.●  seq_crop: Select the last n tokens in the sequence to give it to the model. The sequence length might be larger than the block_size and it causes an error if we don’t crop it.●  logits 1: Forward the sequence into the model to receive the logits.●  logits 2: Select the last logit that will be used to select the next token.●  probs: Run the softmax on logits to get a probability distribution.●  next_char: Multinomial selects one sample from the probs. The higher the probability of a token, the higher the chance of being selected.●  seq: Add the selected character to the sequence.TrainingThe rest of the code is downstream tasks such as training loops, etc. The codes that are provided here are slightly different from the tiny-transformer repository. I trained the model with the following hyperparameters:block_size = 256 learning_rate = 9e-4 eval_interval = 300 # Every n step, we do an evaluation. iterations = 5000 # Like epochs batch_size = 64 embeds_size = 195 num_heads = 5 num_layers = 5 drop_prob = 0.15And here’s the generated text:If you need to improve the quality, increase embeds_size, num_layers, and heads.ConclusionThe article explores transformers' text generation role, detailing token preprocessing through self-attention and neural network heads. Transformers predict tokens using context length as a hyperparameter. Human context comprehension is paralleled, highlighting relevant word emergence and fading of irrelevant words for precise selection. Transformers lack human foresight and backtracking. Key components—self-attention, multihead self-attention, and transformer blocks—are explained, and supported by code snippets. Token and positional embeddings, layer normalization, and residual connections are detailed. The model's text generation is exemplified via the autocomplete method. Training parameters and text quality enhancement are addressed, showcasing transformers' potential.Author BioSaeed Dehqan trains language models from scratch. Currently, his work is centered around Language Models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. 
He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) by using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc. 
Exploring Token Generation Strategies

Saeed Dehqan
28 Aug 2023
8 min read
IntroductionThis article discusses different methods for generating sequences of tokens using language models, specifically focusing on the context of predicting the next token in a sequence. The article explains various techniques to select the next token based on the predicted probability distribution of possible tokens.Language models predict the next token based on n previous tokens. Models try to extract information from n previous tokens as far as they can. Transformer models aggregate information from all the n previous tokens. Tokens in a sequence communicate with one another and exchange their information. At the end of the communication process, tokens are context-aware and we use them to predict their own next token. Each token separately goes to some linear/non-linear layers and the output is unnormalized logits. Then, we apply Softmax on logits to convert them into probability distributions. Each token has its own probability distribution over its next token:Exploring Methods for Token SelectionWhen we have the probability distribution of tokens, it’s time to pick one token as the next token. There are four methods for selecting the suitable token from probability distribution:●    Greedy or naive method: Simply select the token that has the highest probability from the list. This is a deterministic method.●    Beam search: It receives a parameter named beam size and based on it, the algorithm tries to use the model to predict multiple times to find a suitable sentence, not just a token. This is a deterministic method.●    Top-k sampling: Select the top k most probable tokens and shut off other tokens (make their probability -inf) and sample from top k tokens. This is a sampling method.●    Nucleus sampling: Select the top most probable tokens and shut off other tokens but with a difference that is a dynamic selection of most probable tokens. Not just a crisp k.Greedy methodThis is a simple and fast method and only needs one prediction. Just select the most probable token as the next token. Greedy methods can be efficient on arithmetic tasks. But, it tends to get stuck in a loop and repeat tokens one after another. It also kills the diversity of the model by selecting the tokens that occur frequently in the training dataset.Here’s the code that converts unnormalized logits(simply the output of the network) into probability distribution and selects the most probable next token:probs = F.softmax(logits, dim=-1) next_token = probs.argmax() Beam searchBeam search produces better results and is slower because it runs the model multiple times so that it can create n sequences, where n is beam size. This method selects top n tokens and adds them to the current sequence and runs the model on the made sequences to predict the next token. And this process continues until the end of the sequence. Computationally expensive, but more quality. Based on this search, the algorithm returns two sequences:Then, how do we select the final sequence? We sum up the loss for all predictions and select the sequence with the lowest loss.Simple samplingWe can select tokens randomly based on their probability. The more the probability, the more the chance of being selected. We can achieve this by using multinomial method:logits = logits[:, -1, :] probs = F.softmax(logits, dim=-1) next_idx = torch.multinomial(probs, num_samples=1)This is part of the model we implemented in the “transformer building blocks” blog and the code can be found here. 
The torch.multinomial receives the probability distribution and selects n samples. Here’s an example:In [1]: import torch In [2]: probs = torch.tensor([0.3, 0.6, 0.1]) In [3]: torch.multinomial(probs, num_samples=1) Out[3]: tensor([1]) In [4]: torch.multinomial(probs, num_samples=1) Out[4]: tensor([0]) In [5]: torch.multinomial(probs, num_samples=1) Out[5]: tensor([1]) In [6]: torch.multinomial(probs, num_samples=1) Out[6]: tensor([0]) In [7]: torch.multinomial(probs, num_samples=1) Out[7]: tensor([1]) In [8]: torch.multinomial(probs, num_samples=1) Out[8]: tensor([1])We ran the method six times on probs, and as you can see it selects 0.6 four times and 0.3 two times because 0.6 is higher than 0.3.Top-k samplingIf we want to make the previous sampling method better, we need to limit the sampling space. Top-k sampling does this. K is a parameter that Top-k sampling uses to select top k tokens from the probability distribution and sample from these k tokens. Here is an example of top-k sampling:In [1]: import torch In [2]: logit = torch.randn(10) In [3]: logit Out[3]: tensor([-1.1147, 0.5769, 0.3831, -0.5841, 1.7528, -0.7718, -0.4438, 0.6529, 0.1500, 1.2592]) In [4]: topk_values, topk_indices = torch.topk(logit, 3) In [5]: topk_values Out[5]: tensor([1.7528, 1.2592, 0.6529]) In [6]: logit[logit < topk_values[-1]] = float('-inf') In [7]: logit Out[7]: tensor([ -inf, -inf, -inf, -inf, 1.7528, -inf, -inf, 0.6529, -inf, 1.2592]) In [8]: probs = logit.softmax(0) In [9]: probs Out[9]: tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.5146, 0.0000, 0.0000, 0.1713, 0.0000, 0.3141]) In [10]: torch.multinomial(probs, num_samples=1) Out[10]: tensor([9]) In [11]: torch.multinomial(probs, num_samples=1) Out[11]: tensor([4]) In [12]: torch.multinomial(probs, num_samples=1) Out[12]: tensor([9])●    We first create a fake logit with torch.randn. Supposedly logit is the raw output of a network.●    We use torch.topk to select the top 3 values from logit. torch.topk returns top 3 values along with their indices. The values are sorted from top to bottom.●    We use advanced indexing to select logit values that are lower than the last top 3 values. When we say logit < topk_values[-1] we mean all the numbers in logit that are lower than topk_values[-1] (0.6529). ●    After selecting those numbers, we replace their value to float(‘-inf’), which is a negative infinite number. ●    After replacement, we run softmax over the logit to convert it into probabilities. ●    Now, we use torch.multinomial to sample from the probs.Nucleus samplingNucleus sampling is like Top-k sampling but with a dynamic selection of top tokens instead of selecting k tokens. The dynamic selection is better when we are unsure of selecting a suitable k for Top-k sampling. Nucleus sampling has a hyperparameter named p, let us say it is 0.9, and this method selects tokens from descending order and adds up their probabilities and when we reach a cumulative sum of p, we stop. What is the cumulative sum? Here’s an example of cumulative sum:In [1]: import torch In [2]: logit = torch.randn(10) In [3]: probs = logit.softmax(0) In [4]: probs Out[4]: tensor([0.0652, 0.0330, 0.0609, 0.0436, 0.2365, 0.1738, 0.0651, 0.0692, 0.0495, 0.2031]) In [5]: [probs[:x+1].sum() for x in range(probs.size(0))] Out[5]: [tensor(0.0652), tensor(0.0983), tensor(0.1592), tensor(0.2028), tensor(0.4394), tensor(0.6131), tensor(0.6782), tensor(0.7474), tensor(0.7969), tensor(1.)]I hope you understand how cumulative sum works from the code. We just add up n previous prob values. 
We can also use torch.cumsum and get the same result:In [9]: torch.cumsum(probs, dim=0) Out[9]: tensor([0.0652, 0.0983, 0.1592, 0.2028, 0.4394, 0.6131, 0.6782, 0.7474, 0.7969, 1.0000]) Okay. Here’s a nucleus sampling from scratch: In [1]: import torch In [2]: logit = torch.randn(10) In [3]: probs = logit.softmax(0) In [4]: probs Out[4]: tensor([0.7492, 0.0100, 0.0332, 0.0078, 0.0191, 0.0370, 0.0444, 0.0553, 0.0135, 0.0305]) In [5]: sprobs, indices = torch.sort(probs, dim=0, descending=True) In [6]: sprobs Out[6]: tensor([0.7492, 0.0553, 0.0444, 0.0370, 0.0332, 0.0305, 0.0191, 0.0135, 0.0100, 0.0078]) In [7]: cs_probs = torch.cumsum(sprobs, dim=0) In [8]: cs_probs Out[8]: tensor([0.7492, 0.8045, 0.8489, 0.8860, 0.9192, 0.9497, 0.9687, 0.9822, 0.9922, 1.0000]) In [9]: selected_tokens = cs_probs < 0.9 In [10]: selected_tokens Out[10]: tensor([ True, True, True, True, False, False, False, False, False, False]) In [11]: probs[indices[selected_tokens]] Out[11]: tensor([0.7492, 0.0553, 0.0444, 0.0370]) In [12]: probs = probs[indices[selected_tokens]] In [13]: torch.multinomial(probs, num_samples=1) Out[13]: tensor([0])●    Convert the logit to probabilities and sort it with descending order so that we can select them from top to bottom.●    Calculate cumulative sum.●    Using advanced indexing, we filter out values.●    Then, we sample from a limited and better space.Please note that you can use a combination of top-k and nucleus samplings. It is like selecting k tokens and doing nucleus sampling on these k tokens. You can also use top-k, nucleus, and beam search.ConclusionUnderstanding these methods is crucial for anyone working with language models, natural language processing, or text generation tasks. These techniques play a significant role in generating coherent and diverse sequences of text. Depending on the specific use case and desired outcomes, readers can choose the most appropriate method to employ. Overall, this knowledge can contribute to improving the quality of generated text and enhancing the capabilities of language models.Author BioSaeed Dehqan trains language models from scratch. Currently, his work is centered around Language Models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) by using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc. 
Text Classification with Transformers

Saeed Dehqan
28 Aug 2023
9 min read
IntroductionThis blog aims to implement binary text classification using a transformer architecture. If you're new to transformers, the "Transformer Building Blocks" blog explains the architecture and its text generation implementation. Beyond text generation and translation, transformers serve classification, sentiment analysis, and speech recognition. The transformer model comprises two parts: an encoder and a decoder. The encoder extracts features, while the decoder processes them. Just as a painter with tree features can draw, describe, visualize, categorize, or write about a tree, transformers encode knowledge (encoder) and apply it (decoder). This dual-part process is pivotal for text classification with transformers, allowing them to excel in diverse tasks like sentiment analysis, illustrating their transformative role in NLP.Deep Dive into Text Classification with TransformersWe train the model on the IMDB dataset. The dataset is ready and there’s no preprocessing needed. The model is vocab-based instead of character-based so that the model can converge faster. I limited the dataset vocabs to the 20000 most frequent vocabs. I also reduced the sequence to 200 so we can train faster. I tried to simplify the model and use torch.nn.MultiheadAttention it instead of writing the Multihead-attention ourselves. It makes the model faster since the nn.MultiheadAttention uses scaled_dot_product_attention under the hood. But if you want to know how MultiheadAttention works you can study the transformer building blocks blog or see the code here.Okay, now, let us add the feature extractor part:class transformer_block(nn.Module): def __init__(self):     super(block, self).__init__()     self.attention = nn.MultiheadAttention(embeds_size, num_heads, batch_first=True)     self.ffn = nn.Sequential(         nn.Linear(embeds_size, 4 * embeds_size),         nn.LeakyReLU(),         nn.Linear(4 * embeds_size, embeds_size),     )     self.drop1 = nn.Dropout(drop_prob)     self.drop2 = nn.Dropout(drop_prob)     self.ln1 = nn.LayerNorm(embeds_size, eps=1e-6)     self.ln2 = nn.LayerNorm(embeds_size, eps=1e-6) def forward(self, hidden_state):     attn, _ = self.attention(hidden_state, hidden_state, hidden_state, need_weights=False)     attn = self.drop1(attn)     out = self.ln1(hidden_state + attn)     observed = self.ffn(out)     observed = self.drop2(observed)     return self.ln2(out + observed)●    hidden_state: A tensor with a shape (batch_size, block_size, embeds_size) goes to the transformer_block and a tensor with the same shape goes out of it.●    self.attention: The transformer block tries to combine the information of tokens so that each token is aware of its neighbors or other tokens in the context. We may call this part the communication part. That’s what the nn.MultiheadAttention does. nn.MultiheadAttention is a ready multihead attention layer that can be faster than implementing it from scratch, just like what we did in the “Transformer Building Blocks” blog. The parameters of nn.MultiheadAttention are as follows:     ○    embeds_size: token embedding size     ○    num_heads: multihead, as the name suggests, consists of multiple heads and each head works on different parts of token embeddings. Suppose, your input data has shape (B,T,C) = (10, 32, 16). The token embedding size for this data is 16. If we specify the num_heads parameter to 2(divisible by 16), the multi-head splits data into two parts with shape (10, 32, 8). 
The first head works on the first part and the second head works on the second part. This is because transforming data into different subspaces can help the model to see different aspects of the data. Please note that the num_heads should be divisible by the embedding size so that at the end we can concatenate the split parts.    ○    batch_first: True means the first dimension is batch.●    Dropout: After the attention layer, the communication between tokens is closed and computations on tokens are done individually. We run a dropout on tokens. Dropout is a method of regularization. Regularization helps the training process to be based on generalization, not memorization. Without regularization, the model tries to memorize the training set and has poor performance on the test set. The dropout method turns off features with a probability of drop_prob.●    self.ln1: Layer normalization normalizes embeddings so that they have zero mean and standard deviation one.●    Residual connection: hidden_state + attn: Observe that before normalization, we added the input to the output of multihead attention, named residual connection. It has two benefits:   ○    It helps the model to have the unchanged embedding information.   ○    It helps to prevent gradient vanishing, which is common in deep networks where we stack multiple transformer layers.●    self.ffn: After dropout, residual connection, and normalization, we forward data into a simple non-linear neural network to adjust the tokens one by one for better representation.●    self.ln2(out + observed): Finally, another dropout, residual connection, and layer normalization.The transformer block is ready. And here is the final piece:class transformer(nn.Module): def __init__(self):     super(transformer, self).__init__()     self.tok_embs = nn.Embedding(vocab_size, embeds_size)     self.pos_embs = nn.Embedding(block_size, embeds_size)     self.block = block()     self.ln1 = nn.LayerNorm(embeds_size)     self.ln2 = nn.LayerNorm(embeds_size)     self.classifier_head = nn.Sequential(         nn.Linear(embeds_size, embeds_size),         nn.LeakyReLU(),         nn.Dropout(drop_prob),         nn.Linear(embeds_size, embeds_size),         nn.LeakyReLU(),         nn.Linear(embeds_size, num_classes),         nn.Softmax(dim=1),     )     print("number of parameters: %.2fM" % (self.num_params()/1e6,)) def num_params(self):     n_params = sum(p.numel() for p in self.parameters())     return n_params def forward(self, seq):     B,T = seq.shape     embedded = self.tok_embs(seq)     embedded = embedded + self.pos_embs(torch.arange(T, device=device))     output = self.block(embedded)     output = output.mean(dim=1)     output = self.classifier_head(output)     return output●    self.tok_embs: nn.Embedding is like a lookup table that receives a sequence of indices, and returns their corresponding embeddings. These embeddings will receive gradients so that the model can update them to make better predictions.●    self.tok_embs: To comprehend a sentence, you not only need words, you also need to have the order of words. Here, we embed positions and add them to the token embeddings. In this way, the model has both words and their order.●    self.block: In this model, we only use one transformer block, but you can stack more blocks to get better results.●    self.classifier_head: This is where we put the extracted information into action to classify the sequence. We call it the transformer head. It receives a fixed-size vector and classifies the sequence. 
The softmax as the final activation function returns a probability distribution for each class.●    self.tok_embs(seq): Given a sequence of indices (batch_size, block_size), it returns (batch_size, block_size, embeds_size).●    self.pos_embs(torch.arange(T, device=device)): Given a sequence of positions, i.e. [0,1,2], it returns embeddings of each position. Then, we add them to the token embeddings.●    self.block(embedded): The embedding goes to the transformer block to extract features. Given the embedded shape (batch_size, block_size, embeds_size), the output has the same shape (batch_size, block_size, embeds_size).●    output.mean(dim=1): The purpose of using mean is to aggregate the information from the sequence into a compact representation before feeding it into self.classifier_head. It helps in reducing the spatial dimensionality and extracting the most important features from the sequence. Given the input shape (batch_size, block_size, embeds_size), the output shape is (batch_size, embeds_size). So, one fixed-size vector for each batch.●    self.classifier_head(output): And here we classify.The final code can be found here. The remaining code consists of downstream tasks such as the training loop, loading the dataset, setting the hyperparameters, and optimizer. I used RMSprop instead of Adam and AdamW. I also used BCEWithLogitsLoss instead of cross-entropy loss. BCE(Binary Cross Entropy) is for binary classification models and it combines sigmoid with cross entropy and it is numerically more stable. I also empirically got better accuracy. After 30 epochs, the final accuracy is ~84%.ConclusionThis exploration of text classification using transformers reveals their revolutionary potential. Beyond text generation, transformers excel in sentiment analysis. The encoder-decoder model, analogous to a painter interpreting tree feature, propels efficient text classification. A streamlined practical approach and the meticulously crafted transformer block enhance the architecture's robustness. Through optimization methods and loss functions, the model is honed, yielding an empirically validated 84% accuracy after 30 epochs. This journey highlights transformers' disruptive impact on reshaping AI-driven language comprehension, fundamentally altering the landscape of Natural Language Processing.Author BioSaeed Dehqan trains language models from scratch. Currently, his work is centered around Language Models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) by using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc. 
Designing Decoder-only Transformer Models like ChatGPT

Saeed Dehqan
28 Aug 2023
9 min read
Introduction
Embark on an enlightening journey into the ChatGPT stack, a remarkable feat in AI-driven language generation. Unveiling its evolution from inception to a proficient AI assistant, we delve into decoder-only transformers, specialized for crafting Shakespearean verses and informative responses. Throughout this exploration, we dissect the four integral stages that constitute the ChatGPT stack. From exhaustive pretraining to fine-tuned supervised training, we unravel how rewards and reinforcement learning refine response generation to align with context and user intent. In this blog, we will get acquainted briefly with the ChatGPT stack and then implement a simple decoder-only transformer to train on Shakespeare.

Creating ChatGPT models consists of four main stages:
1.    Pretraining
2.    Supervised fine-tuning
3.    Reward modeling
4.    Reinforcement learning

The Pretraining stage takes most of the computational time since we train the language model on trillions of tokens. The following table shows the data mixtures used for pretraining of Meta's LLaMA models [0]. The datasets are mixed together, according to their sampling proportions, to create the pretraining data. The table lists each dataset along with its sampling proportion (what portion of the pretraining data does the dataset make up?), epochs (how many times do we train the model on that dataset?), and dataset size. The epoch count for high-quality datasets such as Wikipedia and Books is higher, and as a result the model grasps these high-quality datasets better.

After we have our dataset ready, the next step before training is tokenization. Tokenizing data means mapping all the text data into a large list of integers. In language modeling repositories, we usually have two dictionaries for mapping tokens into integers and vice versa (a token is a subword; 'wait' and 'ing' are two tokens, for instance). Here is an example:

In [1]: text = "it is obvious that the epoch of high .."
In [2]: tokens = list(set(text.split()))
In [3]: stoi = {s:i for i,s in enumerate(tokens)}
In [4]: itos = {i:s for s,i in stoi.items()}
In [5]: stoi['it']
Out[5]: 22
In [6]: itos[22]
Out[6]: 'it'

Now, we can tokenize texts with the following functions:

In [7]: encode = lambda text: [stoi[x] for x in text.split()]
In [8]: decode = lambda encoded: ' '.join([itos[x] for x in encoded])
In [9]: tokenized = encode(text)
In [10]: tokenized
Out[10]: [22, 19, 18, 5, ...]
In [11]: decode(tokenized)
Out[11]: 'it is obvious that the epoch of high ..'

Suppose the tokenized variable contains all the tokens converted to integers (say 1 billion tokens). We randomly select 3 chunks from the list, each containing 10 tokens, and feed them forward into a transformer language model to predict the next token. The model's input has shape (3, 10): here 3 is the batch size and 10 is the context length. The model tries to predict the next token for each chunk independently. We select 3 chunks and predict the next token for each one to speed up the training process; it is like running the model on 3 chunks of data at once. You can increase the batch size and context length depending on your requirements and resources. Here's an example:

For convenience, we wrote the token indices along with the corresponding tokens. For each chunk or sequence, the model predicts the whole sequence. Let's see how this works: by seeing the first token (it), the model predicts the next token (is). The context token is 'it' and the target token for the model is 'is'.
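As a rough sketch of the chunking described above, the context/target pairs can be assembled as follows. The names get_batch, batch_size, and block_size are illustrative assumptions, not taken from any particular repository; the targets are simply the same chunks shifted by one position.

import torch

def get_batch(tokenized, batch_size=3, block_size=10):
    # tokenized: a long list/tensor of token ids covering the whole corpus
    data = torch.as_tensor(tokenized, dtype=torch.long)
    # pick random starting offsets, leaving room for the target shifted by one position
    ix = torch.randint(0, len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])          # (batch_size, block_size) contexts
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])  # same chunks shifted by one: the targets
    return x, y

# assuming `tokenized` holds the ids of a full (long) corpus:
# x, y = get_batch(tokenized)
# x[b, t] is the context ending at position t, and y[b, t] is the token the model should predict next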
If the model fails to predict the target token, we do backpropagation to adjust the model parameters so that it can predict correctly. During this process, we mask out (hide) the future tokens so that the model cannot access them, because that would be a kind of cheating. We want the model to predict the future by seeing only the past tokens. That makes sense, right? That's why we used a gray background for the future tokens: the model is not able to see them.

After predicting the second token, we have two tokens [it, is] as context to predict the next token in the sequence, the third token (obvious). By using the three previous tokens [it, is, obvious], the model needs to predict the fourth token (that), and as usual, we hide the future tokens (in this case 'the'). We then give [it, is, obvious, that] to the model as context in order to predict 'the'. And finally, we give the whole sequence as context, [it, is, obvious, that, the], to predict the next token. So we have five predictions for a sequence of length five.

After training the model on a lot of randomly selected sequences from the pretraining dataset, the model should be ready to autocomplete your sequence. Give it a sequence of tokens, and it predicts the next token; then, based on what was predicted plus the previous tokens, it predicts the following tokens one by one. We call this an autoregressive model. That's it.
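The "hide the future tokens" trick mentioned above is typically implemented as a causal (lower-triangular) mask applied to the attention scores before the softmax. The snippet below is a minimal, self-contained illustration of the idea for a single attention head; the shapes and values are made up for the example and are not the exact code of any particular model.

import torch
import torch.nn.functional as F

T, d = 5, 16                                          # sequence length and head dimension
q, k = torch.randn(T, d), torch.randn(T, d)           # queries and keys for [it, is, obvious, that, the]
scores = (q @ k.T) / d ** 0.5                         # (T, T) raw attention scores

causal_mask = torch.tril(torch.ones(T, T, dtype=torch.bool))   # True on and below the diagonal
scores = scores.masked_fill(~causal_mask, float("-inf"))       # future positions can never be attended to
weights = F.softmax(scores, dim=-1)                            # each row looks only at current and past tokens

print(weights[0])   # the first token can attend only to itself: [1., 0., 0., 0., 0.]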
But, at this stage, the model is not an AI assistant or a chatbot. It only receives a sequence and tries to complete it. That's how we trained it: we don't train it to answer questions or follow instructions. We give it context tokens and the model tries to predict the next token based on that context.

You give it this:
"In order to be irrational, you first need to"
And the model continues the sequence:
"In order to be irrational, you first need to abandon logical reasoning and disregard factual evidence."

Sometimes, you ask it an instruction:
"Write a function to count from 1 to 100."
And instead of trying to write a function, the model answers with more similar instructions:
"Write a program to sort an array of integers in ascending order."
"Write a script to calculate the factorial of a given number."
"Write a method to validate a user's input and ensure it meets the specified criteria."
"Write a function to check if a string is a palindrome or not."

That's where prompt engineering came in: people tried to use some tricks to get the answer to a question out of the model. Give the model the following prompt:
"London is the capital of England.
Copenhagen is the capital of Denmark.
Oslo is the capital of"
The model answers:
"Norway."

So, we managed to get something helpful out of it with prompt engineering. But we don't want to provide examples every time; we want to ask a question and receive an answer. To prepare the model to be an AI assistant, we need further training, named Supervised Fine-Tuning, for instructional purposes.

In the Supervised Fine-Tuning stage, we make the model instructional. To achieve this goal, the model needs training on a high-quality dataset of roughly 15K-100K prompt-and-response pairs. Here's an example:

{
    "instruction": "When was the last flight of Concorde?",
    "context": "",
    "response": "On 26 November 2003",
    "category": "open_qa"
}

This example was taken from the databricks-dolly-15k dataset, an open-source dataset for supervised/instruction fine-tuning [1]. You can download the dataset from here.

Instructions fall into seven categories: brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This is because we want to train the model on different tasks. For instance, the instruction above is open QA, meaning the question is a general one that does not require reasoning abilities; it teaches the model to answer general questions. Closed QA, by contrast, requires reasoning abilities. During instruction fine-tuning, nothing changes algorithmically: we do the same process as in the previous stage (pretraining). We give the instruction as context tokens and we want the model to continue the sequence with the response. We continue this process for thousands of examples, and then the model is ready to be instructional.

But that's not the end of the story of the model behind ChatGPT. OpenAI designed a supervised reward model that returns a reward score for sequences produced by the base model for the same input prompt. They give the model a prompt and run it, say, four times to get four different answers for that prompt; the model produces different answers each time because of the sampling method used. Then the reward model receives the input prompt and the produced answers and assigns a reward score to each answer: the better the answer, the higher the score. The reward model requires ground-truth scores to be trained, and these scores came from labelers who worked for OpenAI. Labelers were given the prompt text and the model responses and ranked them from best to worst.

In the final stage, ChatGPT uses Reinforcement Learning from Human Feedback (RLHF) to generate responses that get the best scores from the reward model. RL is a learning framework that tries to find the best way of achieving a goal. The goal can be checkmate in chess or producing the best answer for an input prompt. The RL learning process is like taking an action and receiving a reward or a penalty for it, and then avoiding the actions that end up penalized. RLHF is what made ChatGPT so good: the PPO-ptx curve shows the win rate of GPT + RLHF compared to SFT (the supervised fine-tuned model), GPT with prompt engineering, and the GPT base model.

Conclusion
In summation, the ChatGPT stack exemplifies AI's potent fusion with language generation. From inception to proficient AI assistant, we've traversed the core stages: pretraining, fine-tuning, and reinforcement learning. Decoder-only transformers have enlivened Shakespearean text and insights. Tokenization's role in enabling ChatGPT's prowess concludes our journey. This AI evolution showcases technology's synergy with creative text generation. ChatGPT's ascent highlights AI's potential to emulate human-like language understanding. With ongoing refinement, the future promises versatile conversational AI that bridges artificial intelligence and language's artistry, fostering human-AI understanding.

Author Bio
Saeed Dehqan trains language models from scratch. Currently, his work is centered around language models for text generation, and he possesses a strong understanding of the underlying concepts of neural networks. He is proficient in using optimizers such as genetic algorithms to fine-tune network hyperparameters and has experience with neural architecture search (NAS) using reinforcement learning (RL). He implements models starting from data gathering to monitoring, and deployment on mobile, web, cloud, etc.
Generative Fill with Adobe Firefly (Part II)

Joseph Labrecque
24 Aug 2023
9 min read
Adobe Firefly Overview

Adobe Firefly is a new set of generative AI tools which can be accessed via https://firefly.adobe.com/ by anyone with an Adobe ID. To learn more about Firefly… have a look at their FAQ.

Image 1: Adobe Firefly

For more information about the usage of Firefly to generate images, text effects, and more… have a look at the previous articles in this series:
      Animating Adobe Firefly Content with Adobe Animate
      Exploring Text to Image with Adobe Firefly
      Generating Text Effects with Adobe Firefly
      Adobe Firefly Feature Deep Dive
      Generative Fill with Adobe Firefly (Part I)

This is the conclusion of a two-part article. You can catch up by reading Generative Fill with Adobe Firefly (Part I). In this article, we'll continue our exploration of Firefly with the Generative fill module by looking at how to use the Insert and Replace features… and more.

Generative Fill – Part I Recap

In part I of our Firefly Generative fill exploration, we uploaded a photograph of a cat, Poe, to the AI and began working with the various tools to remove the background and replace it with prompt-based generative AI content.

Image 2: The original photograph of Poe

Note that the original photograph includes a set of electric outlets exposed within the wall. When we remove the background, Firefly recognizes that these objects are distinct from the general background and so retains them.

Image 3: A set of backgrounds is generated for us to choose from

You can select any of the four variations that were generated from the set of preview thumbnails beneath the photograph. Again, if you'd like to view these processes in detail, check out Generative Fill with Adobe Firefly (Part I).

Insert and Replace with Generative Fill

We covered generating a background for our image in part I of this article. Now we will focus on other aspects of Firefly Generative fill, including the Remove and Insert tools.

Consider the image above and note that the original photograph included a set of electric outlets exposed within the wall. When we removed the background in part I, Firefly recognized that they were distinct from the general background and so retained them. The AI has taken them into account when generating the new background… but we should remove them. This is where the Remove tool comes into play.

Image 4: The Remove tool

Switching to the Remove tool will allow you to brush over an area of the photograph you'd like to remove. It fills in the removed area with pixels generated by the AI to create a seamless removal.

1. Select the Remove tool now. Note that when switching between the Insert and Remove tools, you will often encounter a save prompt as seen below. If there are no changes to save, this prompt will not appear!

Image 5: When you switch tools… you may be asked to save your work

2. Simply click the Save button to continue, as choosing the Cancel button will halt the tool selection.

3. With the Remove tool selected, you can adjust the Brush Settings from the toolbar below the image, at the bottom of the screen.

Image 6: The Brush Settings overlay

4. Zoom in closer to the wall outlet and brush over the area by clicking and dragging with your mouse. The size of your brush, depending upon brush settings, will appear as a circular outline. You can change the size of the brush by tapping the [ or ] keys on your keyboard.

Image 7: Brushing over the wall outlet with the Remove tool
5. Once you are happy with the selection you've made, click the Remove button within the toolbar at the bottom of the screen.

Image 8: The Remove button appears within the toolbar

6. The Firefly AI uses Generative fill to replace the brushed-over area with new content based upon the surrounding pixels. A set of four variations appears below the photograph. Click on each one to preview them, as they can vary quite a bit.

Image 9: Selecting a fill variant

7. Click the Keep button in the toolbar to save your selection and continue editing. Remember, if you attempt to switch tools before saving, Firefly will prompt you to save your edits via a small overlay prompt.

The outlet has now been removed and the wall is all patched up.

Aside from the removal of objects through Generative fill, we can also perform insertions based on text prompts. Let's add some additional elements to our photograph using these methods.

1. Select the Insert tool from the left-hand toolbar.

2. Use it in a similar way as we did the Remove tool to brush in a selection of the image. In this case, we'll add a crown to Poe's head, so brush in an area that contains the top of his head and some space above it. Try to visualize a crown shape as you do this.

3. In the prompt input that appears beneath the photograph, type in a descriptive text prompt similar to the following: "regal crown with many jewels"

Image 10: A selection is made, and a text prompt inserted

4. Click the Generate button to have the Firefly AI perform a Generative fill insertion based upon our text prompt as part of the selected area.

Image 11: Poe is a regal cat

5. A crown is generated in accordance with our text prompt and the surrounding area. A set of four variations to choose from appears as well. Note how integrated they appear against the original photographic content.

6. Click the Keep button to commit and save your crown selection.

7. Let's add a scepter as well. Brush the general form of a scepter across Poe's body, extending from his paws to his shoulder.

8. Type in the text prompt: "royal scepter"

Image 12: Brushing in a scepter shape

9. Click the Generate button to have the Firefly AI perform a Generative fill insertion based upon our text prompt as part of the selected area.

Image 13: Poe now holds a regal scepter in addition to his crown

10. Remember to choose a scepter variant and click the Keep button to commit and save your scepter selection.

Okay! That should be enough regalia to satisfy Poe.
Let's download our creation for distribution or use in other software.

Downloading your Image

Click the Download button in the upper right of the screen to begin the download process for your image.

Image 14: The Download button

As Firefly begins preparing the image for download, a small overlay dialog appears.

Image 15: Content credentials are applied to the image as it is downloaded

Firefly applies metadata to any generated image in the form of content credentials, and the image download process begins. Once the image is downloaded, it can be viewed and shared just like any other image file.

Image 16: The final image from our exploration of Generative fill

Along with content credentials, a small badge is placed upon the lower right of the image which visually identifies the image as having been produced with Adobe Firefly.

That concludes our set of articles on using Generative fill to remove and insert objects into your images using the Adobe Firefly AI. We have a number of additional articles on Firefly procedures on the way… including Generative recolor for vector artwork!

Author Bio

Joseph Labrecque is a Teaching Assistant Professor, Instructor of Technology, University of Colorado Boulder / Adobe Education Leader / Partner by Design.

Joseph is a creative developer, designer, and educator with nearly two decades of experience creating expressive web, desktop, and mobile solutions. He joined the University of Colorado Boulder College of Media, Communication, and Information as faculty with the Department of Advertising, Public Relations, and Media Design in Autumn 2019. His teaching focuses on creative software, digital workflows, user interaction, and design principles and concepts. Before joining the faculty at CU Boulder, he was associated with the University of Denver as adjunct faculty and as a senior interactive software engineer, user interface developer, and digital media designer.

Labrecque has authored a number of books and video course publications on design and development technologies, tools, and concepts through publishers which include LinkedIn Learning (Lynda.com), Peachpit Press, and Adobe. He has spoken at large design and technology conferences such as Adobe MAX and for a variety of smaller creative communities. He is also the founder of Fractured Vision Media, LLC, a digital media production studio and distribution vehicle for a variety of creative works.

Joseph is an Adobe Education Leader and member of Adobe Partners by Design. He holds a bachelor's degree in communication from Worcester State University and a master's degree in digital media studies from the University of Denver.

Author of the book: Mastering Adobe Animate 2023
Generative Fill with Adobe Firefly (Part I)

Joseph Labrecque
24 Aug 2023
8 min read
Adobe Firefly AI Overview

Adobe Firefly is a new set of generative AI tools that can be accessed via https://firefly.adobe.com/ by anyone with an Adobe ID. To learn more about Firefly… have a look at their FAQ.

Image 1: Adobe Firefly

For more information about the usage of Firefly to generate images, text effects, and more… have a look at the previous articles in this series:
      Animating Adobe Firefly Content with Adobe Animate
      Exploring Text to Image with Adobe Firefly
      Generating Text Effects with Adobe Firefly
      Adobe Firefly Feature Deep Dive

In the next two articles, we'll continue our exploration of Firefly with the Generative fill module. We'll begin with an overview of accessing Generative fill from a generated image and then explore how to use the module on our own personal images.

Recall from a previous article, Exploring Text to Image with Adobe Firefly, that when you hover your mouse cursor over a generated image, overlay controls will appear.

Image 2: Generative fill overlay control from Text to image

One of the controls in the upper right of the image frame will invoke the Generative fill module and pass the generated image into that view.

Image 3: The generated image is sent to the Generative fill module

Within the Generative fill module, you can use any of the tools and workflows that are available when invoking Generative fill from the Firefly website. The only difference is that you are passing in a generated image rather than uploading an image from your local hard drive. Keep this in mind as we continue to explore the basics of Generative fill in Firefly, as we'll begin the process from scratch.

Generative Fill

When you first enter the Firefly web experience, you will be presented with the various workflows available. These appear as UI cards and present a sample image, the name of the procedure, a procedure description, and either a button to begin the process or a label stating that it is "in exploration". Those which are in exploration are not yet available to general users. We want to locate the Generative fill module and click the Generate button to enter the experience.

Image 4: The Generative fill module card

From there, you'll be taken to a view that prompts you to upload an image into the module. Firefly also presents a set of sample images you can load into the experience.

Image 5: The Generative fill getting started prompt

Clicking the Upload image button summons a file browser for you to locate the file you want to use Generative fill on. In my example, I'll be using a photograph of my cat, Poe. You can download the photograph of Poe to work with as well.

Image 6: The photograph of Poe, a cat

Once the image file has been uploaded into Firefly, you will be taken to the Generative fill user experience and the photograph will be visible. Note that this is exactly the same experience as when entering Generative fill from a prompt-generated image as we saw above. The only real difference is how we get to this point.

Image 7: The photograph is loaded into Generative fill

You will note that there are two sets of tools available within the experience. One set is along the left side of the screen and includes the Insert, Remove, and Pan tools.

Image 8: Insert, Remove, and Pan

Switching between the Insert and Remove tools changes the function of the current process. The Pan tool allows you to pan the image around the view. Along the bottom of the screen is the second set of tools, which are focused on selections.
This set contains the Add and Subtract tools, access to Brush Settings, a Background removal process, and a selection Invert toggle.

Image 9: Add, Subtract, Brush Settings, Background removal, and selection Invert

Let's perform some Generative fill work on the photograph of Poe.

1. In the larger overlay along the bottom of the view, locate and click the Background option. This is an automated process that will detect and remove the background from the image loaded into Firefly.

Image 10: The background is removed from the selected photograph

2. A prompt input appears directly beneath the photograph. Type in the following prompt: "a quiet jungle at night with lots of mist and moonlight"

Image 11: Entering a prompt into the prompt input control

3. If desired, you can view and adjust the settings for the generative AI by clicking the Settings icon in the prompt input control. This summons the Settings overlay.

Image 12: The generative AI Settings overlay

Within the Settings overlay, you will find there are three items that can be adjusted to influence the AI:
      Match shape: You have two choices here – freeform or conform.
      Preserve content: A slider that can be set to include more of the original content or produce new content.
      Guidance strength: A slider that can be set to provide more strength to the original image or the given prompt.
I suggest leaving these at the default settings for now.

4. Click the Settings icon again to dismiss the overlay.

5. Click the Generate button to generate a background based upon the entered prompt.

A new background is generated from our prompt, and it now appears as though Poe is visiting a lush jungle at night.

Image 13: Poe enjoying the jungle at night

Note that the original photograph included a set of electric outlets exposed within the wall. When we removed the background, Firefly recognized that they were distinct from the general background and so retained them. The AI has taken them into account when generating the new background and has interestingly propped them up with a couple of sticks. It has also rendered a realistic shadow cast by Poe.

Before moving on:

1. Click the Cancel button to bring the transparent background back. Clicking the Keep button would commit the changes, and we do not want that as we wish to continue exploring other options.

2. Clear out the prompt you previously wrote within the prompt input control so that there is no longer any prompt present.

Image 14: Click the Generate button with no prompt present

3. Click the Generate button without a text prompt in place.

The photograph receives a different background from the one generated with a text prompt. When clicking the Generate button with no text prompt, you are basically allowing the Firefly AI to make all the decisions based solely on the visual properties of the image.

Image 15: A set of backgrounds is generated based on the remaining pixels present

You can select any of the four variations that were generated from the set of preview thumbnails beneath the photograph. If you'd like Firefly to generate more variations, click the More button. Select the one you like best and click the Keep button.

Okay! That's pretty good, but we are not done with Generative fill yet. We haven't even touched the Insert and Remove functions… and there are Brush Settings to manipulate… and much more. In the next article, we'll explore the remaining Generative fill tools and options to further manipulate the photograph of Poe.
Author Bio

Joseph Labrecque is a Teaching Assistant Professor, Instructor of Technology, University of Colorado Boulder / Adobe Education Leader / Partner by Design.

Joseph is a creative developer, designer, and educator with nearly two decades of experience creating expressive web, desktop, and mobile solutions. He joined the University of Colorado Boulder College of Media, Communication, and Information as faculty with the Department of Advertising, Public Relations, and Media Design in Autumn 2019. His teaching focuses on creative software, digital workflows, user interaction, and design principles and concepts. Before joining the faculty at CU Boulder, he was associated with the University of Denver as adjunct faculty and as a senior interactive software engineer, user interface developer, and digital media designer.

Labrecque has authored a number of books and video course publications on design and development technologies, tools, and concepts through publishers which include LinkedIn Learning (Lynda.com), Peachpit Press, and Adobe. He has spoken at large design and technology conferences such as Adobe MAX and for a variety of smaller creative communities. He is also the founder of Fractured Vision Media, LLC, a digital media production studio and distribution vehicle for a variety of creative works.

Joseph is an Adobe Education Leader and member of Adobe Partners by Design. He holds a bachelor's degree in communication from Worcester State University and a master's degree in digital media studies from the University of Denver.

Author of the book: Mastering Adobe Animate 2023