

AI_Distilled #23: Apple’s Gen AI, Nvidia's Eureka AI Agent, Qualcomm’s Snapdragon Elite X chips, DALL·E 3 in ChatGPT Plus, PyTorch Edge’s ExecuTorch, RL with Cloud TPUs

Merlyn Shelley
27 Oct 2023
12 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

👋 Hello, and welcome to another scintillating edition of AI_Distilled, featuring recent advancements in training and fine-tuning LLMs, GPT, and AI models for enhanced business outcomes. Let's get started with this week's news and analysis, beginning with an industry expert's opinion.

"For me, the biggest opportunity we have is AI. Just like the cloud transformed every software category, we think AI is one such transformational shift. Whether it's in search or our Office software." - Satya Nadella, CEO, Microsoft

AI is indeed an enormous opportunity for mankind: a paradigm shift that can fundamentally redefine everything we know across industries. Recent reports suggest Apple will deploy cloud-based and on-device edge AI in iPhones and iPads in 2024. Qualcomm's newly unveiled Snapdragon Elite X chips will find use in Microsoft Windows "AI PCs" for AI acceleration of tasks ranging from email summarization to image creation. It's remarkable how AI has disrupted even PC environments for everyday users.

This week, we've brought you industry developments including the unveiling of DALL·E 3 for ChatGPT Plus and Enterprise users, Universal Music Group suing Anthropic over copyrighted lyrics distribution, OpenAI in talks for an $86 billion valuation that would surpass leading tech firms, and the Mojo SDK's availability for Macs, unleashing AI power on Apple Silicon.

Look out for our curated collection of AI secret knowledge and tutorials: PyTorch Edge unveiling ExecuTorch for on-device inference, scaling reinforcement learning with Cloud TPUs, building an IoT sensor network with AWS IoT Core and Amazon DocumentDB, and deploying embedding models with Hugging Face Inference Endpoints.

📥 Feedback on the Weekly Edition

What do you think of this issue and our newsletter? Please consider taking the short survey below to share your thoughts; you will get a free PDF of "The Applied Artificial Intelligence Workshop" eBook upon completion. Complete the Survey. Get a Packt eBook for Free!

Writer's Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week's newsletter content!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

SignUp | Advertise | Archives

⚡ TechWave: AI/GPT News & Analysis

👉 Apple Aims to Introduce Generative AI to iPhone and iPad in Late 2024: Tech analyst Jeff Pu suggests that Apple is planning to integrate generative AI into its devices, beginning as early as late 2024. Apple is expected to deploy a combination of cloud-based and on-device edge AI. This move is aimed at letting users automate complex tasks and enhance Siri's capabilities, possibly starting with iOS 18. Apple remains cautious about privacy and responsible use of AI, acknowledging potential biases and hallucinations.

👉 DALL·E 3 Unveiled for ChatGPT Plus and Enterprise Users: OpenAI has introduced DALL·E 3 in ChatGPT, offering advanced image generation capabilities for Plus and Enterprise users. This feature allows users to describe their desired images, and DALL·E 3 creates a selection of visuals for them to refine and iterate upon within the chat. OpenAI has incorporated safety measures to prevent the generation of harmful content. Moreover, it is researching a provenance classifier to identify AI-generated images.
👉 Universal Music Group Sues AI Company Anthropic Over Copyrighted Lyrics Distribution: Universal Music Group and music publishers have filed a lawsuit against Anthropic for distributing copyrighted lyrics through its AI model Claude 2. The complaint alleges that Claude 2 can generate lyrics closely resembling copyrighted songs without proper licensing, even when not explicitly prompted to do so. The music publishers claim that while other lyric distribution platforms pay to license lyrics, Anthropic omits essential copyright management information.

👉 Nvidia's Eureka AI Agent, Powered by GPT-4, Teaches Robots Complex Skills: Nvidia Research has introduced Eureka, an AI agent driven by OpenAI's GPT-4, capable of autonomously training robots in intricate tasks. Eureka can independently craft reward algorithms and has successfully instructed robots in various activities, including pen-spinning tricks and opening drawers. Nvidia has also published the Eureka library of AI algorithms, allowing experimentation with Nvidia Isaac Gym. This work leverages the potential of LLMs and Nvidia's GPU-accelerated simulation technologies, marking a significant step in advancing reinforcement learning methods.

👉 OpenAI in Talks for $86 Billion Valuation, Surpassing Leading Tech Firms: OpenAI, the company responsible for ChatGPT, is reportedly in discussions to offer its employees' shares at an astounding $86 billion valuation, surpassing tech giants like Stripe and Shein. This tender offer is in negotiation with potential investors, although final terms remain unconfirmed. With Microsoft holding a 49% stake, OpenAI is on its way to achieving an annual revenue of $1 billion. If this valuation holds, it would place OpenAI among the ranks of SpaceX and ByteDance, becoming one of the most valuable privately held firms globally.

👉 Mojo SDK Now Available for Mac: Unleashing AI Power on Apple Silicon: The Mojo SDK, which has seen considerable success on Linux systems, is now accessible for Mac users, specifically Apple Silicon devices. This development comes in response to user feedback and demand. The blog post outlines the steps for Mac users to get started with the Mojo SDK. Additionally, there's a Visual Studio Code extension for Mojo, offering a seamless development experience. The post highlights the Mojo SDK's remarkable speed and performance on Mac, taking full advantage of the hardware's capabilities.

👉 Qualcomm Reveals Snapdragon Elite X Chip for AI-Enhanced Laptops: Qualcomm introduced the Snapdragon Elite X chip for Windows laptops, optimized for AI tasks like email summarization and text generation. Google, Meta, and Microsoft plan to use these features in their devices, envisioning a new era of "AI PCs." Qualcomm aims to rival Apple's chips, claiming superior performance and energy efficiency. With the ability to handle AI models with 13 billion parameters, this chip appeals to creators and businesses seeking AI capabilities.

🔮 Expert Insights from Packt Community

Deep Learning with TensorFlow and Keras - Third Edition - By Amita Kapoor, Antonio Gulli, Sujit Pal

Prediction using linear regression

Linear regression is one of the most widely known modeling techniques. Existing for more than 200 years, it has been explored from almost all possible angles. Linear regression assumes a linear relationship between the input variable (X) and the output variable (Y). If we consider only one independent variable and one dependent variable, what we get is a simple linear regression.
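For convenience, the closed-form least-squares estimates that the snippet below computes (the "equations we defined" referenced in this excerpt, derived earlier in the book and not reproduced here) are the standard ones for simple linear regression Y ≈ WX + b:

```latex
W = \frac{\sum_{i=1}^{n} y_i \,(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - W\,\bar{x}
```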
Consider the case of house price prediction, defined in the preceding section; the area of the house (A) is the independent variable, and the price (Y) of the house is the dependent variable.

We import the necessary modules. It is a simple example, so we'll be using only NumPy, pandas, and Matplotlib:

```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

Next, we generate random data with a linear relationship. To make it more realistic, we also add a random noise element. You can see the two variables (the cause, area, and the effect, price) follow a positive linear dependence:

```python
# Generate random data
np.random.seed(0)
area = 2.5 * np.random.randn(100) + 25
price = 25 * area + 5 + np.random.randint(20, 50, size=len(area))
data = np.array([area, price])
data = pd.DataFrame(data=data.T, columns=['area', 'price'])
plt.scatter(data['area'], data['price'])
plt.show()
```

Now, we calculate the two regression coefficients using the equations we defined. You can see the result is very close to the linear relationship we have simulated:

```python
W = sum(price * (area - np.mean(area))) / sum((area - np.mean(area))**2)
b = np.mean(price) - W * np.mean(area)
print("The regression coefficients are", W, b)
```

Output:

```
The regression coefficients are 24.815544052284988 43.4989785533412
```

Let us now try predicting the new prices using the obtained weight and bias values:

```python
y_pred = W * area + b
```

Next, we plot the predicted prices along with the actual price. You can see that the predicted prices follow a linear relationship with the area:

```python
plt.plot(area, y_pred, color='red', label="Predicted Price")
plt.scatter(data['area'], data['price'], label="Training Data")
plt.xlabel("Area")
plt.ylabel("Price")
plt.legend()
plt.show()
```

This content is from the book "Deep Learning with TensorFlow and Keras - Third Edition" by Amita Kapoor, Antonio Gulli, and Sujit Pal (Oct 2022). Start reading a free chapter or access the entire Packt digital library free for 7 days by signing up now. To learn more, click on the button below. Read through the Chapter 1 unlocked here...

🌟 Secret Knowledge: AI/LLM Resources

📀 The Advantages of Small LLMs: Smaller LLMs are easier to debug and don't require specialized hardware, which is crucial in today's chip-demanding market. They are cost-effective to run, expanding their applicability. Additionally, they exhibit lower latency, making them suitable for low-latency environments and edge computing. Deploying small LLMs is more straightforward, and they can even be ensembled for improved performance.

📀 PyTorch Edge Unveils ExecuTorch for On-Device Inference: The PyTorch Edge team has introduced ExecuTorch, a solution that empowers on-device inference on mobile and edge devices with the support of industry leaders like Arm, Apple, and Qualcomm Innovation Center. ExecuTorch aims to address the fragmentation in the on-device AI ecosystem by offering extension points for third-party integration to accelerate ML models on specialized hardware.

📀 AI-Boosted Software Development Journey: AI assistance simplifies design, code generation, debugging, and impact analysis, streamlining workflows and enhancing productivity. From idea to production, this post takes you through various stages of development, starting with collaborative design sessions aided by AI tools like Gmail's "help me write" and Google Lens. Duet AI for Google Cloud assists in code generation, error handling, and even test case creation.
This AI assistance extends to operations, service health monitoring, and security.

📀 Scaling Reinforcement Learning with Cloud TPUs: Learn how Cloud TPUs are revolutionizing reinforcement learning by enhancing the training process for AI agents. This article explores the significant impact of TPUs on RL workloads, using the DeepPCB case as an example. Thanks to TPUs, DeepPCB achieved a remarkable 235x boost in throughput and a 90% reduction in training costs, significantly improving the quality of PCB routings. The Sebulba architecture, optimized for TPUs, is presented as a scalable solution for RL systems, offering reduced communication overhead, high parallelization, and improved scalability.

💡 Masterclass: AI/LLM Tutorials

🎯 Building an IoT Sensor Network with AWS IoT Core and Amazon DocumentDB: Learn how to create an IoT sensor network solution for processing IoT sensor data via AWS IoT Core and storing it using Amazon DocumentDB (with MongoDB compatibility). This guide explores the dynamic nature of IoT data, making Amazon DocumentDB an ideal choice due to its support for flexible schemas and scalability for JSON workloads.

🎯 Building Conversational AI with Generative AI for Enhanced Employee Productivity: Learn how to develop a lifelike conversational AI agent using Google Cloud's generative AI capabilities. This AI agent can significantly improve employee productivity by helping them quickly find relevant information from internal and external sources. Leveraging Dialogflow and Google enterprise search, you can create a conversational AI experience that understands employee queries and provides them with precise answers.

🎯 A Step-by-Step Guide to Utilizing Feast for Enhanced Product Recommendations: In this comprehensive guide, you will learn how to leverage Feast, a powerful ML feature store, to build effective product recommendation systems. Feast simplifies the storage, management, and serving of features for machine learning models, making it a valuable tool for organizations. This step-by-step tutorial will walk you through configuring Feast with BigQuery and Cloud Bigtable, generating features, ingesting data, and retrieving both offline and online features.

🎯 Constructing a Mini GPT-Style Model from Scratch: In this tutorial, you'll explore the model architecture and see the training and inference processes demonstrated. Get to know the essential components, such as data processing, vocabulary construction, and data transformation functions. Key concepts covered include tokens, vocabulary, text sequences, and vocabulary indices. The article also introduces the self-attention module, a crucial component of transformer-based models.

🎯 Deploy Embedding Models with Hugging Face Inference Endpoints: In contrast to LLMs, embedding models are smaller and faster for inference, which is valuable when updating models or improving fine-tuning. The post guides you through deploying open-source embedding models on Hugging Face Inference Endpoints. It also covers running large-scale batch requests. Learn about the benefits of Inference Endpoints, Text Embeddings Inference, and how to deploy models efficiently.

🚀 HackHub: Trending AI Tools

🔨 xlang-ai/OpenAgents: An open platform with Data, Plugins, and Web agents for data analysis, versatile tool integration, and web browsing, featuring a user-friendly chat interface.

🔨 AI-Citizen/SolidGPT: A technology business boosting framework that allows developers to interact with their code repository, ask code-related questions, and discuss requirements.
🔨 SkalskiP/SoM: An unofficial implementation of Set-of-Mark (SoM) tools. Developers can run it in Google Colab to work with the implementation, load images, and label objects of interest.

🔨 zjunlp/factchd: Code for detecting fact-conflicting hallucinations in text, helping developers evaluate the factuality of text produced by LLMs, aiding in the detection of factual errors and enhancing credibility in text generation.


Evaluating Large Language Models

Vivekanandan Srinivasan
27 Oct 2023
8 min read
Introduction

An LLM, or Large Language Model, is an advanced artificial intelligence model usually trained on vast amounts of text data. Such language models can generate human-like language and perform language-related tasks, including translation, text completion, answering specific questions, and more.

In this era of technological advancement, many large language models are on the rise. Despite this, there are no standardized or fixed measures for comparing or evaluating their quality. Here, let us dive into the existing evaluation and comparison frameworks for large language models, and analyze the factors on which these models should be evaluated.

Evaluating Large Language Models

Need for a comprehensive evaluation framework

Identifying different areas of improvement during the early developmental stages is relatively easy. However, with the advancement of technology and the availability of new alternatives, determining the best model becomes increasingly tricky. Therefore, it is essential to have a reliable evaluation framework that helps judge the quality of large language models accurately, and the need for such an authentic framework is immediate. One can use such a framework in the following ways:

- Only a proper framework will help authorities and agencies assess the accuracy, safety, usability, and reliability of a model.
- The blind race among big technology companies to release large language models is intensifying. A comprehensive evaluation framework would help stakeholders release models more responsibly.
- A comprehensive evaluation framework would help users of large language models determine how and where to fine-tune a model for practical deployment.

Issues with the existing frameworks

Every large language model has its advantages. However, certain factors make the existing frameworks insufficient. Some of these issues include:

- Safety: Some frameworks do not consider safety as a factor for evaluation. Although the OpenAI moderation API addresses safety to some extent, it is insufficient.
- Self-sufficiency: The frameworks are scattered with respect to the factors on which one can evaluate models. All of them would need to be more comprehensive to be self-sufficient.

Factors to be considered while evaluating large language models

Only after reviewing the existing evaluation frameworks can one determine the factors that must be considered while assessing the quality of large language models. Here are the key factors:

Model Size and Complexity

The primary factors to evaluate in LLMs are their size and complexity, often indicated by the number of parameters. Generally, larger models have a greater capacity to understand context and generate nuanced responses. However, huge models can require substantial computational resources, making them impractical for specific applications. Evaluators must balance model size and computational efficiency based on the use case.

Training Data Quality and Diversity

The training data's quality and diversity significantly influence LLMs' performance.
Models trained on diverse and representative datasets from various sources tend to have a broader understanding of language nuances. Evaluators should scrutinize the sources and types of data used for training to ensure the model's reliability across different contexts and domains.

Bias and Fairness

Bias in LLMs is a critical concern, as it can lead to discriminatory or unfair content. Evaluators must assess the model's bias, both in the training data and in the generated output, and implement strategies to mitigate it. Ethical considerations also demand continuous efforts to improve fairness, ensuring that models do not reinforce societal biases.

Ethical Considerations and Responsible Use

Evaluating LLMs extends beyond technical aspects to ethical considerations. Responsible deployment of these models requires a thorough assessment of potential misuse scenarios. In every case, evaluators must devise guidelines and ethical frameworks to prevent the generation of harmful or malicious content, emphasizing the responsible use of LLMs in applications such as content moderation and chatbots.

Fine-Tuning and Transfer Learning

LLMs are often fine-tuned on specific datasets to adapt them to particular tasks or domains. One should scrutinize the fine-tuning process to ensure the model maintains its integrity and performance while being customized. Additionally, assessing the effectiveness of transfer learning, where models trained on one task are applied to related tasks, is crucial for understanding their adaptability and generalizability.

Explainability and Interpretability

Understanding how LLMs arrive at specific conclusions is essential, especially in applications like legal document analysis and decision-making processes. An evaluator must assess the model's explainability and interpretability. Transparent models enable users to trust the generated output and comprehend the reasoning behind the responses, fostering accountability and reliability.

Robustness and Adversarial Attacks

Evaluating the robustness of LLMs involves assessing their performance under various conditions, including noisy input, ambiguous queries, and adversarial attacks. Rigorous testing against potentially adversarial inputs helps identify vulnerabilities and weaknesses in the model, guiding the implementation of robustness-enhancing techniques.

Continuous Monitoring and Improvement

The landscape of language understanding is ever-evolving, so continuous monitoring and improvement are vital aspects of evaluating LLMs. Regular updates, addressing emerging challenges, and incorporating user feedback contribute to the model's ongoing enhancement, ensuring its relevance and reliability over time.

Step-by-Step Guide: Comparing LLMs Using Perplexity

1. Load Language Model: Load the pre-trained LLM using a library like Hugging Face Transformers.
2. Prepare Dataset: Tokenize and preprocess your dataset for the language model.
3. Train/Test Split: Split the dataset into training and testing sets.
4. Train LLM: Fine-tune the LLM on the training dataset.
5. Calculate Perplexity: Use the testing dataset to calculate perplexity.

Code example:

```python
# Calculate perplexity
from math import exp

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_text = "Example input text for perplexity calculation."
input_ids = tokenizer.encode(input_text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the cross-entropy loss
    output = model(input_ids, labels=input_ids)
    loss = output.loss

# Perplexity is the exponential of the average cross-entropy loss
perplexity = exp(loss.item())
print("Perplexity:", perplexity)
```
Methods of evaluation

Quantitative Performance Metrics and Benchmarking

Evaluating LLMs requires rigorous quantitative assessment using industry-standard metrics. BLEU, METEOR, and ROUGE scores are pivotal in assessing text generation quality by comparing generated text with human references. For translation tasks, BLEU (Bilingual Evaluation Understudy) calculates the overlap of n-grams between the machine-generated text and human reference translations. METEOR evaluates precision, recall, and synonymy, providing a nuanced understanding of translation quality. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) focuses on summary evaluation, emphasizing recall. These metrics offer quantitative benchmarks, enabling direct comparison between different LLMs. Additionally, perplexity, a measure of how well a language model predicts a sample text, provides insight into language model efficiency: lower perplexity values indicate better prediction accuracy, highlighting the model's coherence and understanding of the input text. Often applied to large-scale datasets like WMT (Workshop on Machine Translation) or COCO (Common Objects in Context), these quantitative metrics offer a robust foundation for comparing LLMs' performance.
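To make these metrics concrete, here is a minimal sketch (an illustration added to this discussion, with invented reference and candidate strings) that computes BLEU with NLTK and ROUGE with the third-party rouge-score package:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

# Invented example: one human reference and one model output
reference = "the cat sat on the mat"
candidate = "the cat is on the mat"

# BLEU measures n-gram overlap; smoothing avoids zero scores on short texts
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)
print("BLEU:", round(bleu, 3))

# ROUGE-1 (unigram overlap) and ROUGE-L (longest common subsequence)
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print("ROUGE:", scorer.score(reference, candidate))
```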
Diversity Analysis and Bias Detection

Diversity and bias analysis are paramount in evaluating LLMs, ensuring equitable and inclusive performance across diverse demographics and contexts. One critical approach involves employing word-embedding techniques, such as the Word Embedding Association Test (WEAT), to quantify biases. WEAT assesses associations between word embeddings and predefined categories, unveiling tendencies present in LLMs. By evaluating gender, race, or cultural preferences, organizations can ensure fair and unbiased responses, aligning with ethical considerations.

Furthermore, demographic diversity analysis measures the model's performance across different demographic groups. Assessing demographic parity ensures that LLMs provide consistent, unbiased results across various user segments. This comprehensive evaluation approach, deeply rooted in fairness and inclusivity, is pivotal in selecting socially responsible LLMs.

Real-World User Studies and Interaction Analysis

Incorporating real-world user studies and interaction analysis is indispensable for evaluating LLMs in practical scenarios. Conducting user tests and surveys provides qualitative insights into user satisfaction, comprehension, and trust. These studies consider how well LLM-generated content aligns with users' expectations and domain-specific contexts.

Additionally, analyzing user interactions with LLM-generated content through techniques like eye-tracking studies and click-through rates provides valuable behavioral data. Heatmap analysis, capturing user attention patterns, offers insights into the effectiveness of LLM-generated text elements. User feedback and interaction analysis inform iterative improvements, ensuring that LLMs are technically robust, user-centric, and aligned with real-world application requirements.

Conclusion

With the development of large language models, natural language processing has experienced a revolution. However, a standardized and comprehensive evaluation framework for assessing the quality of these models remains a necessity. Though the existing frameworks offer valuable insights, they lack standardization and comprehensiveness, and they do not consider safety as an evaluation factor. Moreover, collaboration with relevant experts becomes imperative to build a comprehensive and authentic evaluation framework for large language models.

Author Bio

Vivekanandan, a seasoned Data Specialist with over a decade of expertise in Data Science and Big Data, excels in intricate projects spanning diverse domains. Proficient in cloud analytics and data warehouses, he holds degrees in Industrial Engineering, Big Data Analytics from IIM Bangalore, and Data Science from Eastern University.

As a Certified SAFe Product Manager and Practitioner, Vivekanandan ranks in the top 1 percentile on Kaggle globally. Beyond corporate excellence, he shares his knowledge as a Data Science guest faculty and advisor for educational institutes.


Vector Datastore in Azure Machine Learning Promptflow

Karthik Narayanan Venkatesh
27 Oct 2023
10 min read
Introduction

Azure Machine Learning prompt flow is one of Microsoft's cutting-edge solutions. It has paved the way for streamlined data handling, enabling data scientists to focus on deriving valuable insights from data. At the heart of this innovation lies the vector datastore, a powerful tool that ensures seamless data manipulation and integration. Let us delve into vector data storage, exploring its functionality and significance in Azure Machine Learning prompt flow.

Understanding Vector datastore

A vector datastore can handle large-scale vectorized data efficiently within the Azure Machine Learning ecosystem. It acts as a centralized repository that houses diverse data formats, from text to images and numerical data. The real power of the vector datastore lies in its ability to unify such disparate data types into a cohesive format that data scientists can work with seamlessly.

Some of the key features and benefits of vector data storage in the Azure ML ecosystem include:

Data integration
With the help of a vector datastore, data scientists can integrate a variety of data types without going through the hassle of format conversion. Removing this hassle accelerates the data preprocessing phase, one of the crucial steps in any machine learning project.

Efficient data manipulation
A vector datastore makes complex operations like filtering, feature extraction, quality confirmation, and transformation straightforward. Efficient data manipulation is crucial for deriving meaningful patterns from raw data, and it leads to more accurate machine learning models.

Scalability
The vector datastore in Azure Machine Learning prompt flow scales effortlessly, accommodating growing datasets. Whether you deal with gigabytes or petabytes of data, a vector datastore ensures smooth operations without compromising the accuracy or speed of the whole process.

Version control
A vector datastore simplifies data versioning. It allows data scientists to keep track of changes, reproduce experiments with precision, and collaborate effectively.

Let's consider a scenario where we want to preprocess a dataset containing images of handwritten digits for a digit recognition task.
First, we'll load and preprocess the image data so it is ready to be stored in a vector datastore:

```python
import numpy as np
import cv2

# Define a function to load and preprocess images
def load_images(file_paths, target_size=(64, 64)):
    images = []
    for file_path in file_paths:
        # Read the image using OpenCV
        image = cv2.imread(file_path)
        # Resize the image to the target size
        image = cv2.resize(image, target_size)
        # Normalize pixel values to be between 0 and 1
        image = image.astype('float32') / 255.0
        images.append(image)
    return np.array(images)

# Example usage
file_paths = ['image1.jpg', 'image2.jpg', 'image3.jpg']  # paths to your images
image_data = load_images(file_paths)
# image_data now contains the preprocessed image data, ready to be stored
# in your vector datastore
```

In this example, the code snippet demonstrates how to load and preprocess the image dataset so it can be ingested into a vector datastore.

Output: upon successful execution, the image dataset is ready to be integrated into the vector datastore for preprocessing and model training. This integration ensures that data scientists can focus on building robust machine learning models without worrying about data compatibility issues.

Creating a Vector index

Vector index lookup is a tailored tool for querying an Azure Machine Learning vector datastore index. It empowers the user to extract contextually relevant information from a domain knowledge base. Here is how one can prepare one's own data QnA by placing a vector index as an input. Note that, based on where you put the vector index, the identity used by Azure ML prompt flow is granted certain roles.

Inputs: after installing the Annoy library, you can create a vector index using the code below.

```python
from annoy import AnnoyIndex

# image_data is the preprocessed image data (NumPy array) from the previous step.
# Annoy works on flat vectors, so flatten each image into a single row vector.
image_vectors = image_data.reshape(len(image_data), -1)

# Number of dimensions in each vector (the length of a flattened image vector)
num_dimensions = image_vectors.shape[1]

# Initialize the Annoy index (recent versions of Annoy require an explicit metric)
annoy_index = AnnoyIndex(num_dimensions, 'angular')

# Add vectors to the index
for i, vector in enumerate(image_vectors):
    annoy_index.add_item(i, vector)

# Build the index for efficient nearest-neighbor searches;
# the number of trees can be tuned to trade build time for accuracy
annoy_index.build(10)
```

Outputs: the index will be initialized with the number of dimensions in the image vectors.
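To see the index in action, here is a short usage sketch (an addition for illustration, reusing the names defined above): it retrieves the stored images nearest to a query vector via Annoy's get_nns_by_vector.

```python
# Query the index with a vector of the same dimensionality as the indexed ones;
# here we simply reuse the first image as the query
query_vector = image_vectors[0]

# Retrieve the 5 nearest neighbors along with their angular distances
neighbor_ids, distances = annoy_index.get_nns_by_vector(
    query_vector, 5, search_k=-1, include_distances=True
)

for idx, dist in zip(neighbor_ids, distances):
    print(f"Image #{idx} at angular distance {dist:.4f}")
```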
Choosing a vector store

One must use a vector index to perform Retrieval Augmented Generation (RAG) in Azure Machine Learning. The index stores embeddings, data converted into number sequences, which help large language models understand complex relations between concepts. Only by creating vector stores can the user hook up data with a large language model, such as GPT-4, and retrieve the data efficiently.

Azure Machine Learning prompt flow supports two kinds of vector stores in the RAG workflow.

Faiss
Faiss is an open-source library that provides a local, file-based store. The vector index is stored in the storage account of the Azure Machine Learning workspace. Since the storage is local, the costs are minimal, making testing and development budget-friendly.

Azure Cognitive Search
Azure Cognitive Search is an Azure resource that supports information retrieval over textual and vector data stored in search indexes. With prompt flow, one can populate, create, and query vector data stored in Azure Cognitive Search.

Though you can choose either of the vector stores above, here is an overview of which should be used where.

Faiss, as an open-source library, emerges as a robust solution, particularly when dealing with vector-only data, and it can be seamlessly integrated into your solution. Let's explore the key aspects of Faiss, coupled with the capabilities of Azure Cognitive Search, to understand how these tools can be harnessed effectively.

Faiss: Optimizing Vector Data Management

Faiss offers several compelling advantages when it comes to working with vector data:

1. Cost-Effective Local Storage: Faiss allows local storage without incurring additional costs for creating an index, offering a budget-friendly option for businesses aiming to optimize their expenses while managing extensive datasets.

2. In-Memory Indexing and Querying: One of Faiss's standout features is its ability to build and query indexes entirely in memory. This approach significantly enhances the speed of operations, making it an efficient choice for real-time applications.

3. Flexibility in Sharing: Faiss enables the sharing of index copies for individual use, providing flexibility in data access. However, additional setup is necessary for applications requiring index hosting, to ensure tailored solutions for diverse use cases.

4. Scalability Aligned with Computational Resources: Faiss scales seamlessly with the underlying compute resources, enabling businesses to manage varying workloads effectively. Its ability to adapt to the computational load ensures consistent performance despite fluctuating demands.

Example: consider an e-commerce platform dealing with millions of product vectors. By utilizing Faiss, the platform can create an in-memory index, enabling lightning-fast similarity searches for product recommendations, enhancing user experience, and increasing sales.
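To make the in-memory indexing point concrete, here is a minimal sketch using the faiss Python package (an illustration added here; the dimensionality and random vectors are invented):

```python
import faiss
import numpy as np

d = 128  # dimensionality of the (invented) embedding vectors
rng = np.random.default_rng(42)
xb = rng.random((10_000, d), dtype=np.float32)  # database vectors
xq = rng.random((3, d), dtype=np.float32)       # query vectors

# IndexFlatL2 performs exact L2 nearest-neighbor search entirely in memory
index = faiss.IndexFlatL2(d)
index.add(xb)

# Retrieve the 4 nearest database vectors for each query vector
distances, ids = index.search(xq, 4)
print(ids)        # neighbor indices, shape (3, 4)
print(distances)  # corresponding squared L2 distances
```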
Azure Cognitive Search: Elevating Vector Data Management to the Enterprise Level

Azure Cognitive Search, a dedicated Platform as a Service (PaaS) resource, offers a comprehensive solution for businesses seeking robust vector data management:

1. Enterprise-Grade Scalability and Security: Cognitive Search supports enterprise-level business requirements, offering scalability, security, and availability. It scales seamlessly to accommodate growing data volumes, making it an ideal choice for businesses of all sizes.

2. Hybrid Information Retrieval: A unique feature of Cognitive Search is its support for hybrid information retrieval, meaning that vector data can coexist harmoniously with non-vector data. Businesses can leverage all the features of Azure Cognitive Search, including hybrid search and semantic reranking, ensuring comprehensive data analysis.

3. Vector Support in Public Preview: Cognitive Search's vector support is currently in public preview. Although vectors must be generated externally, Cognitive Search handles the indexing and query encoding seamlessly within prompt flow, simplifying the integration process.

Example: consider a financial institution needing to process massive amounts of transaction data, including structured and vector data, for fraud detection. Azure Cognitive Search allows seamless integration of vector data, enabling the institution to identify patterns effectively and enhance security protocols.

Integration for Seamless Vector Data Management

To use Cognitive Search as a vector store for Azure Machine Learning, you must establish a search service within your Azure subscription. Once the service is in place, developers can access it, and Azure Cognitive Search can be chosen as a vector index within prompt flow. Prompt flow facilitates the entire process, from index creation to vector generation, ensuring a streamlined experience.

The synergy between Faiss and Azure Cognitive Search presents a formidable solution for businesses aiming to manage vector data effectively. Faiss's efficiency in local storage and real-time querying, coupled with Cognitive Search's enterprise-grade scalability and hybrid data support, creates a powerful ecosystem. This integration empowers businesses to leverage their vector data fully, facilitating data-driven decision-making and driving innovation across diverse industries. By harnessing the capabilities of Faiss and Azure Cognitive Search, companies can truly unlock the potential of their data, paving the way for a future where data management is as intelligent as the insights derived from it.

Conclusion

A vector datastore accelerates machine learning pipelines, leading to faster innovation and more accurate models. As organizations continue to grapple with massive datasets, a solution that enhances accuracy and efficiency becomes indispensable. Hence, the vector datastore in Azure Machine Learning prompt flow is not a choice but a necessity. It unifies diverse data types, coupled with scalability and efficient manipulation, enabling data scientists to extract valuable insights, especially from complex and large datasets.

Author Bio

Karthik Narayanan Venkatesh (aka Kaptain), founder of WisdomSchema, has multifaceted experience in the data analytics arena. He has been associated with the data analytics domain since the early 2000s, with a ringside view of transformations in this industry. He has led teams that architected and built scalable data platform solutions across the technology spectrum.

As a niche consulting provider, he bridged the gap between business and technology and drove BI adoption through innovative approaches in an agnostic manner. He is a sought-after speaker who has presented many lectures on SAP, Analytics, Snowflake, AWS, and GCP technologies.


ChatGPT for Excel

Chaitanya Yadav
26 Oct 2023
7 min read
Introduction

ChatGPT, the chatbot from OpenAI, is a large language model that can be used for writing text, translation, content creation, and answering questions in an informative way. It is still developing, but it has learned to perform many tasks, including helping with Excel. Using ChatGPT with Excel can be done in several ways: one is to access ChatGPT on the OpenAI website; another is to use a third-party add-in such as the ListenData ChatGPT for Excel add-in, which lets you access ChatGPT from within Excel at any time.

What can ChatGPT do for Excel users?

ChatGPT can help Excel users with a variety of tasks, including:

Learning Excel concepts: ChatGPT can explain Excel concepts clearly and succinctly, which can be useful for newcomers as well as experienced users.

Writing formulas and functions: ChatGPT can write sophisticated formulas and functions in Excel and explain how they work.

Analyzing data: ChatGPT can help with Excel data analysis. It can identify trends, patterns, and outliers, and it can also generate reports and charts.

Automating tasks: ChatGPT can help automate tasks in Excel, which can save a lot of time and effort.

Best Practices for Using ChatGPT for Excel

Be clear and concise in your prompts: ChatGPT is very good at understanding natural language, but it is important to be as specific as possible in your requests. For example, instead of saying "Can you help me with this Excel spreadsheet?", you could say "Can you help me write a formula to calculate the average sales for each product category?".

Provide context: If you are asking ChatGPT to help you with a specific task, it is helpful to provide some context. For example, if you are asking ChatGPT to write a formula to calculate the average sales for each product category, you could provide a sample of your spreadsheet data.

Break down complex tasks into smaller steps: If you have a complex task that you need help with, it is often helpful to break it down into smaller, more manageable steps.

Be patient: ChatGPT is still under development, and it may not always provide the perfect answer. If you are not satisfied with a response, try rephrasing your prompt or providing more context.

Generating Formulas and Functions

You can use ChatGPT to generate Excel formulas and functions. This is useful when you have no idea how to create a particular formula or function, or when you need help understanding how formulas and functions work. To create a formula or function with ChatGPT, simply type a description of what you want it to do.

For example, suppose you have a spreadsheet of daily sales data, and you want to generate a formula that calculates the average daily sales growth rate for the five days, excluding the weekend days (Saturday and Sunday).

Steps:

1. Go to ChatGPT and enter the following prompt: "Write an Excel formula to calculate the average daily sales growth rate for the following data, but excluding the weekend days (Saturday and Sunday):"

2. ChatGPT will generate the following formula and steps:

=IF(WEEKDAY(A4,2)=7,"",IF(WEEKDAY(A4,2)=1,"",(B4-B3)/B3*100))
3. Copy and paste the formula into cell D3 of your Excel spreadsheet.

4. Press Enter.

5. The formula will calculate the average daily sales growth rate for the five days, excluding the weekend days, which is 20%.

Explanation: the formula works by first checking the day of the week for the date in cell A3. If the day of the week is Saturday or Sunday, the formula returns a blank value. Otherwise, the formula calculates the difference in sales between the second and third days, divides it by the sales value in cell B2, and multiplies it by 100 to express the result as a percentage.

Data Standardization, Conditional Formatting, and Dynamic Filtering in Excel with ChatGPT

1. Data Standardization

Data standardization plays an important role in data analysis, because the raw data we extract from various sources is rarely uniform. We therefore need to phrase our request to ChatGPT precisely so that it standardizes the data correctly.

For example:

Question: "I have a dataset with names in highly varied formats (e.g., 'John Smith,' 'Smith, John,' 'Dr. Jane Doe'). How can I standardize them to 'First Name Last Name' in Excel while preserving titles and suffixes?"

ChatGPT Response: as the image in the original article shows, once you apply the formula given by ChatGPT for your query, you get the result in standardized form.

2. Conditional Formatting

Conditional formatting is a feature that enables Excel to automatically format cells according to their value or content; for example, you can color-code cells according to the range their values fall into. You can use any of the options available to make your data more attractive and comprehensible.

For example:

Question: "I have a list of sales data in Excel, and I want to highlight cells where the sales are above $1,000 in green and below $500 in red. How can I set up conditional formatting for this?"

ChatGPT Response: once we perform the step-by-step procedure given by ChatGPT, we are able to get the correct results.

3. Data Sorting and Filtering

Data sorting and filtering are two powerful features in Excel that can help you organize and analyze your data more efficiently. Sorting allows you to arrange your data in a specific order, such as alphabetically, numerically, or by date. This can be useful for finding specific information or for identifying trends in your data. Filtering allows you to display only the data that meets certain criteria. For example, you could filter your data to show only the rows that contain a certain value in a certain column. This can be useful for focusing on the data that is most important to you or for identifying outliers.

Question: "I have a large dataset in Excel, and I want to sort it by a specific column in ascending order and then apply a filter to show only rows where the value in column B is greater than 50. What's the code to do this?"

ChatGPT Response: the code sorts the data in ascending order and filters it so that only rows where the value in column B is greater than 50 are displayed.

Conclusion

In conclusion, the integration of ChatGPT with Excel provides a valuable service to all users, whether they are just starting out and trying to learn Excel concepts or experienced users who need assistance with specific tasks.
ChatGPT can help you with many aspects of Excel, such as writing complex formulas, analyzing data, standardizing data for consistency, using conditional formatting, and automating tasks. In addition, the sections on data standardization, conditional formatting, and data sorting and filtering give practical examples of what ChatGPT can do for users pursuing Excel-related goals. Overall, ChatGPT has proved to be an invaluable tool for Excel users, enabling them to free up time, improve data analysis, and streamline their tasks in a faster and more engaging way.

Author Bio

Chaitanya Yadav is a data analyst, machine learning, and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others learn and grow. He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x Multicloud Certified.

In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer. He is the co-founder of a tech community called "CS Infostics," which is dedicated to sharing opportunities to learn and grow in the field of IT.


Efficient Data Caching with Vector Datastore for LLMs

Karthik Narayanan Venkatesh
25 Oct 2023
9 min read
Introduction

In the ever-evolving landscape of artificial intelligence, Large Language Models (LLMs) have taken center stage, transforming the way we interact with and understand vast amounts of textual data. With the proliferation of these advanced models, the need for efficient data management and retrieval has become paramount. Enter Vector Datastore, a game-changer in the realm of data caching for LLMs. This article explores how Vector Datastore's approach, based on vector representations and similarity search, empowers LLMs to swiftly access and process data, improving their performance and capabilities.

How does a vector datastore enable a data cache for LLMs?

Every online source you scan brings up terms like chatbots, LLMs, or GPT. Most people are talking about large language models, and new language models now seem to be released every week. Before seeing how vector databases enable data caches for large language models, one must learn what they are and why they matter to language models.

Vector databases: What are they?

Understanding vector embeddings is essential to understanding vector databases. A vector embedding is a data representation that carries semantic information, helping AI systems better understand datasets while maintaining a long-term memory. Understanding and remembering are the most critical elements, especially if you want to learn anything new.

AI models usually generate embeddings. Every large language model has a wide variety of features, which makes its representations difficult to manage. With embeddings, one can represent the various dimensions of the data, so AI models can understand the patterns, relationships, and hidden structures within it.

In this scenario, handling vector embeddings with traditional scalar-based databases is challenging: they cannot keep up with the scale and complexity of such data. The complexity that comes with vector embeddings requires a specialized database, which is why vector databases are needed. With a vector database, one gets storage and query capabilities optimized for the unique structure of vector embeddings. The user gets high performance along with easy search capabilities, data retrieval, and scalability, achieved by comparing similarities between vectors.

Though vector databases are difficult to implement, various tech giants are not only developing them but also making them manageable. Since they are expensive to implement, one must ensure proper calibration to achieve high performance.

How does it work?

Take the simple example of an application backed by a large language model like ChatGPT. The model holds a large volume of data and content, and the user works through the application. As the user, you enter your query into the application. The query is then passed to the embedding model, which creates vector embeddings of the content that requires indexing. After this step, the vector embeddings move into the vector database.
This happens for whatever content is to be used for embedding. The vector database then produces an outcome, which the system sends back to the user as a result. As the user makes further queries, each one goes through the same embedding model to create embeddings, which are used to query the database for similar vector embeddings.

Let us look at the whole process in detail.

A vector database incorporates diverse algorithms dedicated to facilitating Approximate Nearest Neighbor (ANN) searches. These algorithms encompass techniques such as hashing, graph-based search, and quantization, which are intricately combined into a structured pipeline for retrieving vectors neighboring a queried input.

The outcome of this search operation depends on the proximity or approximation of the retrieved vectors to the original query. Hence, the pivotal considerations are accuracy and speed: a trade-off exists between the speed of the query output and the precision of the results, with a slower output corresponding to a more accurate outcome.

The process of querying a vector database unfolds in three fundamental stages:

1. Indexing
A diverse array of algorithms comes into play upon the ingress of a vector embedding into the vector database. These algorithms map the vector embedding onto specific data structures, optimizing the search process. This preparatory step significantly enhances the speed and efficiency of subsequent searches.

2. Querying
The vector database systematically compares the queried vector with the previously indexed vectors. This comparison applies a similarity metric, a crucial determinant in identifying the nearest neighbor among the indexed vectors. The precision and efficacy of this phase are paramount to the overall accuracy of the search results.

3. Post-processing
Upon pinpointing the nearest neighbors, the vector database initiates a post-processing stage, whose specifics may vary based on the particular vector database in use. Post-processing activities may involve refining the final output of the query, ensuring it aligns seamlessly with the user's requirements. Additionally, the vector database might re-rank the nearest neighbors, a strategic move to enhance the database's future search capabilities. This meticulous step guarantees that the delivered results are accurate and optimized for subsequent reference, elevating the overall utility of the vector database in complex search scenarios.

Implementing vector data stores in an LLM

Let us consider an example of how a vector datastore can be implemented with a large language model. First, one has to install the vector datastore library:

```
pip install vectordatastore
```

Assuming you have a dataset containing text snippets, the flow looks like the following code.
```python
# Sample dataset
dataset = {
    "1": "Text snippet 1",
    "2": "Text snippet 2",
    # ... more data points ...
}

# Initialize Vector Datastore
from vectordatastore import VectorDatastore
vector_datastore = VectorDatastore()

# Index data into Vector Datastore
for key, text in dataset.items():
    vector_datastore.index(key, text)

# Query Vector Datastore from LLM
query = "Query text snippet"
similar_texts = vector_datastore.query(query)

# Process similar_texts in the LLM
# ...
```

In this example, the vector datastore efficiently indexes the dataset using vector representations. When the large language model needs to retrieve data similar to a query text, it uses the vector datastore to obtain the relevant snippets quickly.

Process of enabling data caches in LLMs

Vector Datastore enables efficient data caching for Large Language Models (LLMs) through its unique approach to handling data. Traditional caching mechanisms store data by key, and retrieving data involves matching those keys. However, LLMs often work with complex, high-dimensional data, such as text embeddings, which are not easily indexed or retrieved using traditional key-value pairs. Vector Datastore addresses this challenge by leveraging vector representations of data points:

1. Vector Representation: Vector Datastore stores data points in vectorized form. Each data point, whether a text snippet or any other type of information, is transformed into a high-dimensional numerical vector. This vectorization process captures the semantic meaning and relationships between data points.

2. Similarity Search: Instead of relying on exact key matches, Vector Datastore performs similarity searches based on vector representations. When an LLM needs specific data, it translates its query into a vector representation using the same method employed during data storage. This query vector is then compared against the stored vectors using similarity metrics like cosine similarity or Euclidean distance (a small sketch of this idea follows at the end of this section).

3. Efficient Retrieval: By organizing data as vectors and employing similarity searches, Vector Datastore can quickly identify the vectors most similar to the query vector. This efficient retrieval mechanism allows LLMs to access relevant data points without scanning the entire dataset, significantly reducing retrieval time.

4. Adaptive Indexing: Vector Datastore dynamically adjusts its indexing strategy based on the data and queries it receives. As the dataset grows or the query patterns change, it adapts its indexing structures to maintain optimal search efficiency. This adaptability keeps the cache efficient even as data and query patterns evolve.

5. Scalability: Vector Datastore is designed to handle the large-scale datasets commonly encountered in LLM applications. Its architecture allows horizontal scaling, efficiently distributing the workload across multiple nodes or servers. This scalability ensures that Vector Datastore can accommodate the vast amount of data processed by LLMs without compromising performance.

Vector Datastore's ability to work with vectorized data and perform similarity searches based on vector representations enables it to serve as an efficient data cache for Large Language Models. By avoiding the limitations of traditional key-based caching mechanisms, Vector Datastore significantly enhances the speed and responsiveness of LLMs, making it a valuable tool in natural language processing.
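To illustrate the similarity-search idea, here is a minimal, self-contained sketch (added for illustration; the embedding function is a random stand-in for a real embedding model) of a vector cache that stores vectors and retrieves the top-k entries by cosine similarity:

```python
import numpy as np

class TinyVectorCache:
    """Minimal in-memory vector cache: stores (key, vector) pairs and
    retrieves the top-k most similar entries by cosine similarity."""

    def __init__(self):
        self.keys = []
        self.vectors = []

    def index(self, key, vector):
        v = np.asarray(vector, dtype=np.float32)
        # Normalize at insert time so each query reduces to dot products
        self.keys.append(key)
        self.vectors.append(v / np.linalg.norm(v))

    def query(self, vector, k=3):
        v = np.asarray(vector, dtype=np.float32)
        v = v / np.linalg.norm(v)
        sims = np.stack(self.vectors) @ v   # cosine similarities
        top = np.argsort(sims)[::-1][:k]    # best matches first
        return [(self.keys[i], float(sims[i])) for i in top]

# Stand-in embedding; a real system would call an embedding model here
def fake_embed(text, dim=8):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(dim)

cache = TinyVectorCache()
for key, text in {"1": "Text snippet 1", "2": "Text snippet 2"}.items():
    cache.index(key, fake_embed(text))

print(cache.query(fake_embed("Query text snippet"), k=2))
```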
Not only do they have the potential to revolutionize various aspects of our lives, but it is also imperative that we use them ethically and responsibly in order to reap their full benefits.

Author Bio

Karthik Narayanan Venkatesh (aka Kaptain), founder of WisdomSchema, has multifaceted experience in the data analytics arena. He has been associated with the data analytics domain since the early 2000s, with a ringside view of transformations in this industry. He has led teams that architected and built scalable data platform solutions across the technology spectrum. As a niche consulting provider, he has bridged the gap between business and technology and driven BI adoption through innovative, technology-agnostic approaches. He is a sought-after speaker who has presented many lectures on SAP, Analytics, Snowflake, AWS, and GCP technologies.
Detecting and Mitigating Hallucinations in LLMs

Ryan Goodman
25 Oct 2023
10 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

In large language models, the term "hallucination" describes behavior where an AI model produces results that are not entirely accurate or that sound nonsensical. It's important to understand that large language models are not search engines or databases. They do not retrieve information from external sources or perform complex computations. Instead, large language models (LLMs) belong to the category of generative artificial intelligence.

Recap: How Generative AI Works

Generative AI is a technology trained on large volumes of data that can, as a result, "generate" text, images, and even audio. This makes it fundamentally different from search engines and other software tools you might be familiar with. This foundational difference presents challenges, most notably that generative AI can't cite sources for its responses. Large language models are also not designed to solve computational problems like math. However, generative AI can quickly generate code that might solve complex mathematical challenges. A large language model responds to inputs, most notably the text instruction called a "prompt." As the large language model generates text, it uses its training data as a foundation to extrapolate information.

Understanding Hallucinations

The simplest way to understand a hallucination is the old game of telephone. In the same way a message gets distorted in the game of telephone, information can get "distorted" or "hallucinated" as a language model tries to generate outputs based on patterns it observed in its training data. The model might "misremember" or "misinterpret" certain information, leading to inaccuracies.

Let's use another model example to understand the concept of generating unique combinations of words, in the context of food recipes. Imagine you want to create new recipes by observing existing ones. If you were to build a Markov model for food ingredients, you would:

1. Compile a comprehensive dataset of recipes and extract individual ingredients.
2. Create pairs of neighboring ingredients, like "tomato-basil" and "chicken-rice," and record how often each pair occurs.

For example, if you start with the ingredient "chicken," you might notice it's frequently paired with "broccoli" and "garlic" but less so with "pineapple." If you then choose "broccoli" as the next ingredient, it might be equally likely to be paired with "cheese" or "lemon." By following these ingredient pairings, the model might at some point suggest creative combinations like "chicken-pineapple-lemon," offering new culinary ideas based on observed patterns. This approach allows the Markov model to generate novel recipe ideas based on the statistical likelihood of ingredient pairings.

Hallucinations as a Feature

When researching or computing factual information, a hallucination is a bad thing. However, the same concept that gets a bad rap for factual accuracy or research is what lets large language models demonstrate another human trait: creativity. As a developer, if you want to make your language model creative, OpenAI, for example, exposes a "temperature" input, a hyperparameter that controls how random the model's outputs are. A high temperature of 1 or above will produce more randomness and hallucination. A lower temperature of 0.2, by contrast, will make the model's outputs more deterministic, matching the patterns it was trained on.
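As a rough sketch of how that knob is set in code, the snippet below uses the openai Python package (the pre-1.0 interface current when this article was written) to request the same completion at two temperatures; the API key, model name, and prompt are placeholders.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = "Invent a brand-new two-sentence romantic story with original characters."

for temperature in (0.2, 1.1):
    # Lower temperature -> more deterministic, pattern-matching output;
    # higher temperature -> flatter token distribution, more surprising text.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    print(f"temperature={temperature}:")
    print(response.choices[0].message.content)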
As an experiment, try asking any large language model chatbot, including ChatGPT, for the plot of a romantic story that doesn't copy existing accounts on the internet, with a new storyline and new characters. The LLM will offer a fictitious story with characters, a plot, multiple acts, an arc, and an ending. In specific scenarios, end users or developers might intentionally coax their large language models into a state of "hallucination." When seeking out-of-the-box ideas that go beyond the model's training, this randomness yields more abstract suggestions. In this scenario, the model's ability to "hallucinate" isn't a bug but rather a feature. To continue the experiment, you can return to ChatGPT and ask it to pretend you have changed the temperature hyperparameter to 1.1 and re-write the story. Your results will be very "creative."

In creative pursuits, like crafting tales or penning poems, these so-called "hallucinations" aren't just tolerated; they're celebrated. They can add layers of depth, surprise, and innovation to the generated content.

Types of hallucination

Hallucinations can be categorized into different forms.

Intrinsic hallucination directly contradicts the source material, introducing logical inconsistencies and factual inaccuracies.

Extrinsic hallucination does not contradict the source, but it cannot be verified against any source either; it adds elements that are unconfirmable and speculative.

Detecting hallucinations

Detecting hallucinations in large language models is a tricky task. LLMs deliver information with the same tone and certainty even when the answer is unknown, which puts the responsibility on users and developers to be careful about how information from LLMs is used. The following techniques can be used to uncover or measure hallucinations in large language models.

Identify the grounding data

Grounding data is the standard against which the Large Language Model (LLM) output is measured. The selection of grounding data depends on the specific application. For example, real job resumes could be grounding data for generating resume-related content, while search engine results could serve web-based inquiries. In language translation especially, the choice of grounding data is pivotal: official legal documents could serve as grounding data for legal translations, ensuring precision in the translated content.

Create a measurement test set

A measurement test set comprises input/output pairs covering interactions between humans and the Large Language Model (LLM). These datasets often include various input conditions and their corresponding outputs, and, depending on the scenario, may involve simulated interactions between users and software systems. Ideally, there should be a minimum of two kinds of test sets:

1. A standard or randomly generated test set that is conventional but caters to diverse scenarios.
2. An adversarial test set that probes edge cases, high-risk situations, and deliberately misleading or tricky inputs, including security threats.

Extract any claims

Following the preparation of the test data sets, the next stage involves extracting assertions from the Large Language Model (LLM). This extraction can occur manually, through rule-based methodologies, or by employing machine learning models. Similarly, in data analysis, the next step after gathering datasets is to extract specific patterns from the data.
This extraction can be done manually, through predefined rules, with basic descriptive analytics, or, for large-scale projects, with machine learning algorithms. Each method has its merits and drawbacks.

Use validations against any grounding data

Validation guarantees that the content generated by the Large Language Model (LLM) corresponds to the grounding data. Frequently, this stage reuses the techniques employed for claim extraction. To support the above, here is a code snippet that validates generated sentences against grounding data:

# Define grounding data (acceptable sentences)
grounding_data = [
    "The sky is blue.",
    "Python is a popular programming language.",
    "ChatGPT provides intelligent responses."
]

# List of generated sentences to be validated
generated_sentences = [
    "The sky is blue.",
    "ChatGPT is a popular programming language.",
    "Python provides intelligent responses."
]

# Validate generated sentences against grounding data
valid_sentences = [sentence for sentence in generated_sentences if sentence in grounding_data]

# Output valid sentences
print("Valid Sentences:")
for sentence in valid_sentences:
    print("- " + sentence)

# Output invalid sentences
invalid_sentences = list(set(generated_sentences) - set(valid_sentences))
print("\nInvalid Sentences:")
for sentence in invalid_sentences:
    print("- " + sentence)

Output:

Valid Sentences:
- The sky is blue.

Invalid Sentences:
- ChatGPT is a popular programming language.
- Python provides intelligent responses.

Similarly, when verifying research findings, validation ensures that the conclusions drawn from the research align with the collected data; the process often mirrors the research methods employed earlier.

Metrics reporting

The "Grounding Defect Rate" is a crucial metric that measures the proportion of ungrounded responses to the total number of generated outputs. In a translation setting, an "Error Rate" plays a similar role, indicating the percentage of mistranslated phrases in the translated text. Additional metrics can be layered on for a more comprehensive evaluation.
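As a minimal sketch of how that first metric could be computed, the snippet below reuses the counts from the toy validation example above; the sentences and resulting rate are illustrative only.

# Counts taken from the validation example above
valid_sentences = ["The sky is blue."]
invalid_sentences = [
    "ChatGPT is a popular programming language.",
    "Python provides intelligent responses.",
]

# Every generated sentence that failed grounding validation is a grounding defect
total_outputs = len(valid_sentences) + len(invalid_sentences)
grounding_defect_rate = len(invalid_sentences) / total_outputs
print(f"Grounding defect rate: {grounding_defect_rate:.0%}")  # 67% for this toy data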
A Multifaceted Approach to Mitigating Hallucination in Large Language Models

Leveraging product design

Developers need to employ large language models in ways that avoid material issues even when the model hallucinates. For example, you would not design an application that writes your annual report or news articles. Opinion pieces or content summarization within a prompt, by contrast, immediately lower the risk of problematic hallucination. If an app allows AI-generated outputs to be distributed, end users should be able to review and revise the content. This adds a protective layer of scrutiny and puts responsibility into the hands of the user.

Continuous improvement and logging

Persisting prompts and LLM outputs is essential for auditing purposes. As models evolve, you cannot count on prompting an LLM and getting the same result, so regression testing and reviewing user input are critical, as long as doing so adheres to data, security, and privacy practices.

Prompt engineering

To get the best possible output, it is essential to use the concept of meta prompts effectively. A meta prompt is a high-level instruction given to a language model to guide its output in a specific direction. Rather than asking a direct question, provide context, structure, and guidance to refine the output. For example, instead of asking, "What is photosynthesis?", you can ask, "Explain photosynthesis in simple terms suitable for a 5th-grade student." This adjusts the complexity and style of the answer you get.

Multi-Shot Prompts

Multi-shot prompts refer to a series of prompts given to a language model, often in succession. The goal is to guide the model step by step toward a desired output instead of asking for a large chunk of information in a single prompt. This approach is extremely useful when the required information is complex or extensive. Typically, these prompts are best delivered as a chat user experience, allowing the user and model to break the request down into multiple, manageable parts.

Conclusion

The issue of hallucination in Large Language Models (LLMs) presents a significant hurdle for consumers, users, and developers. While overhauling the foundational architecture of these models isn't a feasible solution for most, the good news is that there are strategies to navigate these challenges. But beyond these technical solutions, there's an ethical dimension to consider. As developers and innovators harness the power of LLMs, it's imperative to prioritize disclosure and transparency. Only through openness can we ensure that LLMs integrate seamlessly into our daily lives and gain the trust and acceptance they need to truly revolutionize our digital interactions.

Author Bio

Ryan Goodman has dedicated 20 years to the business of data and analytics, working as a practitioner, executive, and entrepreneur. He recently founded DataTools Pro after 4 years at Reliant Funding, where he served as the VP of Analytics and BI. There, he implemented a modern data stack, utilized data sciences, integrated cloud analytics, and established a governance structure. Drawing from his experiences as a customer, Ryan is now collaborating with his team to develop rapid deployment industry solutions. These solutions utilize machine learning, LLMs, and modern data platforms to significantly reduce the time to value for data and analytics teams.
Getting Started with AI Builder

Adeel Khan
23 Oct 2023
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

AI is transforming the way businesses operate, enabling them to improve efficiency, reduce costs, and enhance customer satisfaction. However, building and deploying AI solutions can be challenging, at times even for pro developers, due to the inherent complexity of traditional tools. That's where Microsoft AI Builder comes in. AI Builder is a low-code AI platform that empowers users to infuse AI into business workflows without writing a single line of code. AI Builder is integrated with Microsoft Power Platform, a suite of tools that allows users to build apps, automate processes, and analyze data. With AI Builder, users can leverage pre-built or custom AI models to enhance their Power Apps and Power Automate solutions.

One of the most powerful features of AI Builder is the prediction model, which allows users to create AI models that predict outcomes based on historical data. The prediction model can be used to predict the following kinds of outcomes:

Binary outcomes: a choice between two values. An example would be booking status, canceled/redeemed.
Multiple outcomes: a choice between several fixed outcomes. An example would be stage of delivery, early/on-time/delayed/escalated.
Numeric outcomes: a number value. An example would be revenue per customer.

In this blog post, we will show you how to create and use a prediction model with AI Builder using our business data. We will focus on numeric outcomes and use the example mentioned above: we will attempt to predict the lifetime revenue we can generate from customers. Let's get started!

Getting Data Ready

The process of building a model begins with data. We will not cover the AI Builder prerequisites here, but you can easily find them on Microsoft Learn. The data in focus is sample data of customer profiles from a retailer system. The data includes basic profile details (education, marital status, customer since, kids at home, teens at home), interaction data (participation in campaigns), and transaction summaries (purchases both online and offline, product categories).

The data needs either to be imported into Dataverse or to already exist there. In this case, we will import the file "Customer_profile_sample.xls". To import the data, perform the following actions:

1. Open http://make.powerapps.com and log in to your Power Platform environment.
2. Select the right environment; we recommend performing these actions in a development environment.
3. From the left menu pane, select Tables.
4. Now select the option to upload from Excel. This will start a data import process.

Figure 1: Upload data into Dataverse from an Excel file

5. Upload the Excel file mentioned above, "Customer_profile_sample.xls". The system will read the file content and give a summary of the data in the file. Note that if your environment has the Copilot feature on, you will see GPT in action: it will not only read the details of the file but also choose the table name and add descriptions to the columns.

Figure 2: Copilot in action with file summary

6. Verify the details; make sure the table is named "Customer Profile" and the primary column is "ID". Once verified, click Create and let the system upload the data into this new table. The system will move you to the table view screen.

Figure 3: Table View Screen
7. In this screen, let's click on Columns under the Schema section. This takes us to the column list. Here we need to scroll down and find a column called "Revenue". Right-click the column and select Edit.

Figure 4: Updating column information

8. Let's enable the Searchable feature and save the changes.

9. We will move all the way back to the table list by clicking on Tables in the left navigation. Here we will select our "Customer Profile" table and choose Publish from the top menu. This will apply the change made in step 8. We will wait until we see a green bar with the message "Publish completed."

This concludes the first part, getting the sample data imported.

Creating a Model

Now that we have our data ready and available in Dataverse, let's start building our model. We will follow the next set of actions to deliver the model with this low-code/no-code tool.

1. The first step is to open AI Builder. To open AI Builder Studio, go to http://make.powerapps.com.
2. From the left navigation, click on AI Models. This will open the AI model studio.
3. From the top navigation bar, choose a model type. There are many out-of-the-box models for various business use cases that developers can choose, but this time we will select a prediction model from the options.

Figure 5: Prediction Model Icon

4. The next pop-up screen provides details about the prediction model feature and how it can be used. Select the option to begin the model creation process. Model creation is a step-by-step journey that we will explain one step at a time.
5. The first action is to select the historical outcome. Here we need to select the table we created in the section above, "Customer Profile", and the column (label) we want the model to predict, in this case "Revenue".

Figure 6: Step one - Historical Outcome Selection

6. The next step is the critical step in any prediction model: feature selection. In this step, we select the columns that give our AI model enough information to assess the impact and influence of these features and train itself. The table now has 33 columns (those imported from the sample file plus those added as part of the Dataverse process). We will select 27 columns as the most important features for this model. The ones we will not select are:

Created On: a date column created by Dataverse to track the record creation date; not relevant to predicting revenue.
ID: a sequential number; again, we can say with confidence that it is not going to be relevant in predicting our label, "Revenue".
Record Created On: a Dataverse-added column.
Revenue (base): a base currency value.
UTC Conversion Time zone: a Dataverse-added column.

Before moving to the next step, make sure that you can see 27 columns selected.

Figure 7: Selecting Features / Columns

7. The next step is to choose the training data with business logic. As you may have noticed, our original imported data contains some rows where the revenue field is empty. Such data would not help train the model, so we would like the model to train on rows that have revenue information available. We can do so by selecting "Filter the Data" and then adding the condition row as shown in the figure below.

Figure 8: Selecting the right dataset

8. Finally, we are at the last step, verification. Here we perform one last action before training the model: giving this model a proper name. Let's click the edit icon to change the name of the model.
We shall name the model "Prediction – Revenue."

Figure 9: Renaming the Model

9. Let's confirm and begin model training.

Evaluation of the model

The final step of any model creation is assessment of the model. Once our model is trained and ready, the system generates model performance details. These details can be accessed by clicking on the model in AI Studio. Let's evaluate and read into our model.

Figure 10: Model Performance Summary

Performance

AI Builder grades models based on R-squared (goodness of fit). An R-squared value of 88% means that 88% of the variation in revenue can be explained by the model's inputs. The remaining 12% could be due to other factors not included in the model. For the set of information provided, it is a good start and, in some cases, an acceptable outcome as well.

Most influential data

The model also explains the features most influential on our outcome, "Revenue". In this case, monthly wine purchases (MntWines) carry the highest weight, suggesting the strongest association with the revenue an organization can make from a customer. These weights can trigger a lot of business ideation and further improve business KPIs.

Warnings

In the details section, you can also view the warnings the system has generated. In this case, it has identified a few columns, which we intentionally selected in our earlier steps, as having no association with revenue. This information can be used to fine-tune the model further and remove unnecessary features from the training and feature selection explained earlier.

Figure 11: Warning Tab in Details

Conclusion

This marks the completion of our model preparation. Once we are satisfied with the model's performance, we can choose to publish it. The model can then be used through either Power Apps or Power Automate to predict revenue and write the result back to Dataverse. This feature of AI Builder opens the door to many possibilities, and the ability to deliver it in a short time makes it extremely useful. Keep experimenting and keep learning.

Author Bio

Mohammad Adeel Khan is a Senior Technical Specialist at Microsoft, a seasoned professional with over 19 years of experience with various technologies and digital transformation projects. At work, he engages with enterprise customers across geographies and helps them accelerate digital transformation using Microsoft Business Applications, Data, and AI solutions. In his spare time, he collaborates with like-minded professionals and helps solve business problems for nonprofit organizations using technology.

Adeel is also known for his unique approach to learning and development. During the COVID-19 lockdown, he introduced his 10-year-old twins to Microsoft Learn. The twins not only developed their first Microsoft Power Platform app, an expense tracker, but also became one of the youngest sets of twins to earn the Microsoft Power Platform certification.
ChatGPT Prompts for Project Managers

Anshul Saxena
23 Oct 2023
10 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

Starting a project requires good tools to keep things running smoothly. That's where our guide, combining ChatGPT with PMBOK, comes in handy. We'll walk you through each step, from beginning your project to making detailed plans. With easy-to-use templates and clear examples, we aim to make things simpler for you. In short, our guide brings together the best of ChatGPT and PMBOK to help you manage projects better. Let's get started!

First, let's have a look at the steps defined under PMBOK for project management planning.

Step 1: Initiating the Project

1. Objective: Set the foundation for your project.
2. Actions:
- 1.1 Identify the need or problem the project aims to address.
- 1.2 Develop the Project Charter:
- Define the project's purpose, objectives, and scope.
- Identify primary stakeholders.
- Outline initial budget estimates.
- 1.3 Identify all stakeholders, including those who can influence or are impacted by the project.
3. Outcome: A Project Charter that provides a high-level overview, with project stakeholders identified.

Step 2: Planning the Project

1. Objective: Develop a comprehensive roadmap for your project.
2. Actions:
- 2.1 Define success criteria.
- 2.2 Detail the project's scope and boundaries.
- 2.3 List out deliverables.
- 2.4 Break down the project into tasks and set timelines.
- 2.5 Create a budget, detailing estimated costs for tasks.
- 2.6 Develop sub-plans such as:
- Human Resource Plan
- Quality Management Plan
- Risk Management Plan
- Procurement Management Plan
- 2.7 Document change management procedures.

Now let's have a look at a generic template and an example for each step defined above.

Step 1.1: Initiating the Project

Generic Prompt: "As a project manager, I'm looking to address an underlying need or problem within [specific domain/area, e.g., 'our software development lifecycle']. Based on recent data, stakeholder feedback, market trends, and any other relevant information available in this domain, can you help identify the primary challenges or gaps that our project should target? The ultimate goal is to [desired outcome, e.g., 'improve efficiency and reduce bug counts']. Please provide a comprehensive analysis of potential problems and their implications."

Prompt Example: "In our organization, managing vendors has become an increasingly complex task, with multiple touchpoints and communication channels. Given the crucial role vendors play in our supply chain and service delivery, there's an urgent need to streamline our vendor management processes. As a software solution is desired, can you help identify the primary requirements, challenges, and functionalities that our vendor management software should address? The primary objective is to enhance vendor communication, monitor performance metrics, ensure contract compliance, and facilitate swift issue resolution. Please provide a detailed analysis that can serve as a starting point for our software development."

Response:

Step 1.2: Develop the Project Charter

Generic Prompt: "For our objective of [specific domain or objective, e.g., 'customer relationship management'], draft a concise project charter. Address the phases of [list main stages/phases, e.g., 'identifying customer needs and feedback collection'], aiming to [primary goal, e.g., 'enhance customer satisfaction'].
Given the importance of [contextual emphasis, e.g., 'customer relationships'], and involving stakeholders like [stakeholders involved, e.g., 'sales teams and customer support'], define a methodology that captures the essence of our goal."

Prompt Example: "For our vendor management objective, draft a succinct project charter for a System Development Life Cycle (SDLC). The SDLC should cover phases from identifying vendor needs to termination or renewal processes, with an aim to enhance cost-efficiency and service reliability. Given our organization's growing dependency on vendors and the involvement of stakeholders like procurement and legal teams, outline a process that ensures optimal vendor relationship management."

Response:

2.1 Define success criteria

Generic Prompt: "In light of the complexities in project management, having lucid success criteria is paramount. Can you delineate general success criteria pivotal for any project management initiative? These criteria will gauge the project's success throughout its lifecycle, aligning with stakeholder aspirations and company objectives."

Prompt Example: "Considering the intricacies of crafting vendor management software, establishing precise success criteria is crucial. To align the software with our goals and stakeholder demands, can you list and elaborate on success criteria tailored for this task? These standards will evaluate the software's efficiency throughout its phases, from design to updates. Supply a list specific to vendor management software, adaptable for future refinements."

Output:

2.2 Detail the project's scope and boundaries

Generic Prompt: "Given the intricacies of today's projects, clear scope and boundaries are vital. Can you elucidate our project's scope, pinpointing its main objectives, deliverables, and focal areas? Additionally, specify what it won't encompass to avoid scope creep. Offer a clear outline demarcating the project's inclusions and exclusions, ensuring stakeholder clarity on its scope and constraints."

Prompt Example: "In light of the complexities in vendor management software development, clear scope and boundaries are essential. Can you describe the scope of our software project, highlighting its main objectives, deliverables, and key features? Also, specify any functionalities it won't include to avert scope creep. Furnish a list that distinctly differentiates the software's capabilities from its exclusions, granting stakeholders a clear perspective."

Output:

2.3 & 2.4: List out deliverables, break the project into tasks, and set timelines

Generic Prompt: "For our upcoming project, draft a clear roadmap. List the key deliverables encompassing objectives, functionalities, and related documentation. Then, dissect each deliverable into specific tasks and suggest timelines for each. Based on this, provide a structured breakdown suitable for a Gantt chart representation."

Prompt Example: "For our vendor management software project, provide a succinct roadmap. Enumerate the key deliverables, encompassing software functionalities and associated documentation. Subsequently, dissect these deliverables into specific tasks, suggesting potential timelines. This breakdown should be structured to facilitate the creation of a Gantt chart for visual timeline representation."

Output:

2.5 Create a budget, detailing estimated costs for tasks

Generic Prompt: "Can you draft a budgetary outline detailing the estimated costs associated with each major task and deliverable identified?
This should consider potential costs for [list some generic cost categories, e.g., personnel, equipment, licenses, operational costs] and any other relevant expenditures. A clear financial breakdown will aid in the effective management of funds and ensure the project remains within its financial boundaries. Please provide a comprehensive budget plan suitable for [intended audience, e.g., stakeholders, team members, upper management]."

Prompt Example: "Can you draft a budgetary outline detailing the estimated costs associated with each major task and deliverable identified in the project? This should include anticipated costs for personnel, software and hardware resources, licenses, testing, and any other potential expenditures. Remember, a clear financial breakdown will help in managing funds and ensuring the project remains within the set financial parameters. Please provide a comprehensive budget plan that can be presented to stakeholders for approval."

Output:

2.6 Develop sub-plans such as:

Human Resource Plan
Quality Management Plan
Risk Management Plan
Procurement Management Plan

Generic Prompt: "In light of the requirements for comprehensive project management, it's crucial to have detailed sub-plans addressing specific areas. Could you assist in formulating a [specific sub-plan, e.g., 'Human Resource'] plan? This plan should outline the primary objectives, strategies, and actionable steps relevant to [specific domain, e.g., 'staffing and team development']. Additionally, consider potential challenges and mitigation strategies within this domain. Please provide a structured outline that can be adapted and refined based on the unique nuances of our project and stakeholder expectations."

By replacing the placeholders (e.g., [specific sub-plan]) with the desired domain (Human Resource, Quality Management, etc.), this prompt can be tailored for various sub-plans. Likewise, by filling in the [specific project or objective] placeholder with details pertaining to your specific project, the prompt layout can be tailored to various projects or initiatives.

Here is a glimpse of the output generated for the various sub-plans in the context of the vendor management software project:

Human Resource Plan
Quality Management Plan
Risk Management Plan
Procurement Management Plan

2.7 Document change management procedures

Generic Prompt: "As a project manager, outline a Document Change Management procedure for a project. Ensure you cover change initiation, review, approval, implementation, communication, version control, auditing, and feedback."

Prompt Example: "As the project manager of a Vendor Management Software deployment, design a Document Change Management procedure. Keeping in mind the dynamic nature of vendor integrations and software updates, outline the process for initiating, reviewing, approving, and implementing changes in documentation. Also, address communication with stakeholders, version control mechanisms, auditing frequency, and feedback integration from both team members and vendors. Aim for consistency and adaptability in your procedure."

Output:

Conclusion

Wrapping things up, effective project planning is foundational for success. Our guide has combined the best of ChatGPT and PMBOK to simplify this process for you. We've delved into creating a clear project roadmap, from setting success markers to managing changes effectively. By detailing scope, listing deliverables, breaking tasks down, budgeting, and designing crucial sub-plans, we've covered the essentials of project planning.
Using our straightforward templates and examples, you're equipped to navigate project management with clarity and confidence. As we conclude, remember: proper planning today paves the way for smoother project execution tomorrow. Let's put these tools to work and achieve those project goals!

Author Bio

Dr. Anshul Saxena is an author, corporate consultant, inventor, and educator who assists clients in finding financial solutions using quantum computing and generative AI. He has filed over three Indian patents and has been granted an Australian Innovation Patent. Anshul is the author of two best-selling books in the realm of HR analytics and quantum computing (Packt Publications). He has been instrumental in setting up new-age specializations like decision sciences and business analytics in multiple business schools across India. Currently, he is working as Assistant Professor and Coordinator – Center for Emerging Business Technologies at CHRIST (Deemed to be University), Pune Lavasa Campus. Dr. Anshul has also worked with reputed companies like IBM as a curriculum designer and trainer and has been instrumental in training 1000+ academicians and working professionals from universities and corporate houses like UPES, CRMIT, NITTE Mangalore, Vishwakarma University, Pune, and Kaziranga University, and from KPMG, IBM, Altran, TCS, Metro Cash & Carry, HPCL, and IOC. With work experience of 5 years in the domain of financial risk analytics at TCS and Northern Trust, Dr. Anshul has guided master's students in creating projects on emerging business technologies, which have resulted in 8+ Scopus-indexed papers. Dr. Anshul holds a PhD in Applied AI (Management), an MBA in Finance, and a BSc in Chemistry. He possesses multiple certificates in the field of generative AI and quantum computing from organizations like SAS, IBM, IISc, Harvard, and BIMTECH.

Author of the book: Financial Modeling Using Quantum Computing
Large Language Models (LLMs) in Education

Chaitanya Yadav
23 Oct 2023
8 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

Large language models are a type of AI that can create and understand human language. This article deals with the potential of large language models in education and how they can transform it. Powered by artificial intelligence and drawing on vast databases of textual data, LLMs possess the ability to create and understand human language. Through practical examples, it shows how LLMs could put in place individual learning pathways, provide advanced learning analytics, and develop participatory simulations, leading to more effective educational strategies.

Benefits of LLMs in Education

Personalized learning

The capacity of LLMs in education to customize learning experiences for each student is one of their greatest advantages. Lesson-plan customization, individualized feedback, and real-time monitoring of student progress are all possible with LLMs.

Automated tasks

Additionally, LLMs can be utilized to automate processes like grading and lesson planning. By doing this, instructors may have more time to give to other important responsibilities like teaching and connecting with students.

New and innovative educational tools and resources

LLMs can be applied to the development of innovative and cutting-edge learning resources and technology. For instance, LLMs can be used to create interactive simulations, games, and other educational activities.

Real-time feedback and support

LLMs can also be utilized to provide quick help and feedback to students. For example, LLMs can be used to create chatbots that assist students with their academic work and respond to their queries.

Potential Challenges of LLMs in Education

Incorrect or misleading information

The fact that LLMs might provide inaccurate or misleading information is one of the main problems with their use in education. This is because LLMs are trained on vast volumes of data, some of which may be outdated or erroneous.

Lack of understanding

Another issue with utilizing LLMs in teaching is that they might not fully understand the material they produce. This is because LLMs learn statistical patterns in language rather than the full complexity of human communication.

Ethical concerns

There are also some ethical concerns associated with the use of LLMs in education. LLMs should be used carefully, and the ethical consequences of their usage should be considered.

How LLMs Can Be Used to Transform Education with Advanced Learning Strategies

Let's look at a few examples that show the possibilities of Large Language Models (LLMs) in education.

1. Advanced Personalized Learning Pathway

In this example, we are going to form a detailed personalized education path that reflects a student's individual objectives, learning style, and progress.
Follow the steps given in the input code to create a personalized learning pathway.

Input Code:

# Step 1: First we will define the generate_learning_pathway function
def generate_learning_pathway(prompt, user_profile):
    # Step 2: Once the function is defined we will create a template for the learning pathway
    learning_pathway_template = f"Dear {user_profile['student_name']},\n\nI'm excited to help you create a personalized learning pathway to achieve your goal of {user_profile['goals']}. As a {user_profile['learning_style']} learner with {user_profile['current_progress']}, here's your pathway:\n\n"

    # Step 3: Now let's define the specific steps in the learning pathway
    steps = [
        "Step 1: Introduction to Data Science",
        "Step 2: Data Visualization Techniques for Visual Learners",
        "Step 3: Intermediate Statistics for Data Analysis",
        "Step 4: Machine Learning Fundamentals",
        "Step 5: Real-world Data Science Projects",
    ]

    # Step 4: Combine the template and the specific steps
    learning_pathway = learning_pathway_template + "\n".join(steps)
    return learning_pathway

# Step 5: Define a main function to test the code
def main():
    user_profile = {
        "student_name": "Alice",
        "goals": "Become a data scientist",
        "learning_style": "Visual learner",
        "current_progress": "Completed basic statistics"
    }
    prompt = "Create a personalized learning pathway."

    # Step 6: Generate the learning pathway
    learning_pathway = generate_learning_pathway(prompt, user_profile)

    # Step 7: Print the learning pathway
    print(learning_pathway)

if __name__ == "__main__":
    main()

Output:

This example gives the LLM a highly customized approach to teaching, taking into account the student's name, objectives, learning style, and progress.

2. AI-Enhanced Learning Analytics

Using LLMs in learning analytics can provide teachers with more detailed information on students' performance and help them make appropriate recommendations.

Input code:

# Define the generate_learning_analytics function
def generate_learning_analytics(prompt, student_data):
    # Analyze the performance based on quiz scores
    average_quiz_score = sum(student_data["quiz_scores"]) / len(student_data["quiz_scores"])

    # Calculate homework completion rate
    total_homeworks = len(student_data["homework_completion"])
    completed_homeworks = sum(student_data["homework_completion"])
    homework_completion_rate = (completed_homeworks / total_homeworks) * 100

    # Generate the learning analytics report
    analytics_report = f"Learning Analytics Report for Student {student_data['student_id']}:\n"
    analytics_report += f"- Average Quiz Score: {average_quiz_score:.2f}\n"
    analytics_report += f"- Homework Completion Rate: {homework_completion_rate:.2f}%\n"
    if homework_completion_rate < 70:
        analytics_report += "Based on their performance, it's recommended to provide additional support for completing homework assignments."
    return analytics_report

This code defines a Python function, generate_learning_analytics, which takes a prompt and student data as input, calculates the average quiz score and homework completion rate, and generates a report that includes these metrics, together with a possible recommendation for additional support based on homework performance.
Now let's provide student performance data.

Input code:

student_data = {
    "student_id": "99678",
    "quiz_scores": [89, 92, 78, 95, 89],
    "homework_completion": [True, True, False, True, True]
}

prompt = f"Analyze the performance of student {student_data['student_id']} based on their recent quiz scores and homework completion."
analytics_report = generate_learning_analytics(prompt, student_data)
print(analytics_report)

Output:

The report is generated from the student's quiz scores and the homework completion data included in the student_data dictionary.

3. Advanced Interactive Simulations for Learning

The potential for LLMs to provide engaging learning resources can be demonstrated by creating a comprehensive computerized training simulation for complicated topics, such as physics.

Input code:

# Define the generate_advanced_simulation function
def generate_advanced_simulation(prompt):
    # Create the interactive simulation
    interactive_simulation = f"Interactive {prompt} Simulation"

    # Provide a link to the interactive simulation (replace with an actual link)
    interactive_simulation_link = "https://your-interactive-simulation-link.com"
    return interactive_simulation, interactive_simulation_link

# Define a main function to test the code
def main():
    topic = "Quantum Mechanics"
    prompt = f"Develop an interactive simulation for teaching {topic} to advanced high school students."

    # Generate the interactive simulation
    interactive_simulation, interactive_simulation_link = generate_advanced_simulation(prompt)

    # Print the interactive simulation and link
    print(f"Explore the {topic} interactive simulation: {interactive_simulation_link}")

if __name__ == "__main__":
    main()

Output:

In this example, the LLM is asked to create an advanced interactive simulation for a complex topic like quantum physics, which will make learning more interesting and visual. Also, make sure to replace the placeholder with your own link to the interactive simulation.

These advanced examples demonstrate the adaptability of LLMs in creating highly customized learning pathways, advanced learning analytics reports, and sophisticated interactive simulations for in-depth educational experiences.

Conclusion

In conclusion, by providing advanced learning strategies and tools, large language models represent tremendous potential for revolutionizing education. These models provide a range of benefits, including personalized learning experiences, timely feedback and support, automated tasks, and the development of useful tools for innovation in education.

The article considered the practical use of LLMs in education, including developing sophisticated personalized learning paths that take into account students' specific educational objectives and how they learn. Moreover, by giving details of a student's performance and recommendations for improvement, LLMs can improve learning analytics. In addition, the development of real-time simulations on complicated topics demonstrated how LLMs can enhance learning by enabling interactivity and engagement.

The future of education appears promising given LLMs' ability to offer a more diverse, creative learning environment with limitless opportunities for learners around the world.

Author Bio

Chaitanya Yadav is a data analyst, machine learning, and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others to learn and grow.
He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x multicloud certified.

In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer. He is the co-founder of a tech community called "CS Infostics," which is dedicated to sharing opportunities to learn and grow in the field of IT.
Testing Large Language Models (LLMs)

20 Oct 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Machine learning has become ubiquitous, with models powering everything from search engines and recommendation systems to chatbots and autonomous vehicles. As these models grow more complex, testing them thoroughly is crucial to ensure they behave as expected. This is especially true for large language models like GPT-4 that generate human-like text and engage in natural conversations. In this article, we will explore strategies for testing machine learning models, with a focus on evaluating the performance of LLMs.

Introduction

Machine learning models are notoriously challenging to test due to their black-box nature. Unlike traditional code, we cannot simply verify the logic line-by-line. ML models learn from data and make probabilistic predictions, so their decision-making process is opaque. While testing methods like unit testing and integration testing are common for traditional software, they do not directly apply to ML models. We need more specialized techniques to validate model performance and uncover unexpected or undesirable behavior.

Testing is particularly crucial for large language models. Since LLMs can generate free-form text, it's hard to anticipate their exact responses. Flaws in the training data or model architecture can lead to hallucinations, biases, and errors that only surface during real-world usage. Rigorous testing provides confidence that the model works as intended.

In this article, we will cover testing strategies to evaluate LLMs. The key techniques we will explore are:

Similarity testing
Column coverage testing
Exact match testing
Visual output testing
LLM-based evaluation

By combining these methods, we can thoroughly test LLMs along multiple dimensions and ensure they provide coherent, accurate, and appropriate responses.

Testing Text Output with Similarity Search

A common output from LLMs is text. This could be anything from chatbot responses to summaries generated from documents. A robust way to test the quality of text output is similarity testing. The idea is simple: we define an expected response and compare the model's actual response to determine how similar they are. The higher the similarity score, the better.

Let's walk through an example using our favorite LLM. Suppose we give it the prompt:

Prompt: What is the capital of Italy?

The expected response would be:

Expected: The capital of Italy is Rome.

Now we can pass this prompt to the LLM and get the actual response:

prompt = "What is the capital of Italy?"
actual = llm.ask(prompt)

Let's say actual contains:

Actual: Rome is the capital of Italy.

While the wording is different, the meaning is the same. To quantify this similarity, we can use semantic search libraries like SentenceTransformers. It represents sentences as numeric vectors and computes similarity using cosine distance.

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity  # import needed for the call below

model = SentenceTransformer('all-MiniLM-L6-v2')
expected_embedding = model.encode(expected)
actual_embedding = model.encode(actual)
similarity = cosine_similarity([expected_embedding], [actual_embedding])[0][0]

This yields a similarity score of 0.85, indicating the responses are highly similar in meaning. We can establish a threshold for the minimum acceptable similarity, like 0.8. Responses below this threshold fail the test.
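As a rough sketch of what this looks like in a test harness, the loop below runs the same check over a small suite of prompt-response pairs with a 0.8 pass threshold. The llm.ask client is the same placeholder used above, and the test cases are illustrative.

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')
THRESHOLD = 0.8  # minimum acceptable semantic similarity

test_cases = [
    ("What is the capital of Italy?", "The capital of Italy is Rome."),
    ("Who wrote Hamlet?", "Hamlet was written by William Shakespeare."),
]

for prompt, expected in test_cases:
    actual = llm.ask(prompt)  # placeholder LLM client, as above
    score = cosine_similarity(
        [model.encode(expected)], [model.encode(actual)]
    )[0][0]
    status = "PASS" if score >= THRESHOLD else "FAIL"
    print(f"{status} ({score:.2f}): {prompt}")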
Running similarity testing like this over many prompt-response pairs lets us holistically assess the textual coherence of an LLM.

Testing Tabular Outputs with Column Coverage

In addition to text, LLMs can output tables or data frames. For testing these, we need different techniques that account for structure. A good validation is column coverage: checking what percentage of columns in the expected output are present in the actual output.

Consider the LLM answering questions about movies:

Prompt: What are the top 3 highest grossing movies of all time?

Expected:

Movie | Worldwide Gross | Release Year
Avatar | $2,789,679,794 | 2009
Titanic | $2,187,463,944 | 1997
Star Wars Ep. VII | $2,068,223,624 | 2015

Now we can test the LLM's actual output:

prompt = "What are the top 3 highest grossing movies of all time?"
actual = llm.ask(prompt)

Actual:

Movie | Global Revenue | Year
Avatar | $2.789 billion | 2009
Titanic | $2.187 billion | 1997
Star Wars: The Force Awakens | $2.068 billion | 2015

Here, actual contains the same 3 columns as expected - Movie, Gross, Release Year. Even though the headers and cell values differ slightly, we can pair the columns (for example, with cosine similarity over the header names) and reach 100% column coverage. We can formalize this in code:

expected_cols = set(expected.columns)
actual_cols = set(actual.columns)
column_coverage = len(expected_cols & actual_cols) / len(expected_cols)
# column_coverage = 1.0

For tables with many columns, we may only need, say, 90% coverage to pass the test. This validation ensures the critical output columns are present while allowing variability in column names or ancillary data.

Exact Match for Numeric Outputs

When LLMs output a single number or statistic, we can use simple exact match testing. Consider this prompt:

Prompt: What was Apple's total revenue in 2021?

Expected: $365.82 billion

We get the LLM's response:

prompt = "What was Apple's total revenue in 2021?"
actual = llm.ask(prompt)

Actual: $365.82 billion

In this case, we expect an exact string match:

is_match = (actual == expected)
# is_match = True

For numerical outputs, precision is important. Exact match testing provides a straightforward way to validate this.

Screenshot Testing for Visual Outputs

Building PandasAI, we sometimes need to test generated charts. Testing these outputs requires verifying that the visualized data is correct. One method is screenshot testing: comparing screenshots of the expected and actual visuals. For example:

Prompt: Generate a bar chart comparing the revenue of FAANG companies.

Expected: [Expected_Chart.png]
Actual: [Actual_Chart.png]

We can then test if the images match:

from PIL import Image, ImageChops

expected_img = Image.open("./Expected_Chart.png")
actual_img = Image.open("./Actual_Chart.png")
diff = ImageChops.difference(expected_img, actual_img)
is_match = diff.getbbox() is None  # is_match = True if the images match

For more robust validation, we could use computer vision techniques like template matching to identify and compare key elements: axes, bars, labels, etc. Screenshot testing provides quick validation of visual output without needing to interpret the raw chart data.

LLM-Based Evaluation

An intriguing idea for testing LLMs is to use another LLM. The concept is to pass the expected and actual outputs to a separate "evaluator" LLM and ask if they match. For example:

Expected: Rome is the capital of Italy.
Actual: The capital of Italy is Rome.

We can feed this to the evaluator model:

Prompt: Do these two sentences convey the same information? Answer YES or NO.
Sentence 1: Rome is the capital of Italy.
Sentence 2: The capital of Italy is Rome.

Evaluator: YES

The evaluator LLM acts like a semantic similarity scorer. This takes advantage of the natural language capabilities of LLMs.
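A minimal sketch of wiring this up, again using the placeholder llm.ask client from the earlier snippets; the prompt template and the YES/NO parsing convention are illustrative.

def llm_judge(expected: str, actual: str) -> bool:
    # Ask a separate "evaluator" LLM whether the two outputs agree
    prompt = (
        "Do these two sentences convey the same information? Answer YES or NO.\n"
        f"Sentence 1: {expected}\n"
        f"Sentence 2: {actual}"
    )
    verdict = llm.ask(prompt)  # placeholder evaluator client, as above
    return verdict.strip().upper().startswith("YES")

print(llm_judge("Rome is the capital of Italy.",
                "The capital of Italy is Rome."))  # expected: True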
The downside is that this evaluates one black-box model using another black-box model. Errors or biases in the evaluator could lead to incorrect assessments, so LLM-based evaluation should complement other testing approaches, not act as the sole method.

Conclusion

Testing machine learning models thoroughly is critical as they grow more ubiquitous and impactful. Large language models pose unique testing challenges due to their free-form textual outputs. Using a combination of similarity testing, column coverage validation, exact match, visual output screening, and even LLM-based evaluation, we can rigorously assess LLMs along multiple dimensions. A comprehensive test suite combining these techniques will catch more flaws than any single method alone. This builds essential confidence that LLMs behave as expected in the real world. Testing takes time but prevents much larger problems down the road. The strategies covered in this article will add rigor to the development and deployment of LLMs, helping ensure these powerful models benefit humanity as intended.

Author Bio

Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces, contributing his technical skills to various startups across Europe over the past decade. Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas, enabling an intuitive conversational interface for exploring data through natural language queries. By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.
ChatGPT for SQL Queries

Chaitanya Yadav
20 Oct 2023
10 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Introduction

ChatGPT is a capable language model that can be used for a range of tasks, including the creation of SQL queries. In this article, you will learn how to use ChatGPT effectively to craft and optimize SQL queries and get exactly the results you want.

It is necessary to have some SQL knowledge before you can use ChatGPT to create SQL queries. SQL is the language used to communicate with databases; it is designed for creating, reading, updating, and deleting data in databases. SQL is the most specialized language in this domain and one of the main components of many existing applications, because it deals with structured data that can be retrieved from tables.

There are a number of different SQL queries, but some of the more common ones include the following:

SELECT: selects data from a database.
INSERT: inserts new data into a database.
UPDATE: updates existing data in a database.
DELETE: deletes data from a database.

Using ChatGPT to write SQL queries

Once you have a basic understanding of SQL, you can start using ChatGPT to write SQL queries. To do this, you provide ChatGPT with a description of the query that you want to write, and ChatGPT generates the SQL code for you. For example, you could give ChatGPT the request below to write an SQL query that selects all of the customers in your database:

Select all of the customers in my database

Following that, ChatGPT will provide the SQL code shown below:

SELECT * FROM customers;

This query selects every column from the customers table. ChatGPT can also be used to create more complex SQL statements.

How to Use ChatGPT to Describe Your Intentions

Now let's look at some examples where we ask ChatGPT to generate SQL code from our queries.

For example, we'll ask ChatGPT to set up a sample restaurant database with two tables.

ChatGPT prompt:

Create a sample database with two tables: GuestInfo and OrderRecords. The GuestInfo table should have the following columns: guest_id, first_name, last_name, email_address, and contact_number. The OrderRecords table should have the following columns: order_id, guest_id, product_id, quantity_ordered, and order_date.

ChatGPT SQL Query Output:

In this example, we asked ChatGPT to create a database and two tables, and it generated a SQL query in response. The generated SQL code was then executed in SQL Server Management Studio, where, as we can see, the code we got from ChatGPT ran successfully.

How ChatGPT Can Be Used for Optimizing, Crafting, and Debugging Your Queries

SQL is an efficient tool for manipulating and interrogating data in databases. However, it can be difficult to write efficient SQL queries, particularly for very complex datasets. ChatGPT is a robust model that can help you with many tasks, including optimizing SQL queries.

Generating SQL queries

The creation of SQL queries from natural language statements is one of the most common ways that ChatGPT can be used in SQL work.
How ChatGPT Can Be Used for Optimizing, Crafting, and Debugging Your Queries

SQL is an efficient tool for querying and manipulating the data in a database. However, writing efficient SQL queries can be difficult, particularly for complex datasets. ChatGPT is a robust language model that can help with many of these tasks, including query optimization.

Generating SQL queries

Turning natural-language statements into SQL queries is one of the most common ways ChatGPT is used for SQL work. This is helpful for users who don't know SQL, as well as for users who want to create a query for a specific task quickly.

For example, you could ask ChatGPT:

Generate an SQL query to select all customers who have placed an order in the last month.

ChatGPT would then generate the following query:

```sql
SELECT * FROM customers WHERE order_date >= CURRENT_DATE - INTERVAL 1 MONTH;
```

Optimizing existing queries

ChatGPT can also optimize existing SQL queries. Give ChatGPT the query whose performance you want to improve, and it will suggest changes.

For example, you could hand ChatGPT the following query:

```sql
SELECT * FROM products WHERE product_name LIKE '%shirt%';
```

ChatGPT might suggest the following optimizations:

Add an index to the products table on the product_name column.
Use a full-text search index on the product_name column.
Use a more specific filter, such as WHERE product_name = 'shirt', if you know the product name will be an exact match.
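You can see why that last suggestion matters with a quick experiment. The sketch below is a minimal illustration using Python's built-in sqlite3 module (an assumption; any engine with a query-plan command works the same way): a leading-wildcard LIKE cannot use a plain index, while the exact-match version can once the suggested index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT)")

def plan(query):
    # Return SQLite's query-plan description for the given statement.
    rows = cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return [row[-1] for row in rows]

print(plan("SELECT * FROM products WHERE product_name = 'shirt'"))
# e.g. ['SCAN products'] -- no index yet, so every row is examined

# Apply the first suggestion: index the product_name column.
cur.execute("CREATE INDEX idx_products_name ON products (product_name)")

print(plan("SELECT * FROM products WHERE product_name = 'shirt'"))
# e.g. ['SEARCH products USING INDEX idx_products_name (product_name=?)']

print(plan("SELECT * FROM products WHERE product_name LIKE '%shirt%'"))
# e.g. ['SCAN products'] -- a leading wildcard still forces a full scan
conn.close()
```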
Crafting queries

By providing a natural-language interface to SQL, ChatGPT can help with drafting complicated queries. This is useful for users who are not familiar with SQL and need a quick query for a specific task.

For example, say we want a SQL query that finds the customers who placed an order within the last month and spent more than $100 on it. ChatGPT could generate the following:

```sql
SELECT * FROM customers WHERE order_date >= CURRENT_DATE - INTERVAL 1 MONTH AND order_total > 100;
```

This query is relatively simple, but ChatGPT can also craft more complicated ones. For example, to select all customers who placed an order in the last month and purchased a specific product, ChatGPT could generate:

```sql
SELECT * FROM customers WHERE order_date >= CURRENT_DATE - INTERVAL 1 MONTH AND order_items LIKE '%product_name%';
```

ChatGPT can also generate queries that involve more than one table. For example, to select all customers who placed an order in the last month and purchased a specific product from a specific category:

```sql
SELECT customers.*
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id
INNER JOIN order_items ON orders.id = order_items.order_id
INNER JOIN products ON order_items.product_id = products.id
WHERE orders.order_date >= CURRENT_DATE - INTERVAL 1 MONTH
  AND products.product_name = 'product_name'
  AND products.category_id = (SELECT id FROM product_categories WHERE category_name = 'category_name');
```

ChatGPT is capable of assisting with the creation of complex SQL queries. By giving users a natural-language interface to SQL, it helps them write efficient and accurate queries.

Debugging SQL queries

ChatGPT can also be used to debug SQL queries. To get started, give ChatGPT a query that does not return the results you expect, and it will try to figure out why.

For example, you could give ChatGPT the following query:

```sql
SELECT * FROM customers WHERE country = 'United States';
```

Suppose this query returns more results than expected. ChatGPT may suggest that duplicate rows exist in the customers table, or that the country column is not populated consistently for all customers.

How ChatGPT can help diagnose SQL query errors and suggest potential fixes

When you run into errors or unexpected results in your SQL queries, ChatGPT is useful for diagnosing the problem and suggesting possible remedies. The hands-on example below illustrates the process.

Scenario: You are working with the database of an online store. You want to calculate the total revenue for a specific product named "Laptop" from the Products table, but your SQL query returns unexpected results.

Your SQL query:

```sql
SELECT SUM(price) AS total_revenue FROM Products WHERE product_name = 'Laptop';
```

Issue: The query is not providing the expected results, and you are not sure what went wrong.

ChatGPT assistance:

Diagnosing the issue: You can ask ChatGPT something like, "What could be the issue with my SQL query to calculate the total revenue of 'Laptop' from the Products table?"

ChatGPT's response: ChatGPT suggests that the problem may lie in the WHERE clause. Product names may not be unique, and there might be multiple entries called 'Laptop', so it recommends filtering on product_id rather than on the product name. The query could be modified as follows:

```sql
SELECT SUM(price) AS total_revenue FROM Products WHERE product_id = (SELECT product_id FROM Products WHERE product_name = 'Laptop');
```

Explanation and hands-on practice: ChatGPT explains the reasoning behind the adjustment. Running the revised query and confirming that it returns the expected total revenue for the 'Laptop' product resolves the unexpected results.

This hands-on example demonstrates how ChatGPT can help you diagnose and resolve SQL problems: it provides tailored suggestions, explains the fixes, and guides you through strengthening your SQL skills with practical application.
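You can fold this diagnose-and-fix loop into your own tooling. Below is a minimal sketch of that idea, assuming the openai Python package (v1 client), the gpt-3.5-turbo model, and an OPENAI_API_KEY environment variable; the helper name ask_sql_doctor is hypothetical.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_sql_doctor(query: str, symptom: str) -> str:
    """Ask the model to diagnose a misbehaving SQL query (hypothetical helper)."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You diagnose SQL queries and suggest fixes."},
            {"role": "user", "content": f"Query:\n{query}\n\nProblem: {symptom}"},
        ],
    )
    return response.choices[0].message.content


diagnosis = ask_sql_doctor(
    "SELECT SUM(price) AS total_revenue FROM Products WHERE product_name = 'Laptop';",
    "The total revenue looks wrong.",
)
print(diagnosis)
```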
Conclusion

This article looked at the role ChatGPT can play in generating efficient SQL queries. Given how central SQL is to managing the structured data behind modern applications, a solid knowledge of SQL remains essential for using ChatGPT effectively when creating queries. Through practical examples and use cases, we explored how ChatGPT can help you generate, optimize, and analyze SQL queries.

We also saw how ChatGPT can diagnose SQL errors and propose solutions, helping users resolve unexpected results and improve their SQL skills. In today's data-driven world, where effective data manipulation is a necessity, ChatGPT is a valuable ally for anyone who wants to speed up SQL query development, improve accuracy, and increase productivity. It opens up new possibilities for data professionals and developers, allowing them to interact with databases more effectively.

Author Bio

Chaitanya Yadav is a data analyst, machine learning, and cloud computing expert with a passion for technology and education. He has a proven track record of using technology to solve real-world problems and of helping others learn and grow. He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform, and holds 22 multi-cloud certifications.

In addition to his technical skills, he is a content creator, blog writer, and book reviewer. He is the co-founder of a tech community called "CS Infostics", which is dedicated to sharing opportunities to learn and grow in the field of IT.

article-image-ai-powered-data-visualization-with-snowflake

AI-Powered Data Visualization with Snowflake

Shankar Narayanan
19 Oct 2023
8 min read
Introduction

Large language models (LLMs) and generative artificial intelligence (AI) are transforming productivity across enterprises of every kind, automating repetitive tasks and generating insights quickly from large pools of data.

The pursuit of those insights has driven the development of cutting-edge data platforms, including the Snowflake Data Cloud, which can now be paired with AI capabilities for visualizing data. Let us explore the synergy between Snowflake and AI, which simplifies data exploration while empowering businesses to acquire deeper insights.

Snowflake Data Cloud: the foundation for modern data warehousing

Before starting our exploration, it helps to understand the role Snowflake plays in modern data warehousing. Snowflake is a cloud-based data warehousing platform known for its performance, ease of use, and scalability. Because it provides a flexible and secure environment for storing and analyzing data, it is an ideal choice for enterprises that deal with large and diverse data sets.

Key features

Some of the key features of the Snowflake Data Cloud are:

●  Separation of compute and storage: Snowflake's unique architecture lets an organization scale its compute resources independently of storage, optimizing both performance and cost.

●  Data sharing: Seamless data sharing between organizations fosters collaboration and opens up data monetization opportunities.

●  Multi-cloud support: Snowflake runs on all the major cloud providers, so businesses can leverage their preferred cloud infrastructure.

Unleashing the potential of AI-powered data visualization

With Snowflake in place, the game changer is AI-powered data visualization. Modern AI algorithms can analyze and explore complex data sets, revealing insights and patterns that are difficult to discover through traditional methods.

The role of AI in data visualization:

●  Predictive analytics: Machine learning models forecast trends and anomalies, enabling businesses to make proactive decisions.

●  Automated insights: AI can analyze data sets quickly, extracting meaningful insights and reducing the time spent on manual analysis.

●  Natural language processing: NLP algorithms can turn textual data into visual representations, making unstructured data readily accessible.

Harnessing the power of AI and Snowflake

Let us explore how Snowflake and AI work together to help a business gain deeper insights.

●  Data integration

Snowflake's ease of integration allows an organization to centralize its data, whether it is consolidated from IoT devices, external partners, or internal sources. The unified data repository becomes the foundation for AI-powered exploration.

Example: creating a Snowflake database and warehouse

```sql
-- Create a new Snowflake database
CREATE DATABASE my_database;

-- Create a virtual warehouse for query processing
CREATE WAREHOUSE my_warehouse
  WITH WAREHOUSE_SIZE = 'X-SMALL'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE;
```

Loading data into Snowflake:

```sql
-- Create an external stage for data loading
CREATE OR REPLACE STAGE my_stage
URL = 's3://my-bucket/data/'
CREDENTIALS = (AWS_KEY_ID = 'your_key_id' AWS_SECRET_KEY = 'your_secret_key');

-- Copy data from the stage into a Snowflake table
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV)
ON_ERROR = 'CONTINUE';
```

●  AI-driven code generation

One of the most exciting aspects of combining AI and Snowflake is AI's ability to generate code for data visualization. The process works as follows:

●  Data preprocessing: AI algorithms can clean and transform data to prepare it for visualization, reducing the burden on data engineers.

●  Visualization suggestions: AI analyzes the characteristics of a data set and suggests appropriate visualization types, such as scatter plots, bar charts, and more.

●  Automated code generation: Once a visualization type is chosen, AI generates the code needed to create an interactive visualization, making the process accessible to non-technical users.

The example below uses AI-assisted tooling to determine the optimal number of clusters (K) for a K-means model:

```python
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# scaled_data: a standardized feature matrix prepared earlier
# (for example, loaded from Snowflake and scaled with StandardScaler).
model = KMeans()
visualizer = KElbowVisualizer(model, k=(2, 10))
visualizer.fit(scaled_data)
visualizer.show()
```

●  Interactive data exploration

With AI-generated visualizations, anyone can interact with the data effortlessly. The business can filter, drill down, and explore its data dynamically, gaining deeper insight in real time. This level of interactivity empowers business users to make informed, data-driven decisions without relying heavily on IT teams or data analysts.

Example:

```python
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.express as px

app = dash.Dash(__name__)

# scaled_data: a pandas DataFrame prepared earlier with
# 'feature1', 'feature2', and 'target' columns.
app.layout = html.Div([
    dcc.Graph(id='scatter-plot'),
    dcc.Dropdown(
        id='x-axis',
        options=[
            {'label': 'Feature 1', 'value': 'feature1'},
            {'label': 'Feature 2', 'value': 'feature2'}
        ],
        value='feature1'
    )
])

# Redraw the scatter plot whenever the x-axis dropdown changes.
@app.callback(
    Output('scatter-plot', 'figure'),
    [Input('x-axis', 'value')]
)
def update_scatter_plot(selected_feature):
    fig = px.scatter(data_frame=scaled_data, x=selected_feature, y='target', title='Scatter Plot')
    fig.update_traces(marker=dict(size=5))
    return fig

if __name__ == '__main__':
    app.run_server(debug=True)
```

From this web application, users can explore the data interactively.
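Both snippets above assume the data is already sitting in a DataFrame named scaled_data. One way to get it there is Snowflake's Python connector; the sketch below is a minimal illustration, assuming the snowflake-connector-python package installed with its pandas extras, placeholder credentials, and hypothetical column names.

```python
import snowflake.connector
from sklearn.preprocessing import StandardScaler

# Placeholder credentials; substitute your own account details.
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="my_warehouse",
    database="my_database",
)

# Pull the table loaded earlier into a pandas DataFrame.
cur = conn.cursor()
cur.execute("SELECT * FROM my_table")
df = cur.fetch_pandas_all()  # requires: pip install "snowflake-connector-python[pandas]"
conn.close()

# Standardize the numeric feature columns for the examples above.
feature_columns = ["feature1", "feature2"]  # hypothetical column names
scaled_data = df.copy()
scaled_data[feature_columns] = StandardScaler().fit_transform(df[feature_columns])
print(scaled_data.head())
```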
Benefits of AI and Snowflake for enterprises

●  Faster decision-making: Automating code generation and data preprocessing lets a business make decisions faster, and real-time interactive exploration reduces the time it takes to derive specific insights from data.

●  Democratized data access: AI-generated visualizations let non-technical users explore data for themselves, reducing the bottleneck on data analysts and data science teams.

●  Enhanced predictive capabilities: AI-powered predictive analytics on data in Snowflake uncovers hidden patterns and trends, enabling an enterprise to make proactive decisions and stay ahead of the competition.

●  Cost efficiency and scalability: AI-driven automation combined with Snowflake's scalability ensures that a business can handle large data sets without breaking the bank.

Conclusion

In summary, the combination of the Snowflake Data Cloud and AI-powered data visualization is a game changer for enterprises looking to gain insights from their data. By automating code creation, simplifying data integration, and facilitating exploration, this pairing empowers companies to make informed, data-driven decisions. As data analytics progresses, embracing these technologies will be crucial for organizations that want to remain competitive and unlock the full potential of their data.

With Snowflake and AI working together, exploring data evolves from being complicated and time-consuming into something interactive, enlightening, and accessible to everyone. Ultimately, this transformation changes how enterprises harness the power of their data.

Author Bio

Shankar Narayanan (aka Shanky) has worked on numerous cloud and emerging technologies, including Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps, to name a few. He has led architecture design and implementation for many enterprise customers, helping them break the barrier and take the first step toward a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and the Snowflake Data Cloud. Shanky likes to give back to the community: he contributes to open source, is a frequently sought-after speaker, and has delivered numerous talks on Microsoft technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and as an SAP Community Topic Leader by SAP.

article-image-chatgpt-prompting-basics-finding-your-ip-address

ChatGPT Prompting Basics: Finding Your IP Address

Clint Bodungen
18 Oct 2023
6 min read
This article is an excerpt from the book ChatGPT for Cybersecurity Cookbook by Clint Bodungen. Master ChatGPT and the OpenAI API, and harness the power of cutting-edge generative AI and large language models to revolutionize the way you perform penetration testing, threat detection, and risk assessment.

Introduction

In this article, we will explore the basics of ChatGPT prompting using the ChatGPT interface, which is different from the OpenAI Playground used in the previous recipe. The advantage of the ChatGPT interface is that it does not consume account credits and is better suited to generating formatted output, such as writing code or creating tables.

Getting ready

To use the ChatGPT interface, you will need an active OpenAI account. If you haven't already, please set up your ChatGPT account.

How to do it…

In this recipe, we'll guide you through using the ChatGPT interface to generate a Python script that retrieves a user's public IP address. By following these steps, you'll learn how to interact with ChatGPT in a conversational manner and receive context-aware responses, including code snippets. Now, let's proceed with the steps in this recipe:

1. In your browser, go to https://chat.openai.com and click "Log in".

2. Log in using your OpenAI credentials.

3. Once you are logged in, you will be taken to the ChatGPT interface. The interface is similar to a chat application, with a text box at the bottom where you can enter your prompts.

Figure – The ChatGPT interface

4. ChatGPT uses a conversation-based approach, so you can simply type your prompt as a message and press Enter (or click the send button) to receive a response from the model. For example, you can ask ChatGPT to generate a piece of Python code that finds the public IP address of a user:

Figure – Entering a prompt

ChatGPT will generate a response containing the requested Python code, along with a thorough explanation.

Figure – ChatGPT response with code

5. Continue the conversation by asking follow-up questions or providing additional information, and ChatGPT will respond accordingly.

Figure – ChatGPT contextual follow-up response

6. Run the generated code by clicking "Copy code", pasting it into your code editor of choice (I personally use Visual Studio Code), saving it as a ".py" Python script, and running it from a terminal:

```
PS D:\GPT\ChatGPT for Cybersecurity Cookbook> python .\my_ip.py
Your public IP address is: 
Your local network IP address is: 192.168.1.105
```

Figure – Running the ChatGPT generated script

How it works…

By entering prompts through the ChatGPT interface, you can generate context-aware responses and content that continue over the course of an entire conversation, like a chatbot. The conversation-based approach allows for more natural interactions and the ability to ask follow-up questions or provide additional context. The responses can even include complex formatting, such as code snippets or tables (more on tables later).

There's more…

As you become more familiar with ChatGPT, you can experiment with different prompt styles, instructions, and contexts to obtain the desired output for your cybersecurity tasks. You can also compare the results generated through the ChatGPT interface and the OpenAI Playground to determine which approach best fits your needs.
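For reference, the script ChatGPT generates for this recipe varies between runs, but it is typically along the following lines. This is a minimal sketch, assuming the requests library and the public api.ipify.org service for the external address; ChatGPT's actual output will differ.

```python
import socket

import requests


def get_public_ip() -> str:
    # Query a public service for the address seen from the internet.
    return requests.get("https://api.ipify.org", timeout=10).text


def get_local_ip() -> str:
    # Open a UDP socket toward a public host; no traffic is actually sent,
    # but the OS selects the local interface address for that route.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]


if __name__ == "__main__":
    print(f"Your public IP address is: {get_public_ip()}")
    print(f"Your local network IP address is: {get_local_ip()}")
```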
Tip: You can further refine the generated output by providing very clear and specific instructions or by using roles. It also helps to divide complex prompts into several smaller ones, giving ChatGPT one instruction per prompt and building on the previous prompts as you go. In the upcoming recipes, we will delve into more advanced prompting techniques that use these ideas to help you get the most accurate and detailed responses from ChatGPT.

As you interact with ChatGPT, your conversation history is automatically saved in the left panel of the ChatGPT interface. This feature allows you to easily access and review your previous prompts and responses. By leveraging the conversation history, you can keep track of your interactions with ChatGPT and quickly reference previous responses for your cybersecurity tasks or other projects.

Figure – Conversation history in the ChatGPT interface

To view a saved conversation, simply click on the desired conversation in the left panel. You can also create new conversations by clicking the "+ New chat" button at the top of the conversation list. This lets you separate and organize your prompts and responses by task or topic.

Caution: Keep in mind that when you start a new conversation, the model loses the context of the previous one. If you want to reference any information from a previous conversation, you will need to include that context in your new prompt.

Conclusion

In conclusion, this article has unveiled the power of ChatGPT and its conversation-driven approach, making tasks like retrieving your public IP address a breeze. With step-by-step guidance, you've learned to harness ChatGPT's capabilities and enjoy context-aware responses, all while keeping your account credits intact. As you dive deeper into the world of ChatGPT, you'll discover its versatility across applications and its potential to streamline your cybersecurity work. By mastering ChatGPT's conversational prowess, you're on the path to seamless, productive interactions and a future filled with AI-driven possibilities.

Author Bio

Clint Bodungen is a cybersecurity professional with 25+ years of experience and the author of Hacking Exposed: Industrial Control Systems. He began his career in the United States Air Force and has since worked with many of the world's largest energy companies and organizations, including notable cybersecurity firms such as Symantec, Kaspersky Lab, and Booz Allen Hamilton. He has published multiple articles, technical papers, and training courses on cybersecurity and aims to revolutionize cybersecurity education using computer gaming ("gamification") and AI technology. His flagship product, ThreatGEN® Red vs. Blue, is the world's first online multiplayer cybersecurity simulation game, designed to teach real-world cybersecurity.
article-image-make-your-own-siri-with-openai-whisper-and-bark

Make your own Siri with OpenAI Whisper and Bark

Louis Owen
18 Oct 2023
7 min read
Introduction

ChatGPT has earned its reputation as a versatile and capable assistant. From helping you craft the perfect piece of writing, planning your next adventure, and aiding your coding endeavors to simply engaging in light-hearted conversation, ChatGPT can do it all. It's like having a digital Swiss Army knife at your fingertips. But have you ever wondered what it would be like if ChatGPT could communicate with you not just through text, but also through speech? Imagine the convenience of issuing voice commands and receiving spoken responses, just like with your own personal Siri. The good news is that this is now possible thanks to the remarkable combination of OpenAI Whisper and Bark.

Bringing the power of voice interaction to ChatGPT is a game changer. Instead of typing out your queries and waiting for text-based responses, you can converse with ChatGPT seamlessly, making your interactions more natural and efficient. Whether you're a multitasking enthusiast, a visually impaired user, or simply someone who prefers spoken communication, this development holds incredible potential.

So, how is this achieved? The answer lies in the fusion of two crucial components: a Speech-to-Text (STT) module and a Text-to-Speech (TTS) module.

STT, as the name suggests, is the technology responsible for converting spoken words into text. OpenAI's Whisper is a groundbreaking pre-trained model for automatic speech recognition (ASR) and speech translation. It has been trained on an astonishing 680,000 hours of labeled data, giving it an impressive ability to adapt to a variety of datasets and domains without fine-tuning.

Whisper comes in two flavors: English-only and multilingual models. The English-only models are trained for the specific task of speech recognition, accurately predicting transcriptions in the same language as the spoken audio. The multilingual models are trained to handle both speech recognition and speech translation, predicting transcriptions in a language different from the source audio. Imagine speaking in one language and having ChatGPT instantly respond in another: Whisper makes it possible.

On the other side of the conversation is Text-to-Speech (TTS) technology, which converts ChatGPT's textual responses into lifelike speech. Bark, an open-source model developed by Suno AI, is a transformer-based text-to-speech model that makes ChatGPT's spoken responses sound as engaging and dynamic as Siri's. Thanks to Bark, ChatGPT not only thinks like a human but speaks like one too.

The beauty of this integration is that it doesn't require you to be a tech genius. Hugging Face, a leading platform for natural language processing, supports both the TTS and STT pipelines, streamlining the entire process and making it accessible to anyone. You don't need to be a master coder or AI specialist to make it work. Simply select the model you prefer for STT (Whisper) and another for TTS (Bark), input your commands and queries, and let Hugging Face take care of the rest. The result? An intelligent, voice-activated ChatGPT that can assist you with whatever you need.
Without wasting any more time, take a deep breath, make yourself comfortable, and get ready to learn how to combine Whisper and Bark with OpenAI's GPT-3.5-Turbo to create your own Siri!

Building the STT

OpenAI Whisper is a powerful ASR/STT model that can be seamlessly integrated into your projects. It has been pre-trained on an extensive dataset, making it highly capable of recognizing and transcribing spoken language.

Here's how you can use OpenAI Whisper for STT with the Hugging Face pipeline. Note that sample_audio here is the user's spoken command to ChatGPT:

```python
import torch
from transformers import pipeline

# Use a GPU when available; otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else -1

stt = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",
    chunk_length_s=30,
    device=device,
)

# sample_audio: the user's spoken command (a file path or audio array).
text = stt(sample_audio, return_timestamps=True)["text"]
```

The foundation of any AI model's prowess lies in the data it is exposed to during training, and Whisper is no exception. This ASR model was trained on a staggering 680,000 hours of audio data and corresponding transcripts, carefully gathered from across the internet. That data breaks down as follows:

●  English dominance (65%): A substantial 65% of the training data, equating to 438,000 hours, is English-language audio with matched English transcripts. This abundance of English data ensures that Whisper excels at transcribing English speech accurately.

●  Multilingual versatility (18%): About 18% of the training data, roughly 126,000 hours, is non-English audio paired with English transcripts. This diversity makes Whisper a versatile ASR model capable of handling different languages while still producing English transcriptions.

●  Global reach (17%): The remaining 17%, or 117,000 hours, is non-English audio with corresponding transcripts, spanning a stunning 98 different languages. Whisper's proficiency in transcribing non-English languages is a testament to its global reach.

Getting the LLM response

With the user's speech command transcribed into text, the next step is to harness the power of ChatGPT (GPT-3.5-Turbo). This is where the real magic happens: these advanced language models can help with writing, travel planning, coding, or simply a friendly conversation.

There are several ways to integrate ChatGPT into your system:

LangChain: LangChain offers a seamless and efficient way to connect with ChatGPT programmatically, making it a preferred choice for developers.

OpenAI Python client: The OpenAI Python client provides a user-friendly interface for accessing ChatGPT, simplifying the integration for Python developers.

cURL request: For those who prefer more direct control, cURL requests to the OpenAI endpoint let you interact with ChatGPT through a RESTful API from virtually any programming language.

Whichever method you choose, ChatGPT takes your transcribed speech command and generates a thoughtful, context-aware text response. We won't go deep into this step here, since many articles already cover it, but a brief sketch follows.
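For completeness, here is one way to wire the Whisper transcription into the chat model using the OpenAI Python client. This is a minimal sketch, assuming the openai v1 package and an OPENAI_API_KEY environment variable; text is the transcription produced by the pipeline above.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# `text` is the transcription produced by the Whisper pipeline above.
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": text},
    ],
)

# Reuse the name `text`: the TTS step below reads ChatGPT's reply from it.
text = completion.choices[0].message.content
print(text)
```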
Building the TTS

The final piece of the puzzle is Bark, an open-source TTS model. Bark converts ChatGPT's textual responses into lifelike speech, much like Siri talking to you. It adds that crucial human touch, making your interactions with ChatGPT feel more natural and engaging.

Again, we can build the TTS pipeline very easily with the Hugging Face pipeline. Note that text here is ChatGPT's response to the user's command:

```python
from transformers import pipeline
from IPython.display import Audio

tts = pipeline("text-to-speech", model="suno/bark-small")

# `text` is ChatGPT's reply from the previous step.
response = tts(text)

# Play the synthesized speech in a notebook.
Audio(response["audio"], rate=response["sampling_rate"])
```

You can hear the quality of the Bark model in this Google Colab notebook.

Conclusion

Congratulations on making it this far! In this article, you have learned how to build your own Siri with the help of OpenAI Whisper, ChatGPT, and Bark. Best of luck with your own experiments in creating your own Siri, and see you in the next article!

Author Bio

Louis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career, he has worked across a variety of industries, including NGOs, e-commerce, conversational AI, OTA, smart cities, and FinTech. Outside of work, he loves spending his time helping data science enthusiasts become data scientists, whether through his articles or through mentoring sessions. He also loves to spend his spare time on his hobbies: watching movies and working on side projects.

Currently, Louis is an NLP Research Engineer at Yellow.ai, the world's leading CX automation platform. Check out Louis' website to learn more about him! Lastly, if you have any queries or topics to discuss, please reach out to Louis via LinkedIn.

article-image-chatgpt-for-power-developers

ChatGPT for Power Developers

Jakov Semenski
17 Oct 2023
7 min read
Introduction

What do power developers know about ChatGPT's capabilities that you don't?

You've tinkered with ChatGPT, got some fun replies, and maybe even used it for some quick Q&A. But there's a feeling of missing out, isn't there? ChatGPT feels like a vast ocean, and you've only skimmed the surface.

Deep down, you know there's more. What's the secret sauce?

It's like having a sports car and only driving in first gear. ChatGPT is built for more, way more. Hold on to your coding hat, because there's a blueprint, a set of hidden levers and buttons that power users are pressing. Ready to get in on the secret?

Envision a world where you're not just using ChatGPT but mastering it. Every challenge, every coding puzzle, you've got a secret weapon. Welcome to the world of power developers.

Here are three advanced prompts you can use to sharpen your AI skills and harness ChatGPT like never before.

PowerPoint

You are about to experience how to create customized, memorable presentations. I will show you how to use ChatGPT to automate your presentation outline and generate jaw-dropping content that keeps your viewers engaged.

Instead of starting from blank slides, we will use a format from one of the best presentation trainers, Jason Teteak.

Here is the full megaprompt. Don't get overwhelmed by its length; you just need to replace the TOPIC and AUDIENCE parts.

```
TOPIC = Why do we need the Spring framework
AUDIENCE = Junior developers who know Java

Create a presentation outline for {TOPIC} and {AUDIENCE} by using the famous presentation framework from Jason Teteak's book Rule the Room.

Make sure to identify what the audience wants:
• What are your biggest concerns or worries?
• What are the biggest challenges you have with those areas?
• What are the problems they are causing?
• What's your ideal outcome?
• What would getting that outcome do for you?

Use takeaways:
• Start with an action verb. The trick to doing this is to mentally insert the words "As a result of my presentation, you will be able to..." at the beginning of the phrase.
• Use seven words or less. A string of seven items is the maximum number people can hold in their short-term memory.
• Use familiar words. Avoid what I call cliquespeak - using words or assuming a grasp of concepts that people new to or unfamiliar with your field won't understand.

Identify pain and pleasure points, and say how the takeaways relieve pain points and enhance pleasure points.
Define how the takeaways offer happiness, success and/or freedom.
Create the title according to the formula: start with an action verb, use 7 words or less, and use familiar words.

Use the following format:
For slides use markdown.
Title is h1.
Content is using bullet points.
For what you say, use italic and add "You say:".

Give your credentials.
Tell the audience how what you do will help them. Example: "I help community bankers find new income sources."
Deliver the main hook. Example: "I'm going to offer you a new source of income with less risk, plus the expertise you need to expand services to old customers and attract new ones."

Main agenda slide - complete list of takeaways
Highlighted takeaway #1 slide
Task slide #1 - complete list of tasks for takeaway #1
What you say: takeaway #1 hook sentence
Example slide
What you say
Highlighted takeaway #2 slide
Task slide #2 - complete list of tasks for takeaway #2
What you say: takeaway #2 hook sentence
Highlighted takeaway #3 slide
Task slide #3 - complete list of tasks for takeaway #3
What you say: takeaway #3 hook sentence
Example slide
Summary slide - complete list of takeaways
What you say: takeaway #3 hook sentence
Final slide
What you say:
- offer to stay for individual questions
- thank the audience
- add a pleasantry to conclude the presentation (e.g. "Have a great day")
```

Here is the full conversation: https://chat.openai.com/share/e116d8c4-b267-466e-9d9e-39799f073e24

The shared conversation shows the kind of outline this prompt produces.

Simulate running an app

Let's imagine you want to demo a backend application. You need to present it to coworkers, or just verify how the final app might work. Normally you would need:

- working code
- a running server (locally or in the cloud)
- running storage (e.g. a database)
- tools to interact with it (to create GET or POST requests)

What if I told you that ChatGPT can do all of that for you with only one prompt? Here is the full prompt; you can just replace the APP part:

```
APP: Spring REST application that persists a list of conferences in a MySQL database; it exposes GET and POST mappings.

Imagine there is a MySQL database already running with a conferences table. The application can be accessed by invoking GET or POST requests.

I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. Imagine that for the given {APP} we are in the directory which contains the full application code. I want you to only reply with the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.
```

Here is the chat: https://chat.openai.com/share/74dad74d-8a59-43e8-8c5c-042dfcecda99

You get the output of starting the app, or of making a POST request to add a conference. ChatGPT did not actually run the code, but frankly, it did an excellent job of simulating everything.

Creating an educational outline

Ever noticed how most educational content out there feels like it's either too basic or way over your head? It's like there's no middle ground. Endless hours scrolling and reading, and in the end you're still at square one. That's not learning; that's a wild goose chase.

But wait: what if there's a different way? A formula, perhaps, to craft content that resonates, educates, and empowers? Imagine diving into educational material that sparks curiosity, drives understanding, and equips you with actionable insights.

It's time to revolutionize educational content for developers. Be authentic, be clear, and always keep the learner at the heart of your content.

Now replace COURSE NAME and AUDIENCE according to your needs:

```
COURSE NAME = How to start writing Java tests that are fun and easy
AUDIENCE = Junior developers

You are an expert developer in crafting authentic, clear training outlines that always keep the learner at the heart of your content. Your content sparks curiosity, drives understanding, and equips learners with actionable insights.

I need you to create an outline for a 5-part educational course called {COURSE NAME}.
Give this course 3 examples of compelling course names.
For context, this audience are {AUDIENCE}.

Your output should be formatted like this:

# NAME OF THE COURSE with 3 examples
## PART OF THE COURSE
### Idea 1
- Sub point 1
- Sub point 2
- Sub point 3
### Idea 2
- Sub point 1
- Sub point 2
- Sub point 3
### Idea 3
- Sub point 1
- Sub point 2
- Sub point 3

Every PART should be a headline for the respective part.
Every Idea is one heading inside that PART.
Every Sub point is supportive of the idea above it.
```

Here is the link: https://chat.openai.com/share/096f48c4-8886-4d4c-a051-49eb1516b730

The linked conversation shows the generated output.

Conclusion

ChatGPT holds the key to a new realm of coding mastery. By exploring these advanced prompts and hidden techniques, you're poised to become a true power developer. Embrace the journey, unleash ChatGPT's potential, and pave the way for a future where you're not just using AI but shaping it to your advantage. With a mix of storytelling, real-world examples, and interactivity, you can craft content that developers crave.

Author Bio

Jakov Semenski is an IT Architect working at IBM iX with almost 20 years of experience. He is also a ChatGPT speaker at the WeAreDevelopers conference and shares valuable tech stories on LinkedIn.