
How-To Tutorials - AI Tools


Build a powerful RSS news fetcher with Weaviate

Alan Bernardo Palacio
31 Aug 2023
21 min read
Introduction

In today's rapidly evolving crypto world, staying informed about the latest news and developments is crucial. However, with the overwhelming amount of information available, it's becoming increasingly challenging to find relevant news quickly. In this article, we will delve into the creation of a powerful system that fetches real-time news articles from various RSS feeds and stores them in the Weaviate vector database. We will explore how this application lays the foundation for context-based news retrieval and how it can be a stepping stone for more advanced applications, such as similarity search and contextual understanding.

Before we dive into the technical details, let's ensure that you have a basic understanding of the technologies we'll be using. Familiarity with Python and Docker will be beneficial as we build and deploy our applications.

Setting up the Environment

To get started, we need to set up the development environment. This environment consists of three primary components: the RSS news fetcher, the Weaviate vector database, and the Transformers Inference API for text vectorization.

Our application's architecture is orchestrated using Docker Compose. The provided docker-compose.yml file defines four services: rss-fetcher, app, weaviate, and t2v-transformers. These services interact to fetch news, store it in the vector database, and prepare it for vectorization.

version: '3.4'
services:
  rss-fetcher:
    image: rss/python
    build:
      context: ./rss_fetcher
  app:
    build:
      context: ./app
    ports:
      - 8501:8501
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - rss-fetcher
      - weaviate
  weaviate:
    image: semitechnologies/weaviate:latest
    restart: on-failure:0
    ports:
      - "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: "./data"
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
      CLUSTER_HOSTNAME: 'node1'
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: 0 # set to 1 to enable
      # NVIDIA_VISIBLE_DEVICES: all # enable if running with CUDA

Each service is configured with specific environment variables that define its behavior. In our application, we make use of environment variables like OPENAI_API_KEY to ensure secure communication with external services. We also specify the necessary dependencies, such as the Python libraries listed in the requirements.txt file for the rss-fetcher service.

Creating the RSS News Fetcher

The foundation of our news retrieval system is the RSS news fetcher. This component will actively fetch articles from various RSS feeds, extract essential information, and store them in the Weaviate vector database. This is the Dockerfile of our RSS fetcher:

FROM python:3
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-u", "rss_fetcher.py"]

Our RSS news fetcher is implemented within the rss_fetcher.py script. 
This script performs several key tasks, including fetching RSS feeds, parsing articles, and inserting data into the Weaviate database.import feedparser import pandas as pd import time from bs4 import BeautifulSoup import requests import random from datetime import datetime, timedelta import pytz import uuid import weaviate import json import time def wait_for_weaviate():    """Wait until Weaviate is available."""      while True:        try:            # Try fetching the Weaviate metadata without initiating the client here            response = requests.get("<http://weaviate:8080/v1/meta>")            response.raise_for_status()            meta = response.json()                      # If successful, the instance is up and running            if meta:                print("Weaviate is up and running!")                return        except (requests.exceptions.RequestException):            # If there's any error (connection, timeout, etc.), wait and try again            print("Waiting for Weaviate...")            time.sleep(5) RSS_URLS = [    "<https://thedefiant.io/api/feed>",    "<https://cointelegraph.com/rss>",    "<https://cryptopotato.com/feed/>",    "<https://cryptoslate.com/feed/>",    "<https://cryptonews.com/news/feed/>",    "<https://smartliquidity.info/feed/>",    "<https://bitcoinmagazine.com/feed>",    "<https://decrypt.co/feed>",    "<https://bitcoinist.com/feed/>",    "<https://cryptobriefing.com/feed>",    "<https://www.newsbtc.com/feed/>",    "<https://coinjournal.net/feed/>",    "<https://ambcrypto.com/feed/>",    "<https://www.the-blockchain.com/feed/>" ] def get_article_body(link):    try:        headers = {            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.3'}        response = requests.get(link, headers=headers, timeout=10)        response.raise_for_status()        soup = BeautifulSoup(response.content, 'html.parser')        paragraphs = soup.find_all('p')        # Directly return list of non-empty paragraphs        return [p.get_text().strip() for p in paragraphs if p.get_text().strip() != ""]    except Exception as e:        print(f"Error fetching article body for {link}. Reason: {e}")        return [] def parse_date(date_str):    # Current date format from the RSS    date_format = "%a, %d %b %Y %H:%M:%S %z"    try:        dt = datetime.strptime(date_str, date_format)        # Ensure the datetime is in UTC        return dt.astimezone(pytz.utc)    except ValueError:        # Attempt to handle other possible formats        date_format = "%a, %d %b %Y %H:%M:%S %Z"        dt = datetime.strptime(date_str, date_format)        return dt.replace(tzinfo=pytz.utc) def fetch_rss(from_datetime=None):    all_data = []    all_entries = []      # Step 1: Fetch all the entries from the RSS feeds and filter them by date.    
for url in RSS_URLS:        print(f"Fetching {url}")        feed = feedparser.parse(url)        entries = feed.entries        print('feed.entries', len(entries))        for entry in feed.entries:            entry_date = parse_date(entry.published)                      # Filter the entries based on the provided date            if from_datetime and entry_date <= from_datetime:                continue            # Storing only necessary data to minimize memory usage            all_entries.append({                "Title": entry.title,                "Link": entry.link,                "Summary": entry.summary,                "PublishedDate": entry.published            })    # Step 2: Shuffle the filtered entries.    random.shuffle(all_entries)    # Step 3: Extract the body for each entry and break it down by paragraphs.    for entry in all_entries:        article_body = get_article_body(entry["Link"])        print("\\nTitle:", entry["Title"])        print("Link:", entry["Link"])        print("Summary:", entry["Summary"])        print("Published Date:", entry["PublishedDate"])        # Create separate records for each paragraph        for paragraph in article_body:            data = {                "UUID": str(uuid.uuid4()), # UUID for each paragraph                "Title": entry["Title"],                "Link": entry["Link"],                "Summary": entry["Summary"],                "PublishedDate": entry["PublishedDate"],                "Body": paragraph            }            all_data.append(data)    print("-" * 50)    df = pd.DataFrame(all_data)    return df def insert_data(df,batch_size=100):    # Initialize the batch process    with client.batch as batch:        batch.batch_size = 100        # Loop through and batch import the 'RSS_Entry' data        for i, row in df.iterrows():            if i%100==0:                print(f"Importing entry: {i+1}")  # Status update            properties = {                "UUID": row["UUID"],                "Title": row["Title"],                "Link": row["Link"],                "Summary": row["Summary"],                "PublishedDate": row["PublishedDate"],                "Body": row["Body"]            }            client.batch.add_data_object(properties, "RSS_Entry") if __name__ == "__main__":    # Wait until weaviate is available    wait_for_weaviate()    # Initialize the Weaviate client    client = weaviate.Client("<http://weaviate:8080>")    client.timeout_config = (3, 200)    # Reset the schema    client.schema.delete_all()    # Define the "RSS_Entry" class    class_obj = {        "class": "RSS_Entry",        "description": "An entry from an RSS feed",        "properties": [            {"dataType": ["text"], "description": "UUID of the entry", "name": "UUID"},            {"dataType": ["text"], "description": "Title of the entry", "name": "Title"},            {"dataType": ["text"], "description": "Link of the entry", "name": "Link"},            {"dataType": ["text"], "description": "Summary of the entry", "name": "Summary"},            {"dataType": ["text"], "description": "Published Date of the entry", "name": "PublishedDate"},            {"dataType": ["text"], "description": "Body of the entry", "name": "Body"}        ],        "vectorizer": "text2vec-transformers"    }    # Add the schema    client.schema.create_class(class_obj)    # Retrieve the schema    schema = client.schema.get()    # Display the schema    print(json.dumps(schema, indent=4))    print("-"*50)    # Current datetime    now = datetime.now(pytz.utc)    # Fetching articles from 
the last days    days_ago = 3    print(f"Getting historical data for the last {days_ago} days ago.")    last_week = now - timedelta(days=days_ago)    df_hist =  fetch_rss(last_week)    print("Head")    print(df_hist.head().to_string())    print("Tail")    print(df_hist.head().to_string())    print("-"*50)    print("Total records fetched:",len(df_hist))    print("-"*50)    print("Inserting data")    # insert historical data    insert_data(df_hist,batch_size=100)    print("-"*50)    print("Data Inserted")    # check if there is any relevant news in the last minute    while True:        # Current datetime        now = datetime.now(pytz.utc)        # Fetching articles from the last hour        one_min_ago = now - timedelta(minutes=1)        df =  fetch_rss(one_min_ago)        print("Head")        print(df.head().to_string())        print("Tail")        print(df.head().to_string())              print("Inserting data")        # insert minute data        insert_data(df,batch_size=100)        print("data inserted")        print("-"*50)        # Sleep for a minute        time.sleep(60)Before we start fetching news, we need to ensure that the Weaviate vector database is up and running. The wait_for_weaviate function repeatedly checks the availability of Weaviate using HTTP requests. This ensures that our fetcher waits until Weaviate is ready to receive data.The core functionality of our fetcher lies in its ability to retrieve articles from various RSS feeds. We iterate through the list of RSS URLs, using the feedparser library to parse the feeds and extract key information such as the article's title, link, summary, and published date.To provide context for similarity search and other applications, we need the actual content of the articles. The get_article_body function fetches the article's HTML content, parses it using BeautifulSoup, and extracts relevant text paragraphs. This content is crucial for creating a rich context for each article.After gathering the necessary information, we create data objects for each article and insert them into the Weaviate vector database. Weaviate provides a client library that simplifies the process of adding data. We use the weaviate.Client class to interact with the Weaviate instance and batch-insert articles' data objects.Now that we have laid the groundwork for building our context-based news retrieval system, in the next sections, we'll delve deeper into Weaviate's role in this application and how we can leverage it for similarity search and more advanced features.Weaviate Configuration and SchemaWeaviate, an open-source knowledge graph, plays a pivotal role in our application. It acts as a vector database that stores and retrieves data based on their semantic relationships and vector representations. Weaviate's ability to store text data and create vector representations for efficient similarity search aligns perfectly with our goal of context-based news retrieval. By utilizing Weaviate, we enable our system to understand the context of news articles and retrieve semantically similar content.To structure the data stored in Weaviate, we define a class called RSS_Entry. This class schema includes properties like UUID, Title, Link, Summary, PublishedDate, and Body. These properties capture essential information about each news article and provide a solid foundation for context retrieval. 
# Define the "RSS_Entry" class    class_obj = {        "class": "RSS_Entry",        "description": "An entry from an RSS feed",        "properties": [            {"dataType": ["text"], "description": "UUID of the entry", "name": "UUID"},            {"dataType": ["text"], "description": "Title of the entry", "name": "Title"},            {"dataType": ["text"], "description": "Link of the entry", "name": "Link"},            {"dataType": ["text"], "description": "Summary of the entry", "name": "Summary"},            {"dataType": ["text"], "description": "Published Date of the entry", "name": "PublishedDate"},            {"dataType": ["text"], "description": "Body of the entry", "name": "Body"}        ],        "vectorizer": "text2vec-transformers"    }    # Add the schema    client.schema.create_class(class_obj)    # Retrieve the schema    schema = client.schema.get()The uniqueness of Weaviate lies in its ability to represent text data as vectors. Our application leverages the text2vec-transformers module as the default vectorizer. This module transforms text into vector embeddings using advanced language models. This vectorization process ensures that the semantic relationships between articles are captured, enabling meaningful similarity search and context retrieval.Real-time and Historical Data InsertionEfficient data insertion is vital for ensuring that our Weaviate-based news retrieval system provides up-to-date and historical context for users. Our application caters to two essential use cases: real-time context retrieval and historical context analysis. The ability to insert real-time news articles ensures that users receive the most recent information. Additionally, historical data insertion enables a broader perspective by allowing users to explore trends and patterns over time.To populate our database with historical data, we utilize the fetch_rss function. This function fetches news articles from the last few days, as specified by the days_ago parameter. The retrieved articles are then processed, and data objects are batch-inserted into Weaviate. This process guarantees that our database contains a diverse set of historical articles.def fetch_rss(from_datetime=None):    all_data = []    all_entries = []      # Step 1: Fetch all the entries from the RSS feeds and filter them by date.    for url in RSS_URLS:        print(f"Fetching {url}")        feed = feedparser.parse(url)        entries = feed.entries        print('feed.entries', len(entries))        for entry in feed.entries:            entry_date = parse_date(entry.published)                      # Filter the entries based on the provided date            if from_datetime and entry_date <= from_datetime:                continue            # Storing only necessary data to minimize memory usage            all_entries.append({                "Title": entry.title,                "Link": entry.link,                "Summary": entry.summary,                "PublishedDate": entry.published            })    # Step 2: Shuffle the filtered entries.    random.shuffle(all_entries)    # Step 3: Extract the body for each entry and break it down by paragraphs.    
for entry in all_entries:        article_body = get_article_body(entry["Link"])        print("\\nTitle:", entry["Title"])        print("Link:", entry["Link"])        print("Summary:", entry["Summary"])        print("Published Date:", entry["PublishedDate"])        # Create separate records for each paragraph        for paragraph in article_body:            data = {                "UUID": str(uuid.uuid4()), # UUID for each paragraph                "Title": entry["Title"],                "Link": entry["Link"],                "Summary": entry["Summary"],                "PublishedDate": entry["PublishedDate"],                "Body": paragraph            }            all_data.append(data)    print("-" * 50)    df = pd.DataFrame(all_data)    return dfThe real-time data insertion loop ensures that newly published articles are promptly added to the Weaviate database. We fetch news articles from the last minute and follow the same data insertion process. This loop ensures that the database is continuously updated with fresh content.ConclusionIn this article, we've explored crucial aspects of building an RSS news retrieval system with Weaviate. We delved into Weaviate's role as a vector database, examined the RSS_Entry class schema, and understood how text data is vectorized using text2vec-transformers. Furthermore, we discussed the significance of real-time and historical data insertion in providing users with relevant and up-to-date news context. With a solid foundation in place, we're well-equipped to move forward and explore more advanced applications, such as similarity search and context-based content retrieval, which is what we will be building in the next article. The seamless integration of Weaviate with our news fetcher sets the stage for a powerful context-aware information retrieval system.Author BioAlan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, and Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder of startups, and later on earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn   
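
As a small taste of the similarity search that the conclusion points toward, here is a minimal sketch of a semantic query against the RSS_Entry class using the Weaviate Python client. It assumes the same local instance and schema created by the fetcher above; the query text and result handling are illustrative only, not part of the original article.

import weaviate

# Connect to the same Weaviate instance the fetcher writes to
client = weaviate.Client("http://weaviate:8080")

# Ask for paragraphs semantically close to a free-text query.
# near_text relies on the text2vec-transformers vectors created at import time.
result = (
    client.query
    .get("RSS_Entry", ["Title", "Link", "PublishedDate", "Body"])
    .with_near_text({"concepts": ["bitcoin regulation in europe"]})  # illustrative query
    .with_limit(5)
    .do()
)

# Print the matching paragraphs along with their source articles
for entry in result["data"]["Get"]["RSS_Entry"]:
    print(entry["Title"], "-", entry["Link"])
    print(entry["Body"][:200], "...")
    print("-" * 40)

This is the kind of context-based retrieval the stored vectors make possible, and it is what the next article builds on.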


Creating OpenAI and Azure OpenAI functions in Power BI dataflows

Greg Beaumont
09 Oct 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, Power BI Machine Learning and OpenAI, by Greg Beaumont. Master core data architecture design concepts and Azure Data & AI services to gain a cloud data and AI architect’s perspective to developing end-to-end solutions IntroductionAs noted earlier, integrating OpenAI and Azure OpenAI with Power Query or dataflows currently requires custom M code. To facilitate this process, we have provided M code for both OpenAI and Azure OpenAI, giving you the flexibility to choose which version to use based on your specific needs and requirements.By leveraging this provided M code, you can seamlessly integrate OpenAI or Azure OpenAI with your existing Power BI solutions. This will allow you to take advantage of the unique features and capabilities offered by these powerful AI technologies, while also gaining insights and generating new content from your data with ease.OpenAI and Azure OpenAI functionsOpenAI offers a user-friendly API that can be easily accessed and utilized from within Power Query or dataflows in Power BI. For further information regarding the specifics of the API, we refer you to the official OpenAI documentation, available at this link: https://platform.openai.com/ docs/introduction/overview.It is worth noting that optimizing and tuning the OpenAI API will likely be a popular topic in the coming year. Various concepts, including prompt engineering, optimal token usage, fine-tuning, embeddings, plugins, and parameters that modify response creativity (such as temperature and top p), can all be tested and fine-tuned for optimal results.While these topics are complex and may be explored in greater detail in future works, this book will focus primarily on establishing connectivity between OpenAI and Power BI. Specifically, we will explore prompt engineering and token limits, which are key considerations that will be incorporated into the API call to ensure optimal performance:Prompts: Prompt engineering, in basic terms, is the English-language text that will be used to preface every API call. For example, instead of sending [Operator] and [Airplane] as values without context, text was added to the request in the previous chapter such that the API will receive Tell me about the airplane model [Aircraft] operated by [Operator] in three sentences:. The prompt adds context to the values passed to the OpenAI model.Tokens: Words sent to the OpenAI model get broken into chunks called tokens. Per the OpenAI website, a token contains about four English language characters. Reviewing the Remarks column in the Power BI dataset reveals that most entries have up to 2,000 characters. (2000 / 4) = 500, so you will specify 500 as the token limit. Is that the right number? You’d need to do extensive testing to answer that question, which goes beyond the scope of this book.Let’s get started with building your OpenAI and Azure OpenAI API calls for Power BI dataflows!Creating OpenAI and Azure OpenAI functions for Power BI dataflowsYou will create two functions for OpenAI in your dataflow named OpenAI. The only difference between the two will be the token limits. The purpose of having different token limits is primarily cost savings, since larger token limits could potentially run up a bigger bill. Follow these steps to create a new function named OpenAIshort:1.      Select Get data | Blank query.2.  
Paste in the following M code and select Next. Be sure to replace abc123xyz with your OpenAI API key. Here is the code for the function. The code can also be found as 01 OpenAIshortFunction.M in the Packt GitHub repository at https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-13:

let
    callOpenAI = (prompt as text) as text =>
        let
            jsonPayload = "{""prompt"": """ & prompt & """, ""max_tokens"": " & Text.From(120) & "}",
            url = "https://api.openai.com/v1/engines/text-davinci-003/completions",
            headers = [#"Content-Type"="application/json", #"Authorization"="Bearer abc123xyz"],
            response = Web.Contents(url, [Headers=headers, Content=Text.ToBinary(jsonPayload)]),
            jsonResponse = Json.Document(response),
            choices = jsonResponse[choices],
            text = choices{0}[text]
        in
            text
in
    callOpenAI

3. Now, you can rename the function OpenAIshort. Right-click on the function in the Queries panel and duplicate it. The new function will have a larger token limit.
4. Rename this new function OpenAIlong.
5. Right-click on OpenAIlong and select Advanced editor.
6. Change the section of code reading Text.From(120) to Text.From(500).
7. Click OK.

Your screen should now look like this:

Figure 13.1 – OpenAI functions added to a Power BI dataflow

These two functions can be used to complete the workshop for the remainder of this chapter. If you'd prefer to use Azure OpenAI, the M code for OpenAIshort would be as follows. Remember to replace PBI_OpenAI_project with your Azure resource name, davinci-PBIML with your deployment name, and abc123xyz with your API key:

let
    callAzureOpenAI = (prompt as text) as text =>
        let
            jsonPayload = "{""prompt"": """ & prompt & """, ""max_tokens"": " & Text.From(120) & "}",
            url = "https://" & "PBI_OpenAI_project" & ".openai.azure.com" & "/openai/deployments/" & "davinci-PBIML" & "/completions?api-version=2022-12-01",
            headers = [#"Content-Type"="application/json", #"api-key"="abc123xyz"],
            response = Web.Contents(url, [Headers=headers, Content=Text.ToBinary(jsonPayload)]),
            jsonResponse = Json.Document(response),
            choices = jsonResponse[choices],
            text = choices{0}[text]
        in
            text
in
    callAzureOpenAI

As with the previous example, changing the token limit from Text.From(120) to Text.From(500) is all you need to do to create an Azure OpenAI function for 500 tokens instead of 120. The M code to create the dataflows for your OpenAI functions can also be found on the Packt GitHub site at this link: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-13.

Now that you have your OpenAI and Azure OpenAI functions ready to go in a Power BI dataflow, you can test them out on the FAA Wildlife Strike data!

Conclusion

In conclusion, this article has provided valuable insights into integrating OpenAI and Azure OpenAI with Power BI dataflows using custom M code. By offering M code for both OpenAI and Azure OpenAI, it allows users to seamlessly incorporate these powerful AI technologies into their Power BI solutions. The article emphasizes the significance of prompt engineering and token limits in optimizing OpenAI API calls. It also provides step-by-step instructions for creating functions with different token limits, enabling cost-effective customization. With these functions in place, users can harness the capabilities of OpenAI and Azure OpenAI within Power BI, enhancing data analysis and content generation. 
For further details and code references, you can explore the Packt GitHub repository linked above. Now, armed with these tools, you are ready to explore the potential of OpenAI and Azure OpenAI in your Power BI data projects.

Author Bio

Greg Beaumont is a Data Architect at Microsoft and an expert in solving complex problems and creating value for customers. With a focus on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. With years of experience in data architecture and a passion for innovation, he has a unique ability to identify and solve complex challenges. He is a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. For more than 15 years, Greg has worked with healthcare customers who strive to improve patient outcomes and find opportunities for efficiencies. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.
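
For readers who want to sanity-check the same request outside of Power BI before wiring it into a dataflow, here is a rough Python sketch of the equivalent HTTP call. It mirrors the legacy completions endpoint and JSON payload used in the M code above; the API key, prompt text, and 120-token limit are placeholders, not values from the book.

import requests

API_KEY = "abc123xyz"  # placeholder - use your own OpenAI API key
URL = "https://api.openai.com/v1/engines/text-davinci-003/completions"

def call_openai(prompt, max_tokens=120):
    """Send the same JSON payload the M function builds and return the first completion."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    response = requests.post(URL, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()["choices"][0]["text"]

# Example call mirroring the prompt pattern from the chapter (aircraft and operator are illustrative)
print(call_openai("Tell me about the airplane model Boeing 737 operated by Delta in three sentences:"))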


Build an Irreverent News Feed in Python and Bard API

Darren Broemmer
14 Jun 2023
8 min read
Now that we can invoke Bard from our code using this article, let’s build a news feed application that incorporates Generate AI capability. Not only will it grab the top headlines and stories, but it will also use the Bard API to transform them in a comical way. Note that the Bard API is an unofficial, cutting-edge technology in a rapidly changing area and is not production ready at the time of writing. The first step is to get the list of news stories. First, let’s use the Bard API to do this.As seen above, requests to Google Bard yield a natural language response, just like the web interface does. Thus, depending on your use case, you may need to specify a more formal output format so that your program can consume the data.Specify the output format in your prompt.Our news application could use the prompt, What are the top five news stories today?This will give us a text response with a numbered or bulleted list of stories. Unless we have a text-only user interface, our application will need to parse this out into individual stories for display.Thus, to get a computer-readable set of news stories, use a prompt in the form below. prompt = '''What are the top five news stories today?Structure the output as a list of xml objects each called story.Each story element should have child elements called title, content, and url.''' Here is an example of what this prompt returned. Sure, here are the top five news stories today, structured as a list of XML objects with the attributes title, content, and URL:```xml <stories> <story>    <title>US House Votes to Suspend Debt Ceiling and Avoid Default</title>    <content>The US House of Representatives voted on Thursday to suspend the debt ceiling and avoid a default on the country's debt. The vote was 221-212, with all Democrats and 11 Republicans voting in favor. The Senate is expected to vote on the measure later this week.</content> <url>https://www.cnn.com/2023/06/01/politics/us-debt-ceiling-vote/index.html</url> </story> … </stories> ```Note the pre-canned introduction at the beginning of the response. Multiple attempts to instruct Bard to omit this and only return the XML data were unsuccessful. However, Bard used a markdown format so extracting the substring from the response is fairly easy. Here is the code we used that leverages the prompt shown earlier.def get_bard_headlines():   answer = call_bard(prompt)   # We need to filter out the precanned response and get just the xml   # Bard's response beings with something like:   #   # Sure, here are the top five news stories today, structured as a   # list of XML objects with the attributes title, content, and url:   # ```xml   # <news>   # ...   # </news>   # ```   # Get just the json text from the response   delimeter = "```xml"   idx1 = answer.index(delimeter)   idx2 = answer.index("```", idx1 + 6)   # length of substring 1 is added to   # get string from next character   xml_data = answer[idx1 + len(delimeter) + 1: idx2]   article_list = []   # Parse the XML data   root = ET.fromstring(xml_data)   # Iterate over each 'item' element   for item in root.iter('story'):       title = item.find('title').text       content = item.find('content').text       url = item.find('url').text       article_dict = {'title': title, 'content': content, 'url': url}       article_list.append(article_dict)   return article_list  Why use an XML format and not JSON? In our testing with Bard, it did not always create correctly formatted JSON for this use case. 
It had trouble when the content included single or double quotation marks. See the example below where it incorrectly wrapped text outside of the quotes, resulting in an invalid JSON that caused an exception during parsing. For this news use case, XML provided reliable results that our application code could easily parse. For other use cases, JSON may work just fine. Be sure to perform testing on different data sets to see what is best for your requirements. Even if you are not familiar with parsing XML using Python, you can ask Bard to write the code for you:At this point, we have the basic news stories, but we want to transform them in a comic way. We can either combine the two steps into a single prompt, or we can use a second step to convert the headlines.One thing to keep in mind is that calls to the Bard API are often slower than typical services. If you use the output from Bard in a high-volume scenario, it may be advantageous to use a background process to get content from Bard and cache that for your web application.Another option to get fast, reliable headlines is to use the GNews API. It allows up to 100 free queries per day. Create an account to get your API token. The documentation on the site is very good, or you can ask Bard to write the code to call the API. It has a top headlines endpoint with several categories to choose from (world, business, technology, etc.), or you can perform a query to retrieve headlines using arbitrary keyword(s). Here is the code to get the top headlines.def get_gnews_headlines(category):   url = f"https://gnews.io/api/v4/top-headlines?category={category}&lang=en&country=us&max=10&apikey={gnews_api_key}"   with urllib.request.urlopen(url) as response:       data = json.loads(response.read().decode("utf-8"))       articles = data["articles"]       return articlesUse Bard to create a web interfaceBottle is a lightweight web server framework for Python. It is quick and easy to create web applications. We use the following prompt to have Bard write code for a news feed web page built using the Bottle framework.  Here is a snapshot of the resulting page. It can use some CSS styling, but not bad for a start. The links take you to the underlying news source and the full article. Use Bard to transform headlinesWith the basic news structure in place, Bard can be used for the creative part of transforming the headlines. The prompt and Python functions to do this are shown below. Bard sometimes returns multiple lines of output, so this function only returns the first line of actual output. reword_prompt = '''Rewrite the following news headline to have a comic twist to it. Use the comic style of Jerry Seinfeld. If the story is about really bad news, then do not attempt to rewrite it and simply echo back the headline. ''' def bard_reword_headline(headline):   answer = call_bard(reword_prompt + headline)   # Split response into lines   lines = answer.split('\n')   # Ignore the first line which is Bard’s canned response   lines = lines[1:]   for line in lines:       if len(line) > 0:           return line.strip("\"")   return "No headlines available" We then incorporate this into our web server code:import bottle import requests from gnews import get_gnews_headlines from bard import bard_reword_headline # Define the Bottle application. app = bottle.Bottle() # Define the route to display the news stories. @app.route("/") def index():   # Get the news articles from the gnews.io API.   
    # Category choices are: general, world, nation, business, technology, entertainment,
    # sports, science, and health
    articles = get_gnews_headlines("technology")

    # Reword each headline
    for article in articles:
        reworded_headline = bard_reword_headline(article['title'])
        article['comic_title'] = reworded_headline

    # Render the template with the news stories.
    return bottle.template("news_template.html", articles=articles)

# Run the Bottle application.
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Here is the result. I think Bard did a pretty good job.

Summary

In conclusion, building an irreverent news feed using Python and the Bard API is a practical and engaging project. By harnessing the power of Python, developers can extract relevant news data and employ creative techniques to inject humor and satire into the feed. The Bard API can rewrite each headline with a comic twist, adding a unique and entertaining touch to the news content. This project not only demonstrates the versatility of Python but also showcases the potential for innovative and humorous news delivery in the digital age.

Author Bio

Darren Broemmer is an author and software engineer with extensive experience in Big Tech and the Fortune 500. He writes on topics at the intersection of technology, science, and innovation.
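
The article notes that Bard API calls can be slow and suggests caching content for high-volume scenarios. Here is a minimal sketch of one way to do that: a simple time-based in-memory cache wrapped around the reword step. The 15-minute TTL is an arbitrary choice, and the helper assumes a reword function with the same shape as bard_reword_headline above.

import time

# Simple in-memory cache: headline -> (reworded text, timestamp)
_cache = {}
CACHE_TTL_SECONDS = 15 * 60  # arbitrary 15-minute freshness window

def cached_reword_headline(headline, reword_fn):
    """Return a cached comic headline if it is still fresh, otherwise call Bard again."""
    now = time.time()
    hit = _cache.get(headline)
    if hit and now - hit[1] < CACHE_TTL_SECONDS:
        return hit[0]
    reworded = reword_fn(headline)  # e.g. bard_reword_headline from the article
    _cache[headline] = (reworded, now)
    return reworded

# Usage inside the Bottle route:
#   article['comic_title'] = cached_reword_headline(article['title'], bard_reword_headline)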


TaskMatrix: Bridging the Gap Between Text and Visual Understanding

Rohan Chikorde
28 Jun 2023
11 min read
IntroductionIn the fast-paced digital landscape of today, the fusion of text and visual understanding has become paramount. As technology continues to advance, the integration of text and visuals has become essential for enhancing communication, problem-solving, and decision-making processes. With the advent of technologies like ChatGPT and Visual Foundation Models, we now have the ability to seamlessly exchange images during conversations and leverage their capabilities for various tasks. Microsoft's TaskMatrix system serves as a revolutionary solution that bridges the gap between text and visual understanding, empowering users to harness the combined power of these domains. TaskMatrix is an innovative system developed by Microsoft, designed to facilitate collaboration between ChatGPT and Visual Foundation Models. By seamlessly integrating text and visual inputs, TaskMatrix enables users to enhance their communication, perform image-related tasks, and extract valuable insights from visual data. In this technical blog, we will explore the functionalities, applications, and technical intricacies of TaskMatrix, providing a comprehensive understanding of its potential and the benefits it offers to users. Through an in-depth analysis of TaskMatrix, we aim to shed light on how this system can revolutionize the way we interact with text and visual elements. By harnessing the power of advanced machine learning models, TaskMatrix opens up new possibilities for communication, problem-solving, and decision-making, ultimately leading to improved user experiences and enhanced outcomes. Let us now dive deep into the world of TaskMatrix and uncover its inner workings and capabilities.Understanding TaskMatrixTaskMatrix is an open-source system developed by Microsoft with the aim of bridging the gap between ChatGPT and Visual Foundation Models. It serves as a powerful platform that enables the integration of image-related tasks within conversations, revolutionizing the way we communicate and solve problems. One of the key features of TaskMatrix is its ability to facilitate image editing. Users can now manipulate and modify images directly within the context of their conversations. This functionality opens up new avenues for creative expression and enables a richer visual experience during communication. Furthermore, TaskMatrix empowers users with the capability of performing object detection and segmentation tasks. By leveraging the advanced capabilities of Visual Foundation Models, the system can accurately identify and isolate objects within images. This functionality enhances the understanding of visual content and facilitates better communication by providing precise references to specific objects or regions of interest. The integration of TaskMatrix with ChatGPT is seamless, allowing users to combine the power of natural language processing with visual understanding. By exchanging images and leveraging the domain-specific knowledge of Visual Foundation Models, ChatGPT becomes more versatile and capable of handling diverse tasks effectively. TaskMatrix introduces the concept of templates, which are pre-defined execution flows for complex tasks. These templates facilitate collaboration between different foundation models, enabling them to work together cohesively. With templates, users can execute multiple tasks seamlessly, leveraging the strengths of different models and achieving more comprehensive results. 
Moreover, TaskMatrix supports both English and Chinese languages, making it accessible to a wide range of users across different linguistic backgrounds. The system is designed to be extensible, welcoming contributions from the community to enhance its functionalities and expand its capabilities.Key Features and FunctionalitiesTaskMatrix provides users with a wide range of powerful features and functionalities that empower them to accomplish complex tasks efficiently. Let's explore some of the key features in detail:Template-based Execution Flows: One of the standout features of TaskMatrix is its template-based approach. Templates are pre-defined execution flows that encapsulate specific tasks. They serve as a guide for executing complex operations involving multiple foundation models. Templates streamline the process and ensure smooth collaboration between different models, making it easier for users to achieve their desired outcomes.Language Support: TaskMatrix supports multiple languages, including English and Chinese. This broad language support ensures that users from various linguistic backgrounds can leverage the system's capabilities effectively. Whether users prefer communicating in English or Chinese, TaskMatrix accommodates their needs, making it a versatile and accessible platform for a global user base.Image Editing: TaskMatrix introduces a unique feature that allows users to perform real-time image editing within the conversation flow. This capability enables users to enhance and modify images seamlessly, providing a dynamic visual experience during communication. From basic edits such as cropping and resizing to more advanced adjustments like filters and effects, TaskMatrix equips users with the tools to manipulate images effortlessly.Object Detection and Segmentation: Leveraging the power of Visual Foundation Models, TaskMatrix facilitates accurate object detection and segmentation. This functionality enables users to identify and locate objects within images, making it easier to reference specific elements during conversations. By extracting valuable insights from visual content, TaskMatrix enhances the overall understanding and communication of complex concepts.Integration with ChatGPT: TaskMatrix seamlessly integrates with ChatGPT, a state-of-the-art language model developed by OpenAI. This integration enables users to combine the power of natural language processing with visual understanding. By exchanging images and leveraging the strengths of both ChatGPT and TaskMatrix, users can address a wide range of tasks and challenges, ranging from creative collaborations to problem-solving scenarios.Technical ImplementationTaskMatrix utilizes a sophisticated technical implementation that combines the power of machine learning models, APIs, SDKs, and specialized frameworks to seamlessly integrate text and visual understanding. Let's take a closer look at the technical intricacies of TaskMatrix.Machine Learning Models: At the core of TaskMatrix are powerful machine learning models such as ChatGPT and Visual Foundation Models. ChatGPT, developed by OpenAI, is a state-of-the-art language model that excels in natural language processing tasks. Visual Foundation Models, on the other hand, specialize in visual understanding tasks such as object detection and segmentation. 
TaskMatrix leverages the capabilities of these models to process and interpret both text and visual inputs.APIs and SDKs: TaskMatrix relies on APIs and software development kits (SDKs) to integrate with the machine learning models. APIs provide a standardized way for TaskMatrix to communicate with the models and send requests for processing. SDKs offer a set of tools and libraries that simplify the integration process, allowing TaskMatrix to seamlessly invoke the necessary functionalities of the models.Specialized Frameworks: TaskMatrix utilizes specialized frameworks to optimize the execution and resource management of the machine learning models. These frameworks efficiently allocate GPU memory for each visual foundation model, ensuring optimal performance and fast response times, even for computationally intensive tasks. By leveraging the power of GPUs, TaskMatrix can process and analyze images with speed and accuracy.Intelligent Routing: TaskMatrix employs intelligent routing algorithms to direct user requests to the appropriate model. When a user engages in a conversation that involves an image-related task, TaskMatrix analyzes the context and intelligently determines which model should handle the request. This ensures that the right model is invoked for accurate and relevant responses, maintaining the flow and coherence of the conversation.Seamless Integration: TaskMatrix seamlessly integrates the responses from the visual foundation models back into the ongoing conversation. This integration ensures a natural and intuitive user experience, where the information and insights gained from visual analysis seamlessly blend with the text-based conversation. The result is a cohesive and interactive communication environment that leverages the combined power of text and visual understanding.By combining machine learning models, APIs, SDKs, specialized frameworks, and intelligent routing algorithms, TaskMatrix achieves a technical implementation that seamlessly integrates text and visual understanding. This implementation optimizes performance, resource management, and user experience, making TaskMatrix a powerful tool for enhancing communication, problem-solving, and collaboration. System Architecture:Image 1: System ArchitectureGetting Started with TaskMatrixTo get started with TaskMatrix, you can follow the step-by-step instructions and documentation provided in the TaskMatrix GitHub repository. This repository serves as a central hub of information, offering comprehensive guidelines, code samples, and examples to assist users in setting up and utilizing the system effectively. Access the GitHub Repository: Begin by visiting the TaskMatrix GitHub repository, which contains all the necessary resources and documentation. You can find the repository by searching for "TaskMatrix" on the GitHub platform.Follow the Setup Instructions:The repository provides clear instructions on how to set up TaskMatrix. This typically involves installing the required dependencies, configuring the APIs and SDKs, and ensuring the compatibility of the system with your development environment. 
The setup instructions will vary depending on your specific use case and the programming language or framework you are using.# clone the repo git clone https://github.com/microsoft/TaskMatrix.git # Go to directory cd visual-chatgpt # create a new environment conda create -n visgpt python=3.8 # activate the new environment conda activate visgpt #  prepare the basic environments pip install -r requirements.txt pip install  git+https://github.com/IDEA-Research/GroundingDINO.git pip install  git+https://github.com/facebookresearch/segment-anything.git # prepare your private OpenAI key (for Linux) export OPENAI_API_KEY={Your_Private_Openai_Key} # prepare your private OpenAI key (for Windows) set OPENAI_API_KEY={Your_Private_Openai_Key} # Start TaskMatrix ! # You can specify the GPU/CPU assignment by "--load", the parameter indicates which # Visual Foundation Model to use and where it will be loaded to # The model and device are separated by underline '_', the different models are separated by comma ',' # The available Visual Foundation Models can be found in the following table # For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0 # You can use: "ImageCaptioning_cpu,Text2Image_cuda:0" # Advice for CPU Users python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu # Advice for 1 Tesla T4 15GB  (Google Colab)                      python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"                               # Advice for 4 Tesla V100 32GB                           python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,    Inpainting_cuda:0,ImageCaptioning_cuda:0,    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,    SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3" Explore Code Samples and Examples: The TaskMatrix repository offers code samples and examples that demonstrate how to use the system effectively. These samples showcase various functionalities and provide practical insights into integrating TaskMatrix into your projects. By exploring the code samples, you can better understand the implementation details and gain inspiration for incorporating TaskMatrix into your own applications.Engage with the Community: TaskMatrix has an active community of users and developers who are passionate about the system. You can engage with the community by participating in GitHub discussions, submitting issues or bug reports, and even contributing to the development of TaskMatrix through pull requests. The community is a valuable resource for support, knowledge sharing, and collaboration.DemoExample 1:Image 2: Demo Part 1  Image 3: Demo Part 2Example 2Image 5: Automatically generated description ConclusionTaskMatrix revolutionizes the synergy between text and visual understanding by seamlessly integrating ChatGPT and Visual Foundation Models. By enabling image-related tasks within conversations, TaskMatrix opens up new avenues for collaboration and problem-solving. 
With its intuitive template-based execution flows, language support, image editing capabilities, and object detection and segmentation functionalities, TaskMatrix empowers users to efficiently tackle diverse tasks.

As the fields of natural language understanding and computer vision continue to evolve, TaskMatrix represents a significant step forward in bridging the gap between text and visual understanding. Its potential applications are vast, spanning industries such as e-commerce, virtual assistance, content moderation, and more. Embracing TaskMatrix unlocks a world of possibilities, where the fusion of text and visual elements enhances human-machine interaction and drives innovation to new frontiers.

Author Bio

Rohan Chikorde is an accomplished AI Architect with a postgraduate degree in Machine Learning and Artificial Intelligence. With almost a decade of experience, he has successfully developed deep learning and machine learning models for various business applications. Rohan's expertise spans multiple domains, and he excels in programming languages such as R and Python, as well as analytics techniques like regression analysis and data mining. In addition to his technical prowess, he is an effective communicator, mentor, and team leader. Rohan's passion lies in machine learning, deep learning, and computer vision.


Personalization at Scale: Using Snowflake and Generative AI

Shankar Narayanan
27 Sep 2023
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionImagine your customers still picking your competitors even after your incredible offerings. It sounds strange.We live in an era where customers want tailored offerings or solutions from businesses. Anticipating and meeting customer needs is no longer enough. Companies must exceed their expectations and create more authentic and long-term customer interactions. At every touchpoint of the customer journey, whether online or on your website, they want a tailored experience.However, exceeding these expectations can be daunting. So, how should businesses do it? Organizations need robust data management solutions and cutting-edge technologies to meet customer needs and offer tailored experiences. One such powerful combination is the use of Snowflake Data Cloud for Generative AI, which allows businesses to craft tailored customer experiences at scale.In this blog, we'll explore how Snowflake's Data Cloud and Generative AI can help achieve this and the importance of Snowflake Industry Cloud and Marketplace Data in this context.Before we hop on to it, let’s understand Generative AI and the importance of personalization.Why is Hyper-Personalization Critical for Businesses?Personalization significantly impacts customer satisfaction, loyalty, and retention rates. Tailored experiences are about more than addressing by their name several times in a message. But it's more about understanding customer needs and preferences and personalizing communication and marketing efforts that lead to higher click-through rates, conversations, and customer satisfaction.Personalization creates a more profound customer connection, drives engagement, and increases conversion rates. However, achieving personalization at scale presents significant challenges, primarily because it relies on lots of data and the ability to process and analyze it quickly.How is Generative AI Powering Personalization?Generative AI is an artificial intelligence technology capable of effectively producing various types of content like text, images, and other media. These generative models usually learn the patterns and structure of their input data and then generate new data or results with similar characteristics.This technology has undoubtedly revolutionized many businesses. And hyper-personalization is one of the reasons behind it. Generative AI doesn't just analyze data but also has the potential to achieve unprecedented levels of personalization.In this ever-evolving landscape of businesses, Generative AI models can constantly seek ways to improve customer engagement and sales through personalization. It tailors every aspect of customer experience and leaves room for real-time interactions.Here’s how it can be helpful to businesses in many ways:1. Dynamic content: It can produce different types of content like emailers, newsletters, social media copy, website content, marketing materials, and more.2. Recommendations: These models can understand and analyze customer behavior and preferences to offer personalized recommendations.3. Chatbots and virtual assistants: If businesses want real-time customer assistance, generative AI-powered virtual assistants can come to the rescue.4. 
Pricing strategy: Generative AI also helps optimize pricing strategies for customers by understanding and analyzing their browsing history, purchasing behavior, market pricing, and overall customer journey.5. Natural Language Processing (NLP): NLP models can understand and respond to customer inquiries and feedback in a personalized manner, enhancing customer service.Snowflake Data Cloud: A Game-Changer for Data ManagementSnowflake isn't just another technology company. Snowflake is a cloud-based data platform that has revolutionized how organizations manage and utilize their data. It offers several key advantages that are critical for personalization at scale: 1. Data IntegrationSnowflake enables seamless data integration from various sources, including structured and semi-structured data. This data consolidation is crucial for creating a holistic view of customer behavior and preferences. 2. ScalabilitySnowflake's architecture allows for elastic scalability, meaning you can effortlessly handle growing datasets and workloads, making it ideal for personalization efforts that need to accommodate a large user base.3. Data SharingSnowflake's data-sharing capabilities make it easy to collaborate with partners and share data securely, which can be valuable for personalization initiatives involving third-party data.4. SecuritySecurity is paramount when dealing with customer data. Snowflake offers robust security features to protect sensitive information and comply with data privacy regulations.5. Real-time Data ProcessingSnowflake's cloud-native architecture supports real-time data processing, a fundamental requirement for delivering personalized experiences in real-time or near-real-time.Also, to measure the effectiveness of personalization, one can conduct the A/B tests. Let us see an example to understand the same.import numpy as np from scipy import stats # A/B test data (conversion rates) group_a = [0, 1, 1, 0, 1, 0, 0, 1, 0, 0] group_b = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0] # Perform a t-test to compare conversion rates t_stat, p_value = stats.ttest_ind(group_a, group_b) if p_value < 0.05:    print("Personalization is statistically significant.") else:    print("No significant difference observed.") In this way, you can analyze the results of A/B tests. It would help to determine if personalization efforts are statistically significant in improving customer experiences.Snowflake Industry Cloud and Marketplace DataWhile Snowflake's core features make it a powerful platform for data management, its Industry Cloud and Marketplace Data offerings take personalization to the next level:1. Industry CloudSnowflake's Industry Cloud solutions provide industry-specific data models and best practices. This means organizations can quickly adopt personalized solutions tailored to their specific sector, whether healthcare, retail, finance, or any other domain.2. Marketplace DataThe Snowflake Marketplace offers many data sources and tools to augment personalization efforts. This includes third-party data, pre-built machine learning models, and analytics solutions, making enriching customer profiles easier and driving better personalization.Personalization at Scale with Snowflake Data CloudGenerative AI and Snowflake Data Cloud play a pivotal role in revolutionizing businesses. And leveraging the capabilities of the Snowflake cloud data platform, along with Generative AI to scale personalization, can be a game-changer for many industries. Here's how you can accomplish this seamlessly and effectively.3. 
Data Ingestion Snowflake allows you to ingest data from various sources, including your CRM, website, mobile app, and third-party data providers. This data is stored in a central repository, ready for analysis.4. Data Storage Snowflake's data integration capabilities enable you to consolidate this data, creating a comprehensive customer profile that includes historical interactions, purchase history, preferences, and more. It can handle massive amounts of data, so businesses can easily store and collect data per their needs and preferences.Snowflake’s scalability helps one to handle large databases efficiently. In the SQL snippet, let us see how to create a table for storing and loading data from a CSV file.-- Create a table to store customer data CREATE TABLE customers (    customer_id INT,    first_name VARCHAR,    last_name VARCHAR,    email VARCHAR,    purchase_history ARRAY,    last_visit_date DATE ); -- Load customer data into the table COPY INTO customers FROM 's3://your-data-bucket/customer_data.csv' FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);5. Machine Learning and Generative AI With your data in Snowflake, you can leverage Generative AI models to analyze customer behavior and generate personalized content, recommendations, and predictions.We can understand this using a Python code.import openai # Your OpenAI API key api_key = "your_api_key" # Customer behavior data customer_history = "Customer recently purchased a laptop and smartphone." # Generate personalized recommendations response = openai.Completion.create(    engine="text-davinci-002",    prompt=f"Based on customer data: {customer_history}, recommend products: ",    max_tokens=50,    n = 5,  # Number of recommendations    stop=None,    temperature=0.7,    api_key=api_key ) recommendations = [choice["text"] for choice in response["choices"]] print(recommendations) Using the OpenAI GPT-3 model, we can generate personalized product recommendations, considering customers' purchase history.6. Real-time Processing Snowflake's real-time data processing capabilities ensure that these personalized experiences are delivered in real-time or with minimal latency, enhancing customer engagement. Let us see how we can utilize Snowflake and a hypothetical real-time recommendation engine:-- Create a view to fetch real-time recommendations using a stored procedure CREATE OR REPLACE VIEW real_time_recommendations AS SELECT    c.customer_id,    c.first_name,    c.last_name,    r.recommendation_text FROM    customers c JOIN    real_time_recommendations_function(c.customer_id) r ON    c.customer_id = r.customer_id; 7. Iterative Improvement Personalization is an ongoing process. Snowflake's scalability and flexibility allow you to continuously refine your personalization strategies based on customer feedback and changing preferences.ConclusionCompetition is kicking off in every industry. Businesses can't just focus on creating products that solve specific problems. Instead, the focus has shifted to personalization in this competitive landscape. It's a must-have and non-negotiable. Businesses cannot afford to avoid it if they want to create excellent customer experiences.This is where Snowflake Data Cloud comes to the rescue. Leveraging this platform along with Generative AI can empower organizations in many ways and help craft tailored customer experiences at scale. 
If appropriately leveraged, businesses can gain a competitive edge and cater to customers' unique demands by delivering personalized solutions. In today's competitive era, only those businesses that invest in personalized technology and marketing efforts will survive. Thus, it's time to embrace these technologies and use them to your advantage to gain long-term success.Author BioShankar Narayanan (aka Shanky) has worked on numerous different cloud and emerging technologies like Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps to name a few. He has led the architecture design and implementation for many Enterprise customers and helped enable them to break the barrier and take the first step towards a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and Snowflake Data Cloud. Shanky likes to contribute back to the community. He contributes to open source, is a frequently sought-after speaker, and has delivered numerous talks on Microsoft Technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and an SAP Community Topic Leader by SAP.
Generative AI: Building a Strong Data Foundation

Shankar Narayanan
15 Sep 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionGenerative AI has become increasingly popular among businesses and researchers, which has led to a growing interest in how data supports generative models. Generative AI relies heavily on the quality and diversity of its foundational data to generate new data samples from existing ones. In this blog post, I will explain why a strong data foundation is essential for Generative AI and explore the various methods used to build and prepare data systems. Why Data is Vital for Generative AI?Generative AI models can generate various outputs, from images to text to music. However, the accuracy and performance of these models depend primarily on the quality of the data they are trained on. The models will produce incorrect, biased, or unimpressive results if the foundation data is inadequate. The adage "garbage in, garbage out" is quite relevant here. The quality, diversity, and volume of data used will determine how well the AI system understands patterns and nuances. Methods of Building a Data Foundation for Generative AI To harness the potential of generative AI, enterprises need to establish a strong data foundation. But building a data foundation isn't a piece of cake. Like a killer marketing strategy, building a solid data foundation for generative AI involves a systematic collection, preparation, and management approach. Building a robust data foundation involves the following phases: Data Collection: Collecting data from diverse sources ensures variety. For example, a generative model that trains on human faces should include faces from different ethnicities, ages, and expressions. For example, you can run to collect data from a CSV file in Python.   import pandas as pd data = pd.read_csv('path_to_file.csv') print(data.head())  # prints first 5 rows To copy from a Database, you can use a Python code like this import sqlite3 DATABASE_PATH = 'path_to_database.db' conn = sqlite3.connect(DATABASE_PATH) cursor = conn.cursor() cursor.execute("SELECT * FROM table_name") rows = cursor.fetchall() for row in rows: print(row) conn.close()  Time-Series Data Time-series data is invaluable for generative models focusing on sequences or temporal patterns (like stock prices). Various operations can be performed with the Time series data, such as the one below.  import pandas as pd import numpy as np import matplotlib.pyplot as plt # Load data (assuming a CSV file with 'date' and 'value' columns) df = pd.read_csv('time_series_data.csv', parse_dates=['date'], index_col='date') # Making the Time Series Stationary # Differencing df['first_difference'] = df['value'] - df['value'].shift(1) # Log Transformation (if data is non-stationary after differencing) df['log_value'] = np.log(df['value']) df['log_first_difference'] = df['log_value'] - df['log_value'].shift(1) # 3. Smoothing with Moving Average window_size = 5  # e.g., using a window size of 5 df['moving_avg'] = df['first_difference'].rolling(window=window_size).mean()  Data Cleaning Detecting and managing outliers appropriately is crucial as they can drastically skew AI predictions. Lets see an example of Data Cleaning using Python.  
import pandas as pd import numpy as np # Sample data for demonstration data = {    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Alice'],    'Age': [25, 30, np.nan, 29, 25],    'Salary': [50000, 55000, 52000, 60000, 50000],    'Department': ['HR', 'Finance', 'Finance', 'IT', None] } df = pd.DataFrame(data) # Removing duplicates df.drop_duplicates(inplace=True) Handling Missing Values: Accuracy can only be achieved with complete data sets. Techniques like imputation can be used to address gaps. The missing values can be handled for the data, like the following example. import pandas as pd import numpy as np import matplotlib.pyplot as plt # Load data (assuming a CSV file with 'date' and 'value' columns) df = pd.read_csv('time_series_data.csv', parse_dates=['date'], index_col='date') #  Handle Missing Values: Interpolation is one method df['value'].interpolate(method='linear', inplace=True)  Data AugmentationTransformations such as rotating, scaling, or flipping images can increase the volume and diversity of visual data. Sometimes, a little noise (random variations) is added to the data for robustness. Before augmenting, the snippet below continues preparing the sample customer data from the cleaning example above by correcting its data types and removing outliers.  #  Correcting data types (impute the missing age first so the cast does not fail) df['Age'] = df['Age'].fillna(df['Age'].median()).astype(int)  # Convert float Age to integer # Removing outliers (using Z-score for Age as an example) from scipy import stats z_scores = np.abs(stats.zscore(df['Age'])) df = df[(z_scores < 3)] Data AnnotationAdding descriptions or tags helps AI understand the context. For example, in image datasets, metadata can describe the scene, objects, or emotions present. Having domain experts review and annotate data ensures high fidelity. Data Partitioning Segregating data ensures that models are not evaluated on the same data they are trained on. This technique uses multiple training and test sets to ensure generalized and balanced models. Data Storage & Accessibility Storing data in structured or semi-structured databases makes it easily retrievable. For scalability and accessibility, many organizations opt for cloud-based storage solutions. Generative AI's Need for Data Different Generative AI models require diverse types of data: Images: GANs, used to create synthetic images, rely heavily on large, diverse image datasets. They can generate artwork, fashion designs, or even medical images. Text: Models like OpenAI's GPT series require vast text corpora to generate human-like text. These models can produce news articles, stories, or technical manuals. Audio: Generative models can produce music or speech. They need extensive audio samples to capture nuances. Mixed Modalities: Some models integrate text, image, and audio data to generate multimedia content. ConclusionWe all know the capabilities and potential of generative AI models in various industries and roles like content creation, designing, and problem-solving. But to let these models continuously evolve, improve, and generate better results, it's essential to recognize and leverage the correct data. 
Building and preparing this foundation is essential, and investing time and resources into it will pave the way for breakthroughs and innovations in the realm of Generative AI. Author BioShankar Narayanan (aka Shanky) has worked on numerous different cloud and emerging technologies like Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps to name a few. He has led the architecture design and implementation for many Enterprise customers and helped enable them to break the barrier and take the first step towards a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and Snowflake Data Cloud. Shanky likes to contribute back to the community. He contributes to open source, is a frequently sought-after speaker, and has delivered numerous talks on Microsoft Technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and an SAP Community Topic Leader by SAP.
BabyAGI: Empowering Task Automation through Advanced AI Technologies

Rohan Chikorde
15 Jun 2023
8 min read
In today's rapidly evolving technological landscape, the demand for efficient task automation has grown exponentially. Businesses and individuals alike are seeking innovative solutions to streamline their workflows, optimize resource allocation, and enhance productivity. In response to this demand, a groundbreaking autonomous AI agent called BabyAGI has emerged as a powerful tool for task management.BabyAGI stands out from traditional AI agents by leveraging cutting-edge technologies from OpenAI, Pinecone, LangChain, and Chroma. These technologies work in synergy to provide a comprehensive and versatile solution for automating tasks and achieving specific objectives.This comprehensive blog post aims to delve into the world of BabyAGI, exploring its unique features, functionalities, and potential applications. Additionally, it will provide a step-by-step guide on how to set up and run BabyAGI effectively, allowing users to harness its full potential for task automation. By the end of this blog post, you will have a deep understanding of BabyAGI and the tools required to utilize it successfully. Whether you're a project manager looking to optimize workflows, a content creator seeking innovative ideas, or an individual aiming to streamline personal tasks, BabyAGI can revolutionize your approach to task management.Understanding BabyAGIBabyAGI is an autonomous AI agent that brings together the power of various advanced technologies to revolutionize task automation. By leveraging the capabilities of OpenAI's GPT3.5 or GPT4 models, BabyAGI possesses a remarkable ability to generate creative ideas and insights. One of the distinguishing features of BabyAGI is its focus on brainstorming and ideation. Unlike traditional AI agents that rely on browsing external resources, BabyAGI draws upon its existing knowledge base, which is built using the vast amount of information available up until 2021. By avoiding interruptions caused by web searches, BabyAGI can maintain its train of thought and provide uninterrupted suggestions and solutions.The integration of Pinecone's vector database service is another key component of BabyAGI's capabilities. Pinecone enables BabyAGI to store and manage information in a unique way by converting it into vectors. These vectors capture the essential details of the data, making it easier for BabyAGI to understand and work with. LangChain, another technology incorporated into BabyAGI, enhances its integration potential. This means that BabyAGI can be seamlessly integrated into existing systems, allowing users to leverage its functionalities alongside other tools and resources.Chroma, the final technology in BabyAGI's arsenal, brings an innovative approach to the table. Chroma focuses on efficient task management, helping users prioritize and optimize their workflows. By automating repetitive and time-consuming tasks, BabyAGI frees up valuable resources and allows users to focus on more strategic and value-added activities. The combination of these advanced technologies in BabyAGI makes it a versatile and powerful tool for various industries and individuals. Project managers can leverage BabyAGI's brainstorming capabilities to generate new ideas and innovative approaches. Content creators can tap into BabyAGI's creativity to enhance their creative process and produce engaging content. Individuals can rely on BabyAGI to automate mundane tasks, freeing up time for more fulfilling activities.BabyAGI represents a significant leap forward in task automation. 
By combining the capabilities of OpenAI's models, Pinecone's vector database, LangChain's integration potential, and Chroma's efficiency, BabyAGI offers a unique and powerful solution for brainstorming, ideation, and task management. Its ability to generate creative ideas, prioritize tasks, and optimize workflows sets it apart as a revolutionary tool in the realm of AI-driven automation.How It WorksThe script operates through an endless loop, executing the following sequence:Retrieves the initial task from the list of tasks.Dispatches the task to the execution agent, which leverages OpenAI's API to accomplish the task within the given context.Enhances the obtained result and archives it in Chroma/Weaviate.Chroma is an open-source embedding database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs. Weaviate is an open-source vector database. It allows you to store data objects and vector embeddings from your favorite ML models, and scale seamlessly into billions of data objects. Generates fresh tasks and reorganizes the task list based on the objective and the outcome of the preceding task.Image 1: Working, Pic Credits: https://github.com/yoheinakajima/babyagiThe function execution_agent() utilizes the OpenAI API, employing two parameters: the objective and the task. It sends a prompt to the OpenAI API, which in turn provides the task result. The prompt consists of a description of the AI system's task, the objective, and the task itself. The returned result is a string.In the function task_creation_agent(), OpenAI's API is employed to generate new tasks based on the objective and the result of the previous task. The function takes four parameters: the objective, the previous task's result, the task description, and the current task list. It sends a prompt to the OpenAI API, which response with a list of new tasks as strings. The function returns the new tasks as a list of dictionaries, where each dictionary contains the task name.The prioritization_agent() function utilizes the OpenAI API to rearrange the task list based on priority. It accepts the ID of the current task as a parameter. It sends a prompt to the OpenAI API, receiving the reprioritized task list as a numbered list.Lastly, the script utilizes Chroma/Weaviate to store and retrieve task results for context. It creates a Chroma/Weaviate collection using the specified table name in the TABLE_NAME variable. The script employs Chroma/Weaviate to store task results in the collection, including the task name and any additional metadata.Setting Up BabyAGITo unleash the power of BabyAGI, the first step is to ensure that Python and Git are installed on your system. Python serves as the programming language for BabyAGI, while Git enables easy access to the necessary files and updates. If you don't have Python and Git installed, you can obtain them from their respective sources. For Python, head over to python.org and download the latest version compatible with your operating system. Follow the installation instructions provided by the Python installer to set it up on your machine. It is recommended to choose the option to add Python to your system's PATH during installation for easier accessibility.For Windows users, Git can be obtained by visiting the official Git website. Download the appropriate installer for your system and execute it. The installer will guide you through the installation process, allowing you to choose the desired options. 
Select the option to add Git to your system's PATH for convenient usage. Once you have Python and Git installed, the next crucial step is to acquire an OpenAI API Key. This key grants you access to OpenAI's powerful GPT3.5 or GPT4 models, which are essential for BabyAGI's operation. To obtain an API Key, you need a paid OpenAI account.Visit the OpenAI website and log in to your account. Navigate to the API section and follow the instructions to obtain your API Key. It is crucial to keep this key secure as it provides access to powerful AI models and should not be shared with unauthorized individuals.  Image 2: OpenAI API KeyWith Python, Git, and the OpenAI API Key in place, you are now ready to proceed with setting up BabyAGI. The next step involves downloading the BabyAGI source code from its GitHub repository. Visit the GitHub page and locate the download option.Clone the repository via git clone https://github.com/yoheinakajima/babyagi.git and cd into the cloned repository. Install the required packages: pip install -r requirements.txtCopy the .env.example file to .env: cp .env.example .env. This is where you will set the following variables.Set your OpenAI API key in the OPENAI_API_KEY and OPENAPI_API_MODEL variables. In order to use with Weaviate you will also need to set up additional variables detailed here.             Set the name of the table where the task results will be stored in the TABLE_NAME variable.(Optional) Set the name of the BabyAGI instance in the BABY_NAME variable.(Optional) Set the objective of the task management system in the OBJECTIVE variable.(Optional) Set the first task of the system in the INITIAL_TASK variable.Run the script: python babyagi.pyNote: The intention behind this script is to run it continuously as an integral component of a task management system. It is crucial to exercise responsible usage due to the potential for high API utilization when running the script continuously.ConclusionBabyAGI is a revolutionary AI agent that brings together the strengths of OpenAI, LangChain, and Chroma to deliver a powerful tool for task automation. With its ability to generate creative ideas, optimize workflows, and enhance decision-making, BabyAGI is transforming the way professionals approach task management. By automating repetitive tasks and providing valuable insights, it empowers project managers, content creators, and decision-makers to achieve better outcomes and maximize productivity. As technology continues to evolve, BabyAGI stands at the forefront of innovation, paving the way for more efficient and intelligent task management in a wide range of industries.Author BioRohan is an accomplished AI Architect professional with a post-graduate in Machine Learning and Artificial Intelligence. With almost a decade of experience, he has successfully developed deep learning and machine learning models for various business applications. Rohan's expertise spans multiple domains, and he excels in programming languages such as R and Python, as well as analytics techniques like regression analysis and data mining. In addition to his technical prowess, he is an effective communicator, mentor, and team leader. Rohan's passion lies in machine learning, deep learning, and computer vision.LinkedIn
Automating Data Enrichment with Snowflake and AI

Shankar Narayanan
30 Oct 2023
9 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionIn today’s data-driven world, businesses constantly seek ways to extract more value from their data. One of the key strategies to accomplish this is Data Enrichment.Data Enrichment involves enhancing your existing datasets with additional information, which can lead to improved decision-making, customer engagement, and personalized experiences. In this blog, we’ll explore how to automate data enrichment using Snowflake, a powerful data warehousing platform, and Generative AI techniques.Understanding Data EnrichmentData Enrichment is simply the practice of enhancing your existing datasets with additional and relevant information. This supplementary data can include demographic data, geographic data, social media profiles, and much more. The primary goal is to improve the quality and depth of your data - making it more valuable for analytics, reporting, and decision-making.Why Automate Data Enrichment?Automating data enrichment not only saves time and resources but also improves data quality, supports real-time updates, and helps organizations stay competitive in an increasingly data-centric world. Whether in e-commerce, finance, healthcare, marketing, or any other industry, automation can be a strategic advantage that allows you to extract greater value from your data.EfficiencyManual data enrichment is time-consuming and resource-intensive. Automation allows you to process large volumes of data rapidly, reducing the time and effort required.ConsistencyHuman errors are common when manually enriching data. Automation ensures the process is consistent and accurate, reducing the risk of errors affecting decision-making.ScalabilityAs your organization grows and accumulates more data, automating data enrichment ensures you can handle larger datasets without a proportional increase in human resources.Enhanced Data QualityAutomated processes can validate and cleanse data, leading to higher data quality. High-quality data is essential for meaningful analytics and reporting.Competitive AdvantageIn a competitive business landscape, having access to enriched and up-to-date data can give you a significant advantage. It allows for more accurate market analysis, better customer understanding, and smarter decision-making.PersonalizationAutomated data enrichment can support personalized customer experiences, which are increasingly crucial for businesses. 
It allows you to tailor content, product recommendations, and marketing efforts to individual preferences and behaviors.Cost-EfficiencyWhile there are costs associated with setting up and maintaining automated data enrichment processes, these costs can be significantly lower in the long run compared to manual efforts, especially as the volume of data grows.Compliance and Data SecurityAutomated processes can be designed to adhere to data privacy regulations and security standards, reducing the risk of data breaches and compliance issues.ReproducibilityAutomated data enrichment processes can be documented, version-controlled, and easily reproduced, making it easier to audit and track changes over time.Data VarietyAs the sources and formats of data continue to expand, automation allows you to efficiently handle various data types, whether structured, semi-structured, or unstructured.Snowflake for Data EnrichmentSnowflake, a cloud-based data warehousing platform, provides powerful features for data manipulation and automation. Snowflake at the basic can be used to:Create tables for raw data and enrichment data.Load data into these tables using the COPY INTO command.Create views to combine raw and enrichment data based on common keys.Code Examples: Snowflake Data EnrichmentCreate TablesIn Snowflake, create tables for your raw data and enrichment data with: -  Create a table for raw data CREATE OR REPLACE TABLE raw_data (             Id INT,             name  STRING,             email  STRING ); -  Create a table for enrichment data CREATE OR REPLACE TABLE enrichment_data (         email  STRING,         location STRING,         age INT ); Load Data: Loading raw and enrichment data into their respective tables. -  Load raw data COPY INTO raw_data (id, name, email) FROM @<raw_data_stage>/raw_data.csv FILE_FORMAT = (TYPE = CSV); -  Load enrichment data COPY INTO enrichment_data (email, location, age) FROM @<enrichment_data_stage>/enrichment_data.csv FILE_FORMAT = (TYPE = CSV);Automate Data EnrichmentCreate a view that combines raw and enrichment data.-  Create a view that enriches the raw data CREATE OR REPLACE VIEW enriched_data AS SELECT    rd.id,    rd.name,    ed.location,    ed.age, -  Use generative AI to generate a description for the enriched date <Generative_AI_function> (ed.location, ed.age) AS description FROM       raw_data  rd JOIN      enrichment_data ed ON      rd.email = ed.email;Leveraging Snowflake for Data EnrichmentUsing Snowflake for data enrichment is a smart choice, especially if your organization relies on this cloud-based data warehousing platform. Snowflake provides a robust set of features for data manipulation and automation, making it an ideal environment to enhance the value of your data. Here are a few examples of how you can use Snowflake for data enrichment:Data Storage and ManagementSnowflake allows you to store and manage your data efficiently by separating storage and computing resources, which provides a scalable and cost-effective way to manage large data sets. You can store your raw and enriched data within Snowflake, making it readily accessible for enrichment processes.Data EnrichmentYou can perform data enrichment by combining data from your raw and enrichment tables. 
Use SQL JOIN operations to bring together related data based on common keys, such as email addresses:--  Create a view that enriches the raw data CREATE OR REPLACE VIEW enriched_data AS SELECT    rd.id,    rd.name,    ed.location,    ed.age FROM       raw_data  rd JOIN      enrichment_data ed ON      rd.email = ed.email;Schedule UpdatesAutomate data enrichment by creating scheduled tasks within Snowflake. You can set up tasks to run at regular intervals, ensuring that your enriched data remains up to date.-- Example: Creating a scheduled task to update enriched data (runs daily at midnight UTC) CREATE OR REPLACE TASK update_enriched_data WAREHOUSE = <your_warehouse> SCHEDULE = 'USING CRON 0 0 * * * UTC' AS INSERT INTO enriched_data (id, name, location, age) SELECT     rd.id,     rd.name,     ed.location,     ed.age FROM      raw_data rd JOIN     enrichment_data ed ON    rd.email = ed.email;Security and ComplianceSnowflake provides robust security features and complies with various data privacy regulations. Ensure that your data enrichment processes adhere to the necessary security and compliance standards to protect sensitive information.Monitoring and OptimizationRegularly monitor the performance of your data enrichment processes. Snowflake offers tools for monitoring query execution so you can identify and address any performance bottlenecks. Optimization here is one of the crucial factors to ensure efficient data enrichment.Real-World ApplicationsData enrichment is a versatile tool with a wide range of real-world applications. Organizations across various sectors use it to improve their data quality, decision-making process, customer experiences, and overall operational efficiency. By augmenting their datasets with additional information, these organizations gain a competitive edge and drive innovation in their respective industries:E-commerce and RetailProduct Recommendations: E-commerce platforms use data enrichment to analyze customer browsing and purchase history. These enriched customer profiles help generate personalized product recommendations, increasing sales and customer satisfaction.Inventory Management: Retailers leverage enriched supplier data to optimize inventory management, ensuring they have the right products in stock at the right time.Marketing and AdvertisingCustomer Segmentation: Marketers use enriched customer data to create more refined customer segments. This enables them to tailor advertising campaigns and messaging for specific audiences, leading to higher engagement rates.Ad Targeting: Enriched demographic and behavioral data supports precise ad targeting. Advertisers can serve ads to audiences most likely to convert, reducing ad spend wastage.Financial ServicesCredit Scoring: Financial institutions augment customer data with credit scores, employment history, and other financial information to assess credit risk more accurately.Fraud Detection: Banks use data enrichment to detect suspicious activities by analyzing transaction data enriched with historical fraud patterns.HealthcarePatient Records: Healthcare providers enhance electronic health records (EHR) with patient demographics, medical histories, and test results. 
This results in comprehensive and up-to-date patient profiles, leading to better care decisions.Drug Discovery: Enriching molecular and clinical trial data accelerates drug discovery and research, potentially leading to breakthroughs in medical treatments.Social Media and Customer SupportSocial Media Insights: Social media platforms use data enrichment to provide businesses with detailed insights into their followers and engagement metrics, helping them refine their social media strategies.Customer Support: Enriched customer profiles enable support teams to offer more personalized assistance, increasing customer satisfaction and loyalty.ConclusionAutomating data enrichment with Snowflake and Generative AI is a powerful approach for businesses seeking to gain a competitive edge through data-driven insights. By combining a robust data warehousing platform with advanced AI techniques, you can efficiently and effectively enhance your datasets. Embrace automation, follow best practices, and unlock the full potential of your enriched data.Author BioShankar Narayanan (aka Shanky) has worked on numerous different cloud and emerging technologies like Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps to name a few. He has led the architecture design and implementation for many Enterprise customers and helped enable them to break the barrier and take the first step towards a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and Snowflake Data Cloud. Shanky likes to contribute back to the community. He contributes to open source, is a frequently sought-after speaker, and has delivered numerous talks on Microsoft Technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and an SAP Community Topic Leader by SAP.
Improvising your Infrastructure as Code using AI-tools

Russ McKendrick
11 Jun 2023
6 min read
In Infrastructure as Code for Beginners, I briefly discussed the role of AI tools such as ChatGPT. Since then, new features have emerged, making this the perfect time to revisit the topic and evaluate whether my initial recommendations have evolved. Let's quickly recap: GPT-powered chat services, including ChatGPT, certainly have their place in Infrastructure as Code (IaC) projects. However, we are not quite at the point where we can ask these tools to create our projects from scratch. The primary reason for this is straightforward: GPT, while impressive, can make mistakes. This is fine, except that when GPT errors, it often does so by inventing information, which it then presents in a highly convincing manner. On the one hand, you have Infrastructure as Code which has been taking away the heavy lifting of managing infrastructure for quite a while; a lot of the tooling is well established even though it is still considered a fast-moving technology to most.On the contrary,  artificial intelligence tools such as ChatGPT show up and quite dramatically change what we thought was possible even just a year ago, impacting how we create, test, and maintain code. Here are just some of the use cases I have been using the two together for in the last few months.●      Code Quality: You can have ChatGPT check your code and make recommendations,●      Cutting Down on Human Error: ChatGPT can double-check and make sure you are not doing something dumb if you ask it to.●      Boosting Productivity: You can ask ChatGTP to add inline comments to your code, update variables, and handle the boring stuff. Remember, we're just scratching the surface here. The combo of AI and IaC is still fresh, and there's a lot of territory left to explore.Improvising your Terraform code with ChatGPTRather than talk anymore, let’s take a look at implementing of the use cases mentioned above, to start with let’s have ChatGPT review an example Terraform block and make some recommendations on how it could be improved.Here is the prompt that we can use: I have the following Terraform code; it creates a Storage Account in Microsoft Azure; resource "azurerm_storage_account" "example" {name                     = "saiacforbeg2022111534"resource_group_name      = azurerm_resource_group.example.namelocation                 = azurerm_resource_group.example.locationaccount_tier             = "Standard"account_replication_type = "GRS"} Could you check it and provide recommendations on how it could be improved? The prompt above made the following recommendations, all of which are valid observations: Image 1: Recommendations that ChatGPT gave me.As well as rewriting the code considering its recommendations, it also recognized that we are deploying an Azure Storage Account and pointed out the following: Additionally, please remember that the storage account names in Azure need to be globally unique and must be between 3 and 24 characters in length, and can include numbers and lowercase letters only. Thus, you might want to consider adding checks to ensure that these constraints are met. 
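That last note about naming rules is easy to act on even before ChatGPT rewrites anything. As a rough illustration — the helper below is my own sketch rather than ChatGPT output, and it only covers the length and character rules, since global uniqueness can only be confirmed against Azure itself — a small Python pre-flight check might look like this:

import re

def is_valid_storage_account_name(name: str) -> bool:
    # Apply the constraints ChatGPT flagged: 3-24 characters, lowercase letters and digits only
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None

if __name__ == "__main__":
    # The name from the Terraform example above
    candidate = "saiacforbeg2022111534"
    if not is_valid_storage_account_name(candidate):
        raise SystemExit(f"Invalid storage account name: {candidate}")
    print(f"{candidate} passes the basic naming checks (global uniqueness still needs to be verified in Azure)")

A check along these lines can run in CI before terraform plan, so names Azure would reject are caught without waiting for a failed deployment.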
Now let’s ask ChatGPT to add some inline comments which explain what is going to the main block of code using the following prompt :Please add inline comments explaining what is happening to the code below … ChatGPT gave the following response:Image 2: Fully commented codeFinally, we can ask ChatGTP to check and make some suggestions about how we could better secure our Storage Account using the following prompt:Image 3:  list of recommendations for improving our Terraform code.Let’s see if ChatGPT could update the code with some of those suggestions with the following prompt and also have a look at the result: Image 4:  recommendations within the context of the original Terraform code. GPT vs Bing vs BardHow do other AI Chatbots handle our original prompt, let’s start with Bing; Image 5: Bing Chat had to say about our Terraform code.Well, that wasn’t helpful at all - let’s now see if Google Bard fares any better:Image 6: Google Bard had to say about our Terraform code.It didn’t just give us a thumbs up, which is a good start. It made similar recommendations to ChatGPT. However, while Bard made some suggestions, they were not as detailed, and it didn’t point out anything outside of what we asked it to, unlike ChatGPT, which recognized we were creating a storage account and gave us some additional pointers.Now don’t write Microsoft off just yet. While Bing didn’t blow me away with its initial responses, Microsoft’s various Copilot services, all of which are launching soon, all of which are GPT4 based, I expect them to be up to the same level of responsiveness of ChatGTP and in the case of GitHub Copilot X I expect it to potentially surpass ChatGTP in terms of usefulness and convenience as it is going to be built directly into your IDE and have all of GitHub to learn from.Given the rate of change, which feels like new tools and features and power is being added every other day now - the next 6 to 12 months is going to be an exciting time for anyone writing Infrastructure as Code as the tools we are using day-to-day start get more direct access to the AI tools we have been discussing.SummaryIn conclusion, this article delved into the process of improving code quality through various means, including seeking recommendations, incorporating inline code comments, and leveraging the capabilities of ChatGPT, Bing, and Google Bard. The demonstration emphasized the importance of continuously enhancing code and actively involving AI-powered tools to identify areas for improvement. Additionally, the article explored how ChatGPT's insights and suggestions were utilized to enhance storage security through stronger and more reliable code implementation. By employing these techniques, developers can achieve code optimization and bolster the overall security of their storage systems.Author BioRuss McKendrick is an experienced DevOps practitioner and system administrator with a passion for automation and containers. He has been working in IT and related industries for the better part of 30 years. During his career, he has had responsibilities in many different sectors, including first-line, second-line, and senior support in client-facing and internal teams for small and large organizations.He works almost exclusively with Linux, using open-source systems and tools across dedicated hardware and virtual machines hosted in public and private clouds at Node4, where he holds the title of practice manager (SRE and DevOps). He also buys way too many records!Author of the book: Infrastructure as Code for Beginners
Use Bard to Implement Data Science Projects

Darren Broemmer
20 Jun 2023
7 min read
Bard is a large language model (LLM) from Google AI, trained on a massive dataset of text and code. Bard can be used to generate Python code for data science projects. This can be extremely helpful for data scientists who want to save time on coding or are unfamiliar with Python. It also empowers those of us who are not full-time data scientists but have an interest in leveraging machine learning (ML) technologies.The first step is to define the problem you are trying to solve. In this article, we will use Bard to create a binary text classifier. It will take a news story as input and classify it as either fake or real. Given a problem to solve, you can brainstorm solutions. If you are familiar with machine learning technologies, you are able to do this yourself. Alternatively, you can ask Bard for help in finding an appropriate algorithm that meets your requirements. The classification of text documents often uses term frequency techniques. We don’t need to know more than that at this point, as we can have Bard help us with the details of the implementation.The overall design of your project could also involve feature engineering and visualization methods. As in most software engineering efforts, you will likely need to iterate on the design and implementation. However, Bard can help you do this much faster.Reading and parsing the training dataAll of the code from this article can be found on GitHub. Bard will guide you, but to run this code, you will need to install a few packages using the following commands.python -m pip install pandas python -m pip install scikit-learn To train our model, we can use the news.csv data set within this project found here, originally sourced from a Data Flair training exercise. It contains the title and text of almost 8,000 news articles labeled as REAL or FAKE.To get started , Bard can help us write code to parse and read this data file. Pandas is a popular open-source data analysis and manipulation tool for Python. We can prompt Bard to use this library to read the file.Image 1: Using Pandas to read the fileRunning the code shows the format of the csv file and the first few data rows, just as Bard described. It has an unnamed article id in the first column followed by the title, text, and classification label.broemmerd$ python test.py   Unnamed: 0                                              title                                              text label 0        8476                       You Can Smell Hillary’s Fear  Daniel Greenfield, a Shillman Journalism Fello...  FAKE 1       10294  Watch The Exact Moment Paul Ryan Committed Pol...  Google Pinterest Digg Linkedin Reddit Stumbleu...  FAKE 2        3608        Kerry to go to Paris in gesture of sympathy  U.S. Secretary of State John F. Kerry said Mon...  REAL 3       10142  Bernie supporters on Twitter erupt in anger ag...  — Kaydee King (@KaydeeKing) November 9, 2016 T...  FAKE 4         875   The Battle of New York: Why This Primary Matters  It's primary day in New York and front-runners...  REALTraining our machine learning modelNow that we can read and understand our training data, we can prompt Bard to write code to train an ML model using this data. Our prompt is detailed regarding the input columns in the file used for training. However, it specifies a general ML technique we believe is applicable to the solution.The text column in the news.csv contains a string with the content from a new article. The label column contains a classifier label of either REAL or FAKE. 
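For reference, the snippet Bard produced in this session only appears in the screenshot (Image 1), so here is a minimal sketch of the kind of code it typically returns for this prompt — the news.csv filename and the bare print of the first rows are assumptions based on the console output shown above:

import pandas as pd

# Load the dataset of labeled news articles (assumed to be saved locally as news.csv)
df = pd.read_csv("news.csv")

# Print the first few rows to confirm the id, title, text, and label columns
print(df.head())

Running something along these lines reproduces the output above and confirms the columns we need before moving on to training.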
Modify the Python code to train a machine-learning model using term frequency based on these two columns of data. Image 2: Bard for training ML modelsWe can now train our model. The output of this code is shown below: broemmerd$ python test.py Accuracy: 0.9521704814522494We have our model working. Now we just need a function that will apply it to a given input text. We use the following prompt. Modify this code to include a function called classify_news that takes a text string as input and returns the classifier, either REAL or FAKE.Bard generates the following code for this function. Note that it also refactored the previous code to include the use of the TfidfVectorizor in order to support this function.Image 3: Including classify_news function Testing the classifierTo test the classifier with a fake story, we will use an Onion article entitled “Chill Juror Good With Whatever Group Wants To Do For Verdict.” The Onion is a satirical news website known for its humorous and fictional content. Articles in The Onion are intentionally crafted to appear as genuine news stories, but they contain fictional, absurd elements for comedic purposes.Our real news story is a USA Today article entitled “House blocks push to Censure Adam Schiff for alleging collusion between Donald Trump and Russia.”Here is the code that reads the two articles and uses our new function to classify each one. The results are shown below.with open("article_the_onion.txt", "r") as f:   article_text = f.read()   print("The Onion article: " + classify_news(article_text)) with open("article_usa_today.txt", "r") as f:   article_text = f.read()   print("USA Today article: " + classify_news(article_text)) broemmerd$ python news.py Accuracy: 0.9521704814522494 The Onion article: FAKE USA Today article: REALOur classifier worked well on these two test cases.Bard can be a helpful tool for data scientists who want to save time on coding or who are not familiar with Python. By following a process similar to the one outlined above, you can use Bard to generate Python code for data science projects.Additional guidance on coding with BardWhen using Bard to generate Python code for data science projects, be sure to use clear and concise prompts. Provide the necessary detail regarding the inputs and the desired outputs. Where possible, use specific examples. This can help Bard generate more accurate code. Be patient and expect to go through a few iterations until you get the desired result. Test the generated code at each step in the process. It will be difficult to determine the cause of errors if you wait until the end to start testing.Once you get familiar with the process, you can use Bard to generate Python code that can help you solve data science problems more quickly and easily.Author BioDarren Broemmer is an author and software engineer with extensive experience in Big Tech and Fortune 500. He writes on topics at the intersection of technology, science, and innovation.LinkedIn  
Building a GAN model in TensorFlow easily with AutoGPT

Rohan Chikorde
08 Jun 2023
7 min read
AutoGPT is a large language model that can be used to perform a variety of tasks, such as generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way. To use AutoGPT, you first need to specify a goal in natural language. AutoGPT will then attempt to achieve that goal by breaking it down into sub-tasks and using the internet and other tools in an automatic loop.In this blog, we will be using godmode.space for the demo. Godmode is a web platform to access the powers of AutoGPT and Baby AGI. AI agents are still in their infancy, but they are quickly growing in capabilities, and hope that Godmode will enable more people to tap into autonomous AI agents even in this early stage.First, you would need to create your API key and Install AutoGPT as shown in this article (Set Up and Run Auto-GPT with Docker). Once you set up your API key, now click on the Settings tab in godmode.space website and put in your API key. Once you click on Done, the following screen should appear:Image 1: API KeySpecifying the GoalWhen you specify a goal in natural language, AutoGPT will attempt to understand what you are asking for. It will then break down the goal into sub-tasks and if needed it will use the internet and other tools to achieve the goal.In this example, we will tell AutoGPT to Build a Generative Adversarial Network (GAN) from Scratch with TensorFlow.Automatically breaking down a goal into sub-tasksAutoGPT breaks down a goal into sub-tasks by identifying the steps that need to be taken to achieve the goal. For example, if the goal is to build a GANs, AutoGPT will break the goal down into the following sub-tasks (these goals will change based on the user’s prompt):·        Define the generator and discriminator architectures.·        Implement the training loop for the GAN.·        Evaluate the performance of the GAN on a test dataset.Here is an example of how you can use AutoGPT to “Build a Generative Adversarial Network (GAN) from Scratch with TensorFlow”: Specify a goal by adding a simple prompt Build a Generative Adversarial Network (GAN) from Scratch with TensorFlow                                                       Image 2: Specifying the final goalThe provided functionality will present suggested options, and we will choose all the available options. Additionally, you have the flexibility to provide your own custom inputs if desired as shown in the preceding image:Image 3: Breaking down the goal into smaller tasks  Break down the goal into sub-tasks as follows:  Define the generator and discriminator architectures.Write the code for a Generative Adversarial Network (GAN) to a file named gan.py. Implement the training loop for the GAN. Write text to a file named 'gan.py' containing code for a GAN model.Evaluate the performance of the GAN on a test dataset.Image 4: Broken down sub-taskNext, we will launch the AutoGPT in the God Mode.Image 5: Launching AutoGPT in God ModeAs observed, the agent has initiated and commenced by sharing its thoughts and reasoning and the proposed action as shown in the two images below:Images 6: Plan proposed by Auto-GPTConsidering the provided insights and rationale, we are in favor of approving this plan. It is open to receiving input and feedback, which will be considered to refine the subgoals accordingly.After approving the plan, the following are the results:Image 7: AutoGPT providing thoughts and reasoning for the task 8.      
According to the next task specified, the next plan suggested by the AutoGPT is to write the entire code in gan.py file:Image 8: Plan to copy the entire code to a file On approving this plan, the action will commence and we will find the following screen as output after the task completion:Image 9: Thoughts and reasoning for the previous taskUpon opening the file, we can observe that the initial structure has been created. The necessary submodules have been generated, and the file is now prepared to accommodate the addition of code. Additionally, the required libraries have been determined and imported for use in the file”Image 10: Initial structure created inside the fileWe will proceed with AutoGPT generating the complete code and then review and approve the plan accordingly.Following is the snapshot of the generated output code, we can observe that AutoGPT has successfully produced the complete code and filled in the designated blocks according to the initial structure provided:(Please note that the snapshot provided showcases a portion of the generated output code. It is important to keep in mind that the complete code is extensive and cannot be fully provided here due to its size)Image 11: Final output AutoGPT Industry-specific use casesData Science and Analytics: AutoGPT can be used to automate tasks such as data cleaning, data wrangling, and data analysis. This can save businesses time and money, and it can also help them to gain insights from their data that they would not have been able to obtain otherwise.Software Development: AutoGPT can be used to generate code, which can help developers to save time and improve the quality of their code. It can also be used to create new applications and features, which can help businesses to stay ahead of the competition.Marketing and Content Creation: AutoGPT can be used to generate marketing materials such as blog posts, social media posts, and email campaigns. It can also be used to create creative content such as poems, stories, and scripts. This can help businesses to reach a wider audience and to engage with their customers in a more meaningful way.Customer Service: AutoGPT can be used to create chatbots that can answer customer questions and resolve issues. This can help businesses to provide better customer service and to save money on labor costs.Education: AutoGPT can be used to create personalized learning experiences for students. It can also be used to generate educational content such as textbooks, articles, and videos. This can help students to learn more effectively and efficiently.ConclusionAutoGPT is a powerful AI tool that has the potential to revolutionize the way we work and live. It can be used to automate tasks, generate creative content, and solve problems in a variety of industries. As AutoGPT continues to develop, it will become even more powerful and versatile. This makes it an incredibly exciting tool with the potential to change the world.Author BioRohan Chikorde is an accomplished AI Architect professional with a post-graduate in Machine Learning and Artificial Intelligence. With almost a decade of experience, he has successfully developed deep learning and machine learning models for various business applications. Rohan's expertise spans multiple domains, and he excels in programming languages such as R and Python, as well as analytics techniques like regression analysis and data mining. In addition to his technical prowess, he is an effective communicator, mentor, and team leader. 
Rohan's passion lies in machine learning, deep learning, and computer vision. LinkedIn
Simplifying Code Review using Google Bard

Darren Broemmer
11 Jun 2023
7 min read
Code reviews are an essential part of the software development process. They help improve the quality of code by identifying potential bugs and suggesting improvements. They ensure that the code is readable and maintainable. However, conducting code reviews takes time out of an engineer's busy schedule. There is also the potential that latent issues in the code may be overlooked. Humans do make mistakes after all. This is where Google Bard comes into play.Bard is a tireless and efficient code reviewer that you can put to work anytime. It is a Large Language Model (LLM) that was trained on a massive dataset of text and code. White Bard is still listed as experimental, it has learned how to review and explain code written in many different programming languages.How can Google Bard be used for code reviews?Code reviews typically involve three basic steps: Understand: The reviewer first reads the code and comprehends its purpose and structure.Identify: Potential bugs or issues are noted. Gaps in functionality may be called out. Error handling is reviewed, including edge cases that may not be handled properly.Optimize: Areas of improvement are noted, and refactoring suggestions are made. These suggestions may be in terms of maintainability, performance, or other area Fortunately, Bard can help in all three code review phases.Ask Bard for an explanation of the codeBard can generate a summary description that explains the purpose of the code along with an explanation of how it works. Simply ask Bard to explain what the code does. Code can be directly included in the prompt, or you can reference a GitHub project or URL. Keep in mind, however, that you will get more detailed explanations when asking about smaller amounts of code. Likewise, if you use the following prompt asking about the BabyAGI repo, you will get a summary description.Image 1: Google Bard’s explanation on Baby AGI The interesting thing about Bard’s explanation is that not only did it explain the code in human terms, but it also provided a pseudo-code snippet to make it easy to comprehend. This is impressive and extremely useful as a code reviewer. You have a roadmap to guide you through the rest of the review. Developers who are unfamiliar with the code or who need to understand it better can leverage this Bard capability.Given that LLMs are very good at text summarization, it is not surprising that they can do the same for code. This gives you a foundation for understanding the AGI concept and how BabyAGI implements it. Here is a highlight of the overview code Bard provided:def create_plan(text):    """Creates a plan that can be executed by the agent."""    # Split the text into a list of sentences.    sentences = text.split(".")    # Create a list of actions.    actions = []    for sentence in sentences:        action = sentence.strip()       actions.append(action)    return actions def main():    """The main function."""    # Get the task from the user.    task = input("What task would you like to complete? ")    # Generate the text that describes how to complete the task.    text = get_text(task)    # Create the plan.    plan = create_plan(text)    # Print the plan.    print(plan) if __name__ == "__main__":    main()BabyAGI uses a loop to continue iterating on nested tasks until it reaches its goal or is stopped by the user. Given a basic understanding of the code, reviews become much more effective and efficient.Identifying bugs using BardYou can also ask Bard to identify potential bugs and errors. 
Identifying bugs using Bard

You can also ask Bard to identify potential bugs and errors. Bard will analyze the code and look for patterns associated with bugs. It will point out edge cases or validations that may be missing. It also looks for code that is poorly formatted, has unused variables, or contains complex logic.

Let's have Bard review its own summary code shown above. The get_text function has been modified to use the Bard API. Our review prompt is as follows; it includes the code to review inline.

Review the following Python code for potential bugs or errors:

import requests
import json

def get_text(task):
    """Generates text that describes how to complete a given task."""
    bard = Bard()
    answer = bard.get_answer(task)
    return answer['content']

def create_plan(text):
    """Creates a plan that can be executed by the agent."""
    # ... remainder of code omitted here for brevity ...

Image 2: Google Bard's explanation of reviewed code

Bard pointed out a number of potential issues. It noted that error handling is missing around the Bard API call. It commented that input validation should be added, as well as default values that could make the code more user-friendly.

Optimizations using Google Bard

Bard already provided some suggested improvements during its review. You can also ask Bard specifically to suggest optimizations to your code. Bard does this by analyzing the code and looking for opportunities to improve its efficiency, reliability, or security. For example, Bard can suggest ways to simplify code, improve performance, or make it more secure.

Let's do this for the main BabyAGI Python script. Instead of referencing the GitHub repository as we did in the explanation phase, we provide the main loop code inline so that Bard can give more detailed feedback. The structure of the prompt is similar to the prior phase.

Review the following Python code for potential optimizations:

def main():
    loop = True
    while loop:
        # As long as there are tasks in the storage...
        # ... remainder of function omitted here for brevity ...

Image 3: Google Bard's explanation of reviewed code

While the print suggestion is not particularly helpful, Bard does make an interesting suggestion regarding the sleep function. Beyond pointing out that it may become a bottleneck, it highlights that the sleep function used is not thread-safe. While time.sleep is blocking, asyncio.sleep is non-blocking: awaiting asyncio.sleep asks the event loop to run something else while the await statement finishes its execution. If we were to incorporate this script into a more robust web application, this would be an important optimization to make.
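To make that suggestion concrete, here is a small sketch of the difference between the two sleep calls. The queue-processing loop below is a simplified stand-in for BabyAGI's main loop, not the project's actual code.

import asyncio
import time

def blocking_loop(tasks):
    # time.sleep blocks the entire thread; nothing else runs while we wait.
    for task in tasks:
        print(f"Processing: {task}")
        time.sleep(1)

async def non_blocking_loop(tasks):
    # asyncio.sleep suspends only this coroutine; the event loop is free
    # to run other coroutines while we wait.
    for task in tasks:
        print(f"Processing: {task}")
        await asyncio.sleep(1)

async def heartbeat():
    # A second coroutine that keeps making progress during the sleeps above.
    for _ in range(4):
        print("...event loop still responsive...")
        await asyncio.sleep(0.5)

async def concurrent_demo():
    await asyncio.gather(non_blocking_loop(["task A", "task B"]), heartbeat())

if __name__ == "__main__":
    blocking_loop(["task 1", "task 2"])  # runs serially, blocking throughout
    asyncio.run(concurrent_demo())       # the two coroutines interleave

In a single-purpose script the blocking call is harmless, but inside a web application or any service that shares an event loop, the non-blocking version keeps the rest of the system responsive, which is exactly the point Bard raised.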
General tips for using Bard during code reviews

Here are some tips for using Bard for code reviews:

- Be specific where possible. For example, you can ask for specific types of improvements and optimizations during your code review. The more specific the request, the better Bard will be able to respond to it.
- Provide detailed context where needed. The more information Bard has, the more effective the review feedback will be.
- Be open to feedback and suggestions from Bard, even if they differ from your own opinions. Consider and evaluate the proposed changes before implementing them.

Conclusion

Google Bard is a powerful tool that can be used to improve the quality of code. Bard can help to identify potential bugs and errors, suggest improvements, and provide documentation and explanations. This can help to improve the efficiency, quality, and security of code.

If you are interested in using Google Bard for code reviews, you can sign up for the early access program. To learn more, visit the Google Bard website.

Author Bio

Darren Broemmer is an author and software engineer with extensive experience in Big Tech and Fortune 500 companies. He writes on topics at the intersection of technology, science, and innovation.

LinkedIn


Simplify Customer Segmentation with PandasAI

Gabriele Venturi
31 Oct 2023
6 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!

Understanding customer needs is critical for business success. Segmenting customers into groups with common traits allows for targeted products, marketing, and services. This guide walks through customer segmentation using PandasAI, a Python library that makes the process easy and accessible.

Overview

We'll work through segmenting sample customer data step by step using PandasAI's conversational interface. Specifically, we'll cover:

- Loading and exploring customer data
- Selecting segmentation features
- Determining the optimal number of clusters
- Performing clustering with PandasAI
- Analyzing and describing the segments

Follow along with the explanations and code examples below to gain hands-on experience with customer segmentation in Python.

Introduction

Customer segmentation provides immense value by enabling tailored messaging and product offerings. But for many, the technical complexity makes segmentation impractical. PandasAI removes this barrier by automating the process via simple conversational queries.

In this guide, we'll explore customer segmentation hands-on by working through examples using PandasAI. You'll learn how to load data, determine clusters, perform segmentation, and analyze the results. The steps are accessible even without extensive data science expertise. By the end, you'll be equipped to leverage PandasAI's capabilities to unlock deeper customer insights. Let's get started!

Step 1 - Load Customer Data

We'll use a fictional customer dataset customers.csv containing 5000 rows with attributes like demographics, location, transactions, etc. Let's load it with Pandas:

import pandas as pd

customers = pd.read_csv("customers.csv")

Preview the data:

customers.head()

This gives us a sense of the available attributes for segmentation.

Step 2 - Select Segmentation Features

Now we need to decide which features to use for creating customer groups. For this example, let's select:

- Age
- Gender
- City
- Number of transactions

Extract these into a new DataFrame:

segmentation_features = ['age', 'gender', 'city', 'transactions']
customer_data = customers[segmentation_features]

Step 3 - Determine Optimal Number of Clusters

A key step is choosing the appropriate number of segments k. Too few reduces distinction; too many makes the segments less useful. Traditionally, without PandasAI, we would apply the elbow method to identify the optimal k value for the data.
Something like this:

from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
import pandas as pd
import matplotlib.pyplot as plt

# Handle missing values by imputing with the most frequent value in each column
imputer = SimpleImputer(strategy='most_frequent')
df_imputed = pd.DataFrame(imputer.fit_transform(customers), columns=customers.columns)

# Perform one-hot encoding for the 'gender' and 'city' columns
encoder = OneHotEncoder(sparse=False)
gender_city_encoded = encoder.fit_transform(df_imputed[['gender', 'city']])

# Concatenate the encoded columns with the original DataFrame
df_encoded = pd.concat([df_imputed, pd.DataFrame(gender_city_encoded, columns=encoder.get_feature_names_out(['gender', 'city']))], axis=1)

# Drop the original 'gender' and 'city' columns as they're no longer needed after encoding
df_encoded.drop(columns=['gender', 'city'], inplace=True)

# Calculate SSE for k = 1 to 8
sse = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k)
    km.fit(df_encoded)
    sse[k] = km.inertia_

# Plot the elbow curve
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Number of Clusters")
plt.ylabel("SSE")
plt.show()

Examining the elbow point, 4 looks like a good number of clusters for this data, so we'll create 4 clusters.

Too complicated? You can easily let PandasAI do it for you. Wrap the data in a SmartDataframe and ask:

from pandasai import SmartDataframe

sdf = SmartDataframe(customers)
sdf.chat("What is the ideal amount of clusters for the given dataset?")
# 4

PandasAI uses a silhouette score under the hood to calculate the optimal number of clusters based on your data. The silhouette score is a metric used to evaluate the goodness of a clustering model: it measures how well each data point fits within its assigned cluster versus how well it would fit within other clusters. PandasAI leverages silhouette analysis to pick the optimal number of clusters for k-means segmentation based on which configuration offers the best coherence within clusters and distinction between clusters for the given data.

Step 4 - Perform Clustering with PandasAI

Now we'll use PandasAI to handle the clustering based on these insights. Using the SmartDataframe initialized above, simply ask PandasAI to cluster:

segments = sdf.chat("""
Segment customers into 4 clusters based on their
age, gender, city and number of transactions.
""")

This performs k-means clustering and adds the segment labels to the original data. Let's inspect the results:

print(segments)

Each customer now has a cluster segment assigned.

Step 5 - Analyze and Describe Clusters

With the clusters created, we can derive insights by analyzing them:

centers = segments.chat("Show cluster centers")
print(centers)
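If you want to sanity-check or extend what PandasAI returns, you can also profile the segments directly with plain pandas. The sketch below assumes that segments behaves like a regular pandas DataFrame and that the labels were written to a column named 'cluster'; both are assumptions about the output rather than something the library guarantees.

# Sketch: profiling the segments with plain pandas.
# Assumes `segments` is a pandas DataFrame holding the original features
# plus a 'cluster' label column (an assumption about the column name).
profile = segments.groupby('cluster').agg(
    avg_age=('age', 'mean'),
    avg_transactions=('transactions', 'mean'),
    customers=('age', 'size'),
)
print(profile)

# Most common city and gender in each cluster.
print(segments.groupby('cluster')['city'].agg(lambda s: s.mode().iloc[0]))
print(segments.groupby('cluster')['gender'].agg(lambda s: s.mode().iloc[0]))

Summaries like these make it easier to attach a human-readable label to each segment, for example "younger, high-frequency urban customers", before building campaigns around it.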
Step 6 - Enrich Analysis with Additional Data

Our original dataset contained only a few features. To enhance the analysis, we can join the clustered data with additional customer information such as:

- Purchase history
- Customer lifetime value
- Engagement metrics
- Product usage
- Ratings/reviews

Bringing in other datasets allows drilling down into each segment with a deeper perspective. For example, we could join the review history and analyze customer satisfaction patterns within each cluster:

# Join review data to the segmented dataset
enriched_data = pd.merge(segments, reviews, on='id')

# Review score statistics for each cluster
enriched_data.groupby('cluster').review_score.agg(['mean', 'median', 'count'])

This provides a multidimensional view of our customers and segments, unlocking richer insights and enabling a more in-depth analysis with additional aggregate metrics for each cluster.

Conclusion

In this guide, we worked through segmenting sample customer data step by step using PandasAI. The key aspects covered were:

- Loading customer data and selecting relevant features
- Using the elbow method to determine the optimal number of clusters
- Performing k-means clustering via simple PandasAI queries
- Analyzing and describing the created segments

Segmentation provides immense value through tailored products and messaging. PandasAI makes the process accessible even without extensive data science expertise. By automating complex tasks through conversation, PandasAI allows you to gain actionable insights from your customer data.

To build on this, additional data like customer lifetime value or engagement metrics could provide an even deeper understanding of your customers. The key is asking the right questions – PandasAI handles the hard work to uncover meaningful answers from your data.

Now you're equipped with hands-on experience leveraging PandasAI to simplify customer segmentation in Python.

Author Bio

Gabriele Venturi is a software engineer and entrepreneur who started coding at the young age of 12. Since then, he has launched several projects across gaming, travel, finance, and other spaces, contributing his technical skills to various startups across Europe over the past decade.

Gabriele's true passion lies in leveraging AI advancements to simplify data analysis. This mission led him to create PandasAI, released open source in April 2023. PandasAI integrates large language models into the popular Python data analysis library Pandas. This enables an intuitive conversational interface for exploring data through natural language queries.

By open-sourcing PandasAI, Gabriele aims to share the power of AI with the community and push boundaries in conversational data analytics. He actively contributes as an open-source developer dedicated to advancing what's possible with generative AI.

Microsoft Fabric: Revolutionizing Data Analytics and AI Integration

Julian Melanson
15 Jun 2023
7 min read
In today's data-driven world, organizations are constantly seeking ways to unlock the full potential of their data and seamlessly integrate artificial intelligence into their operations. Addressing this need, Microsoft has introduced Microsoft Fabric at the Build 2023 developer conference. This cutting-edge data analytics platform brings together essential data and analytics tools into a unified software-as-a-service (SaaS) product. In this article, we will explore the features of Microsoft Fabric, uncovering how it simplifies AI integration and revolutionizes data analytics, empowering organizations to harness the power of their data.

A Holistic Approach to Data Analytics

Microsoft Fabric offers a holistic solution by integrating platforms such as Data Factory, Synapse, and Power BI into a cohesive and comprehensive platform. By replacing disparate systems with a simplified and cost-effective platform, Microsoft Fabric provides organizations with a unified environment for AI integration. It consolidates essential components like data integration, data engineering, data warehousing, data science, real-time analytics, applied observability, and business intelligence into a single ecosystem, equipping data professionals with a comprehensive suite of tools.

Core Workloads in Microsoft Fabric

Microsoft Fabric supports seven core workloads that cater to various aspects of data analytics and AI integration, ensuring a seamless end-to-end experience:

1. Data Factory: This workload facilitates seamless data integration by offering over 150 connectors to popular cloud and on-premises data sources. With its user-friendly drag-and-drop functionality, Data Factory simplifies data integration tasks.
2. Synapse Data Engineering: With a focus on data engineering, this workload promotes collaboration among data professionals, streamlining the data engineering workflow and ensuring efficient data processing.
3. Synapse Data Science: Designed for data scientists, this workload enables the development, training, deployment, and management of AI models. It offers an end-to-end workflow and infrastructure support for streamlined AI development.
4. Synapse Data Warehousing: This workload provides a converged lakehouse and data warehousing experience. Optimizing SQL performance on open data formats, it offers efficient and scalable data warehousing capabilities.
5. Synapse Real-Time Analytics: Tailored for handling high volumes of streaming data, this workload enables developers to analyze semi-structured data in real time with minimal latency. It caters to IoT devices, telemetry, and log data.
6. Power BI Integration: Microsoft Fabric seamlessly integrates powerful visualization capabilities and AI-driven analytics from Power BI. This integration empowers business analysts and users to derive valuable insights through visually appealing representations.
7. Data Activator: This workload offers a no-code experience for real-time data detection and monitoring. By triggering notifications and actions based on predefined data patterns, Data Activator ensures proactive data management.

Simplifying AI Development

Microsoft Fabric addresses the challenges organizations face in AI development by providing a unified platform that encompasses all necessary capabilities to extract insights from data and make them accessible to AI models or end users. To enhance usability, Fabric integrates its own copilot tool, allowing users to interact with the platform using natural language commands and a chat-like interface.
This feature streamlines code generation, query creation, AI plugin development, custom Q&A, visualization creation, and more.

The Power of OneLake and Integration with Microsoft 365

Microsoft Fabric is built upon the OneLake open data lake platform, which serves as a central repository for data. OneLake eliminates the need for data extraction, replication, or movement, acting as a single source of truth. This streamlined approach enhances data governance while offering a scalable pricing model that aligns with usage. Furthermore, Fabric seamlessly integrates with Microsoft 365 applications, enabling users to discover and analyze data from OneLake directly within tools such as Microsoft Excel. This integration empowers users to generate Power BI reports with a single click, transforming data analysis into a more intuitive and efficient process. Additionally, Fabric enables users of Microsoft Teams to bring data directly into their chats, channels, meetings, and presentations, expanding the reach of data-driven insights. Moreover, sales professionals using Dynamics 365 can leverage Fabric and OneLake to unlock valuable insights on customer relationships and business processes.

Getting Started with Microsoft Fabric

Follow these steps to start your Fabric (Preview) trial:

1. Open the Fabric homepage and select the Account Manager.
2. In the Account Manager, select Start Trial.
3. If prompted, agree to the terms and then select Start Trial.
4. Once your trial capacity is ready, you receive a confirmation message. Select Got it to begin working in Fabric.
5. Open your Account Manager again. Notice that you now have a heading for Trial status. Your Account Manager keeps track of the number of days remaining in your trial. You also see the countdown in your Fabric menu bar when you work on a product experience.

You now have a Fabric (Preview) trial that includes a Power BI individual trial (if you didn't already have a Power BI paid license) and a Fabric (Preview) trial capacity.

Below is a list of end-to-end tutorials available in Microsoft Fabric, with links to Microsoft's website.

Multi-experience tutorials

The following tutorials span multiple Fabric experiences:

- Lakehouse: In this tutorial, you ingest, transform, and load the data of a fictional retail company, Wide World Importers, into the lakehouse and analyze sales data across various dimensions.
- Data Science: In this tutorial, you explore, clean, and transform a taxicab trip dataset, and build a machine-learning model to predict trip duration at scale on a large dataset.
- Real-Time Analytics: In this tutorial, you use the streaming and query capabilities of Real-Time Analytics to analyze the New York Yellow Taxi trip dataset. You uncover essential insights into trip statistics, taxi demand across the boroughs of New York, and other related insights.
- Data warehouse: In this tutorial, you build an end-to-end data warehouse for the fictional Wide World Importers company. You ingest data into a data warehouse, transform it using T-SQL and pipelines, run queries, and build reports.

Experience-specific tutorials

The following tutorials walk you through scenarios within specific Fabric experiences:

- Power BI: In this tutorial, you build a dataflow and pipeline to bring data into a lakehouse, create a dimensional model, and generate a compelling report.
- Data Factory: In this tutorial, you ingest data with data pipelines and transform data with dataflows, then use automation and notifications to create a complete data integration scenario.
- Data Science end-to-end AI samples: In this set of tutorials, you learn about the different Data Science experience capabilities and see examples of how ML models can address common business problems.
- Data Science - Price prediction with R: In this tutorial, you build a machine learning model to analyze and visualize avocado prices in the US and predict future prices.

Summary

Microsoft Fabric revolutionizes data analytics and AI integration by providing organizations with a unified platform that simplifies the complex tasks involved in AI development. By consolidating data-related functionalities and integrating with Microsoft 365 applications, Fabric empowers users to harness the power of their data and make informed decisions. With its support for diverse workloads and integration with OneLake, Microsoft Fabric paves the way for organizations to unlock the full potential of data-driven insights and AI innovation. As the landscape of AI continues to evolve, Microsoft Fabric stands as a game-changing solution for organizations seeking to thrive in the era of data-driven decision-making.

Author Bio

Julian Melanson is a full-time teacher and best-selling instructor dedicated to helping students realize their full potential. With the honor of teaching over 150,000 students from 130 countries across the globe, he has honed his teaching skills to become an expert in this field. He focuses on unlocking the potential of creativity and productivity with AI tools and filmmaking techniques learned over the years, creating content for clients across many industries.