
How-To Tutorials

Building a Containerized LLM Chatbot Application

Alan Bernardo Palacio
21 Aug 2023
19 min read

In this hands-on tutorial, we will build a containerized LLM-powered chatbot application that uses examples to create a custom chatbot capable of answering deep philosophical questions and responding with profound questions in return. We will use Streamlit as the web application framework, PostgreSQL as the database to store examples, and OpenAI's GPT-3.5 "text-davinci-003" model for language processing.

The application allows users to input philosophical questions, and the AI-powered chatbot will respond with insightful answers based on the provided examples. Additionally, the chatbot will ask thought-provoking questions in response to user input, simulating the behavior of philosophical minds like Socrates and Nietzsche.

We'll break down the implementation into several files, each serving a specific purpose:

- Dockerfile: defines the Docker image for our application, specifying the required dependencies and configurations.
- docker-compose.yml: orchestrates the Docker containers for our application, including the web application (Streamlit) and the PostgreSQL database.
- setup.sql: contains the SQL commands to set up the PostgreSQL database and insert example data.
- streamlit_app.py: defines the Streamlit web application and its user interface.
- utils.py: contains utility functions to interact with the database, create the Da Vinci LLM model, and generate responses.
- requirements.txt: lists the Python dependencies required for our application.

The Dockerfile

The Dockerfile is used to build the Docker image for our application. It specifies the base image, sets up the working directory, installs the required dependencies, and defines the command to run the Streamlit application:

```dockerfile
FROM python:3

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["streamlit", "run", "streamlit_app.py"]
```

In the Dockerfile, we set the base image to Python 3 using FROM python:3, which gives us Python and its packaging tools. Next, we specify the working directory inside the container as /app, where we will copy our application files. To ensure all required Python packages are installed, we copy the requirements.txt file, which lists the dependencies, into the container and run pip install --no-cache-dir -r requirements.txt. We then copy all the files from the current directory (containing our application files) into the container's /app directory using COPY . .. Finally, we define the command to run the Streamlit application when the container starts using CMD ["streamlit", "run", "streamlit_app.py"].

This command starts the Streamlit app, enabling users to interact with the philosophical AI assistant through their web browsers once the container is up and running.

The requirements.txt file lists the Python dependencies required for our application:

```
streamlit
streamlit-chat
streamlit-extras
psycopg2-binary
openai==0.27.8
langchain==0.0.225
```

The requirements file pulls in the following packages:

- streamlit: the Streamlit library for creating web applications.
- streamlit-chat: the Streamlit Chat library for adding chat interfaces to Streamlit apps.
- streamlit-extras: the Streamlit Extras library for adding custom components to Streamlit apps.
- psycopg2-binary: the PostgreSQL adapter for Python.
- openai==0.27.8: the OpenAI Python library for accessing the GPT-3.5 model.
- langchain==0.0.225: the LangChain library for working with language models and prompts.

Next, we will define the Docker Compose file, which also handles the deployment of the PostgreSQL database where we will store our examples.

Creating the docker-compose

The docker-compose.yml file orchestrates the Docker containers for our application: the Streamlit web application and the PostgreSQL database:

```yaml
version: '3'

services:
  app:
    build:
      context: ./app
    ports:
      - 8501:8501
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - db

  db:
    image: postgres:13
    environment:
      - POSTGRES_USER=your_username
      - POSTGRES_PASSWORD=your_password
      - POSTGRES_DB=chatbot_db
      - POSTGRES_HOST_AUTH_METHOD=trust
    volumes:
      - ./db/setup.sql:/docker-entrypoint-initdb.d/setup.sql
```

The docker-compose.yml file orchestrates the deployment of our LLM-powered chatbot application and defines the services, i.e., the containers, it needs.

In the services section, we have two distinct services defined: app and db. The app service corresponds to our Streamlit web application, which serves as the user interface for interacting with the philosophical AI assistant. To build the Docker image for this service, we specify the build context as ./app, where the necessary application files, including the Dockerfile, reside.

To ensure seamless communication between the host machine and the app container, we use the ports option to map port 8501 on the host to the corresponding port inside the container. This allows users to access the web application through their web browsers.

For the application to function, the environment variable OPENAI_API_KEY must be set, providing the necessary authentication for our LLM model to operate. This is done in the environment section, where we define this variable.

One of the critical components of our application is the PostgreSQL database that stores the philosophical question-answer pairs. The db service sets up this database using the postgres:13 image. We configure the required environment variables, such as the username, password, and database name, to establish the necessary connection.

To initialize the database with our predefined examples, we use the volumes option to mount the setup.sql file from the host machine into the container's /docker-entrypoint-initdb.d directory. This SQL script contains the commands to create the examples table and insert the example data.
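
To verify that the init script actually seeded the table, a quick check like the following can help. This is a minimal sketch, not part of the article's files: the compose file above does not publish the db port to the host, so it assumes you run it inside the app container (where the hostname "db" resolves) or temporarily add a ports mapping to the db service.

```python
# Sanity check that setup.sql seeded the examples table.
import psycopg2

conn = psycopg2.connect(
    host="db", port=5432, dbname="chatbot_db",
    user="your_username", password="your_password",
)
with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM examples")
    print(cur.fetchone()[0])  # setup.sql inserts 12 rows
conn.close()
```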

With the initialization script in place, our PostgreSQL database is ready to handle the profound philosophical interactions with the AI assistant. In short, the docker-compose.yml file provides a streamlined and efficient way to deploy our language-model service together with a PostgreSQL database, creating a cohesive environment for our philosophical AI assistant application.

Setting up examples

The setup.sql file contains the SQL commands to set up the PostgreSQL database and insert example data. We reference this file in the volumes section of the docker-compose.yml file so that the database is initialized when the container starts:

```sql
-- Create the examples table
CREATE TABLE IF NOT EXISTS examples (
    id SERIAL PRIMARY KEY,
    query TEXT,
    answer TEXT
);

-- Insert the examples
INSERT INTO examples (query, answer) VALUES
('What is the nature of truth?', 'Truth is a mirror reflecting the depths of our souls.'),
('Is there an objective reality?', 'Reality is an ever-shifting kaleidoscope, molded by our perceptions.'),
('What is the role of reason in human understanding?', 'Reason illuminates the path of knowledge, guiding us towards self-awareness.'),
('What is the nature of good and evil?', 'Good and evil are intertwined forces, dancing in the eternal cosmic tango.'),
('Is there a purpose to suffering?', 'Suffering unveils the canvas of resilience, painting a masterpiece of human spirit.'),
('What is the significance of morality?', 'Morality is the compass that navigates the vast ocean of human conscience.'),
('What is the essence of human existence?', 'Human existence is a riddle wrapped in the enigma of consciousness.'),
('How can we find meaning in a chaotic world?', 'Meaning sprouts from the fertile soil of introspection, blooming in the garden of wisdom.'),
('What is the nature of love and its transformative power?', 'Love is an alchemist, transmuting the mundane into the divine.'),
('What is the relationship between individuality and society?', 'Individuality dances in the grand symphony of society, playing a unique melody of self-expression.'),
('What is the pursuit of knowledge and its impact on the human journey?', 'Knowledge is the guiding star, illuminating the path of human evolution.'),
('What is the essence of human freedom?', 'Freedom is the soaring eagle, embracing the vast expanse of human potential.');
```

The setup.sql script plays a crucial role in setting up the PostgreSQL database for our LLM-powered chatbot application. The SQL commands within it create the examples table with the necessary columns and insert the example data into it.

In the context of our LLM application, these examples matter because they serve as the foundation for the assistant's responses. The examples table can be thought of as a collection of question-answer pairs the AI assistant has learned from past interactions. Each row represents a specific question (query) and its corresponding insightful answer (answer).

When a user interacts with the chatbot and enters a new question, the application leverages these examples to create a custom prompt for the LLM model. By selecting relevant examples based on the length of the user's question, the application constructs a few-shot prompt that incorporates both the user's query and examples from the database.

The LLM model uses this customized prompt, containing the user's input and relevant examples, to generate a thoughtful and profound response that aligns with the philosophical nature of the AI assistant.

The inclusion of examples in the prompt ensures that the chatbot's responses resonate with the same level of wisdom and depth found in the example interactions stored in the database. By learning from past examples and incorporating them into its prompts, our LLM-powered chatbot can emulate the thought processes of philosophical giants like Socrates and Nietzsche. Ultimately, these examples become the building blocks that empower the AI assistant to engage in profound philosophical discourse with its users.

The Streamlit Application

The streamlit_app.py file defines the Streamlit web application and its user interface. It is the main file where we build the web app and interact with the LLM model:

```python
import streamlit as st
from streamlit_chat import message
from streamlit_extras.colored_header import colored_header
from streamlit_extras.add_vertical_space import add_vertical_space
from utils import *

# Define database credentials here
DB_HOST = "db"
DB_PORT = 5432
DB_NAME = "chatbot_db"
DB_USER = "your_username"
DB_PASSWORD = "your_password"

# Connect to the PostgreSQL database and retrieve examples
examples = get_database_examples(DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD)

# Create the Da Vinci LLM model
davinci = create_davinci_model()

# Create the example selector and few shot prompt template
example_selector = create_example_selector(examples)
dynamic_prompt_template = create_few_shot_prompt_template(example_selector)

# Now the Streamlit app

# Sidebar contents
with st.sidebar:
    st.title('The AI seeker of truth and wisdom')
    st.markdown('''
    ## About
    This app is an LLM-powered chatbot built using:
    - Streamlit
    - Open AI Davinci LLM Model
    - LangChain
    - Philosophy
    ''')
    add_vertical_space(5)
    st.write('Running in Docker!')

# Generate empty lists for generated and past.
## generated stores AI generated responses
if 'generated' not in st.session_state:
    st.session_state['generated'] = ["Hi, what questions do you have today?"]
## past stores User's questions
if 'past' not in st.session_state:
    st.session_state['past'] = ['Hi!']

# Layout of input/response containers
input_container = st.container()
colored_header(label='', description='', color_name='blue-30')
response_container = st.container()

# User input
## Function for taking user provided prompt as input
def get_text():
    input_text = st.text_input("You: ", "", key="input")
    return input_text

## Applying the user input box
with input_container:
    user_input = get_text()

# Response output
## Function for taking user prompt as input followed by producing AI generated responses
def generate_response(prompt):
    response = davinci(
        dynamic_prompt_template.format(query=prompt)
    )
    return response

## Conditional display of AI generated responses as a function of user provided prompts
with response_container:
    if user_input:
        response = generate_response(user_input)
        st.session_state.past.append(user_input)
        st.session_state.generated.append(response)

    if st.session_state['generated']:
        for i in range(len(st.session_state['generated'])):
            message(st.session_state['past'][i], is_user=True, key=str(i) + '_user', avatar_style='identicon', seed=123)
            message(st.session_state["generated"][i], key=str(i), avatar_style='icons', seed=123)
```

In this part of the code, we set up the core components of our LLM-powered chatbot application.

We begin by importing the necessary libraries, including Streamlit, Streamlit Chat, and Streamlit Extras, along with the utility functions from the utils.py file. Next, we define the database credentials (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD) required for connecting to the PostgreSQL database.

The application then establishes a connection to the database using the get_database_examples function from utils.py. This step retrieves the philosophical question-answer pairs stored in the examples table. These examples are essential: they serve as the knowledge base for the AI assistant and provide the context and wisdom needed to generate meaningful responses.

To leverage the OpenAI Da Vinci LLM model, we create the model instance using the create_davinci_model function from utils.py. This model acts as the core engine of our chatbot, enabling it to produce thoughtful and profound responses.

To create custom prompts for the LLM model, we use the create_example_selector and create_few_shot_prompt_template functions from utils.py. These functions select relevant examples based on the length of the user's input and construct dynamic prompts that combine the user's query with those examples.

The Streamlit web app's sidebar is then set up, providing users with information about the application's purpose and inspiration. Within the application's session state, two lists (generated and past) are initialized to store AI-generated responses and user questions, respectively.

To keep the layout organized, we define two containers (input_container and response_container). The input_container houses the text input box where users enter their questions; the get_text function captures the user's input.

For generating AI responses, the generate_response function takes the user's prompt, processes it through the Da Vinci LLM model, and produces an insightful reply. The AI-generated responses are displayed in the response_container using the message function from the Streamlit Chat library, allowing users to engage in profound philosophical dialogues with the AI assistant.
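
Before wiring everything into the UI, it can be handy to smoke-test the same pipeline from a plain script. The following is a minimal sketch (not one of the article's files): it assumes OPENAI_API_KEY is set and the database is reachable, for example when run inside the app container, where the hostname "db" resolves.

```python
# Quick end-to-end test of the utils.py pipeline without the Streamlit UI.
from utils import (
    get_database_examples,
    create_davinci_model,
    create_example_selector,
    create_few_shot_prompt_template,
)

examples = get_database_examples("db", 5432, "chatbot_db", "your_username", "your_password")
davinci = create_davinci_model()
selector = create_example_selector(examples)
prompt_template = create_few_shot_prompt_template(selector)

# Format a few-shot prompt for a sample question and ask the model directly
question = "What is the meaning of life?"
print(davinci(prompt_template.format(query=question)))
```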

Overall, this setup lays the groundwork for an intellectually stimulating and philosophical chatbot experience.

Creating the utils file

The utils.py file contains utility functions for our application, including connecting to the database, creating the Da Vinci LLM model, and generating responses:

```python
from langchain import PromptTemplate, FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector
from langchain.llms import OpenAI
import psycopg2


def get_database_examples(host, port, dbname, user, password):
    try:
        conn = psycopg2.connect(
            host=host,
            port=port,
            dbname=dbname,
            user=user,
            password=password
        )
        cursor = conn.cursor()
        cursor.execute("SELECT query, answer FROM examples")
        rows = cursor.fetchall()
        examples = [{"query": row[0], "answer": row[1]} for row in rows]
        cursor.close()
        conn.close()
        return examples
    except psycopg2.Error as e:
        raise Exception(f"Error connecting to the database: {e}")


def create_davinci_model():
    return OpenAI(model_name='text-davinci-003')


def create_example_selector(examples):
    example_template = """
    User: {query}
    AI: {answer}
    """
    example_prompt = PromptTemplate(
        input_variables=["query", "answer"],
        template=example_template
    )
    if not examples:
        raise Exception("No examples found in the database.")
    return LengthBasedExampleSelector(
        examples=examples,
        example_prompt=example_prompt,
        max_length=50
    )


def create_few_shot_prompt_template(example_selector):
    prefix = """The following are excerpts from conversations with a philosophical AI assistant.
    The assistant is a seeker of truth and wisdom, responding with profound questions to know yourself
    in a way that Socrates, Nietzsche, and other great minds would do. Here are some examples:"""
    suffix = """
    User: {query}
    AI: """
    return FewShotPromptTemplate(
        example_selector=example_selector,
        example_prompt=example_selector.example_prompt,
        prefix=prefix,
        suffix=suffix,
        input_variables=["query"],
        example_separator="\n"
    )


def generate_response(davinci, dynamic_prompt_template, prompt):
    response = davinci(dynamic_prompt_template.format(query=prompt))
    return response
```

The get_database_examples function is responsible for establishing a connection to the PostgreSQL database using the provided credentials (host, port, dbname, user, password). Through this connection, it executes a query to retrieve the question-answer pairs stored in the examples table, and organizes the data into a list of dictionaries, each containing a query (question) and its corresponding answer.

The create_davinci_model function is straightforward: it initializes and returns the Da Vinci LLM model.

To handle the selection of relevant examples for constructing dynamic prompts, the create_example_selector function plays a crucial role. It takes the list of examples as input and creates an example selector that chooses relevant examples based on the length of the user's query; a fully rendered prompt is illustrated below.
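
Putting the selector and the few-shot template together, the final prompt sent to the model looks roughly like the following. This is an illustrative rendering that assumes the selector keeps a single short example; the exact examples included depend on the length of the user's query relative to max_length.

```
The following are excerpts from conversations with a philosophical AI assistant.
    The assistant is a seeker of truth and wisdom, responding with profound questions to know yourself
    in a way that Socrates, Nietzsche, and other great minds would do. Here are some examples:

    User: What is the nature of truth?
    AI: Truth is a mirror reflecting the depths of our souls.

    User: What is the meaning of life?
    AI: 
```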

By using this selector, the AI assistant can incorporate diverse examples that align with the user's input, leading to more coherent and contextually appropriate responses.

The create_few_shot_prompt_template function builds the few-shot prompt template. This template includes a custom prefix and suffix to set the tone and style of the philosophical AI assistant. The prefix emphasizes the assistant's role as a "seeker of truth and wisdom", while the suffix provides the formatting for the user's query and the AI-generated response. The custom template ensures that the AI assistant's interactions are profound and engaging, resembling the thought-provoking dialogues of historical philosophers like Socrates and Nietzsche.

Finally, the generate_response function generates the AI's response to the user's prompt. It takes the Da Vinci LLM model, the dynamic prompt template, and the user's input as parameters. The function uses the LLM model to process the dynamic prompt, blending the user's query with the selected examples, and returns the AI-generated response.

Starting the application

To launch our philosophical AI assistant with all its components integrated, we can use Docker Compose. By executing docker-compose --env-file .env up, Docker Compose orchestrates the entire deployment.

The --env-file .env option lets us supply environment variables from the .env file, which holds sensitive credentials and configuration details. This ensures that the necessary environment variables, such as the OpenAI API key and database credentials, are accessible to the application without being explicitly exposed in the codebase.

When the command is initiated, Docker Compose first builds the application's Docker image using the Dockerfile in the ./app directory. This image contains all the required dependencies and configuration for our Streamlit web application and its integration with the Da Vinci LLM model.

Next, Docker Compose creates the two services: app, representing our Streamlit web application, and db, representing the PostgreSQL database. The app service runs on port 8501, making it accessible at http://localhost:8501 in the browser.

Once the services are up and running, the Streamlit web application is fully operational, and users can interact with the philosophical AI assistant through its user-friendly interface. When a user enters a philosophical question, the application uses the Da Vinci LLM model, together with the selected examples, to generate insightful and profound responses in the style of the great philosophers.

With Docker Compose, the entire application, including the web server, the LLM model, and the database, is containerized, enabling seamless deployment across different environments. This approach keeps the application scalable and portable, allowing users to experience the intellectual exchange with the philosophical AI assistant effortlessly.

Conclusion

In this tutorial, we've built a containerized LLM-powered chatbot application capable of answering deep philosophical questions and responding with profound questions of its own, inspired by philosophers like Socrates and Nietzsche.

We used Streamlit as the web application framework, PostgreSQL as the database, and OpenAI's GPT-3.5 model for language processing. By combining Streamlit, PostgreSQL, and the GPT-3.5 model, we've crafted an intellectually stimulating user experience: a chatbot that answers philosophical inquiries with deep insights and thought-provoking questions, providing users with a unique and engaging interaction.

Feel free to experiment further with the chatbot, add more examples to the database, or explore different prompts for the LLM model to enrich the user experience. As you continue to develop your AI assistant, remember the immense potential these technologies hold for solving real-world challenges and fostering intelligent conversations.

Author Bio:

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as a founder in startups, and later earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

Hands-On tutorial on how to use Pinecone with LangChain

Alan Bernardo Palacio
21 Aug 2023
17 min read

A vector database stores high-dimensional vectors, mathematical representations of attributes. Each vector holds tens to thousands of dimensions, enhancing data richness. A vector database operationalizes embedding models, aiding application development with resource management, security, scalability, and query efficiency. Pinecone, a vector database, enables quick semantic search over vectors. Integrating OpenAI's LLMs with Pinecone merges deep-learning-based embedding generation with efficient storage and retrieval, facilitating real-time recommendation and search systems. Pinecone can act as long-term memory for large language models like OpenAI's GPT-4.

Introduction

This tutorial will guide you through the process of integrating Pinecone, a high-performance vector database, with LangChain, a framework for building applications powered by large language models (LLMs). Pinecone enables developers to build scalable, real-time recommendation and search systems based on vector similarity search.

Prerequisites

Before you begin this tutorial, you should have the following:

- A Pinecone account
- A LangChain account
- A basic understanding of Python

Pinecone basics

As a starter, we will get familiar with Pinecone by exploring its basic functionality. Remember to get your Pinecone access key.

Here is a step-by-step guide on how to set up and use Pinecone, a cloud-native vector database that provides long-term memory for AI applications, especially those involving large language models, generative AI, and semantic search.

Initialize the Pinecone client

We will use the Pinecone client, so this step is only necessary if you don't have it installed already.

```
pip install pinecone-client
```

To use Pinecone, you must have an API key. You can find your API key in the Pinecone console under the "API Keys" section. Note both your API key and your environment. To verify that your Pinecone API key works, use the following commands:

```python
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
```

If you don't receive an error message, your API key is valid. This also initializes the Pinecone session.

Creating and retrieving indexes

The command below creates an index named "quickstart" that performs approximate nearest-neighbor search using the Euclidean distance metric for 8-dimensional vectors.

```python
pinecone.create_index("quickstart", dimension=8, metric="euclidean")
```

Index creation takes roughly a minute. Once your index is created, its name appears in the index list. Use the following command to return a list of your indexes.

```python
pinecone.list_indexes()
```

Before you can query your index, you must connect to it.

```python
index = pinecone.Index("quickstart")
```

Now that you have created your index, you can start to insert data into it.

Insert the data

To ingest vectors into your index, use the upsert operation, which inserts a new vector into the index or updates the vector if a vector with the same ID is already present. The following command upserts five 8-dimensional vectors into your index.

```python
index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
])
```

You can get statistics about your index, such as its dimension and vector count, with the following command.

```python
index.describe_index_stats()
```

This returns a dictionary with information about your index.
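
For the five 8-dimensional vectors upserted above, the result would look roughly like the following (an illustrative output; the exact fields can vary with the client version):

```python
{'dimension': 8,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 5}},
 'total_vector_count': 5}
```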

Now that you have created an index and inserted data into it, we can query the database to retrieve vectors based on their similarity.

Query the index and get similar vectors

The following example queries the index for the three vectors that are most similar to an example 8-dimensional vector, using the Euclidean distance metric specified above.

```python
index.query(
    vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
    top_k=3,
    include_values=True
)
```

This command returns the three stored vectors with the lowest Euclidean distance to the query vector.
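
An illustrative response is shown below (field names and score conventions can vary with the client version; the scores here assume Pinecone reports squared Euclidean distance, under which the exact match "C" scores 0, while "B" and "D", each differing by 0.1 in all eight dimensions, score 8 × 0.01 = 0.08):

```python
{'matches': [
    {'id': 'C', 'score': 0.0,  'values': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]},
    {'id': 'B', 'score': 0.08, 'values': [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]},
    {'id': 'D', 'score': 0.08, 'values': [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]}],
 'namespace': ''}
```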

Once you no longer need the index, use the delete_index operation to delete it.

```python
pinecone.delete_index("quickstart")
```

By following these steps, you can set up a Pinecone vector database in just a few minutes, providing long-term memory for your high-performance AI applications without any infrastructure hassles.

Now, let's take a look at a slightly more complex example, in which we embed text data and insert it into Pinecone.

Preparing and Processing the Data

In this section, we will create a context for large language models (LLMs) using the OpenAI API. We will walk through the different parts of a Python script, understanding the purpose and function of each code block. The ultimate aim is to transform the data into larger chunks of around 500 tokens, ensuring that the dataset is ordered sequentially.

Setup

First, we install the necessary libraries for our script: OpenAI for the AI models, pandas for data manipulation, and transformers for tokenization.

```
!pip install openai pandas transformers
```

After the installations, we import the necessary modules for our script.

```python
import pandas as pd
import openai
```

Before you can interact with OpenAI, you need to provide your API key. Make sure to replace <<YOUR_API_KEY>> with your actual API key.

```python
openai.api_key = ('<<YOUR_API_KEY>>')
```

Now we are ready to start processing the data to be embedded and stored in Pinecone.

Data transformation

We use pandas to load JSON data files related to different technologies (HuggingFace, PyTorch, TensorFlow, Streamlit). These files contain questions and answers related to their respective topics and are based on the data in the Pinecone documentation. First, we concatenate these data frames into one for easier manipulation.

```python
hf = pd.read_json('data/huggingface-qa.jsonl', lines=True)
pt = pd.read_json('data/pytorch-qa.jsonl', lines=True)
tf = pd.read_json('data/tensorflow-qa.jsonl', lines=True)
sl = pd.read_json('data/streamlit-qa.jsonl', lines=True)
df = pd.concat([hf, pt, tf, sl], ignore_index=True)
df.head()
```

Next, we define a function to remove newlines and unnecessary spaces in our text data. The function remove_newlines takes a pandas Series object and performs several replace operations to clean the text.

```python
def remove_newlines(serie):
    serie = serie.str.replace('\n', ' ', regex=False)
    serie = serie.str.replace('\\n', ' ', regex=False)
    serie = serie.str.replace('  ', ' ', regex=False)
    serie = serie.str.replace('  ', ' ', regex=False)
    return serie
```

We transform the text in our dataframe into a single string format, combining the 'docs', 'category', 'thread', 'question', and 'context' columns.

```python
df['text'] = "Topic: " + df.docs + " - " + df.category + "; Question: " + df.thread + " - " + df.question + "; Answer: " + df.context
df['text'] = remove_newlines(df.text)
```

Tokenization

We use the HuggingFace transformers library to tokenize our text. The GPT-2 tokenizer is used, and the number of tokens for each text string is stored in a new column, 'n_tokens'.

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
```

We filter out rows where the number of tokens exceeds 2000.

```python
df = df[df.n_tokens < 2000]
```

Now we can embed the data using the OpenAI API.

```python
from openai.embeddings_utils import get_embedding

size = 'curie'
df['embeddings'] = df.text.apply(lambda x: get_embedding(x, engine=f'text-search-{size}-doc-001'))
df.head()
```

We use the 'text-search-curie-doc-001' OpenAI engine to create the embeddings; it is very capable, while being faster and lower cost than Davinci.

So far, we've prepared our data for subsequent processing. Next, we will initialize the Pinecone index, create text embeddings using the OpenAI API, and insert them into Pinecone.

Initializing the Index and Uploading Data to Pinecone

The second part of the tutorial takes the data prepared previously and uploads it to the Pinecone vector database. These embeddings can then be queried for similarity, providing contextual information from a larger set of data than an LLM can handle at once.

Checking for Large Text Data

The maximum size for metadata in Pinecone is 5KB, so we check whether any 'text' field items exceed this.

```python
from sys import getsizeof

too_big = []
for text in df['text'].tolist():
    if getsizeof(text) > 5000:
        too_big.append((text, getsizeof(text)))

print(f"{len(too_big)} / {len(df)} records are too big")
```

Several records have text data larger than the Pinecone metadata limit, so rather than storing the full text alongside each vector, we assign a unique ID to each record in the DataFrame and will keep the text in an external ID-to-text mapping.

```python
df['id'] = [str(i) for i in range(len(df))]
df.head()
```

This ID can be used to retrieve the original text later. Now we can initialize the index in Pinecone and insert the data.

Pinecone Initialization and Index Creation

Next, Pinecone is initialized with the API key, and an index is created if it doesn't already exist. The name of the index is 'beyond-search-openai', and its dimension matches the length of the embeddings.

The metric used for similarity search is cosine.

```python
import pinecone

pinecone.init(
    api_key='PINECONE_API_KEY',
    environment="YOUR_ENV"
)

index_name = 'beyond-search-openai'

if not index_name in pinecone.list_indexes():
    pinecone.create_index(
        index_name, dimension=len(df['embeddings'].tolist()[0]),
        metric='cosine'
    )

index = pinecone.Index(index_name)
```

Now that we have created the index, we can insert the data. The index is populated in batches of 32, and relevant metadata (like 'docs', 'category', 'thread', and 'href') is included with each item. We use tqdm to show a progress bar for the insertion.

```python
from tqdm.auto import tqdm

batch_size = 32

for i in tqdm(range(0, len(df), batch_size)):
    i_end = min(i + batch_size, len(df))
    df_slice = df.iloc[i:i_end]
    to_upsert = [
        (
            row['id'],
            row['embeddings'],
            {
                'docs': row['docs'],
                'category': row['category'],
                'thread': row['thread'],
                'href': row['href'],
                'n_tokens': row['n_tokens']
            }
        ) for _, row in df_slice.iterrows()
    ]
    index.upsert(vectors=to_upsert)
```

This inserts the records into the database to be used later in the process. Finally, the ID-to-text mappings are saved into a JSON file, which allows us to retrieve the original text associated with an ID later on.

```python
import json

mappings = {row['id']: row['text'] for _, row in df[['id', 'text']].iterrows()}

with open('data/mapping.json', 'w') as fp:
    json.dump(mappings, fp)
```

The Pinecone vector database should now be populated and ready for querying. Next, we will use this information to provide context to a question-answering LLM.

Querying and Answering Questions

The final part of the tutorial involves querying the Pinecone vector database with questions, retrieving the most relevant context embeddings, and using OpenAI's API to generate an answer to the question based on the retrieved contexts.

OpenAI Embedding Generation

The OpenAI API is used to create an embedding for the question.

```python
from openai.embeddings_utils import get_embedding

q_embeddings = get_embedding(
    'how to use gradient tape in tensorflow',
    engine='text-search-curie-query-001'
)
```

A function create_context is defined to use the OpenAI API to create a query embedding, retrieve the most relevant context texts from Pinecone, and append those contexts into a larger string ready for feeding into OpenAI's generation step.

```python
from openai.embeddings_utils import get_embedding

def create_context(question, index, max_len=3750, size="curie"):
    q_embed = get_embedding(question, engine=f'text-search-{size}-query-001')
    res = index.query(q_embed, top_k=5, include_metadata=True)
    cur_len = 0
    contexts = []
    for row in res['matches']:
        # Look up the original text via the id-to-text mappings built above
        text = mappings[row['id']]
        cur_len += row['metadata']['n_tokens'] + 4
        if cur_len < max_len:
            contexts.append(text)
        else:
            cur_len -= row['metadata']['n_tokens'] + 4
            if max_len - cur_len < 200:
                break
    return "\n\n###\n\n".join(contexts)
```

We can now use this function to retrieve the context needed for a given question: the question is embedded, and the relevant context is retrieved from the Pinecone database.
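
For instance, a quick check of what the function returns (a sketch; the printed context depends on the data you uploaded):

```python
# Assemble the context for a sample question and inspect its beginning
context = create_context('how to use gradient tape in tensorflow', index)
print(context[:500])
```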

Now we are ready to start passing the context to a question-answering model.

Querying and Answering

We start by defining the parameters the query will take, specifically the model we will be using, the maximum token length, and others. We also define instructions for the model, which constrain the results we get.

```python
fine_tuned_qa_model = "text-davinci-002"
instruction = """Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\nContext:\n{0}\n\n---\n\nQuestion: {1}\nAnswer:"""
max_len = 3550
size = "curie"
max_tokens = 400
stop_sequence = None
domains = ["huggingface", "tensorflow", "streamlit", "pytorch"]
```

Different instruction formats can be defined. We will start by asking some simple questions and seeing what the results look like.

```python
question = "What is Tensorflow"
context = create_context(
    question,
    index,
    max_len=max_len,
    size=size,
)

try:
    # fine-tuned models require the model parameter, whereas other models require the engine parameter
    model_param = (
        {"model": fine_tuned_qa_model}
        if ":" in fine_tuned_qa_model
        and fine_tuned_qa_model.split(":")[1].startswith("ft")
        else {"engine": fine_tuned_qa_model}
    )
    response = openai.Completion.create(
        prompt=instruction.format(context, question),
        temperature=0,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=stop_sequence,
        **model_param,
    )
    print(response["choices"][0]["text"].strip())
except Exception as e:
    print(e)
```

We can see that the model gives us proper results using the context it retrieves from Pinecone. We can also inquire about PyTorch:

```python
question = "What is Pytorch"
context = create_context(
    question,
    index,
    max_len=max_len,
    size=size,
)

try:
    # fine-tuned models require the model parameter, whereas other models require the engine parameter
    model_param = (
        {"model": fine_tuned_qa_model}
        if ":" in fine_tuned_qa_model
        and fine_tuned_qa_model.split(":")[1].startswith("ft")
        else {"engine": fine_tuned_qa_model}
    )
    response = openai.Completion.create(
        prompt=instruction.format(context, question),
        temperature=0,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=stop_sequence,
        **model_param,
    )
    print(response["choices"][0]["text"].strip())
except Exception as e:
    print(e)
```

The results remain consistent with the context provided. Now we can try to go beyond the capabilities of the context by pushing the boundaries a bit more.

```python
question = "Am I allowed to publish model outputs to Twitter, without a human review?"
context = create_context(
    question,
    index,
    max_len=max_len,
    size=size,
)

try:
    # fine-tuned models require the model parameter, whereas other models require the engine parameter
    model_param = (
        {"model": fine_tuned_qa_model}
        if ":" in fine_tuned_qa_model
        and fine_tuned_qa_model.split(":")[1].startswith("ft")
        else {"engine": fine_tuned_qa_model}
    )
    response = openai.Completion.create(
        prompt=instruction.format(context, question),
        temperature=0,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=stop_sequence,
        **model_param,
    )
    print(response["choices"][0]["text"].strip())
except Exception as e:
    print(e)
```

We can see in the results that the model behaves according to the instructions provided, since we don't have any context about Twitter in the index.

Lastly, the Pinecone index is deleted to free up resources.

```python
pinecone.delete_index(index_name)
```

Conclusion

This tutorial provided a comprehensive guide to harnessing Pinecone, OpenAI's language models, and HuggingFace's transformers library for advanced question answering. We introduced Pinecone's vector search engine and explored data preparation, embedding generation, and data uploading. Creating a question-answering flow with OpenAI's API concluded the process. The tutorial showcased how the synergy of vector search engines, language models, and text processing can revolutionize information retrieval. This holistic approach holds potential for developing AI-powered applications in various domains, from customer service chatbots to research assistants and beyond.

Author Bio:

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young and Globant, and now holds a data engineer position at Ebiquity Media, helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as a founder in startups, and later earned a Master's degree from the faculty of Mathematics at the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.

Getting Started with Google MakerSuite

Anubhav Singh
08 Aug 2023
14 min read

MakerSuite, essentially a developer tool, enables everyone with a Google Account to access the power of the PaLM API, with a focus on building products and services on top of it. The MakerSuite interface allows rapid prototyping and testing of the configurations used while interacting with the PaLM API. Once users are satisfied with the configurations, they can very easily port them to their backend codebases.

We're now ready to dive into the MakerSuite interface. To get started, head over to https://makersuite.google.com/ in your browser. Make sure you're logged in to your Google Account to be able to access the interface. You'll see the welcome dashboard.

The options available on MakerSuite as of the time of writing are Text Prompts, Data Prompts, and Chat Prompts. Let's take a brief look at what each of these does.

Text Prompts

Text prompts are the most basic and customizable form of prompts that can be provided to the models. You can set them to any task or ask any question in a stateless manner: the user prompt and input are ingested by the model every time it is run, and the model itself does not hold any context. Text prompts are thus a great starting point and can be made as deterministic or as creative in their output as the user requires.

Let us create a Text prompt in MakerSuite. Click on the Create button on the Text prompt card and you'll be presented with the prompt-testing UI. At the top, MakerSuite allows users to save their prompts by name. It also provides starter samples, which allow one to quickly test and understand how the product works. Below that is the main working area, where users define their own prompts and, by adjusting the model's configuration parameters at the bottom, run the prompts to produce an output.

First, click on the pencil icon on the top left to give this prompt a suitable name. For our example, we'll build a prompt that asks the model to produce the etymology of any given word. We use the following values:

field        value
name         Word Etymology
description  Asking PaLM API to provide word etymologies.

Click on Save to store these values and close the input modal. Note that these values do not affect the model in any manner and are present purely for user convenience.

Now, in the main working area below, we write the required prompt. For our example:

```
For any given word that follows, provide its etymology in no more than 300 words.

Aeroplane.

Etymology: 
```

Now, let's adjust the model parameters. Click on the button next to the Run button to change the model settings. For our example, we set the following values:

field              value        remark
model              Text Bison   Use default
temperature        0            Word etymologies are dry facts and are not expected to be creative
add stop sequence  (none)       Use default
max outputs        1            Word etymologies are usually not going to benefit from variations of telling them

Depending on the use case you're building your generative-AI-backed software for, you may wish to change the safety settings of the model response. To do so, click on the Edit safety settings button and change the options as per your requirements. For our use case, we shall leave them at their defaults.

At the bottom of the configuration menu, you can choose to adjust further advanced settings of the model output.

We shall leave these advanced options at their defaults for now.

Great, we're now all set to run the prompt. Click on the Run button at the bottom and wait for the model to produce its output. In our case, the model outputs:

```
The word "aeroplane" is derived from the Greek words "aēr" (air) and "planē" (to wander). The term was first used in the 1860s to describe a type of flying machine that was powered by a steam engine. In 1903, the Wright brothers made the first successful flight of a powered aeroplane.
```

Note that your response might come out slightly different, due to the inherently non-deterministic nature of generative AI. At this point, you might want to experiment by erasing the model output and running the prompt again. Does the output change? Re-run it several times to observe changes in the model output. Then, try adjusting the values of the model configuration and see how that affects the output. If you set the temperature to 0, you will notice that the model likely produces the same output many times. Try increasing it to 1 and then run the model a few times. Does the output generated in each iteration remain the same now? You will very likely observe that the output changes every time you re-run the prompt.

It is interesting to note that the prompt you provide to the model does not contain any examples of how the model should respond. This method of using the model is called Zero-shot learning, in which the trained model is asked to produce predictions for an input it may not have seen before. In our example, the task is providing word etymologies, which the model may or may not have been trained on.

This makes us wonder: if we gave the model an input it has definitely not been trained on, would it produce a correct response? Let us try this out. Change the word in our etymology prompt to "xakoozifictation" and hit the Run button to see what the model outputs. Instead of telling us that the word does not exist and thus has no meaning, the model attempts to produce an etymology for it. The output we got was:

```
Xakoozifictation is a portmanteau of the words "xakooz" and "ification". Xakooz is a nonsense word created by combining the sounds of the words "chaos" and "ooze". ification is a suffix that can be added to verbs to create nouns that describe the process of doing something. In this case, xakoozifictation means the process of making something chaotic or oozy.
```

What we observe here is called "model hallucination", a phenomenon common among large language models wherein the model produces output that is contrary to common logic or inaccurate with respect to real-world knowledge. It is highly recommended to read more about model hallucinations in the "Challenges in working with LLMs" section.

Let us continue our discussion of Zero-shot learning. We saw that when we provide only a prompt to the model and no examples of how to produce responses, the model tries its best to produce a response, and in most general cases it succeeds. However, if we were to provide some examples of the expected input-output pairs, could we program the model to respond more accurately and do away with the hallucinations? Let us give this a try by providing some input-output examples to the model.

Update your model prompt to the following:

```
For any given word that follows, provide its etymology in no more than 300 words.

Examples:

Word: aeroplane
Reasoning: Since it's a valid English word, produce an output.
Etymology: Aeroplane is a compound word formed from the Greek roots "aer" (air) and "planus" (flat).

Word: balloon
Reasoning: Since it's a valid English word, produce an output.
Etymology: The word balloon comes from the Italian word pallone, which means ball. The Italian word is derived from the Latin word ballare, which means to dance.

Word: oungopoloctous
Reasoning: Since this is not a valid English word, do not produce an etymology and say it's "Not available".
Etymology: Not available

Word: kaploxicating
Reasoning: Since this is not a valid English word, do not produce an etymology and say it's "Not available".
Etymology: Not available

Word: xakoozifictation
Etymology: 
```

In the above prompt, we have provided two examples of words that exist and two examples of words that do not. We expect the model to learn from these examples and produce output accordingly. Hit Run to see the output of the model; remember to set the temperature configuration back to 0.

You will see that the model now responds with "Not available" for non-existent words, and with etymologies only for words that exist in the English dictionary. Hence, by providing a few examples of how we expect the model to behave, we were able to stop the model hallucination problem.

This method of providing some samples of the expected input-output pairs to the model in the prompt is called Few-shot learning. In Few-shot learning, the model is expected to predict output for unknown input based on a few similar samples it has received prior to the prediction task. In special cases, the number of samples might be exactly one, which is termed "One-shot learning".

Now, let us explore the next type of prompt available on MakerSuite: Data Prompts.

Data Prompts

In Data prompts, the user is expected to use the model to generate more samples of data based on provided samples. The MakerSuite data prompt interface defines two sections of the prompt: the prompt text itself, which is now optional, and the samples of data that the prompt has to work on, which are required.

It is important to note that at the bottom of the page, the model is still the Text Bison model. Data prompts can thus be understood as a specific use case of text generation with the Text Bison model.

Further, there is no way to test a data prompt without specifying the inputs as one or more columns of the to-be-generated rows of the dataset. Let us build a prompt for this interface. Since providing a prompt text is not necessary, we'll skip it and instead fill in the examples table. To add more columns than are present by default, use the Add button on the top right.

Once this is done, we provide the input column for the test inputs. In the Test your prompt section at the bottom, fill in only the INPUT column.

Now, click on the Run button to see how the model produces outputs for this prompt. We see that the model produces the rest of the data for those rows correctly, using the format we provided. This makes us wonder: if we provide historical data to the Data prompt, will it be able to predict future trends? Let us give this a try.

Create a new Data prompt and, on the data examples table, click Add -> Import examples on the top right.

You may choose any existing Google Sheet from the dialog box, or upload any supported file. We choose to upload a CSV file, namely the Iris flower dataset's CSV. We use the one found at https://gist.github.com/netj/8836201/

On selecting the file, the interface will ask you to map the columns in the CSV to columns in your data examples. We choose to create new input columns for all the feature columns of the Iris dataset, and keep the labels column as an output column.

After importing the examples, let us manually move a few examples to the Test your prompt section. Remember to remove these examples from the data examples section above, to ensure the model is not shown the same data that it is being tested on. Now, click the Run button to get the model's output.

We observe that the model correctly outputs the label column values in line with the examples it has received. Hence, besides generating more examples for a given dataset, the model is also capable of making predictions about the inputs to a degree. Determining the accuracy of such predictions would require much more extensive testing, which is beyond the scope of this article.

Finally, let us explore the Chat prompts.

Chat prompts

Chatting with generative AI models is the form in which most people have first interacted with them. Made popular once more by the advent of ChatGPT, the concept of AI being able to hold intelligent conversations has been around for a very long time and has regularly featured in popular culture. One of the most well-known examples of an AI that can take instructions and produce output accordingly is JARVIS from the Iron Man series of comics. With the latest possibilities enabled by generative AI, building such systems is very much a realistic task, with efforts already underway.

In this section, we shall see how we can have conversations with generative AI models that mimic human-like understanding and decision-making skills.

First, click on the Create New button on the top left of the MakerSuite interface and select the Chat prompt. You will see the blank interface for designing a Chat prompt. One immediate change to notice is that there is no longer a Run button at the bottom of the UI: it has moved to the Test your prompt section, which now offers a chat-box-like interface, and the message send button of the chat box functions as the Run button.

The section on the left reads "Write your prompt examples"; we'll call this the prompt examples section. Also note the Context field available in this section, which can be used to set the rules of the interaction and the format in which output is expected. Now, let us design a chat prompt with the following values:

Context: You're a banker at the Gringotts bank, set in the Wizarding world of Harry Potter.

User:  I wish to access my account
Model: Very well, please present your key.

User:  How safe are the vaults at Gringotts?
Model: Gringotts' vaults are considered extremely safe. Protected by complex magic, various creatures, and intricate security measures, these vaults are nearly impenetrable. Unauthorized access is extraordinarily challenging and dangerous, as demonstrated multiple times in the series. This reputation contributes to the bank's trustworthiness among wizards.

We expect the model to pretend to be a banker at the Gringotts bank, which is referenced from the popular book series Harry Potter.

Since it's a fictional world and we expect the conversation to be similarly unbound from the real world, we should increase the model temperature, allowing the model to be more creative. For this example, let's set the temperature to 0.7.

Let us now try having a conversation with the model. In our case, we observe that although we have not provided the model with an example of how to respond when the user says they do not have their key, it handles the response correctly based on its existing knowledge about Gringotts Bank's policies.

Now that we have covered the different types of prompts available in MakerSuite, let's explore how we can use them via code, making direct calls to the PaLM API; a minimal sketch follows.
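
To give a flavor of what such a call looks like, here is a minimal sketch using the google.generativeai Python SDK for the PaLM API. This is an illustration rather than the article's own code: it assumes you have created an API key in MakerSuite and installed the package with pip install google-generativeai.

```python
import google.generativeai as palm

# Configure the client with the API key generated from MakerSuite
palm.configure(api_key="YOUR_API_KEY")

# The same Text prompt we designed in the MakerSuite UI
prompt = """For any given word that follows, provide its etymology in no more than 300 words.

Aeroplane.

Etymology: """

completion = palm.generate_text(
    model="models/text-bison-001",  # the Text Bison model used by MakerSuite
    prompt=prompt,
    temperature=0,                  # deterministic, matching our UI configuration
    max_output_tokens=300,
)

print(completion.result)
```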
AI in the Real World: Insurance

Julian Melanson
21 Jul 2023
5 min read
As the relentless tide of technological advancement swells, the insurance industry, among many others, is facing a pivotal transformation. The inception and evolution of insurance technology, or "insurtech," mandate that insurance agents, brokers, and companies diligently adapt and assimilate novel tools and methodologies to augment their operational efficiency and competitiveness. Of the emerging technologies, the innovative language model ChatGPT, conceived and developed by OpenAI, is showing significant potential to redefine the landscape of the insurance industry.

This powerful AI model offers a diverse suite of services, each capable of improving the lives of insurance agents in numerous ways. From perfecting customer service and streamlining underwriting to improving data analytics and fraud detection, ChatGPT opens up a wealth of possibilities. Yet, the efficacy and feasibility of these innovative solutions call for a judicious understanding of the technology's strengths and limitations.

The advantages of AI in Insurance

Firstly, customer service, the linchpin of the insurance business, is an area that stands to gain substantially from the implementation of ChatGPT. Insurance products and processes, notoriously labyrinthine to the average consumer, are sources of frequent queries and uncertainties. By employing ChatGPT, insurance firms can automatically answer routine questions related to policy details, billing, claims statuses, and more, in an array of languages. In doing so, it significantly alleviates the burden on customer service agents and concurrently boosts customer engagement.

Such automated systems also find favor among modern consumers, with reports suggesting a notable preference for chatbot interactions. ChatGPT, with its impressive capabilities in generating human-like text responses, can amplify the effectiveness of customer service chatbots. These enhancements invariably lead to increased customer satisfaction, freeing up human agents to tackle more complex customer concerns. Furthermore, ChatGPT's natural language processing prowess can be harnessed to guide customers on suitable insurance products and services, digitizing sales and distribution.

The underwriting process, a traditionally time-consuming task characterized by risk evaluation, is another sector ripe for the automation that ChatGPT brings. While Artificial Intelligence (AI) and Machine Learning (ML) models have previously been employed to improve the accuracy of risk assessment, gaps in data and information remain problematic. ChatGPT addresses this issue by enhancing data collection and analysis, investigating digital resources for analogous cases, and speeding up the identification of risk patterns.

Through this sophisticated data analysis, ChatGPT can evaluate factors like a customer's age, financial status, occupation, and lifestyle, thereby determining their risk profile. This information enables insurers to offer personalized coverage and pricing, improving customer experience and streamlining underwriting. In addition, it can alert insurers about high-risk individuals or circumstances, proactively averting potential losses. This automatic evaluation brings with it many questions around AI and ethics (which you can read more about here), but the advantages of getting such a system working are clear.

Claims processing, an insurance operation infamous for its high cost and low level of digitization, is another area primed for disruption by ChatGPT.
The AI model can proficiently extract and categorize information from claims forms and other documents, drastically reducing the laborious and time-intensive task of manual data entry.

A significant advantage arising from automating claims processing is its potential in fraud detection. With estimates by the FBI suggesting that insurance fraud costs American families hundreds of dollars each year in premium hikes, the value of early fraud detection cannot be overstated. ChatGPT can help surface patterns of inconsistency in claim forms, flagging suspicious entries for human review. By alerting insurers to both overt and covert attempts at fraud, ChatGPT can help save billions of dollars annually.

Reasons to be careful

As a caveat, while the utility and advantages of ChatGPT in the insurance industry are substantial, one must consider the nascent stage of this technology. Its real-world impact will be contingent on its successful integration within existing processes and wide-ranging adoption. Moreover, while AI systems offer remarkable capabilities, they are not infallible and require human supervision. Thus, the technology's explainability, its transparency, and its limitations should be carefully considered and understood.

In summary, the potential of ChatGPT to transform the insurance industry is vast, promising efficiency gains, cost reductions, and enhanced customer service. But realizing these advantages requires industry-wide receptiveness, careful integration, and judicious application, along with a respect for the limitations of the technology.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!
AI in the Real World: Real Estate

Julian Melanson
20 Jul 2023
4 min read
The fast-paced development of Artificial Intelligence has already started reshaping various sectors, with the real estate industry standing out as a prominent beneficiary. Of particular interest is the potential AI presents in streamlining property valuation, a critical process that underlies a myriad of real estate activities, including setting sale prices, making investment decisions, and optimizing home insurance premiums. While the conventional means of property valuation have their merits, they are far from perfect. This article delves into the potential of AI, specifically OpenAI's ChatGPT, in transforming property valuation in the real estate sector, discussing the challenges inherent to traditional approaches and exploring the benefits offered by this AI-driven approach.

The Current State of Property Valuation

Property valuation is a meticulous process that draws on a variety of data sources, both public and private. Depending on the valuation's purpose, the time and effort committed to research can differ significantly. For instance, real estate brokers might base their Broker Price Opinions on a limited set of comparable properties, while appraisers might undertake a thorough firsthand inspection to understand a property's condition, quality, and value comprehensively.

Despite the evolution of valuation methodologies over the years, traditional approaches still grapple with certain obstacles. One of the primary issues is data inconsistency, mainly arising from the dispersed and scattered nature of relevant property data across various sources. While attempts have been made to centralize information on property features, ownership changes, and other key insights, consistency in data remains elusive. The result is disparities in the Automated Valuation Models (AVMs) currently used, which can lead to divergent valuations for the same property.

Moreover, human bias forms a significant challenge in property appraisals. It's often difficult to find identical properties for comparison, leading to inevitable subjectivity in adjustments made to reconcile price differences. Studies show that appraised values fall below the agreed purchase price in just 10% of cases, suggesting a propensity towards price confirmation bias, a situation that calls for greater objectivity in home appraisals.

Integrating AI into Property Valuation: The Role of ChatGPT

In response to these challenges, the integration of AI into the property valuation process presents a promising solution. The application of AI, especially advanced language models like ChatGPT, can offer consistent examinations of a property's condition and quality, mitigating issues associated with data inconsistencies and human bias.

ChatGPT, a generative pre-trained transformer, has been designed to understand and generate human-like text based on given input. In the context of real estate, it offers tremendous potential in data analysis and, consequently, in generating accurate property valuations. Traditionally, property valuations have been conducted by human appraisers who assess a property's worth based on a range of factors such as location, size, and condition. However, this approach can be time-consuming, costly, and susceptible to human error.

By incorporating ChatGPT into the valuation process, real estate professionals can input relevant data into the AI model, which can then analyze the data and supply a detailed valuation report.
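To make that workflow concrete, here is a minimal, illustrative sketch of such a call using the pre-1.0 openai Python package. The property attributes, model choice, and prompt wording are hypothetical placeholders rather than a prescribed feature set, and any output would still need review by a human appraiser:

import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # assumption: a valid API key

# Hypothetical property attributes a professional might supply
property_data = {
    "location": "Austin, TX",
    "size_sqft": 1850,
    "bedrooms": 3,
    "bathrooms": 2,
    "condition": "renovated in 2019",
    "recent_comparable_sales": ["$512,000", "$498,500", "$530,000"],
}

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You draft concise property valuation reports from structured data."},
        {"role": "user",
         "content": f"Draft a short valuation report for this property: {property_data}"},
    ],
)

print(response.choices[0].message.content)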
The implications of this are transformative for the industry: it offers considerable time savings, reduces the potential for errors, and enhances the transparency of the valuation process.

A Practical Application of ChatGPT in Property Valuation

A very simple prompt, along the lines of the user message in the sketch above, illuminates how ChatGPT can be a great guide in the property valuation process.

The evolution of AI has unlocked numerous opportunities for innovation and efficiency across a variety of sectors, with the real estate industry being no exception. Particularly, the advent of AI models like ChatGPT has opened new avenues for enhancing the accuracy and efficiency of property valuations. By surmounting the obstacles inherent to traditional valuation methodologies, such as data inconsistencies and human bias, AI offers a more streamlined, transparent, and precise approach to property valuation.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!
Mitigating the Risks of ChatGPT in Finance

Julian Melanson
13 Jul 2023
5 min read
The application of advanced AI tools, such as ChatGPT, in various industries, particularly finance, has proven transformative due to its extensive language processing capabilities. ChatGPT's functions within the financial sector are diverse and impressive. It can understand financial market dynamics, suggest products, identify specific entities, and generate financial summaries, reports, and forecasts. Furthermore, the potential of training ChatGPT for fraud prevention and detection is an exciting prospect.

However, as the integration of ChatGPT into the financial services realm becomes more prevalent, it brings to the fore several ethical challenges. Therefore, the onus is on both researchers and practitioners to ensure that the technology's use is responsible and advantageous to all parties involved. The solutions to these ethical challenges often require a multi-faceted approach, focusing on data exposure, misinformation, technology dependency, privacy concerns, and social engineering.

The Ethical Challenges Involved

One of the paramount ethical challenges is data exposure. For example, ChatGPT users working with financial data might unintentionally disclose sensitive information. Additionally, during the AI model's training phase, there's a risk of exposing confidential elements such as proprietary code snippets, API keys, or login credentials.

ChatGPT can sometimes generate biased or inaccurate responses, spreading misinformation. The tool, at present, operates on data sets that only run up to September 2021, which are sourced online and not always accurate. Therefore, financial professionals must exercise caution while using such advice to prevent the propagation of misinformation.

Furthermore, while AI can be a powerful tool for financial decision-making, relying solely on technology can undermine human judgment and intuition. Financial professionals could fall into the trap of misinterpreting or overly depending on ChatGPT's advice, thereby overlooking the importance of human expertise in the financial sector. Therefore, it is crucial to strike a balance between utilizing AI's efficiency and maintaining human critical thinking.

As ChatGPT requires an extensive amount of data for training, this raises significant privacy concerns. The information collected could pose serious risks to both individuals and organizations if exposed or used maliciously. In tandem with privacy concerns, social engineering issues arise as well. There is the potential for cybercriminals to misuse ChatGPT, impersonating individuals or organizations to conduct successful phishing attacks.

Solving the Problem

Addressing these ethical challenges requires robust solutions. The first is the co-creation approach, which emphasizes public participation and stakeholder involvement in designing the AI algorithm. This strategy includes key choices in the algorithm, from the scope of its use to mitigating biases and tackling misinformation. It also ensures that humans keep a certain level of control over the AI tool, thus preventing total dependency on the technology.

The second is the institutional approach, which can ensure the ethical use of ChatGPT in finance. This approach demands the establishment of concrete rules for managing ChatGPT, including training policy regulators to scrutinize and audit the AI algorithm and developing regulations.
The focus is on creating transparent tools that ensure user privacy, and on constantly upgrading security measures to prevent breaches by cybercriminals.

Lastly, it's vital to maintain a harmonious blend of AI-based decision-making and human intuition. While ChatGPT can crunch data and analyze trends with efficiency, human professionals have the experiential knowledge to make intuitive financial decisions. The amalgamation of both AI and human insight can lead to mutual learning and overall improvement in financial decision-making. It can also help address legal obstacles in financial domains that AI might overlook, thus ensuring the accuracy and reliability of financial decisions.

The UK Finance paper on AI Fairness in Financial Services recommends a multi-disciplinary approach:

Frontline business must be clear on the objective of the use of AI, the risks to individuals and to the business, and the extent to which risks of unfair treatment will be managed and explained to stakeholders.

Data scientists are central to the technical aspects of the use, testing, and monitoring of AI.

Legal and Compliance need to be involved (including in any preliminary stages) to provide appropriate challenge, to oversee testing, and to assist with fair process and related transparency principles.

In addition, keeping humans involved can mitigate the looming threat of job loss due to automation. While technology like ChatGPT can automate many functions, it is essential to preserve roles where human intuition, expertise, and judgment are irreplaceable.

While the adoption of ChatGPT in finance is indeed a technological advancement, it comes with ethical challenges that require strategic and thoughtful solutions. Companies must adopt strategies such as co-creation and institutional approaches to ensure ethical usage. Furthermore, they need to strike a balance between AI and human insight to maintain the integrity of financial decisions. By addressing these challenges and implementing relevant strategies, we can ensure a future where AI not only augments the financial sector but also respects the values that we hold dear.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!
Build Enterprise AI Workflows with AirOps

Julian Melanson
12 Jul 2023
4 min read
In the realm of Artificial Intelligence, immense potential is no longer an abstract concept but a palpable reality, and businesses are increasingly seizing the opportunities this technology affords. AirOps, a new player in the AI sphere, has emerged as a remarkable conduit for businesses to harness the transformative abilities of AI within their operations. The company has announced a $7 million seed funding round, showing the confidence investors place in its unique proposition.

Founded by Alex Halliday, Berna Gonzalez, and Matt Hammel, the company encapsulates a blend of technological knowledge and industry expertise. Their collective backgrounds span a diverse range of companies, including MasterClass, Bungalow, and more. This multifaceted perspective fuels the vision of AirOps, allowing it to offer dynamic and adaptable solutions tailored to a multitude of business needs.

AirOps deploys a platform leveraging large language models (LLMs) such as GPT-3, GPT-4, and Claude, each with its unique capabilities and merits. The AI-driven tools developed by AirOps can be integrated within existing business systems, speeding up processes, revealing deep insights from data, and generating custom content. These services are readily available across various interfaces, including Google Sheets, web apps, data warehouses, and APIs, thereby allowing businesses to embed AI capabilities directly into their established workflows.

AirOps' Main Features

Despite the impressive abilities of LLMs like GPT-4, the challenge for businesses lies in their practical deployment. AirOps mitigates this hurdle, offering a robust platform that enables businesses to use these AI models to address their specific needs. The platform helps users automate laborious tasks, generate personalized content, extract valuable insights from data, and leverage natural language processing techniques.

One of the salient features of AirOps' value proposition is cost efficiency. Utilizing AI models can often be a costly endeavor, but the AirOps platform presents an innovative solution. The system employs larger models such as GPT-4 for initial training, then switches to smaller, fine-tuned, open-source models for regular operations, significantly reducing the financial burden.

As AI evolves, the demand for nuanced and adaptable models increases. AirOps is at the forefront of these developments, continually learning and adapting to offer the most suitable solutions for its customers. AirOps aids businesses in creating AI experiences and generating new content from their existing data corpus, paving the way for a streamlined and efficient approach to making the most of AI capabilities.

The company's strategic vision is also worth noting. Initially, AirOps set out to help businesses extract value from their data. However, as large language models have gained public recognition, the company has astutely shifted its focus. Today, AirOps aims to facilitate businesses in merging their data with LLMs, leading to the creation of custom workflows and applications.

As AI continues to permeate the professional sphere, AirOps is showing how businesses can capitalize on this trend. Their AI-powered tools are being used across a variety of sectors, such as real estate, e-learning, and financial services, among others.
By automating complex tasks, streamlining workflows, and generating custom content at scale, AirOps is empowering businesses to harness the transformative capabilities of AI effectively and efficiently.

With its recent seed funding, the company aims to expand its product suite, bolster its team, and extend its customer base. As Halliday, the CEO, stated, the company's goal is to enable businesses to bridge the gap between the theoretical prowess of AI and its practical implementation. Through its groundbreaking work, AirOps is ensuring that the AI revolution in the business world is not merely a utopian vision but an attainable reality.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!
AI-Powered Stock Selection

Julian Melanson
12 Jul 2023
4 min read
Artificial Intelligence continues to infiltrate every facet of modern life, from daily chores to complex decision-making procedures, including the stock market. The recent advent of AI-powered language models like ChatGPT by OpenAI serves as a notable testament to this statement. The potential of these models transcends conversational prowess, extending to the ability to guide investment decisions.

A case in point is an experiment conducted by Finder.com, an international financial comparison site. The test pitted an AI-constructed portfolio against some of the most renowned investment funds in the United Kingdom, with the AI-curated selection outstripping its counterparts. The portfolio, an assortment of 38 stocks picked by ChatGPT, posted a gain of 4.9% between March 6 and April 28. In comparison, ten top-tier investment funds noted an average decline of 0.8% in the same period. To put this into perspective, the S&P 500 index, an esteemed gauge of the American market, marked a rise of 3%, and the Stoxx Europe 600, its European equivalent, noted a modest increase of 0.5%.

The experiment's dynamics are as intriguing as its outcome. Investment funds aggregate capital from a multitude of investors, with a fund manager administering the investment decisions. However, Finder's analysts asked the AI chatbot to construct a stock portfolio based on prevalent selection criteria: low indebtedness and a solid growth trajectory. Noteworthy picks included industry behemoths like Microsoft, Netflix, and Walmart.

This process's ingenuity lies in its accessibility. While AI has pervaded major funds for years, supplementing investment decisions, the advent of ChatGPT has democratized this expertise. Now, the public can use this technology, thereby revolutionizing retail investment.

How dependable are these AI-driven stock predictions? A study by the University of Florida supplies an answer. Published in April, the study posits that ChatGPT could forecast specific companies' stock price movements more accurately than some fundamental analysis models.

In fact, the democratization of AI, characterized by models like ChatGPT and BERT, could potentially upend the financial industry. Researchers across the globe have corroborated this sentiment. In two separate studies, researchers found that large language models (LLMs) can enhance stock market and public opinion predictions, as evidenced by historical data.

University of Florida professors Alejandro Lopez-Lira and Yuehua Tang further validated this argument in their paper "Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models". They utilized ChatGPT to assess news headlines' sentiment, a metric that has become indispensable for quantitative analysis algorithms employed by stock traders.

Sentiment analysis discerns whether a text, such as a news headline, conveys a positive, neutral, or negative sentiment about a subject or company. This evaluation enhances the accuracy of market predictions.

Lopez-Lira and Tang applied ChatGPT to gauge the sentiment manifested in news headlines. Upon comparing ChatGPT's assessment of these news stories with the subsequent performance of company shares in their sample, they discovered statistically significant predictions, a feat unachieved by other LLMs.

The professors asserted, "Our analysis reveals that ChatGPT sentiment scores exhibit a statistically significant predictive power on daily stock market returns."
This statement, substantiated by their findings, shows a strong correlation between the ChatGPT evaluation and the subsequent daily returns of the stocks in their sample. It underscores the potential of ChatGPT as a potent tool for predicting stock market movements based on sentiment analysis.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!
ChatGPT and AI in the Crypto Market

Julian Melanson
12 Jul 2023
6 min read
OpenAI's ChatGPT has gained significant attention since it first launched, and with its versatile capabilities and high accuracy, it has the potential to make a substantial impact on the crypto market. It's crucial that we explore how AI and Natural Language Processing (NLP) can assist in fraud detection and prevention, understand the capabilities and limitations of ChatGPT in the crypto industry and trading, highlight the importance of AI in safeguarding the crypto market, and discuss ChatGPT's role in crypto compliance, AML, security, and its future implications.

AI and NLP for Fraud Detection and Prevention

NLP is a branch of AI that enables machines to read, understand, and draw conclusions from human languages. By using computational linguistics and statistical models, NLP can reveal suspicious behavior patterns and uncover fraud in financial transactions. For instance, NLP can detect inconsistencies in credit applications or identify suspicious transactions on credit cards.

Capabilities and Limitations of ChatGPT in the Crypto Industry

In the crypto industry, ChatGPT has various applications, particularly in trading. It can supply a historical overview of a certain type of cryptocurrency, analyze market data, forecast price movements, create strategies, and find trading opportunities. By leveraging ChatGPT, traders can make better-informed decisions and capitalize on emerging possibilities. A very simple example is asking ChatGPT to help create a strategy for identifying Chainlink bottoms using the relative strength index (RSI), support and resistance levels, and moving averages.

While ChatGPT can elucidate various aspects of the crypto arena, it's imperative to recognize its potential limitations, particularly pertaining to the source reliability of its information. The internet is fraught with misinformation, and since the advent of GPT-4, which offers web-browsing capabilities, such misinformation could inadvertently affect AI tools like ChatGPT. Within the volatile crypto market, such unreliable information can lead to imprudent investments. It's advisable to fact-check the data ChatGPT provides to mitigate the risk of utilizing information from dubious sources.

The Importance of AI in Safeguarding the Crypto Market

The adoption of blockchain technology has brought benefits such as increased openness, data consistency, and security. By integrating AI with blockchain, a more secure and intelligent system can be established. Blockchain ensures the integrity of shared information and models used by AI, while AI enhances fraud detection capabilities. The combination of AI and blockchain creates a more resilient system that is resistant to attacks and fraud.

ChatGPT in Crypto Fraud Detection

ChatGPT, with its NLP capabilities, can contribute to fraud detection in the crypto market in several ways:

Identifying Suspicious Transactions and Activities: By analyzing emails for suspicious language patterns and detecting anomalies, ChatGPT can help identify potential fraud.
It can compare email texts to earlier communications from the same individual, ensuring consistency and detecting deviations.

Analyzing Patterns and Anomalies in Crypto Trading Data: ChatGPT can analyze market data and find significant patterns and trends that can aid traders in making informed decisions.

Monitoring Social Media and External Sources: ChatGPT can help compliance teams in monitoring chat and social networking platforms for suspicious activities, such as market manipulation and insider trading.

Utilizing Advanced Machine Learning Algorithms for Risk Assessment: Machine learning algorithms can predict the likelihood of default on loans or identify risky transactions. This information helps lenders make more informed decisions and manage risks effectively.

ChatGPT in Crypto Compliance and AML

Identifying and Verifying the Identity of Crypto Traders and Investors: ChatGPT excels in identifying and verifying the identity of traders and investors, ensuring the authenticity of individuals involved in crypto transactions.

Monitoring for Money Laundering and Financial Crimes: By leveraging AI capabilities, compliance teams can monitor transactions and identify suspicious patterns indicative of money laundering or other financial crimes.

Keeping Up with Regulatory Changes and Compliance Requirements: AI chatbots like ChatGPT can adapt to regulatory changes and comply with requirements set by authorities to ensure seamless operations within legal frameworks.

Developing and Implementing Effective KYC and AML Procedures: NLP and supervised machine learning techniques play a vital role in streamlining Know Your Customer (KYC) procedures. These technologies facilitate efficient identity verification and analysis of unstructured content.

ChatGPT in Crypto Security

Protecting Crypto Assets and Digital Wallets: AI tools like ChatGPT enhance security measures in crypto exchanges and platforms, safeguarding digital assets and wallets from potential threats.

Enhancing Security in Crypto Exchanges and Platforms: ChatGPT helps in verifying the identities of investors, bolstering the overall security mechanism of crypto exchanges and platforms.

Identifying and Preventing Phishing and Hacking Attempts: AI algorithms can block unauthorized smart contracts and reduce the risk of phishing and hacking attacks, thereby enhancing the security of the crypto industry.

Developing and Implementing Advanced Security Protocols: AI algorithms and machine learning techniques help organizations identify vulnerabilities in their security architecture and improve overall system security.

Future Developments and Implications of ChatGPT in Crypto

Advancements in NLP and AI are expected to have a significant impact on fraud detection and prevention. As society moves toward a cashless economy, the role of AI in identifying and preventing digital fraud becomes increasingly critical. ChatGPT's ability to home in on popular themes enables traders to stay updated on crypto news, retrieve relevant data, and generate trading strategies based on historical information. However, it is crucial to consider the ethical implications and potential risks associated with using AI for fraud detection and in financial systems. Responsible and informed use of AI technologies can contribute to building trust and credibility in the crypto market.

ChatGPT, with its advanced NLP capabilities, offers exciting possibilities for the crypto market. Its potential to enhance fraud detection, bolster security measures, and build trust and credibility is promising.
However, it is essential to approach AI adoption in the crypto market cautiously, taking into account ethical considerations and potential implications. As technology continues to evolve, the responsible and informed use of AI can pave the way for a safer and more efficient crypto ecosystem.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!
Data Analysis Made Easy with ChatGPT

Sagar Lad
02 Jul 2023
5 min read
Are you weary of trawling through heaps of analysis data in search of meaningful insights? With ChatGPT, the rules are changing. Thanks to its sophisticated natural language processing skills, ChatGPT can reveal hidden patterns and trends in your data that you never imagined were there. In this article, we'll look at how exploratory data analysis with ChatGPT can transform your data and change the way you conduct business.

Data Analysis with ChatGPT

For data analysts, ChatGPT can be a useful tool for processing, exploring, communicating, and collaborating on their data-driven ideas. Large volumes of data can be analyzed and processed by ChatGPT quickly and effectively. Through its language processing skills, ChatGPT can interpret and understand written inquiries and extract pertinent insights from the data. Here are a few benefits that ChatGPT can provide:

Data analysts can use ChatGPT to study their data, spot trends, and even produce useful data visualizations. These visualizations outline the data clearly, making it simpler for analysts to spot trends and insights.

Data analysts can utilize ChatGPT to explain their findings to non-technical stakeholders. Using natural language, the chatbot can assist data analysts in providing simple explanations of complicated data ideas and insights.

Data analysts can benefit from ChatGPT's help in coming up with fresh, insightful queries to pose to their data. Using natural language queries, analysts can investigate novel lines of inquiry and unearth previously unconsidered hidden insights.

Let's look at how ChatGPT can make data analysis easy and straightforward. As a data modeler, I want to investigate the data's dictionary and metadata first.

Image 1: Data Dictionary Using ChatGPT, Part 1
Image 2: Data Dictionary Using ChatGPT, Part 2

ChatGPT gives us thorough details about the data dictionary, including a complete description of each column. This guides the end user on when and how to use the data.

Asking ChatGPT about the dataset's number of rows and columns will help you better grasp the overall statistics.

Image 3: Dataset Statistics

As seen in the image above, ChatGPT gives us a precise count of the dataset's rows and columns. After getting a broad overview of the dataset, let's examine the data's quality:

Image 4: Exploratory Data Analysis - Null Value Statistics

Here, we've given ChatGPT an input containing the dataset and asked it to determine the percentage of null values, in order to judge whether the data can be used for analytics. Since the dataset does not contain any null values, ChatGPT responds that the given dataset has no missing values.

Now, we can observe that the dataset's header information is absent. Before we can use the data, the columns must have meaningful names.

Image 5: Dataset Column Naming Convention

Let's ask ChatGPT how it can deliver valuable header data. As you can see, ChatGPT outputs column headers with descriptions and business-specific naming standards. This makes the data easier to use for both the technical team and business users.

We now know that the data quality is good. As outliers will affect the results of the data analysis, let's look for any in the dataset.

Image 6: Detect Outliers in the Dataset

Here, ChatGPT carries out an in-depth analysis at the column level to see whether any outliers are present; it's okay if none exist.
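As an illustration of the kind of check typically proposed at this step, here is a minimal pandas sketch of interquartile-range (IQR) outlier detection; the file name and column are hypothetical placeholders:

import pandas as pd

# Hypothetical dataset and numeric column to inspect
df = pd.read_csv("sales_data.csv")
col = "price"

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1 = df[col].quantile(0.25)
q3 = df[col].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = df[(df[col] < lower) | (df[col] > upper)]
print(f"Found {len(outliers)} potential outliers in '{col}'")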
If outliers do exist, ChatGPT also offers advice on what kind of outlier is present and how it can affect the entire data analysis procedure. Let's now look at how to use ChatGPT to eliminate those outliers.

Image 7: Remove Outliers from the Dataset Using Python, Part 1
Image 8: Remove Outliers from the Dataset Using Python, Part 2

For the given sample dataset, ChatGPT offers thorough Python code that can be used to automatically eliminate the observed outliers.

The team may have business analysts who are unfamiliar with Python. Let's see how ChatGPT can assist business analysts with their data analysis work.

Image 9: SQL Query to Calculate Monthly Revenue, Part 1
Image 10: SQL Query to Calculate Monthly Revenue, Part 2

In this case, ChatGPT offers a default query that the business analyst may utilize to figure out the monthly revenue for a particular dataset.

Next, let's ask ChatGPT to take on the role of a data analyst and offer further insights for a certain dataset.

Image 11: Step-by-Step Data Analysis Using ChatGPT, Part 1
Image 12: Step-by-Step Data Analysis Using ChatGPT, Part 2

As we can see from ChatGPT's results, it offers step-by-step advice on various analyses and results that may be applied on top of this particular dataset. Each of these tasks can then be executed with ChatGPT at every phase of the overall data analysis process.

Let's ask ChatGPT to undertake this data analysis work so that it may use Python to analyze prices for the given dataset:

Image 13: Price Analysis Using Python, Part 1
Image 14: Price Analysis Using Python, Part 2
Image 15: Price Analysis Using Python, Part 3

For the purpose of doing price analysis on a given dataset, ChatGPT produces Python code and sample output. From this output, we can judge how prices are changing over time based on the data points at hand.

Conclusion

In this article, we went into great detail on how to use ChatGPT for a variety of exploratory data analysis tasks. We also looked closely at different approaches to carrying out data analysis tasks using Python and SQL. In short, ChatGPT is a very useful tool for performing exploratory data analysis tasks while working with massive volumes of data.

Author Bio

Sagar Lad is a Cloud Data Solution Architect with a leading organization and has deep expertise in designing and building enterprise-grade intelligent Azure Data and Analytics Solutions. He is a published author, content writer, Microsoft Certified Trainer, and C# Corner MVP.

Links: Medium, Amazon, LinkedIn
Generating Text Effects with Adobe Firefly

Joseph Labrecque
02 Jul 2023
9 min read
Adobe Firefly Text Effects

Adobe Firefly is a new set of generative AI tools which can be accessed via https://firefly.adobe.com/ by anyone with an Adobe ID. To learn more about Firefly… have a look at their FAQ.

Image 1: Adobe Firefly

One of the more unique aspects of Firefly that sets it apart from other generative AI tools is Adobe's exploration of procedures that go beyond prompt-based image generation. A good example of this is what is called Text Effects in Firefly.

Text effects are also prompt-based… but use a scaffold determined by font choice and character set to constrain a generated set of styles to these letterforms. The styles themselves are based on user prompts – although there are other variants to consider as well.

In the remainder of this article, we will focus on the text effects workflow available in Firefly.

Using Text Effects within Firefly

As mentioned in the introduction, we will continue our explorations of Adobe Firefly with the ability to generate stylized text effects from a text prompt. This is a bit different from the procedures that users might already be familiar with when dealing with generative AI – yet retains many similarities with such processes.

When you first enter the Firefly web experience, you will be presented with the various workflows available.

Image 2: Firefly modules can be either active and ready to work with or in exploration

These appear as UI cards and present a sample image, the name of the procedure, a procedure description, and either a button to begin the process or a label stating that it is "in exploration". Those which are in exploration are not yet available to general users.

We want to locate the Text Effects module and click Generate to enter the experience.

Image 3: The Text effects module in Firefly

From there, you'll be taken to a view that showcases text styles generated through this process. At the bottom of this view is a unified set of inputs that prompt you to enter the text string you want to stylize… along with the invitation to enter a prompt to "describe the text effects you want to generate".

Image 4: The text-to-image prompt requests your input to begin

In the first part that reads Enter Text, I have entered the text characters "Packt". For the second part of the input requesting a prompt, enter the following: "futuristic circuitry and neon lighting violet"

Click the Generate button when complete. You'll then be taken into the Firefly text effects experience.

Image 5: The initial set of four text effect variants is generated from your prompt with the characters entered used as a scaffold

When you enter the text effects module properly, you are presented in the main area with a preview of your input text which has been given a stylistic overlay generated from the descriptive prompt. Below this are a set of four variants, and below that are the text inputs that contain your text characters and the prompt itself.

To the right of this are your controls. These are presented in a user-friendly way and allow you to make certain alterations to your text effects. We'll explore these properties next to see how they can impact our text effect style.

Exploring the Text Effect Properties

Along the right-hand side of the interface are properties that can be adjusted. The first section here includes a set of Sample prompts to try out.

Image 6: A set of sample prompts with thumbnail displays

Clicking on any of these sample thumbnails will execute the prompt attributed to it, overriding your original prompt.
This can be useful for those new to prompt-building within Firefly to generate ideas for their own prompts and to witness the capabilities of the generative AI. Choosing the View All option will display even more prompts.

Below the sample prompts, we have a very important adjustment that can be made in the form of Text effects fit.

Image 7: Text effects fit determines how tight or loose the visuals are bound to the scaffold

This section provides three separate options for you to choose from… Tight, Medium, or Loose. The default setting is Medium, and choosing either of the other options will have the effect of either tightening up all the little visual tendrils that expand beyond the characters – or letting them loose, generating even more beyond the bounds of the scaffold.

Let's look at some examples with our current scaffold and prompt:

Image 8: Tight - will keep everything bound within the scaffold of the chosen characters
Image 9: Medium - is the default and includes some additional visuals extending from the scaffold
Image 10: Loose - creates many visuals beyond the bounds of the scaffold

One of the nice things about this set is that you can easily switch between them to compare the resulting images and make an informed decision.

Next, we have the ability to choose a Font for the scaffold. There is currently a very limited set of fonts to use in Firefly. Similar to the sample prompts, choosing the View All option will display even more fonts.

Image 11: The font selection properties

When you choose a new font, it will regenerate the imagery in the main area of the Firefly interface, as the scaffold must be rebuilt.

I've chosen Source Sans 3 as the new typeface. The visual is automatically regenerated based on the new scaffold created from the character structure.

Image 12: A new font is applied to our text and the effect is regenerated

The final section along the right-hand side of the interface is for Color choices. We have options for Background Color and for Text Color.

Image 13: Color choices are the final properties section

There is a very limited set of color swatches to choose from. The most important choice is whether you want the background of the generated image to be transparent or not.

Making Additional Choices

Okay – we'll now look at making final adjustments to the generated image and downloading the text effect image to our local computer. The first thing we'll choose is a variant – which can be found beneath the main image preview. A set of 4 thumbnail previews are available to choose from.

Image 14: Selecting from the presented variants

Clicking on each will change the preview above it to reveal the full variant – as applied to your text effect.

For instance, if I choose option #3 from the image above, the following changes would result:

Image 15: A variant is selected and the image preview changes to match

Of course, if you do not like any of the alternatives, you can always choose the initial thumbnail to revert back.

Once you have made the choice of variant, you can download the text effect as an image file to your local file system for use elsewhere. Hover over the large preview image and an options overlay appears.

Image 16: A number of options appear in the hover overlay, including the download option

We will explore these additional options in greater detail in a future article.
Click the download icon to begin the download process for that image. As Firefly begins preparing the image for download, a small overlay dialog appears.

Image 17: Content credentials are applied to the image as it is downloaded

Firefly applies metadata to any generated image in the form of content credentials, and the image download process begins.

What are content credentials? They are part of the Content Authenticity Initiative's effort to promote transparency in AI. This is how Adobe describes content credentials in their Firefly FAQ:

Content Credentials are sets of editing, history, and attribution details associated with content that can be included with that content at export or download. By providing extra context around how a piece of content was produced, they can help content producers get credit and help people viewing the content make more informed trust decisions about it. Content Credentials can be viewed by anyone when their respective content is published to a supporting website or inspected with dedicated tools. -- Adobe

Once the image is downloaded, it can be viewed and shared just like any other image file.

Image 18: The text effect image is downloaded and ready for use

Along with content credentials, a small badge is placed at the lower right of the image which visually identifies the image as having been produced with Adobe Firefly (beta).

There is a lot more Firefly can do, and we will continue this series in the coming weeks. Keep an eye out for an Adobe Firefly deep dive… exploring additional options for your generative AI creations!

Author Bio

Joseph is a Teaching Assistant Professor, Instructor of Technology, University of Colorado Boulder / Adobe Education Leader / Partner by Design

Joseph Labrecque is a creative developer, designer, and educator with nearly two decades of experience creating expressive web, desktop, and mobile solutions. He joined the University of Colorado Boulder College of Media, Communication, and Information as faculty with the Department of Advertising, Public Relations, and Media Design in Autumn 2019. His teaching focuses on creative software, digital workflows, user interaction, and design principles and concepts. Before joining the faculty at CU Boulder, he was associated with the University of Denver as adjunct faculty and as a senior interactive software engineer, user interface developer, and digital media designer.

Labrecque has authored a number of books and video course publications on design and development technologies, tools, and concepts through publishers which include LinkedIn Learning (Lynda.com), Peachpit Press, and Adobe. He has spoken at large design and technology conferences such as Adobe MAX and for a variety of smaller creative communities. He is also the founder of Fractured Vision Media, LLC, a digital media production studio and distribution vehicle for a variety of creative works.

Joseph is an Adobe Education Leader and member of Adobe Partners by Design. He holds a bachelor's degree in communication from Worcester State University and a master's degree in digital media studies from the University of Denver.

Author of the book: Mastering Adobe Animate 2023
Everything You Need to Know about AgentGPT

Avinash Navlani
02 Jul 2023
4 min read
Advanced language models have been used in the last couple of years to create a variety of AI products, including conversational AI tools and AI assistants. A web-based platform called AgentGPT allows users to build and use AI agents right from their browsers. Making AgentGPT available to everyone and promoting community-based collaboration are its key goals.

ChatGPT provides accurate, meaningful, in-depth answers and discussion for given input questions, while AgentGPT is an AI agent platform that takes an objective and achieves the goal by thinking, learning, and taking actions.

AgentGPT can assist you with your goals without installing or downloading anything. You just need to create an account to get the power of AI-enabled conversational agents. You have to provide a name and objective for your agent, and the agent will work toward achieving the goal.

What is AgentGPT?

AgentGPT is an open-source platform built on OpenAI's GPT-3.5 architecture. AgentGPT is an NLP-based technology that generates human-like text with accuracy and fluency. It can engage in conversations, question answering, content generation, and problem-solving assistance.

How does AgentGPT work?

AgentGPT breaks down a given prompt into smaller tasks, and the agent completes these specific tasks in order to achieve the goal. Its core strength is engaging in real and contextual conversation. It generates dynamic discussions while learning from a large dataset. It recognizes intentions and responds in a way that is human-like.

How to Use AgentGPT?

Let's first create an account on reworkd.ai. After creating the account, deploy the agent by providing the agent's name and objective.

In the snapshot below, you can see that we are deploying an agent for Fake News Detection. As a user, we just need to provide two inputs: Name and Goal. For example, in our case, we have provided Fake News Detection as the name and Build Classifier for detecting fake news articles as the goal.

Image 1: AgentGPT page

Once you click Deploy Agent, it starts identifying the tasks and adds them all to a queue. After that, it executes the tasks one by one.

Image 2: Queue of tasks

In the snapshot below, you can see it has completed two tasks and is working on the third task (Extract Relevant Features). For each task, it has also provided code samples to implement the task.

Image 3: Code samples

Once your goal is achieved, you can save the results by clicking on the Save button in the top-right corner.

You can also improve the performance by providing relevant examples, using the ReAct approach to improve the prompting, and upgrading from the local version to the Pro version. You can also set up AgentGPT on your local machine. For detailed instructions, you can follow this link.

Summary

Currently, AgentGPT is in the beta phase, and the developer community is actively working on its features and use cases. It is one of the most significant milestones in the era of advanced large language models. Its ability to generate human-like responses opens up potential opportunities for industrial applications such as customer service, content generation, decision support systems, and personal assistance.

Author Bio

Avinash Navlani has over 8 years of experience working in data science and AI. Currently, he is working as a senior data scientist, improving products and services for customers by using advanced analytics, deploying big data analytical tools, creating and maintaining models, and onboarding compelling new datasets.
Previously, he was a university lecturer, where he trained and educated people in data science subjects such as Python for analytics, data mining, machine learning, database management, and NoSQL. Avinash has been involved in research activities in data science and has been a keynote speaker at many conferences in India.

Links: LinkedIn | Python Data Analysis, Third Edition
How Open-Source Language Models Could Reshape the Tech Industry

Julian Melanson
30 Jun 2023
5 min read
The world of technology, characterized by an incessant and rapid pace of evolution, is on the cusp of a seismic shift. Historically, the development and control of large language models—a key component in modern artificial intelligence systems—have been dominated by tech industry giants. However, emerging developments show that this might not be the status quo for much longer. The burgeoning field of open-source LLMs presents a potential disruption to the current balance of power in the tech industry, signaling a shift towards a more democratic and inclusive AI landscape.

Major tech firms like Microsoft and Google, armed with vast financial resources, have long held the reins of the LLM market. Their position seemed unassailable as recent earnings calls indicated a thriving business built around their AI services. Yet, a leaked internal document from Google has cast a shadow of uncertainty over this seemingly secure stronghold. The central idea gleaned from this document? No company has an unassailable fortress against competition in the realm of LLMs, not even the mighty OpenAI, the organization responsible for the groundbreaking GPT-3.

The story of GPT-3 is a pivotal chapter in the annals of AI history. Its 2020 release ignited a spark in the research community, illuminating the tantalizing promise of scale. With 175 billion parameters, GPT-3 showed capabilities that stretched beyond its initial training data. The success of this LLM prompted a surge of interest in the creation of larger, more complex models. This development led to an arms race among AI research labs, producing increasingly massive models such as Gopher, LaMDA, PaLM, and Megatron-Turing.

However, this race towards larger LLMs engendered a substantial increase in research and development costs. The staggering financial demands associated with training and running models like GPT-3 created an environment where LLM innovation was essentially confined to the wealthiest entities in tech. With this economic pressure to recoup their considerable investment, these companies began to commercialize their technology, leading to the erection of protective "moats" around their products. These mechanisms of defensibility safeguarded their investments against the competition, obscuring their research and constraining the sharing of intellectual resources.

Key elements of these moats included the proprietary control over training data, model weights, and the costs associated with training and inference. With their deep pockets, big tech companies kept the upper hand in managing the expenses tied to training and running large LLMs. This dominance rendered even open-source alternatives such as BLOOM and OPT-175B largely inaccessible to organizations without the fiscal means to support the hefty demands of these advanced models.

The Coming of Open-Source Language Models

For a time, this state of affairs painted a bleak picture for the democratization of LLMs, with the field becoming increasingly exclusive and secretive. However, the ebb and flow of innovation and competition that define the tech industry were bound to respond. The open-source community rose to the challenge, their endeavors intensifying following the release of OpenAI's ChatGPT, an instruction-following language model that illustrated the vast potential of LLMs in a multitude of applications.

These open-source alternatives are changing the game by proving that performance is not solely a function of scale.
Nonetheless, the rise of open-source models does not spell the end of cloud-based language models. Despite the democratization they promise, open-source LLMs face significant hurdles, including the prohibitive costs of pre-training. Furthermore, they may not be the best choice for all businesses. Companies without in-house machine learning expertise may still prefer the convenience of out-of-the-box, serverless solutions provided by the likes of Microsoft and Google. The entrenched distribution channels of these tech behemoths also present a formidable barrier for open-source LLMs to overcome.

However, the broader implications of the open-source movement in LLMs are unmistakable. It expands the market, opens up novel applications, and puts pressure on tech giants to offer more competitive pricing. By democratizing access to advanced AI, it allows for broader participation in the AI revolution, reducing the concentration of power and innovation within a few wealthy tech companies. As the LLM landscape continues to evolve rapidly, the rise of open-source models will leave an indelible mark on the tech industry.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!

Revolutionizing Business Productivity using CassidyAI

Julian Melanson
28 Jun 2023
6 min read
In recent times, the entrepreneurial environment has seen a surge in innovation, particularly in the realm of artificial intelligence. Among the stalwarts of this revolution is Neo, a startup accelerator masterminded by Silicon Valley investor Ali Partovi. In a groundbreaking move in March, Neo entered a strategic partnership with renowned AI research organization OpenAI and tech giant Microsoft Corp. The objective was clear: to offer no-cost software and expert advice to startups orienting their focus towards AI. This partnership's results are already tangible, with CassidyAI, a startup championed by content creator Justin Fineberg, being one of the companies benefiting from this initiative.

CassidyAI: A Pioneer in AI-Driven Business Automation

Fineberg recently announced that CassidyAI is stepping out from the shadows, shedding its stealth mode. CassidyAI's primary function is an embodiment of innovation: facilitating businesses to create customized AI assistants, thus automating tasks, optimizing productivity, and integrating AI across entire organizations. With this aim, CassidyAI is at the forefront of a paradigm shift in business process automation and management.

Amplifying Productivity: CassidyAI's Vision

At its core, CassidyAI embraces an ambitious vision: to multiply the productivity of every team within an organization by a factor of ten. This tenfold increase isn't just a lofty goal; it is a transformative approach that involves deploying AI technology across business operations. CassidyAI accomplishes this by providing a platform for generating bespoke AI assistants that cater to individual departmental needs. This process involves training these virtual assistants using the specific knowledge base and data sets of each department.

Harnessing AI Across Departments: Versatile Use-Cases

The potential applications of CassidyAI's platform are practically limitless, and the diversity of use cases underscores the flexibility and versatility of the AI-assistant creation process. In marketing, for instance, teams can train CassidyAI on their unique writing style and marketing objectives, thereby crafting content that aligns perfectly with the brand image. Similarly, sales teams can enhance their outreach initiatives by leveraging CassidyAI's understanding of the sales pitch, process, and customer profiles.

In customer service, AI assistants can respond to inquiries accurately and efficiently, thanks to CassidyAI's ability to access comprehensive support knowledge. Engineering teams can train CassidyAI on their technical stack, engineering methods, and architecture, enabling more informed technical decisions and codebase clarity. Product teams can use CassidyAI's profound understanding of their team dynamics and user experience principles to drive product ideation and roadmap collaboration. Finally, HR departments can provide employees with quick access to HR documentation through AI assistants trained to handle such inquiries.

Data Security and Transparency: CassidyAI's Assurance

Beyond its vast application range, CassidyAI distinguishes itself through its commitment to data security and transparency. The platform's ability to import knowledge from various platforms ensures a deep understanding of a company's brand, operations, and unique selling propositions.
Equally important, all interactions with CassidyAI remain reliable and secure due to its stringent data-handling practices and clear citation of sources.

Setting Up AI Automation: A No-Code Approach

CassidyAI's approach to implementing AI in businesses is straightforward and code-free, catering to those without programming skills. Businesses begin by securely uploading their internal data and knowledge to train CassidyAI on their unique products, strategies, processes, and more. They then construct AI assistants that are fine-tuned for their distinct use cases, without the need to write a single line of code. Once the AI assistants are ready, they can be shared across the team, fostering an atmosphere of AI adoption and collaboration throughout the organization.

Interestingly, the onboarding process for each company joining CassidyAI is currently personally overseen by Fineberg. Although this may limit the pace of early access, it provides a personalized and detailed introduction to CassidyAI's capabilities and potential. Companies interested in exploring CassidyAI's offerings can request a demo through their website.

CassidyAI represents a revolutionary approach to adopting AI technology in businesses. By creating tailored AI assistants that cater to the specific needs of different departments, it offers an opportunity to substantially improve productivity and streamline operations. Its emergence from stealth mode signals a new era of AI-led business automation and provides an exciting glimpse into the future of work. It is anticipated that as CassidyAI gains traction, more businesses will leverage this innovative tool to their advantage, fundamentally transforming their approach to task automation and productivity enhancement.

You can browse the website and request a demo here: https://www.cassidyai.com

Real-World Use Cases

Here are some specific examples of how CassidyAI is being used by real businesses:

- Centrifuge: Centrifuge is using CassidyAI to originate real-world assets and to securitize them. This is helping Centrifuge to provide businesses with access to financing and to reduce risk.
- Tinlake: Tinlake is using CassidyAI to automate the process of issuing and managing loans backed by real-world assets. This is helping Tinlake to provide a more efficient and cost-effective lending solution for businesses.
- Invoice Finance: Invoice Finance is using CassidyAI to automate the processing of invoices and to provide financing to businesses based on the value of their invoices. This is helping Invoice Finance to provide a more efficient and timely financing solution for businesses.
- Bondora: Bondora is using CassidyAI to assess the risk of loans and to provide investors with more information about the loans they are considering investing in. This is helping Bondora to provide a more transparent and efficient investment platform for investors.
- Upstart: Upstart is using CassidyAI to assess the creditworthiness of borrowers and to provide them with more personalized lending terms. This is helping Upstart to provide a more inclusive and affordable lending solution for borrowers.

These are just a few examples of how CassidyAI is being used by real businesses to improve their operations and to provide better services to their customers. As CassidyAI continues to develop, it is likely that even more use cases will be discovered.

Summary

CassidyAI, a startup in partnership with Neo, OpenAI, and Microsoft, is revolutionizing business productivity through AI-driven automation.
Their platform enables businesses to create customized AI assistants, optimizing productivity and integrating AI across departments. With a no-code approach, CassidyAI caters to various use cases, including marketing, sales, customer service, engineering, product, and HR. The platform emphasizes data security and transparency while providing a personalized onboarding process. As CassidyAI emerges from stealth mode, it heralds a new era of AI-led business automation, offering businesses the opportunity to enhance productivity and streamline operations.

Author Bio

Julian Melanson is one of the founders of Leap Year Learning. Leap Year Learning is a cutting-edge online school that specializes in teaching creative disciplines and integrating AI tools. We believe that creativity and AI are the keys to a successful future, and our courses help equip students with the skills they need to succeed in a continuously evolving world. Our seasoned instructors bring real-world experience to the virtual classroom, and our interactive lessons help students reinforce their learning with hands-on activities.

No matter your background, from beginners to experts, hobbyists to professionals, Leap Year Learning is here to bring in the future of creativity, productivity, and learning!

Practical AI in Excel: Create a Linear Regression Model

M.T White
28 Jun 2023
12 min read
AI is often associated with complex algorithms and advanced programming, but for basic linear regression models, Excel is a suitable tool. While Excel may not be commonly linked with AI, it can be an excellent option for building statistical machine-learning models. Excel offers modeling capabilities similar to those of dedicated libraries, without requiring extensive setup or coding skills, and it enables leveraging machine learning for predictive analytics without writing code. This article focuses on using Excel to build a linear regression model for predicting the number of story points a software development team completes based on the number of hours worked.

What is Linear Regression?

Before a linear regression model can be built, it is important to understand what linear regression is and what it is used for. For many, their first true encounter with linear regression will come in the form of a machine learning library or machine learning cloud service. In terms of modern machine learning, linear regression is a supervised machine learning algorithm that is used for predictive analytics. In short, linear regression is a very common and easy-to-use machine learning model that is borrowed from the field of statistics. This means that, at its core, linear regression is a statistical analysis technique that models a relationship between two or more variables. In the most rudimentary sense, linear regression boils down to the following equation:

y = mx + b

As can be seen, the equation (that is, the linear regression model) is little more than the equation for a line. No matter the library or machine learning service that is used, in its purest form linear regression will boil down to the above equation. Linear regression is used for predictive, numerical models; in other words, it produces models that attempt to predict a numerical value. This could be the weight of a person in relation to their height, the value of a stock in relation to the Dow, or anything similar to those two applications. As stated before, the model produced in this article will be used to predict the number of story points for a given number of hours worked.
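Before moving to Excel, it may help to see the same idea expressed in a few lines of code. The sketch below is a minimal illustration, not part of the Excel workflow; it fits y = mx + b to a handful of made-up hours/story-point pairs using NumPy's least-squares polynomial fit.

import numpy as np

# Hypothetical (hours worked, story points completed) pairs for illustration
hours = np.array([10, 15, 20, 25, 30])
points = np.array([4, 9, 12, 18, 20])

# A degree-1 polynomial fit is exactly the y = mx + b regression line
m, b = np.polyfit(hours, points, 1)
print(f"slope m = {m:.4f}, intercept b = {b:.4f}")

# The fitted line predicts story points for any number of hours
print(f"prediction for 18 hours: {m * 18 + b:.1f} story points")

Excel arrives at its coefficients the same way, through an ordinary least-squares fit; the rest of this tutorial simply does it with menus and charts instead of code.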
Why should Excel be used?

Due to the statistical nature of linear regression, Excel is a prime choice for creating linear regression models. This is especially true if, among other things, one or more of the following conditions are met:

- The person creating the model does not have a strong computer science or machine learning background.
- The person needs to quickly produce a model.
- The data set is very small.

If a person simply needs to create a forecasting model for their team, or to forecast stocks, customer traffic, or whatever it may be, Excel will often be a better choice than writing a traditional program or using complex machine learning software. With that established, how would one go about creating a linear regression model?

Installing the Necessary Add-ins

To build a linear regression model, the following will be needed:

- A working copy of Excel.
- The Analysis ToolPak add-in for Excel.

The Analysis ToolPak is the workhorse for this tutorial. As such, if it is not installed, follow the steps in the next section; if the add-in is already installed, the following section can be skipped.

Installing the Data Analysis ToolPak

1. Click File -> Options -> Add-ins. Once done, the following wizard should appear:

Figure 1 – Options Wizard

2. Locate Analysis ToolPak and select it. Once that is done, the following popup will appear:

Figure 2 – Add-ins Wizard

For this tutorial, all that is technically needed is the Analysis ToolPak, but it is a good idea to install the VBA add-in as well.

3. Verify the installation by navigating to the Data tab and checking that the Data Analysis tools are present. If everything is installed properly, the following should be visible:

Figure 3 – Data Analysis Tool

Once the Analysis ToolPak is installed, a linear regression model can be generated with a few clicks of the mouse.

Building a Linear Regression Model to Predict Story Points

Once all the add-ins are installed, create a workbook and copy in the following data:

Hours   Story Points
16      13
15      12
15      11
13      4
22      8
28      18
30      19
10      3
21      14
11      7
12      9
25      19
24      17
23      15

Before the model can be built, the independent and dependent variables must be chosen. This is a fancy way of determining which column is going to be the input and which is going to be the output for the model. In this case, the goal is to predict the number of story points for a given number of hours worked; as such, when the model is created, the number of hours will be inputted to return the number of predicted story points. This means that the number of hours worked will be the independent variable, which will be on the X-axis of the graph, and the number of story points will be the dependent variable, which will be on the Y-axis. To generate the model, perform the following steps:

1. Navigate to the Data tab and click Data Analysis. When complete, the following popup should appear:

Figure 4 – Regression Analysis

Scroll down, select Regression, then press the OK button.

2. Once step 1 is completed, the following wizard should appear:

Figure 5 – Regression Setup

Input the data the same way it is presented in Figure 5. Once done, the data should be rendered as in Figure 6.

Figure 6 – Linear Regression Output

At this point, the linear regression model has been produced. To make a prediction, all one has to do is multiply the number of hours worked by the Hours value in the Coefficient column and add the Intercept value in the Coefficient column to that product. However, it is advisable to generate a trendline and add the line's equation and the R-squared value to the chart to make things easier to see. This can be done by simply deleting the predicted dots and adding a trendline, as in Figure 7.

Figure 7 – Trendline

The trendline shows the best fit for the model. In other words, the model uses the equation that governs the trendline to predict a value. To generate the line's equation, click the arrow button by Trendline and click More Options. When this is done, a sidebar should appear similar to the one in Figure 8.

Figure 8 – Format Trendline Menu

From here, select the R-squared value checkbox and the Display Equation on chart checkbox. When this is done, those values should be displayed on the graph, as in Figure 9.

Figure 9 – Regression Model with line equation and R-squared value

To create a prediction, all one has to do is plug in the number of hours for x in the equation, and the computed value will be an approximation of the number of story points for the hours worked. The same fit can also be reproduced outside of Excel, as the short sketch below shows.
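As a cross-check (not part of the Excel workflow), the fourteen data points above can be run through SciPy's linregress, which performs the same ordinary least-squares fit and should land on the same coefficients Excel reports.

from scipy import stats

# The same 14 (hours, story points) pairs entered into the workbook
hours        = [16, 15, 15, 13, 22, 28, 30, 10, 21, 11, 12, 25, 24, 23]
story_points = [13, 12, 11, 4, 8, 18, 19, 3, 14, 7, 9, 19, 17, 15]

result = stats.linregress(hours, story_points)
print(f"slope (Hours coefficient): {result.slope:.4f}")       # ~0.6983
print(f"intercept:                 {result.intercept:.4f}")   # ~-1.1457
print(f"R-squared:                 {result.rvalue ** 2:.4f}")  # ~0.7440

# Predicting story points for a 20-hour stretch of work
print(f"20 hours -> {result.slope * 20 + result.intercept:.1f} story points")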
Interpreting the Model

Now that the model is generated, how good is it? This question can be answered with the data that was produced in Figure 6. However, a whole book could be dedicated to interpreting those outputs, so for this article only the Regression Statistics group, which can be thought of as the high-level summary of the model, will be explored. Consider the following data:

Regression Statistics
Multiple R           0.862529
R Square             0.743956
Adjusted R Square    0.722619
Standard Error       2.805677
Observations         14

The first value is Multiple R, or as it is sometimes called, the Correlation Coefficient. This value can range from -1 to 1, depending on whether the correlation is negative or positive, and the closer the coefficient is to either -1 or 1, the better. With that, what is the difference between a negative and a positive correlation? It comes down to the orientation of the graph: if the graph is downward oriented, the correlation is negative and the coefficient will be less than 0; if the graph is upward oriented, like the graph produced by this model, the correlation is positive and the coefficient will be greater than 0. Consider Figure 10.

Figure 10 – Negative and Positive Correlation

Ultimately, it doesn't matter whether the model has a positive or negative correlation. All a correlation means is that as one value rises, the other will either rise with it or fall. In terms of the model produced, the Multiple R value is 0.86. All things considered, that is a very good correlation coefficient.

The next important value to look at is the R-Squared value, or the Coefficient of Determination. This value describes how well the model fits the data; in other words, it indicates how closely the data points fall along the line. The R-Squared value ranges from 0 to 1, so the closer the value is to 1, the better the model. Though a value as close to 1 as possible is desirable, it is naive to assume that an R-Squared of 1 will ever be achievable. A lower R-Squared value is not necessarily a bad thing, either; depending on what is being measured, what constitutes a "good" R-Squared value will vary. In the case of this model, the R-Squared is about 0.74, which means about 74% of the variation in the data can be explained by the model. Depending on the context of the application, that can be considered good, but it should be remembered that at most the model explains only 74% of what determines the number of completed story points.

Adjusted R-Squared is simply a more precise view of the R-Squared value. In simple terms, the Adjusted R-Squared value determines how much of the variation in the dependent variable can be explained by the independent variables, adjusted for the number of independent variables in the model. The Adjusted R-Squared for this model is 0.72, which is in line with the R-Squared value.

Finally, the Standard Error is the last fitting metric. In a very simplistic sense, this metric is a measure of precision for the model, and the standard error for this model is about 2.8. Much like the other metrics, what constitutes good here is subjective; however, the closer the value is to 0, the more precise the model. These summary statistics are not magic: each can be computed directly from the data, as the sketch below shows.
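The following plain-Python sketch reproduces Excel's Regression Statistics table from the residuals. It assumes the rounded coefficients from the chart (0.6983 and -1.1457), so the printed values will match Excel's only to within rounding.

import math

hours        = [16, 15, 15, 13, 22, 28, 30, 10, 21, 11, 12, 25, 24, 23]
story_points = [13, 12, 11, 4, 8, 18, 19, 3, 14, 7, 9, 19, 17, 15]

slope, intercept = 0.6983, -1.1457  # rounded coefficients from the trendline
n, k = len(hours), 1                # 14 observations, 1 independent variable

# Residual sum of squares (unexplained) and total sum of squares
predicted = [slope * x + intercept for x in hours]
ss_res = sum((y - p) ** 2 for y, p in zip(story_points, predicted))
mean_y = sum(story_points) / n
ss_tot = sum((y - mean_y) ** 2 for y in story_points)

r_square = 1 - ss_res / ss_tot                              # ~0.7440
multiple_r = math.sqrt(r_square)                            # ~0.8625
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)   # ~0.7226
std_error = math.sqrt(ss_res / (n - k - 1))                 # ~2.8057

print(f"Multiple R:        {multiple_r:.6f}")
print(f"R Square:          {r_square:.6f}")
print(f"Adjusted R Square: {adj_r_square:.6f}")
print(f"Standard Error:    {std_error:.6f}")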
Using the model

Now that the model has been created, what would someone do with it; that is, how would they use it? The answer is surprisingly simple. The whole model is a line equation, and that line gives an approximation of a value based on the given input. In the case of this model, a person would input the number of hours worked to predict the number of story points. As such, someone could simply input the number of hours into a calculator, add the equation to a spreadsheet, or do anything else they want with it. Put simply, this or any other linear regression model is used by inputting a value or values and crunching the numbers. For example, the equation rendered was as follows:

y = 0.6983x - 1.1457

The spreadsheet could be modified to include this equation in a cell, in which case the user would simply have to input the number of hours worked to get a predicted number of story points.

The important thing to remember is that this model, along with any other regression model, is not gospel. Much like in any other machine learning system, these values are simply estimates based on the data that was fed into it. This means that if a different data set or subset is used, the model can and probably will be different.

Conclusion

In summary, a simple Excel spreadsheet was used to create a linear regression model. The linear regression model that was produced will likely be very similar to a model generated with dedicated machine learning software. Does this mean that everyone should abandon their machine-learning software packages and libraries and solely use Excel? The long and the short of it is no! Excel, much like a library such as Scikit-learn or any other, is a tool. However, for laypersons who don't have a strong computer science background and need to produce a quick regression model, Excel is an excellent tool for doing so.

Author Bio

M.T. White has been programming since the age of 12. His fascination with robotics flourished when he was a child programming microcontrollers such as Arduino. M.T. currently holds an undergraduate degree in mathematics and a master's degree in software engineering, and is currently working on an MBA in IT project management. M.T. is currently working as a software developer for a major US defense contractor and is an adjunct CIS instructor at ECPI University. His background mostly stems from the automation industry, where he programmed PLCs and HMIs for many different types of applications. M.T. has programmed many different brands of PLCs over the years and has developed HMIs using many different tools.

Author of the book: Mastering PLC Programming